Tuesday, 31 December 2024

docker less Caddy

A Spontaneous Breakage

Here's the duckdns token... All you need to get HTTPS? 



The thing spat out:
expected (OK) but got (KO), url: [https://www.duckdns.org/update?domains=....duckdns.org&token=&txt=dV...kc&verbose=true]


The token went missing somewhere between that env var and the request it's making...

As a separate curl request with my token put into it, it works fine. With token='', as the URL above shows, the server indeed responds "KO".

In other log noise, allowing larger UDP packets can be good for performance. There's probably a whole series of labs based around tweaking settings like this:
sysctl -w net.core.rmem_max=7500000
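The value 7500000 matches the UDP buffer advice from quic-go, the HTTP/3 library inside Caddy. Note that sysctl -w only lasts until the next reboot. Below is a small Go sketch for checking whether a box still needs the bump. needsBump and wantRmemMax are made-up names for illustration.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// wantRmemMax is the value from the sysctl command above.
const wantRmemMax = 7500000

// needsBump reports whether a /proc/sys value such as "212992\n" is below want.
func needsBump(raw string, want int) (bool, error) {
	cur, err := strconv.Atoi(strings.TrimSpace(raw))
	if err != nil {
		return false, err
	}
	return cur < want, nil
}

func main() {
	raw, err := os.ReadFile("/proc/sys/net/core/rmem_max")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	low, err := needsBump(string(raw), wantRmemMax)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if low {
		fmt.Printf("net.core.rmem_max is %s; consider: sysctl -w net.core.rmem_max=%d\n",
			strings.TrimSpace(string(raw)), wantRmemMax)
	} else {
		fmt.Println("net.core.rmem_max is already large enough")
	}
}
```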

And a reminder to tweak this:
Via (between) these commands:
Supposing you copied in your .ssh/authori... already

Now... I'm trying to replicate this problem on a Debian system, shall we?
Installing docker via https://docs.docker.com/compose/install/
Was that the right way?
Nothing happens while doing the docker compose up
...until I docker compose down and try again

After a while, ping google.com becomes "Destination Host Unreachable"
I.e., the internet is gone, but I can still ssh into that computer from the LAN.
At that moment the journal says:
Dec 29 13:40:40 v connmand[704]: vethcf1d5d6 {newlink} index 32 address 6E:0B:C5:02:38:35 mtu 1500
Dec 29 13:40:40 v connmand[704]: vethcf1d5d6 {newlink} index 32 operstate 6 <UP>
Dec 29 13:41:05 v avahi-daemon[625]: Joining mDNS multicast group on interface vethcf1d5d6.IPv4 with address 169.254.216.172.
Dec 29 13:41:05 v avahi-daemon[625]: New relevant interface vethcf1d5d6.IPv4 for mDNS.
Dec 29 13:41:05 v avahi-daemon[625]: Registering new address record for 169.254.216.172 on vethcf1d5d6.IPv4.
Dec 29 13:41:05 v connmand[704]: vethcf1d5d6 {add} address 169.254.216.172/16 label vethcf1d5d6 family 2
Dec 29 13:41:05 v connmand[704]: vethcf1d5d6 {add} route 169.254.0.0 gw 0.0.0.0 scope 253 <LINK>
Dec 29 13:41:05 v connmand[704]: vethcf1d5d6 {add} route 0.0.0.0 gw 0.0.0.0 scope 253 <LINK>
Dec 29 13:41:05 v connmand[704]: wlxc4731ec7aa65 {del} route 0.0.0.0 gw 192.168.1.1 scope 0 <UNIVERSE>
Dec 29 13:41:05 v connmand[704]: vethcf1d5d6 {add} route 0.0.0.0 gw 0.0.0.0 scope 253 <LINK>
That last one looks dodgy, and probably wouldn't affect ssh...
I ran watch -n0.3 sudo route add default gw 192.168.1.1 to keep putting the default route back.
This was going to do the trick but apparently doesn't anymore.
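To see which interface actually holds the default route at a given moment, here is a Go sketch that reads /proc/net/route and lists every 0.0.0.0/0 entry. It assumes a little-endian machine, because the kernel prints those hex fields in host byte order. parseDefaultRoutes and DefaultRoute are hypothetical names. After the connmand shuffle above, you would expect only the veth entry, with gateway 0.0.0.0.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"os"
	"strconv"
	"strings"
)

// DefaultRoute is one 0.0.0.0/0 entry from /proc/net/route.
type DefaultRoute struct {
	Iface   string
	Gateway net.IP
}

// hexToIP decodes an address field from /proc/net/route. The kernel prints
// these in host byte order, so this assumes a little-endian machine.
func hexToIP(s string) (net.IP, error) {
	v, err := strconv.ParseUint(s, 16, 32)
	if err != nil {
		return nil, err
	}
	return net.IPv4(byte(v), byte(v>>8), byte(v>>16), byte(v>>24)), nil
}

// parseDefaultRoutes returns every route whose Destination and Mask are both zero.
func parseDefaultRoutes(table string) ([]DefaultRoute, error) {
	var routes []DefaultRoute
	sc := bufio.NewScanner(strings.NewReader(table))
	for line := 0; sc.Scan(); line++ {
		f := strings.Fields(sc.Text())
		// Skip the header row and anything too short to be a route.
		if line == 0 || len(f) < 8 {
			continue
		}
		// Columns: Iface Destination Gateway Flags RefCnt Use Metric Mask ...
		if f[1] != "00000000" || f[7] != "00000000" {
			continue
		}
		gw, err := hexToIP(f[2])
		if err != nil {
			return nil, err
		}
		routes = append(routes, DefaultRoute{Iface: f[0], Gateway: gw})
	}
	return routes, sc.Err()
}

func main() {
	data, err := os.ReadFile("/proc/net/route")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	routes, err := parseDefaultRoutes(string(data))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if len(routes) == 0 {
		fmt.Println("no default route")
	}
	for _, r := range routes {
		fmt.Printf("default via %s dev %s\n", r.Gateway, r.Iface)
	}
}
```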

None of my docker compose projects get going - and the laptop seems to reliably turn off suddenly at some point...

Let's get off that laptop...

Hmm... I change the domain, requiring:
docker compose down --volumes
docker compose up --build

It builds and stuff, but the action is lame:
caddy-1     | {"level":"info","ts":1735460097.7095096,"logger":"tls.obtain",
"msg":"obtaining certificate","identifier":"voulais.duckdns.org"}
caddy-1     | {"level":"info","ts":1735460097.7118206,"logger":"tls.issuance.acme",
"msg":"using ACME account","account_id":"https://acme-staging-v02.api.letsencrypt.org/acme/acct/177978324","account_contact":[]}
caddy-1     | {"level":"info","ts":1735460098.6765864,"logger":"tls.issuance.acme.acme_client",
"msg":"trying to solve challenge","identifier":"voulais.duckdns.org","challenge_type":"dns-01","ca":"https://acme-staging-v02.api.letsencrypt.org/directory"}
caddy-1     | {"level":"error","ts":1735460099.6868665,"logger":"tls.issuance.acme.acme_client",
"msg":"cleaning up solver","identifier":"voulais.duckdns.org","challenge_type":"dns-01","error":"no memory of presenting a DNS record for \"_acme-challenge.voulais.duckdns.org\" (usually OK if presenting also failed)"}
caddy-1     | {"level":"error","ts":1735460099.8798797,"logger":"tls.obtain",
"msg":"could not get certificate from issuer","identifier":"voulais.duckdns.org","issuer":"acme-v02.api.letsencrypt.org-directory","error":"[voulais.duckdns.org] solving challenges: presenting for challenge: adding temporary record for zone \"duckdns.org.\": DuckDNS request failed, expected (OK) but got (KO), url: [https://www.duckdns.org/update?domains=voulais.duckdns.org&token=&txt=wxmfH7orpNQRdOScCZPSObetw8bavTfbTmfe_Y40r1g&verbose=true], body: KO (order=https://acme-staging-v02.api.letsencrypt.org/acme/order/177978324/21646456774) (ca=https://acme-staging-v02.api.letsencrypt.org/directory)"}
caddy-1     | {"level":"error","ts":1735460099.8799827,"logger":"tls.obtain",
"msg":"will retry","error":"[voulais.duckdns.org] Obtain: [voulais.duckdns.org] solving challenges: presenting for challenge: adding temporary record for zone \"duckdns.org.\": DuckDNS request failed, expected (OK) but got (KO), url: [https://www.duckdns.org/update?domains=voulais.duckdns.org&token=&txt=wxmfH7orpNQRdOScCZPSObetw8bavTfbTmfe_Y40r1g&verbose=true], body: KO (order=https://acme-staging-v02.api.letsencrypt.org/acme/order/177978324/21646456774) (ca=https://acme-staging-v02.api.letsencrypt.org/directory)","attempt":10,"retrying_in":1200,"elapsed":4822.382168292,"max_duration":2592000}
So I believe Let's Encrypt is challenging us to publish this TXT record (the dns-01 challenge), which we're failing to do.

Also, the Fedora VM I'm trying out is failing to reboot.
This turns out to be nothing, but is on the theme of computer gore.
wget s -O- > .ssh/authorized_keys
sudo service sshd start
Allows me to remote into this fedora vm!
The hostname in your terminal prompt comes from $HOSTNAME, which bash sets from the remote machine's own hostname (what hostnamectl reports there), not from whatever you named that IP in eg /etc/hosts on your end.

Then I had this problem:
cos-jamo-1  | npm error Error: Could not read package.json: Error: EACCES: permission denied, open '/app/package.json'
Which is one of my first Fedora-isms!
Without volumes, Docker is much more predictable since everything stays contained.
Volumes are essential for getting my code changes loaded fast - if we put these in the container it would need rebuilding on every change.
Another option is to continuously lsyncd them into the container. lsyncd watches with inotify on one side, and its writes trigger inotify on the other, so it seems instant and you don't end up staring intensely at the spot something might appear, like a caveman.
Another option is sshfs, but it won't generate inotify events; it's basically SFTP over SSH into a FUSE mount.
So anyway, just add this ,z:

cos-jamo:
  ...
  volumes:
    - .:/app:exec,z

The z option tells Docker to relabel the mounted content with a shared SELinux label, so containers are allowed to use it (uppercase Z would make the label private to a single container). Even though ll looks the same (the traditional Unix permissions haven't changed), SELinux has modified the security context behind the scenes. You can see these labels with:

ls -Z

So anyway, you can see all your listened addresses with sudo netstat -plant
0.0.0.0 means it is listening for every address on the machine.

Did I say Fedora? Now we're breaking in Debian 13 (Trixie) because EVERYTHING IS BROKEN!
It's a bad time when lots of things fall over at once.
This Christmas/New Year period is totally haunted; be with your people, on holiday, at this point.

Aaaand... while trying to bring up this letz project (not an ideal test subject, as it has a ton of Python to build), the remote host randomly turns off!
It may just be this laptop, I guess. I had to use one of its CPU heatsink screws to secure the CD-ROM drive, but I should probably just hot glue it.

Anyway, top 5 squirrels of 2024 comes out tomorrow!

So, lots of computers now, all encrypted, yet they can be unlocked remotely as per this wonderfully written guide: https://www.cyberciti.biz/security/how-to-unlock-luks-using-dropbear-ssh-keys-remotely-in-linux/ but alas, I am on wireless, so I don't bother.

RIGHT
After some googling and so on, I wrote:

A Bug Report


I was using this bit of Caddyfile, as seen via docker exec in the container:
dns duckdns {f6e-aaa-bbb-ccc-b86}


As implied by this part of the README:

dns duckdns {env.DUCKDNS_API_TOKEN}

Which I guess is a linguistic red herring: is stuff in {} interpolated into api_token => value before we get to UnmarshalCaddyfile(d *caddyfile.Dispenser)..? Speculation.


Anyway. That doesn't trip this:

	if p.Provider.APIToken == "" {
		return d.Err("missing API token")
	}


And it goes on to fail, with the token parameter casually empty:
caddy-1 | {"level":"error","ts":1735636941.773746,"logger":"tls.obtain","msg":"will retry","error":"[voulais.duckdns.org] Obtain: [voulais.duckdns.org] solving challenges: presenting for challenge: adding temporary record for zone \"duckdns.org.\": DuckDNS request failed, expected (OK) but got (KO), url: [https://www.duckdns.org/update?domains=voulais.duckdns.org&token=&txt=yWJ3zVVwwIRPxw14J3f2riEuFD805UOkC4OIFCwJcno&verbose=true], body: KO (order=https://acme-staging-v02.api.letsencrypt.org/acme/order/178240924/21688731104) (ca=https://acme-staging-v02.api.letsencrypt.org/directory)","attempt":5,"retrying_in":600,"elapsed":610.848246992,"max_duration":2592000}


And that's pretty much it. No idea why. Debugger time?
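My best guess at the mechanism, shown as a toy Go model rather than Caddy's code. The parse-time check only sees the raw string {f6e-aaa-bbb-ccc-b86}, which isn't empty. This assumes the plugin then runs the token through Caddy's placeholder replacer at provision time, as caddy-dns modules commonly do. If so, a {...} the replacer doesn't recognise gets replaced with nothing. {env.DUCKDNS_API_TOKEN} survives because env.* is a known placeholder. replaceAll and placeholderRE are made-up names.

```go
package main

import (
	"fmt"
	"regexp"
)

// placeholderRE matches {name} spans; a simplified stand-in for Caddy's placeholder syntax.
var placeholderRE = regexp.MustCompile(`\{[^{}]+\}`)

// replaceAll models a replacer that substitutes known placeholders and
// turns unknown ones into the empty string.
func replaceAll(s string, known map[string]string) string {
	return placeholderRE.ReplaceAllStringFunc(s, func(m string) string {
		return known[m[1:len(m)-1]] // missing key yields ""
	})
}

func main() {
	raw := "{f6e-aaa-bbb-ccc-b86}" // token as it appeared in the Caddyfile
	fmt.Printf("parse-time check sees %q, which is non-empty\n", raw)
	fmt.Printf("after placeholder replacement: %q\n", replaceAll(raw, nil))
}
```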


Other syntax variations do cause errors, eg with spaces or on a new line:

dns duckdns { $DUCKDNS_API_TOKEN }
dns duckdns {
    $DUCKDNS_API_TOKEN
}

Maybe it's on libdns/duckdns to double-check api_token != '' as it goes along.
Seems weird.


Thanks!


PS I of course made it more confusing by having a docker-compose.yml that did:

    volumes:
      - caddy_data:/data
      - caddy_config:/config


that was retaining an old config that worked, from before I made everything look neat with those extra curly braces, which I just didn't need. This stuck-state fell over a few days ago, somehow, as per chaos. For those playing along at home, you need to:

docker compose down --volumes
docker compose up --build


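I've been rate limited now: it says "too many certificates (5) already issued", which is probably how many times I did the above. Let's Encrypt only issues 5 certificates for the same set of names per week, and each down --volumes threw away the one Caddy had stored in caddy_data.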


Another random detail: I'm always "waiting on internal rate limiter" for 0.00005 seconds, which takes two log lines or 1/5th of all the log lines per tls.obtain.


And thanks again, it was super nice having HTTPS just go, as it did initially, and duck another little bill and personal info leak. Thanks.


My project is here: https://github.com/stylehouse/jamola/blob/main/docker-compose.yaml


Someone else in the same ditch who got me out: https://caddy.community/t/dns-challenge-with-duckdns/14994


And so, I wander back to development.

No idea why the other instances I tried to set up just didn't wanna.

I'll have to put that screw back in...

Further

I ended up moving the front end to a cloud host!

See here for the details of how to do that:

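Caddy gave no indication of problems, yet the DuckDNS record never pointed at the new IP. As far as I can tell, the duckdns module only manages the _acme-challenge TXT record for certificates and doesn't do dynamic DNS, so I had to call it myself: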
curl "https://www.duckdns.org/update?domains=voulais&token=...&ip=170.64.141.221"

And now it's GOOOOOOOOOOOOO




Serenely.

Yes. Mid refactoring I... Have lunch.
It's beautiful out there.
I come back, hit the space bar, and torrid techno bumps along in one lonely speaker.
I swap the nearest RCA plugs over, and that suggests it's likely software.
Most of the state can be demolished and reset with:

systemctl --user restart pipewire pipewire-pulse

But this breaks my browsers' ability to find the audioDevice.
Even pavucontrol can't see any Input Devices except the Monitors now.
So I reboot.
Then production is down...
Why did the cos webserver not come up automatically when I rebooted?

s@s:~/src/jamola$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
1d4e4d18f3cd jamola-caddy "caddy run --config …" 41 hours ago Up About an hour 80/tcp, 2019/tcp, 443/udp, 0.0.0.0:9443->443/tcp, [::]:9443->443/tcp jamola-caddy-1
2fa20892e414 jamola-router-config "docker-entrypoint.s…" 41 hours ago Up About an hour jamola-router-config-1
a6be264aa4c5 letz-cos-bitz "docker-entrypoint.s…" 7 weeks ago Up About an hour 127.0.0.1:9000->3000/tcp letz-cos-bitz-1
3b63c9938f2c letz-pl "./serve.pl" 7 weeks ago Up About an hour 127.0.0.1:1812->1812/tcp letz-pl-1
b34a27a9db9f letz-py2 "bash -c 'python py/…" 7 weeks ago Up About an hour 127.0.0.1:8000->8000/tcp letz-py2-1
e210a81ca6f5 letz-cos "docker-entrypoint.s…" 7 weeks ago Up About an hour 127.0.0.1:3000->3000/tcp, 127.0.0.1:9229->9229/tcp letz-cos-1
s@s:~/src/jamola$ docker compose up -d
WARN[0000] The "ROUTER_URL" variable is not set. Defaulting to a blank string.
WARN[0000] The "ROUTER_USERNAME" variable is not set. Defaulting to a blank string.
WARN[0000] The "ROUTER_PASSWORD" variable is not set. Defaulting to a blank string.
[+] Running 3/3
✔ Container jamola-router-config-1 Running 0.0s
✔ Container jamola-caddy-1 Running 0.0s
✔ Container jamola-cos-jamo-1 Started 0.7s
s@s:~/src/jamola$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
f3370488e75c jamola-cos-jamo "/usr/local/bin/dock…" 4 seconds ago Up 3 seconds 127.0.0.1:9090->3000/tcp jamola-cos-jamo-1
1d4e4d18f3cd jamola-caddy "caddy run --config …" 41 hours ago Up About an hour 80/tcp, 2019/tcp, 443/udp, 0.0.0.0:9443->443/tcp, [::]:9443->443/tcp jamola-caddy-1
2fa20892e414 jamola-router-config "docker-entrypoint.s…" 41 hours ago Up About an hour jamola-router-config-1
a6be264aa4c5 letz-cos-bitz "docker-entrypoint.s…" 7 weeks ago Up About an hour 127.0.0.1:9000->3000/tcp letz-cos-bitz-1
3b63c9938f2c letz-pl "./serve.pl" 7 weeks ago Up About an hour 127.0.0.1:1812->1812/tcp letz-pl-1
b34a27a9db9f letz-py2 "bash -c 'python py/…" 7 weeks ago Up About an hour 127.0.0.1:8000->8000/tcp letz-py2-1
e210a81ca6f5 letz-cos "docker-entrypoint.s…" 7 weeks ago Up About an hour 127.0.0.1:3000->3000/tcp, 127.0.0.1:9229->9229/tcp letz-cos-1

Also, the autossh connection should keep trying forever, every 30s. Currently it gives up shortly after the first traffic from Caddy and a failed attempt to connect to localhost:9090:

s@s:~$ sudo systemctl status jamola-frontend-reverse-tunnel.service
jamola-frontend-reverse-tunnel.service - AutoSSH tunnel to cloud proxy
Loaded: loaded (/etc/systemd/system/jamola-frontend-reverse-tunnel.service; enabled; preset: enabled)
Active: active (running) since Wed 2025-01-08 17:34:12 NZDT; 1min 45s ago
Main PID: 9714 (autossh)
Tasks: 2 (limit: 18938)
Memory: 1.6M (peak: 2.1M)
CPU: 169ms
CGroup: /system.slice/jamola-frontend-reverse-tunnel.service
├─9714 /usr/lib/autossh/autossh -M 0 -N -R 0.0.0.0:3000:localhost:9090 -p 2023 d -o "ServerAliveInterval 30" -o "ServerAliveCountMax 3"
└─9717 /usr/bin/ssh -N -R 0.0.0.0:3000:localhost:9090 -p 2023 -o "ServerAliveInterval 30" -o "ServerAliveCountMax 3" d
Jan 08 17:34:12 s systemd[1]: Started jamola-frontend-reverse-tunnel.service - AutoSSH tunnel to cloud proxy.
Jan 08 17:34:12 s autossh[9714]: port set to 0, monitoring disabled
Jan 08 17:34:12 s autossh[9714]: starting ssh (count 1)
Jan 08 17:34:12 s autossh[9714]: ssh child pid is 9717
s@s:~$ sudo systemctl status jamola-frontend-reverse-tunnel.service
jamola-frontend-reverse-tunnel.service - AutoSSH tunnel to cloud proxy
Loaded: loaded (/etc/systemd/system/jamola-frontend-reverse-tunnel.service; enabled; preset: enabled)
Active: active (running) since Wed 2025-01-08 17:34:12 NZDT; 4min 30s ago
Main PID: 9714 (autossh)
Tasks: 2 (limit: 18938)
Memory: 1.6M (peak: 2.1M)
CPU: 171ms
CGroup: /system.slice/jamola-frontend-reverse-tunnel.service
├─9714 /usr/lib/autossh/autossh -M 0 -N -R 0.0.0.0:3000:localhost:9090 -p 2023 d -o "ServerAliveInterval 30" -o "ServerAliveCountMax 3"
└─9717 /usr/bin/ssh -N -R 0.0.0.0:3000:localhost:9090 -p 2023 -o "ServerAliveInterval 30" -o "ServerAliveCountMax 3" d
Jan 08 17:34:12 s systemd[1]: Started jamola-frontend-reverse-tunnel.service - AutoSSH tunnel to cloud proxy.
Jan 08 17:34:12 s autossh[9714]: port set to 0, monitoring disabled
Jan 08 17:34:12 s autossh[9714]: starting ssh (count 1)
Jan 08 17:34:12 s autossh[9714]: ssh child pid is 9717
Jan 08 17:36:05 s autossh[9717]: connect_to localhost port 9090: failed.
Jan 08 17:36:13 s autossh[9717]: connect_to localhost port 9090: failed.
Jan 08 17:36:34 s autossh[9717]: connect_to localhost port 9090: failed.

it is defined here:





But it's down again a short while later...

In journalctl it says:

Jan 08 19:27:32 s autossh[16868]: port set to 0, monitoring disabled
Jan 08 19:27:32 s autossh[16868]: max start count reached; exiting

But this has no clues:

 s@s:~$ sudo systemctl status jamola-frontend-reverse-tunnel.service 
● jamola-frontend-reverse-tunnel.service - AutoSSH tunnel to cloud proxy
     Loaded: loaded (/etc/systemd/system/jamola-frontend-reverse-tunnel.service; enabled; preset: enabled)
     Active: activating (auto-restart) since Wed 2025-01-08 19:27:32 NZDT; 17s ago
    Process: 16868 ExecStart=/usr/bin/autossh -M 0 -N -R 0.0.0.0:3000:localhost:9090 -p 2023 d -o ServerAliveInterval=30 >
   Main PID: 16868 (code=exited, status=0/SUCCESS)
        CPU: 5ms

The dot is grey now. This is a very wishy-washy way to present the failure and giving up of this service...

So systemctl seems bad, unless it's just me.
Why is this so hard? Should we just use supervisord? Should we just generate a passwordless key to use to get into ssh-tunnel-destiny on the cloud host from ssh-tunnel-source on the local host?
The latter.

Well, if you rename a container in the compose file before you down it, you'll need to:
docker compose down --remove-orphans

And if you change a config, eg the ssh key(s) we env into place, you must down, then up
# For changing configs (like Caddyfile):
docker compose down
docker compose up -d
# For just changing .env values:
docker compose up -d

The difference is because configs are treated as immutable container resources, while environment variables are part of the runtime configuration.

theproxy/*

to run the public Caddy server elsewhere:




