A grant of the form `{src: alice, dst: autogroup:internet, via:
tag:exit1}` was loading without error but stripping every exit node
from alice's view: `tailscale exit-node list` returned "no exit nodes
found".
Two sites skipped autogroup:internet at the compile / steering layer:
compileViaForNode's *AutoGroup arm produced no FilterRule for the
via-tagged exit node, and ViaRoutesForPeer's *AutoGroup arm produced
no Include/Exclude. With pm.needsPerNodeFilter true, the exit node's
matchers were empty, BuildPeerMap could not link source to exit, and
RoutesForPeer's ReduceRoutes stripped 0.0.0.0/0 and ::/0 from
AllowedIPs.
The skip belongs at the wire-format layer (ReduceFilterRules), not at
the compile layer that also feeds internal matchers. Lift
autogroup:internet handling into both *AutoGroup arms with the same
shape used for *Prefix destinations: emit a TheInternet rule on
via-tagged exit advertisers; surface peer.ExitRoutes() in Include
when the peer carries the via tag, Exclude otherwise.
ReduceFilterRules continues to keep the rule on exit-route
advertisers' wire output and strip it elsewhere, preserving SaaS
PacketFilter encoding.
Also drop compileViaForNode's early len(SubnetRoutes)==0 return:
SubnetRoutes excludes exit routes, so the early return pre-empted the
autogroup:internet branch on nodes that only advertise exit routes.
Existing tests pinning the buggy behaviour (TestViaRoutesForPeer
subtests, TestCompileViaGrant case) flipped to the new contract.
Fixes#3233
Force-disconnect leaves stale routes in the container's network
namespace: libnetwork removes the host-side veth but the
namespace-internal route survives. The next ConnectNetwork on the
same network then fails with "cannot program address X/16 in sandbox
interface because it conflicts with existing route", and the route
never resolves on its own. Bounded retry around ConnectNetwork
exhausts MaxElapsedTime instead of recovering.
Without Force, libnetwork drains the namespace routes synchronously
during disconnect and ConnectNetwork sees a clean slate. Cable-pull
semantic is preserved: docker still tears down the endpoint at the
namespace level, leaving in-flight TCP half-open inside the
container's view, verified via paired probe-timeout pairs in HA
prober logs while both routers are physically disconnected.
Fixes#3234
`time.After(ProbeTimeout)` returned a single channel shared by every
probe goroutine in the cycle. Only the first goroutine to receive the
deadline tick drains the channel; any other goroutine still waiting on
its `responseCh` is then stuck forever, `wg.Wait()` never returns, and
the scheduler loop in `app.go` stalls on the next tick. The condition
fires whenever two or more nodes time out in the same cycle — common
under cable-pull where IsOnline lags reality and both routers stay in
the candidate set as half-open TCP.
Move the timer inside each goroutine so every probe has its own
deadline.
Updates #3234
The generator scans `integration/` recursively for `Test*` functions
and emits one CI job per match. Helper subpackages like
`dockertestutil` and `tsic` host plain unit tests that should run
under `go test`, not as Docker-based integration matrix entries.
Limit the scan to depth 1 so only top-level `integration/*_test.go`
files contribute job names.
Libnetwork endpoint cleanup is eventually consistent. A back-to-back
disconnect+connect on the same network can race teardown and return a
transient error. Wrap the daemon calls in bounded exponential backoff
so TestHASubnetRouterFailoverDockerDisconnect no longer flakes on
phase 4c reconnect.
Fixes#3234
electPrimaryRoutes' all-unhealthy fallback picked candidates[0]
(lowest NodeID) regardless of who was prev. Under cable-pull
semantics IsOnline lags reality (long-poll TCP half-open), so
both routers stay in candidates and both go Unhealthy via the
prober — the fallback then churned primary to a node that was
itself unreachable.
Prefer prev when still in candidates; fall through to
candidates[0] only when prev is gone. Anti-blackhole holds.
Update the property test reference model and split the unit
test into existence (KeepsAPrimary) and identity
(PreservesPrevious) cases.
Fixes#3203
Add DisconnectFromNetwork/ReconnectToNetwork on TailscaleClient
backed by pool.Client.DisconnectNetwork.
Exercise single-router fail+recover either side, sequential dual
failure, and simultaneous dual failure. The dual-failure legs
assert no flap to a known-bad primary; the single-router-return
legs check traffic only because docker network disconnect
transiently fails probes on sibling routers.
Fails on parent; passes after the fix.
Updates #3203
Three regression tests for the user scenario: an in-process
Disconnect/Reconnect, a tailscale-down/up integration test, and
an iptables -j DROP cable-pull integration test.
Updates #3203
Restore the legacy auto-clear at write boundaries that drop HA
candidacy: Disconnect, SetApprovedRoutes(empty), and
UpdateNodeFromMapRequest shrinking advertised routes to empty.
Plus a defensive guard in SetNodeUnhealthy.
Updates #3203
Replace routes.PrimaryRoutes reads with NodeStore. Connect bumps
SessionEpoch; Disconnect re-checks it inside UpdateNode so the
check and mutation are atomic against a concurrent Connect on
the same node.
The connect_race regression test is carried in its final
SessionEpoch form.
Updates #3203
- test: comment that the !regReq.Expiry.IsZero() gate also covers
the tags-only PreAuthKey path
- CHANGELOG: note pre-existing 0001-01-01 rows self-heal on
re-registration rather than being backfilled
When a user owned node registers or re registers with a PreAuthKey and the
client sends zero client expiry while node.expiry is set to 0, the expiry
column ends up stored as 0001-01-01 00:00:00 instead of NULL. Two sites in
HandleNodeFromPreAuthKey build a non nil pointer to regReq.Expiry even when
the value is zero time, and the needsDefaultExpiry guard only replaces it
when s.cfg.Node.Expiry > 0, so the pointer to zero time survives to the
database.
Convert an unset regReq.Expiry to nil before handing it off so the
needsDefaultExpiry path is the only place that assigns a non nil pointer.
This is a narrower sibling of #3170 on the user owned PreAuthKey path. The
regression was introduced alongside the fix for #3111 in 6337a3db.
TestEnablingExitRoutes runs without an ACL, so tailcfg.FilterAllowAll
hides any policy-path regression. Add a sibling that applies the literal
#3212 policy via hsic.WithACLPolicy after registration and approval,
then asserts each peer carries 0.0.0.0/0 + ::/0 in AllowedIPs and
ExitNodeOption is true — the daemon-derived bool that drives
`tailscale exit-node list`.
Updates #3212
compileFilterRules skipped autogroup:internet destinations to keep them
out of the wire-format PacketFilter, but those same compiled rules are
the source of pm.matchers — and Node.CanAccess relies on a matcher whose
DestsIsTheInternet covers the public internet to surface exit-node peers
to ACL sources. With the skip in place no such matcher existed, exit
nodes silently dropped out of the source's peer list, and the docs'
exit-node walkthrough stopped working: `tailscale exit-node list`
returned "no exit nodes found" and `tailscale set --exit-node=<ip>`
returned "no node found in netmap with IP".
Drop the compile-time skip so autogroup:internet flows through normal
matcher derivation, and teach ReduceFilterRules to keep the resulting
client packet-filter rule on exit-route advertisers — Tailscale SaaS
sends those rules to exit nodes so the kernel filter accepts traffic
forwarded by autogroup:internet sources.
Verified against a live tailnet on 2026-04-28 via tscap; the b17/b18
captures land under testdata/issue_3212/ as a regression guard. The
captures are isolated from testdata/routes_results/ because the broader
TestRoutesCompat machinery assumes a CIDR-prefix wire format that
differs from the IPSet-range form SaaS emits for autogroup:internet —
aligning that wire format is tracked separately.
Fixes#3212
Add an integration test that runs the tailscale-rs axum example
against headscale end-to-end. The test provisions one headscale
instance, one Go tailscale probe client (tsic), and one
tailscale-rs node (tsric) on the same tailnet, then verifies:
- the tailscale-rs node registers and is assigned both an IPv4
and an IPv6 address
- the Go probe sees the tailscale-rs node as a peer in its status
- GET /index.html and GET /assets/index.css from the axum server
return the expected content over the tailnet
- three sequential POST /count calls return distinct, incrementing
counter values, proving netstack state is maintained across
multiple TCP connections
This is the first integration test that exercises a non-Go
Tailscale client against headscale, giving end-to-end coverage of
the control protocol for alternate implementations.
Add TailscaleRustInContainer (tsric), a Rust-client counterpart to
integration/tsic. It runs the axum example from tailscale-rs in a
Docker container and exposes the same lifecycle hooks as tsic
(Shutdown, SaveLog, Execute, WriteFile) so integration tests can
treat it as any other Tailscale node.
Dockerfile.tailscale-rs clones tailscale-rs at build time, so no
local source checkout is required. The repo URL and ref are Docker
build arguments (TAILSCALE_RS_REPO, TAILSCALE_RS_REF) exposed as
tsric.WithRepo / tsric.WithRef options. The HEADSCALE_INTEGRATION_
TAILSCALE_RS_IMAGE environment variable provides an escape hatch
for using a pre-built image instead of building from source.
The Dockerfile patches the cloned root Cargo.toml to expose
ts_control's insecure-keyfetch feature through the tailscale crate
so the axum example can fetch the control key over plain HTTP.
The integration harness serves the control plane without TLS,
which is the only mode tailscale-rs can register against until it
grows a way to inject a custom CA bundle.
compileFilterRules, compileGrants, and updateLocked guarded the
"no rules so allow all" fallback with len(pol.Grants) == 0, which
matches both an absent grants field and an explicit empty array.
JSON {"grants": []} unmarshals to a non-nil empty slice; it should
compile to zero filter rules (deny all) to match Tailscale SaaS,
but the length check sent it down the FilterAllowAll path.
Distinguish absent (nil) from explicit-empty by switching the guard
to pol.Grants == nil, the same asymmetry already used for ACLs.
{} keeps allowing all; {"acls": []} and {"grants": []} now both
deny all.
Fixes#3211
The scheduled job has failed every night since 2026-04-13. Two
prior fixes (race-condition guards and splitting the combined
where predicate in #3200) did not address the actual cause:
`flatten` collapses nested records in a list-of-records pipeline,
so after `gh api ... | from json | flatten` the `label` and `user`
columns no longer exist - their fields are lifted to the top
level (with prefix only on naming collisions). `where label.name
== ...` and `where user.type != "Bot"` then both reference a
column that is not there.
`gh api --paginate` already returns a single concatenated JSON
array, so `from json` produces a list of records directly and no
flattening is needed. Drop both `| flatten` calls.
Verified locally with nu 0.108 against the events stream of issue
#3178: without flatten, `where event == "labeled" | where
label.name == "needs-more-info" | last` returns the labeled event
with its `label` record intact.
GitHub's /issues/:n/events endpoint returns a mixed-schema table.
Only labeled/unlabeled rows carry a `label` column. Nu's `where`
does not short-circuit on missing columns, so the combined
predicate `event == "labeled" and label.name == ...` dereferenced
`label.name` on every row and crashed on the first non-labeled
event with `nu::shell::column_not_found`.
The scheduled job has failed every night since 2026-04-13 (the
first run with a labeled issue), so no `needs-more-info` issue
has been auto-closed.
Split into two sequential `where` filters so `label.name` is only
accessed on rows that have the column.
Summarise the ingest rewrite, the SaaS-matching collision rule, and the
BREAKING change from random-suffix to numeric-suffix collision labels
and from "invalid-<rand>" to the literal "node" fallback.
Updates #3188
Ingest (registration and MapRequest updates) now calls
dnsname.SanitizeHostname directly and lets NodeStore auto-bump on
collision. Admin rename uses dnsname.ValidLabel + SetGivenName so
conflicts are surfaced to the caller instead of silently mutated.
Three duplicate invalidDNSRegex definitions, the old NormaliseHostname
and ValidateHostname helpers, EnsureHostname, InvalidString,
ApplyHostnameFromHostInfo, GivenNameHasBeenChanged, generateGivenName
and EnsureUniqueGivenName are removed along with their tests.
ValidateHostname's username half is retained as ValidateUsername for
users.go.
The SaaS-matching collision rule replaces the random "invalid-xxxxxx"
fallback and the 8-character hash suffix; the empty-input fallback is
the literal "node". TestUpdateHostnameFromClient now exercises the
rewrite end-to-end with awkward macOS/Windows names.
Fixes#3188Fixes#2926Fixes#2343Fixes#2762Fixes#2449
Updates #2177
Updates #2121
Updates #363
NodeStore's writer goroutine now resolves GivenName collisions inside
applyBatch: on PutNode/UpdateNode the landing label gets -N appended
until unique, matching Tailscale SaaS. Empty labels fall back to the
literal "node".
SetGivenName exposes the admin-rename path: validates via
dnsname.ValidLabel and rejects on collision with ErrGivenNameTaken,
so renames do not silently rewrite behind the caller.
Updates #3188
Updates #2926
Updates #2343
Updates #2762
Existing HA tests verify server-side primary election; these add
end-to-end assertions from a viewer client's perspective, that marking
the primary unhealthy or revoking its approved route propagates through
the netmap so the viewer sees the flip.
Updates #3157
Resolve by GivenName (unique per tailnet) before Hostname (client-
reported, may collide); within each pass, pick the lowest NodeID so
results are deterministic across NodeStore snapshot iterations.
Updates #3157
change.Merge keeps the first PingRequest seen when merging, which
means a later probe's callback URL is silently dropped if a stale merge
is in-flight. Document the contract at Merge and at doPing so callers
know to serialise probes.
Updates #3157
dev derives additional ports from --port + offsets (metrics, gRPC,
debug). A --port near uint16 max would overflow silently; add up-front
validation that rejects values that would push derived ports over 65535.
Updates #3157
DestsIsTheInternet now reports the internet when either family's /0
is contained (0.0.0.0/0 or ::/0), matching what operators expect when
they write the /0 directly. Also documents MatchFromStrings fail-open.
Updates #3157
The /debug/ping callback hits /machine/ping-response on the main
TLS router, not the noise chain, so URLIsNoise stays false. Document
this at the emit site to prevent accidental changes.
Updates #3157
Blocked callers waiting on a pingTracker response channel would
hang forever if the server Close()d mid-probe. Drain the pending map on
Close so those goroutines unblock and exit cleanly.
Updates #3157
X-Frame-Options: DENY and frame-ancestors 'none' stop clickjacking
of OIDC, register-confirm, and debug HTML pages. nosniff and no-referrer
are cheap defence-in-depth for the same surfaces.
Updates #3157
chi routes only HEAD to the handler, but assert explicitly so a
future router config change cannot silently accept GET/POST and leak
latency bytes or side-effects.
Updates #3157
elem-go does not escape attribute values, so the raw query reaches
the rendered HTML verbatim. Pre-escape with html.EscapeString to prevent
reflected XSS.
Updates #3157
Subnet routers that advertise routes must not accept peer routes.
With co-router visibility the HA primary's subnet appears in co-routers'
AllowedIPs, and --accept-routes installs a system route that conflicts
with local subnet forwarding.
Updates #3157
Data-driven tests for via grants combined with HA primary routes:
crossed via tags on same prefix, mixed via+regular across HA pairs,
four-way HA, and the kitchen-sink scenario. Each case uses an inline
topology captured from SaaS.
Updates #3157
End-to-end exercise of via-grant compilation against SaaS captures:
peer visibility, AllowedIPs, PrimaryRoutes, and per-rule src/dst
reachability from each viewer's perspective.
Updates #3157
Typed Capture/Input/Node/Topology structs for golden SaaS captures.
Schema drift between the tscap capture tool and headscale now becomes a
compile error instead of a silent test pass.
Updates #3157