Commit Graph

4102 Commits

Author SHA1 Message Date
Kristoffer Dalby
3d5c0af4e7 state: preserve previous primary when all HA advertisers unhealthy
electPrimaryRoutes' all-unhealthy fallback picked candidates[0]
(lowest NodeID) regardless of who was prev. Under cable-pull
semantics IsOnline lags reality (long-poll TCP half-open), so
both routers stay in candidates and both go Unhealthy via the
prober — the fallback then churned primary to a node that was
itself unreachable.

Prefer prev when still in candidates; fall through to
candidates[0] only when prev is gone. Anti-blackhole holds.

Update the property test reference model and split the unit
test into existence (KeepsAPrimary) and identity
(PreservesPrevious) cases.

Fixes #3203
2026-04-29 18:08:39 +01:00
Kristoffer Dalby
27c9113af8 integration: regenerate workflow for HA docker disconnect test
Updates #3203
2026-04-29 18:08:39 +01:00
Kristoffer Dalby
7bb86f2c16 integration: HA cable-pull lifecycle test
Add DisconnectFromNetwork/ReconnectToNetwork on TailscaleClient
backed by pool.Client.DisconnectNetwork.

Exercise single-router fail+recover either side, sequential dual
failure, and simultaneous dual failure. The dual-failure legs
assert no flap to a known-bad primary; the single-router-return
legs check traffic only because docker network disconnect
transiently fails probes on sibling routers.

Fails on parent; passes after the fix.

Updates #3203
2026-04-29 18:08:39 +01:00
Kristoffer Dalby
863fa2f815 servertest, integration: cover HA both-offline recovery
Three regression tests for the user scenario: an in-process
Disconnect/Reconnect, a tailscale-down/up integration test, and
an iptables -j DROP cable-pull integration test.

Updates #3203
2026-04-29 18:08:39 +01:00
Kristoffer Dalby
9f7c8e9a07 state: clear Unhealthy when node leaves HA candidate set
Restore the legacy auto-clear at write boundaries that drop HA
candidacy: Disconnect, SetApprovedRoutes(empty), and
UpdateNodeFromMapRequest shrinking advertised routes to empty.
Plus a defensive guard in SetNodeUnhealthy.

Updates #3203
2026-04-29 18:08:39 +01:00
Kristoffer Dalby
66ac785c22 state: delete routes package, port primary route tests
Remove hscontrol/routes/. Port the named scenarios and the rapid
property test to hscontrol/state/.

Updates #3203
2026-04-29 18:08:39 +01:00
Kristoffer Dalby
437754aeea state: switch consumers to NodeStore primary routes
Replace routes.PrimaryRoutes reads with NodeStore. Connect bumps
SessionEpoch; Disconnect re-checks it inside UpdateNode so the
check and mutation are atomic against a concurrent Connect on
the same node.

The connect_race regression test is carried in its final
SessionEpoch form.

Updates #3203
2026-04-29 18:08:39 +01:00
Kristoffer Dalby
da927eb018 state: compute primary routes inside NodeStore snapshot
Add primaries and isPrimary maps to Snapshot plus an election
algorithm. No callers yet.

Updates #3203
2026-04-29 18:08:39 +01:00
Kristoffer Dalby
942313a10a types: move DebugRoutes from routes to types
Unblocks deletion of the routes package.

Updates #3203
2026-04-29 18:08:39 +01:00
Kristoffer Dalby
1fe682b141 types: add Unhealthy and SessionEpoch fields to Node
Runtime-only (gorm:"-") fields read by the HA primary route refactor.

Updates #3203
2026-04-29 18:08:39 +01:00
Kristoffer Dalby
010a5564c5 all: rephrase prose to fit codebase voice
Reword comments, one doc paragraph, and one test failure message
so the prose reads naturally. No behaviour change.
2026-04-29 16:22:19 +01:00
Akhilesh Arora
de60982d83 state: note tagged-path coverage and self-healing behaviour for #3199
- test: comment that the !regReq.Expiry.IsZero() gate also covers
  the tags-only PreAuthKey path
- CHANGELOG: note pre-existing 0001-01-01 rows self-heal on
  re-registration rather than being backfilled
2026-04-29 13:06:38 +01:00
Akhilesh Arora
bcfaf6ad68 CHANGELOG: note nil expiry preservation fix 2026-04-29 13:06:38 +01:00
Akhilesh Arora
0e10ca4e9a state: preserve nil expiry on user owned registration when no default is configured
When a user owned node registers or re registers with a PreAuthKey and the
client sends zero client expiry while node.expiry is set to 0, the expiry
column ends up stored as 0001-01-01 00:00:00 instead of NULL. Two sites in
HandleNodeFromPreAuthKey build a non nil pointer to regReq.Expiry even when
the value is zero time, and the needsDefaultExpiry guard only replaces it
when s.cfg.Node.Expiry > 0, so the pointer to zero time survives to the
database.

Convert an unset regReq.Expiry to nil before handing it off so the
needsDefaultExpiry path is the only place that assigns a non nil pointer.

This is a narrower sibling of #3170 on the user owned PreAuthKey path. The
regression was introduced alongside the fix for #3111 in 6337a3db.
2026-04-29 13:06:38 +01:00
Kristoffer Dalby
ba251e7b47 integration: cover exit nodes via autogroup:internet ACL (#3212)
TestEnablingExitRoutes runs without an ACL, so tailcfg.FilterAllowAll
hides any policy-path regression. Add a sibling that applies the literal
#3212 policy via hsic.WithACLPolicy after registration and approval,
then asserts each peer carries 0.0.0.0/0 + ::/0 in AllowedIPs and
ExitNodeOption is true — the daemon-derived bool that drives
`tailscale exit-node list`.

Updates #3212
2026-04-29 11:24:33 +01:00
Kristoffer Dalby
c7a0ca709f policy: surface exit nodes via autogroup:internet (#3212)
compileFilterRules skipped autogroup:internet destinations to keep them
out of the wire-format PacketFilter, but those same compiled rules are
the source of pm.matchers — and Node.CanAccess relies on a matcher whose
DestsIsTheInternet covers the public internet to surface exit-node peers
to ACL sources. With the skip in place no such matcher existed, exit
nodes silently dropped out of the source's peer list, and the docs'
exit-node walkthrough stopped working: `tailscale exit-node list`
returned "no exit nodes found" and `tailscale set --exit-node=<ip>`
returned "no node found in netmap with IP".

Drop the compile-time skip so autogroup:internet flows through normal
matcher derivation, and teach ReduceFilterRules to keep the resulting
client packet-filter rule on exit-route advertisers — Tailscale SaaS
sends those rules to exit nodes so the kernel filter accepts traffic
forwarded by autogroup:internet sources.

Verified against a live tailnet on 2026-04-28 via tscap; the b17/b18
captures land under testdata/issue_3212/ as a regression guard. The
captures are isolated from testdata/routes_results/ because the broader
TestRoutesCompat machinery assumes a CIDR-prefix wire format that
differs from the IPSet-range form SaaS emits for autogroup:internet —
aligning that wire format is tracked separately.

Fixes #3212
2026-04-29 11:24:33 +01:00
Kristoffer Dalby
a7d405a255 ci: regenerate test-integration.yaml for TestTailscaleRustAxum
Pick up the new tailscale-rs integration test in the CI job matrix.
2026-04-29 10:13:43 +01:00
Kristoffer Dalby
775bc3a715 integration: add TestTailscaleRustAxum for tailscale-rs
Add an integration test that runs the tailscale-rs axum example
against headscale end-to-end. The test provisions one headscale
instance, one Go tailscale probe client (tsic), and one
tailscale-rs node (tsric) on the same tailnet, then verifies:

  - the tailscale-rs node registers and is assigned both an IPv4
    and an IPv6 address
  - the Go probe sees the tailscale-rs node as a peer in its status
  - GET /index.html and GET /assets/index.css from the axum server
    return the expected content over the tailnet
  - three sequential POST /count calls return distinct, incrementing
    counter values, proving netstack state is maintained across
    multiple TCP connections

This is the first integration test that exercises a non-Go
Tailscale client against headscale, giving end-to-end coverage of
the control protocol for alternate implementations.
2026-04-29 10:13:43 +01:00
Kristoffer Dalby
ea968e2f5d integration/tsric: add TailscaleRustInContainer package
Add TailscaleRustInContainer (tsric), a Rust-client counterpart to
integration/tsic. It runs the axum example from tailscale-rs in a
Docker container and exposes the same lifecycle hooks as tsic
(Shutdown, SaveLog, Execute, WriteFile) so integration tests can
treat it as any other Tailscale node.

Dockerfile.tailscale-rs clones tailscale-rs at build time, so no
local source checkout is required. The repo URL and ref are Docker
build arguments (TAILSCALE_RS_REPO, TAILSCALE_RS_REF) exposed as
tsric.WithRepo / tsric.WithRef options. The HEADSCALE_INTEGRATION_
TAILSCALE_RS_IMAGE environment variable provides an escape hatch
for using a pre-built image instead of building from source.

The Dockerfile patches the cloned root Cargo.toml to expose
ts_control's insecure-keyfetch feature through the tailscale crate
so the axum example can fetch the control key over plain HTTP.
The integration harness serves the control plane without TLS,
which is the only mode tailscale-rs can register against until it
grows a way to inject a custom CA bundle.
2026-04-29 10:13:43 +01:00
Kristoffer Dalby
2e1a716a9a policy/v2: fix empty grants/acls returning FilterAllowAll
compileFilterRules, compileGrants, and updateLocked guarded the
"no rules so allow all" fallback with len(pol.Grants) == 0, which
matches both an absent grants field and an explicit empty array.
JSON {"grants": []} unmarshals to a non-nil empty slice; it should
compile to zero filter rules (deny all) to match Tailscale SaaS,
but the length check sent it down the FilterAllowAll path.

Distinguish absent (nil) from explicit-empty by switching the guard
to pol.Grants == nil, the same asymmetry already used for ACLs.
{} keeps allowing all; {"acls": []} and {"grants": []} now both
deny all.

Fixes #3211
2026-04-29 08:55:07 +01:00
Kristoffer Dalby
174e409da6 github: drop nu flatten in needs-more-info timer
The scheduled job has failed every night since 2026-04-13. Two
prior fixes (race-condition guards and splitting the combined
where predicate in #3200) did not address the actual cause:
`flatten` collapses nested records in a list-of-records pipeline,
so after `gh api ... | from json | flatten` the `label` and `user`
columns no longer exist - their fields are lifted to the top
level (with prefix only on naming collisions). `where label.name
== ...` and `where user.type != "Bot"` then both reference a
column that is not there.

`gh api --paginate` already returns a single concatenated JSON
array, so `from json` produces a list of records directly and no
flattening is needed. Drop both `| flatten` calls.

Verified locally with nu 0.108 against the events stream of issue
#3178: without flatten, `where event == "labeled" | where
label.name == "needs-more-info" | last` returns the labeled event
with its `label` record intact.
2026-04-28 16:36:32 +01:00
QEDeD
3672a2df3a Fix typo in API key creation help text
Correct loose (opposite of tight) to lose (opposite of keep).
2026-04-28 08:50:26 +02:00
Kristoffer Dalby
ce5d1ba8f8 github: split nu where in needs-more-info timer
GitHub's /issues/:n/events endpoint returns a mixed-schema table.
Only labeled/unlabeled rows carry a `label` column. Nu's `where`
does not short-circuit on missing columns, so the combined
predicate `event == "labeled" and label.name == ...` dereferenced
`label.name` on every row and crashed on the first non-labeled
event with `nu::shell::column_not_found`.

The scheduled job has failed every night since 2026-04-13 (the
first run with a labeled issue), so no `needs-more-info` issue
has been auto-closed.

Split into two sequential `where` filters so `label.name` is only
accessed on rows that have the column.
2026-04-18 15:39:29 +01:00
Kristoffer Dalby
4e1d83ecef CHANGELOG: document hostname cleanroom rewrite
Summarise the ingest rewrite, the SaaS-matching collision rule, and the
BREAKING change from random-suffix to numeric-suffix collision labels
and from "invalid-<rand>" to the literal "node" fallback.

Updates #3188
2026-04-18 15:12:21 +01:00
Kristoffer Dalby
d6dfdc100c hscontrol: route hostname handling through dnsname and NodeStore
Ingest (registration and MapRequest updates) now calls
dnsname.SanitizeHostname directly and lets NodeStore auto-bump on
collision. Admin rename uses dnsname.ValidLabel + SetGivenName so
conflicts are surfaced to the caller instead of silently mutated.

Three duplicate invalidDNSRegex definitions, the old NormaliseHostname
and ValidateHostname helpers, EnsureHostname, InvalidString,
ApplyHostnameFromHostInfo, GivenNameHasBeenChanged, generateGivenName
and EnsureUniqueGivenName are removed along with their tests.
ValidateHostname's username half is retained as ValidateUsername for
users.go.

The SaaS-matching collision rule replaces the random "invalid-xxxxxx"
fallback and the 8-character hash suffix; the empty-input fallback is
the literal "node". TestUpdateHostnameFromClient now exercises the
rewrite end-to-end with awkward macOS/Windows names.

Fixes #3188
Fixes #2926
Fixes #2343
Fixes #2762
Fixes #2449
Updates #2177
Updates #2121
Updates #363
2026-04-18 15:12:21 +01:00
Kristoffer Dalby
a2c3ac095e state: auto-bump GivenName on collision and add SetGivenName
NodeStore's writer goroutine now resolves GivenName collisions inside
applyBatch: on PutNode/UpdateNode the landing label gets -N appended
until unique, matching Tailscale SaaS. Empty labels fall back to the
literal "node".

SetGivenName exposes the admin-rename path: validates via
dnsname.ValidLabel and rejects on collision with ErrGivenNameTaken,
so renames do not silently rewrite behind the caller.

Updates #3188
Updates #2926
Updates #2343
Updates #2762
2026-04-18 15:12:21 +01:00
Florian Preinstorfer
f1494a32ce Update links to Tailscale documentation 2026-04-18 09:33:41 +02:00
Florian Preinstorfer
7e6c7924ad Document availability of autgroup:internet 2026-04-18 09:33:41 +02:00
Florian Preinstorfer
9ea09ea4b6 Remove changelog section for 0.28.1 2026-04-18 09:33:41 +02:00
Kristoffer Dalby
436d3db28e servertest: add dynamic HA failover tests
Existing HA tests verify server-side primary election; these add
end-to-end assertions from a viewer client's perspective, that marking
the primary unhealthy or revoking its approved route propagates through
the netmap so the viewer sees the flip.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
978f1e3947 state: tie-break ResolveNode by GivenName then lowest NodeID
Resolve by GivenName (unique per tailnet) before Hostname (client-
reported, may collide); within each pass, pick the lowest NodeID so
results are deterministic across NodeStore snapshot iterations.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
2530d86f1b change: document PingRequest merge first-wins foot-gun
change.Merge keeps the first PingRequest seen when merging, which
means a later probe's callback URL is silently dropped if a stale merge
is in-flight. Document the contract at Merge and at doPing so callers
know to serialise probes.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
1a58b77271 cmd/dev: validate --port fits the derived-port range
dev derives additional ports from --port + offsets (metrics, gRPC,
debug). A --port near uint16 max would overflow silently; add up-front
validation that rejects values that would push derived ports over 65535.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
427b2f15ee matcher: clarify DestsIsTheInternet single-family semantics
DestsIsTheInternet now reports the internet when either family's /0
is contained (0.0.0.0/0 or ::/0), matching what operators expect when
they write the /0 directly. Also documents MatchFromStrings fail-open.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
93e8c7285f debug: explain URLIsNoise choice in ping callback
The /debug/ping callback hits /machine/ping-response on the main
TLS router, not the noise chain, so URLIsNoise stays false. Document
this at the emit site to prevent accidental changes.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
842f36225e state: drain pending pings on Close
Blocked callers waiting on a pingTracker response channel would
hang forever if the server Close()d mid-probe. Drain the pending map on
Close so those goroutines unblock and exit cleanly.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
0567cb6da3 app: add security headers middleware
X-Frame-Options: DENY and frame-ancestors 'none' stop clickjacking
of OIDC, register-confirm, and debug HTML pages. nosniff and no-referrer
are cheap defence-in-depth for the same surfaces.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
5a7cafdf85 noise: reject non-HEAD on PingResponseHandler
chi routes only HEAD to the handler, but assert explicitly so a
future router config change cannot silently accept GET/POST and leak
latency bytes or side-effects.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
f3eb9a7bba templates: escape query value in ping page
elem-go does not escape attribute values, so the raw query reaches
the rendered HTML verbatim. Pre-escape with html.EscapeString to prevent
reflected XSS.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
3a4af8cf87 integration: remove --accept-routes from via steering routers
Subnet routers that advertise routes must not accept peer routes.
With co-router visibility the HA primary's subnet appears in co-routers'
AllowedIPs, and --accept-routes installs a system route that conflicts
with local subnet forwarding.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
ec48f34e1c CHANGELOG: document subnet-to-subnet ACL fixes
Fixes #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
164d659dd2 servertest: add TestViaGrantHACompat for via+HA compat tests
Data-driven tests for via grants combined with HA primary routes:
crossed via tags on same prefix, mixed via+regular across HA pairs,
four-way HA, and the kitchen-sink scenario. Each case uses an inline
topology captured from SaaS.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
7d104b8c8d servertest: add via grant map compat tests
End-to-end exercise of via-grant compilation against SaaS captures:
peer visibility, AllowedIPs, PrimaryRoutes, and per-rule src/dst
reachability from each viewer's perspective.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
a7c9721faa policy/v2: overhaul compat test infrastructure
Reworks the ACL/routes/grant/SSH compat harnesses to read
testcapture.Capture typed files, per-scenario topologies, strict error
wording match, and shared helpers. Surfaces policy-engine drift against
Tailscale SaaS.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
f34dec2754 testcapture: add typed capture format package
Typed Capture/Input/Node/Topology structs for golden SaaS captures.
Schema drift between the tscap capture tool and headscale now becomes a
compile error instead of a silent test pass.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
1059c678c4 hscontrol/types: silence zerolog by default in tests
Tests were dumping megabytes of zerolog output on failure; silence
at init and let individual tests opt in via SetGlobalLevel when they need
log-driven assertions.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
affaa1a31d policy/v2: align SSH check action with SaaS wire format
SSH check rules now emit CheckPeriod in seconds (matching
Tailscale SaaS) instead of nanoseconds. Adds golden compat tests covering
accept/check modes.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
ded51a4d30 policyutil: fix reduceCapGrantRule and add route reduction
reduceCapGrantRule was dropping rules whose CapGrant IPs overlap a
subnet route; treat subnet routes as part of node identity so those rules
survive reduction. ReduceFilterRules now also reduces route-reachable
destinations.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
b051e7b2bc policy/v2: wire PolicyManager through compiledGrant
Threads PolicyManager into compiledGrant so via grants resolve
viewer identity at compile time instead of re-resolving per MapRequest.
Adds a matchersForNodeMap cache invalidated on policy reload and on node
add/remove.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
b01e67e8e5 types: consider subnet routes as source identity in ACL matching
CanAccess now treats a node's advertised subnet routes as part of
its source identity, so an ACL granting the subnet-owner as source lets
traffic from the subnet through. Matches SaaS semantics.

Updates #3157
2026-04-17 16:31:49 +01:00