645 Commits

Author SHA1 Message Date
Kristoffer Dalby
76ee29352b servertest: cover via-grant exit-node visibility end-to-end
TestGrantViaExitNodeInternetVisibility boots a server, applies a
policy that scopes autogroup:internet to a tag, registers a tagged
exit advertiser and a regular client, and asserts the client's netmap
surfaces the exit node with 0.0.0.0/0 and ::/0 in AllowedIPs — the
substrate the Tailscale client reads to populate
`tailscale exit-node list`.

TestGrantViaExitNodeNoFilterRules retains its assertion (literal /0
absent from the exit node's PacketFilter, matching SaaS PacketFilter
encoding); only its docstring is updated to reflect that the exit
node now does receive a TheInternet-shaped rule, just not the
literal /0 form.

Updates #3233
2026-04-30 19:22:45 +01:00
Kristoffer Dalby
2b7f15abaa policy/v2: surface autogroup:internet via grants on exit nodes
A grant of the form `{src: alice, dst: autogroup:internet, via:
tag:exit1}` was loading without error but stripping every exit node
from alice's view: `tailscale exit-node list` returned "no exit nodes
found".

Two sites skipped autogroup:internet at the compile / steering layer:
compileViaForNode's *AutoGroup arm produced no FilterRule for the
via-tagged exit node, and ViaRoutesForPeer's *AutoGroup arm produced
no Include/Exclude. With pm.needsPerNodeFilter true, the exit node's
matchers were empty, BuildPeerMap could not link source to exit, and
RoutesForPeer's ReduceRoutes stripped 0.0.0.0/0 and ::/0 from
AllowedIPs.

The skip belongs at the wire-format layer (ReduceFilterRules), not at
the compile layer that also feeds internal matchers. Lift
autogroup:internet handling into both *AutoGroup arms with the same
shape used for *Prefix destinations: emit a TheInternet rule on
via-tagged exit advertisers; surface peer.ExitRoutes() in Include
when the peer carries the via tag, Exclude otherwise.
ReduceFilterRules continues to keep the rule on exit-route
advertisers' wire output and strip it elsewhere, preserving SaaS
PacketFilter encoding.

Also drop compileViaForNode's early len(SubnetRoutes)==0 return:
SubnetRoutes excludes exit routes, so the early return pre-empted the
autogroup:internet branch on nodes that only advertise exit routes.

Existing tests pinning the buggy behaviour (TestViaRoutesForPeer
subtests, TestCompileViaGrant case) flipped to the new contract.

Fixes #3233
2026-04-30 19:22:45 +01:00
Kristoffer Dalby
94ec607bca state: per-goroutine deadline in HA probe cycle
`time.After(ProbeTimeout)` returned a single channel shared by every
probe goroutine in the cycle. Only the first goroutine to receive the
deadline tick drains the channel; any other goroutine still waiting on
its `responseCh` is then stuck forever, `wg.Wait()` never returns, and
the scheduler loop in `app.go` stalls on the next tick. The condition
fires whenever two or more nodes time out in the same cycle — common
under cable-pull where IsOnline lags reality and both routers stay in
the candidate set as half-open TCP.

Move the timer inside each goroutine so every probe has its own
deadline.

Updates #3234
2026-04-30 12:52:05 +01:00
Kristoffer Dalby
3d5c0af4e7 state: preserve previous primary when all HA advertisers unhealthy
electPrimaryRoutes' all-unhealthy fallback picked candidates[0]
(lowest NodeID) regardless of who was prev. Under cable-pull
semantics IsOnline lags reality (long-poll TCP half-open), so
both routers stay in candidates and both go Unhealthy via the
prober — the fallback then churned primary to a node that was
itself unreachable.

Prefer prev when still in candidates; fall through to
candidates[0] only when prev is gone. Anti-blackhole holds.

Update the property test reference model and split the unit
test into existence (KeepsAPrimary) and identity
(PreservesPrevious) cases.

Fixes #3203
2026-04-29 18:08:39 +01:00
Kristoffer Dalby
863fa2f815 servertest, integration: cover HA both-offline recovery
Three regression tests for the user scenario: an in-process
Disconnect/Reconnect, a tailscale-down/up integration test, and
an iptables -j DROP cable-pull integration test.

Updates #3203
2026-04-29 18:08:39 +01:00
Kristoffer Dalby
9f7c8e9a07 state: clear Unhealthy when node leaves HA candidate set
Restore the legacy auto-clear at write boundaries that drop HA
candidacy: Disconnect, SetApprovedRoutes(empty), and
UpdateNodeFromMapRequest shrinking advertised routes to empty.
Plus a defensive guard in SetNodeUnhealthy.

Updates #3203
2026-04-29 18:08:39 +01:00
Kristoffer Dalby
66ac785c22 state: delete routes package, port primary route tests
Remove hscontrol/routes/. Port the named scenarios and the rapid
property test to hscontrol/state/.

Updates #3203
2026-04-29 18:08:39 +01:00
Kristoffer Dalby
437754aeea state: switch consumers to NodeStore primary routes
Replace routes.PrimaryRoutes reads with NodeStore. Connect bumps
SessionEpoch; Disconnect re-checks it inside UpdateNode so the
check and mutation are atomic against a concurrent Connect on
the same node.

The connect_race regression test is carried in its final
SessionEpoch form.

Updates #3203
2026-04-29 18:08:39 +01:00
Kristoffer Dalby
da927eb018 state: compute primary routes inside NodeStore snapshot
Add primaries and isPrimary maps to Snapshot plus an election
algorithm. No callers yet.

Updates #3203
2026-04-29 18:08:39 +01:00
Kristoffer Dalby
942313a10a types: move DebugRoutes from routes to types
Unblocks deletion of the routes package.

Updates #3203
2026-04-29 18:08:39 +01:00
Kristoffer Dalby
1fe682b141 types: add Unhealthy and SessionEpoch fields to Node
Runtime-only (gorm:"-") fields read by the HA primary route refactor.

Updates #3203
2026-04-29 18:08:39 +01:00
Kristoffer Dalby
010a5564c5 all: rephrase prose to fit codebase voice
Reword comments, one doc paragraph, and one test failure message
so the prose reads naturally. No behaviour change.
2026-04-29 16:22:19 +01:00
Akhilesh Arora
de60982d83 state: note tagged-path coverage and self-healing behaviour for #3199
- test: comment that the !regReq.Expiry.IsZero() gate also covers
  the tags-only PreAuthKey path
- CHANGELOG: note pre-existing 0001-01-01 rows self-heal on
  re-registration rather than being backfilled
2026-04-29 13:06:38 +01:00
Akhilesh Arora
0e10ca4e9a state: preserve nil expiry on user owned registration when no default is configured
When a user owned node registers or re registers with a PreAuthKey and the
client sends zero client expiry while node.expiry is set to 0, the expiry
column ends up stored as 0001-01-01 00:00:00 instead of NULL. Two sites in
HandleNodeFromPreAuthKey build a non nil pointer to regReq.Expiry even when
the value is zero time, and the needsDefaultExpiry guard only replaces it
when s.cfg.Node.Expiry > 0, so the pointer to zero time survives to the
database.

Convert an unset regReq.Expiry to nil before handing it off so the
needsDefaultExpiry path is the only place that assigns a non nil pointer.

This is a narrower sibling of #3170 on the user owned PreAuthKey path. The
regression was introduced alongside the fix for #3111 in 6337a3db.
2026-04-29 13:06:38 +01:00
Kristoffer Dalby
c7a0ca709f policy: surface exit nodes via autogroup:internet (#3212)
compileFilterRules skipped autogroup:internet destinations to keep them
out of the wire-format PacketFilter, but those same compiled rules are
the source of pm.matchers — and Node.CanAccess relies on a matcher whose
DestsIsTheInternet covers the public internet to surface exit-node peers
to ACL sources. With the skip in place no such matcher existed, exit
nodes silently dropped out of the source's peer list, and the docs'
exit-node walkthrough stopped working: `tailscale exit-node list`
returned "no exit nodes found" and `tailscale set --exit-node=<ip>`
returned "no node found in netmap with IP".

Drop the compile-time skip so autogroup:internet flows through normal
matcher derivation, and teach ReduceFilterRules to keep the resulting
client packet-filter rule on exit-route advertisers — Tailscale SaaS
sends those rules to exit nodes so the kernel filter accepts traffic
forwarded by autogroup:internet sources.

Verified against a live tailnet on 2026-04-28 via tscap; the b17/b18
captures land under testdata/issue_3212/ as a regression guard. The
captures are isolated from testdata/routes_results/ because the broader
TestRoutesCompat machinery assumes a CIDR-prefix wire format that
differs from the IPSet-range form SaaS emits for autogroup:internet —
aligning that wire format is tracked separately.

Fixes #3212
2026-04-29 11:24:33 +01:00
Kristoffer Dalby
2e1a716a9a policy/v2: fix empty grants/acls returning FilterAllowAll
compileFilterRules, compileGrants, and updateLocked guarded the
"no rules so allow all" fallback with len(pol.Grants) == 0, which
matches both an absent grants field and an explicit empty array.
JSON {"grants": []} unmarshals to a non-nil empty slice; it should
compile to zero filter rules (deny all) to match Tailscale SaaS,
but the length check sent it down the FilterAllowAll path.

Distinguish absent (nil) from explicit-empty by switching the guard
to pol.Grants == nil, the same asymmetry already used for ACLs.
{} keeps allowing all; {"acls": []} and {"grants": []} now both
deny all.

Fixes #3211
2026-04-29 08:55:07 +01:00
Kristoffer Dalby
d6dfdc100c hscontrol: route hostname handling through dnsname and NodeStore
Ingest (registration and MapRequest updates) now calls
dnsname.SanitizeHostname directly and lets NodeStore auto-bump on
collision. Admin rename uses dnsname.ValidLabel + SetGivenName so
conflicts are surfaced to the caller instead of silently mutated.

Three duplicate invalidDNSRegex definitions, the old NormaliseHostname
and ValidateHostname helpers, EnsureHostname, InvalidString,
ApplyHostnameFromHostInfo, GivenNameHasBeenChanged, generateGivenName
and EnsureUniqueGivenName are removed along with their tests.
ValidateHostname's username half is retained as ValidateUsername for
users.go.

The SaaS-matching collision rule replaces the random "invalid-xxxxxx"
fallback and the 8-character hash suffix; the empty-input fallback is
the literal "node". TestUpdateHostnameFromClient now exercises the
rewrite end-to-end with awkward macOS/Windows names.

Fixes #3188
Fixes #2926
Fixes #2343
Fixes #2762
Fixes #2449
Updates #2177
Updates #2121
Updates #363
2026-04-18 15:12:21 +01:00
Kristoffer Dalby
a2c3ac095e state: auto-bump GivenName on collision and add SetGivenName
NodeStore's writer goroutine now resolves GivenName collisions inside
applyBatch: on PutNode/UpdateNode the landing label gets -N appended
until unique, matching Tailscale SaaS. Empty labels fall back to the
literal "node".

SetGivenName exposes the admin-rename path: validates via
dnsname.ValidLabel and rejects on collision with ErrGivenNameTaken,
so renames do not silently rewrite behind the caller.

Updates #3188
Updates #2926
Updates #2343
Updates #2762
2026-04-18 15:12:21 +01:00
Florian Preinstorfer
f1494a32ce Update links to Tailscale documentation 2026-04-18 09:33:41 +02:00
Kristoffer Dalby
436d3db28e servertest: add dynamic HA failover tests
Existing HA tests verify server-side primary election; these add
end-to-end assertions from a viewer client's perspective, that marking
the primary unhealthy or revoking its approved route propagates through
the netmap so the viewer sees the flip.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
978f1e3947 state: tie-break ResolveNode by GivenName then lowest NodeID
Resolve by GivenName (unique per tailnet) before Hostname (client-
reported, may collide); within each pass, pick the lowest NodeID so
results are deterministic across NodeStore snapshot iterations.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
2530d86f1b change: document PingRequest merge first-wins foot-gun
change.Merge keeps the first PingRequest seen when merging, which
means a later probe's callback URL is silently dropped if a stale merge
is in-flight. Document the contract at Merge and at doPing so callers
know to serialise probes.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
427b2f15ee matcher: clarify DestsIsTheInternet single-family semantics
DestsIsTheInternet now reports the internet when either family's /0
is contained (0.0.0.0/0 or ::/0), matching what operators expect when
they write the /0 directly. Also documents MatchFromStrings fail-open.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
93e8c7285f debug: explain URLIsNoise choice in ping callback
The /debug/ping callback hits /machine/ping-response on the main
TLS router, not the noise chain, so URLIsNoise stays false. Document
this at the emit site to prevent accidental changes.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
842f36225e state: drain pending pings on Close
Blocked callers waiting on a pingTracker response channel would
hang forever if the server Close()d mid-probe. Drain the pending map on
Close so those goroutines unblock and exit cleanly.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
0567cb6da3 app: add security headers middleware
X-Frame-Options: DENY and frame-ancestors 'none' stop clickjacking
of OIDC, register-confirm, and debug HTML pages. nosniff and no-referrer
are cheap defence-in-depth for the same surfaces.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
5a7cafdf85 noise: reject non-HEAD on PingResponseHandler
chi routes only HEAD to the handler, but assert explicitly so a
future router config change cannot silently accept GET/POST and leak
latency bytes or side-effects.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
f3eb9a7bba templates: escape query value in ping page
elem-go does not escape attribute values, so the raw query reaches
the rendered HTML verbatim. Pre-escape with html.EscapeString to prevent
reflected XSS.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
164d659dd2 servertest: add TestViaGrantHACompat for via+HA compat tests
Data-driven tests for via grants combined with HA primary routes:
crossed via tags on same prefix, mixed via+regular across HA pairs,
four-way HA, and the kitchen-sink scenario. Each case uses an inline
topology captured from SaaS.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
7d104b8c8d servertest: add via grant map compat tests
End-to-end exercise of via-grant compilation against SaaS captures:
peer visibility, AllowedIPs, PrimaryRoutes, and per-rule src/dst
reachability from each viewer's perspective.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
a7c9721faa policy/v2: overhaul compat test infrastructure
Reworks the ACL/routes/grant/SSH compat harnesses to read
testcapture.Capture typed files, per-scenario topologies, strict error
wording match, and shared helpers. Surfaces policy-engine drift against
Tailscale SaaS.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
f34dec2754 testcapture: add typed capture format package
Typed Capture/Input/Node/Topology structs for golden SaaS captures.
Schema drift between the tscap capture tool and headscale now becomes a
compile error instead of a silent test pass.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
1059c678c4 hscontrol/types: silence zerolog by default in tests
Tests were dumping megabytes of zerolog output on failure; silence
at init and let individual tests opt in via SetGlobalLevel when they need
log-driven assertions.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
affaa1a31d policy/v2: align SSH check action with SaaS wire format
SSH check rules now emit CheckPeriod in seconds (matching
Tailscale SaaS) instead of nanoseconds. Adds golden compat tests covering
accept/check modes.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
ded51a4d30 policyutil: fix reduceCapGrantRule and add route reduction
reduceCapGrantRule was dropping rules whose CapGrant IPs overlap a
subnet route; treat subnet routes as part of node identity so those rules
survive reduction. ReduceFilterRules now also reduces route-reachable
destinations.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
b051e7b2bc policy/v2: wire PolicyManager through compiledGrant
Threads PolicyManager into compiledGrant so via grants resolve
viewer identity at compile time instead of re-resolving per MapRequest.
Adds a matchersForNodeMap cache invalidated on policy reload and on node
add/remove.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
b01e67e8e5 types: consider subnet routes as source identity in ACL matching
CanAccess now treats a node's advertised subnet routes as part of
its source identity, so an ACL granting the subnet-owner as source lets
traffic from the subnet through. Matches SaaS semantics.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
f49c42e716 testdata: add SaaS captures for compat tests
Golden captures of SaaS filter-rules and netmaps across the ACL,
grant, routes, and SSH corpora. These back the data-driven compat tests
that verify headscale's policy output against Tailscale SaaS verbatim.

Updates #3157
2026-04-17 16:31:49 +01:00
Kristoffer Dalby
0378e2d2c6 servertest: add HA health probing tests
Five scenarios: healthy probes, failover on unhealthy primary,
no-flap recovery, connect clears unhealthy, no-op without HA routes.

Updates #2129
Updates #2902
2026-04-16 15:10:56 +01:00
Kristoffer Dalby
8a97dd134b app: wire HA health prober into scheduled tasks
Run the prober on a ticker in scheduledTasks. Enabled by default
(10s interval, 5s timeout). No-op when no HA routes exist.

Fixes #2129
Fixes #2902
2026-04-16 15:10:56 +01:00
Kristoffer Dalby
90e65ccd63 state: add HA health prober
Ping HA subnet routers each probe cycle and mark unresponsive nodes
unhealthy. Reconnecting a node clears its unhealthy state since the
fresh Noise session proves basic connectivity.

Updates #2129
Updates #2902
2026-04-16 15:10:56 +01:00
Kristoffer Dalby
786ce2dce8 routes: add health dimension to HA primary route election
Track unhealthy nodes in PrimaryRoutes so primary election skips them.
When all nodes for a prefix are unhealthy, keep the first as a degraded
primary rather than dropping the route entirely.

Anti-flap is built in: a recovered node becomes standby, not primary,
because updatePrimaryLocked keeps the current primary when still
available and healthy.

Updates #2129
Updates #2902
2026-04-16 15:10:56 +01:00
Kristoffer Dalby
c9dbea5c18 templates: improve ping page spacing and design system usage
- Remove redundant inline button/input styles that duplicate CSS
- Use CSS variables for input (dark mode support)
- Use A(), Ul(), Ol(), P() wrappers from general.go
- Add expandable explanation of what the ping tests
- Fix section spacing rhythm (spaceXL before results, space2XL
  before connected nodes)
- Add flex-wrap for mobile responsiveness
2026-04-15 10:53:35 +01:00
Kristoffer Dalby
0e5569c3fc templates: add detailsBox collapsible component
Add a reusable <details>/<summary> component to the shared design
system. Styled to match the existing card/box component family
(border, radius, CSS variables for dark mode).

Collapsed by default with a clickable summary line.
2026-04-15 10:53:35 +01:00
Kristoffer Dalby
97778c9930 all: add tests for PingRequest implementation
Unit tests for Change (IsEmpty, Merge, Type, PingNode constructor),
ping tracker (register/complete/cancel lifecycle, concurrency, latency),
and end-to-end servertests exercising the full round-trip with real
controlclient.Direct instances.

Updates #2902
Updates #2129
2026-04-15 10:53:35 +01:00
Kristoffer Dalby
b113655b71 all: implement PingRequest for node connectivity checking
Implement tailcfg.PingRequest support so the control server can verify
whether a connected node is still reachable. This is the foundation for
faster offline detection (currently ~16min due to Go HTTP/2 TCP retransmit
behavior) and future C2N communication.

The server sends a PingRequest via MapResponse with a unique callback
URL. The Tailscale client responds with a HEAD request to that URL,
proving connectivity. Round-trip latency is measured.

Wire PingRequest through the Change → Batcher → MapResponse pipeline,
add a ping tracker on State for correlating requests with responses,
add ResolveNode for looking up nodes by ID/IP/hostname, and expose a
/debug/ping page (elem-go form UI) and /machine/ping-response endpoint.

Updates #2902
Updates #2129
2026-04-15 10:53:35 +01:00
Kristoffer Dalby
de5b1eab68 templates: use table layout for registration confirm details
Replace the bullet list of device details with a two-column table
for cleaner visual hierarchy. Labels are bold and left-aligned,
values right-aligned with subtle row separators. The machine key
value uses an inline code style.

Updates juanfont/headscale#3182
2026-04-13 17:23:47 +01:00
Kristoffer Dalby
f066d12153 assets: fix logo alignment and error icon centering
Tighten the SVG viewBox to the actual content bounding box and
remove hardcoded width/height attributes so the browser no longer
adds horizontal padding via preserveAspectRatio. The "h" wordmark
now left-aligns with the page content below it.

Replace the error icon SVG path (which had an off-center X) with
a simple circle + two crossed lines drawn from a centered viewBox.
Both icons now use fill="currentColor" for dark mode adaptation.

Updates juanfont/headscale#3182
2026-04-13 17:23:47 +01:00
Kristoffer Dalby
3918020551 templates: use CSS variables in all shared components
Replace hardcoded Go color constants with var(--hs-*) and
var(--md-*) CSS custom properties in externalLink, orDivider,
card, warningBox, downloadButton, and pageFooter. This ensures
all components follow the dark mode theme automatically.

Also switch pageFooter from div to semantic footer element and
simplify externalLink by letting CSS handle link styling.

Updates juanfont/headscale#3182
2026-04-13 17:23:47 +01:00
Kristoffer Dalby
93860a5c06 all: apply formatter changes 2026-04-13 17:23:47 +01:00