Expand TestPrimaryRoutesProperty (5 -> 9 ops). New ops mirror the
production shapes the failure cases hit: BatchProbeResults via
UpdateNodes, SimultaneousDisconnect via UpdateNodes, SetApprovedRoutes
that leaves announced RoutableIPs intact, OfflineExpiry that keeps
Unhealthy set. The model now tracks announced and approved separately
and recomputes the intersection.
Strengthen the per-op assertions to cover invariants the model alone
cannot prove: every primary must be online, every primary must
currently advertise its prefix, no flap onto an unhealthy candidate
when a healthy one was available, no flap off a previous primary that
remains a healthy candidate. The check now takes a pre-op snapshot so
the anti-flap rule has a stable reference.
Add TestHAProberProperty in servertest. It drives a real TestServer
with three HA-route-advertising clients through rapid-drawn sequences
of ClientDisconnect / ClientReconnect / ProberTick / WaitForSnapshot
ops and re-checks the same shape invariants after every step.
Document the system in hscontrol/state/HA_INVARIANTS.md: a state
machine over (Healthy+Online, Unhealthy+Online, Offline,
OfflineExpired), fifteen numbered invariants with predicates and
violation paths, and a coverage matrix mapping each invariant to its
unit, servertest, and integration tests. Three rows pin the recent
fixes to the invariants they enforce.
requirePrimaryStable in TestHASubnetRouterFailoverDockerDisconnect
Phase 5a (simultaneous cable-pull of both routers) intermittently
caught the primary flipping to the offline r1. Both probe goroutines
mark their target unhealthy back-to-back; SetNodeUnhealthy publishes a
fresh NodeStore snapshot each call, so the intermediate snapshot — r1
unhealthy, r2 still healthy — runs the election with one healthy
candidate left and picks it. The next snapshot then enters the
all-unhealthy preserve-prev path, which preserves the wrong choice.
Collect probe results from the cycle and apply them through a new
NodeStore.UpdateNodes batched op so the election only runs once, with
the cycle's final health state. PolicyChange dispatch moves outside
the wg.Go goroutines and fires once if the primary assignment
actually changed.
The HA prober dispatches a PingRequest, waits ProbeTimeout (5s), and
marks the node unhealthy if no callback arrives. A node that bounced
its poll session between probe cycles satisfies two conditions that
conspire to fail TestHASubnetRouterFailover: a probe queued against
the previous session is silently dropped when the worker writes to
the closed connection (timeout always fires), and a probe sent
immediately after reconnect lands while wgengine is still rebuilding
magicsock state from the new netmap. Either path installs a spurious
unhealthy bit, which sends the preserved-primary anti-flap the wrong
way.
Record the session observed at dispatch time and drop the timeout
path if the node reconnected since. Require the session to survive
a full probe cycle before a timeout can drive a failover.
TestGrantViaSubnetFilterRules pins exact-equality dst. Add a sibling
for the broader-dst case so the regression sits at the server level
alongside the policy-engine unit test.
Updates #3267
Endpoints, Tags and ApprovedRoutes serialize as JSON on Node. GORM's
struct Updates path skips fields it considers zero, and reflect treats
a nil slice as zero — clearing any of these columns via the State
persist path would leave the previous value in the database.
Introduce Strings, Prefixes and AddrPorts as named slice types whose
IsZero() always reports false, so GORM keeps the column in the UPDATE
regardless of the slice being nil or empty. JSON marshalling is
unchanged: nil serializes to null, empty to []. List() returns the
underlying unnamed slice for callers (mainly testify assertions over
reflect.DeepEqual) that distinguish the named type from its base.
Regenerated types_clone.go and types_view.go follow the field-type
swap. Test assertions across hscontrol/{db,state,servertest} updated
to call .List() where reflect.DeepEqual previously matched the raw
slice type.
Fixes#3110
Tailscale stamps tailcfg.NodeAttrDefaultAutoUpdate on every node's
CapMap with a JSON bool reflecting the tailnet-wide auto-update
default. Headscale grows an auto_update.enabled config option and
emits the cap accordingly from TailNode -- the cap leaves the
unmodelledTailnetStateCaps strip list and is compared in full by the
nodeAttrs compat suite.
testNodeAttrsSuccess drives cfg.AutoUpdate.Enabled from
tf.Input.Tailnet.Settings.DevicesAutoUpdatesOn so each capture's
expected emission matches the SaaS state it was taken under. Two
captures cover both branches:
- nodeattrs-tailnet-devices-auto-updates-on -> [true]
- nodeattrs-tailnet-devices-auto-updates-off -> [false]
The Tailscale v2 TailnetSettings API does not expose the Send Files
toggle, so the compat suite cannot vary cfg.Taildrop.Enabled per
capture. TestTaildropDisabledWithholdsFileSharingCap covers the off
path directly in servertest.
Taildrive (drive:share and drive:access) is policy-driven per
Tailscale's documented behaviour
(https://tailscale.com/docs/features/taildrive). The previous
always-on baseline emission diverged from SaaS for every node not
targeted by a drive nodeAttr -- a real semantic divergence that the
compat suite caught once the test moved to comparing TailNode output
against the captured netmaps.
types.Node.TailNode no longer stamps the drive pair. Operators
wanting taildrive add a nodeAttrs entry:
"nodeAttrs": [
{ "target": ["*"], "attr": ["drive:share", "drive:access"] }
]
unmodelledTailnetStateCaps shrinks accordingly. The baseline-divergence
group is gone; every entry left in the list is genuinely unmodelled
(user-role caps, unimplemented features, tailnet metadata, internal
tuning).
servertest's TestNodeAttrsBaselineCapsAlwaysOn expects the smaller
baseline (admin + ssh + file-sharing). Integration TestGrantCapDrive
grants the drive caps explicitly via NodeAttrs to exercise the
policy-driven emission path.
Adds a data-driven test that loads testdata/nodeattrs_results/*.hujson
and diffs the captured SaaS-rendered netmaps against headscale's
compileNodeAttrs output. Each capture is one scenario the SaaS
control plane has rendered against the same policy headscale is asked
to compile -- the test enforces shape parity per node.
tailnet_state_caps.go enumerates the caps SaaS emits where headscale
has no equivalent concept yet (user-role admin/owner, tailnet lock,
services host, app connectors, internal magicsock and SSH tuning,
tailnet-state metadata) plus the always-on baseline (admin, ssh,
file-sharing) and the taildrive pair. stripUnmodelledTailnetStateCaps
filters both sides of cmp.Diff so the comparison focuses on the
policy-driven caps. PeerCapMap encodes which caps the Tailscale
client reads from the peer view (suggest-exit-node when exit routes
are approved, etc.) for use by the mapper.
testcapture switches to typed tailcfg/netmap/filtertype/apitype
values so schema drift between the capture tool and headscale
becomes a compile error rather than a silent test failure. Existing
compat suites (acl, grants, routes, ssh, issue_3212) move to the
typed shape.
The 53 SelfNode netmap captures and the 7 anonymizer-corrupted
suggest-charmander -> suggest-exit-node restorations in
routes_results / issue_3212 ride along.
WithSelfNode and buildTailPeers merge each node's policy CapMap
into the tailcfg.Node.CapMap they emit. State.NodeCapMap and
State.NodeCapMaps wrap the policy manager: NodeCapMap returns a
defensive clone per call; NodeCapMaps snapshots the full per-node
map once for batched callers, amortising pm.mu acquisition across
a peer build.
generateDNSConfig grew a per-node CapMap argument so it can apply
nodeAttr-driven DNS overlays. The nextdns DoH rewrite hardens against
policy-controlled inputs:
- nextDNSDoHHost anchors the prefix match instead of substring,
so a hostile resolver URL cannot smuggle a nextdns hostname in
a path or query.
- nextDNSProfileFromCapMap accepts only profile names matching
[A-Za-z0-9._-]{1,64} and picks the lexicographically first when
multiple are granted -- deterministic, no shell metacharacters
or URL fragments through.
- addNextDNSMetadata composes the rewritten URL via url.Parse +
url.Values rather than fmt.Sprintf, so existing query strings
on the resolver URL survive and metadata cannot inject a new
component.
WithTaildropEnabled in servertest controls cfg.Taildrop.Enabled per
test so cap/file-sharing emission can be toggled in tests that need
to verify the off path.
TestGrantViaExitNodeInternetVisibility boots a server, applies a
policy that scopes autogroup:internet to a tag, registers a tagged
exit advertiser and a regular client, and asserts the client's netmap
surfaces the exit node with 0.0.0.0/0 and ::/0 in AllowedIPs — the
substrate the Tailscale client reads to populate
`tailscale exit-node list`.
TestGrantViaExitNodeNoFilterRules retains its assertion (literal /0
absent from the exit node's PacketFilter, matching SaaS PacketFilter
encoding); only its docstring is updated to reflect that the exit
node now does receive a TheInternet-shaped rule, just not the
literal /0 form.
Updates #3233
Three regression tests for the user scenario: an in-process
Disconnect/Reconnect, a tailscale-down/up integration test, and
an iptables -j DROP cable-pull integration test.
Updates #3203
Restore the legacy auto-clear at write boundaries that drop HA
candidacy: Disconnect, SetApprovedRoutes(empty), and
UpdateNodeFromMapRequest shrinking advertised routes to empty.
Plus a defensive guard in SetNodeUnhealthy.
Updates #3203
Replace routes.PrimaryRoutes reads with NodeStore. Connect bumps
SessionEpoch; Disconnect re-checks it inside UpdateNode so the
check and mutation are atomic against a concurrent Connect on
the same node.
The connect_race regression test is carried in its final
SessionEpoch form.
Updates #3203
Existing HA tests verify server-side primary election; these add
end-to-end assertions from a viewer client's perspective, that marking
the primary unhealthy or revoking its approved route propagates through
the netmap so the viewer sees the flip.
Updates #3157
chi routes only HEAD to the handler, but assert explicitly so a
future router config change cannot silently accept GET/POST and leak
latency bytes or side-effects.
Updates #3157
Data-driven tests for via grants combined with HA primary routes:
crossed via tags on same prefix, mixed via+regular across HA pairs,
four-way HA, and the kitchen-sink scenario. Each case uses an inline
topology captured from SaaS.
Updates #3157
End-to-end exercise of via-grant compilation against SaaS captures:
peer visibility, AllowedIPs, PrimaryRoutes, and per-rule src/dst
reachability from each viewer's perspective.
Updates #3157
Unit tests for Change (IsEmpty, Merge, Type, PingNode constructor),
ping tracker (register/complete/cancel lifecycle, concurrency, latency),
and end-to-end servertests exercising the full round-trip with real
controlclient.Direct instances.
Updates #2902
Updates #2129
Replace zcache with golang-lru/v2/expirable for both the state auth
cache and the OIDC state cache. Add tuning.register_cache_max_entries
(default 1024) to cap the number of pending registration entries.
Introduce types.RegistrationData to replace caching a full *Node;
only the fields the registration callback path reads are retained.
Remove the dead HSDatabase.regCache field. Drop zgo.at/zcache/v2
from go.mod.
Replace httptest (real TCP sockets) with tailscale.com/net/memnet
so all connections stay in-process. Wire the client's tsdial.Dialer
to the server's memnet.Network via SetSystemDialerForTest,
preserving the full Noise protocol path.
Also update servertest to use the new Node.Ephemeral.InactivityTimeout
config path introduced in the types refactor, and add WithNodeExpiry
server option for testing default node key expiry behaviour.
Updates #1711
Rename all 594 test data files from .json to .hujson and add
descriptive header comments to each file documenting what policy
rules are under test and what outcome is expected.
Update test loaders in all 5 _test.go files to parse HuJSON via
hujson.Parse/Standardize/Pack before json.Unmarshal.
Add cross-dependency warning to via_compat_test.go documenting
that GRANT-V29/V30/V31/V36 are shared with TestGrantsCompat.
Add .gitignore exemption for testdata HuJSON files.
Remove TestGrantViaExitNodeSteering and TestGrantViaMixedSteering.
Exit node traffic forwarding through via grants cannot be validated
with curl/traceroute in Docker containers because Tailscale exit nodes
strip locally-connected subnets from their forwarding filter.
The correctness of via exit steering is validated by:
- Golden MapResponse comparison (TestViaGrantMapCompat with GRANT-V31
and GRANT-V36) comparing full netmap output against Tailscale SaaS
- Filter rule compatibility (TestGrantsCompat with GRANT-V14 through
GRANT-V36) comparing per-node PacketFilter rules against Tailscale SaaS
- TestGrantViaSubnetSteering (kept) validates via subnet steering with
actual curl/traceroute through Docker, which works for subnet routes
Updates #2180
Use per-node compilation path for via grants in BuildPeerMap and MatchersForNode to ensure via-granted nodes appear in peer maps. Fix ViaRoutesForPeer golden test route inference to correctly resolve via grant effects.
Updates #2180
Add golden test data for via exit route steering and fix via exit grant compilation to match Tailscale SaaS behavior. Includes MapResponse golden tests for via grant route steering verification.
Updates #2180
Add three tests that verify control plane behavior for grant policies:
- TestGrantViaSubnetFilterRules: verifies the router's PacketFilter
contains destination rules for via-steered subnets. Without per-node
filter compilation for via grants, these rules were missing and the
router would drop forwarded traffic.
- TestGrantViaExitNodeFilterRules: same verification for exit nodes
with via grants steering autogroup:internet traffic.
- TestGrantIPv6OnlyPrefixACL: verifies that address-based aliases
(Prefix, Host) resolve to exactly the literal prefix and do not
expand to include the matching node's other IP addresses. An
IPv6-only host definition produces only IPv6 filter rules.
Updates #2180
Add servertest grant policy control plane tests covering basic grants, via grants, and cap grants. Fix ReduceRoutes in State to apply route reduction to non-via routes first, then append via-included routes, preventing via grant routes from being incorrectly filtered.
Updates #2180
WithTags was defined but never passed through to CreatePreAuthKey.
Fix NewClient to use CreateTaggedPreAuthKey when tags are specified,
enabling tests that need tagged nodes (e.g. via grant steering).
Updates #2180
Three corrections to issue tests that had wrong assumptions about
when data becomes available:
1. initial_map_should_include_peer_online_status: use WaitForCondition
instead of checking the initial netmap. Online status is set by
Connect() which sends a PeerChange patch after the initial
RegisterResponse, so it may not be present immediately.
2. disco_key_should_propagate_to_peers: use WaitForCondition. The
DiscoKey is sent in the first MapRequest (not RegisterRequest),
so peers may not see it until a subsequent map update.
3. approved_route_without_announcement: invert the test expectation.
Tailscale uses a strict advertise-then-approve model -- routes are
only distributed when the node advertises them (Hostinfo.RoutableIPs)
AND they are approved. An approval without advertisement is a dormant
pre-approval. The test now asserts the route does NOT appear in
AllowedIPs, matching upstream Tailscale semantics.
Also fix TestClient.Reconnect to clear the cached netmap and drain
pending updates before re-registering. Without this, WaitForPeers
returned immediately based on the old session's stale data.
Add three test files designed to stress the control plane under
concurrent and adversarial conditions:
- race_test.go: 14 tests exercising concurrent mutations, session
replacement, batcher contention, NodeStore access, and map response
delivery during disconnect. All pass the Go race detector.
- poll_race_test.go: 8 tests targeting the poll.go grace period
interleaving. These confirm a logical TOCTOU race: when a node
disconnects and reconnects within the grace period, the old
session's deferred Disconnect() can overwrite the new session's
Connect(), leaving IsOnline=false despite an active poll session.
- stress_test.go: sustained churn, rapid mutations, rolling
replacement, data integrity checks under load, and verification
that rapid reconnects do not leak false-offline notifications.
Known failing tests (grace period TOCTOU race):
- server_state_online_after_reconnect_within_grace
- update_history_no_false_offline
- rapid_reconnect_peer_never_sees_offline
Split TestIssues into 7 focused test functions to stay under cyclomatic
complexity limits while testing more aggressively.
Issues surfaced (4 failing tests):
1. initial_map_should_include_peer_online_status: Initial MapResponse
has Online=nil for peers. Online status only arrives later via
PeersChangedPatch.
2. disco_key_should_propagate_to_peers: DiscoPublicKey set by client
is not visible to peers. Peers see zero disco key.
3. approved_route_without_announcement_is_visible: Server-side route
approval without client-side announcement silently produces empty
SubnetRoutes (intersection of empty announced + approved = empty).
4. nodestore_correct_after_rapid_reconnect: After 5 rapid reconnect
cycles, NodeStore reports node as offline despite having an active
poll session. The connect/disconnect grace period interleaving
leaves IsOnline in an incorrect state.
Passing tests (20) verify:
- IP uniqueness across 10 nodes
- IP stability across reconnect
- New peers have addresses immediately
- Node rename propagates to peers
- Node delete removes from all peer lists
- Hostinfo changes (OS field) propagate
- NodeStore/DB consistency after route mutations
- Grace period timing (8-20s window)
- Ephemeral node deletion (not just offline)
- 10-node simultaneous connect convergence
- Rapid sequential node additions
- Reconnect produces complete map
- Cross-user visibility with default policy
- Same-user multiple nodes get distinct IDs
- Same-hostname nodes get unique GivenNames
- Policy change during connect still converges
- DERP region references are valid
- User profiles present for self and peers
- Self-update arrives after route approval
- Route advertisement stored as AnnouncedRoutes
Extend the servertest harness with:
- TestClient.Direct() accessor for advanced operations
- TestClient.WaitForPeerCount and WaitForCondition helpers
- TestHarness.ChangePolicy for ACL policy testing
- AssertDERPMapPresent and AssertSelfHasAddresses
New test suites:
- content_test.go: self node, DERP map, peer properties, user profiles,
update history monotonicity, and endpoint update propagation
- policy_test.go: default allow-all, explicit policy, policy triggers
updates on all nodes, multiple policy changes, multi-user mesh
- ephemeral_test.go: ephemeral connect, cleanup after disconnect,
mixed ephemeral/regular, reconnect prevents cleanup
- routes_test.go: addresses in AllowedIPs, route advertise and approve,
advertised routes via hostinfo, CGNAT range validation
Also fix node_departs test to use WaitForCondition instead of
assert.Eventually, and convert concurrent_join_and_leave to
interleaved_join_and_leave with grace-period-tolerant assertions.
Add three test files exercising the servertest harness:
- lifecycle_test.go: connection, disconnection, reconnection, session
replacement, and mesh formation at various sizes.
- consistency_test.go: symmetric visibility, consistent peer state,
address presence, concurrent join/leave convergence.
- weather_test.go: rapid reconnects, flapping stability, reconnect
with various delays, concurrent reconnects, and scale tests.
All tests use table-driven patterns with subtests.
Add a new hscontrol/servertest package that provides a test harness
for exercising the full Headscale control protocol in-process, using
Tailscale's controlclient.Direct as the client.
The harness consists of:
- TestServer: wraps a Headscale instance with an httptest.Server
- TestClient: wraps controlclient.Direct with NetworkMap tracking
- TestHarness: orchestrates N clients against a single server
- Assertion helpers for mesh completeness, visibility, and consistency
Export minimal accessor methods on Headscale (HTTPHandler, NoisePublicKey,
GetState, SetServerURL, StartBatcher, StartEphemeralGC) so the servertest
package can construct a working server from outside the hscontrol package.
This enables fast, deterministic tests of connection lifecycle, update
propagation, and network weather scenarios without Docker.