Commit Graph

4153 Commits

Author SHA1 Message Date
Kristoffer Dalby
8ea4cd3faa types/node, policy/v2: drop taildrive caps from baseline emission
Taildrive (drive:share and drive:access) is policy-driven per
Tailscale's documented behaviour
(https://tailscale.com/docs/features/taildrive). The previous
always-on baseline emission diverged from SaaS for every node not
targeted by a drive nodeAttr -- a real semantic divergence that the
compat suite caught once the test moved to comparing TailNode output
against the captured netmaps.

types.Node.TailNode no longer stamps the drive pair. Operators
wanting taildrive add a nodeAttrs entry:

  "nodeAttrs": [
    { "target": ["*"], "attr": ["drive:share", "drive:access"] }
  ]

unmodelledTailnetStateCaps shrinks accordingly. The baseline-divergence
group is gone; every entry left in the list is genuinely unmodelled
(user-role caps, unimplemented features, tailnet metadata, internal
tuning).

servertest's TestNodeAttrsBaselineCapsAlwaysOn expects the smaller
baseline (admin + ssh + file-sharing). Integration TestGrantCapDrive
grants the drive caps explicitly via NodeAttrs to exercise the
policy-driven emission path.
2026-05-13 14:22:30 +02:00
Kristoffer Dalby
5ebc53c29e types/node, mapper, policy/v2: assemble self CapMap inside TailNode
types.NodeView.TailNode takes a selfPolicyCaps tailcfg.NodeCapMap
parameter and merges it into the baseline. The mapper's WithSelfNode
hands it the policy result via state.NodeCapMap; peer-path callers
pass nil because peer-side CapMap is set downstream via
policyv2.PeerCapMap.

The nodeAttrs compat test now diffs the full TailNode self-view
output against captured SaaS netmaps. Before this change the test
compared compileNodeAttrs alone -- the policy-only output -- and
needed a strip list to compensate for the missing baseline. With
TailNode on the diff path, baseline emission is exercised end-to-end
by every capture; a regression in TailNode breaks the suite.

unmodelledTailnetStateCaps drops cap/ssh and cap/file-sharing now
that both sides emit them identically. The file header is rewritten
to read as 'caps SaaS emits where headscale has no equivalent yet'
rather than the more confusing 'shape divergence' framing.
2026-05-13 14:22:30 +02:00
Kristoffer Dalby
b3f795f0b4 mapper, policy/v2: stamp suggest-exit-node on Peer.CapMap when exit routes approved
The Tailscale client surfaces 'use this peer as your exit node' when
the peer's CapMap carries the tailcfg.NodeAttrSuggestExitNode cap.
SaaS emits it only on peers whose advertised exit routes are
approved -- not every peer that just has the cap in its own
nodeAttrs slot.

policyv2.PeerCapMap encodes that emission rule: it walks the
peer's own self-CapMap (built from compileNodeAttrs) and surfaces
the gated entries (today just suggest-exit-node when the peer
IsExitNode). Mapper.buildTailPeers calls it for each peer instead
of merging the peer's full nodeAttrs CapMap onto its peer view.

allCapMaps snapshots the full per-node CapMap once per peer-list
build so pm.mu is acquired once rather than per peer.
2026-05-13 14:22:30 +02:00
Kristoffer Dalby
078b9e308f policy/v2: SaaS-derived compat tests for nodeAttrs
Adds a data-driven test that loads testdata/nodeattrs_results/*.hujson
and diffs the captured SaaS-rendered netmaps against headscale's
compileNodeAttrs output. Each capture is one scenario the SaaS
control plane has rendered against the same policy headscale is asked
to compile -- the test enforces shape parity per node.

tailnet_state_caps.go enumerates the caps SaaS emits where headscale
has no equivalent concept yet (user-role admin/owner, tailnet lock,
services host, app connectors, internal magicsock and SSH tuning,
tailnet-state metadata) plus the always-on baseline (admin, ssh,
file-sharing) and the taildrive pair. stripUnmodelledTailnetStateCaps
filters both sides of cmp.Diff so the comparison focuses on the
policy-driven caps. PeerCapMap encodes which caps the Tailscale
client reads from the peer view (suggest-exit-node when exit routes
are approved, etc.) for use by the mapper.

testcapture switches to typed tailcfg/netmap/filtertype/apitype
values so schema drift between the capture tool and headscale
becomes a compile error rather than a silent test failure. Existing
compat suites (acl, grants, routes, ssh, issue_3212) move to the
typed shape.

The 53 SelfNode netmap captures and the 7 anonymizer-corrupted
suggest-charmander -> suggest-exit-node restorations in
routes_results / issue_3212 ride along.
2026-05-13 14:22:30 +02:00
Kristoffer Dalby
3f73ed5404 config, types: move randomize_client_port from server config to policy file
Tailscale models the randomize-client-port toggle as a top-level
field on the ACL policy. Headscale now matches that shape: the
server-config randomize_client_port key is removed, the toggle
lives in the policy file as randomizeClientPort, and per-node
opt-in via nodeAttrs is also supported.

Operators upgrading from a config-set randomize_client_port hit
depr.fatalWithHint at startup, which prints the deprecation message
and points at the new policy field rather than silently dropping
the toggle. The default carries over (false) so operators who never
set it are unaffected. config-example.yaml ships a REMOVED stanza
showing the migration.

types/node.go drops the cfg.RandomizeClientPort read from
TailNode -- the cap is now policy-driven through compileNodeAttrs
and the tail_test.go expectations follow.
2026-05-13 14:22:30 +02:00
Kristoffer Dalby
6fcff9e352 mapper, state: deliver nodeAttrs through MapResponse and harden nextdns DoH rewrite
WithSelfNode and buildTailPeers merge each node's policy CapMap
into the tailcfg.Node.CapMap they emit. State.NodeCapMap and
State.NodeCapMaps wrap the policy manager: NodeCapMap returns a
defensive clone per call; NodeCapMaps snapshots the full per-node
map once for batched callers, amortising pm.mu acquisition across
a peer build.

generateDNSConfig grew a per-node CapMap argument so it can apply
nodeAttr-driven DNS overlays. The nextdns DoH rewrite hardens against
policy-controlled inputs:

  - nextDNSDoHHost anchors the prefix match instead of substring,
    so a hostile resolver URL cannot smuggle a nextdns hostname in
    a path or query.
  - nextDNSProfileFromCapMap accepts only profile names matching
    [A-Za-z0-9._-]{1,64} and picks the lexicographically first when
    multiple are granted -- deterministic, no shell metacharacters
    or URL fragments through.
  - addNextDNSMetadata composes the rewritten URL via url.Parse +
    url.Values rather than fmt.Sprintf, so existing query strings
    on the resolver URL survive and metadata cannot inject a new
    component.

WithTaildropEnabled in servertest controls cfg.Taildrop.Enabled per
test so cap/file-sharing emission can be toggled in tests that need
to verify the off path.
2026-05-13 14:22:30 +02:00
Kristoffer Dalby
a4f05b0962 policy/v2: parse, validate, and compile nodeAttrs
ACL policies now accept a top-level nodeAttrs block. Each entry hands
a list of tailcfg node capabilities to every node matching target.
Accepted target forms are the same as acls.src and grants.src: users,
groups, tags, hosts, prefixes, autogroup:member, autogroup:tagged,
and *. autogroup:self, autogroup:internet, and autogroup:danger-all
are rejected at validate time because none describes a stable
identity set a node-level attribute can attach to.

NodeAttrGrant carries Targets, Attrs, and IPPool. IPPool is parsed
but rejected at validate time -- the allocator that consumes it is
not yet implemented. nodeAttrUnsupportedCaps lists caps SaaS accepts
that headscale cannot act on (funnel today) and rejects them with a
tracking-issue link in the error.

compileNodeAttrs resolves each entry's targets, then maps every
targeted node to a tailcfg.NodeCapMap of the entry's attrs. Per-node
IPs are cached once per call so the inner attr loop is O(grants)
instead of O(grants * nodes) IP allocations.

PolicyManager grows NodeCapMap (per-node), NodeCapMaps (snapshot for
batched callers), and NodesWithChangedCapMap (drain buffer for the
self-broadcast diff). refreshNodeAttrsLocked appends to the drain
rather than overwriting so a SetUsers/SetNodes between SetPolicy and
the drain cannot lose the policy-reload diff.
2026-05-13 14:22:30 +02:00
Florian Preinstorfer
c4ab267c36 Refresh features page 2026-05-12 14:12:29 +02:00
Florian Preinstorfer
109bfc404c Refresh docs for Grants
- Mention policy as generic term that covers ACLs or Grants
- Refresh routes policy examples
- Remove Headscale specific exit node separation. Use via instead.

Fixes: #3087
2026-05-12 14:12:29 +02:00
Florian Preinstorfer
1a64d950fd Document supported autogroups once 2026-05-12 14:12:29 +02:00
Florian Preinstorfer
edb7ad0f81 Rewrite ACL docs as policy
- Rename from acl.md to policy.md and setup redirect links
- Mention both ACLs and Grants
- Remove most old ACL docs and replace with links to Tailscale docs
- Add "Getting started" section
- Add section about notable differences
2026-05-12 14:12:29 +02:00
Florian Preinstorfer
892ffffc4a Remove misleading comment 2026-05-12 14:12:29 +02:00
Florian Preinstorfer
e13f0458bb Remove redundant prefix 2026-05-12 14:12:29 +02:00
Florian Preinstorfer
68b0014871 Use distroless without quotes 2026-05-12 14:12:29 +02:00
Florian Preinstorfer
484462898b Remove link to sqlite
Other mentions of SQLite don't link either.
2026-05-12 14:12:29 +02:00
Florian Preinstorfer
45b698dbac Shorten container introduction 2026-05-12 14:12:29 +02:00
Florian Preinstorfer
14ce7e9106 Remove link to Arch AUR headscale-git
Its outdated and unmaintained.
2026-05-12 14:12:29 +02:00
Florian Preinstorfer
84c7f0d450 Link to development builds 2026-05-12 14:12:29 +02:00
Florian Preinstorfer
c7f221dd0a Fix typo and wording 2026-05-12 14:12:29 +02:00
Florian Preinstorfer
163363a12a Use docs instead of KB 2026-05-12 14:12:29 +02:00
Kristoffer Dalby
f03d41ea9a CHANGELOG: document policy tests (beta)
Fixes #1803
2026-05-12 11:54:54 +01:00
Kristoffer Dalby
d5b2837231 policy/v2: match default proto set for tests with no proto
The policy `tests` block lets entries omit `proto`. Tailscale's client
maps that to the default protocol set {TCP, UDP, ICMP, ICMPv6} — the
captured packet_filter_matches show all four IANA numbers explicitly
when no proto is set — and a rule restricted to any one of them
satisfies an empty-proto reachability test.

srcReachesDst was passing the empty Protocol through unchanged, which
landed an empty []int in ruleMatchesProto. The matcher then short-
circuited to "no match" for every rule with a non-empty IPProto
restriction, including TCP-only grants compiled from `ip: ["tcp:80"]`.
The bug surfaced in the captured allpass-acls-and-grants-mixed
scenario: the grant `tag:client → webserver:80` was reachable in the
compiled filter but the empty-proto test could not see it.

Expand the empty Protocol to the default set at the call site so
ruleMatchesProto's intersection check sees the right requested
protocols. Drop the now-dead empty-requestedProtos branch from the
matcher. The last divergence drops out of knownPolicyTesterDivergences
as a result.

Updates #1803
2026-05-12 11:54:54 +01:00
Kristoffer Dalby
e4e209f919 policy/v2: canonicalize Protocol form during unmarshal
Tailscale accepts both named ("tcp") and numeric IANA ("6") protocol
forms wherever a Protocol value is allowed. Headscale stored whichever
form the user wrote, leaving downstream code with two equivalents to
handle separately. validateProtocolPortCompatibility only recognised
the named constants and rejected the numeric form, so a policy with
`proto: "6", dst: ["host:443"]` was rejected at parse time even though
SaaS accepts it.

Resolve the disagreement by normalising to the named form during
Protocol.UnmarshalJSON. Every downstream consumer now sees one form
regardless of what the user wrote, so layered guards like
`|| protocol == "6"` in the validator are unnecessary.

Updates #1803
2026-05-12 11:54:54 +01:00
Kristoffer Dalby
f172dba0e3 policy/v2: validate tests block at parse boundary
A `tests` entry describes one connection attempt to one specific
host on one specific port over a connection-oriented protocol, and
asserts whether it is allowed or denied. Five shape rules follow —
single-port dst, proto in {tcp, udp, sctp, ""}, no
autogroup:internet dst, no CIDR-typed dst (raw `/N` or hosts:-alias
to a multi-host prefix), at least one of accept/deny — and every
one was previously silently accepted by headscale even though
Tailscale SaaS rejects them as "test(s) failed".

Enforce them in one pass over `pol.Tests` from `Policy.validate()`,
reusing the existing parse-time multierr aggregation. The same
shapes remain valid inside ACL or Grant destinations where the rule
does not apply; the validator only walks the tests array.

The compat runner now treats parse-time errors equivalently to
SetPolicy errors so the captured Tailscale body still matches via
substring regardless of which step surfaces the rejection. Nine
divergences resolved by this validation pass drop out of
knownPolicyTesterDivergences.

Updates #1803
2026-05-12 11:54:54 +01:00
Kristoffer Dalby
c0774a739b policy/v2: add policytester captures recorded from Tailscale SaaS
57 captures covering the alias × outcome matrix for the tests block,
recorded against a real Tailscale SaaS tailnet. Replayed by
TestPolicyTesterCompat.

Bump the check-added-large-files pre-commit threshold to 1024 KB —
captures include verbose per-node netmaps and one is 620 KB.

Updates #1803
2026-05-12 11:54:54 +01:00
Kristoffer Dalby
7bc701179b policy/v2: add policytester compat test runner
Pin headscale's accept/reject decision and error body against
Tailscale SaaS by replaying captures recorded from a real tailnet.
Mirrors the tailscale_grants_compat_test.go pattern: glob over
testdata/policytest_results/, one t.Run per file, parse-or-SetPolicy
error must contain the captured api_response_body.message.

errPolicyTestsFailed is "test(s) failed" — Tailscale's literal body —
so substring match works against captured response bodies. Per-test
detail (src, dst, expected vs got) is preserved below the prefix for
the CLI / config-reload paths that don't have an audit endpoint.

knownPolicyTesterDivergences gates the 12 mismatches the captures
will surface so the suite stays green; engine fixes in follow-up
commits drop the entries as each is resolved.

Updates #1803
2026-05-12 11:54:54 +01:00
Kristoffer Dalby
b29ae25356 policy/v2: evaluate the tests block on user-initiated writes
v2 silently dropped policy.tests, so a policy that contradicted its
own assertions still applied. Resolve src/dst via the existing Alias
machinery, walk the compiled global filter rules (acls and grants
both contribute), and run on every user-write boundary: SetPolicy,
the file watcher, and `headscale policy check`. A failing test
rejects the write before it mutates live state.

Boot-time reload skips evaluation; an already-stored policy that
references a deleted user shouldn't lock the server out.

`headscale policy check` is a thin frontend for the new CheckPolicy
gRPC method. The server-side handler builds a fresh PolicyManager
from the request bytes and the state's live users/nodes, runs
SetPolicy on the sandbox so the tests block executes, and returns
the result through gRPC status. No persistence, no policy_mode
coupling. --bypass-grpc-and-access-database-directly opens the DB
directly when the server is not running.

cmd/headscale/cli/root.go no longer special-cases `policy check` in
init() (the early return from PR #2580 broke --config registration
and viper priming for --bypass).

integration/cli_policy_test.go covers policy_mode={file,database} x
fixture={acl-only, acl+passing-tests, acl+failing-tests} x
bypass={false,true} = 12 rows.

Updates #1803

Co-authored-by: Janis Jansons <janhouse@gmail.com>
2026-05-12 11:54:54 +01:00
Kristoffer Dalby
56146de377 proto: add CheckPolicy RPC
CheckPolicy validates a candidate policy against a running server's
live users and nodes (running its tests block) without persisting
anything. Used by 'headscale policy check' to replace the in-process
validation path the CLI runs today, which would otherwise need its
own database connection.

Updates #1803
2026-05-12 11:54:54 +01:00
Kristoffer Dalby
c3df84e354 policy/matcher: include CapGrant.Dsts in match destinations
MatchFromFilterRule only read DstPorts[].IP into the destination
IPSet. Cap-grant-only filter rules (e.g. tailscale.com/cap/relay)
carry their destinations in CapGrant[].Dsts, so the derived matchers
had empty dest sets and BuildPeerMap / ReduceNodes never exposed the
cap target to its source nodes. Without a companion IP-level grant
the relay node stayed invisible, so clients never tried to use it
and connections sat on DERP.

Union CapGrant[].Dsts into the destination IPSet alongside DstPorts.
Restores peer-visibility for any cap-grant-only relationship; the
peer-relay flow is the most visible instance.

Fixes #3256
2026-05-11 14:55:06 +01:00
Kristoffer Dalby
795a1efe9b ci: fetch full history in golangci-lint job
revgrep needs pull_request.base.sha in the local clone to compute
the diff against new code. With fetch-depth: 2, only HEAD and one
parent are fetched, so a stale base SHA (when main moves between
PR syncs) is not reachable and revgrep falls through, surfacing
pre-existing issues outside the PR scope.
2026-05-11 10:34:58 +01:00
Kristoffer Dalby
dc733767c4 Dockerfile.tailscale-HEAD,Dockerfile.derper: bump golang to 1.26.3
tailscale upstream go.mod now requires 1.26.3.
2026-05-11 10:34:58 +01:00
Lealem Amedie
542091e82b Add unit test 2026-05-11 09:25:26 +01:00
Lealem Amedie
6cd919d411 mapper: include UserProfiles in policy-change MapResponses 2026-05-11 09:25:26 +01:00
Kristoffer Dalby
2f907edf87 hscontrol/types: regenerate types_clone.go for viewer bump
cmd/viewer in tailscale.com/cmd v1.97.0-pre emits new(*x) instead
of ptr.To(*x). No behaviour change.
2026-05-11 08:46:12 +01:00
Kristoffer Dalby
9621a97ebe ci, pre-commit: validate vendor hash via vendorhash check
Replace the grep/awk hash extraction in build.yml with a structured
vendorhash check step; the PR review comment now reads expected/
actual values directly from $GITHUB_OUTPUT instead of scraping Nix
stderr. Add a prek hook so divergence is caught locally before push.
2026-05-11 08:46:12 +01:00
Kristoffer Dalby
e470774f6a cmd/vendorhash: track vendor SRI in flakehashes.json
Move the headscale vendorHash out of flake.nix into a content-
addressed flakehashes.json maintained by a small Go tool. The
schema and goModFingerprint algorithm mirror upstream tailscale's
tool/updateflakes so a future shared library extraction is trivial.

vendorhash check verifies flakehashes.json against the current
go.mod/go.sum. Hot path is a sha256 over those two files, so
re-runs without input change are essentially free; only an actual
fingerprint drift triggers go mod vendor + nardump.SRI.

vendorhash update recomputes both fields and rewrites the JSON.
The nix-vendor-sri devShell shim now wraps it.
2026-05-11 08:46:12 +01:00
Kristoffer Dalby
980622e9a5 flake.nix, go.mod: bump tailscale.com to v1.97.0-pre
Pulls in the cmd/nardump library split (tailscale/tailscale#19551)
so flakehashes.json tooling can import nardump.SRI directly.

Side effects: Go directive bumps to 1.26.2 and the nixpkgs lock
advances to a revision shipping go 1.26.2.
2026-05-11 08:46:12 +01:00
Kristoffer Dalby
4e0c2b8556 cmd/headscale/cli: validate users in policy check
Add --bypass-grpc-and-access-database-directly to policy check so
the new ambiguous-user validator runs against the live user list.
Without the flag, policy check stays a syntax-only check and the
success message says so.

Updates #3160
2026-05-09 11:28:12 +01:00
Kristoffer Dalby
bc9fb6d403 hscontrol/policy/v2: reject ambiguous user references at load time
When a user@ token resolved to more than one DB row, ACL and SSH
rules referencing it were silently dropped at compile time, leaving
clients with SSHPolicy={rules: null} and no signal to the admin.

Validate every Username reference in groups, tagOwners,
autoApprovers, ACLs and SSH rules at NewPolicyManager and SetPolicy
and return ErrMultipleUsersFound. Missing-user tokens stay tolerant
per #2863.

Updates #3160
2026-05-09 11:28:12 +01:00
Möhsün Babayev
585d0c01bc docs(config): fix typo in config-example.yaml
Fixes a typo in the description of `metrics_listen_addr` property.
2026-05-09 05:14:08 +02:00
Möhsün Babayev
01eb5402f9 docs(setup): fix typo in requirements.md
Fix the typo in spelling of "Let's Encrypt".
2026-05-09 05:14:08 +02:00
MunMunMiao
e597f4c8a0 Add Headscale UI to web UI documentation 2026-05-09 05:02:44 +02:00
SAY-5
01e548e030 state: avoid nil deref in registration handlers when old user is missing
Mirror the guard from HandleNodeFromPreAuthKey in HandleNodeFromAuthPath.
Both functions log the old user's name in the "different user" branch
when an existing NodeStore entry under the same machine key belongs to
another user. UserView.Name dereferences the backing User pointer
unconditionally, so when the cached node was loaded with a non-nil
UserID but a nil User (Preload join missed the row, or upstream code
left the snapshot in that shape), the log call panics with a nil-pointer
dereference at hscontrol/types/types_view.go:97.

The panic is caught by the http2 server's runHandler for the noise
control plane, so the process keeps running but every retry produces a
new panic — production has observed bursts of ~1.9k panics per hour
during a tailscaled reconnect loop. The gRPC/OIDC entry has no equivalent
recover and would surface the panic to the caller.

Guard both call sites with oldUser.Valid() and fall back to an empty
old-user name when the pointer is nil. The "Creating new node for
different user" log line still includes the existing node ID, hostname,
machine key, and new user, so operator visibility is preserved.

Add reproduction tests for both handlers seeding the orphan shape
directly into NodeStore via PutNodeInStoreForTest.

Co-Authored-By: Kristoffer Dalby <kristoffer@dalby.cc>
2026-05-06 07:23:02 +01:00
Kristoffer Dalby
9482cdf590 testdata: drop unused uppercase SSH-*.hujson fixtures
The 39 SSH-*.hujson files in hscontrol/policy/v2/testdata/ssh_results/
were legacy hand-written "expected SSH rules" snippets superseded by
the lowercase tscap captures (ssh-*.hujson). The active loader in
TestSSHDataCompat globs ssh-*.hujson; filepath.Glob is case-sensitive
on Linux so the uppercase set was loaded by no test.

The duplication caused permanent dirty git state on case-insensitive
filesystems (APFS, NTFS) where only one of SSH-A1.hujson and
ssh-a1.hujson can physically exist in the working tree.

Add an assertion to TestSSHDataCompat that the loader picks up every
*.hujson under ssh_results/ so future fixture migrations cannot leave
stranded files behind.

Fixes #3240
2026-05-05 11:59:01 +01:00
primewildy
3d0f597b23 oidc: handle groups claim as string or array (FlexibleStringSlice)
Some OIDC providers (notably JumpCloud) return the `groups` claim as
a plain string when the user belongs to a single group, rather than
a single-element array:

  Single group:    {"groups": "MyGroup"}
  Multiple groups: {"groups": ["Group1", "Group2"]}

This causes `json.Unmarshal` to fail with:

  cannot unmarshal string into Go struct field OIDCClaims.groups of type []string

This is the same class of issue as juanfont#2293 (FlexibleBoolean for
email_verified). The fix follows the same pattern: introduce a
FlexibleStringSlice type with a custom UnmarshalJSON that accepts
both a string and a []string, and use it for the Groups field in
both OIDCClaims and OIDCUserInfo.
2026-05-04 15:26:53 +02:00
Kristoffer Dalby
76ee29352b servertest: cover via-grant exit-node visibility end-to-end
TestGrantViaExitNodeInternetVisibility boots a server, applies a
policy that scopes autogroup:internet to a tag, registers a tagged
exit advertiser and a regular client, and asserts the client's netmap
surfaces the exit node with 0.0.0.0/0 and ::/0 in AllowedIPs — the
substrate the Tailscale client reads to populate
`tailscale exit-node list`.

TestGrantViaExitNodeNoFilterRules retains its assertion (literal /0
absent from the exit node's PacketFilter, matching SaaS PacketFilter
encoding); only its docstring is updated to reflect that the exit
node now does receive a TheInternet-shaped rule, just not the
literal /0 form.

Updates #3233
2026-04-30 19:22:45 +01:00
Kristoffer Dalby
2b7f15abaa policy/v2: surface autogroup:internet via grants on exit nodes
A grant of the form `{src: alice, dst: autogroup:internet, via:
tag:exit1}` was loading without error but stripping every exit node
from alice's view: `tailscale exit-node list` returned "no exit nodes
found".

Two sites skipped autogroup:internet at the compile / steering layer:
compileViaForNode's *AutoGroup arm produced no FilterRule for the
via-tagged exit node, and ViaRoutesForPeer's *AutoGroup arm produced
no Include/Exclude. With pm.needsPerNodeFilter true, the exit node's
matchers were empty, BuildPeerMap could not link source to exit, and
RoutesForPeer's ReduceRoutes stripped 0.0.0.0/0 and ::/0 from
AllowedIPs.

The skip belongs at the wire-format layer (ReduceFilterRules), not at
the compile layer that also feeds internal matchers. Lift
autogroup:internet handling into both *AutoGroup arms with the same
shape used for *Prefix destinations: emit a TheInternet rule on
via-tagged exit advertisers; surface peer.ExitRoutes() in Include
when the peer carries the via tag, Exclude otherwise.
ReduceFilterRules continues to keep the rule on exit-route
advertisers' wire output and strip it elsewhere, preserving SaaS
PacketFilter encoding.

Also drop compileViaForNode's early len(SubnetRoutes)==0 return:
SubnetRoutes excludes exit routes, so the early return pre-empted the
autogroup:internet branch on nodes that only advertise exit routes.

Existing tests pinning the buggy behaviour (TestViaRoutesForPeer
subtests, TestCompileViaGrant case) flipped to the new contract.

Fixes #3233
2026-04-30 19:22:45 +01:00
Kristoffer Dalby
ecaf56e0a0 integration: drop Force flag on docker network disconnect
Force-disconnect leaves stale routes in the container's network
namespace: libnetwork removes the host-side veth but the
namespace-internal route survives. The next ConnectNetwork on the
same network then fails with "cannot program address X/16 in sandbox
interface because it conflicts with existing route", and the route
never resolves on its own. Bounded retry around ConnectNetwork
exhausts MaxElapsedTime instead of recovering.

Without Force, libnetwork drains the namespace routes synchronously
during disconnect and ConnectNetwork sees a clean slate. Cable-pull
semantic is preserved: docker still tears down the endpoint at the
namespace level, leaving in-flight TCP half-open inside the
container's view, verified via paired probe-timeout pairs in HA
prober logs while both routers are physically disconnected.

Fixes #3234
2026-04-30 12:52:05 +01:00
Kristoffer Dalby
94ec607bca state: per-goroutine deadline in HA probe cycle
`time.After(ProbeTimeout)` returned a single channel shared by every
probe goroutine in the cycle. Only the first goroutine to receive the
deadline tick drains the channel; any other goroutine still waiting on
its `responseCh` is then stuck forever, `wg.Wait()` never returns, and
the scheduler loop in `app.go` stalls on the next tick. The condition
fires whenever two or more nodes time out in the same cycle — common
under cable-pull where IsOnline lags reality and both routers stay in
the candidate set as half-open TCP.

Move the timer inside each goroutine so every probe has its own
deadline.

Updates #3234
2026-04-30 12:52:05 +01:00
Kristoffer Dalby
d1443a431c integration: skip subpackage tests in workflow generator
The generator scans `integration/` recursively for `Test*` functions
and emits one CI job per match. Helper subpackages like
`dockertestutil` and `tsic` host plain unit tests that should run
under `go test`, not as Docker-based integration matrix entries.
Limit the scan to depth 1 so only top-level `integration/*_test.go`
files contribute job names.
2026-04-30 12:52:05 +01:00