headscale

mirror of https://github.com/juanfont/headscale.git synced 2026-07-15 04:29:09 +09:00

Author	SHA1	Message	Date
Kristoffer Dalby	5ebc53c29e	types/node, mapper, policy/v2: assemble self CapMap inside TailNode types.NodeView.TailNode takes a selfPolicyCaps tailcfg.NodeCapMap parameter and merges it into the baseline. The mapper's WithSelfNode hands it the policy result via state.NodeCapMap; peer-path callers pass nil because peer-side CapMap is set downstream via policyv2.PeerCapMap. The nodeAttrs compat test now diffs the full TailNode self-view output against captured SaaS netmaps. Before this change the test compared compileNodeAttrs alone -- the policy-only output -- and needed a strip list to compensate for the missing baseline. With TailNode on the diff path, baseline emission is exercised end-to-end by every capture; a regression in TailNode breaks the suite. unmodelledTailnetStateCaps drops cap/ssh and cap/file-sharing now that both sides emit them identically. The file header is rewritten to read as 'caps SaaS emits where headscale has no equivalent yet' rather than the more confusing 'shape divergence' framing.	2026-05-13 14:22:30 +02:00
Kristoffer Dalby	b3f795f0b4	mapper, policy/v2: stamp suggest-exit-node on Peer.CapMap when exit routes approved The Tailscale client surfaces 'use this peer as your exit node' when the peer's CapMap carries the tailcfg.NodeAttrSuggestExitNode cap. SaaS emits it only on peers whose advertised exit routes are approved -- not every peer that just has the cap in its own nodeAttrs slot. policyv2.PeerCapMap encodes that emission rule: it walks the peer's own self-CapMap (built from compileNodeAttrs) and surfaces the gated entries (today just suggest-exit-node when the peer IsExitNode). Mapper.buildTailPeers calls it for each peer instead of merging the peer's full nodeAttrs CapMap onto its peer view. allCapMaps snapshots the full per-node CapMap once per peer-list build so pm.mu is acquired once rather than per peer.	2026-05-13 14:22:30 +02:00
Kristoffer Dalby	078b9e308f	policy/v2: SaaS-derived compat tests for nodeAttrs Adds a data-driven test that loads testdata/nodeattrs_results/*.hujson and diffs the captured SaaS-rendered netmaps against headscale's compileNodeAttrs output. Each capture is one scenario the SaaS control plane has rendered against the same policy headscale is asked to compile -- the test enforces shape parity per node. tailnet_state_caps.go enumerates the caps SaaS emits where headscale has no equivalent concept yet (user-role admin/owner, tailnet lock, services host, app connectors, internal magicsock and SSH tuning, tailnet-state metadata) plus the always-on baseline (admin, ssh, file-sharing) and the taildrive pair. stripUnmodelledTailnetStateCaps filters both sides of cmp.Diff so the comparison focuses on the policy-driven caps. PeerCapMap encodes which caps the Tailscale client reads from the peer view (suggest-exit-node when exit routes are approved, etc.) for use by the mapper. testcapture switches to typed tailcfg/netmap/filtertype/apitype values so schema drift between the capture tool and headscale becomes a compile error rather than a silent test failure. Existing compat suites (acl, grants, routes, ssh, issue_3212) move to the typed shape. The 53 SelfNode netmap captures and the 7 anonymizer-corrupted suggest-charmander -> suggest-exit-node restorations in routes_results / issue_3212 ride along.	2026-05-13 14:22:30 +02:00
Kristoffer Dalby	3f73ed5404	config, types: move randomize_client_port from server config to policy file Tailscale models the randomize-client-port toggle as a top-level field on the ACL policy. Headscale now matches that shape: the server-config randomize_client_port key is removed, the toggle lives in the policy file as randomizeClientPort, and per-node opt-in via nodeAttrs is also supported. Operators upgrading from a config-set randomize_client_port hit depr.fatalWithHint at startup, which prints the deprecation message and points at the new policy field rather than silently dropping the toggle. The default carries over (false) so operators who never set it are unaffected. config-example.yaml ships a REMOVED stanza showing the migration. types/node.go drops the cfg.RandomizeClientPort read from TailNode -- the cap is now policy-driven through compileNodeAttrs and the tail_test.go expectations follow.	2026-05-13 14:22:30 +02:00
Kristoffer Dalby	6fcff9e352	mapper, state: deliver nodeAttrs through MapResponse and harden nextdns DoH rewrite WithSelfNode and buildTailPeers merge each node's policy CapMap into the tailcfg.Node.CapMap they emit. State.NodeCapMap and State.NodeCapMaps wrap the policy manager: NodeCapMap returns a defensive clone per call; NodeCapMaps snapshots the full per-node map once for batched callers, amortising pm.mu acquisition across a peer build. generateDNSConfig grew a per-node CapMap argument so it can apply nodeAttr-driven DNS overlays. The nextdns DoH rewrite hardens against policy-controlled inputs: - nextDNSDoHHost anchors the prefix match instead of substring, so a hostile resolver URL cannot smuggle a nextdns hostname in a path or query. - nextDNSProfileFromCapMap accepts only profile names matching [A-Za-z0-9._-]{1,64} and picks the lexicographically first when multiple are granted -- deterministic, no shell metacharacters or URL fragments through. - addNextDNSMetadata composes the rewritten URL via url.Parse + url.Values rather than fmt.Sprintf, so existing query strings on the resolver URL survive and metadata cannot inject a new component. WithTaildropEnabled in servertest controls cfg.Taildrop.Enabled per test so cap/file-sharing emission can be toggled in tests that need to verify the off path.	2026-05-13 14:22:30 +02:00
Kristoffer Dalby	a4f05b0962	policy/v2: parse, validate, and compile nodeAttrs ACL policies now accept a top-level nodeAttrs block. Each entry hands a list of tailcfg node capabilities to every node matching target. Accepted target forms are the same as acls.src and grants.src: users, groups, tags, hosts, prefixes, autogroup:member, autogroup:tagged, and *. autogroup:self, autogroup:internet, and autogroup:danger-all are rejected at validate time because none describes a stable identity set a node-level attribute can attach to. NodeAttrGrant carries Targets, Attrs, and IPPool. IPPool is parsed but rejected at validate time -- the allocator that consumes it is not yet implemented. nodeAttrUnsupportedCaps lists caps SaaS accepts that headscale cannot act on (funnel today) and rejects them with a tracking-issue link in the error. compileNodeAttrs resolves each entry's targets, then maps every targeted node to a tailcfg.NodeCapMap of the entry's attrs. Per-node IPs are cached once per call so the inner attr loop is O(grants) instead of O(grants * nodes) IP allocations. PolicyManager grows NodeCapMap (per-node), NodeCapMaps (snapshot for batched callers), and NodesWithChangedCapMap (drain buffer for the self-broadcast diff). refreshNodeAttrsLocked appends to the drain rather than overwriting so a SetUsers/SetNodes between SetPolicy and the drain cannot lose the policy-reload diff.	2026-05-13 14:22:30 +02:00
Florian Preinstorfer	c4ab267c36	Refresh features page	2026-05-12 14:12:29 +02:00
Florian Preinstorfer	109bfc404c	Refresh docs for Grants - Mention policy as generic term that covers ACLs or Grants - Refresh routes policy examples - Remove Headscale specific exit node separation. Use via instead. Fixes: #3087	2026-05-12 14:12:29 +02:00
Florian Preinstorfer	1a64d950fd	Document supported autogroups once	2026-05-12 14:12:29 +02:00
Florian Preinstorfer	edb7ad0f81	Rewrite ACL docs as policy - Rename from acl.md to policy.md and setup redirect links - Mention both ACLs and Grants - Remove most old ACL docs and replace with links to Tailscale docs - Add "Getting started" section - Add section about notable differences	2026-05-12 14:12:29 +02:00
Florian Preinstorfer	892ffffc4a	Remove misleading comment	2026-05-12 14:12:29 +02:00
Florian Preinstorfer	e13f0458bb	Remove redundant prefix	2026-05-12 14:12:29 +02:00
Florian Preinstorfer	68b0014871	Use distroless without quotes	2026-05-12 14:12:29 +02:00
Florian Preinstorfer	484462898b	Remove link to sqlite Other mentions of SQLite don't link either.	2026-05-12 14:12:29 +02:00
Florian Preinstorfer	45b698dbac	Shorten container introduction	2026-05-12 14:12:29 +02:00
Florian Preinstorfer	14ce7e9106	Remove link to Arch AUR headscale-git It's outdated and unmaintained.	2026-05-12 14:12:29 +02:00
Florian Preinstorfer	84c7f0d450	Link to development builds	2026-05-12 14:12:29 +02:00
Florian Preinstorfer	c7f221dd0a	Fix typo and wording	2026-05-12 14:12:29 +02:00
Florian Preinstorfer	163363a12a	Use docs instead of KB	2026-05-12 14:12:29 +02:00
Kristoffer Dalby	f03d41ea9a	CHANGELOG: document policy tests (beta) Fixes #1803	2026-05-12 11:54:54 +01:00
Kristoffer Dalby	d5b2837231	policy/v2: match default proto set for tests with no proto The policy `tests` block lets entries omit `proto`. Tailscale's client maps that to the default protocol set {TCP, UDP, ICMP, ICMPv6} -- the captured packet_filter_matches show all four IANA numbers explicitly when no proto is set -- and a rule restricted to any one of them satisfies an empty-proto reachability test. srcReachesDst was passing the empty Protocol through unchanged, which landed an empty []int in ruleMatchesProto. The matcher then short-circuited to "no match" for every rule with a non-empty IPProto restriction, including TCP-only grants compiled from `ip: ["tcp:80"]`. The bug surfaced in the captured allpass-acls-and-grants-mixed scenario: the grant `tag:client → webserver:80` was reachable in the compiled filter but the empty-proto test could not see it. Expand the empty Protocol to the default set at the call site so ruleMatchesProto's intersection check sees the right requested protocols. Drop the now-dead empty-requestedProtos branch from the matcher. The last divergence drops out of knownPolicyTesterDivergences as a result. Updates #1803	2026-05-12 11:54:54 +01:00
Kristoffer Dalby	e4e209f919	policy/v2: canonicalize Protocol form during unmarshal Tailscale accepts both named ("tcp") and numeric IANA ("6") protocol forms wherever a Protocol value is allowed. Headscale stored whichever form the user wrote, leaving downstream code with two equivalents to handle separately. validateProtocolPortCompatibility only recognised the named constants and rejected the numeric form, so a policy with `proto: "6", dst: ["host:443"]` was rejected at parse time even though SaaS accepts it. Resolve the disagreement by normalising to the named form during Protocol.UnmarshalJSON. Every downstream consumer now sees one form regardless of what the user wrote, so layered guards like `\|\| protocol == "6"` in the validator are unnecessary. Updates #1803	2026-05-12 11:54:54 +01:00
Kristoffer Dalby	f172dba0e3	policy/v2: validate tests block at parse boundary A `tests` entry describes one connection attempt to one specific host on one specific port over a connection-oriented protocol, and asserts whether it is allowed or denied. Five shape rules follow — single-port dst, proto in {tcp, udp, sctp, ""}, no autogroup:internet dst, no CIDR-typed dst (raw `/N` or hosts:-alias to a multi-host prefix), at least one of accept/deny — and every one was previously silently accepted by headscale even though Tailscale SaaS rejects them as "test(s) failed". Enforce them in one pass over `pol.Tests` from `Policy.validate()`, reusing the existing parse-time multierr aggregation. The same shapes remain valid inside ACL or Grant destinations where the rule does not apply; the validator only walks the tests array. The compat runner now treats parse-time errors equivalently to SetPolicy errors so the captured Tailscale body still matches via substring regardless of which step surfaces the rejection. Nine divergences resolved by this validation pass drop out of knownPolicyTesterDivergences. Updates #1803	2026-05-12 11:54:54 +01:00
Kristoffer Dalby	c0774a739b	policy/v2: add policytester captures recorded from Tailscale SaaS 57 captures covering the alias × outcome matrix for the tests block, recorded against a real Tailscale SaaS tailnet. Replayed by TestPolicyTesterCompat. Bump the check-added-large-files pre-commit threshold to 1024 KB — captures include verbose per-node netmaps and one is 620 KB. Updates #1803	2026-05-12 11:54:54 +01:00
Kristoffer Dalby	7bc701179b	policy/v2: add policytester compat test runner Pin headscale's accept/reject decision and error body against Tailscale SaaS by replaying captures recorded from a real tailnet. Mirrors the tailscale_grants_compat_test.go pattern: glob over testdata/policytest_results/, one t.Run per file, parse-or-SetPolicy error must contain the captured api_response_body.message. errPolicyTestsFailed is "test(s) failed" — Tailscale's literal body — so substring match works against captured response bodies. Per-test detail (src, dst, expected vs got) is preserved below the prefix for the CLI / config-reload paths that don't have an audit endpoint. knownPolicyTesterDivergences gates the 12 mismatches the captures will surface so the suite stays green; engine fixes in follow-up commits drop the entries as each is resolved. Updates #1803	2026-05-12 11:54:54 +01:00
Kristoffer Dalby	b29ae25356	policy/v2: evaluate the tests block on user-initiated writes v2 silently dropped policy.tests, so a policy that contradicted its own assertions still applied. Resolve src/dst via the existing Alias machinery, walk the compiled global filter rules (acls and grants both contribute), and run on every user-write boundary: SetPolicy, the file watcher, and `headscale policy check`. A failing test rejects the write before it mutates live state. Boot-time reload skips evaluation; an already-stored policy that references a deleted user shouldn't lock the server out. `headscale policy check` is a thin frontend for the new CheckPolicy gRPC method. The server-side handler builds a fresh PolicyManager from the request bytes and the state's live users/nodes, runs SetPolicy on the sandbox so the tests block executes, and returns the result through gRPC status. No persistence, no policy_mode coupling. --bypass-grpc-and-access-database-directly opens the DB directly when the server is not running. cmd/headscale/cli/root.go no longer special-cases `policy check` in init() (the early return from PR #2580 broke --config registration and viper priming for --bypass). integration/cli_policy_test.go covers policy_mode={file,database} x fixture={acl-only, acl+passing-tests, acl+failing-tests} x bypass={false,true} = 12 rows. Updates #1803 Co-authored-by: Janis Jansons <janhouse@gmail.com>	2026-05-12 11:54:54 +01:00
Kristoffer Dalby	56146de377	proto: add CheckPolicy RPC CheckPolicy validates a candidate policy against a running server's live users and nodes (running its tests block) without persisting anything. Used by 'headscale policy check' to replace the in-process validation path the CLI runs today, which would otherwise need its own database connection. Updates #1803	2026-05-12 11:54:54 +01:00
Kristoffer Dalby	c3df84e354	policy/matcher: include CapGrant.Dsts in match destinations MatchFromFilterRule only read DstPorts[].IP into the destination IPSet. Cap-grant-only filter rules (e.g. tailscale.com/cap/relay) carry their destinations in CapGrant[].Dsts, so the derived matchers had empty dest sets and BuildPeerMap / ReduceNodes never exposed the cap target to its source nodes. Without a companion IP-level grant the relay node stayed invisible, so clients never tried to use it and connections sat on DERP. Union CapGrant[].Dsts into the destination IPSet alongside DstPorts. Restores peer-visibility for any cap-grant-only relationship; the peer-relay flow is the most visible instance. Fixes #3256	2026-05-11 14:55:06 +01:00
Kristoffer Dalby	795a1efe9b	ci: fetch full history in golangci-lint job revgrep needs pull_request.base.sha in the local clone to compute the diff against new code. With fetch-depth: 2, only HEAD and one parent are fetched, so a stale base SHA (when main moves between PR syncs) is not reachable and revgrep falls through, surfacing pre-existing issues outside the PR scope.	2026-05-11 10:34:58 +01:00
Kristoffer Dalby	dc733767c4	Dockerfile.tailscale-HEAD,Dockerfile.derper: bump golang to 1.26.3 tailscale upstream go.mod now requires 1.26.3.	2026-05-11 10:34:58 +01:00
Lealem Amedie	542091e82b	Add unit test	2026-05-11 09:25:26 +01:00
Lealem Amedie	6cd919d411	mapper: include UserProfiles in policy-change MapResponses	2026-05-11 09:25:26 +01:00
Kristoffer Dalby	2f907edf87	hscontrol/types: regenerate types_clone.go for viewer bump cmd/viewer in tailscale.com/cmd v1.97.0-pre emits new(x) instead of ptr.To(x). No behaviour change.	2026-05-11 08:46:12 +01:00
Kristoffer Dalby	9621a97ebe	ci, pre-commit: validate vendor hash via vendorhash check Replace the grep/awk hash extraction in build.yml with a structured vendorhash check step; the PR review comment now reads expected/actual values directly from $GITHUB_OUTPUT instead of scraping Nix stderr. Add a prek hook so divergence is caught locally before push.	2026-05-11 08:46:12 +01:00
Kristoffer Dalby	e470774f6a	cmd/vendorhash: track vendor SRI in flakehashes.json Move the headscale vendorHash out of flake.nix into a content-addressed flakehashes.json maintained by a small Go tool. The schema and goModFingerprint algorithm mirror upstream tailscale's tool/updateflakes so a future shared library extraction is trivial. vendorhash check verifies flakehashes.json against the current go.mod/go.sum. Hot path is a sha256 over those two files, so re-runs without input change are essentially free; only an actual fingerprint drift triggers go mod vendor + nardump.SRI. vendorhash update recomputes both fields and rewrites the JSON. The nix-vendor-sri devShell shim now wraps it.	2026-05-11 08:46:12 +01:00
Kristoffer Dalby	980622e9a5	flake.nix, go.mod: bump tailscale.com to v1.97.0-pre Pulls in the cmd/nardump library split (tailscale/tailscale#19551) so flakehashes.json tooling can import nardump.SRI directly. Side effects: Go directive bumps to 1.26.2 and the nixpkgs lock advances to a revision shipping go 1.26.2.	2026-05-11 08:46:12 +01:00
Kristoffer Dalby	4e0c2b8556	cmd/headscale/cli: validate users in policy check Add --bypass-grpc-and-access-database-directly to policy check so the new ambiguous-user validator runs against the live user list. Without the flag, policy check stays a syntax-only check and the success message says so. Updates #3160	2026-05-09 11:28:12 +01:00
Kristoffer Dalby	bc9fb6d403	hscontrol/policy/v2: reject ambiguous user references at load time When a user@ token resolved to more than one DB row, ACL and SSH rules referencing it were silently dropped at compile time, leaving clients with SSHPolicy={rules: null} and no signal to the admin. Validate every Username reference in groups, tagOwners, autoApprovers, ACLs and SSH rules at NewPolicyManager and SetPolicy and return ErrMultipleUsersFound. Missing-user tokens stay tolerant per #2863. Updates #3160	2026-05-09 11:28:12 +01:00
Möhsün Babayev	585d0c01bc	docs(config): fix typo in config-example.yaml Fixes a typo in the description of `metrics_listen_addr` property.	2026-05-09 05:14:08 +02:00
Möhsün Babayev	01eb5402f9	docs(setup): fix typo in requirements.md Fix the typo in spelling of "Let's Encrypt".	2026-05-09 05:14:08 +02:00
MunMunMiao	e597f4c8a0	Add Headscale UI to web UI documentation	2026-05-09 05:02:44 +02:00
SAY-5	01e548e030	state: avoid nil deref in registration handlers when old user is missing Mirror the guard from HandleNodeFromPreAuthKey in HandleNodeFromAuthPath. Both functions log the old user's name in the "different user" branch when an existing NodeStore entry under the same machine key belongs to another user. UserView.Name dereferences the backing User pointer unconditionally, so when the cached node was loaded with a non-nil UserID but a nil User (Preload join missed the row, or upstream code left the snapshot in that shape), the log call panics with a nil-pointer dereference at hscontrol/types/types_view.go:97. The panic is caught by the http2 server's runHandler for the noise control plane, so the process keeps running but every retry produces a new panic — production has observed bursts of ~1.9k panics per hour during a tailscaled reconnect loop. The gRPC/OIDC entry has no equivalent recover and would surface the panic to the caller. Guard both call sites with oldUser.Valid() and fall back to an empty old-user name when the pointer is nil. The "Creating new node for different user" log line still includes the existing node ID, hostname, machine key, and new user, so operator visibility is preserved. Add reproduction tests for both handlers seeding the orphan shape directly into NodeStore via PutNodeInStoreForTest. Co-Authored-By: Kristoffer Dalby <kristoffer@dalby.cc>	2026-05-06 07:23:02 +01:00
Kristoffer Dalby	9482cdf590	testdata: drop unused uppercase SSH-*.hujson fixtures The 39 SSH-*.hujson files in hscontrol/policy/v2/testdata/ssh_results/ were legacy hand-written "expected SSH rules" snippets superseded by the lowercase tscap captures (ssh-*.hujson). The active loader in TestSSHDataCompat globs ssh-*.hujson; filepath.Glob is case-sensitive on Linux so the uppercase set was loaded by no test. The duplication caused permanent dirty git state on case-insensitive filesystems (APFS, NTFS) where only one of SSH-A1.hujson and ssh-a1.hujson can physically exist in the working tree. Add an assertion to TestSSHDataCompat that the loader picks up every *.hujson under ssh_results/ so future fixture migrations cannot leave stranded files behind. Fixes #3240	2026-05-05 11:59:01 +01:00
primewildy	3d0f597b23	oidc: handle groups claim as string or array (FlexibleStringSlice) Some OIDC providers (notably JumpCloud) return the `groups` claim as a plain string when the user belongs to a single group, rather than a single-element array: Single group: {"groups": "MyGroup"} Multiple groups: {"groups": ["Group1", "Group2"]} This causes `json.Unmarshal` to fail with: cannot unmarshal string into Go struct field OIDCClaims.groups of type []string This is the same class of issue as juanfont#2293 (FlexibleBoolean for email_verified). The fix follows the same pattern: introduce a FlexibleStringSlice type with a custom UnmarshalJSON that accepts both a string and a []string, and use it for the Groups field in both OIDCClaims and OIDCUserInfo.	2026-05-04 15:26:53 +02:00
Kristoffer Dalby	76ee29352b	servertest: cover via-grant exit-node visibility end-to-end TestGrantViaExitNodeInternetVisibility boots a server, applies a policy that scopes autogroup:internet to a tag, registers a tagged exit advertiser and a regular client, and asserts the client's netmap surfaces the exit node with 0.0.0.0/0 and ::/0 in AllowedIPs — the substrate the Tailscale client reads to populate `tailscale exit-node list`. TestGrantViaExitNodeNoFilterRules retains its assertion (literal /0 absent from the exit node's PacketFilter, matching SaaS PacketFilter encoding); only its docstring is updated to reflect that the exit node now does receive a TheInternet-shaped rule, just not the literal /0 form. Updates #3233	2026-04-30 19:22:45 +01:00
Kristoffer Dalby	2b7f15abaa	policy/v2: surface autogroup:internet via grants on exit nodes A grant of the form `{src: alice, dst: autogroup:internet, via: tag:exit1}` was loading without error but stripping every exit node from alice's view: `tailscale exit-node list` returned "no exit nodes found". Two sites skipped autogroup:internet at the compile / steering layer: compileViaForNode's AutoGroup arm produced no FilterRule for the via-tagged exit node, and ViaRoutesForPeer's AutoGroup arm produced no Include/Exclude. With pm.needsPerNodeFilter true, the exit node's matchers were empty, BuildPeerMap could not link source to exit, and RoutesForPeer's ReduceRoutes stripped 0.0.0.0/0 and ::/0 from AllowedIPs. The skip belongs at the wire-format layer (ReduceFilterRules), not at the compile layer that also feeds internal matchers. Lift autogroup:internet handling into both AutoGroup arms with the same shape used for Prefix destinations: emit a TheInternet rule on via-tagged exit advertisers; surface peer.ExitRoutes() in Include when the peer carries the via tag, Exclude otherwise. ReduceFilterRules continues to keep the rule on exit-route advertisers' wire output and strip it elsewhere, preserving SaaS PacketFilter encoding. Also drop compileViaForNode's early len(SubnetRoutes)==0 return: SubnetRoutes excludes exit routes, so the early return pre-empted the autogroup:internet branch on nodes that only advertise exit routes. Existing tests pinning the buggy behaviour (TestViaRoutesForPeer subtests, TestCompileViaGrant case) flipped to the new contract. Fixes #3233	2026-04-30 19:22:45 +01:00
Kristoffer Dalby	ecaf56e0a0	integration: drop Force flag on docker network disconnect Force-disconnect leaves stale routes in the container's network namespace: libnetwork removes the host-side veth but the namespace-internal route survives. The next ConnectNetwork on the same network then fails with "cannot program address X/16 in sandbox interface because it conflicts with existing route", and the route never resolves on its own. Bounded retry around ConnectNetwork exhausts MaxElapsedTime instead of recovering. Without Force, libnetwork drains the namespace routes synchronously during disconnect and ConnectNetwork sees a clean slate. Cable-pull semantic is preserved: docker still tears down the endpoint at the namespace level, leaving in-flight TCP half-open inside the container's view, verified via paired probe-timeout pairs in HA prober logs while both routers are physically disconnected. Fixes #3234	2026-04-30 12:52:05 +01:00
Kristoffer Dalby	94ec607bca	state: per-goroutine deadline in HA probe cycle `time.After(ProbeTimeout)` returned a single channel shared by every probe goroutine in the cycle. Only the first goroutine to receive the deadline tick drains the channel; any other goroutine still waiting on its `responseCh` is then stuck forever, `wg.Wait()` never returns, and the scheduler loop in `app.go` stalls on the next tick. The condition fires whenever two or more nodes time out in the same cycle — common under cable-pull where IsOnline lags reality and both routers stay in the candidate set as half-open TCP. Move the timer inside each goroutine so every probe has its own deadline. Updates #3234	2026-04-30 12:52:05 +01:00
Kristoffer Dalby	d1443a431c	integration: skip subpackage tests in workflow generator The generator scans `integration/` recursively for `Test` functions and emits one CI job per match. Helper subpackages like `dockertestutil` and `tsic` host plain unit tests that should run under `go test`, not as Docker-based integration matrix entries. Limit the scan to depth 1 so only top-level `integration/*_test.go` files contribute job names.	2026-04-30 12:52:05 +01:00
Kristoffer Dalby	155e42f892	integration: retry transient docker network ops Libnetwork endpoint cleanup is eventually consistent. A back-to-back disconnect+connect on the same network can race teardown and return a transient error. Wrap the daemon calls in bounded exponential backoff so TestHASubnetRouterFailoverDockerDisconnect no longer flakes on phase 4c reconnect. Fixes #3234	2026-04-30 12:52:05 +01:00

4152 Commits