headscale

mirror of https://github.com/juanfont/headscale.git synced 2026-05-24 02:58:42 +09:00

Author	SHA1	Message	Date
Kristoffer Dalby	e2f2f9211f	state, servertest: property-test HA election + invariant catalogue Expand TestPrimaryRoutesProperty (5 -> 9 ops). New ops mirror the production shapes the failure cases hit: BatchProbeResults via UpdateNodes, SimultaneousDisconnect via UpdateNodes, SetApprovedRoutes that leaves announced RoutableIPs intact, OfflineExpiry that keeps Unhealthy set. The model now tracks announced and approved separately and recomputes the intersection. Strengthen the per-op assertions to cover invariants the model alone cannot prove: every primary must be online, every primary must currently advertise its prefix, no flap onto an unhealthy candidate when a healthy one was available, no flap off a previous primary that remains a healthy candidate. The check now takes a pre-op snapshot so the anti-flap rule has a stable reference. Add TestHAProberProperty in servertest. It drives a real TestServer with three HA-route-advertising clients through rapid-drawn sequences of ClientDisconnect / ClientReconnect / ProberTick / WaitForSnapshot ops and re-checks the same shape invariants after every step. Document the system in hscontrol/state/HA_INVARIANTS.md: a state machine over (Healthy+Online, Unhealthy+Online, Offline, OfflineExpired), fifteen numbered invariants with predicates and violation paths, and a coverage matrix mapping each invariant to its unit, servertest, and integration tests. Three rows pin the recent fixes to the invariants they enforce.	2026-05-18 17:18:08 +02:00
Kristoffer Dalby	c7630b505b	state: leave prefix unmapped when all primary candidates unhealthy electPrimaryRoutes' all-unhealthy fallback picked candidates[0] when the previous primary was no longer a candidate. The Phase-5 simultaneous dual-disconnect path in TestHASubnetRouterFailoverDockerDisconnect hits this asymmetrically: a batched probe cycle marks both routers unhealthy with prev=r2 preserved, then the grace-period Disconnect for r2 drops it from candidates. With prev gone and the remaining r1 still carrying its Unhealthy bit, the fallback pointed peers at the cable-pulled r1, flapping primary to an unreachable node and tripping requirePrimaryStable. Leave the prefix unmapped when prev is gone and every candidate is unhealthy. Peers see no advertiser instead of an unreachable one, which is honest: the next probe cycle re-evaluates and picks whichever node responds. The property-test model that mirrored the old behaviour is updated to match.	2026-05-18 17:18:08 +02:00
Kristoffer Dalby	de6be71a86	state: batch HA probe results so dual-disconnect cannot flap primary requirePrimaryStable in TestHASubnetRouterFailoverDockerDisconnect Phase 5a (simultaneous cable-pull of both routers) intermittently caught the primary flipping to the offline r1. Both probe goroutines mark their target unhealthy back-to-back; SetNodeUnhealthy publishes a fresh NodeStore snapshot each call, so the intermediate snapshot — r1 unhealthy, r2 still healthy — runs the election with one healthy candidate left and picks it. The next snapshot then enters the all-unhealthy preserve-prev path, which preserves the wrong choice. Collect probe results from the cycle and apply them through a new NodeStore.UpdateNodes batched op so the election only runs once, with the cycle's final health state. PolicyChange dispatch moves outside the wg.Go goroutines and fires once if the primary assignment actually changed.	2026-05-18 17:18:08 +02:00
Kristoffer Dalby	fb8eecae25	state: defer HA failover when probe target reconnected mid-cycle The HA prober dispatches a PingRequest, waits ProbeTimeout (5s), and marks the node unhealthy if no callback arrives. A node that bounced its poll session between probe cycles satisfies two conditions that conspire to fail TestHASubnetRouterFailover: a probe queued against the previous session is silently dropped when the worker writes to the closed connection (timeout always fires), and a probe sent immediately after reconnect lands while wgengine is still rebuilding magicsock state from the new netmap. Either path installs a spurious unhealthy bit, which sends the preserved-primary anti-flap the wrong way. Record the session observed at dispatch time and drop the timeout path if the node reconnected since. Require the session to survive a full probe cycle before a timeout can drive a failover.	2026-05-18 17:18:08 +02:00
Kristoffer Dalby	b1196baf6d	state: add regression test for Node slice persistence Drives the persist path for ApprovedRoutes, Tags and Endpoints — seed a non-empty value, clear to nil, read the column back from disk, then close the State and reopen one against the same sqlite file to simulate a server restart. Pins the contract the named IsZero slice types enforce so future changes to the persist path cannot silently drop a cleared slice column. Updates #3110	2026-05-15 11:21:58 +02:00
Kristoffer Dalby	7a20db9f49	types: persist Node JSON slices via named IsZero types Endpoints, Tags and ApprovedRoutes serialize as JSON on Node. GORM's struct Updates path skips fields it considers zero, and reflect treats a nil slice as zero — clearing any of these columns via the State persist path would leave the previous value in the database. Introduce Strings, Prefixes and AddrPorts as named slice types whose IsZero() always reports false, so GORM keeps the column in the UPDATE regardless of the slice being nil or empty. JSON marshalling is unchanged: nil serializes to null, empty to []. List() returns the underlying unnamed slice for callers (mainly testify assertions over reflect.DeepEqual) that distinguish the named type from its base. Regenerated types_clone.go and types_view.go follow the field-type swap. Test assertions across hscontrol/{db,state,servertest} updated to call .List() where reflect.DeepEqual previously matched the raw slice type. Fixes #3110	2026-05-15 11:21:58 +02:00
Kristoffer Dalby	6fcff9e352	mapper, state: deliver nodeAttrs through MapResponse and harden nextdns DoH rewrite WithSelfNode and buildTailPeers merge each node's policy CapMap into the tailcfg.Node.CapMap they emit. State.NodeCapMap and State.NodeCapMaps wrap the policy manager: NodeCapMap returns a defensive clone per call; NodeCapMaps snapshots the full per-node map once for batched callers, amortising pm.mu acquisition across a peer build. generateDNSConfig grew a per-node CapMap argument so it can apply nodeAttr-driven DNS overlays. The nextdns DoH rewrite hardens against policy-controlled inputs: - nextDNSDoHHost anchors the prefix match instead of substring, so a hostile resolver URL cannot smuggle a nextdns hostname in a path or query. - nextDNSProfileFromCapMap accepts only profile names matching [A-Za-z0-9._-]{1,64} and picks the lexicographically first when multiple are granted -- deterministic, no shell metacharacters or URL fragments through. - addNextDNSMetadata composes the rewritten URL via url.Parse + url.Values rather than fmt.Sprintf, so existing query strings on the resolver URL survive and metadata cannot inject a new component. WithTaildropEnabled in servertest controls cfg.Taildrop.Enabled per test so cap/file-sharing emission can be toggled in tests that need to verify the off path.	2026-05-13 14:22:30 +02:00
SAY-5	01e548e030	state: avoid nil deref in registration handlers when old user is missing Mirror the guard from HandleNodeFromPreAuthKey in HandleNodeFromAuthPath. Both functions log the old user's name in the "different user" branch when an existing NodeStore entry under the same machine key belongs to another user. UserView.Name dereferences the backing User pointer unconditionally, so when the cached node was loaded with a non-nil UserID but a nil User (Preload join missed the row, or upstream code left the snapshot in that shape), the log call panics with a nil-pointer dereference at hscontrol/types/types_view.go:97. The panic is caught by the http2 server's runHandler for the noise control plane, so the process keeps running but every retry produces a new panic — production has observed bursts of ~1.9k panics per hour during a tailscaled reconnect loop. The gRPC/OIDC entry has no equivalent recover and would surface the panic to the caller. Guard both call sites with oldUser.Valid() and fall back to an empty old-user name when the pointer is nil. The "Creating new node for different user" log line still includes the existing node ID, hostname, machine key, and new user, so operator visibility is preserved. Add reproduction tests for both handlers seeding the orphan shape directly into NodeStore via PutNodeInStoreForTest. Co-Authored-By: Kristoffer Dalby <kristoffer@dalby.cc>	2026-05-06 07:23:02 +01:00
Kristoffer Dalby	94ec607bca	state: per-goroutine deadline in HA probe cycle `time.After(ProbeTimeout)` returned a single channel shared by every probe goroutine in the cycle. Only the first goroutine to receive the deadline tick drains the channel; any other goroutine still waiting on its `responseCh` is then stuck forever, `wg.Wait()` never returns, and the scheduler loop in `app.go` stalls on the next tick. The condition fires whenever two or more nodes time out in the same cycle — common under cable-pull where IsOnline lags reality and both routers stay in the candidate set as half-open TCP. Move the timer inside each goroutine so every probe has its own deadline. Updates #3234	2026-04-30 12:52:05 +01:00
Kristoffer Dalby	3d5c0af4e7	state: preserve previous primary when all HA advertisers unhealthy electPrimaryRoutes' all-unhealthy fallback picked candidates[0] (lowest NodeID) regardless of who was prev. Under cable-pull semantics IsOnline lags reality (long-poll TCP half-open), so both routers stay in candidates and both go Unhealthy via the prober — the fallback then churned primary to a node that was itself unreachable. Prefer prev when still in candidates; fall through to candidates[0] only when prev is gone. Anti-blackhole holds. Update the property test reference model and split the unit test into existence (KeepsAPrimary) and identity (PreservesPrevious) cases. Fixes #3203	2026-04-29 18:08:39 +01:00
Kristoffer Dalby	9f7c8e9a07	state: clear Unhealthy when node leaves HA candidate set Restore the legacy auto-clear at write boundaries that drop HA candidacy: Disconnect, SetApprovedRoutes(empty), and UpdateNodeFromMapRequest shrinking advertised routes to empty. Plus a defensive guard in SetNodeUnhealthy. Updates #3203	2026-04-29 18:08:39 +01:00
Kristoffer Dalby	66ac785c22	state: delete routes package, port primary route tests Remove hscontrol/routes/. Port the named scenarios and the rapid property test to hscontrol/state/. Updates #3203	2026-04-29 18:08:39 +01:00
Kristoffer Dalby	437754aeea	state: switch consumers to NodeStore primary routes Replace routes.PrimaryRoutes reads with NodeStore. Connect bumps SessionEpoch; Disconnect re-checks it inside UpdateNode so the check and mutation are atomic against a concurrent Connect on the same node. The connect_race regression test is carried in its final SessionEpoch form. Updates #3203	2026-04-29 18:08:39 +01:00
Kristoffer Dalby	da927eb018	state: compute primary routes inside NodeStore snapshot Add primaries and isPrimary maps to Snapshot plus an election algorithm. No callers yet. Updates #3203	2026-04-29 18:08:39 +01:00
Akhilesh Arora	0e10ca4e9a	state: preserve nil expiry on user owned registration when no default is configured When a user owned node registers or re registers with a PreAuthKey and the client sends zero client expiry while node.expiry is set to 0, the expiry column ends up stored as 0001-01-01 00:00:00 instead of NULL. Two sites in HandleNodeFromPreAuthKey build a non nil pointer to regReq.Expiry even when the value is zero time, and the needsDefaultExpiry guard only replaces it when s.cfg.Node.Expiry > 0, so the pointer to zero time survives to the database. Convert an unset regReq.Expiry to nil before handing it off so the needsDefaultExpiry path is the only place that assigns a non nil pointer. This is a narrower sibling of #3170 on the user owned PreAuthKey path. The regression was introduced alongside the fix for #3111 in `6337a3db`.	2026-04-29 13:06:38 +01:00
Kristoffer Dalby	d6dfdc100c	hscontrol: route hostname handling through dnsname and NodeStore Ingest (registration and MapRequest updates) now calls dnsname.SanitizeHostname directly and lets NodeStore auto-bump on collision. Admin rename uses dnsname.ValidLabel + SetGivenName so conflicts are surfaced to the caller instead of silently mutated. Three duplicate invalidDNSRegex definitions, the old NormaliseHostname and ValidateHostname helpers, EnsureHostname, InvalidString, ApplyHostnameFromHostInfo, GivenNameHasBeenChanged, generateGivenName and EnsureUniqueGivenName are removed along with their tests. ValidateHostname's username half is retained as ValidateUsername for users.go. The SaaS-matching collision rule replaces the random "invalid-xxxxxx" fallback and the 8-character hash suffix; the empty-input fallback is the literal "node". TestUpdateHostnameFromClient now exercises the rewrite end-to-end with awkward macOS/Windows names. Fixes #3188 Fixes #2926 Fixes #2343 Fixes #2762 Fixes #2449 Updates #2177 Updates #2121 Updates #363	2026-04-18 15:12:21 +01:00
Kristoffer Dalby	a2c3ac095e	state: auto-bump GivenName on collision and add SetGivenName NodeStore's writer goroutine now resolves GivenName collisions inside applyBatch: on PutNode/UpdateNode the landing label gets -N appended until unique, matching Tailscale SaaS. Empty labels fall back to the literal "node". SetGivenName exposes the admin-rename path: validates via dnsname.ValidLabel and rejects on collision with ErrGivenNameTaken, so renames do not silently rewrite behind the caller. Updates #3188 Updates #2926 Updates #2343 Updates #2762	2026-04-18 15:12:21 +01:00
Florian Preinstorfer	f1494a32ce	Update links to Tailscale documentation	2026-04-18 09:33:41 +02:00
Kristoffer Dalby	978f1e3947	state: tie-break ResolveNode by GivenName then lowest NodeID Resolve by GivenName (unique per tailnet) before Hostname (client- reported, may collide); within each pass, pick the lowest NodeID so results are deterministic across NodeStore snapshot iterations. Updates #3157	2026-04-17 16:31:49 +01:00
Kristoffer Dalby	842f36225e	state: drain pending pings on Close Blocked callers waiting on a pingTracker response channel would hang forever if the server Close()d mid-probe. Drain the pending map on Close so those goroutines unblock and exit cleanly. Updates #3157	2026-04-17 16:31:49 +01:00
Kristoffer Dalby	7d104b8c8d	servertest: add via grant map compat tests End-to-end exercise of via-grant compilation against SaaS captures: peer visibility, AllowedIPs, PrimaryRoutes, and per-rule src/dst reachability from each viewer's perspective. Updates #3157	2026-04-17 16:31:49 +01:00
Kristoffer Dalby	90e65ccd63	state: add HA health prober Ping HA subnet routers each probe cycle and mark unresponsive nodes unhealthy. Reconnecting a node clears its unhealthy state since the fresh Noise session proves basic connectivity. Updates #2129 Updates #2902	2026-04-16 15:10:56 +01:00
Kristoffer Dalby	97778c9930	all: add tests for PingRequest implementation Unit tests for Change (IsEmpty, Merge, Type, PingNode constructor), ping tracker (register/complete/cancel lifecycle, concurrency, latency), and end-to-end servertests exercising the full round-trip with real controlclient.Direct instances. Updates #2902 Updates #2129	2026-04-15 10:53:35 +01:00
Kristoffer Dalby	b113655b71	all: implement PingRequest for node connectivity checking Implement tailcfg.PingRequest support so the control server can verify whether a connected node is still reachable. This is the foundation for faster offline detection (currently ~16min due to Go HTTP/2 TCP retransmit behavior) and future C2N communication. The server sends a PingRequest via MapResponse with a unique callback URL. The Tailscale client responds with a HEAD request to that URL, proving connectivity. Round-trip latency is measured. Wire PingRequest through the Change → Batcher → MapResponse pipeline, add a ping tracker on State for correlating requests with responses, add ResolveNode for looking up nodes by ID/IP/hostname, and expose a /debug/ping page (elem-go form UI) and /machine/ping-response endpoint. Updates #2902 Updates #2129	2026-04-15 10:53:35 +01:00
Kristoffer Dalby	93860a5c06	all: apply formatter changes	2026-04-13 17:23:47 +01:00
Kristoffer Dalby	0641771128	db: guard UsePreAuthKey with WHERE used=false Add a row-level check so concurrent registrations with the same single-use key cannot both succeed. Skip the call on re-registration where the key is already marked used (#2830).	2026-04-10 14:09:57 +01:00
Kristoffer Dalby	99767cf805	hscontrol: validate machine key and bind src/dst in SSH check handler SSHActionHandler now verifies that the Noise session's machine key matches the dst node before proceeding. The (src, dst) pair is captured at hold-and-delegate time via a new SSHCheckBinding on AuthRequest so sshActionFollowUp can verify the follow-up URL matches. The OIDC non-registration callback requires the authenticated user to own the src node before approving.	2026-04-10 14:09:57 +01:00
Kristoffer Dalby	0d4f2293ff	state: replace zcache with bounded LRU for auth cache Replace zcache with golang-lru/v2/expirable for both the state auth cache and the OIDC state cache. Add tuning.register_cache_max_entries (default 1024) to cap the number of pending registration entries. Introduce types.RegistrationData to replace caching a full *Node; only the fields the registration callback path reads are retained. Remove the dead HSDatabase.regCache field. Drop zgo.at/zcache/v2 from go.mod.	2026-04-10 14:09:57 +01:00
Kristoffer Dalby	82bb4331f5	state: fix routesChanged mutating input Hostinfo routesChanged aliases newHI.RoutableIPs into a local variable then sorts it in place, which mutates the caller's Hostinfo data. The Hostinfo is subsequently stored on the node, so the mutation propagates but the input contract is violated. Clone the slice before sorting to avoid mutating the input.	2026-04-10 13:18:56 +01:00
Kristoffer Dalby	6ae182696f	state: fix policy change race in UpdateNodeFromMapRequest When UpdateNodeFromMapRequest and SetNodeTags race on persistNodeToDB, the first caller to run updatePolicyManagerNodes detects the tag change and returns a PolicyChange. The second caller finds no change and falls back to NodeAdded. If UpdateNodeFromMapRequest wins the race, it checked policyChange.IsFull() which is always false for PolicyChange (only sets IncludePolicy and RequiresRuntimePeerComputation). This caused the PolicyChange to be dropped, so affected clients never received PeersRemoved and the stale peer remained in their NetMap indefinitely. Fix: check !policyChange.IsEmpty() instead, which correctly detects any non-trivial policy change including PolicyChange(). This fixes the root cause of TestACLTagPropagation/multiple-tags-partial- removal flaking at ~20% on CI. Updates #3125	2026-04-08 14:32:08 +01:00
Kristoffer Dalby	ccddeceeec	state: fix GORM not persisting user_id=NULL on tagged node conversion GORM's struct-based Updates() silently skips nil pointer fields. When SetNodeTags sets node.UserID = nil to transfer ownership to tags, the in-memory NodeStore is correct but the database retains the old user_id value. This causes tagged nodes to remain associated with the original user in the database, preventing user deletion and risking ON DELETE CASCADE destroying tagged nodes. Add Select("") before Omit() on all three node persistence paths to force GORM to include all fields in the UPDATE statement, including nil pointers. This is the same pattern already used in db/ip.go for IPv4/IPv6 nil handling, and is documented GORM behavior: db.Select("").Omit("excluded").Updates(struct) The three affected paths are: - persistNodeToDB: used by SetNodeTags and MapRequest updates - applyAuthNodeUpdate: used by re-authentication with --advertise-tags - HandleNodeFromPreAuthKey: used by PAK re-registration Fixes #3161	2026-04-08 14:32:08 +01:00
Kristoffer Dalby	380f531342	state: trigger PolicyChange on every Connect and Disconnect Connect and Disconnect previously only appended a PolicyChange when the affected node was a subnet router (routeChange) or the database persist returned a full change. For every other node the peers just received a small PeerChangedPatch{Online: ...} and no filter rules were recomputed. That was too narrow: a node going offline or coming online can affect policy compilation in ways beyond subnet routes. TestGrantCapRelay Phase 4 exposed this. When the cap/relay target node went down with `tailscale down`, headscale only sent an Online=false patch, peers never got a recomputed netmap, and their cached PeerRelay allocation stayed populated until the 120s assertion timeout. With a PolicyChange queued on Disconnect, peers immediately receive a full netmap on relay loss and clear PeerRelay as expected; the symmetric change on Connect lets Phase 5 re-publish the policy when the relay comes back. Drop the now-unused routeChange return from the Disconnect gate. Updates #2180	2026-04-08 13:00:22 +01:00
Kristoffer Dalby	6337a3dbc4	state: apply default node key expiry on registration Use the node.expiry config to apply a default expiry to non-tagged nodes when the client does not request a specific expiry. This covers all registration paths: new node creation, re-authentication, and pre-auth key re-registration. Tagged nodes remain exempt and never expire. Fixes #1711	2026-04-08 13:00:22 +01:00
Kristoffer Dalby	c36cedc32f	policy/v2: fix via grants in BuildPeerMap, MatchersForNode, and ViaRoutesForPeer Use per-node compilation path for via grants in BuildPeerMap and MatchersForNode to ensure via-granted nodes appear in peer maps. Fix ViaRoutesForPeer golden test route inference to correctly resolve via grant effects. Updates #2180	2026-04-01 14:10:42 +01:00
Kristoffer Dalby	3ca4ff8f3f	state,servertest: add grant control plane tests and fix via route ReduceRoutes filtering Add servertest grant policy control plane tests covering basic grants, via grants, and cap grants. Fix ReduceRoutes in State to apply route reduction to non-via routes first, then append via-included routes, preventing via grant routes from being incorrectly filtered. Updates #2180	2026-04-01 14:10:42 +01:00
Kristoffer Dalby	8358017dcf	policy/v2,state,mapper: implement per-viewer via route steering Via grants steer routes to specific nodes per viewer. Until now, all clients saw the same routes for each peer because route assembly was viewer-independent. This implements per-viewer route visibility so that via-designated peers serve routes only to matching viewers, while non-designated peers have those routes withdrawn. Add ViaRouteResult type (Include/Exclude prefix lists) and ViaRoutesForPeer to the PolicyManager interface. The v2 implementation iterates via grants, resolves sources against the viewer, matches destinations against the peer's advertised routes (both subnet and exit), and categorizes prefixes by whether the peer has the via tag. Add RoutesForPeer to State which composes global primary election, via Include/Exclude filtering, exit routes, and ACL reduction. When no via grants exist, it falls back to existing behavior. Update the mapper to call RoutesForPeer per-peer instead of using a single route function for all peers. The route function now returns all routes (subnet + exit), and TailNode filters exit routes out of the PrimaryRoutes field for HA tracking. Updates #2180	2026-04-01 14:10:42 +01:00
Kristoffer Dalby	1053fbb16b	hscontrol/state: fix online status reset during re-registration Two fixes to how online status is handled during registration: 1. Re-registration (applyAuthNodeUpdate, HandleNodeFromPreAuthKey) no longer resets IsOnline to false. Online status is managed exclusively by Connect()/Disconnect() in the poll session lifecycle. The reset caused a false offline blip: the auth handler's change notification triggered a map regeneration showing the node as offline to peers, even though Connect() would set it back to true moments later. 2. New node creation (createAndSaveNewNode) now explicitly sets IsOnline=false instead of leaving it nil. This ensures peers always receive a known online status rather than an ambiguous nil/unknown.	2026-03-19 07:05:58 +01:00
Kristoffer Dalby	b09af3846b	hscontrol/poll,state: fix grace period disconnect TOCTOU race When a node disconnects, serveLongPoll defers a cleanup that starts a grace period goroutine. This goroutine polls batcher.IsConnected() and, if the node has not reconnected within ~10 seconds, calls state.Disconnect() to mark it offline. A TOCTOU race exists: the node can reconnect (calling Connect()) between the IsConnected check and the Disconnect() call, causing the stale Disconnect() to overwrite the new session's online status. Fix with a monotonic per-node generation counter: - State.Connect() increments the counter and returns the current generation alongside the change list. - State.Disconnect() accepts the generation from the caller and rejects the call if a newer generation exists, making stale disconnects from old sessions a no-op. - serveLongPoll captures the generation at Connect() time and passes it to Disconnect() in the deferred cleanup. - RemoveNode's return value is now checked: if another session already owns the batcher slot (reconnect happened), the old session skips the grace period entirely. Update batcher_test.go to track per-node connect generations and pass them through to Disconnect(), matching production behavior. Fixes the following test failures: - server_state_online_after_reconnect_within_grace - update_history_no_false_offline - nodestore_correct_after_rapid_reconnect - rapid_reconnect_peer_never_sees_offline	2026-03-19 07:05:58 +01:00
Kristoffer Dalby	7bab8da366	state, policy, noise: implement SSH check period auto-approval Add SSH check period tracking so that recently authenticated users are auto-approved without requiring manual intervention each time. Introduce SSHCheckPeriod type with validation (min 1m, max 168h, "always" for every request) and encode the compiled check period as URL query parameters in the HoldAndDelegate URL. The SSHActionHandler checks recorded auth times before creating a new HoldAndDelegate flow. Auth timestamps are stored in-memory: - Default period (no explicit checkPeriod): auth covers any destination, keyed by source node with Dst=0 sentinel - Explicit period: auth covers only that specific destination, keyed by (source, destination) pair Auth times are cleared on policy changes. Updates #1850	2026-02-25 21:28:05 +01:00
Kristoffer Dalby	107c2f2f70	policy, noise: implement SSH check action Implement the SSH "check" action which requires additional verification before allowing SSH access. The policy compiler generates a HoldAndDelegate URL that the Tailscale client calls back to headscale. The SSHActionHandler creates an auth session and waits for approval via the generalised auth flow. Sort check (HoldAndDelegate) rules before accept rules to match Tailscale's first-match-wins evaluation order. Updates #1850	2026-02-25 21:28:05 +01:00
Kristoffer Dalby	cb3b6949ea	auth: generalise auth flow and introduce AuthVerdict Generalise the registration pipeline to a more general auth pipeline supporting both node registrations and SSH check auth requests. Rename RegistrationID to AuthID, unexport AuthRequest fields, and introduce AuthVerdict to unify the auth finish API. Add the urlParam generic helper for extracting typed URL parameters from chi routes, used by the new auth request handler. Updates #1850	2026-02-25 21:28:05 +01:00
Kristoffer Dalby	8048f10d13	hscontrol/state: extract findExistingNodeForPAK to reduce complexity Extract the existing-node lookup logic from HandleNodeFromPreAuthKey into a separate method. This reduces the cyclomatic complexity from 32 to 28, below the gocyclo limit of 30. Updates #3077	2026-02-20 21:51:00 +01:00
Kristoffer Dalby	75e56df9e4	hscontrol: enforce that tagged nodes never have user_id Tagged nodes are owned by their tags, not a user. Enforce this invariant at every write path: - createAndSaveNewNode: do not set UserID for tagged PreAuthKey registration; clear UserID when advertise-tags are applied during OIDC/CLI registration - SetNodeTags: clear UserID/User when tags are assigned - processReauthTags: clear UserID/User when tags are applied during re-authentication - validateNodeOwnership: reject tagged nodes with non-nil UserID - NodeStore: skip nodesByUser indexing for tagged nodes since they have no owning user - HandleNodeFromPreAuthKey: add fallback lookup for tagged PAK re-registration (tagged nodes indexed under UserID(0)); guard against nil User deref for tagged nodes in different-user check Since tagged nodes now have user_id = NULL, ListNodesByUser will not return them and DestroyUser naturally allows deleting users whose nodes have all been tagged. The ON DELETE CASCADE FK cannot reach tagged nodes through a NULL foreign key. Also tone down shouty comments throughout state.go. Fixes #3077	2026-02-20 21:51:00 +01:00
Kristoffer Dalby	f20bd0cf08	node: implement disable key expiry via CLI and API Add --disable flag to "headscale nodes expire" CLI command and disable_expiry field handling in the gRPC API to allow disabling key expiry for nodes. When disabled, the node's expiry is set to NULL and IsExpired() returns false. The CLI follows the new grpcRunE/RunE/printOutput patterns introduced in the recent CLI refactor. Also fix NodeSetExpiry to persist directly to the database instead of going through persistNodeToDB which omits the expiry field. Fixes #2681 Co-authored-by: Marco Santos <me@marcopsantos.com>	2026-02-20 21:49:55 +01:00
Kristoffer Dalby	43afeedde2	all: apply golangci-lint 2.9.0 fixes Fix issues found by the upgraded golangci-lint: - wsl_v5: add required whitespace in CLI files - staticcheck SA4006: replace new(var.Field) with &localVar pattern since staticcheck does not recognize Go 1.26 new(value) as a use of the variable - staticcheck SA5011: use t.Fatal instead of t.Error for nil guard checks so execution stops - unused: remove dead ptrTo helper function	2026-02-19 08:21:23 +01:00
Kristoffer Dalby	0f6d312ada	all: upgrade to Go 1.26rc2 and modernize codebase This commit upgrades the codebase from Go 1.25.5 to Go 1.26rc2 and adopts new language features. Toolchain updates: - go.mod: go 1.25.5 → go 1.26rc2 - flake.nix: buildGo125Module → buildGo126Module, go_1_25 → go_1_26 - flake.nix: build golangci-lint from source with Go 1.26 - Dockerfile.integration: golang:1.25-trixie → golang:1.26rc2-trixie - Dockerfile.tailscale-HEAD: golang:1.25-alpine → golang:1.26rc2-alpine - Dockerfile.derper: golang:alpine → golang:1.26rc2-alpine - .goreleaser.yml: go mod tidy -compat=1.25 → -compat=1.26 - cmd/hi/run.go: fallback Go version 1.25 → 1.26rc2 - .pre-commit-config.yaml: simplify golangci-lint hook entry Code modernization using Go 1.26 features: - Replace tsaddr.SortPrefixes with slices.SortFunc + netip.Prefix.Compare - Replace ptr.To(x) with new(x) syntax - Replace errors.As with errors.AsType[T] Lint rule updates: - Add forbidigo rules to prevent regression to old patterns	2026-02-08 12:35:23 +01:00
Kristoffer Dalby	ce580f8245	all: fix golangci-lint issues (#3064)	2026-02-06 21:45:32 +01:00
Kristoffer Dalby	3acce2da87	errors: rewrite errors to follow go best practices Errors should not start capitalised and they should not contain the word error or state that they "failed" as we already know it is an error Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>	2026-02-06 07:40:29 +01:00
Kristoffer Dalby	4a9a329339	all: use lowercase log messages Go style recommends that log messages and error strings should not be capitalized (unless beginning with proper nouns or acronyms) and should not end with punctuation. This change normalizes all zerolog .Msg() and .Msgf() calls to start with lowercase letters, following Go conventions and making logs more consistent across the codebase.	2026-02-06 07:40:29 +01:00
Kristoffer Dalby	dd16567c52	hscontrol/state,db: use zf constants for logging Replace raw string field names with zf constants in state.go and db/node.go for consistent, type-safe logging. state.go changes: - User creation, hostinfo validation, node registration - Tag processing during reauth (processReauthTags) - Auth path and PreAuthKey handling - Route auto-approval and MapRequest processing db/node.go changes: - RegisterNodeForTest logging - Invalid hostname replacement logging	2026-02-06 07:40:29 +01:00
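
Commits c7630b505b and 3d5c0af4e7 describe the all-unhealthy fallback of the primary election in prose. The rule can be sketched roughly as below. This is a minimal illustration and not the code in hscontrol/state: the `candidate` type, the `electPrimary` name, the use of `0` for "no previous primary", and the choice of the lowest-ID healthy candidate are all assumptions made for the example.

```go
package main

import "fmt"

// candidate is a hypothetical stand-in for an online node that advertises
// an approved prefix.
type candidate struct {
	ID      uint64
	Healthy bool
}

// electPrimary returns the chosen primary and whether the prefix is mapped.
// prev == 0 means "no previous primary" (an assumption of this sketch).
func electPrimary(candidates []candidate, prev uint64) (uint64, bool) {
	var (
		prevIsCandidate bool
		lowestHealthy   uint64
		haveHealthy     bool
	)
	for _, c := range candidates {
		if c.ID == prev {
			prevIsCandidate = true
			if c.Healthy {
				// Anti-flap: a healthy previous primary keeps the prefix.
				return prev, true
			}
		}
		if c.Healthy && (!haveHealthy || c.ID < lowestHealthy) {
			lowestHealthy, haveHealthy = c.ID, true
		}
	}
	switch {
	case haveHealthy:
		return lowestHealthy, true
	case prevIsCandidate:
		// Everyone is unhealthy: preserve prev rather than churn.
		return prev, true
	default:
		// prev is gone and nobody is healthy: leave the prefix unmapped.
		return 0, false
	}
}

func main() {
	bothDown := []candidate{{ID: 1}, {ID: 2}}
	fmt.Println(electPrimary(bothDown, 2))       // 2 true
	fmt.Println(electPrimary(bothDown[:1], 2))   // 0 false
}
```

The second call mirrors the dual-disconnect case from c7630b505b: r2 has left the candidate set, r1 is still unhealthy, so no primary is published.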
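
The collision rule in a2c3ac095e (append -N until the label is unique, fall back to the literal "node" for empty input) might look like the following. `bumpGivenName` and the `taken` set are hypothetical, and starting the suffix at 1 is an assumption; the commit message does not state the first value of N.

```go
package main

import "fmt"

// bumpGivenName returns label unchanged when it is free, otherwise the
// first "label-N" that is not in taken. Empty input becomes "node".
func bumpGivenName(label string, taken map[string]bool) string {
	if label == "" {
		label = "node"
	}
	if !taken[label] {
		return label
	}
	// Assumption: suffixes start at 1 and count upwards.
	for n := 1; ; n++ {
		candidate := fmt.Sprintf("%s-%d", label, n)
		if !taken[candidate] {
			return candidate
		}
	}
}

func main() {
	taken := map[string]bool{"laptop": true, "laptop-1": true}
	fmt.Println(bumpGivenName("laptop", taken)) // laptop-2
	fmt.Println(bumpGivenName("", taken))       // node
}
```

The admin-rename path described in the same commit behaves differently: it rejects a taken name with ErrGivenNameTaken instead of bumping it.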
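
The generation counter from b09af3846b closes the window between "check whether the node reconnected" and "mark it offline". A minimal sketch under stated assumptions: the `sessions` type and its fields are invented for the example, and the real State.Connect and State.Disconnect also return change lists, which are omitted here.

```go
package main

import (
	"fmt"
	"sync"
)

// sessions tracks a monotonic per-node generation and an online flag.
type sessions struct {
	mu         sync.Mutex
	generation map[uint64]uint64
	online     map[uint64]bool
}

func newSessions() *sessions {
	return &sessions{
		generation: make(map[uint64]uint64),
		online:     make(map[uint64]bool),
	}
}

// Connect bumps the node's generation and returns it to the caller, which
// must hand it back on Disconnect.
func (s *sessions) Connect(node uint64) uint64 {
	s.mu.Lock()
	defer s.mu.Unlock()

	s.generation[node]++
	s.online[node] = true
	return s.generation[node]
}

// Disconnect is a no-op (returns false) when a newer session exists, so a
// stale grace-period cleanup cannot overwrite the new session's status.
func (s *sessions) Disconnect(node, gen uint64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()

	if s.generation[node] > gen {
		return false
	}
	s.online[node] = false
	return true
}

func main() {
	s := newSessions()
	old := s.Connect(1)
	cur := s.Connect(1) // node reconnects within the grace period

	fmt.Println(s.Disconnect(1, old), s.online[1]) // false true
	fmt.Println(s.Disconnect(1, cur), s.online[1]) // true false
}
```

Because the comparison and the mutation happen under the same lock, the check cannot go stale between the two steps.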


105 Commits