14 Commits

Author SHA1 Message Date
Kristoffer Dalby
3d5c0af4e7 state: preserve previous primary when all HA advertisers unhealthy
electPrimaryRoutes' all-unhealthy fallback picked candidates[0]
(lowest NodeID) regardless of who was prev. Under cable-pull
semantics IsOnline lags reality (long-poll TCP half-open), so
both routers stay in candidates and both go Unhealthy via the
prober — the fallback then churned primary to a node that was
itself unreachable.

Prefer prev when still in candidates; fall through to
candidates[0] only when prev is gone. Anti-blackhole holds.

Update the property test reference model and split the unit
test into existence (KeepsAPrimary) and identity
(PreservesPrevious) cases.

Fixes #3203
2026-04-29 18:08:39 +01:00
Kristoffer Dalby
437754aeea state: switch consumers to NodeStore primary routes
Replace routes.PrimaryRoutes reads with NodeStore. Connect bumps
SessionEpoch; Disconnect re-checks it inside UpdateNode so the
check and mutation are atomic against a concurrent Connect on
the same node.

The connect_race regression test is carried in its final
SessionEpoch form.

Updates #3203
2026-04-29 18:08:39 +01:00
Kristoffer Dalby
da927eb018 state: compute primary routes inside NodeStore snapshot
Add primaries and isPrimary maps to Snapshot plus an election
algorithm. No callers yet.

Updates #3203
2026-04-29 18:08:39 +01:00
Kristoffer Dalby
a2c3ac095e state: auto-bump GivenName on collision and add SetGivenName
NodeStore's writer goroutine now resolves GivenName collisions inside
applyBatch: on PutNode/UpdateNode the landing label gets -N appended
until unique, matching Tailscale SaaS. Empty labels fall back to the
literal "node".

SetGivenName exposes the admin-rename path: validates via
dnsname.ValidLabel and rejects on collision with ErrGivenNameTaken,
so renames do not silently rewrite behind the caller.

Updates #3188
Updates #2926
Updates #2343
Updates #2762
2026-04-18 15:12:21 +01:00
Kristoffer Dalby
93860a5c06 all: apply formatter changes 2026-04-13 17:23:47 +01:00
Kristoffer Dalby
75e56df9e4 hscontrol: enforce that tagged nodes never have user_id
Tagged nodes are owned by their tags, not a user. Enforce this
invariant at every write path:

- createAndSaveNewNode: do not set UserID for tagged PreAuthKey
  registration; clear UserID when advertise-tags are applied
  during OIDC/CLI registration
- SetNodeTags: clear UserID/User when tags are assigned
- processReauthTags: clear UserID/User when tags are applied
  during re-authentication
- validateNodeOwnership: reject tagged nodes with non-nil UserID
- NodeStore: skip nodesByUser indexing for tagged nodes since
  they have no owning user
- HandleNodeFromPreAuthKey: add fallback lookup for tagged PAK
  re-registration (tagged nodes indexed under UserID(0)); guard
  against nil User deref for tagged nodes in different-user check

Since tagged nodes now have user_id = NULL, ListNodesByUser
will not return them and DestroyUser naturally allows deleting
users whose nodes have all been tagged. The ON DELETE CASCADE
FK cannot reach tagged nodes through a NULL foreign key.

Also tone down shouty comments throughout state.go.

Fixes #3077
2026-02-20 21:51:00 +01:00
Kristoffer Dalby
ce580f8245 all: fix golangci-lint issues (#3064)
Some checks failed
Build / build-nix (push) Has been cancelled
Build / build-cross (GOARCH=amd64 GOOS=darwin) (push) Has been cancelled
Build / build-cross (GOARCH=amd64 GOOS=linux) (push) Has been cancelled
Build / build-cross (GOARCH=arm64 GOOS=darwin) (push) Has been cancelled
Build / build-cross (GOARCH=arm64 GOOS=linux) (push) Has been cancelled
Check Generated Files / check-generated (push) Has been cancelled
NixOS Module Tests / nix-module-check (push) Has been cancelled
Tests / test (push) Has been cancelled
2026-02-06 21:45:32 +01:00
Kristoffer Dalby
72fcb93ef3 cli: ensure tagged-devices is included in profile list (#2991)
Some checks failed
Close inactive issues / close-issues (push) Has been cancelled
Build / build-nix (push) Has been cancelled
Build / build-cross (GOARCH=amd64 GOOS=darwin) (push) Has been cancelled
Build / build-cross (GOARCH=amd64 GOOS=linux) (push) Has been cancelled
Build / build-cross (GOARCH=arm64 GOOS=darwin) (push) Has been cancelled
Build / build-cross (GOARCH=arm64 GOOS=linux) (push) Has been cancelled
Check Generated Files / check-generated (push) Has been cancelled
NixOS Module Tests / nix-module-check (push) Has been cancelled
Tests / test (push) Has been cancelled
update-flake-lock / lockfile (push) Has been cancelled
GitHub Actions Version Updater / build (push) Has been cancelled
2026-01-09 16:31:23 +01:00
Kristoffer Dalby
eb788cd007 make tags first class node owner (#2885)
Some checks failed
Build / build-nix (push) Has been cancelled
Build / build-cross (GOARCH=amd64 GOOS=darwin) (push) Has been cancelled
Build / build-cross (GOARCH=amd64 GOOS=linux) (push) Has been cancelled
Build / build-cross (GOARCH=arm64 GOOS=darwin) (push) Has been cancelled
Build / build-cross (GOARCH=arm64 GOOS=linux) (push) Has been cancelled
Check Generated Files / check-generated (push) Has been cancelled
NixOS Module Tests / nix-module-check (push) Has been cancelled
Tests / test (push) Has been cancelled
Close inactive issues / close-issues (push) Has been cancelled
This PR changes tags to be something that exists on nodes in addition to users, to being its own thing. It is part of moving our tags support towards the correct tailscale compatible implementation.

There are probably rough edges in this PR, but the intention is to get it in, and then start fixing bugs from 0.28.0 milestone (long standing tags issue) to discover what works and what doesnt.

Updates #2417
Closes #2619
2025-12-02 12:01:25 +01:00
Kristoffer Dalby
db293e0698 hscontrol/state: make NodeStore batch configuration tunable (#2886) 2025-11-28 16:38:29 +01:00
Kristoffer Dalby
2bf1200483 policy: fix autogroup:self propagation and optimize cache invalidation (#2807)
Some checks failed
Build / build-nix (push) Has been cancelled
Build / build-cross (GOARCH=amd64 GOOS=darwin) (push) Has been cancelled
Build / build-cross (GOARCH=amd64 GOOS=linux) (push) Has been cancelled
Build / build-cross (GOARCH=arm64 GOOS=darwin) (push) Has been cancelled
Build / build-cross (GOARCH=arm64 GOOS=linux) (push) Has been cancelled
Check Generated Files / check-generated (push) Has been cancelled
Tests / test (push) Has been cancelled
Close inactive issues / close-issues (push) Has been cancelled
2025-10-23 17:57:41 +02:00
Kristoffer Dalby
fddc7117e4 stability and race conditions in auth and node store (#2781)
This PR addresses some consistency issues that was introduced or discovered with the nodestore.

nodestore:
Now returns the node that is being put or updated when it is finished. This closes a race condition where when we read it back, we do not necessarily get the node with the given change and it ensures we get all the other updates from that batch write.

auth:
Authentication paths have been unified and simplified. It removes a lot of bad branches and ensures we only do the minimal work.
A comprehensive auth test set has been created so we do not have to run integration tests to validate auth and it has allowed us to generate test cases for all the branches we currently know of.

integration:
added a lot more tooling and checks to validate that nodes reach the expected state when they come up and down. Standardised between the different auth models. A lot of this is to support or detect issues in the changes to nodestore (races) and auth (inconsistencies after login and reaching correct state)

This PR was assisted, particularly tests, by claude code.
2025-10-16 12:17:43 +02:00
Kristoffer Dalby
50ed24847b debug: add json and improve
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-09-09 09:40:00 +02:00
Kristoffer Dalby
9d236571f4 state/nodestore: in memory representation of nodes
Initial work on a nodestore which stores all of the nodes
and their relations in memory with relationship for peers
precalculated.

It is a copy-on-write structure, replacing the "snapshot"
when a change to the structure occurs. It is optimised for reads,
and while batches are not fast, they are grouped together
to do less of the expensive peer calculation if there are many
changes rapidly.

Writes will block until commited, while reads are never
blocked.

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2025-09-09 09:40:00 +02:00