v2 silently dropped policy.tests, so a policy that contradicted its
own assertions still applied. Resolve src/dst via the existing Alias
machinery, walk the compiled global filter rules (acls and grants
both contribute), and run on every user-write boundary: SetPolicy,
the file watcher, and `headscale policy check`. A failing test
rejects the write before it mutates live state.
Boot-time reload skips evaluation; an already-stored policy that
references a deleted user shouldn't lock the server out.
`headscale policy check` is a thin frontend for the new CheckPolicy
gRPC method. The server-side handler builds a fresh PolicyManager
from the request bytes and the state's live users/nodes, runs
SetPolicy on the sandbox so the tests block executes, and returns
the result through gRPC status. No persistence, no policy_mode
coupling. --bypass-grpc-and-access-database-directly opens the DB
directly when the server is not running.
cmd/headscale/cli/root.go no longer special-cases `policy check` in
init() (the early return from PR #2580 broke --config registration
and viper priming for --bypass).
integration/cli_policy_test.go covers policy_mode={file,database} x
fixture={acl-only, acl+passing-tests, acl+failing-tests} x
bypass={false,true} = 12 rows.
Updates #1803
Co-authored-by: Janis Jansons <janhouse@gmail.com>
CheckPolicy validates a candidate policy against a running server's
live users and nodes (running its tests block) without persisting
anything. Used by 'headscale policy check' to replace the in-process
validation path the CLI runs today, which would otherwise need its
own database connection.
Updates #1803
MatchFromFilterRule only read DstPorts[].IP into the destination
IPSet. Cap-grant-only filter rules (e.g. tailscale.com/cap/relay)
carry their destinations in CapGrant[].Dsts, so the derived matchers
had empty dest sets and BuildPeerMap / ReduceNodes never exposed the
cap target to its source nodes. Without a companion IP-level grant
the relay node stayed invisible, so clients never tried to use it
and connections sat on DERP.
Union CapGrant[].Dsts into the destination IPSet alongside DstPorts.
Restores peer-visibility for any cap-grant-only relationship; the
peer-relay flow is the most visible instance.
Fixes#3256
revgrep needs pull_request.base.sha in the local clone to compute
the diff against new code. With fetch-depth: 2, only HEAD and one
parent are fetched, so a stale base SHA (when main moves between
PR syncs) is not reachable and revgrep falls through, surfacing
pre-existing issues outside the PR scope.
Replace the grep/awk hash extraction in build.yml with a structured
vendorhash check step; the PR review comment now reads expected/
actual values directly from $GITHUB_OUTPUT instead of scraping Nix
stderr. Add a prek hook so divergence is caught locally before push.
Move the headscale vendorHash out of flake.nix into a content-
addressed flakehashes.json maintained by a small Go tool. The
schema and goModFingerprint algorithm mirror upstream tailscale's
tool/updateflakes so a future shared library extraction is trivial.
vendorhash check verifies flakehashes.json against the current
go.mod/go.sum. Hot path is a sha256 over those two files, so
re-runs without input change are essentially free; only an actual
fingerprint drift triggers go mod vendor + nardump.SRI.
vendorhash update recomputes both fields and rewrites the JSON.
The nix-vendor-sri devShell shim now wraps it.
Pulls in the cmd/nardump library split (tailscale/tailscale#19551)
so flakehashes.json tooling can import nardump.SRI directly.
Side effects: Go directive bumps to 1.26.2 and the nixpkgs lock
advances to a revision shipping go 1.26.2.
Add --bypass-grpc-and-access-database-directly to policy check so
the new ambiguous-user validator runs against the live user list.
Without the flag, policy check stays a syntax-only check and the
success message says so.
Updates #3160
When a user@ token resolved to more than one DB row, ACL and SSH
rules referencing it were silently dropped at compile time, leaving
clients with SSHPolicy={rules: null} and no signal to the admin.
Validate every Username reference in groups, tagOwners,
autoApprovers, ACLs and SSH rules at NewPolicyManager and SetPolicy
and return ErrMultipleUsersFound. Missing-user tokens stay tolerant
per #2863.
Updates #3160
Mirror the guard from HandleNodeFromPreAuthKey in HandleNodeFromAuthPath.
Both functions log the old user's name in the "different user" branch
when an existing NodeStore entry under the same machine key belongs to
another user. UserView.Name dereferences the backing User pointer
unconditionally, so when the cached node was loaded with a non-nil
UserID but a nil User (Preload join missed the row, or upstream code
left the snapshot in that shape), the log call panics with a nil-pointer
dereference at hscontrol/types/types_view.go:97.
The panic is caught by the http2 server's runHandler for the noise
control plane, so the process keeps running but every retry produces a
new panic — production has observed bursts of ~1.9k panics per hour
during a tailscaled reconnect loop. The gRPC/OIDC entry has no equivalent
recover and would surface the panic to the caller.
Guard both call sites with oldUser.Valid() and fall back to an empty
old-user name when the pointer is nil. The "Creating new node for
different user" log line still includes the existing node ID, hostname,
machine key, and new user, so operator visibility is preserved.
Add reproduction tests for both handlers seeding the orphan shape
directly into NodeStore via PutNodeInStoreForTest.
Co-Authored-By: Kristoffer Dalby <kristoffer@dalby.cc>
The 39 SSH-*.hujson files in hscontrol/policy/v2/testdata/ssh_results/
were legacy hand-written "expected SSH rules" snippets superseded by
the lowercase tscap captures (ssh-*.hujson). The active loader in
TestSSHDataCompat globs ssh-*.hujson; filepath.Glob is case-sensitive
on Linux so the uppercase set was loaded by no test.
The duplication caused permanent dirty git state on case-insensitive
filesystems (APFS, NTFS) where only one of SSH-A1.hujson and
ssh-a1.hujson can physically exist in the working tree.
Add an assertion to TestSSHDataCompat that the loader picks up every
*.hujson under ssh_results/ so future fixture migrations cannot leave
stranded files behind.
Fixes#3240
Some OIDC providers (notably JumpCloud) return the `groups` claim as
a plain string when the user belongs to a single group, rather than
a single-element array:
Single group: {"groups": "MyGroup"}
Multiple groups: {"groups": ["Group1", "Group2"]}
This causes `json.Unmarshal` to fail with:
cannot unmarshal string into Go struct field OIDCClaims.groups of type []string
This is the same class of issue as juanfont#2293 (FlexibleBoolean for
email_verified). The fix follows the same pattern: introduce a
FlexibleStringSlice type with a custom UnmarshalJSON that accepts
both a string and a []string, and use it for the Groups field in
both OIDCClaims and OIDCUserInfo.
TestGrantViaExitNodeInternetVisibility boots a server, applies a
policy that scopes autogroup:internet to a tag, registers a tagged
exit advertiser and a regular client, and asserts the client's netmap
surfaces the exit node with 0.0.0.0/0 and ::/0 in AllowedIPs — the
substrate the Tailscale client reads to populate
`tailscale exit-node list`.
TestGrantViaExitNodeNoFilterRules retains its assertion (literal /0
absent from the exit node's PacketFilter, matching SaaS PacketFilter
encoding); only its docstring is updated to reflect that the exit
node now does receive a TheInternet-shaped rule, just not the
literal /0 form.
Updates #3233
A grant of the form `{src: alice, dst: autogroup:internet, via:
tag:exit1}` was loading without error but stripping every exit node
from alice's view: `tailscale exit-node list` returned "no exit nodes
found".
Two sites skipped autogroup:internet at the compile / steering layer:
compileViaForNode's *AutoGroup arm produced no FilterRule for the
via-tagged exit node, and ViaRoutesForPeer's *AutoGroup arm produced
no Include/Exclude. With pm.needsPerNodeFilter true, the exit node's
matchers were empty, BuildPeerMap could not link source to exit, and
RoutesForPeer's ReduceRoutes stripped 0.0.0.0/0 and ::/0 from
AllowedIPs.
The skip belongs at the wire-format layer (ReduceFilterRules), not at
the compile layer that also feeds internal matchers. Lift
autogroup:internet handling into both *AutoGroup arms with the same
shape used for *Prefix destinations: emit a TheInternet rule on
via-tagged exit advertisers; surface peer.ExitRoutes() in Include
when the peer carries the via tag, Exclude otherwise.
ReduceFilterRules continues to keep the rule on exit-route
advertisers' wire output and strip it elsewhere, preserving SaaS
PacketFilter encoding.
Also drop compileViaForNode's early len(SubnetRoutes)==0 return:
SubnetRoutes excludes exit routes, so the early return pre-empted the
autogroup:internet branch on nodes that only advertise exit routes.
Existing tests pinning the buggy behaviour (TestViaRoutesForPeer
subtests, TestCompileViaGrant case) flipped to the new contract.
Fixes#3233
Force-disconnect leaves stale routes in the container's network
namespace: libnetwork removes the host-side veth but the
namespace-internal route survives. The next ConnectNetwork on the
same network then fails with "cannot program address X/16 in sandbox
interface because it conflicts with existing route", and the route
never resolves on its own. Bounded retry around ConnectNetwork
exhausts MaxElapsedTime instead of recovering.
Without Force, libnetwork drains the namespace routes synchronously
during disconnect and ConnectNetwork sees a clean slate. Cable-pull
semantic is preserved: docker still tears down the endpoint at the
namespace level, leaving in-flight TCP half-open inside the
container's view, verified via paired probe-timeout pairs in HA
prober logs while both routers are physically disconnected.
Fixes#3234
`time.After(ProbeTimeout)` returned a single channel shared by every
probe goroutine in the cycle. Only the first goroutine to receive the
deadline tick drains the channel; any other goroutine still waiting on
its `responseCh` is then stuck forever, `wg.Wait()` never returns, and
the scheduler loop in `app.go` stalls on the next tick. The condition
fires whenever two or more nodes time out in the same cycle — common
under cable-pull where IsOnline lags reality and both routers stay in
the candidate set as half-open TCP.
Move the timer inside each goroutine so every probe has its own
deadline.
Updates #3234
The generator scans `integration/` recursively for `Test*` functions
and emits one CI job per match. Helper subpackages like
`dockertestutil` and `tsic` host plain unit tests that should run
under `go test`, not as Docker-based integration matrix entries.
Limit the scan to depth 1 so only top-level `integration/*_test.go`
files contribute job names.
Libnetwork endpoint cleanup is eventually consistent. A back-to-back
disconnect+connect on the same network can race teardown and return a
transient error. Wrap the daemon calls in bounded exponential backoff
so TestHASubnetRouterFailoverDockerDisconnect no longer flakes on
phase 4c reconnect.
Fixes#3234
electPrimaryRoutes' all-unhealthy fallback picked candidates[0]
(lowest NodeID) regardless of who was prev. Under cable-pull
semantics IsOnline lags reality (long-poll TCP half-open), so
both routers stay in candidates and both go Unhealthy via the
prober — the fallback then churned primary to a node that was
itself unreachable.
Prefer prev when still in candidates; fall through to
candidates[0] only when prev is gone. Anti-blackhole holds.
Update the property test reference model and split the unit
test into existence (KeepsAPrimary) and identity
(PreservesPrevious) cases.
Fixes#3203
Add DisconnectFromNetwork/ReconnectToNetwork on TailscaleClient
backed by pool.Client.DisconnectNetwork.
Exercise single-router fail+recover either side, sequential dual
failure, and simultaneous dual failure. The dual-failure legs
assert no flap to a known-bad primary; the single-router-return
legs check traffic only because docker network disconnect
transiently fails probes on sibling routers.
Fails on parent; passes after the fix.
Updates #3203
Three regression tests for the user scenario: an in-process
Disconnect/Reconnect, a tailscale-down/up integration test, and
an iptables -j DROP cable-pull integration test.
Updates #3203
Restore the legacy auto-clear at write boundaries that drop HA
candidacy: Disconnect, SetApprovedRoutes(empty), and
UpdateNodeFromMapRequest shrinking advertised routes to empty.
Plus a defensive guard in SetNodeUnhealthy.
Updates #3203
Replace routes.PrimaryRoutes reads with NodeStore. Connect bumps
SessionEpoch; Disconnect re-checks it inside UpdateNode so the
check and mutation are atomic against a concurrent Connect on
the same node.
The connect_race regression test is carried in its final
SessionEpoch form.
Updates #3203
- test: comment that the !regReq.Expiry.IsZero() gate also covers
the tags-only PreAuthKey path
- CHANGELOG: note pre-existing 0001-01-01 rows self-heal on
re-registration rather than being backfilled
When a user owned node registers or re registers with a PreAuthKey and the
client sends zero client expiry while node.expiry is set to 0, the expiry
column ends up stored as 0001-01-01 00:00:00 instead of NULL. Two sites in
HandleNodeFromPreAuthKey build a non nil pointer to regReq.Expiry even when
the value is zero time, and the needsDefaultExpiry guard only replaces it
when s.cfg.Node.Expiry > 0, so the pointer to zero time survives to the
database.
Convert an unset regReq.Expiry to nil before handing it off so the
needsDefaultExpiry path is the only place that assigns a non nil pointer.
This is a narrower sibling of #3170 on the user owned PreAuthKey path. The
regression was introduced alongside the fix for #3111 in 6337a3db.
TestEnablingExitRoutes runs without an ACL, so tailcfg.FilterAllowAll
hides any policy-path regression. Add a sibling that applies the literal
#3212 policy via hsic.WithACLPolicy after registration and approval,
then asserts each peer carries 0.0.0.0/0 + ::/0 in AllowedIPs and
ExitNodeOption is true — the daemon-derived bool that drives
`tailscale exit-node list`.
Updates #3212
compileFilterRules skipped autogroup:internet destinations to keep them
out of the wire-format PacketFilter, but those same compiled rules are
the source of pm.matchers — and Node.CanAccess relies on a matcher whose
DestsIsTheInternet covers the public internet to surface exit-node peers
to ACL sources. With the skip in place no such matcher existed, exit
nodes silently dropped out of the source's peer list, and the docs'
exit-node walkthrough stopped working: `tailscale exit-node list`
returned "no exit nodes found" and `tailscale set --exit-node=<ip>`
returned "no node found in netmap with IP".
Drop the compile-time skip so autogroup:internet flows through normal
matcher derivation, and teach ReduceFilterRules to keep the resulting
client packet-filter rule on exit-route advertisers — Tailscale SaaS
sends those rules to exit nodes so the kernel filter accepts traffic
forwarded by autogroup:internet sources.
Verified against a live tailnet on 2026-04-28 via tscap; the b17/b18
captures land under testdata/issue_3212/ as a regression guard. The
captures are isolated from testdata/routes_results/ because the broader
TestRoutesCompat machinery assumes a CIDR-prefix wire format that
differs from the IPSet-range form SaaS emits for autogroup:internet —
aligning that wire format is tracked separately.
Fixes#3212
Add an integration test that runs the tailscale-rs axum example
against headscale end-to-end. The test provisions one headscale
instance, one Go tailscale probe client (tsic), and one
tailscale-rs node (tsric) on the same tailnet, then verifies:
- the tailscale-rs node registers and is assigned both an IPv4
and an IPv6 address
- the Go probe sees the tailscale-rs node as a peer in its status
- GET /index.html and GET /assets/index.css from the axum server
return the expected content over the tailnet
- three sequential POST /count calls return distinct, incrementing
counter values, proving netstack state is maintained across
multiple TCP connections
This is the first integration test that exercises a non-Go
Tailscale client against headscale, giving end-to-end coverage of
the control protocol for alternate implementations.
Add TailscaleRustInContainer (tsric), a Rust-client counterpart to
integration/tsic. It runs the axum example from tailscale-rs in a
Docker container and exposes the same lifecycle hooks as tsic
(Shutdown, SaveLog, Execute, WriteFile) so integration tests can
treat it as any other Tailscale node.
Dockerfile.tailscale-rs clones tailscale-rs at build time, so no
local source checkout is required. The repo URL and ref are Docker
build arguments (TAILSCALE_RS_REPO, TAILSCALE_RS_REF) exposed as
tsric.WithRepo / tsric.WithRef options. The HEADSCALE_INTEGRATION_
TAILSCALE_RS_IMAGE environment variable provides an escape hatch
for using a pre-built image instead of building from source.
The Dockerfile patches the cloned root Cargo.toml to expose
ts_control's insecure-keyfetch feature through the tailscale crate
so the axum example can fetch the control key over plain HTTP.
The integration harness serves the control plane without TLS,
which is the only mode tailscale-rs can register against until it
grows a way to inject a custom CA bundle.
compileFilterRules, compileGrants, and updateLocked guarded the
"no rules so allow all" fallback with len(pol.Grants) == 0, which
matches both an absent grants field and an explicit empty array.
JSON {"grants": []} unmarshals to a non-nil empty slice; it should
compile to zero filter rules (deny all) to match Tailscale SaaS,
but the length check sent it down the FilterAllowAll path.
Distinguish absent (nil) from explicit-empty by switching the guard
to pol.Grants == nil, the same asymmetry already used for ACLs.
{} keeps allowing all; {"acls": []} and {"grants": []} now both
deny all.
Fixes#3211
The scheduled job has failed every night since 2026-04-13. Two
prior fixes (race-condition guards and splitting the combined
where predicate in #3200) did not address the actual cause:
`flatten` collapses nested records in a list-of-records pipeline,
so after `gh api ... | from json | flatten` the `label` and `user`
columns no longer exist - their fields are lifted to the top
level (with prefix only on naming collisions). `where label.name
== ...` and `where user.type != "Bot"` then both reference a
column that is not there.
`gh api --paginate` already returns a single concatenated JSON
array, so `from json` produces a list of records directly and no
flattening is needed. Drop both `| flatten` calls.
Verified locally with nu 0.108 against the events stream of issue
#3178: without flatten, `where event == "labeled" | where
label.name == "needs-more-info" | last` returns the labeled event
with its `label` record intact.
GitHub's /issues/:n/events endpoint returns a mixed-schema table.
Only labeled/unlabeled rows carry a `label` column. Nu's `where`
does not short-circuit on missing columns, so the combined
predicate `event == "labeled" and label.name == ...` dereferenced
`label.name` on every row and crashed on the first non-labeled
event with `nu::shell::column_not_found`.
The scheduled job has failed every night since 2026-04-13 (the
first run with a labeled issue), so no `needs-more-info` issue
has been auto-closed.
Split into two sequential `where` filters so `label.name` is only
accessed on rows that have the column.
Summarise the ingest rewrite, the SaaS-matching collision rule, and the
BREAKING change from random-suffix to numeric-suffix collision labels
and from "invalid-<rand>" to the literal "node" fallback.
Updates #3188
Ingest (registration and MapRequest updates) now calls
dnsname.SanitizeHostname directly and lets NodeStore auto-bump on
collision. Admin rename uses dnsname.ValidLabel + SetGivenName so
conflicts are surfaced to the caller instead of silently mutated.
Three duplicate invalidDNSRegex definitions, the old NormaliseHostname
and ValidateHostname helpers, EnsureHostname, InvalidString,
ApplyHostnameFromHostInfo, GivenNameHasBeenChanged, generateGivenName
and EnsureUniqueGivenName are removed along with their tests.
ValidateHostname's username half is retained as ValidateUsername for
users.go.
The SaaS-matching collision rule replaces the random "invalid-xxxxxx"
fallback and the 8-character hash suffix; the empty-input fallback is
the literal "node". TestUpdateHostnameFromClient now exercises the
rewrite end-to-end with awkward macOS/Windows names.
Fixes#3188Fixes#2926Fixes#2343Fixes#2762Fixes#2449
Updates #2177
Updates #2121
Updates #363