aboutsummaryrefslogtreecommitdiff
path: root/device/receive.go
AgeCommit message (Collapse)Author
2023-12-11device: do atomic 64-bit add outside of vector loopJason A. Donenfeld
Only bother updating the rxBytes counter once we've processed a whole vector, since additions are atomic. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2023-12-11device: reduce redundant per-packet overhead in RX pathJordan Whited
Peer.RoutineSequentialReceiver() deals with packet vectors and does not need to perform timer and endpoint operations for every packet in a given vector. Changing these per-packet operations to per-vector improves throughput by as much as 10% in some environments. Signed-off-by: Jordan Whited <jordan@tailscale.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2023-10-10device: move Queue{In,Out}boundElement Mutex to container typeJordan Whited
Queue{In,Out}boundElement locking can contribute to significant overhead via sync.Mutex.lockSlow() in some environments. These types are passed throughout the device package as elements in a slice, so move the per-element Mutex to a container around the slice. Reviewed-by: Maisem Ali <maisem@tailscale.com> Signed-off-by: Jordan Whited <jordan@tailscale.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2023-10-10device: distribute crypto work as slice of elementsJordan Whited
After reducing UDP stack traversal overhead via GSO and GRO, runtime.chanrecv() began to account for a high percentage (20% in one environment) of perf samples during a throughput benchmark. The individual packet channel ops with the crypto goroutines was the primary contributor to this overhead. Updating these channels to pass vectors, which the device package already handles at its ends, reduced this overhead substantially, and improved throughput. The iperf3 results below demonstrate the effect of this commit between two Linux computers with i5-12400 CPUs. There is roughly ~13us of round trip latency between them. The first result is with UDP GSO and GRO, and with single element channels. Starting Test: protocol: TCP, 1 streams, 131072 byte blocks [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-10.00 sec 12.3 GBytes 10.6 Gbits/sec 232 3.15 MBytes - - - - - - - - - - - - - - - - - - - - - - - - - Test Complete. Summary Results: [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-10.00 sec 12.3 GBytes 10.6 Gbits/sec 232 sender [ 5] 0.00-10.04 sec 12.3 GBytes 10.6 Gbits/sec receiver The second result is with channels updated to pass a slice of elements. Starting Test: protocol: TCP, 1 streams, 131072 byte blocks [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-10.00 sec 13.2 GBytes 11.3 Gbits/sec 182 3.15 MBytes - - - - - - - - - - - - - - - - - - - - - - - - - Test Complete. Summary Results: [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-10.00 sec 13.2 GBytes 11.3 Gbits/sec 182 sender [ 5] 0.00-10.04 sec 13.2 GBytes 11.3 Gbits/sec receiver Reviewed-by: Adrian Dewhurst <adrian@tailscale.com> Signed-off-by: Jordan Whited <jordan@tailscale.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2023-03-13global: buff -> bufJason A. Donenfeld
This always struck me as kind of weird and non-standard. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2023-03-10conn, device, tun: implement vectorized I/O plumbingJordan Whited
Accept packet vectors for reading and writing in the tun.Device and conn.Bind interfaces, so that the internal plumbing between these interfaces now passes a vector of packets. Vectors move untouched between these interfaces, i.e. if 128 packets are received from conn.Bind.Read(), 128 packets are passed to tun.Device.Write(). There is no internal buffering. Currently, existing implementations are only adjusted to have vectors of length one. Subsequent patches will improve that. Also, as a related fixup, use the unix and windows packages rather than the syscall package when possible. Co-authored-by: James Tucker <james@tailscale.com> Signed-off-by: James Tucker <james@tailscale.com> Signed-off-by: Jordan Whited <jordan@tailscale.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2023-02-07global: bump copyright yearJason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-09-20global: bump copyright yearJason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-09-04all: use Go 1.19 and its atomic typesBrad Fitzpatrick
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-11-23global: use netip where possible nowJason A. Donenfeld
There are more places where we'll need to add it later, when Go 1.18 comes out with support for it in the "net" package. Also, allowedips still uses slices internally, which might be suboptimal. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-06-03device: simplify allowedips lookup signatureJason A. Donenfeld
The inliner should handle this for us. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-05-07device: add ID to repeated routinesJason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-05-07device: avoid verbose log line during ordinary shutdown sequenceJason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-05-06device: log all errors received by RoutineReceiveIncomingJosh Bleecher Snyder
When debugging, it's useful to know why a receive func exited. We were already logging that, but only in the "death spiral" case. Move the logging up, to capture it always. Reduce the verbosity, since it is not an error case any more. Put the receive func name in the log line. Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-12conn: reconstruct v4 vs v6 receive function based on symtabJason A. Donenfeld
This is kind of gross but it's better than the alternatives. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-04-12device: allocate new buffer in receive death spiralKristupas Antanavičius
Note: this bug is "hidden" by avoiding "death spiral" code path by 6228659 ("device: handle broader range of errors in RoutineReceiveIncoming"). If the code reached "death spiral" mechanism, there would be multiple double frees happening. This results in a deadlock on iOS, because the pools are fixed size and goroutine might stop until somebody makes space in the pool. This was almost 100% repro on the new ARM Macbooks: - Build with 'ios' tag for Mac. This will enable bounded pools. - Somehow call device.IpcSet at least couple of times (update config) - device.BindUpdate() would be triggered - RoutineReceiveIncoming would enter "death spiral". - RoutineReceiveIncoming would stall on double free (pool is already full) - The stuck routine would deadlock 'device.closeBindLocked()' function on line 'netc.stopping.Wait()' Signed-off-by: Kristupas Antanavičius <kristupas.antanavicius@nordsec.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-04-02all: make conn.Bind.Open return a slice of receive functionsJosh Bleecher Snyder
Instead of hard-coding exactly two sources from which to receive packets (an IPv4 source and an IPv6 source), allow the conn.Bind to specify a set of sources. Beneficial consequences: * If there's no IPv6 support on a system, conn.Bind.Open can choose not to return a receive function for it, which is simpler than tracking that state in the bind. This simplification removes existing data races from both conn.StdNetBind and bindtest.ChannelBind. * If there are more than two sources on a system, the conn.Bind no longer needs to add a separate muxing layer. Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-03-30device: handle broader range of errors in RoutineReceiveIncomingJosh Bleecher Snyder
RoutineReceiveIncoming exits immediately on net.ErrClosed, but not on other errors. However, for errors that are known to be permanent, such as syscall.EAFNOSUPPORT, we may as well exit immediately instead of retrying. This considerably speeds up the package device tests right now, because the Bind sometimes (incorrectly) returns syscall.EAFNOSUPPORT instead of net.ErrClosed. Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-02-16conn: bump to 1.16 and get rid of NetErrClosed hackJason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-02-09device: handshake routine writes into encryption queueJason A. Donenfeld
Since RoutineHandshake calls peer.SendKeepalive(), it potentially is a writer into the encryption queue, so we need to bump the wg count. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-02-09device: do not attach finalizer to non-returned objectJason A. Donenfeld
Before, the code attached a finalizer to an object that wasn't returned, resulting in immediate garbage collection. Instead return the actual pointer. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-02-08device: remove mutex from Peer send/receiveJosh Bleecher Snyder
The immediate motivation for this change is an observed deadlock. 1. A goroutine calls peer.Stop. That calls peer.queue.Lock(). 2. Another goroutine is in RoutineSequentialReceiver. It receives an elem from peer.queue.inbound. 3. The peer.Stop goroutine calls close(peer.queue.inbound), close(peer.queue.outbound), and peer.stopping.Wait(). It blocks waiting for RoutineSequentialReceiver and RoutineSequentialSender to exit. 4. The RoutineSequentialReceiver goroutine calls peer.SendStagedPackets(). SendStagedPackets attempts peer.queue.RLock(). That blocks forever because the peer.Stop goroutine holds a write lock on that mutex. A background motivation for this change is that it can be expensive to have a mutex in the hot code path of RoutineSequential*. The mutex was necessary to avoid attempting to send elems on a closed channel. This commit removes that danger by never closing the channel. Instead, we send a sentinel nil value on the channel to indicate to the receiver that it should exit. The only problem with this is that if the receiver exits, we could write an elem into the channel which would never get received. If it never gets received, it cannot get returned to the device pools. To work around this, we use a finalizer. When the channel can be GC'd, the finalizer drains any remaining elements from the channel and restores them to the device pool. After that change, peer.queue.RWMutex no longer makes sense where it is. It is only used to prevent concurrent calls to Start and Stop. Move it to a more sensible location and make it a plain sync.Mutex. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-02-08device: overhaul device state managementJosh Bleecher Snyder
This commit simplifies device state management. It creates a single unified state variable and documents its semantics. It also makes state changes more atomic. As an example of the sort of bug that occurred due to non-atomic state changes, the following sequence of events used to occur approximately every 2.5 million test runs: * RoutineTUNEventReader received an EventDown event. * It called device.Down, which called device.setUpDown. * That set device.state.changing, but did not yet attempt to lock device.state.Mutex. * Test completion called device.Close. * device.Close locked device.state.Mutex. * device.Close blocked on a call to device.state.stopping.Wait. * device.setUpDown then attempted to lock device.state.Mutex and blocked. Deadlock results. setUpDown cannot progress because device.state.Mutex is locked. Until setUpDown returns, RoutineTUNEventReader cannot call device.state.stopping.Done. Until device.state.stopping.Done gets called, device.state.stopping.Wait is blocked. As long as device.state.stopping.Wait is blocked, device.state.Mutex cannot be unlocked. This commit fixes that deadlock by holding device.state.mu when checking that the device is not closed. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-02-08device: remove device.state.stopping from RoutineHandshakeJosh Bleecher Snyder
It is no longer necessary. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-02-08device: remove device.state.stopping from RoutineDecryptionJosh Bleecher Snyder
It is no longer necessary, as of 454de6f3e64abd2a7bf9201579cd92eea5280996 (device: use channel close to shut down and drain decryption channel). Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-01-29device: use new model queues for handshakesJason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-01-29device: simplify peer queue lockingJason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-01-28global: bump copyrightJason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-01-27device: get rid of nonce routineJason A. Donenfeld
This moves to a simple queue with no routine processing it, to reduce scheduler pressure. This splits latency in half! benchmark old ns/op new ns/op delta BenchmarkThroughput-16 2394 2364 -1.25% BenchmarkLatency-16 259652 120810 -53.47% Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-01-26device: combine debug and info log levels into 'verbose'Jason A. Donenfeld
There are very few cases, if any, in which a user only wants one of these levels, so combine it into a single level. While we're at it, reduce indirection on the loggers by using an empty function rather than a nil function pointer. It's not like we have retpolines anyway, and we were always calling through a function with a branch prior, so this seems like a net gain. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-01-26device: change logging interface to use functionsJosh Bleecher Snyder
This commit overhauls wireguard-go's logging. The primary, motivating change is to use a function instead of a *log.Logger as the basic unit of logging. Using functions provides a lot more flexibility for people to bring their own logging system. It also introduces logging helper methods on Device. These reduce line noise at the call site. They also allow for log functions to be nil; when nil, instead of generating a log line and throwing it away, we don't bother generating it at all. This spares allocation and pointless work. This is a breaking change, although the fix required of clients is fairly straightforward. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-01-20device: allow compiling with Go 1.15Jason A. Donenfeld
Until we depend on Go 1.16 (which isn't released yet), alias our own variable to the private member of the net package. This will allow an easy find replace to make this go away when we eventually switch to 1.16. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-01-20device: remove QueueInboundElement.droppedJosh Bleecher Snyder
Now that we block when enqueueing to the decryption queue, there is only one case in which we "drop" a inbound element, when decryption fails. We can use a simple, obvious, sync-free sentinel for that, elem.packet == nil. Also, we can return the message buffer to the pool slightly later, which further simplifies the code. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-01-20device: remove selects from encrypt/decrypt/inbound/outbound enqueuingJosh Bleecher Snyder
Block instead. Backpressure here is fine, probably preferable. This reduces code complexity. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-01-20device: use channel close to shut down and drain decryption channelJosh Bleecher Snyder
This is similar to commit e1fa1cc5560020e67d33aa7e74674853671cf0a0, but for the decryption channel. It is an alternative fix to f9f655567930a4cd78d40fa4ba0d58503335ae6a. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-01-08device: receive: do not exit immediately on transient UDP receive errorsJason A. Donenfeld
Some users report seeing lines like: > Routine: receive incoming IPv4 - stopped Popping up unexpectedly. Let's sleep and try again before failing, and also log the error, and perhaps we'll eventually understand this situation better in future versions. Because we have to distinguish between the socket being closed explicitly and whatever error this is, we bump the module to require Go 1.16. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-01-07device: receive: drain decryption queue before exiting RoutineDecryptionJason A. Donenfeld
It's possible for RoutineSequentialReceiver to try to lock an elem after RoutineDecryption has exited. Before this meant we didn't then unlock the elem, so the whole program deadlocked. As well, it looks like the flush code (which is now potentially unnecessary?) wasn't properly dropping the buffers for the not-already-dropped case. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-01-07device: remove QueueInboundElement leak with stopped peersJosh Bleecher Snyder
This is particularly problematic on mobile, where there is a fixed number of elements. If most of them leak, it'll impact performance; if all of them leak, the device will permanently deadlock. I have a test that detects element leaks, which is how I found this one. There are some remaining leaks that I have not yet tracked down, but this is the most prominent by far. I will commit the test when it passes reliably. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-01-07device: fix error shadowing before log printBrad Fitzpatrick
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-01-07device: always name *Queue*Element variables elemJosh Bleecher Snyder
They're called elem in most places. Rename a few local variables to make it consistent. This makes it easier to grep the code for things like elem.Drop. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-01-07device: simplify copying counter to nonceJosh Bleecher Snyder
Since we already have it packed into a uint64 in a known byte order, write it back out again the same byte order instead of copying byte by byte. This should also generate more efficient code, because the compiler can do a single uint64 write, instead of eight bounds checks and eight byte writes. Due to a missed optimization, it actually generates a mishmash of smaller writes: 1 byte, 4 bytes, 2 bytes, 1 byte. This is https://golang.org/issue/41663. The code is still better than before, and will get better yet once that compiler bug gets fixed. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-01-07device: remove starting waitgroupsJosh Bleecher Snyder
In each case, the starting waitgroup did nothing but ensure that the goroutine has launched. Nothing downstream depends on the order in which goroutines launch, and if the Go runtime scheduler is so broken that goroutines don't get launched reasonably promptly, we have much deeper problems. Given all that, simplify the code. Passed a race-enabled stress test 25,000 times without failure. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2020-12-08device: clear pointers when returning elems to poolsJosh Bleecher Snyder
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2020-11-18device: add write queue mutex for peerHaichao Liu
fix panic: send on closed channel when remove peer Signed-off-by: Haichao Liu <liuhaichao@bytedance.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2020-05-02global: update header comments and modulesJason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2020-05-02conn: introduce new package that splits out the Bind and Endpoint typesDavid Crawshaw
The sticky socket code stays in the device package for now, as it reaches deeply into the peer list. This is the first step in an effort to split some code out of the very busy device package. Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
2019-07-01device: receive: uniform message for source address checkJason A. Donenfeld
2019-07-01device: receive: simplify flush loopJason A. Donenfeld
2019-06-11device: update transfer counters correctlyJason A. Donenfeld
The rule is to always update them to the full packet size minus UDP/IP encapsulation for all authenticated packet types.
2019-06-03device, ratelimiter: replace uses of time.Now().Sub() with time.Since()Matt Layher
Simplification found by staticcheck: $ staticcheck ./... | grep S1012 device/cookie.go:90:5: should use time.Since instead of time.Now().Sub (S1012) device/cookie.go:127:5: should use time.Since instead of time.Now().Sub (S1012) device/cookie.go:242:5: should use time.Since instead of time.Now().Sub (S1012) device/noise-protocol.go:304:13: should use time.Since instead of time.Now().Sub (S1012) device/receive.go:82:46: should use time.Since instead of time.Now().Sub (S1012) device/send.go:132:5: should use time.Since instead of time.Now().Sub (S1012) device/send.go:139:5: should use time.Since instead of time.Now().Sub (S1012) device/send.go:235:59: should use time.Since instead of time.Now().Sub (S1012) device/send.go:393:9: should use time.Since instead of time.Now().Sub (S1012) ratelimiter/ratelimiter.go:79:10: should use time.Since instead of time.Now().Sub (S1012) ratelimiter/ratelimiter.go:87:10: should use time.Since instead of time.Now().Sub (S1012) Change applied using: $ find . -type f -name "*.go" -exec sed -i "s/Now().Sub(/Since(/g" {} \; Signed-off-by: Matt Layher <mdlayher@gmail.com>