Commit Graph

6691 Commits

Author SHA1 Message Date
Cong Wang 176188cff6 net: fix dev_ifsioc_locked() race condition
commit 3b23a32a63219f51a5298bc55a65ecee866e79d0 upstream.

dev_ifsioc_locked() is called with only RCU read lock, so when
there is a parallel writer changing the mac address, it could
get a partially updated mac address, as shown below:

Thread 1			Thread 2
// eth_commit_mac_addr_change()
memcpy(dev->dev_addr, addr->sa_data, ETH_ALEN);
				// dev_ifsioc_locked()
				memcpy(ifr->ifr_hwaddr.sa_data,
					dev->dev_addr,...);

Close this race condition by guarding them with a RW semaphore,
like netdev_get_name(). We can not use seqlock here as it does not
allow blocking. The writers already take RTNL anyway, so this does
not affect the slow path. To avoid bothering existing
dev_set_mac_address() callers in drivers, introduce a new wrapper
just for user-facing callers on ioctl and rtnetlink paths.

Note, bonding also changes slave mac addresses but that requires
a separate patch due to the complexity of bonding code.

Fixes: 3710becf8a ("net: RCU locking for simple ioctl()")
Reported-by: "Gong, Sishuai" <sishuai@purdue.edu>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-07 12:20:43 +01:00
Marco Elver e6af7cb64b net: fix up truesize of cloned skb in skb_prepare_for_shift()
commit 097b9146c0e26aabaa6ff3e5ea536a53f5254a79 upstream.

Avoid the assumption that ksize(kmalloc(S)) == ksize(kmalloc(S)): when
cloning an skb, save and restore truesize after pskb_expand_head(). This
can occur if the allocator decides to service an allocation of the same
size differently (e.g. use a different size class, or pass the
allocation on to KFENCE).

Because truesize is used for bookkeeping (such as sk_wmem_queued), a
modified truesize of a cloned skb may result in corrupt bookkeeping and
relevant warnings (such as in sk_stream_kill_queues()).

Link: https://lkml.kernel.org/r/X9JR/J6dMMOy1obu@elver.google.com
Reported-by: syzbot+7b99aafdcc2eedea6178@syzkaller.appspotmail.com
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20210201160420.2826895-1-elver@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-07 12:20:42 +01:00
Andrey Zhizhikin d51b217cf8 This is the 5.4.102 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmBAqDoACgkQONu9yGCS
 aT7R4A//RC4/R+Uc+cX8I2al+B017epRXRtfMDz7cd/dO1SAAhgDi4zrebAxs1XP
 6g/t37NuDZ0rjKxMBRzATSwizDLP9gKpeWCVQTtvlHGf+tm/5sn2bt7pckoPvXvo
 GqXPT4YgUgZQSHE+YG5Rhjtv0xMcOEu9yNTsPNZJU6BDdYJylQX/D97MPVjJjbXJ
 Sz+U98wHt0zIbwkg13/2FZvPMdEKL0z8Ub/SIKDaXfFSPJMDYb/5UcEfdnDctSbI
 B3i2i1/IXa97EmNG/MNDi1zPI2l9+PtRrtIzpfLASRNx3ySceiC25EyDk0mp5JnZ
 czxXJ0NxG9z9Pk9X6Isvaz6X5Nqv70LORTFeZRBEp0ohYbsxH/yBuPZ0T8bukjgU
 MA/uZDQryfeNgBN1aEJlTRCAmGyyD6NIICsNPnetmmowgqYxhHXt0tVafMvWpH9F
 vbM3eHcOfOfNejoQiPqTj5vX7NF0BZGQYa5LywKHeGe5q2nwaMj++Kffj9ERCo49
 OZFylFPiQVdEjse07JJb5vGWQkvvTv1FDB+zb7GVgHwJNnb9Lswv2VQbjdZBS++h
 YUuDSxkhEYR+vdKKLcFBbjAYkJXrpiSeXzywjR5N0c90OJdaBX1kpAbBHHXYiwo1
 P39l5/hsxWljQ1ZJqbeFWr2ef27xDiEz7aPojLUlyjBRgBC4eYc=
 =JSQX
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmBBQn0ACgkQ7G51OISz
 Hs1bUw/8C8T64CcvBDZhx2MyFqP5Wmc3h2li2/mAGYBPXnq9bQqhHVwNxBoeOY+d
 8jmPuao6Rb5xWWDCK/7bpshOyM7M4oBKDwJK6tgg9+ntPhllL29H8NYnH+hd45Zm
 oeM9gzKk4DU2jxZMt9BOi/ByrCApVKxcbiddPKmgXqaz568JQg4P10E7D2ceRudq
 uGlZbtEVqw6ftAyygzqiVmioGCwdgd/5BC+HPGcSaodF63z5Dc5LPOG+uefAVxq/
 oXawWdqScGqJXsbbkncgbvNSkIlYdVJIPqnMYrYnD2saAq4qeisrEkUzTUJwRCYT
 CeKxKL6nP4vqzgc21YGyxhtxJf+Sv2pU9QMgN/qfuhG9Nnz+7hQ3TntEjviPf6kC
 73a7cyQeLCVM2uECVN1b3fXweXRvllzkYCsomO5hU2J8mHHcUjJhwyYqsyE80Vpl
 w8qmZcyR0J0BmKXnYISGfe/S77Ze2nqLhOkGmP/PQWiTrULtLR/7wDEOap2xZbmz
 LPqK7nIrTD9lntyksNg1C1Az+So/tZQLh2hCX0C2efGqRyP2UbzoR6HgaI84LWf7
 L6qG9zVjl88lHLPZSO6WpOlxoR8RpmMprHs0bHJS79ucU+XzD0o9LyWhMBm6++qE
 WuDsSYYJtKPl7Qe04uBJNfO8cbmY9YcorYkgAcUz/Ar+QgCT0js=
 =aicv
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.102' into 5.4-2.3.x-imx

This is the 5.4.102 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-03-04 20:26:33 +00:00
Jesper Dangaard Brouer 88f8f40c90 bpf: Fix bpf_fib_lookup helper MTU check for SKB ctx
[ Upstream commit 2c0a10af688c02adcf127aad29e923e0056c6b69 ]

BPF end-user on Cilium slack-channel (Carlo Carraro) wants to use
bpf_fib_lookup for doing MTU-check, but *prior* to extending packet size,
by adjusting fib_params 'tot_len' with the packet length plus the expected
encap size. (Just like the bpf_check_mtu helper supports). He discovered
that for SKB ctx the param->tot_len was not used, instead skb->len was used
(via MTU check in is_skb_forwardable() that checks against netdev MTU).

Fix this by using fib_params 'tot_len' for MTU check. If not provided (e.g.
zero) then keep existing TC behaviour intact. Notice that 'tot_len' for MTU
check is done like XDP code-path, which checks against FIB-dst MTU.

V16:
- Revert V13 optimization, 2nd lookup is against egress/resulting netdev

V13:
- Only do ifindex lookup one time, calling dev_get_by_index_rcu().

V10:
- Use same method as XDP for 'tot_len' MTU check

Fixes: 4c79579b44 ("bpf: Change bpf_fib_lookup to return lookup status")
Reported-by: Carlo Carraro <colrack@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/161287789444.790810.15247494756551413508.stgit@firesoul
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-04 10:26:17 +01:00
Andrey Zhizhikin ce0c0d68c7 This is the 5.4.99 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmAs43QACgkQONu9yGCS
 aT4omw/+JPBAZB5ClIOSDuf3/yJkbigVRFNVmQJy4/cluG32cxlcpudoau7AXq3N
 0Sn/rfSdldl5eI98OTA+Y0yPIsVnQJdei228A5gmULkkc+rEFugorSJKRmmA7tH0
 VdZ1C4NlhhmjoIT/W8mMNzv14dtyGQvRbT+zzfxqwqL6tF9+alcdBYTP/Z691K6x
 8Csfe05MZ8VkvBizStaTXC+dtMhU917Ikd5i5v4ZzaesZJcUTLS7J82FhtKeoz7q
 tDoA/Bl+pN1KjyIIE61/zJ8DKzBtOeuo1PWJFpO+EBVhKVosr3oWJfTAiM7Fsnu5
 dbKHYPsbe3mB79JdQibr7TpU7vSjDr5a/HTuYtp7WM1R5IssiFeVOdpXTGim/s/E
 Flao5LYSUcj0X/Io6TyUnxQWw8sJz3PGKYiLUn8/9DBpzNFzynQ+vuapXCoGxJzh
 W108q32PIx2ZTJsD5RUUqZbytG/zKzI1+SxXo2uOhs9/k5qT+35Yp9epsE2Cp8v1
 Oiw3P/ZUDNk6zPj0dsHcTsqTofRK07l71HnM8iIbCWSPw834IoGBuB8c3H7HaHn4
 v5M4tMTDAaKi/e09K92fR6SZDgZz8D0N+sLLneA4NEASXIJanCUwcgVCUbja+BO1
 H1hiYTTZQa7kOkSxBa/wGsWkdfvOpOvCSFr+c6LPmB9sHMe4K8o=
 =3BI0
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmAtjOoACgkQ7G51OISz
 Hs3p0w/+J3WWl9sa+L25ZY92AZDQWMFEIlsiyKrVVgC3dRhtEr2iWylMFsFRCS3K
 IYTcHxe3MrhnaRycXZ25I0jse3doAFkTbsEl3WcjabI4/xARUW77vedz5CvxBBH9
 5Spjn5qs5zfmxHTss4qbta1zEq7PjtoJEpFVDhlhK6VXKXz07QZrRnSAztde030R
 vg9OaJFyQauPguySXISBg/XXLJEJ3j5NO3PnAW5eJKJyguQF/ujFNsv5s80297BT
 vc7kNdyy7Zz+C8dem65WMsB9egcuqt6jwjadePGz/5mhUxaj+IESJnkZ3ezCSgMI
 NDdIROT228RScO5BVEHXZHKsAdyIKibdZ7Ta7P607buvBRSd7+eV6/QkbndNYxva
 40NBGTlevffBBBl3KXbZTSJX4IrUCr45kfwNXrHGo588aiSrOoYPbr3Tq0TsWyvc
 hkDOb5XARlvyun4vvL5cKmfOV7I6MrnGs07chQ8Xd732SIhRwgzsZg7cM3kzmceB
 cEV1d+2uTmxzzldW34fU/xWPSDb/dFKjOKrsttMNM3lbL58oPS17eu4zjSQ/vO72
 x/h8KY6Uc1ceja/fWZrk4t1fusMNJZ/NPxPVjOFJpMgz90aGMMQDm4bVM6yW1x4r
 zTr1iR6q9N/+S6Y2Vc2LEuEhB2fC9q9kuFbp8Evk8DPSotQT514=
 =FE60
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.99' into 5.4-2.3.x-imx

This is the 5.4.99 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-02-17 21:38:47 +00:00
Eric Dumazet 2085d88663 net: gro: do not keep too many GRO packets in napi->rx_list
commit 8dc1c444df193701910f5e80b5d4caaf705a8fb0 upstream.

Commit c80794323e82 ("net: Fix packet reordering caused by GRO and
listified RX cooperation") had the unfortunate effect of adding
latencies in common workloads.

Before the patch, GRO packets were immediately passed to
upper stacks.

After the patch, we can accumulate quite a lot of GRO
packets (depdending on NAPI budget).

My fix is counting in napi->rx_count number of segments
instead of number of logical packets.

Fixes: c80794323e82 ("net: Fix packet reordering caused by GRO and listified RX cooperation")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Bisected-by: John Sperbeck <jsperbeck@google.com>
Tested-by: Jian Yang <jianyang@google.com>
Cc: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Alexander Lobakin <alobakin@pm.me>
Link: https://lore.kernel.org/r/20210204213146.4192368-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-17 10:35:19 +01:00
Willem de Bruijn 0b42ab0783 udp: fix skb_copy_and_csum_datagram with odd segment sizes
commit 52cbd23a119c6ebf40a527e53f3402d2ea38eccb upstream.

When iteratively computing a checksum with csum_block_add, track the
offset "pos" to correctly rotate in csum_block_add when offset is odd.

The open coded implementation of skb_copy_and_csum_datagram did this.
With the switch to __skb_datagram_iter calling csum_and_copy_to_iter,
pos was reinitialized to 0 on each call.

Bring back the pos by passing it along with the csum to the callback.

Changes v1->v2
  - pass csum value, instead of csump pointer (Alexander Duyck)

Link: https://lore.kernel.org/netdev/20210128152353.GB27281@optiplex/
Fixes: 950fcaecd5 ("datagram: consolidate datagram copy to iter helpers")
Reported-by: Oliver Graute <oliver.graute@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20210203192952.1849843-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-17 10:35:19 +01:00
Andrey Zhizhikin 106105cb76 This is the 5.4.97 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIyBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmAjmR0ACgkQONu9yGCS
 aT6y/A/3f4yvZr4VWRdsX9eWC5snc9jx+QSd/t+LzdRTJa8gCHQcEp9TTGiZHr7/
 DSM5c32BXesBDs2Ctb5jUYRfp1SgPH5pen7/HUREG0qCG+u2lY6I3/Nc0thCQNcH
 xCOHlBMx1bJ9Dy4Z39YpwqGbGRldFM+/UoAke1/mGvqxVBeQyx4bwKg94qdKRugb
 XRwKRcihNgL2NfWdGQ+yy+G4msjrUoswdpns5CWOjWXxkObfeg3clnQipw6mSloR
 q+NCcwgPXUC1jbzH0nXQwzfHpD+mcFU8/ueUgN/1Q6OGkZ5uDv2vdSK4PtzTyUSN
 SNlcCl5D1hQdml9+Vh+/ScVCwzpKHmCzyWY/e17Fe1mMXGYIrOkexNcgrfld6Hfz
 1yQ/9UfBZ2gAUTsecOvZS+l/ejh5NkOJX5CtMlQDA4wtn6JjMWLVddksXxAcbIIP
 PWdLlfH+vfGfxrGJ/g3L0ALlppx0ezDCMvz6X2mVn1w7ifXcW+mnucypmDoQQ57j
 Ckc+YRluLxrBdLJsS98iLOkoTFxRJXZArJSI/lDW3LqPaFhFX5SMg/ilArefas78
 62y8gIPdIlMt2sjJ12xnY4G4cod5Ec29YpMorIbl2CZve5OC5e5MiUxubU0C7noz
 zEGP9+bqp0WVd5Ir2yAvSyvmkNPOSOoBsrKuw0Gw1M1p3gxKLg==
 =BMmj
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmAjxlMACgkQ7G51OISz
 Hs3P1w//U13aCX9gjARBZ4+yy8TlliA7v0vj1KyTZi87Q3KBbfp5fFMrDeqS1XLu
 TjDAbayvTjM/eQ2r3vjUviRqyfyttSe1A7pxPyzE9E8bd6uiPCWNQnLBucvUChE1
 G2ai9xKfJj2xxoX9PNoTH2RrR32n31hp2IMJ3SVIY7Acr/YkGRHAoflTgZDkn370
 6fPWEoGeiJuSkdnNNvANaEkgW1xyjAmAEhv1HLVbgQzVT6XC0lSAHTGITdEzgu5a
 qiwXUrt7K5t8YV1yBJYWb9znOSm8bu5xbMHm7OwRqAT9Z7/5zhSA/IZnOBifoIUH
 iib1w9N2Fxs8gS9uS3iZ4waIRLV0UgXmYWMu4V9EVKtavp/WlhtezMrsLcobzYyl
 Vuz5FFlEJnIRs9XjGHRg4vGO8bDWEkq06Lk0054k5I6D8Sub42zs+T0RliXfvpBy
 I1Edta424JmHeW4RPuHgA1Qy2X+PnsKlVM3EuFHy3j893HcDrw9w3AkO13eEZW0a
 NYvayffS86au/gKC6WWUdNv344qL/wj3k57AC0v5RX9HVamIKu1lsqaib+TzXix/
 ufkPu9TInDZ6RZ8+jB0gBlFa7lWs8YzkwusoFu0GFdg3kKVrMkIbLbL8qdJ1QjBp
 LxPE5qFM3XXf9+6nbrGJv26UvwZHbqP5e256/gvMoWKAVbqxw/0=
 =HgMH
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.97' into 5.4-2.3.x-imx

This is the 5.4.97 stable release

Conflicts (manual resolve):
- drivers/usb/host/xhci.c:
- drivers/usb/host/xhci.h:
Merge commits 5f0ebd9dfc ("MLK-18794-1 usb: host: xhci: add .bus_suspend
override") and cfaf1a54fd ("MLK-16735 usb: host: add XHCI_CDNS_HOST flag")
from NXP tree with commit 9b269d1ce44e9 ("usb: xhci-mtk: fix unreleased
bandwidth data") and f4e4f067f9 ("usb: xhci-mtk: fix unreleased bandwidth
data") from upstream.

- drivers/usb/host/xhci-plat.c:
Keep NXP implementation done in commit b600e087f2 ("MLK-24527-1 usb: host:
xhci-plat: add platform data support"), which covers the logic presented in
commit 2847d242a1 ("usb: host: xhci-plat: Use of_device_get_match_data()
helper") from upstream.

Merge upstream commit 40af962eb1 ("usb: host: xhci: mvebu: make USB 3.0 PHY
optional for Armada 3720"), which contains the logic of NXP commit cc2b8987ac
("MLK-24527-3 usb: host: xhci-plat: add priv quirk for skip PHY
initialization"),
drop NXP implementation.

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-02-10 11:40:45 +00:00
Chinmay Agarwal 90d7459d24 neighbour: Prevent a dead entry from updating gc_list
commit eb4e8fac00d1e01ada5e57c05d24739156086677 upstream.

Following race condition was detected:
<CPU A, t0> - neigh_flush_dev() is under execution and calls
neigh_mark_dead(n) marking the neighbour entry 'n' as dead.

<CPU B, t1> - Executing: __netif_receive_skb() ->
__netif_receive_skb_core() -> arp_rcv() -> arp_process().arp_process()
calls __neigh_lookup() which takes a reference on neighbour entry 'n'.

<CPU A, t2> - Moves further along neigh_flush_dev() and calls
neigh_cleanup_and_release(n), but since reference count increased in t2,
'n' couldn't be destroyed.

<CPU B, t3> - Moves further along, arp_process() and calls
neigh_update()-> __neigh_update() -> neigh_update_gc_list(), which adds
the neighbour entry back in gc_list(neigh_mark_dead(), removed it
earlier in t0 from gc_list)

<CPU B, t4> - arp_process() finally calls neigh_release(n), destroying
the neighbour entry.

This leads to 'n' still being part of gc_list, but the actual
neighbour structure has been freed.

The situation can be prevented from happening if we disallow a dead
entry to have any possibility of updating gc_list. This is what the
patch intends to achieve.

Fixes: 9c29a2f55e ("neighbor: Fix locking order for gc_list changes")
Signed-off-by: Chinmay Agarwal <chinagar@codeaurora.org>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20210127165453.GA20514@chinagar-linux.qualcomm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-10 09:25:32 +01:00
Andrey Zhizhikin a968d52b84 This is the 5.4.96 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmAf+uoACgkQONu9yGCS
 aT5ppQ/9FJYw4yqV6ct2tU7N4J17wErrTbE0ysEGiLEoYODQ1K4QtAmwQUC2jrT1
 VauR+83tPvSXCEK8OxmTS7jMOTyATy5xcodNwnV67O3mOC3Xk3h7VLeRClvGV/XB
 ijgTN84wlJnyDsVc/3BYtFUbFqzTyOc2nj/NRzOD5mxkpmlKkNTHV2kk7Afna876
 akrSBMb9Np8Ty8NVwz/83TzAbtP0eBq14lZq1WusD1DrVbD1MrAdi8YMbMBSra7c
 KdQTXVGPQq9YmKXJcw6gu7LLh6ykfVu/M9JT/86dlzaXedKBtP301vIc5AcV9Io8
 bqDPVlT792U9r5W9Vfq7kNk/wSpED5MGBgvRE+/RnAfNI1NzBUTTm5mFhn4HUBzl
 OXpXcK01hm2apM8+z3cGoRQYo5462tZR5QxT8RbMYnX0q3xwsDIjfXYMGZWgxTsY
 Ah8OVFd9XnMbnmqtoCPBABMsnKyARgs5NTTbtGwUyoSYYxxMEuU80M1G+F18MG0G
 4DOqg77f197VeCapd41Dzac08hq1VLUtQJAHH/bTRgVceDi5hJ5qBO5FKYmWr0G7
 pvp5zm1i8rmXXZS0E+CIXKtW2td8jbBKZ6GWrzWXlT10GB6zLlB0yElgcpNSc6F1
 8FszN0Df4hmYelAl6ZZJ/vOD+DnHdxkYJ/QD/IqH0QOOaMclLxY=
 =2WAV
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmAgZBQACgkQ7G51OISz
 Hs2JYA//Qal2jpFQxqebBp99QqM4215wS7NhTEtTj9B32YJnvvJVudxzeos723mG
 vA4+M6RonrtyKg7odGhOqZQ4QeDhq17ywXyo0i8QfVaYUtxckPRzqYqyPMURCq3n
 pXrYhp5k3Fx7+RxAVMOyA6AoEFZsE/f7h05IrBoNIz8BQ5wo1o51Mp9HJnqLyYDz
 8oZB9v5xtLaWk7agMPoF1i6atvv1d2KjZqg/SmrhRT25ykKZIOXjIUSP2hjIS0lx
 t+zUbw0KuPiqyOesxdxs6kWgxI8RpNYkgA6Mxsk0GcmYO9BxEk/8CkFWfm2sJMVO
 W/llY6k472i58sAY3VKOAvVUZwtuhz5imShwNqV27l6GAXxYrKA9yVUw0WD6TgTs
 QvodfkgxMFKt6+RYbbiJ6JrcPWA/VCdMrRYX88AuV0oKOGU3dm5LRT6lqoGWU4n5
 JPSvhMfM3ekwmLV3YIeHbW301ElhLxkd3X7E8BDZv0RefgLFONcS2Unfl7DzGfHm
 ytUvCtCvFJGTewpAqkxK/hNhEvq9jzucxKNV7vhi51fMXZk6SH8/1RD1Odk66uV4
 jRXvVS+NxFyr/oEppcQXTwynvBZh+h4167Mx0HFRdkResN0a11MDeGN//kS+BDoy
 8FmfggOc5ivNqLh7C16cwsjdOwTVVnqMeRFXgrXV4CGcfFseW3Q=
 =sIM7
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.96' into 5.4-2.3.x-imx

This is the 5.4.96 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-02-07 22:05:04 +00:00
Eric Dumazet fd4c12f312 net_sched: gen_estimator: support large ewma log
commit dd5e073381f2ada3630f36be42833c6e9c78b75e upstream

syzbot report reminded us that very big ewma_log were supported in the past,
even if they made litle sense.

tc qdisc replace dev xxx root est 1sec 131072sec ...

While fixing the bug, also add boundary checks for ewma_log, in line
with range supported by iproute2.

UBSAN: shift-out-of-bounds in net/core/gen_estimator.c:83:38
shift exponent -1 is negative
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x107/0x163 lib/dump_stack.c:120
 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
 est_timer.cold+0xbb/0x12d net/core/gen_estimator.c:83
 call_timer_fn+0x1a5/0x710 kernel/time/timer.c:1417
 expire_timers kernel/time/timer.c:1462 [inline]
 __run_timers.part.0+0x692/0xa80 kernel/time/timer.c:1731
 __run_timers kernel/time/timer.c:1712 [inline]
 run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1744
 __do_softirq+0x2bc/0xa77 kernel/softirq.c:343
 asm_call_irq_on_stack+0xf/0x20
 </IRQ>
 __run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline]
 run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline]
 do_softirq_own_stack+0xaa/0xd0 arch/x86/kernel/irq_64.c:77
 invoke_softirq kernel/softirq.c:226 [inline]
 __irq_exit_rcu+0x17f/0x200 kernel/softirq.c:420
 irq_exit_rcu+0x5/0x20 kernel/softirq.c:432
 sysvec_apic_timer_interrupt+0x4d/0x100 arch/x86/kernel/apic/apic.c:1096
 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:628
RIP: 0010:native_save_fl arch/x86/include/asm/irqflags.h:29 [inline]
RIP: 0010:arch_local_save_flags arch/x86/include/asm/irqflags.h:79 [inline]
RIP: 0010:arch_irqs_disabled arch/x86/include/asm/irqflags.h:169 [inline]
RIP: 0010:acpi_safe_halt drivers/acpi/processor_idle.c:111 [inline]
RIP: 0010:acpi_idle_do_entry+0x1c9/0x250 drivers/acpi/processor_idle.c:516

Fixes: 1c0d32fde5 ("net_sched: gen_estimator: complete rewrite of rate estimators")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20210114181929.1717985-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[sudip: adjust context]
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-07 15:35:47 +01:00
Andrey Zhizhikin 6aa59e41d8 This is the 5.4.93 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmARRPQACgkQONu9yGCS
 aT7jAg//SFgHtf8wdnuWP7vANyU+MV8fGTs2No729MXuDEZLMwI9uwlkegcNRatI
 G9zCbuPpoXyQFo5wHVYmS1z97dt+SbAY8bO6qjGJBO6e3Pxbo+DEiGCl70Lm6qqu
 8Z3yuECpNID6A3rAgkE2jDBnMr6QolU4hjKnsf8VEVRwDYDjWxTaxZvtS0tGZJf9
 em/F7+1T1cFd2va6FLhyrin1Mu6J/YgZ9NcZTotx/wV5UUwsp/TCxkciUUa4MgkX
 Tv0rt2LSGx2DKw9pGoQi/oXLpyFbQFAM37KWSto7oS7cPzY1FJ7z1Yxcu18J+v5Y
 bsVpCrtDqmrjI8vkcO+8cGcGPXPTT0liUWpWzLX3wiXAZW876fuJrUPFg1LszZoN
 oztyaQTLSCgfYrS21aKOsP3DP2PPRl2TUCslOQwABJrGJ6nLhTyjiF3g+bV2GPlH
 N5f1vutsyp90YkNqywWdK9rito7JFgawqlw63oS65EYsFLmVFeBAdU+b/ecAw53O
 k09HqkZt7RZXVKRpNkLbGfBkGY3wrsiV33SCMpuSQ2lZWyUfwaFmANuaQWeJ5pvl
 nfv09NfzK0cTlMCkHDl5HIqhErvuTSDC6DowEitpQhv23JddgU9KVvWcS5xrJFpP
 9+9QfqkMk9uq0DIYKTSKC+tOrMl7xxsaSG7Xm5MOxr3GXPg1BPE=
 =Fkdf
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmAShrAACgkQ7G51OISz
 Hs2lHA//Vcj0FKmDBTB3chpxYI/KSgwg9QOGBY/H4eXkwWdQPWrKwuUaraRLU1Es
 xWSCZ5eV6eMYKwiUv6dYUhwjYiLKDf+bt4WmrHXOSCRT3dC5uVYKNqXSch4ktvAw
 yYqfE0ZY64pbw+Hf/iJmbXPY8vD/FIDSw1PtabQRc3suYF7FHRG8q6Virw9pGkWA
 qsk3IfsyRrGFH/JVMrYUKRq8rmbYKeNOS399r7xvuPaqM7wV30K58uOjWOrU6Z4T
 s5LSJVBfN1zFs5m4zu2y8V3aLK+c2SuyBCF/bjEOGy3BsY4qjAVvuEMHmOF82mg1
 +R1tesrwvgdcEBHtT49qp99ykyHK4pwq3lMTvewdy9c4o8pILtZ6ZT6D0Q1K7aat
 Z1ICJ9hKUMG2C/54tox0ajYRAiot663OTA640dUBFFKd9wx5HmlF2w0V7Tzt1pYV
 +YTtjzm/3ADrlxkC7m8Aulpuj7PzS1vjfggQRiS8zIctdTSfBVrXAqab63swn8S/
 gR8jZpDWLE8XdEXx9zURPdMsr162jRfqqAStHkuIpjpeqdEvVBVOnqOuL+gsk5pQ
 wgb5O67mtZVFWtAX43KgO86SSuCd2bequMPG64HeCOXid+5OIpoU1x/kgzSKZbZ9
 UMDIOvrHcnQ2vKdbv6tAGzoBJm5inJbciZNqVuYF0uJFxyNR5Qc=
 =7CHe
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.93' into 5.4-2.3.x-imx

This is the 5.4.93 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-01-28 09:41:02 +00:00
Tariq Toukan ff64094dc7 net: Disable NETIF_F_HW_TLS_RX when RXCSUM is disabled
commit a3eb4e9d4c9218476d05c52dfd2be3d6fdce6b91 upstream.

With NETIF_F_HW_TLS_RX packets are decrypted in HW. This cannot be
logically done when RXCSUM offload is off.

Fixes: 14136564c8 ("net: Add TLS RX offload feature")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Boris Pismenny <borisp@nvidia.com>
Link: https://lore.kernel.org/r/20210117151538.9411-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27 11:47:54 +01:00
Alexander Lobakin 5a3890bad3 skbuff: back tiny skbs with kmalloc() in __netdev_alloc_skb() too
commit 66c556025d687dbdd0f748c5e1df89c977b6c02a upstream.

Commit 3226b158e67c ("net: avoid 32 x truesize under-estimation for
tiny skbs") ensured that skbs with data size lower than 1025 bytes
will be kmalloc'ed to avoid excessive page cache fragmentation and
memory consumption.
However, the fix adressed only __napi_alloc_skb() (primarily for
virtio_net and napi_get_frags()), but the issue can still be achieved
through __netdev_alloc_skb(), which is still used by several drivers.
Drivers often allocate a tiny skb for headers and place the rest of
the frame to frags (so-called copybreak).
Mirror the condition to __netdev_alloc_skb() to handle this case too.

Since v1 [0]:
 - fix "Fixes:" tag;
 - refine commit message (mention copybreak usecase).

[0] https://lore.kernel.org/netdev/20210114235423.232737-1-alobakin@pm.me

Fixes: a1c7fff7e1 ("net: netdev_alloc_skb() use build_skb()")
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Link: https://lore.kernel.org/r/20210115150354.85967-1-alobakin@pm.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-27 11:47:53 +01:00
Andrey Zhizhikin 7cfef3bb5d This is the 5.4.92 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmAMOqAACgkQONu9yGCS
 aT5y+A//dHc3oRvCuXWaRS2Zhmx2KyZNOMkmElQnqi1aMcnrRhyIzNZ5gwCftYp6
 9EzhryrjioTZMHd14eYwwjyT2yckoBFKNsW+cPJ4YgqB8TtVD5a/2ygYAXBrHVkW
 Fj3fXeJZmkRk9U156Gw/O8GP/BJ2ld/lk89IYYNkdjXwjjKyyOotBDGMSou4Swjl
 8EciEzb3fyn8DvbD2bCFit5RgaNH2OMr0uTITS7RyLNmhBoZSfJo62KbFxYbnFti
 I3EKxVhnJemNzU+jWNpczZxTyOodMAzcOWbpttJTIxpGDsivWSXM3kDbIq1HT7pe
 xAfYEtkL+kgLb4EPIzdNue6GRQlRKbgwsfs/ralQ9iPFvL9GHP4zvMj6wGV1Qzjw
 4PI+wc76ZNlQMtkntGrOWRDmYrTICL1UY3Uh93SmaYKWSMRATuHK6LFe+y+7tIK7
 Yo/XAdlAzzmc3cGh4ikC1zj4WchRG9/GlfucnFGqxBuxZGXq8WBStBIOkHda4vFg
 a5Ncli+PyOID22AtXb8It6JFI70arZ53CUAwCRqRA7FYlrzZrcsZe15uuB72yDTZ
 mPeaNplWiIXPn8vWMDGFBX5Zhysgb/8FGXtSaFCOnE3QUVHPIE2hoLUlClfJIqxf
 f4uGh5HfquTXZUXzlvoM8tgKPzfpkrqZe1JKNdCh+khI6VzxX8Q=
 =B0JT
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmAMkaYACgkQ7G51OISz
 Hs03wQ//S14o2QWvqICJ1GieRhzdfHeyIFHtopp6eXr2ScomLmbUYtvOWOXd22bO
 rbLedJVUz+7pKPyf18cEbbeOC4/1k5AfQaB2fjXCa/PJsOyux1rgK85Mr3baISs0
 rqAFJEk1gjpq2YpTzixot5Q2bKfXC7tByDE8spAao3XoPKgqCs3iNqu4vrjU2W2b
 5Rm+0eWQOeT2r5jBNq3bVQB9qtTcfv7yAwbCPwx7OKmKoxgNn351OiQXPGyzi2GJ
 PW9wXVE1Z10BzV2X4gZY2Nl8+FrWVr5jxxWI0Vsg1Vwx63twxPsO/aNm7DK1T4IU
 3TwTSHHxhJCDmkwUK7b90QzW5CSdi8m+EWNGcn+j0iuaQ0PZsTNK1ddLPaXVcxUo
 4fvSpxk8W+mgNegAs0XFx3c013CmshkEoYOnvCz1GGvukfhYfTN4Oyozn38vFzJX
 BdbCI3aZ3y9tuU/+KiEEjZl/QHEyatKYmFF+LSNUcZOfeB0YKdcoZeZVyaaJF0M3
 y38AcvQP6gWDCEMv9ikomUuS/qkprv1GM1w6JI7E8z+OPPXqkcsDPmSYoBusYHnI
 tmswg8ByFQDZTXTlXnhhQ46AwcfO93P7p6AppT6lOUEPBLYJQ0+hAPrRBklESMFz
 ELBTX5Qu/MFThBotC2plAJPyDdJssx3V9req5bgx36cvyRo88wk=
 =H7Zs
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.92' into 5.4-2.3.x-imx

This is the 5.4.92 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-01-23 21:14:12 +00:00
Daniel Borkmann 55bac51762 net, sctp, filter: remap copy_from_user failure error
[ no upstream commit ]

Fix a potential kernel address leakage for the prerequisite where there is
a BPF program attached to the cgroup/setsockopt hook. The latter can only
be attached under root, however, if the attached program returns 1 to then
run the related kernel handler, an unprivileged program could probe for
kernel addresses that way. The reason this is possible is that we're under
set_fs(KERNEL_DS) when running the kernel setsockopt handler. Aside from
old cBPF there is also SCTP's struct sctp_getaddrs_old which contains
pointers in the uapi struct that further need copy_from_user() inside the
handler. In the normal case this would just return -EFAULT, but under a
temporary KERNEL_DS setting the memory would be copied and we'd end up at
a different error code, that is, -EINVAL, for both cases given subsequent
validations fail, which then allows the app to distinguish and make use of
this fact for probing the address space. In case of later kernel versions
this issue won't work anymore thanks to Christoph Hellwig's work that got
rid of the various temporary set_fs() address space overrides altogether.
One potential option for 5.4 as the only affected stable kernel with the
least complexity would be to remap those affected -EFAULT copy_from_user()
error codes with -EINVAL such that they cannot be probed anymore. Risk of
breakage should be rather low for this particular error case.

Fixes: 0d01da6afc ("bpf: implement getsockopt and setsockopt hooks")
Reported-by: Ryota Shiga (Flatt Security)
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-23 15:58:00 +01:00
Eric Dumazet 5c466480d7 net: avoid 32 x truesize under-estimation for tiny skbs
[ Upstream commit 3226b158e67cfaa677fd180152bfb28989cb2fac ]

Both virtio net and napi_get_frags() allocate skbs
with a very small skb->head

While using page fragments instead of a kmalloc backed skb->head might give
a small performance improvement in some cases, there is a huge risk of
under estimating memory usage.

For both GOOD_COPY_LEN and GRO_MAX_HEAD, we can fit at least 32 allocations
per page (order-3 page in x86), or even 64 on PowerPC

We have been tracking OOM issues on GKE hosts hitting tcp_mem limits
but consuming far more memory for TCP buffers than instructed in tcp_mem[2]

Even if we force napi_alloc_skb() to only use order-0 pages, the issue
would still be there on arches with PAGE_SIZE >= 32768

This patch makes sure that small skb head are kmalloc backed, so that
other objects in the slab page can be reused instead of being held as long
as skbs are sitting in socket queues.

Note that we might in the future use the sk_buff napi cache,
instead of going through a more expensive __alloc_skb()

Another idea would be to use separate page sizes depending
on the allocated length (to never have more than 4 frags per page)

I would like to thank Greg Thelen for his precious help on this matter,
analysing crash dumps is always a time consuming task.

Fixes: fd11a83dd3 ("net: Pull out core bits of __netdev_alloc_skb and add __napi_alloc_skb")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20210113161819.1155526-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-23 15:57:59 +01:00
Baptiste Lepers 982e763ea3 udp: Prevent reuseport_select_sock from reading uninitialized socks
[ Upstream commit fd2ddef043592e7de80af53f47fa46fd3573086e ]

reuse->socks[] is modified concurrently by reuseport_add_sock. To
prevent reading values that have not been fully initialized, only read
the array up until the last known safe index instead of incorrectly
re-reading the last index of the array.

Fixes: acdcecc612 ("udp: correct reuseport selection with connected sockets")
Signed-off-by: Baptiste Lepers <baptiste.lepers@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20210107051110.12247-1-baptiste.lepers@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-23 15:57:56 +01:00
Andrey Zhizhikin 82b5d3cd6b This is the 5.4.90 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmAENzgACgkQONu9yGCS
 aT7khA//eTBSPP1vAJIqph0YgQbgCCzvzQTj5enM6F1cCZqVha8s0ZjY4fl9Mkky
 MTVmQdGEem4MoqypzFgAQPQn8KpoM//sQue+b9evny3wU/cmgry5Hs7H3F1/Y7Yv
 q27Q5jzRTmvcy4Up21FhpFE58FXCXiO5H58FrtKEuJtoCxk+akyGuF8Z0UH3Rvp/
 FTKjAKnfzQ9b3MjBJY16W3EqZnpLB+sFMhimS+QyHAr4biTXgIhM/ZebyKxYOGDw
 fq9MX5XCSM5Aka9RfWIGl8FF5y1IICkBQ0Il+xI7zsQwONFD9UIMhAcTE2LxybQT
 YsV/GJ7r/nZWSTcup+vD+tTNceXQoBY2EDGIKeX3rNme8cLWWJeDbTc7KbIkIi35
 ctRFeEcUiFMoQEhIXyi7c8DcOU4xjmTUXtigjhcLLzAODuOBriWbIsM81RuLwNGC
 i/jLYEWhQ+tXozLsmb1/7fL8mvAlZfD3Vwkm4aTSSPul1i52tqBnRZBSut0+KRMa
 +SOpxytl+H5tFV6Z3bI0lrtJ0xnKdr0oJj367JsxIG1yeOpkqe8CEFWW+14TsjqV
 R1ETqDTtqi8YTGfIgp4Q3EUe9LdoJwUQFKh1lv0SMKYac6vtz/C+MxziJXHPValE
 dNK3MocE1zpfMgnZpHP/IwbLOeiWfNl+ZL/wpD73EUr1PvUiRvQ=
 =4Noe
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAmAEu7kACgkQ7G51OISz
 Hs1X1w//ffLwm+GzvGbo3zl1QbmLlLPt7+9zFd753+WvX/mhuI5No1OiHwPaHfXn
 AfVcxFRy39XX2LbcZPFAv624JxZDV8vdxMLPcQE7W9JYUDd/aHy1cz/QfFxxA/03
 UuD8x13ue1ZOPQ5P+wpMCkohnziAnwd28bOc9fzMbxVY0hX792ekokUVD4WutCPx
 8Ula2r7mljSzGXmcn592GJTjWonL88TuyUm1g8+O5Rzb/NNhNJDVbuN4bmpHHQJB
 PXaqJVzg6bkCwv+n9pcjx/KkeqXDmgGxtCRfJjxqFulcgU6Xf3zdgGH1dDsm46bl
 wzrQXddw7CeS5umEq7ZbUnq//6jwiXG4eUazkW7C8TkbtDugnpNy/MofAXxDiHBb
 1YTDqYUBhRFD9VDxxvZD8i5VJ+pjQUzqj67OWHA2y89K5pjY6ZJad0NEt9UHYdsU
 qlRk3skziu3+Bw1g02sPmOthVYJCxajTXK4eyOD9mwhnPydzEwTtQ/4ZCJmsBCU0
 VkePUzmRhyfkjmN0mhzFXd6Y6LHCwjzkhhb2djXdnI/H1h1NkJZ6UHe4lrl4KbNn
 m2BMyEaqXIS/kNL0O4O6ZW64pq1d8M3L3QM6i++KxdJqm207Z8YLOV2s3zQGjw+x
 SK/Tf1d9SKy8+J465gEvtwOCDRsM0CdB/1/YPwv8ds6jF6zvQ3M=
 =Y7tN
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.90' into 5.4-2.3.x-imx

This is the 5.4.90 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-01-17 22:35:35 +00:00
Vasily Averin bbb2fee395 net: drop bogus skb with CHECKSUM_PARTIAL and offset beyond end of trimmed packet
commit 54970a2fbb673f090b7f02d7f57b10b2e0707155 upstream.

syzbot reproduces BUG_ON in skb_checksum_help():
tun creates (bogus) skb with huge partial-checksummed area and
small ip packet inside. Then ip_rcv trims the skb based on size
of internal ip packet, after that csum offset points beyond of
trimmed skb. Then checksum_tg() called via netfilter hook
triggers BUG_ON:

        offset = skb_checksum_start_offset(skb);
        BUG_ON(offset >= skb_headlen(skb));

To work around the problem this patch forces pskb_trim_rcsum_slow()
to return -EINVAL in described scenario. It allows its callers to
drop such kind of packets.

Link: https://syzkaller.appspot.com/bug?id=b419a5ca95062664fe1a60b764621eb4526e2cd0
Reported-by: syzbot+7010af67ced6105e5ab6@syzkaller.appspotmail.com
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/1b2494af-2c56-8ee2-7bc0-923fcad1cdf8@virtuozzo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-17 14:05:38 +01:00
Andrey Zhizhikin a8a2b9ee4b This is the 5.4.89 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl/99ZgACgkQONu9yGCS
 aT609BAAg3AcT6t2WQFfY0LZwaT4u8Y7mg7gx2995vDhzWOei/o6AasogDpnv+ey
 fDIu0NwMTK73K5bDSas5pWirEi/+eCk1S0xxg8rLkHgHOYJD7z6Ktq5DlNv5nfNN
 KUl1jnEcZznk4Y3ogxDwJTHmXVCRZAlckn46YiCpYKZeZbA/IqHlzzle9Dwd3eLN
 ElZN6Vdq5vagJOxTuFAEdHLy8mxIWySN0Kh6Ac0VKaaxLbE3GsXXEUtin7nLe/nj
 19/98ije7vQaTUNdqMSu5FIQsZGHg+XNji7EGLvmF/nITEUdwzIWuMsP5/ArVpJn
 rjnmz2J3IuQix7X08PGcde/0T1scXxnspOrQyVnMgGEl9J/5NpewrIItGZGt3H0u
 /fTvohGXx1nvaavDii3u7/y+s038v9HeP9Br6ISlprwZP8Pg4arm0sPQ2aHbPQ1v
 GQZSqat6hOm8DvpkLr0mO4w/+RYgRaVLRCIf8jWoStPvS/pm4APaDvYPAjZdqPRm
 xPSOa9Irvg0UaiwIxiXJdPBvFELvUHexpSxTNGQWsXdNHfMROnK+B4c3MScbDVt8
 vevIh3PVYqENW5Nsn7mSwdWPRzmNaouW/2fWqYjCWxhaSGfqweOz/JawHrwuTTQj
 GRdTgEn9w6o3uj8hQIt7c0+QfGLSvZlHfyvl7JYk/cV6SoofI40=
 =0wHl
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAl/+GcAACgkQ7G51OISz
 Hs2OQRAAi3ttv18t/kW4jA4kgL0O7nIWOhJPCjv9h488SCRVRKp4yzapTg2s+GU1
 3KgVh67FkoBYEsNVYcXU6Pxta0Z3YZxDnQbPEpWIjTi5eUlXBDyYm4bVPjyuo10N
 s20BTnrgQQ+C7/tvtZX7pam6qVvULerF4FMncVyHfkb5PxI+JabdDuTeQHnLkACj
 6lKXnF6Sd7Y0aOiFp6C7Mq8TBUrPQnhrbYWJdonXlVGF0bGbeH75T8Ab4P+sk+DT
 MaH/rDaewXDosG7Dzz8g8irK9qo5i7MGXJVq9QvciPH9kbZI/R1FoaADusvKxCxT
 nsdf0/u4fc3vOaDL3nLm7tMb6JHGfEqH+ByC6ZnJg+1niuuInqPMo9dA7Njs9J0c
 0aNWjNKcDlZy+sZSjRavls0KciGS1lumvOkOp+rYM6GaHOr8bRefIgpxlcjHUmSO
 6iYTVAzStwejdRG1tDR8ezLRbWTOQZcDMzJ0k8s3V5enKmI/3qurVsOz82Inao8P
 YNiFpC7O8CrSmXasadQ/IHrIzsW8wNF2y+G7wKQdDQO//5Fyd7G9RsKXHCceBv87
 nnalwLer3lZd0Cgzs7PSK2umbgl4/BIFL3yhL37+n2/BYqpu2OViVbi9Pqhc+tu1
 MHZN/ocWnkzhiRGCHBODLaT1r1Uo1XcqoLHEw/KmsnUflLMsiPI=
 =RRIw
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.89' into 5.4-2.3.x-imx

This is the 5.4.89 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-01-12 21:50:48 +00:00
Antoine Tenart 8602c20a91 net-sysfs: take the rtnl lock when accessing xps_rxqs_map and num_tc
[ Upstream commit 4ae2bb81649dc03dfc95875f02126b14b773f7ab ]

Accesses to dev->xps_rxqs_map (when using dev->num_tc) should be
protected by the rtnl lock, like we do for netif_set_xps_queue. I didn't
see an actual bug being triggered, but let's be safe here and take the
rtnl lock while accessing the map in sysfs.

Fixes: 8af2c06ff4 ("net-sysfs: Add interface for Rx queue(s) map per Tx queue")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-12 20:16:13 +01:00
Antoine Tenart 1f6b04a2b2 net-sysfs: take the rtnl lock when storing xps_rxqs
[ Upstream commit 2d57b4f142e0b03e854612b8e28978935414bced ]

Two race conditions can be triggered when storing xps rxqs, resulting in
various oops and invalid memory accesses:

1. Calling netdev_set_num_tc while netif_set_xps_queue:

   - netif_set_xps_queue uses dev->tc_num as one of the parameters to
     compute the size of new_dev_maps when allocating it. dev->tc_num is
     also used to access the map, and the compiler may generate code to
     retrieve this field multiple times in the function.

   - netdev_set_num_tc sets dev->tc_num.

   If new_dev_maps is allocated using dev->tc_num and then dev->tc_num
   is set to a higher value through netdev_set_num_tc, later accesses to
   new_dev_maps in netif_set_xps_queue could lead to accessing memory
   outside of new_dev_maps; triggering an oops.

2. Calling netif_set_xps_queue while netdev_set_num_tc is running:

   2.1. netdev_set_num_tc starts by resetting the xps queues,
        dev->tc_num isn't updated yet.

   2.2. netif_set_xps_queue is called, setting up the map with the
        *old* dev->num_tc.

   2.3. netdev_set_num_tc updates dev->tc_num.

   2.4. Later accesses to the map lead to out of bound accesses and
        oops.

   A similar issue can be found with netdev_reset_tc.

One way of triggering this is to set an iface up (for which the driver
uses netdev_set_num_tc in the open path, such as bnx2x) and writing to
xps_rxqs in a concurrent thread. With the right timing an oops is
triggered.

Both issues have the same fix: netif_set_xps_queue, netdev_set_num_tc
and netdev_reset_tc should be mutually exclusive. We do that by taking
the rtnl lock in xps_rxqs_store.

Fixes: 8af2c06ff4 ("net-sysfs: Add interface for Rx queue(s) map per Tx queue")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-12 20:16:13 +01:00
Antoine Tenart 67ed54a63f net-sysfs: take the rtnl lock when accessing xps_cpus_map and num_tc
[ Upstream commit fb25038586d0064123e393cadf1fadd70a9df97a ]

Accesses to dev->xps_cpus_map (when using dev->num_tc) should be
protected by the rtnl lock, like we do for netif_set_xps_queue. I didn't
see an actual bug being triggered, but let's be safe here and take the
rtnl lock while accessing the map in sysfs.

Fixes: 184c449f91 ("net: Add support for XPS with QoS via traffic classes")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-12 20:16:13 +01:00
Antoine Tenart fb14db9508 net-sysfs: take the rtnl lock when storing xps_cpus
[ Upstream commit 1ad58225dba3f2f598d2c6daed4323f24547168f ]

Two race conditions can be triggered when storing xps cpus, resulting in
various oops and invalid memory accesses:

1. Calling netdev_set_num_tc while netif_set_xps_queue:

   - netif_set_xps_queue uses dev->tc_num as one of the parameters to
     compute the size of new_dev_maps when allocating it. dev->tc_num is
     also used to access the map, and the compiler may generate code to
     retrieve this field multiple times in the function.

   - netdev_set_num_tc sets dev->tc_num.

   If new_dev_maps is allocated using dev->tc_num and then dev->tc_num
   is set to a higher value through netdev_set_num_tc, later accesses to
   new_dev_maps in netif_set_xps_queue could lead to accessing memory
   outside of new_dev_maps; triggering an oops.

2. Calling netif_set_xps_queue while netdev_set_num_tc is running:

   2.1. netdev_set_num_tc starts by resetting the xps queues,
        dev->tc_num isn't updated yet.

   2.2. netif_set_xps_queue is called, setting up the map with the
        *old* dev->num_tc.

   2.3. netdev_set_num_tc updates dev->tc_num.

   2.4. Later accesses to the map lead to out of bound accesses and
        oops.

   A similar issue can be found with netdev_reset_tc.

One way of triggering this is to set an iface up (for which the driver
uses netdev_set_num_tc in the open path, such as bnx2x) and writing to
xps_cpus in a concurrent thread. With the right timing an oops is
triggered.

Both issues have the same fix: netif_set_xps_queue, netdev_set_num_tc
and netdev_reset_tc should be mutually exclusive. We do that by taking
the rtnl lock in xps_cpus_store.

Fixes: 184c449f91 ("net: Add support for XPS with QoS via traffic classes")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-12 20:16:13 +01:00
Andrey Zhizhikin 6f99d03764 This is the 5.4.87 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl/1wNYACgkQONu9yGCS
 aT64cxAAwwt2H911zFagJCVDfLKXJ4da062n0YcJe3saGSg+mdEkSGYEDxjV6jjM
 jTzK1W5C49sQ9kzIF43YnYgdULwcXJ76G/uqFjFOlmbRzAKAYgs/3KXesa7S4cp+
 LT0fiR7uyViOw1zn4yBIeSnax8uRwT4vR1vV++ILC/7vL6hcnOBOPLxGzUKYlvJQ
 TD8ZQjeTXe5E7IhE+ztuhJQT+hZr1VERTjoktcfmlUps94uITeKdKYoCCZQ/zYIL
 IS7OgnAw5RNERHa1JUZruaGFvJORTu8wAfVtgD1VgRUZAe2ziWH6aCeDPaWaLzS5
 3U7Rc3Fyf0CRYrhe7mI1J864GIEUAe9V34sGQzaU/ap4SWpLvHbu12ePlb+nLNKF
 MZmGEd0eZuKKDSx9dlcx8hbfVg99YpI5oOeDvfCJpYx/uxNzzJhO5wkkZxweiN9s
 XTMUhhkTNkhgYdzn4Y8G9++LLAZpwOImSh3NkntoH+mSVlC+jVBbskz6PdywDjQR
 ROVpW26t5Ee6uDTrjci5cffbfje2y0r9km5/sbRWUz2YGsqYfAI3FtbH5isNUPOm
 Q6ucTd+xvmApfp9bn+XYLnbTQEGAD6mAgSmO11CIDsUJUvOTD/2cv861kATJqhXm
 01rHgohIG604vERppYC3WWFjh0cdevBvwSOpDi1LIdlgbEF6QY0=
 =q0Fm
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAl/8SgIACgkQ7G51OISz
 Hs2VKRAAml9ZwPiiMF8+9Si2UgXrC76XSe9ugeFR8w3xRYfSeE7/xH/eA3pl8iay
 zO3Z+EuF0Qd6Wo+q+8hLO7Qp6PbWfp9dqWoUA2klptVT6cyvrW3lGT65XK8oYvIO
 7ElkQYY8MCx/NLpExbgQzydJbcb7u6Gpl3s+o6eVIXEA4vewlTU5RVpUGs0YxpWR
 lFh6w1LM1bE18HNK34saIsvKYQURBWMdpj+eN1P/Ts4XSTXBv7Xw9Uf4D4berzsd
 +Xtp4AT2zmyq1iU1QRtkNXY2XyBsnMJLfQAEkbpq+hHHY644hJKU7yWiSWgfvAac
 ylY/VZ6kkEvdXsCHC8pCc/MlPMc9T3ciuHIHDRUjSXDctPcxeOdkVuNah0JG1s2l
 UQYD328Sb55zaq4oSJnG9SQVSj299yOnrRmAKLJb12cFa4wxAwXbbaP1w2baf/Ck
 PQAivRb+zp7G17Uirih3yv/UjSK9f1OcyZywU2P1srL18elfjKdJTQAS11uoZ5NQ
 vGhwlQp5VGQmdML5Z2h9V6w7G+XcEaflR5PkM6QmiuAIRk7/0uESSB+gje6sE4eA
 hGbikUI9K3BV6ppx/OlRLgbEgq0f+V2xjwDRZL7qDLsiiOUp6eWyY9K+QgAglnFH
 2XoZTDsZ+7yNQ2nRQvwhHg2hEGyjJh1dDib2yMmslDujD/w2GZA=
 =Oh8o
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.87' into 5.4-2.3.x-imx

This is the 5.4.87 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-01-11 12:52:13 +00:00
Andrey Zhizhikin 3a2ed314f3 This is the 5.4.82 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl/PSigACgkQONu9yGCS
 aT6bSw//eDCpWcnLDa1Rt4bOrnO82484ebr1PZeYPfca/3QVS59j8DsVOf6Xklmz
 z2ponI6SRFxZwO2SmXrfoiOhUVI9Kd3ohTH+LSo3ezpk0klamIf60L914RBc7QFE
 wmVgOPz5LwLxfkU5a148/H4rwLGlM9oBxVcCXpnLkN03Ul4JM/P6A/T3rFrX8ZkW
 3r4NYu3jOHgNz+irosW8zAea+jIf7ALg4Gch3ILwrbM4KSQiyXbAp0mJsY+li7HE
 BSa1RJHBXkqCwK/mWT4LWuJNf871T656kKr04/rxipRu2lEcGCPghO4DGba1mjqR
 NdnuMWBjoxetlRAbWOylWT+2ngQNx+E9hFrBxg1+js/mcHvfpeM4EuSK4YCnI7rO
 6r5JZqYdw7GGHqvy51JPLx1m+NMt8XhTp5+1vOIZhjtdNrcTMBz0kxIiGbvTwdlb
 BbO+LDjmBmQYwmTcadbBPPMRLKnvx5bbNtTAzdwkvYEC8ev5RfxebFO/StTbmVRd
 JIUKkwmNw803OjhMgs+dXVw0lX8C1nLSSROKHf4+lCGFhCDnDhos5DpKpfBIwXxP
 Xv0Uf1YA4ygFVId+kuJOoXWNBkzB6UOlKMxoU1YcuRwpZHFk8b+MvTAzaCbSSl3A
 nJT6CK3K3H6WSiF9PC8i85kFJbAJbwifjx904nGBekaqU0bgI+s=
 =Faec
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAl/8P/MACgkQ7G51OISz
 Hs19zBAAp3TVOftsZveCj5LPuqVf9ceOJe8qrTgsPJqL/j5pYvEhjiFryIBMBfBq
 tg9jj6dgsShjej8u/7jlcNbXnbDynTOEBozU1KkQspAHp0AS6r/SoiF0R3W5trK+
 sfbAsUce9wU2kgZF190Covr2Ju3xQSJ5iFXtrtpdYTlzxB3b46Rw7R3DWOYLaO41
 EzJ48IIfcjD16MAoiLYdIoBjaLm9so0JIOZJyrGCnboDnAlfI/9Ty11kGD2vqHAq
 P9g74uGbCBa68JsL8437yx2eG3mjdI1o7n/MLDelaFYNtxjzMGae7aRI2PrdelJS
 ny1Le6tdG+0L1CBoCTXHaLSTIgcyaSaQ9i/3ussQl12sAuNXiWUQJYuRbzQG+fwm
 yUBWJISv2kJxXYuCUDTrY1BWK/38HKu2CCSE8ijl9v4re0zG2+EZlr9qXUAU8ap7
 yzIYaHZ2WlXMag6lzbbaK7PqdBaFeQLNEoHT4hCNEZYHJ7peOcOZgJlm9o+3l1FP
 3LFwRTVetFYVGhSMaDSLXn5jMIQE2jgmPUtaPiTNKYI7UxXjSwZLxVgTIOQm7ldh
 248DIjUGJ6BLDRuz/2dGRjVaf7vYf2G/igDc0ySI9hMAxrk2zRROHSwdeu19Eil6
 z37MY0RDtWTEDi/bU4z10I9Al3lpXh9N2yXIo9b4//GcbZLgVjU=
 =mk34
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.82' into 5.4-2.3.x-imx

This is the 5.4.82 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-01-11 12:09:17 +00:00
Andrey Zhizhikin 25100dfc5e This is the 5.4.80 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl+8/L0ACgkQONu9yGCS
 aT624w/+M3fyTWj45qssxAOYUbWH4OPzKjMTKq1qHOGTBGYcVLxmggDV5xziQs8B
 WiCUysdJsM9Xwe/a9+fy9X2FHk7KxILf02mYLVcwyLJLXCHsCXtvBeTf937h5SaI
 cIsR1e2LQ7s1mTnVmBs2DGDQcD6Y17f/FoTpBejOSB9O+MSBNoBhOR/aaDUzzLm1
 sfpQ3zpnF6iAo2KYITxq/QkyRyiCPMl1c+/ggLTYvrM15DGhnChPN9j1+X0TLdjz
 UuZakvX/UY9vnY6oWla7wybwUzZMfFqZtehvwFA4wqeZqXcJcb+nBpfpoT1Gp9bv
 cpz+8nmF0ER1eS6m1C/XqiTr3IqDOSAHfcu80HzJRC+dmcXjxyNj+AZyFhm+uCJS
 IyUi6+mFwCypg3II2QEMNYdeips4Qj051IPNl5gEteNC4GQqXef3JdR52qIDzsHe
 9xgQVFZjVDYpZ6AOkyjqzGJ0dy3a1f7GNIPxwe6DUnbkOkOB+Z5KhGFbEOp+yGoa
 3PUnVvtrTs07VkB0afwoj7xIyfowmjxCPSSXkfnYY2iJ6FYsfCm2x/RtM5tTvgT+
 E8W71RxsyRwhjC2Z85wi6PR59XTIJcw3oJvJkrvchCAsc3Z1L7wBtjyHdvouxo8+
 h/NlGOAisTiQFdT2IixgmTZaoxE7fQLDCJDMmgZT2qPJ1hn7Pbo=
 =Puge
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAl/8NwIACgkQ7G51OISz
 Hs2NwBAAhIbyJnUomeS6SPHqEKXOnqTfa2jXyrt4fiGV+BqMtdms/ImhiaOxfUHa
 1iZjLOlX1nFVo3IuJA2kpGBfkMoH/xywEJ/61DWKZYml6jbtApTJ/mFpbWL2jxzx
 VXcXHbGmnjBlprhh8TqnMirube6+j5Al+VCdZ9aglUM4YQOiJ1QplhtzODVsz/oo
 nr6X1CQuFVi4oO36b9R3j53088zxoVbaats7NJJsj5pcceko03dfofV/C8sH2p9P
 ExnUsVsWqZ14XLGAZngwPmqZlpyAo//0lFomtyo4kpEORckPReBK38idSRQUQQGx
 5/z4h5xL98WsoNm4jl4rJrpP3Lff2wTIdxieDkpUcMAmNrHj9VW2z6Mq2lr8W8Iv
 86AZrZVaxxyxl5Im462lUH4b+DIaa9VVHSf6prHIBmRtYqVr26cCCxava7vop+LG
 qMa3i++wzEHlVNkDwfjcHnQfZ8/0wTJkEAjIDSanerfhURc0z+n+80O2S8dS2iIF
 49K57bvsbChUaw8vsoN329X3Twa4ckGofuiK9XF84YJ/VU9WFQeze7tzISSHK9eQ
 NigYm3gNyHPyK31Cj/b0XiELDWXKOiaDMplXjp1Zj6dZgY3/DKyS3gUlSOoL1EyQ
 HXdZ+xKt9x91XNFF6D+ridgEBRGr4yQ/L5YIUneBWUBhFi6v3Z0=
 =RBqz
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.80' into 5.4-2.3.x-imx

This is the 5.4.80 stable release

Conflicts (manual resolve):
- arch/arm64/boot/dts/freescale/imx8mn.dtsi:
Fix minor merge conflict where commit [8381af1b684c] in stable tree
removed one blank line.

- drivers/net/can/flexcan.c:
Fix merge fuzz during integration of stable commit [4c0a778fcf7b5].

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-01-11 11:29:40 +00:00
Andrey Zhizhikin 3b4072dd51 This is the 5.4.74 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl+elYEACgkQONu9yGCS
 aT4DfQ/+OoCvKzPm/gxmJejGNUvagBhMLXxNw62jvmLwHnagWNchXMQEoplmwpIz
 D3FeSnH6VjBj8QfXCzxZJVazuPNaiSfxrwvaboakvVnvJw66rC0LgiXUJ5MuMhmr
 YVBJ9YfA73Lpv96ySrXWdqEO6QIMgYnlR95Ep+33IBUb5x2QuQB+8ho+qQ3h6I4r
 uoVAzFLaliCpRF/Hz9pwjZjSo3zDbyYx29XVFXYkrHn8cJWE6oBZtNo+K1cyY3wH
 dNY9CXPRh4oC5G+w579m5GvnW5Ac5hTHKONNURCu9NgsEJgHfpuXXiK+ve1yS7xa
 LFj1qFuYW90scgvmcx/YSKIWkNdCGCsqLlp3OJwVDm573touy6NZOag5GW2S35iD
 GcPRvJjWHay8NJSwKteKN9YH92xBxaSWJalrIQcY4Q4VAgJpXizIxZskGieWRdYv
 2XrSAOyXfSPP3nEsRXANEC2RY38Vp6zQt5G4a5duvztNU8knRjuQijMU7vvUbjvU
 V7D+kpamoqSiEkKmPYi3ViH80BkBNaxVrh54AMW9BQiFxUum5X/8sD7PDnKg+p8R
 tPPFsFHKAyVSQQe/7VlAfDq1D9xCfgfzA4TiMYqseyBBFs4UZ1dkLBQTL7Xza9ma
 H4NrA6SQibzYXH5F8OPWFqLPye1hmzAvojhskLk6ijeCw+koLk4=
 =zfx+
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAl/8I+oACgkQ7G51OISz
 Hs0wwQ//TfFLCDI74fXraTgX8Uhcv4jelk05KVXA1cXTA4LJvoRHpDmEXXVg3dr0
 jZBpyyCO9GQZ3jJibY5hC112rttvSTCVky/Drefs9O0t5Z8VUdrYVtZ+GxBZGt2d
 ciooMYHh7aQmMoThyVqSd3rYC8KXNTwRSaAQblJqY0JnBChLjgEgIEXuJi1HJthG
 1tUy+sOI8DOs4mSDV+SFx9heC2qd0cFW2b3tnQwvT4FQ/dbEPXoi5ORkdLbrMU5Z
 iiT67i26FamT0uzGKH/RyRjpLndTF9p84IAJ/i0Qs9mnrA9/ldJKNxlGjN8Yxm+v
 k4pDBCgi1bBdPUc8ET5TnJyYSfZU654KfUpkXduOdGJb2/uOM7wh5lIUneOxHNcT
 4lwMuAj1598WYNiATLNubGmugevbfgcErut76XQ6AcA5T6Jx5BgawwFuL3G9x5h4
 J4ftWKroa432yFEt2t8MM58HNQCyE3M04iaXxFd9D9/l5pHnC5+IY+6Y1g5a3pkQ
 82kxmyTok9jjaURnWJp5q3xG4YBBOmzjVFdHb77LtuNn1wRAP2Rjh0Ux/CwVPiUr
 de8PSFvrTBcZ8KP24oGrdoFFGAYqVS06QT2lNcTYuD8vS8y/GNnIDL81gz3hdPGq
 irCoqKPD0ur6QWqNE3I1v6nCT8kReSSqjFCOF9sNTDe9Jyc7XFo=
 =3FEK
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.74' into 5.4-2.3.x-imx

This is the 5.4.74 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-01-11 10:09:44 +00:00
Andrey Zhizhikin b5636ee381 This is the 5.4.73 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl+ahE8ACgkQONu9yGCS
 aT4j1A/9HzkKKoqZ2vXYQ1/uEnUqZech9ly1KxpNTBrSZYAtx3MaWY7tGDEx2BqD
 y6iw9x4MymhHEbpwLg6YmmdWuMQLNNYJGoyLiPJgWhkE4c7zHadhNz1DcPEI8F7z
 bSlUJ3Oebr8gzv0FvUmeVXw7Z2EuOqM1zGgTAZfnKY3DkYHbLnrzUJ4AiI8TNeba
 pPIhjfIJ1TvhF+s5ggf2m8OtSWLZ0doCWCPmCFe2WyERX2WYCzPgsm0yL7L7oXME
 ZqWpOcClBsiYekBNcZ4kxozhJtArCnv24n9VoXJ/YJIlWKvCA6uC8r527nGN/z08
 dfFelj1nDs7/VrCSP4+109EjxLQnSYGgIWP0g0OsC+9wOmrQsYJ1azP1eNjm+NuC
 hPa8uYVEZxwVyJuEfu4ZB4NMZBlD2qnHoskvBKbyZ8yaVnbvlMp552XMwsmJBpCs
 8wArzabrJEz396LUUIYG829D7NBDuRav1Miu+FTzlbn+xZ/Y/S8OmhoG2stWa4wV
 y5x0M0DWgrqiZ9rMkz9A03UNnCInQVTfIBoMl63xFitW4/0vLsln3+CjzlKm7H46
 rD/tKACUoCDjR5DN+JwQzmTdL9zBb4p1cXwWjWb6rON3BkXmO0JVAxzurxI9PfX0
 ZWDydZ3HNmrm0d3J12zf3kTX56PfPFAGWUsEc4Ntb5zdWXSQJsE=
 =fZ3T
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.73' into 5.4-2.3.x-imx

This is the 5.4.73 stable release

Conflicts:
- arch/arm/boot/dts/imx6sl.dtsi:
Commit [a1767c9019] in NXP tree is now covered with commit [5c4c2f437c]
from upstream.

- drivers/gpu/drm/mxsfb/mxsfb_drv.c:
Resolve merge hunk for patch [ed8b90d303] from upstream

- drivers/media/i2c/ov5640.c:
Patch [aa4bb8b883] in NXP tree is now covered by patches [79ec0578c7]
and [b2f8546056] from upstream. Changes from NXP patch [99aa4c8c18] are
covered in upstream version as well.

- drivers/net/ethernet/freescale/fec_main.c:
Fix merge fuzz for patch [9e70485b40] from upstream.

- drivers/usb/cdns3/gadget.c:
Keep NXP version of the file, upstream version is not compatible.

- drivers/usb/dwc3/core.c:
- drivers/usb/dwc3/core.h:
Fix merge fuzz of patch [08045050c6] together wth NXP patch [b30e41dc1e]

- sound/soc/fsl/fsl_sai.c:
- sound/soc/fsl/fsl_sai.h:
Commit [2ea70e51eb72a] in NXP tree is now covered with commit [1ad7f52fe6]
from upstream.

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-01-11 10:09:27 +00:00
Andrey Zhizhikin 9e9365cbcf This is the 5.4.71 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl+Gt9kACgkQONu9yGCS
 aT4qAg//ecjVetf6vClqaA6jNWoVHeyuSxJKNWdLWq0XkQgYInuya8irLXoGwY1j
 UTTOvbFT+UwJ1N9DWIB5cLaEkYNLcGA9YYExtcVA6YUfdlhWQw5zcyovIXcw4jHx
 Ma0O2usPE/7Rb9O2+3O8t4jr1YF9C7iRkY82FJJIbDa6GbFQA6hGZ4mHfnjh1l84
 owgSjZ1Yy2HU2uUzX8hA6dXZeIu+SQMk5E2nQSm/DAPhDwbIqPb5Rx9UYqCXafiA
 1c9Cj7RWKopPP9gxNSOzYfLVzOr0YHoFm5uMxtz4apzpNhl/j5CTGphFOnY6SuJs
 BWrRK9D47PGtds5IJ1MslVVb1i1tt0in0RJsNuYV35CXCoJDuaIzaQPJBlpyvix0
 ZialpH+nI3Z1yy7uzVSvrAK11AMwq+79VG/byHht02YVZycHOt7e4wRep3KjpQQq
 uJHapB5djGhPkZypgHOak9Tw1A/snwxC4yR2Xl+Cqn46igIJ8xlgnuey1AmT7pzi
 dSEiJoC7xzHUFildfzrNWkZwIffBFYVPJGfPFyRpyvNc2mOW3S9bwbX0NptE0qSQ
 YzOxQIfqa43TOBJNKZdgHGDrpnPHTESzO63BurQ3fpUI2ex7XjWq92zrIwp23reJ
 9Y/cpELW/paL/dg3ZFYp/wPoOoAh/84GICPXLZPREJkKf0WWey0=
 =1GCf
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdQaENiSDAlGTDEbB7G51OISzHs0FAl/8FbcACgkQ7G51OISz
 Hs22Gw/9HbY9ScdN+FImTSgR+Hmjc+HShBhv3eAF3wvkZuGloh62OrjyDYhvo3mo
 TBTpLkklaSGBp68vXvkH5qpa+RNoM9FBn3iXqjjBY11czU5MKHJczAKPJp3YYfTv
 X+49Kc0vlT/msX8yWe+0kLOkQrRjOBRAAzAqhJwoZ7RZC3/Ikta/4/1xTxHeFb1j
 kuo/TJQWca2NEtzEf3oFuiHsh9CJYDUJIPSGl4yd6R8z/mNqEH9ytR4oyhpedcl/
 VZv/6npNZ8G8OqFLOc4tddsXgxMYj6yVpSDtysJdEM4Vbrf9hLPZvKXc5dsttl7Z
 +ah7afTYBT9entCYRdNxnR69R+gVu0SilMKrI9+DG3s16ADJyppG5qDSUkEvdwtR
 M9nBlxgpx7oxHV8WNicXfAz2+s3QYtcLUs6k5hMMv7WYg9Rplzd8MmDfqqbEHI/j
 wIgxRechQ9UB9efrmHk1tWTwx3tymV573Dpms7LXkeP3gwbNCcA6Hce9dasYnMhT
 nfiFr164bPV7kqsqFYVXl3i/8ibAN5X784mUq9qDAKBtI8kEd3z0R9a4Tk4pohwG
 U15jzSiZoHYnpcDTszzUlS50YAcvkLuYscwj+aCO/uXGBAxuAwKsfZ4KoFrtAFcl
 J0X9bmnJ2+eWAZqnUrXWDvbFp+OEoFAwxRGIoxxsPKsI1uR06PY=
 =fy0V
 -----END PGP SIGNATURE-----

Merge tag 'v5.4.71' into 5.4-2.3.x-imx

This is the 5.4.71 stable release

Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
2021-01-11 09:09:08 +00:00
Dongdong Wang ca49d919d7 lwt: Disable BH too in run_lwt_bpf()
[ Upstream commit d9054a1ff585ba01029584ab730efc794603d68f ]

The per-cpu bpf_redirect_info is shared among all skb_do_redirect()
and BPF redirect helpers. Callers on RX path are all in BH context,
disabling preemption is not sufficient to prevent BH interruption.

In production, we observed strange packet drops because of the race
condition between LWT xmit and TC ingress, and we verified this issue
is fixed after we disable BH.

Although this bug was technically introduced from the beginning, that
is commit 3a0af8fd61 ("bpf: BPF for lightweight tunnel infrastructure"),
at that time call_rcu() had to be call_rcu_bh() to match the RCU context.
So this patch may not work well before RCU flavor consolidation has been
completed around v5.0.

Update the comments above the code too, as call_rcu() is now BH friendly.

Signed-off-by: Dongdong Wang <wangdongdong.6@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Link: https://lore.kernel.org/bpf/20201205075946.497763-1-xiyou.wangcong@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30 11:51:30 +01:00
Davide Caratti ba203b92a8 net: skbuff: ensure LSE is pullable before decrementing the MPLS ttl
[ Upstream commit 13de4ed9e3a9ccbe54d05f7d5c773f69ecaf6c64 ]

skb_mpls_dec_ttl() reads the LSE without ensuring that it is contained in
the skb "linear" area. Fix this calling pskb_may_pull() before reading the
current ttl.

Found by code inspection.

Fixes: 2a2ea50870 ("net: sched: add mpls manipulation actions to TC")
Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/r/53659f28be8bc336c113b5254dc637cc76bbae91.1606987074.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-08 10:40:26 +01:00
Willem de Bruijn 9a62c8229c sock: set sk_err to ee_errno on dequeue from errq
[ Upstream commit 985f7337421a811cb354ca93882f943c8335a6f5 ]

When setting sk_err, set it to ee_errno, not ee_origin.

Commit f5f99309fa ("sock: do not set sk_err in
sock_dequeue_err_skb") disabled updating sk_err on errq dequeue,
which is correct for most error types (origins):

  -       sk->sk_err = err;

Commit 38b257938a ("sock: reset sk_err when the error queue is
empty") reenabled the behavior for IMCP origins, which do require it:

  +       if (icmp_next)
  +               sk->sk_err = SKB_EXT_ERR(skb_next)->ee.ee_origin;

But read from ee_errno.

Fixes: 38b257938a ("sock: reset sk_err when the error queue is empty")
Reported-by: Ayush Ranjan <ayushranjan@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Link: https://lore.kernel.org/r/20201126151220.2819322-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-08 10:40:23 +01:00
Parav Pandit 99b5382bff devlink: Hold rtnl lock while reading netdev attributes
[ Upstream commit b187c9b4178b87954dbc94e78a7094715794714f ]

A netdevice of a devlink port can be moved to different net namespace
than its parent devlink instance.
This scenario occurs when devlink reload is not used.

When netdevice is undergoing migration to net namespace, its ifindex
and name may change.

In such use case, devlink port query may read stale netdev attributes.

Fix it by reading them under rtnl lock.

Fixes: bfcd3a4661 ("Introduce devlink infrastructure")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-08 10:40:23 +01:00
John Fastabend 95fafa1cb7 bpf, sockmap: Avoid returning unneeded EAGAIN when redirecting to self
[ Upstream commit 6fa9201a898983da731fca068bb4b5c941537588 ]

If a socket redirects to itself and it is under memory pressure it is
possible to get a socket stuck so that recv() returns EAGAIN and the
socket can not advance for some time. This happens because when
redirecting a skb to the same socket we received the skb on we first
check if it is OK to enqueue the skb on the receiving socket by checking
memory limits. But, if the skb is itself the object holding the memory
needed to enqueue the skb we will keep retrying from kernel side
and always fail with EAGAIN. Then userspace will get a recv() EAGAIN
error if there are no skbs in the psock ingress queue. This will continue
until either some skbs get kfree'd causing the memory pressure to
reduce far enough that we can enqueue the pending packet or the
socket is destroyed. In some cases its possible to get a socket
stuck for a noticeable amount of time if the socket is only receiving
skbs from sk_skb verdict programs. To reproduce I make the socket
memory limits ridiculously low so sockets are always under memory
pressure. More often though if under memory pressure it looks like
a spurious EAGAIN error on user space side causing userspace to retry
and typically enough has moved on the memory side that it works.

To fix skip memory checks and skb_orphan if receiving on the same
sock as already assigned.

For SK_PASS cases this is easy, its always the same socket so we
can just omit the orphan/set_owner pair.

For backlog cases we need to check skb->sk and decide if the orphan
and set_owner pair are needed.

Fixes: 51199405f9 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/160556572660.73229.12566203819812939627.stgit@john-XPS-13-9370
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-24 13:29:19 +01:00
John Fastabend a9f3670728 bpf, sockmap: Use truesize with sk_rmem_schedule()
[ Upstream commit 70796fb751f1d34cc650e640572a174faf009cd4 ]

We use skb->size with sk_rmem_scheduled() which is not correct. Instead
use truesize to align with socket and tcp stack usage of sk_rmem_schedule.

Suggested-by: Daniel Borkman <daniel@iogearbox.net>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/160556570616.73229.17003722112077507863.stgit@john-XPS-13-9370
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-24 13:29:19 +01:00
John Fastabend e8b1de6975 bpf, sockmap: On receive programs try to fast track SK_PASS ingress
[ Upstream commit 9ecbfb06a078c4911fb444203e8e41d93d22f886 ]

When we receive an skb and the ingress skb verdict program returns
SK_PASS we currently set the ingress flag and put it on the workqueue
so it can be turned into a sk_msg and put on the sk_msg ingress queue.
Then finally telling userspace with data_ready hook.

Here we observe that if the workqueue is empty then we can try to
convert into a sk_msg type and call data_ready directly without
bouncing through a workqueue. Its a common pattern to have a recv
verdict program for visibility that always returns SK_PASS. In this
case unless there is an ENOMEM error or we overrun the socket we
can avoid the workqueue completely only using it when we fall back
to error cases caused by memory pressure.

By doing this we eliminate another case where data may be dropped
if errors occur on memory limits in workqueue.

Fixes: 51199405f9 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/160226859704.5692.12929678876744977669.stgit@john-Precision-5820-Tower
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-24 13:29:19 +01:00
John Fastabend 329c84430a bpf, sockmap: Skb verdict SK_PASS to self already checked rmem limits
[ Upstream commit cfea28f890cf292d5fe90680db64b68086ef25ba ]

For sk_skb case where skb_verdict program returns SK_PASS to continue to
pass packet up the stack, the memory limits were already checked before
enqueuing in skb_queue_tail from TCP side. So, lets remove the extra checks
here. The theory is if the TCP stack believes we have memory to receive
the packet then lets trust the stack and not double check the limits.

In fact the accounting here can cause a drop if sk_rmem_alloc has increased
after the stack accepted this packet, but before the duplicate check here.
And worse if this happens because TCP stack already believes the data has
been received there is no retransmit.

Fixes: 51199405f9 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/160226857664.5692.668205469388498375.stgit@john-Precision-5820-Tower
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-24 13:29:18 +01:00
John Fastabend 9df3884a4d bpf, sockmap: Ensure SO_RCVBUF memory is observed on ingress redirect
[ Upstream commit 36cd0e696a832a00247fca522034703566ac8885 ]

Fix sockmap sk_skb programs so that they observe sk_rcvbuf limits. This
allows users to tune SO_RCVBUF and sockmap will honor them.

We can refactor the if(charge) case out in later patches. But, keep this
fix to the point.

Fixes: 51199405f9 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/160556568657.73229.8404601585878439060.stgit@john-XPS-13-9370
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-24 13:29:08 +01:00
Florian Fainelli 99ddc32116 net: Have netpoll bring-up DSA management interface
[ Upstream commit 1532b9778478577152201adbafa7738b1e844868 ]

DSA network devices rely on having their DSA management interface up and
running otherwise their ndo_open() will return -ENETDOWN. Without doing
this it would not be possible to use DSA devices as netconsole when
configured on the command line. These devices also do not utilize the
upper/lower linking so the check about the netpoll device having upper
is not going to be a problem.

The solution adopted here is identical to the one done for
net/ipv4/ipconfig.c with 728c02089a ("net: ipv4: handle DSA enabled
master network devices"), with the network namespace scope being
restricted to that of the process configuring netpoll.

Fixes: 04ff53f96a ("net: dsa: Add netconsole support")
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20201117035236.22658-1-f.fainelli@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-24 13:28:57 +01:00
Jeff Dike e5fe052c06 Exempt multicast addresses from five-second neighbor lifetime
[ Upstream commit 8cf8821e15cd553339a5b48ee555a0439c2b2742 ]

Commit 58956317c8 ("neighbor: Improve garbage collection")
guarantees neighbour table entries a five-second lifetime.  Processes
which make heavy use of multicast can fill the neighour table with
multicast addresses in five seconds.  At that point, neighbour entries
can't be GC-ed because they aren't five seconds old yet, the kernel
log starts to fill up with "neighbor table overflow!" messages, and
sends start to fail.

This patch allows multicast addresses to be thrown out before they've
lived out their five seconds.  This makes room for non-multicast
addresses and makes messages to all addresses more reliable in these
circumstances.

Fixes: 58956317c8 ("neighbor: Improve garbage collection")
Signed-off-by: Jeff Dike <jdike@akamai.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20201113015815.31397-1-jdike@akamai.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-24 13:28:56 +01:00
Wang Hai 2894a07110 devlink: Add missing genlmsg_cancel() in devlink_nl_sb_port_pool_fill()
[ Upstream commit 849920c703392957f94023f77ec89ca6cf119d43 ]

If sb_occ_port_pool_get() failed in devlink_nl_sb_port_pool_fill(),
msg should be canceled by genlmsg_cancel().

Fixes: df38dafd25 ("devlink: implement shared buffer occupancy monitoring interface")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Link: https://lore.kernel.org/r/20201113111622.11040-1-wanghai38@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-24 13:28:55 +01:00
Christian Eggers a45c8c0a31 socket: don't clear SOCK_TSTAMP_NEW when SO_TIMESTAMPNS is disabled
commit 4e3bbb33e6f36e4b05be1b1b9b02e3dd5aaa3e69 upstream.

SOCK_TSTAMP_NEW (timespec64 instead of timespec) is also used for
hardware time stamps (configured via SO_TIMESTAMPING_NEW).

User space (ptp4l) first configures hardware time stamping via
SO_TIMESTAMPING_NEW which sets SOCK_TSTAMP_NEW. In the next step, ptp4l
disables SO_TIMESTAMPNS(_NEW) (software time stamps), but this must not
switch hardware time stamps back to "32 bit mode".

This problem happens on 32 bit platforms were the libc has already
switched to struct timespec64 (from SO_TIMExxx_OLD to SO_TIMExxx_NEW
socket options). ptp4l complains with "missing timestamp on transmitted
peer delay request" because the wrong format is received (and
discarded).

Fixes: 887feae36a ("socket: Add SO_TIMESTAMP[NS]_NEW")
Fixes: 783da70e8396 ("net: add sock_enable_timestamps")
Signed-off-by: Christian Eggers <ceggers@arri.de>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-01 12:01:01 +01:00
Ke Li 09ea22aa36 net: Properly typecast int values to set sk_max_pacing_rate
[ Upstream commit 700465fd338fe5df08a1b2e27fa16981f562547f ]

In setsockopt(SO_MAX_PACING_RATE) on 64bit systems, sk_max_pacing_rate,
after extended from 'u32' to 'unsigned long', takes unintentionally
hiked value whenever assigned from an 'int' value with MSB=1, due to
binary sign extension in promoting s32 to u64, e.g. 0x80000000 becomes
0xFFFFFFFF80000000.

Thus inflated sk_max_pacing_rate causes subsequent getsockopt to return
~0U unexpectedly. It may also result in increased pacing rate.

Fix by explicitly casting the 'int' value to 'unsigned int' before
assigning it to sk_max_pacing_rate, for zero extension to happen.

Fixes: 76a9ebe811 ("net: extend sk_pacing_rate to unsigned long")
Signed-off-by: Ji Li <jli@akamai.com>
Signed-off-by: Ke Li <keli@akamai.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20201022064146.79873-1-keli@akamai.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-29 09:57:26 +01:00
Christian Eggers c4099221db socket: fix option SO_TIMESTAMPING_NEW
[ Upstream commit 59e611a566e7cd48cf54b6777a11fe3f9c2f9db5 ]

The comparison of optname with SO_TIMESTAMPING_NEW is wrong way around,
so SOCK_TSTAMP_NEW will first be set and than reset again. Additionally
move it out of the test for SOF_TIMESTAMPING_RX_SOFTWARE as this seems
unrelated.

This problem happens on 32 bit platforms were the libc has already
switched to struct timespec64 (from SO_TIMExxx_OLD to SO_TIMExxx_NEW
socket options). ptp4l complains with "missing timestamp on transmitted
peer delay request" because the wrong format is received (and
discarded).

Fixes: 9718475e69 ("socket: Add SO_TIMESTAMPING_NEW")
Signed-off-by: Christian Eggers <ceggers@arri.de>
Reviewed-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Reviewed-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-29 09:57:24 +01:00
Guillaume Nault a42dbd059e net/core: check length before updating Ethertype in skb_mpls_{push,pop}
commit 4296adc3e32f5d544a95061160fe7e127be1b9ff upstream.

Openvswitch allows to drop a packet's Ethernet header, therefore
skb_mpls_push() and skb_mpls_pop() might be called with ethernet=true
and mac_len=0. In that case the pointer passed to skb_mod_eth_type()
doesn't point to an Ethernet header and the new Ethertype is written at
unexpected locations.

Fix this by verifying that mac_len is big enough to contain an Ethernet
header.

Fixes: fa4e0f8855 ("net/sched: fix corrupted L2 header with MPLS 'push' and 'pop' actions")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-14 10:33:06 +02:00
Jason Liu 2f68e5475b Merge tag 'v5.4.70' into imx_5.4.y
* tag 'v5.4.70': (3051 commits)
  Linux 5.4.70
  netfilter: ctnetlink: add a range check for l3/l4 protonum
  ep_create_wakeup_source(): dentry name can change under you...
  ...

 Conflicts:
	arch/arm/mach-imx/pm-imx6.c
	arch/arm64/boot/dts/freescale/imx8mm-evk.dts
	arch/arm64/boot/dts/freescale/imx8mn-ddr4-evk.dts
	drivers/crypto/caam/caamalg.c
	drivers/gpu/drm/imx/dw_hdmi-imx.c
	drivers/gpu/drm/imx/imx-ldb.c
	drivers/gpu/drm/imx/ipuv3/ipuv3-crtc.c
	drivers/mmc/host/sdhci-esdhc-imx.c
	drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
	drivers/net/ethernet/freescale/enetc/enetc.c
	drivers/net/ethernet/freescale/enetc/enetc_pf.c
	drivers/thermal/imx_thermal.c
	drivers/usb/cdns3/ep0.c
	drivers/xen/swiotlb-xen.c
	sound/soc/fsl/fsl_esai.c
	sound/soc/fsl/fsl_sai.c

Signed-off-by: Jason Liu <jason.hui.liu@nxp.com>
2020-10-08 17:46:51 +08:00
Daniel Borkmann 5e77009e33 bpf: Fix clobbering of r2 in bpf_gen_ld_abs
[ Upstream commit e6a18d36118bea3bf497c9df4d9988b6df120689 ]

Bryce reported that he saw the following with:

  0:  r6 = r1
  1:  r1 = 12
  2:  r0 = *(u16 *)skb[r1]

The xlated sequence was incorrectly clobbering r2 with pointer
value of r6 ...

  0: (bf) r6 = r1
  1: (b7) r1 = 12
  2: (bf) r1 = r6
  3: (bf) r2 = r1
  4: (85) call bpf_skb_load_helper_16_no_cache#7692160

... and hence call to the load helper never succeeded given the
offset was too high. Fix it by reordering the load of r6 to r1.

Other than that the insn has similar calling convention than BPF
helpers, that is, r0 - r5 are scratch regs, so nothing else
affected after the insn.

Fixes: e0cea7ce98 ("bpf: implement ld_abs/ld_ind in native bpf")
Reported-by: Bryce Kahle <bryce.kahle@datadoghq.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/cace836e4d07bb63b1a53e49c5dfb238a040c298.1599512096.git.daniel@iogearbox.net
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01 13:18:17 +02:00
Aya Levin 9325e9e5ab devlink: Fix reporter's recovery condition
[ Upstream commit bea0c5c942d3b4e9fb6ed45f6a7de74c6b112437 ]

Devlink health core conditions the reporter's recovery with the
expiration of the grace period. This is not relevant for the first
recovery. Explicitly demand that the grace period will only apply to
recoveries other than the first.

Fixes: c8e1da0bf9 ("devlink: Add health report functionality")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01 13:17:58 +02:00
Vasily Averin 2ba309f086 neigh_stat_seq_next() should increase position index
[ Upstream commit 1e3f9f073c47bee7c23e77316b07bc12338c5bba ]

if seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output.

https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01 13:17:24 +02:00
Ido Schimmel d3c2188ee6 net: Fix bridge enslavement failure
[ Upstream commit e1b9efe6baebe79019a2183176686a0e709388ae ]

When a netdev is enslaved to a bridge, its parent identifier is queried.
This is done so that packets that were already forwarded in hardware
will not be forwarded again by the bridge device between netdevs
belonging to the same hardware instance.

The operation fails when the netdev is an upper of netdevs with
different parent identifiers.

Instead of failing the enslavement, have dev_get_port_parent_id() return
'-EOPNOTSUPP' which will signal the bridge to skip the query operation.
Other callers of the function are not affected by this change.

Fixes: 7e1146e8c1 ("net: devlink: introduce devlink_compat_switch_id_get() helper")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reported-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-26 18:03:13 +02:00
David Ahern 583ff79349 ipv4: Initialize flowi4_multipath_hash in data path
[ Upstream commit 1869e226a7b3ef75b4f70ede2f1b7229f7157fa4 ]

flowi4_multipath_hash was added by the commit referenced below for
tunnels. Unfortunately, the patch did not initialize the new field
for several fast path lookups that do not initialize the entire flow
struct to 0. Fix those locations. Currently, flowi4_multipath_hash
is random garbage and affects the hash value computed by
fib_multipath_hash for multipath selection.

Fixes: 24ba14406c ("route: Add multipath_hash in flowi_common to make user-define hash")
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: wenxu <wenxu@ucloud.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-26 18:03:12 +02:00
Miaohe Lin 78607d494c net: handle the return value of pskb_carve_frag_list() correctly
commit eabe861881a733fc84f286f4d5a1ffaddd4f526f upstream.

pskb_carve_frag_list() may return -ENOMEM in pskb_carve_inside_nonlinear().
we should handle this correctly or we would get wrong sk_buff.

Fixes: 6fa01ccd88 ("skbuff: Add pskb_extract() helper function")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-23 12:40:34 +02:00
Jakub Kicinski ddb279d64b net: disable netpoll on fresh napis
[ Upstream commit 96e97bc07e90f175a8980a22827faf702ca4cb30 ]

napi_disable() makes sure to set the NAPI_STATE_NPSVC bit to prevent
netpoll from accessing rings before init is complete. However, the
same is not done for fresh napi instances in netif_napi_add(),
even though we expect NAPI instances to be added as disabled.

This causes crashes during driver reconfiguration (enabling XDP,
changing the channel count) - if there is any printk() after
netif_napi_add() but before napi_enable().

To ensure memory ordering is correct we need to use RCU accessors.

Reported-by: Rob Sherwood <rsher@fb.com>
Fixes: 2d8bff1269 ("netpoll: Close race condition between poll_one_napi and napi_disable")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-12 14:18:55 +02:00
Alexander Lobakin de53545e8d net: core: use listified Rx for GRO_NORMAL in napi_gro_receive()
commit 6570bc79c0dfff0f228b7afd2de720fb4e84d61d upstream.

Commit 323ebb61e3 ("net: use listified RX for handling GRO_NORMAL
skbs") made use of listified skb processing for the users of
napi_gro_frags().
The same technique can be used in a way more common napi_gro_receive()
to speed up non-merged (GRO_NORMAL) skbs for a wide range of drivers
including gro_cells and mac80211 users.
This slightly changes the return value in cases where skb is being
dropped by the core stack, but it seems to have no impact on related
drivers' functionality.
gro_normal_batch is left untouched as it's very individual for every
single system configuration and might be tuned in manual order to
achieve an optimal performance.

Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Hyunsoon Kim <h10.kim@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-09 19:12:31 +02:00
Miaohe Lin 6ed8917675 net: Fix potential wrong skb->protocol in skb_vlan_untag()
[ Upstream commit 55eff0eb7460c3d50716ed9eccf22257b046ca92 ]

We may access the two bytes after vlan_hdr in vlan_set_encap_proto(). So
we should pull VLAN_HLEN + sizeof(unsigned short) in skb_vlan_untag() or
we may access the wrong data.

Fixes: 0d5501c1c8 ("net: Always untag vlan-tagged traffic on input.")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-03 11:26:39 +02:00
John Fastabend 6e0bc946cb bpf: sock_ops sk access may stomp registers when dst_reg = src_reg
[ Upstream commit 84f44df664e9f0e261157e16ee1acd77cc1bb78d ]

Similar to patch ("bpf: sock_ops ctx access may stomp registers") if the
src_reg = dst_reg when reading the sk field of a sock_ops struct we
generate xlated code,

  53: (61) r9 = *(u32 *)(r9 +28)
  54: (15) if r9 == 0x0 goto pc+3
  56: (79) r9 = *(u64 *)(r9 +0)

This stomps on the r9 reg to do the sk_fullsock check and then when
reading the skops->sk field instead of the sk pointer we get the
sk_fullsock. To fix use similar pattern noted in the previous fix
and use the temp field to save/restore a register used to do
sk_fullsock check.

After the fix the generated xlated code reads,

  52: (7b) *(u64 *)(r9 +32) = r8
  53: (61) r8 = *(u32 *)(r9 +28)
  54: (15) if r9 == 0x0 goto pc+3
  55: (79) r8 = *(u64 *)(r9 +32)
  56: (79) r9 = *(u64 *)(r9 +0)
  57: (05) goto pc+1
  58: (79) r8 = *(u64 *)(r9 +32)

Here r9 register was in-use so r8 is chosen as the temporary register.
In line 52 r8 is saved in temp variable and at line 54 restored in case
fullsock != 0. Finally we handle fullsock == 0 case by restoring at
line 58.

This adds a new macro SOCK_OPS_GET_SK it is almost possible to merge
this with SOCK_OPS_GET_FIELD, but I found the extra branch logic a
bit more confusing than just adding a new macro despite a bit of
duplicating code.

Fixes: 1314ef5611 ("bpf: export bpf_sock for BPF_PROG_TYPE_SOCK_OPS prog type")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/159718349653.4728.6559437186853473612.stgit@john-Precision-5820-Tower
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-26 10:40:59 +02:00
Kees Cook 2816386853 net/compat: Add missing sock updates for SCM_RIGHTS
commit d9539752d23283db4692384a634034f451261e29 upstream.

Add missed sock updates to compat path via a new helper, which will be
used more in coming patches. (The net/core/scm.c code is left as-is here
to assist with -stable backports for the compat path.)

Cc: Christoph Hellwig <hch@lst.de>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 48a87cc26c ("net: netprio: fd passed in SCM_RIGHTS datagram not set correctly")
Fixes: d84295067f ("net: net_cls: fd passed in SCM_RIGHTS datagram not set correctly")
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-21 13:05:25 +02:00
Miaohe Lin 02618095ab net: Fix potential memory leak in proto_register()
[ Upstream commit 0f5907af39137f8183ed536aaa00f322d7365130 ]

If we failed to assign proto idx, we free the twsk_slab_name but forget to
free the twsk_slab. Add a helper function tw_prot_cleanup() to free these
together and also use this helper function in proto_unregister().

Fixes: b45ce32135 ("sock: fix potential memory leak in proto_register()")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-19 08:16:22 +02:00
Lorenz Bauer ca7ace8fd2 bpf: sockmap: Require attach_bpf_fd when detaching a program
commit bb0de3131f4c60a9bf976681e0fe4d1e55c7a821 upstream.

The sockmap code currently ignores the value of attach_bpf_fd when
detaching a program. This is contrary to the usual behaviour of
checking that attach_bpf_fd represents the currently attached
program.

Ensure that attach_bpf_fd is indeed the currently attached
program. It turns out that all sockmap selftests already do this,
which indicates that this is unlikely to cause breakage.

Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200629095630.7933-5-lmb@cloudflare.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-07 09:34:02 +02:00
Kuniyuki Iwashima 6735c126d2 udp: Copy has_conns in reuseport_grow().
[ Upstream commit f2b2c55e512879a05456eaf5de4d1ed2f7757509 ]

If an unconnected socket in a UDP reuseport group connect()s, has_conns is
set to 1. Then, when a packet is received, udp[46]_lib_lookup2() scans all
sockets in udp_hslot looking for the connected socket with the highest
score.

However, when the number of sockets bound to the port exceeds max_socks,
reuseport_grow() resets has_conns to 0. It can cause udp[46]_lib_lookup2()
to return without scanning all sockets, resulting in that packets sent to
connected sockets may be distributed to unconnected sockets.

Therefore, reuseport_grow() should copy has_conns.

Fixes: acdcecc612 ("udp: correct reuseport selection with connected sockets")
CC: Willem de Bruijn <willemb@google.com>
Reviewed-by: Benjamin Herrenschmidt <benh@amazon.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-31 18:39:31 +02:00
Weilong Chen 01c9283506 rtnetlink: Fix memory(net_device) leak when ->newlink fails
[ Upstream commit cebb69754f37d68e1355a5e726fdac317bcda302 ]

When vlan_newlink call register_vlan_dev fails, it might return error
with dev->reg_state = NETREG_UNREGISTERED. The rtnl_newlink should
free the memory. But currently rtnl_newlink only free the memory which
state is NETREG_UNINITIALIZED.

BUG: memory leak
unreferenced object 0xffff8881051de000 (size 4096):
  comm "syz-executor139", pid 560, jiffies 4294745346 (age 32.445s)
  hex dump (first 32 bytes):
    76 6c 61 6e 32 00 00 00 00 00 00 00 00 00 00 00  vlan2...........
    00 45 28 03 81 88 ff ff 00 00 00 00 00 00 00 00  .E(.............
  backtrace:
    [<0000000047527e31>] kmalloc_node include/linux/slab.h:578 [inline]
    [<0000000047527e31>] kvmalloc_node+0x33/0xd0 mm/util.c:574
    [<000000002b59e3bc>] kvmalloc include/linux/mm.h:753 [inline]
    [<000000002b59e3bc>] kvzalloc include/linux/mm.h:761 [inline]
    [<000000002b59e3bc>] alloc_netdev_mqs+0x83/0xd90 net/core/dev.c:9929
    [<000000006076752a>] rtnl_create_link+0x2c0/0xa20 net/core/rtnetlink.c:3067
    [<00000000572b3be5>] __rtnl_newlink+0xc9c/0x1330 net/core/rtnetlink.c:3329
    [<00000000e84ea553>] rtnl_newlink+0x66/0x90 net/core/rtnetlink.c:3397
    [<0000000052c7c0a9>] rtnetlink_rcv_msg+0x540/0x990 net/core/rtnetlink.c:5460
    [<000000004b5cb379>] netlink_rcv_skb+0x12b/0x3a0 net/netlink/af_netlink.c:2469
    [<00000000c71c20d3>] netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
    [<00000000c71c20d3>] netlink_unicast+0x4c6/0x690 net/netlink/af_netlink.c:1329
    [<00000000cca72fa9>] netlink_sendmsg+0x735/0xcc0 net/netlink/af_netlink.c:1918
    [<000000009221ebf7>] sock_sendmsg_nosec net/socket.c:652 [inline]
    [<000000009221ebf7>] sock_sendmsg+0x109/0x140 net/socket.c:672
    [<000000001c30ffe4>] ____sys_sendmsg+0x5f5/0x780 net/socket.c:2352
    [<00000000b71ca6f3>] ___sys_sendmsg+0x11d/0x1a0 net/socket.c:2406
    [<0000000007297384>] __sys_sendmsg+0xeb/0x1b0 net/socket.c:2439
    [<000000000eb29b11>] do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359
    [<000000006839b4d0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: cb626bf566eb ("net-sysfs: Fix reference count leak")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-31 18:39:30 +02:00
Xiongfeng Wang 274b40b6df net-sysfs: add a newline when printing 'tx_timeout' by sysfs
[ Upstream commit 9bb5fbea59f36a589ef886292549ca4052fe676c ]

When I cat 'tx_timeout' by sysfs, it displays as follows. It's better to
add a newline for easy reading.

root@syzkaller:~# cat /sys/devices/virtual/net/lo/queues/tx-0/tx_timeout
0root@syzkaller:~#

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-31 18:39:30 +02:00
Subash Abhinov Kasiviswanathan d109acd580 dev: Defer free of skbs in flush_backlog
[ Upstream commit 7df5cb75cfb8acf96c7f2342530eb41e0c11f4c3 ]

IRQs are disabled when freeing skbs in input queue.
Use the IRQ safe variant to free skbs here.

Fixes: 145dd5f9c8 ("net: flush the softnet backlog in process context")
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-31 18:39:30 +02:00
Cong Wang 94886c86e8 cgroup: fix cgroup_sk_alloc() for sk_clone_lock()
[ Upstream commit ad0f75e5f57ccbceec13274e1e242f2b5a6397ed ]

When we clone a socket in sk_clone_lock(), its sk_cgrp_data is
copied, so the cgroup refcnt must be taken too. And, unlike the
sk_alloc() path, sock_update_netprioidx() is not called here.
Therefore, it is safe and necessary to grab the cgroup refcnt
even when cgroup_sk_alloc is disabled.

sk_clone_lock() is in BH context anyway, the in_interrupt()
would terminate this function if called there. And for sk_alloc()
skcd->val is always zero. So it's safe to factor out the code
to make it more readable.

The global variable 'cgroup_sk_alloc_disabled' is used to determine
whether to take these reference counts. It is impossible to make
the reference counting correct unless we save this bit of information
in skcd->val. So, add a new bit there to record whether the socket
has already taken the reference counts. This obviously relies on
kmalloc() to align cgroup pointers to at least 4 bytes,
ARCH_KMALLOC_MINALIGN is certainly larger than that.

This bug seems to be introduced since the beginning, commit
d979a39d72 ("cgroup: duplicate cgroup reference when cloning sockets")
tried to fix it but not compeletely. It seems not easy to trigger until
the recent commit 090e28b229af
("netprio_cgroup: Fix unlimited memory leak of v2 cgroups") was merged.

Fixes: bd1060a1d6 ("sock, cgroup: add sock->sk_cgroup")
Reported-by: Cameron Berkenpas <cam@neo-zeon.de>
Reported-by: Peter Geis <pgwipeout@gmail.com>
Reported-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reported-by: Daniël Sonck <dsonck92@gmail.com>
Reported-by: Zhang Qiang <qiang.zhang@windriver.com>
Tested-by: Cameron Berkenpas <cam@neo-zeon.de>
Tested-by: Peter Geis <pgwipeout@gmail.com>
Tested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Zefan Li <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-22 09:32:49 +02:00
Toke Høiland-Jørgensen 9b7fd81cf9 sched: consistently handle layer3 header accesses in the presence of VLANs
[ Upstream commit d7bf2ebebc2bd61ab95e2a8e33541ef282f303d4 ]

There are a couple of places in net/sched/ that check skb->protocol and act
on the value there. However, in the presence of VLAN tags, the value stored
in skb->protocol can be inconsistent based on whether VLAN acceleration is
enabled. The commit quoted in the Fixes tag below fixed the users of
skb->protocol to use a helper that will always see the VLAN ethertype.

However, most of the callers don't actually handle the VLAN ethertype, but
expect to find the IP header type in the protocol field. This means that
things like changing the ECN field, or parsing diffserv values, stops
working if there's a VLAN tag, or if there are multiple nested VLAN
tags (QinQ).

To fix this, change the helper to take an argument that indicates whether
the caller wants to skip the VLAN tags or not. When skipping VLAN tags, we
make sure to skip all of them, so behaviour is consistent even in QinQ
mode.

To make the helper usable from the ECN code, move it to if_vlan.h instead
of pkt_sched.h.

v3:
- Remove empty lines
- Move vlan variable definitions inside loop in skb_protocol()
- Also use skb_protocol() helper in IP{,6}_ECN_decapsulate() and
  bpf_skb_ecn_set_ce()

v2:
- Use eth_type_vlan() helper in skb_protocol()
- Also fix code that reads skb->protocol directly
- Change a couple of 'if/else if' statements to switch constructs to avoid
  calling the helper twice

Reported-by: Ilya Ponetayev <i.ponetaev@ndmsystems.com>
Fixes: d8b9605d26 ("net: sched: fix skb->protocol use in case of accelerated vlan path")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-22 09:32:48 +02:00
Kees Cook baef8d1027 bpf: Check correct cred for CAP_SYSLOG in bpf_dump_raw_ok()
commit 63960260457a02af2a6cb35d75e6bdb17299c882 upstream.

When evaluating access control over kallsyms visibility, credentials at
open() time need to be used, not the "current" creds (though in BPF's
case, this has likely always been the same). Plumb access to associated
file->f_cred down through bpf_dump_raw_ok() and its callers now that
kallsysm_show_value() has been refactored to take struct cred.

Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: bpf@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: 7105e828c0 ("bpf: allow for correlation of maps and helpers in dump")
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-16 08:16:45 +02:00
John Fastabend b709a08bc4 bpf, sockmap: RCU dereferenced psock may be used outside RCU block
[ Upstream commit 8025751d4d55a2f32be6bdf825b6a80c299875f5 ]

If an ingress verdict program specifies message sizes greater than
skb->len and there is an ENOMEM error due to memory pressure we
may call the rcv_msg handler outside the strp_data_ready() caller
context. This is because on an ENOMEM error the strparser will
retry from a workqueue. The caller currently protects the use of
psock by calling the strp_data_ready() inside a rcu_read_lock/unlock
block.

But, in above workqueue error case the psock is accessed outside
the read_lock/unlock block of the caller. So instead of using
psock directly we must do a look up against the sk again to
ensure the psock is available.

There is an an ugly piece here where we must handle
the case where we paused the strp and removed the psock. On
psock removal we first pause the strparser and then remove
the psock. If the strparser is paused while an skb is
scheduled on the workqueue the skb will be dropped on the
flow and kfree_skb() is called. If the workqueue manages
to get called before we pause the strparser but runs the rcvmsg
callback after the psock is removed we will hit the unlikely
case where we run the sockmap rcvmsg handler but do not have
a psock. For now we will follow strparser logic and drop the
skb on the floor with skb_kfree(). This is ugly because the
data is dropped. To date this has not caused problems in practice
because either the application controlling the sockmap is
coordinating with the datapath so that skbs are "flushed"
before removal or we simply wait for the sock to be closed before
removing it.

This patch fixes the describe RCU bug and dropping the skb doesn't
make things worse. Future patches will improve this by allowing
the normal case where skbs are not merged to skip the strparser
altogether. In practice many (most?) use cases have no need to
merge skbs so its both a code complexity hit as seen above and
a performance issue. For example, in the Cilium case we always
set the strparser up to return sbks 1:1 without any merging and
have avoided above issues.

Fixes: e91de6afa81c1 ("bpf: Fix running sk_skb program types with ktls")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/159312679888.18340.15248924071966273998.stgit@john-XPS-13-9370
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-07-16 08:16:37 +02:00
John Fastabend 2000bb5465 bpf, sockmap: RCU splat with redirect and strparser error or TLS
[ Upstream commit 93dd5f185916b05e931cffae636596f21f98546e ]

There are two paths to generate the below RCU splat the first and
most obvious is the result of the BPF verdict program issuing a
redirect on a TLS socket (This is the splat shown below). Unlike
the non-TLS case the caller of the *strp_read() hooks does not
wrap the call in a rcu_read_lock/unlock. Then if the BPF program
issues a redirect action we hit the RCU splat.

However, in the non-TLS socket case the splat appears to be
relatively rare, because the skmsg caller into the strp_data_ready()
is wrapped in a rcu_read_lock/unlock. Shown here,

 static void sk_psock_strp_data_ready(struct sock *sk)
 {
	struct sk_psock *psock;

	rcu_read_lock();
	psock = sk_psock(sk);
	if (likely(psock)) {
		if (tls_sw_has_ctx_rx(sk)) {
			psock->parser.saved_data_ready(sk);
		} else {
			write_lock_bh(&sk->sk_callback_lock);
			strp_data_ready(&psock->parser.strp);
			write_unlock_bh(&sk->sk_callback_lock);
		}
	}
	rcu_read_unlock();
 }

If the above was the only way to run the verdict program we
would be safe. But, there is a case where the strparser may throw an
ENOMEM error while parsing the skb. This is a result of a failed
skb_clone, or alloc_skb_for_msg while building a new merged skb when
the msg length needed spans multiple skbs. This will in turn put the
skb on the strp_wrk workqueue in the strparser code. The skb will
later be dequeued and verdict programs run, but now from a
different context without the rcu_read_lock()/unlock() critical
section in sk_psock_strp_data_ready() shown above. In practice
I have not seen this yet, because as far as I know most users of the
verdict programs are also only working on single skbs. In this case no
merge happens which could trigger the above ENOMEM errors. In addition
the system would need to be under memory pressure. For example, we
can't hit the above case in selftests because we missed having tests
to merge skbs. (Added in later patch)

To fix the below splat extend the rcu_read_lock/unnlock block to
include the call to sk_psock_tls_verdict_apply(). This will fix both
TLS redirect case and non-TLS redirect+error case. Also remove
psock from the sk_psock_tls_verdict_apply() function signature its
not used there.

[ 1095.937597] WARNING: suspicious RCU usage
[ 1095.940964] 5.7.0-rc7-02911-g463bac5f1ca79 #1 Tainted: G        W
[ 1095.944363] -----------------------------
[ 1095.947384] include/linux/skmsg.h:284 suspicious rcu_dereference_check() usage!
[ 1095.950866]
[ 1095.950866] other info that might help us debug this:
[ 1095.950866]
[ 1095.957146]
[ 1095.957146] rcu_scheduler_active = 2, debug_locks = 1
[ 1095.961482] 1 lock held by test_sockmap/15970:
[ 1095.964501]  #0: ffff9ea6b25de660 (sk_lock-AF_INET){+.+.}-{0:0}, at: tls_sw_recvmsg+0x13a/0x840 [tls]
[ 1095.968568]
[ 1095.968568] stack backtrace:
[ 1095.975001] CPU: 1 PID: 15970 Comm: test_sockmap Tainted: G        W         5.7.0-rc7-02911-g463bac5f1ca79 #1
[ 1095.977883] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 1095.980519] Call Trace:
[ 1095.982191]  dump_stack+0x8f/0xd0
[ 1095.984040]  sk_psock_skb_redirect+0xa6/0xf0
[ 1095.986073]  sk_psock_tls_strp_read+0x1d8/0x250
[ 1095.988095]  tls_sw_recvmsg+0x714/0x840 [tls]

v2: Improve commit message to identify non-TLS redirect plus error case
    condition as well as more common TLS case. In the process I decided
    doing the rcu_read_unlock followed by the lock/unlock inside branches
    was unnecessarily complex. We can just extend the current rcu block
    and get the same effeective without the shuffling and branching.
    Thanks Martin!

Fixes: e91de6afa81c1 ("bpf: Fix running sk_skb program types with ktls")
Reported-by: Jakub Sitnicki <jakub@cloudflare.com>
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/159312677907.18340.11064813152758406626.stgit@john-XPS-13-9370
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-07-16 08:16:37 +02:00
Eric Dumazet 67571b1ab2 net: increment xmit_recursion level in dev_direct_xmit()
[ Upstream commit 0ad6f6e767ec2f613418cbc7ebe5ec4c35af540c ]

Back in commit f60e5990d9 ("ipv6: protect skb->sk accesses
from recursive dereference inside the stack") Hannes added code
so that IPv6 stack would not trust skb->sk for typical cases
where packet goes through 'standard' xmit path (__dev_queue_xmit())

Alas af_packet had a dev_direct_xmit() path that was not
dealing yet with xmit_recursion level.

Also change sk_mc_loop() to dump a stack once only.

Without this patch, syzbot was able to trigger :

[1]
[  153.567378] WARNING: CPU: 7 PID: 11273 at net/core/sock.c:721 sk_mc_loop+0x51/0x70
[  153.567378] Modules linked in: nfnetlink ip6table_raw ip6table_filter iptable_raw iptable_nat nf_nat nf_conntrack nf_defrag_ipv4 nf_defrag_ipv6 iptable_filter macsec macvtap tap macvlan 8021q hsr wireguard libblake2s blake2s_x86_64 libblake2s_generic udp_tunnel ip6_udp_tunnel libchacha20poly1305 poly1305_x86_64 chacha_x86_64 libchacha curve25519_x86_64 libcurve25519_generic netdevsim batman_adv dummy team bridge stp llc w1_therm wire i2c_mux_pca954x i2c_mux cdc_acm ehci_pci ehci_hcd mlx4_en mlx4_ib ib_uverbs ib_core mlx4_core
[  153.567386] CPU: 7 PID: 11273 Comm: b159172088 Not tainted 5.8.0-smp-DEV #273
[  153.567387] RIP: 0010:sk_mc_loop+0x51/0x70
[  153.567388] Code: 66 83 f8 0a 75 24 0f b6 4f 12 b8 01 00 00 00 31 d2 d3 e0 a9 bf ef ff ff 74 07 48 8b 97 f0 02 00 00 0f b6 42 3a 83 e0 01 5d c3 <0f> 0b b8 01 00 00 00 5d c3 0f b6 87 18 03 00 00 5d c0 e8 04 83 e0
[  153.567388] RSP: 0018:ffff95c69bb93990 EFLAGS: 00010212
[  153.567388] RAX: 0000000000000011 RBX: ffff95c6e0ee3e00 RCX: 0000000000000007
[  153.567389] RDX: ffff95c69ae50000 RSI: ffff95c6c30c3000 RDI: ffff95c6c30c3000
[  153.567389] RBP: ffff95c69bb93990 R08: ffff95c69a77f000 R09: 0000000000000008
[  153.567389] R10: 0000000000000040 R11: 00003e0e00026128 R12: ffff95c6c30c3000
[  153.567390] R13: ffff95c6cc4fd500 R14: ffff95c6f84500c0 R15: ffff95c69aa13c00
[  153.567390] FS:  00007fdc3a283700(0000) GS:ffff95c6ff9c0000(0000) knlGS:0000000000000000
[  153.567390] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  153.567391] CR2: 00007ffee758e890 CR3: 0000001f9ba20003 CR4: 00000000001606e0
[  153.567391] Call Trace:
[  153.567391]  ip6_finish_output2+0x34e/0x550
[  153.567391]  __ip6_finish_output+0xe7/0x110
[  153.567391]  ip6_finish_output+0x2d/0xb0
[  153.567392]  ip6_output+0x77/0x120
[  153.567392]  ? __ip6_finish_output+0x110/0x110
[  153.567392]  ip6_local_out+0x3d/0x50
[  153.567392]  ipvlan_queue_xmit+0x56c/0x5e0
[  153.567393]  ? ksize+0x19/0x30
[  153.567393]  ipvlan_start_xmit+0x18/0x50
[  153.567393]  dev_direct_xmit+0xf3/0x1c0
[  153.567393]  packet_direct_xmit+0x69/0xa0
[  153.567394]  packet_sendmsg+0xbf0/0x19b0
[  153.567394]  ? plist_del+0x62/0xb0
[  153.567394]  sock_sendmsg+0x65/0x70
[  153.567394]  sock_write_iter+0x93/0xf0
[  153.567394]  new_sync_write+0x18e/0x1a0
[  153.567395]  __vfs_write+0x29/0x40
[  153.567395]  vfs_write+0xb9/0x1b0
[  153.567395]  ksys_write+0xb1/0xe0
[  153.567395]  __x64_sys_write+0x1a/0x20
[  153.567395]  do_syscall_64+0x43/0x70
[  153.567396]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  153.567396] RIP: 0033:0x453549
[  153.567396] Code: Bad RIP value.
[  153.567396] RSP: 002b:00007fdc3a282cc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  153.567397] RAX: ffffffffffffffda RBX: 00000000004d32d0 RCX: 0000000000453549
[  153.567397] RDX: 0000000000000020 RSI: 0000000020000300 RDI: 0000000000000003
[  153.567398] RBP: 00000000004d32d8 R08: 0000000000000000 R09: 0000000000000000
[  153.567398] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004d32dc
[  153.567398] R13: 00007ffee742260f R14: 00007fdc3a282dc0 R15: 00007fdc3a283700
[  153.567399] ---[ end trace c1d5ae2b1059ec62 ]---

f60e5990d9 ("ipv6: protect skb->sk accesses from recursive dereference inside the stack")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-30 15:36:44 -04:00
Yang Yingliang 742f2358b3 net: fix memleak in register_netdevice()
[ Upstream commit 814152a89ed52c722ab92e9fbabcac3cb8a39245 ]

I got a memleak report when doing some fuzz test:

unreferenced object 0xffff888112584000 (size 13599):
  comm "ip", pid 3048, jiffies 4294911734 (age 343.491s)
  hex dump (first 32 bytes):
    74 61 70 30 00 00 00 00 00 00 00 00 00 00 00 00  tap0............
    00 ee d9 19 81 88 ff ff 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<000000002f60ba65>] __kmalloc_node+0x309/0x3a0
    [<0000000075b211ec>] kvmalloc_node+0x7f/0xc0
    [<00000000d3a97396>] alloc_netdev_mqs+0x76/0xfc0
    [<00000000609c3655>] __tun_chr_ioctl+0x1456/0x3d70
    [<000000001127ca24>] ksys_ioctl+0xe5/0x130
    [<00000000b7d5e66a>] __x64_sys_ioctl+0x6f/0xb0
    [<00000000e1023498>] do_syscall_64+0x56/0xa0
    [<000000009ec0eb12>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
unreferenced object 0xffff888111845cc0 (size 8):
  comm "ip", pid 3048, jiffies 4294911734 (age 343.491s)
  hex dump (first 8 bytes):
    74 61 70 30 00 88 ff ff                          tap0....
  backtrace:
    [<000000004c159777>] kstrdup+0x35/0x70
    [<00000000d8b496ad>] kstrdup_const+0x3d/0x50
    [<00000000494e884a>] kvasprintf_const+0xf1/0x180
    [<0000000097880a2b>] kobject_set_name_vargs+0x56/0x140
    [<000000008fbdfc7b>] dev_set_name+0xab/0xe0
    [<000000005b99e3b4>] netdev_register_kobject+0xc0/0x390
    [<00000000602704fe>] register_netdevice+0xb61/0x1250
    [<000000002b7ca244>] __tun_chr_ioctl+0x1cd1/0x3d70
    [<000000001127ca24>] ksys_ioctl+0xe5/0x130
    [<00000000b7d5e66a>] __x64_sys_ioctl+0x6f/0xb0
    [<00000000e1023498>] do_syscall_64+0x56/0xa0
    [<000000009ec0eb12>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
unreferenced object 0xffff88811886d800 (size 512):
  comm "ip", pid 3048, jiffies 4294911734 (age 343.491s)
  hex dump (first 32 bytes):
    00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00  .....N..........
    ff ff ff ff ff ff ff ff c0 66 3d a3 ff ff ff ff  .........f=.....
  backtrace:
    [<0000000050315800>] device_add+0x61e/0x1950
    [<0000000021008dfb>] netdev_register_kobject+0x17e/0x390
    [<00000000602704fe>] register_netdevice+0xb61/0x1250
    [<000000002b7ca244>] __tun_chr_ioctl+0x1cd1/0x3d70
    [<000000001127ca24>] ksys_ioctl+0xe5/0x130
    [<00000000b7d5e66a>] __x64_sys_ioctl+0x6f/0xb0
    [<00000000e1023498>] do_syscall_64+0x56/0xa0
    [<000000009ec0eb12>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

If call_netdevice_notifiers() failed, then rollback_registered()
calls netdev_unregister_kobject() which holds the kobject. The
reference cannot be put because the netdev won't be add to todo
list, so it will leads a memleak, we need put the reference to
avoid memleak.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-30 15:36:44 -04:00
Tariq Toukan 9e693934cd net: Do not clear the sock TX queue in sk_set_socket()
[ Upstream commit 41b14fb8724d5a4b382a63cb4a1a61880347ccb8 ]

Clearing the sock TX queue in sk_set_socket() might cause unexpected
out-of-order transmit when called from sock_orphan(), as outstanding
packets can pick a different TX queue and bypass the ones already queued.

This is undesired in general. More specifically, it breaks the in-order
scheduling property guarantee for device-offloaded TLS sockets.

Remove the call to sk_tx_queue_clear() in sk_set_socket(), and add it
explicitly only where needed.

Fixes: e022f0b4a0 ("net: Introduce sk_tx_queue_mapping")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-30 15:36:44 -04:00
Ahmed S. Darwish 99705220b2 net: core: device_rename: Use rwsem instead of a seqcount
[ Upstream commit 11d6011c2cf29f7c8181ebde6c8bc0c4d83adcd7 ]

Sequence counters write paths are critical sections that must never be
preempted, and blocking, even for CONFIG_PREEMPTION=n, is not allowed.

Commit 5dbe7c178d ("net: fix kernel deadlock with interface rename and
netdev name retrieval.") handled a deadlock, observed with
CONFIG_PREEMPTION=n, where the devnet_rename seqcount read side was
infinitely spinning: it got scheduled after the seqcount write side
blocked inside its own critical section.

To fix that deadlock, among other issues, the commit added a
cond_resched() inside the read side section. While this will get the
non-preemptible kernel eventually unstuck, the seqcount reader is fully
exhausting its slice just spinning -- until TIF_NEED_RESCHED is set.

The fix is also still broken: if the seqcount reader belongs to a
real-time scheduling policy, it can spin forever and the kernel will
livelock.

Disabling preemption over the seqcount write side critical section will
not work: inside it are a number of GFP_KERNEL allocations and mutex
locking through the drivers/base/ :: device_rename() call chain.

>From all the above, replace the seqcount with a rwsem.

Fixes: 5dbe7c178d (net: fix kernel deadlock with interface rename and netdev name retrieval.)
Fixes: 30e6c9fa93 (net: devnet_rename_seq should be a seqcount)
Fixes: c91f6df2db (sockopt: Change getsockopt() of SO_BINDTODEVICE to return an interface name)
Cc: <stable@vger.kernel.org>
Reported-by: kbuild test robot <lkp@intel.com> [ v1 missing up_read() on error exit ]
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> [ v1 missing up_read() on error exit ]
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-24 17:50:52 +02:00
Thomas Gleixner e33765201d sched/rt, net: Use CONFIG_PREEMPTION.patch
[ Upstream commit 2da2b32fd9346009e9acdb68c570ca8d3966aba7 ]

CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT.
Both PREEMPT and PREEMPT_RT require the same functionality which today
depends on CONFIG_PREEMPT.

Update the comment to use CONFIG_PREEMPTION.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/20191015191821.11479-22-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-24 17:50:52 +02:00
Andrey Ignatov 33a76c15c7 bpf: Fix memlock accounting for sock_hash
[ Upstream commit 60e5ca8a64bad8f3e2e20a1e57846e497361c700 ]

Add missed bpf_map_charge_init() in sock_hash_alloc() and
correspondingly bpf_map_charge_finish() on ENOMEM.

It was found accidentally while working on unrelated selftest that
checks "map->memory.pages > 0" is true for all map types.

Before:
	# bpftool m l
	...
	3692: sockhash  name m_sockhash  flags 0x0
		key 4B  value 4B  max_entries 8  memlock 0B

After:
	# bpftool m l
	...
	84: sockmap  name m_sockmap  flags 0x0
		key 4B  value 4B  max_entries 8  memlock 4096B

Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200612000857.2881453-1-rdna@fb.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-24 17:50:44 +02:00
YiFei Zhu 9777d12a8b net/filter: Permit reading NET in load_bytes_relative when MAC not set
[ Upstream commit 0f5d82f187e1beda3fe7295dfc500af266a5bd80 ]

Added a check in the switch case on start_header that checks for
the existence of the header, and in the case that MAC is not set
and the caller requests for MAC, -EFAULT. If the caller requests
for NET then MAC's existence is completely ignored.

There is no function to check NET header's existence and as far
as cgroup_skb/egress is concerned it should always be set.

Removed for ptr >= the start of header, considering offset is
bounded unsigned and should always be true. len <= end - mac is
redundant to ptr + len <= end.

Fixes: 3eee1f75f2 ("bpf: fix bpf_skb_load_bytes_relative pkt length check")
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/76bb820ddb6a95f59a772ecbd8c8a336f646b362.1591812755.git.zhuyifei@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-24 17:50:43 +02:00
Jakub Sitnicki 5bed77b0a2 bpf, sockhash: Synchronize delete from bucket list on map free
[ Upstream commit 75e68e5bf2c7fa9d3e874099139df03d5952a3e1 ]

We can end up modifying the sockhash bucket list from two CPUs when a
sockhash is being destroyed (sock_hash_free) on one CPU, while a socket
that is in the sockhash is unlinking itself from it on another CPU
it (sock_hash_delete_from_link).

This results in accessing a list element that is in an undefined state as
reported by KASAN:

| ==================================================================
| BUG: KASAN: wild-memory-access in sock_hash_free+0x13c/0x280
| Write of size 8 at addr dead000000000122 by task kworker/2:1/95
|
| CPU: 2 PID: 95 Comm: kworker/2:1 Not tainted 5.7.0-rc7-02961-ge22c35ab0038-dirty #691
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
| Workqueue: events bpf_map_free_deferred
| Call Trace:
|  dump_stack+0x97/0xe0
|  ? sock_hash_free+0x13c/0x280
|  __kasan_report.cold+0x5/0x40
|  ? mark_lock+0xbc1/0xc00
|  ? sock_hash_free+0x13c/0x280
|  kasan_report+0x38/0x50
|  ? sock_hash_free+0x152/0x280
|  sock_hash_free+0x13c/0x280
|  bpf_map_free_deferred+0xb2/0xd0
|  ? bpf_map_charge_finish+0x50/0x50
|  ? rcu_read_lock_sched_held+0x81/0xb0
|  ? rcu_read_lock_bh_held+0x90/0x90
|  process_one_work+0x59a/0xac0
|  ? lock_release+0x3b0/0x3b0
|  ? pwq_dec_nr_in_flight+0x110/0x110
|  ? rwlock_bug.part.0+0x60/0x60
|  worker_thread+0x7a/0x680
|  ? _raw_spin_unlock_irqrestore+0x4c/0x60
|  kthread+0x1cc/0x220
|  ? process_one_work+0xac0/0xac0
|  ? kthread_create_on_node+0xa0/0xa0
|  ret_from_fork+0x24/0x30
| ==================================================================

Fix it by reintroducing spin-lock protected critical section around the
code that removes the elements from the bucket on sockhash free.

To do that we also need to defer processing of removed elements, until out
of atomic context so that we can unlink the socket from the map when
holding the sock lock.

Fixes: 90db6d772f74 ("bpf, sockmap: Remove bucket->lock from sock_{hash|map}_free")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200607205229.2389672-3-jakub@cloudflare.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-24 17:50:42 +02:00
Jakub Sitnicki 8f73ac0b64 bpf, sockhash: Fix memory leak when unlinking sockets in sock_hash_free
[ Upstream commit 33a7c831565c43a7ee2f38c7df4c4a40e1dfdfed ]

When sockhash gets destroyed while sockets are still linked to it, we will
walk the bucket lists and delete the links. However, we are not freeing the
list elements after processing them, leaking the memory.

The leak can be triggered by close()'ing a sockhash map when it still
contains sockets, and observed with kmemleak:

  unreferenced object 0xffff888116e86f00 (size 64):
    comm "race_sock_unlin", pid 223, jiffies 4294731063 (age 217.404s)
    hex dump (first 32 bytes):
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      81 de e8 41 00 00 00 00 c0 69 2f 15 81 88 ff ff  ...A.....i/.....
    backtrace:
      [<00000000dd089ebb>] sock_hash_update_common+0x4ca/0x760
      [<00000000b8219bd5>] sock_hash_update_elem+0x1d2/0x200
      [<000000005e2c23de>] __do_sys_bpf+0x2046/0x2990
      [<00000000d0084618>] do_syscall_64+0xad/0x9a0
      [<000000000d96f263>] entry_SYSCALL_64_after_hwframe+0x49/0xb3

Fix it by freeing the list element when we're done with it.

Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200607205229.2389672-2-jakub@cloudflare.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-24 17:50:19 +02:00
John Fastabend e7b1564a24 bpf: Fix running sk_skb program types with ktls
[ Upstream commit e91de6afa81c10e9f855c5695eb9a53168d96b73 ]

KTLS uses a stream parser to collect TLS messages and send them to
the upper layer tls receive handler. This ensures the tls receiver
has a full TLS header to parse when it is run. However, when a
socket has BPF_SK_SKB_STREAM_VERDICT program attached before KTLS
is enabled we end up with two stream parsers running on the same
socket.

The result is both try to run on the same socket. First the KTLS
stream parser runs and calls read_sock() which will tcp_read_sock
which in turn calls tcp_rcv_skb(). This dequeues the skb from the
sk_receive_queue. When this is done KTLS code then data_ready()
callback which because we stacked KTLS on top of the bpf stream
verdict program has been replaced with sk_psock_start_strp(). This
will in turn kick the stream parser again and eventually do the
same thing KTLS did above calling into tcp_rcv_skb() and dequeuing
a skb from the sk_receive_queue.

At this point the data stream is broke. Part of the stream was
handled by the KTLS side some other bytes may have been handled
by the BPF side. Generally this results in either missing data
or more likely a "Bad Message" complaint from the kTLS receive
handler as the BPF program steals some bytes meant to be in a
TLS header and/or the TLS header length is no longer correct.

We've already broke the idealized model where we can stack ULPs
in any order with generic callbacks on the TX side to handle this.
So in this patch we do the same thing but for RX side. We add
a sk_psock_strp_enabled() helper so TLS can learn a BPF verdict
program is running and add a tls_sw_has_ctx_rx() helper so BPF
side can learn there is a TLS ULP on the socket.

Then on BPF side we omit calling our stream parser to avoid
breaking the data stream for the KTLS receiver. Then on the
KTLS side we call BPF_SK_SKB_STREAM_VERDICT once the KTLS
receiver is done with the packet but before it posts the
msg to userspace. This gives us symmetry between the TX and
RX halfs and IMO makes it usable again. On the TX side we
process packets in this order BPF -> TLS -> TCP and on
the receive side in the reverse order TCP -> TLS -> BPF.

Discovered while testing OpenSSL 3.0 Alpha2.0 release.

Fixes: d829e9c411 ("tls: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/159079361946.5745.605854335665044485.stgit@john-Precision-5820-Tower
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-22 09:31:12 +02:00
John Fastabend d9cd7b8394 bpf: Refactor sockmap redirect code so its easy to reuse
[ Upstream commit ca2f5f21dbbd5e3a00cd3e97f728aa2ca0b2e011 ]

We will need this block of code called from tls context shortly
lets refactor the redirect logic so its easy to use. This also
cleans up the switch stmt so we have fewer fallthrough cases.

No logic changes are intended.

Fixes: d829e9c411 ("tls: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/159079360110.5745.7024009076049029819.stgit@john-Precision-5820-Tower
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-22 09:31:11 +02:00
Jason Liu 5691e22711 Merge tag 'v5.4.47' into imx_5.4.y
* tag 'v5.4.47': (2193 commits)
  Linux 5.4.47
  KVM: arm64: Save the host's PtrAuth keys in non-preemptible context
  KVM: arm64: Synchronize sysreg state on injecting an AArch32 exception
  ...

 Conflicts:
	arch/arm/boot/dts/imx6qdl.dtsi
	arch/arm/mach-imx/Kconfig
	arch/arm/mach-imx/common.h
	arch/arm/mach-imx/suspend-imx6.S
	arch/arm64/boot/dts/freescale/imx8qxp-mek.dts
	arch/powerpc/include/asm/cacheflush.h
	drivers/cpufreq/imx6q-cpufreq.c
	drivers/dma/imx-sdma.c
	drivers/edac/synopsys_edac.c
	drivers/firmware/imx/imx-scu.c
	drivers/net/ethernet/freescale/fec.h
	drivers/net/ethernet/freescale/fec_main.c
	drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
	drivers/net/phy/phy_device.c
	drivers/perf/fsl_imx8_ddr_perf.c
	drivers/usb/cdns3/gadget.c
	drivers/usb/dwc3/gadget.c
	include/uapi/linux/dma-buf.h

Signed-off-by: Jason Liu <jason.hui.liu@nxp.com>
2020-06-19 17:32:49 +08:00
Boris Sukholitko b51eb49d9a __netif_receive_skb_core: pass skb by reference
[ Upstream commit c0bbbdc32febd4f034ecbf3ea17865785b2c0652 ]

__netif_receive_skb_core may change the skb pointer passed into it (e.g.
in rx_handler). The original skb may be freed as a result of this
operation.

The callers of __netif_receive_skb_core may further process original skb
by using pt_prev pointer returned by __netif_receive_skb_core thus
leading to unpleasant effects.

The solution is to pass skb by reference into __netif_receive_skb_core.

v2: Added Fixes tag and comment regarding ppt_prev and skb invariant.

Fixes: 88eb1944e1 ("net: core: propagate SKB lists through packet_type lookup")
Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-03 08:20:47 +02:00
Jakub Sitnicki 860fe59783 flow_dissector: Drop BPF flow dissector prog ref on netns cleanup
commit 5cf65922bb15279402e1e19b5ee8c51d618fa51f upstream.

When attaching a flow dissector program to a network namespace with
bpf(BPF_PROG_ATTACH, ...) we grab a reference to bpf_prog.

If netns gets destroyed while a flow dissector is still attached, and there
are no other references to the prog, we leak the reference and the program
remains loaded.

Leak can be reproduced by running flow dissector tests from selftests/bpf:

  # bpftool prog list
  # ./test_flow_dissector.sh
  ...
  selftests: test_flow_dissector [PASS]
  # bpftool prog list
  4: flow_dissector  name _dissect  tag e314084d332a5338  gpl
          loaded_at 2020-05-20T18:50:53+0200  uid 0
          xlated 552B  jited 355B  memlock 4096B  map_ids 3,4
          btf_id 4
  #

Fix it by detaching the flow dissector program when netns is going away.

Fixes: d58e468b11 ("flow_dissector: implements flow dissector BPF hook")
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20200521083435.560256-1-jakub@cloudflare.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-27 17:46:49 +02:00
John Fastabend ce3193bf89 bpf, sockmap: msg_pop_data can incorrecty set an sge length
[ Upstream commit 3e104c23816220919ea1b3fd93fabe363c67c484 ]

When sk_msg_pop() is called where the pop operation is working on
the end of a sge element and there is no additional trailing data
and there _is_ data in front of pop, like the following case,

   |____________a_____________|__pop__|

We have out of order operations where we incorrectly set the pop
variable so that instead of zero'ing pop we incorrectly leave it
untouched, effectively. This can cause later logic to shift the
buffers around believing it should pop extra space. The result is
we have 'popped' more data then we expected potentially breaking
program logic.

It took us a while to hit this case because typically we pop headers
which seem to rarely be at the end of a scatterlist elements but
we can't rely on this.

Fixes: 7246d8ed4d ("bpf: helper to pop data from messages")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/158861288359.14306.7654891716919968144.stgit@john-Precision-5820-Tower
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-20 08:20:20 +02:00
Zefan Li fc800ec491 netprio_cgroup: Fix unlimited memory leak of v2 cgroups
[ Upstream commit 090e28b229af92dc5b40786ca673999d59e73056 ]

If systemd is configured to use hybrid mode which enables the use of
both cgroup v1 and v2, systemd will create new cgroup on both the default
root (v2) and netprio_cgroup hierarchy (v1) for a new session and attach
task to the two cgroups. If the task does some network thing then the v2
cgroup can never be freed after the session exited.

One of our machines ran into OOM due to this memory leak.

In the scenario described above when sk_alloc() is called
cgroup_sk_alloc() thought it's in v2 mode, so it stores
the cgroup pointer in sk->sk_cgrp_data and increments
the cgroup refcnt, but then sock_update_netprioidx()
thought it's in v1 mode, so it stores netprioidx value
in sk->sk_cgrp_data, so the cgroup refcnt will never be freed.

Currently we do the mode switch when someone writes to the ifpriomap
cgroup control file. The easiest fix is to also do the switch when
a task is attached to a new cgroup.

Fixes: bd1060a1d6 ("sock, cgroup: add sock->sk_cgroup")
Reported-by: Yang Yingliang <yangyingliang@huawei.com>
Tested-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20 08:20:12 +02:00
Cong Wang 60a4f2ce05 net: fix a potential recursive NETDEV_FEAT_CHANGE
[ Upstream commit dd912306ff008891c82cd9f63e8181e47a9cb2fb ]

syzbot managed to trigger a recursive NETDEV_FEAT_CHANGE event
between bonding master and slave. I managed to find a reproducer
for this:

  ip li set bond0 up
  ifenslave bond0 eth0
  brctl addbr br0
  ethtool -K eth0 lro off
  brctl addif br0 bond0
  ip li set br0 up

When a NETDEV_FEAT_CHANGE event is triggered on a bonding slave,
it captures this and calls bond_compute_features() to fixup its
master's and other slaves' features. However, when syncing with
its lower devices by netdev_sync_lower_features() this event is
triggered again on slaves when the LRO feature fails to change,
so it goes back and forth recursively until the kernel stack is
exhausted.

Commit 17b85d29e8 intentionally lets __netdev_update_features()
return -1 for such a failure case, so we have to just rely on
the existing check inside netdev_sync_lower_features() and skip
NETDEV_FEAT_CHANGE event only for this specific failure case.

Fixes: fd867d51f8 ("net/core: generic support for disabling netdev features down stack")
Reported-by: syzbot+e73ceacfd8560cc8a3ca@syzkaller.appspotmail.com
Reported-by: syzbot+c2fb6f9ddcea95ba49b5@syzkaller.appspotmail.com
Cc: Jarod Wilson <jarod@redhat.com>
Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jann Horn <jannh@google.com>
Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20 08:20:08 +02:00
Arnd Bergmann fbe2c2c509 drop_monitor: work around gcc-10 stringop-overflow warning
[ Upstream commit dc30b4059f6e2abf3712ab537c8718562b21c45d ]

The current gcc-10 snapshot produces a false-positive warning:

net/core/drop_monitor.c: In function 'trace_drop_common.constprop':
cc1: error: writing 8 bytes into a region of size 0 [-Werror=stringop-overflow=]
In file included from net/core/drop_monitor.c:23:
include/uapi/linux/net_dropmon.h:36:8: note: at offset 0 to object 'entries' with size 4 declared here
   36 |  __u32 entries;
      |        ^~~~~~~

I reported this in the gcc bugzilla, but in case it does not get
fixed in the release, work around it by using a temporary variable.

Fixes: 9a8afc8d39 ("Network Drop Monitor: Adding drop monitor implementation & Netlink protocol")
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94881
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-20 08:20:06 +02:00
Roman Mashak e781af2fdc neigh: send protocol value in neighbor create notification
[ Upstream commit 38212bb31fe923d0a2c6299bd2adfbb84cddef2a ]

When a new neighbor entry has been added, event is generated but it does not
include protocol, because its value is assigned after the event notification
routine has run, so move protocol assignment code earlier.

Fixes: df9b0e30d4 ("neighbor: Add protocol attribute")
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-14 07:58:20 +02:00
Jakub Kicinski 6ee2fdf2ba devlink: fix return value after hitting end in region read
[ Upstream commit 610a9346c138b9c2c93d38bf5f3728e74ae9cbd5 ]

Commit d5b90e99e1d5 ("devlink: report 0 after hitting end in region read")
fixed region dump, but region read still returns a spurious error:

$ devlink region read netdevsim/netdevsim1/dummy snapshot 0 addr 0 len 128
0000000000000000 a6 f4 c4 1c 21 35 95 a6 9d 34 c3 5b 87 5b 35 79
0000000000000010 f3 a0 d7 ee 4f 2f 82 7f c6 dd c4 f6 a5 c3 1b ae
0000000000000020 a4 fd c8 62 07 59 48 03 70 3b c7 09 86 88 7f 68
0000000000000030 6f 45 5d 6d 7d 0e 16 38 a9 d0 7a 4b 1e 1e 2e a6
0000000000000040 e6 1d ae 06 d6 18 00 85 ca 62 e8 7e 11 7e f6 0f
0000000000000050 79 7e f7 0f f3 94 68 bd e6 40 22 85 b6 be 6f b1
0000000000000060 af db ef 5e 34 f0 98 4b 62 9a e3 1b 8b 93 fc 17
devlink answers: Invalid argument
0000000000000070 61 e8 11 11 66 10 a5 f7 b1 ea 8d 40 60 53 ed 12

This is a minimal fix, I'll follow up with a restructuring
so we don't have two checks for the same condition.

Fixes: fdd41ec21e ("devlink: Return right error code in case of errors for region read")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-14 07:58:19 +02:00
Jiri Slaby 88348bd1f6 cgroup, netclassid: remove double cond_resched
commit 526f3d96b8f83b1b13d73bd0b5c79cc2c487ec8e upstream.

Commit 018d26fcd12a ("cgroup, netclassid: periodically release file_lock
on classid") added a second cond_resched to write_classid indirectly by
update_classid_task. Remove the one in write_classid.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Dmitry Yakunin <zeil@yandex-team.ru>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-10 10:31:32 +02:00
Eric Dumazet f161048060 net: use indirect call wrappers for skb_copy_datagram_iter()
commit 29f3490ba9d2399d3d1b20c4aa74592d92bd4e11 upstream.

TCP recvmsg() calls skb_copy_datagram_iter(), which
calls an indirect function (cb pointing to simple_copy_to_iter())
for every MSS (fragment) present in the skb.

CONFIG_RETPOLINE=y forces a very expensive operation
that we can avoid thanks to indirect call wrappers.

This patch gives a 13% increase of performance on
a single flow, if the bottleneck is the thread reading
the TCP socket.

Fixes: 950fcaecd5 ("datagram: consolidate datagram copy to iter helpers")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-02 08:49:00 +02:00
Konstantin Khlebnikov f6b264f2a0 net: revert default NAPI poll timeout to 2 jiffies
[ Upstream commit a4837980fd9fa4c70a821d11831698901baef56b ]

For HZ < 1000 timeout 2000us rounds up to 1 jiffy but expires randomly
because next timer interrupt could come shortly after starting softirq.

For commonly used CONFIG_HZ=1000 nothing changes.

Fixes: 7acf8a1e8a ("Replace 2 jiffies with sysctl netdev_budget_usecs to enable softirq tuning")
Reported-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-21 09:04:47 +02:00
Amritha Nambiar 238112fcf3 net: Fix Tx hash bound checking
commit 6e11d1578fba8d09d03a286740ffcf336d53928c upstream.

Fixes the lower and upper bounds when there are multiple TCs and
traffic is on the the same TC on the same device.

The lower bound is represented by 'qoffset' and the upper limit for
hash value is 'qcount + qoffset'. This gives a clean Rx to Tx queue
mapping when there are multiple TCs, as the queue indices for upper TCs
will be offset by 'qoffset'.

v2: Fixed commit description based on comments.

Fixes: 1b837d489e ("net: Revoke export for __skb_tx_hash, update it to just be static skb_tx_hash")
Fixes: eadec877ce ("net: Add support for subordinate traffic classes to netdev_pick_tx")
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-08 09:08:46 +02:00
Pablo Neira Ayuso f8c60f7a00 net: Fix CONFIG_NET_CLS_ACT=n and CONFIG_NFT_FWD_NETDEV={y, m} build
commit 2c64605b590edadb3fb46d1ec6badb49e940b479 upstream.

net/netfilter/nft_fwd_netdev.c: In function ‘nft_fwd_netdev_eval’:
    net/netfilter/nft_fwd_netdev.c:32:10: error: ‘struct sk_buff’ has no member named ‘tc_redirected’
      pkt->skb->tc_redirected = 1;
              ^~
    net/netfilter/nft_fwd_netdev.c:33:10: error: ‘struct sk_buff’ has no member named ‘tc_from_ingress’
      pkt->skb->tc_from_ingress = 1;
              ^~

To avoid a direct dependency with tc actions from netfilter, wrap the
redirect bits around CONFIG_NET_REDIRECT and move helpers to
include/linux/skbuff.h. Turn on this toggle from the ifb driver, the
only existing client of these bits in the tree.

This patch adds skb_set_redirected() that sets on the redirected bit
on the skbuff, it specifies if the packet was redirect from ingress
and resets the timestamp (timestamp reset was originally missing in the
netfilter bugfix).

Fixes: bcfabee1afd99484 ("netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress")
Reported-by: noreply@ellerman.id.au
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-01 11:02:18 +02:00
John Fastabend 7f884cb145 bpf, sockmap: Remove bucket->lock from sock_{hash|map}_free
commit 90db6d772f749e38171d04619a5e3cd8804a6d02 upstream.

The bucket->lock is not needed in the sock_hash_free and sock_map_free
calls, in fact it is causing a splat due to being inside rcu block.

| BUG: sleeping function called from invalid context at net/core/sock.c:2935
| in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 62, name: kworker/0:1
| 3 locks held by kworker/0:1/62:
|  #0: ffff88813b019748 ((wq_completion)events){+.+.}, at: process_one_work+0x1d7/0x5e0
|  #1: ffffc900000abe50 ((work_completion)(&map->work)){+.+.}, at: process_one_work+0x1d7/0x5e0
|  #2: ffff8881381f6df8 (&stab->lock){+...}, at: sock_map_free+0x26/0x180
| CPU: 0 PID: 62 Comm: kworker/0:1 Not tainted 5.5.0-04008-g7b083332376e #454
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
| Workqueue: events bpf_map_free_deferred
| Call Trace:
|  dump_stack+0x71/0xa0
|  ___might_sleep.cold+0xa6/0xb6
|  lock_sock_nested+0x28/0x90
|  sock_map_free+0x5f/0x180
|  bpf_map_free_deferred+0x58/0x80
|  process_one_work+0x260/0x5e0
|  worker_thread+0x4d/0x3e0
|  kthread+0x108/0x140
|  ? process_one_work+0x5e0/0x5e0
|  ? kthread_park+0x90/0x90
|  ret_from_fork+0x3a/0x50

The reason we have stab->lock and bucket->locks in sockmap code is to
handle checking EEXIST in update/delete cases. We need to be careful during
an update operation that we check for EEXIST and we need to ensure that the
psock object is not in some partial state of removal/insertion while we do
this. So both map_update_common and sock_map_delete need to guard from being
run together potentially deleting an entry we are checking, etc. But by the
time we get to the tear-down code in sock_{ma[|hash}_free we have already
disconnected the map and we just did synchronize_rcu() in the line above so
no updates/deletes should be in flight. Because of this we can drop the
bucket locks from the map free'ing code, noting no update/deletes can be
in-flight.

Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Reported-by: Jakub Sitnicki <jakub@cloudflare.com>
Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/158385850787.30597.8346421465837046618.stgit@john-Precision-5820-Tower
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-01 11:02:11 +02:00
Jakub Kicinski dd47083779 devlink: validate length of region addr/len
[ Upstream commit ff3b63b8c299b73ac599b120653b47e275407656 ]

DEVLINK_ATTR_REGION_CHUNK_ADDR and DEVLINK_ATTR_REGION_CHUNK_LEN
lack entries in the netlink policy. Corresponding nla_get_u64()s
may read beyond the end of the message.

Fixes: 4e54795a27 ("devlink: Add support for region snapshot read command")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-18 07:17:44 +01:00
Jakub Kicinski 4136c4ee41 devlink: validate length of param values
[ Upstream commit 8750939b6ad86abc3f53ec8a9683a1cded4a5654 ]

DEVLINK_ATTR_PARAM_VALUE_DATA may have different types
so it's not checked by the normal netlink policy. Make
sure the attribute length is what we expect.

Fixes: e3b7ca18ad ("devlink: Add param set command")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-18 07:17:43 +01:00
Shakeel Butt 4a14448182 net: memcg: late association of sock to memcg
[ Upstream commit d752a4986532cb6305dfd5290a614cde8072769d ]

If a TCP socket is allocated in IRQ context or cloned from unassociated
(i.e. not associated to a memcg) in IRQ context then it will remain
unassociated for its whole life. Almost half of the TCPs created on the
system are created in IRQ context, so, memory used by such sockets will
not be accounted by the memcg.

This issue is more widespread in cgroup v1 where network memory
accounting is opt-in but it can happen in cgroup v2 if the source socket
for the cloning was created in root memcg.

To fix the issue, just do the association of the sockets at the accept()
time in the process context and then force charge the memory buffer
already used and reserved by the socket.

Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-18 07:17:43 +01:00
Dmitry Yakunin 0a062dd0d4 cgroup, netclassid: periodically release file_lock on classid updating
[ Upstream commit 018d26fcd12a75fb9b5fe233762aa3f2f0854b88 ]

In our production environment we have faced with problem that updating
classid in cgroup with heavy tasks cause long freeze of the file tables
in this tasks. By heavy tasks we understand tasks with many threads and
opened sockets (e.g. balancers). This freeze leads to an increase number
of client timeouts.

This patch implements following logic to fix this issue:
аfter iterating 1000 file descriptors file table lock will be released
thus providing a time gap for socket creation/deletion.

Now update is non atomic and socket may be skipped using calls:

dup2(oldfd, newfd);
close(oldfd);

But this case is not typical. Moreover before this patch skip is possible
too by hiding socket fd in unix socket buffer.

New sockets will be allocated with updated classid because cgroup state
is updated before start of the file descriptors iteration.

So in common cases this patch has no side effects.

Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-18 07:17:38 +01:00