From 995cddcc3337088b2129dddc7d59fe0282d15d5f Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Thu, 7 Jun 2018 09:13:48 -0700 Subject: [PATCH 001/970] x86/spectre_v1: Disable compiler optimizations over array_index_mask_nospec() commit eab6870fee877258122a042bfd99ee7908c40280 upstream. Mark Rutland noticed that GCC optimization passes have the potential to elide necessary invocations of the array_index_mask_nospec() instruction sequence, so mark the asm() volatile. Mark explains: "The volatile will inhibit *some* cases where the compiler could lift the array_index_nospec() call out of a branch, e.g. where there are multiple invocations of array_index_nospec() with the same arguments: if (idx < foo) { idx1 = array_idx_nospec(idx, foo) do_something(idx1); } < some other code > if (idx < foo) { idx2 = array_idx_nospec(idx, foo); do_something_else(idx2); } ... since the compiler can determine that the two invocations yield the same result, and reuse the first result (likely the same register as idx was in originally) for the second branch, effectively re-writing the above as: if (idx < foo) { idx = array_idx_nospec(idx, foo); do_something(idx); } < some other code > if (idx < foo) { do_something_else(idx); } ... if we don't take the first branch, then speculatively take the second, we lose the nospec protection. There's more info on volatile asm in the GCC docs: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Volatile " Reported-by: Mark Rutland Signed-off-by: Dan Williams Acked-by: Mark Rutland Acked-by: Thomas Gleixner Acked-by: Linus Torvalds Cc: Cc: Peter Zijlstra Fixes: babdde2698d4 ("x86: Implement array_index_mask_nospec") Link: https://lkml.kernel.org/lkml/152838798950.14521.4893346294059739135.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/barrier.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h index 78d1c6a3d221..eb53c2c78a1f 100644 --- a/arch/x86/include/asm/barrier.h +++ b/arch/x86/include/asm/barrier.h @@ -37,7 +37,7 @@ static inline unsigned long array_index_mask_nospec(unsigned long index, { unsigned long mask; - asm ("cmp %1,%2; sbb %0,%0;" + asm volatile ("cmp %1,%2; sbb %0,%0;" :"=r" (mask) :"g"(size),"r" (index) :"cc"); From b4eb80a751d3f9ece95255dbcb38539f2c96acac Mon Sep 17 00:00:00 2001 From: Tony Luck Date: Fri, 25 May 2018 14:41:39 -0700 Subject: [PATCH 002/970] x86/mce: Improve error message when kernel cannot recover commit c7d606f560e4c698884697fef503e4abacdd8c25 upstream. Since we added support to add recovery from some errors inside the kernel in: commit b2f9d678e28c ("x86/mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception table entries") we have done a less than stellar job at reporting the cause of recoverable machine checks that occur in other parts of the kernel. The user just gets the unhelpful message: mce: [Hardware Error]: Machine check: Action required: unknown MCACOD doubly unhelpful when they check the manual for the reported IA32_MSR_STATUS.MCACOD and see that it is listed as one of the standard recoverable values. Add an extra rule to the MCE severity table to catch this case and report it as: mce: [Hardware Error]: Machine check: Data load in unrecoverable area of kernel Fixes: b2f9d678e28c ("x86/mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception table entries") Signed-off-by: Tony Luck Signed-off-by: Thomas Gleixner Cc: Qiuxu Zhuo Cc: Ashok Raj Cc: stable@vger.kernel.org # 4.6+ Cc: Dan Williams Cc: Borislav Petkov Link: https://lkml.kernel.org/r/4cc7c465150a9a48b8b9f45d0b840278e77eb9b5.1527283897.git.tony.luck@intel.com Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/mcheck/mce-severity.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c b/arch/x86/kernel/cpu/mcheck/mce-severity.c index f46071cb2c90..3e0199ee5a2f 100644 --- a/arch/x86/kernel/cpu/mcheck/mce-severity.c +++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c @@ -143,6 +143,11 @@ static struct severity { SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|MCACOD_INSTR), USER ), + MCESEV( + PANIC, "Data load in unrecoverable area of kernel", + SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|MCACOD_DATA), + KERNEL + ), #endif MCESEV( PANIC, "Action required: unknown MCACOD", From e7905a78ad570760210192307d9dc69051a072bd Mon Sep 17 00:00:00 2001 From: Tony Luck Date: Fri, 25 May 2018 14:42:09 -0700 Subject: [PATCH 003/970] x86/mce: Check for alternate indication of machine check recovery on Skylake commit 4c5717da1d021cf368eabb3cb1adcaead56c0d1e upstream. Currently we just check the "CAPID0" register to see whether the CPU can recover from machine checks. But there are also some special SKUs which do not have all advanced RAS features, but do enable machine check recovery for use with NVDIMMs. Add a check for any of bits {8:5} in the "CAPID5" register (each reports some NVDIMM mode available, if any of them are set, then the system supports memory machine check recovery). Signed-off-by: Tony Luck Signed-off-by: Thomas Gleixner Cc: Qiuxu Zhuo Cc: Ashok Raj Cc: stable@vger.kernel.org # 4.9 Cc: Dan Williams Cc: Borislav Petkov Link: https://lkml.kernel.org/r/03cbed6e99ddafb51c2eadf9a3b7c8d7a0cc204e.1527283897.git.tony.luck@intel.com Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/quirks.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c index 0bee04d41bed..b57100a2c834 100644 --- a/arch/x86/kernel/quirks.c +++ b/arch/x86/kernel/quirks.c @@ -643,12 +643,19 @@ static void quirk_intel_brickland_xeon_ras_cap(struct pci_dev *pdev) /* Skylake */ static void quirk_intel_purley_xeon_ras_cap(struct pci_dev *pdev) { - u32 capid0; + u32 capid0, capid5; pci_read_config_dword(pdev, 0x84, &capid0); + pci_read_config_dword(pdev, 0x98, &capid5); - if ((capid0 & 0xc0) == 0xc0) + /* + * CAPID0{7:6} indicate whether this is an advanced RAS SKU + * CAPID5{8:5} indicate that various NVDIMM usage modes are + * enabled, so memory machine check recovery is also enabled. + */ + if ((capid0 & 0xc0) == 0xc0 || (capid5 & 0x1e0)) static_branch_inc(&mcsafe_key); + } DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0ec3, quirk_intel_brickland_xeon_ras_cap); DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2fc0, quirk_intel_brickland_xeon_ras_cap); From c267eaaceb58e0acf0132a509ce58649add44f5e Mon Sep 17 00:00:00 2001 From: Tony Luck Date: Fri, 22 Jun 2018 11:54:23 +0200 Subject: [PATCH 004/970] x86/mce: Fix incorrect "Machine check from unknown source" message commit 40c36e2741d7fe1e66d6ec55477ba5fd19c9c5d2 upstream. Some injection testing resulted in the following console log: mce: [Hardware Error]: CPU 22: Machine Check Exception: f Bank 1: bd80000000100134 mce: [Hardware Error]: RIP 10: {pmem_do_bvec+0x11d/0x330 [nd_pmem]} mce: [Hardware Error]: TSC c51a63035d52 ADDR 3234bc4000 MISC 88 mce: [Hardware Error]: PROCESSOR 0:50654 TIME 1526502199 SOCKET 0 APIC 38 microcode 2000043 mce: [Hardware Error]: Run the above through 'mcelog --ascii' Kernel panic - not syncing: Machine check from unknown source This confused everybody because the first line quite clearly shows that we found a logged error in "Bank 1", while the last line says "unknown source". The problem is that the Linux code doesn't do the right thing for a local machine check that results in a fatal error. It turns out that we know very early in the handler whether the machine check is fatal. The call to mce_no_way_out() has checked all the banks for the CPU that took the local machine check. If it says we must crash, we can do so right away with the right messages. We do scan all the banks again. This means that we might initially not see a problem, but during the second scan find something fatal. If this happens we print a slightly different message (so I can see if it actually every happens). [ bp: Remove unneeded severity assignment. ] Signed-off-by: Tony Luck Signed-off-by: Borislav Petkov Signed-off-by: Thomas Gleixner Cc: Ashok Raj Cc: Dan Williams Cc: Qiuxu Zhuo Cc: linux-edac Cc: stable@vger.kernel.org # 4.2 Link: http://lkml.kernel.org/r/52e049a497e86fd0b71c529651def8871c804df0.1527283897.git.tony.luck@intel.com Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/mcheck/mce.c | 26 ++++++++++++++++++-------- 1 file changed, 18 insertions(+), 8 deletions(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index 7bbd50fa72ad..886a44a1ecea 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -1140,13 +1140,18 @@ void do_machine_check(struct pt_regs *regs, long error_code) lmce = m.mcgstatus & MCG_STATUS_LMCES; /* + * Local machine check may already know that we have to panic. + * Broadcast machine check begins rendezvous in mce_start() * Go through all banks in exclusion of the other CPUs. This way we * don't report duplicated events on shared banks because the first one - * to see it will clear it. If this is a Local MCE, then no need to - * perform rendezvous. + * to see it will clear it. */ - if (!lmce) + if (lmce) { + if (no_way_out) + mce_panic("Fatal local machine check", &m, msg); + } else { order = mce_start(&no_way_out); + } for (i = 0; i < cfg->banks; i++) { __clear_bit(i, toclear); @@ -1222,12 +1227,17 @@ void do_machine_check(struct pt_regs *regs, long error_code) no_way_out = worst >= MCE_PANIC_SEVERITY; } else { /* - * Local MCE skipped calling mce_reign() - * If we found a fatal error, we need to panic here. + * If there was a fatal machine check we should have + * already called mce_panic earlier in this function. + * Since we re-read the banks, we might have found + * something new. Check again to see if we found a + * fatal error. We call "mce_severity()" again to + * make sure we have the right "msg". */ - if (worst >= MCE_PANIC_SEVERITY && mca_cfg.tolerant < 3) - mce_panic("Machine check from unknown source", - NULL, NULL); + if (worst >= MCE_PANIC_SEVERITY && mca_cfg.tolerant < 3) { + mce_severity(&m, cfg->tolerant, &msg, true); + mce_panic("Local fatal machine check!", &m, msg); + } } /* From 5a48f6084de709dbc55c56d91fc78011ae6c4b0e Mon Sep 17 00:00:00 2001 From: Borislav Petkov Date: Fri, 22 Jun 2018 11:54:28 +0200 Subject: [PATCH 005/970] x86/mce: Do not overwrite MCi_STATUS in mce_no_way_out() commit 1f74c8a64798e2c488f86efc97e308b85fb7d7aa upstream. mce_no_way_out() does a quick check during #MC to see whether some of the MCEs logged would require the kernel to panic immediately. And it passes a struct mce where MCi_STATUS gets written. However, after having saved a valid status value, the next iteration of the loop which goes over the MCA banks on the CPU, overwrites the valid status value because we're using struct mce as storage instead of a temporary variable. Which leads to MCE records with an empty status value: mce: [Hardware Error]: CPU 0: Machine Check Exception: 6 Bank 0: 0000000000000000 mce: [Hardware Error]: RIP 10: {trigger_mce+0x7/0x10} In order to prevent the loss of the status register value, return immediately when severity is a panic one so that we can panic immediately with the first fatal MCE logged. This is also the intention of this function and not to noodle over the banks while a fatal MCE is already logged. Tony: read the rest of the MCA bank to populate the struct mce fully. Suggested-by: Tony Luck Signed-off-by: Borislav Petkov Signed-off-by: Thomas Gleixner Cc: Link: https://lkml.kernel.org/r/20180622095428.626-8-bp@alien8.de Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/mcheck/mce.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index 886a44a1ecea..c49e146d4332 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -738,23 +738,25 @@ EXPORT_SYMBOL_GPL(machine_check_poll); static int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp, struct pt_regs *regs) { - int i, ret = 0; char *tmp; + int i; for (i = 0; i < mca_cfg.banks; i++) { m->status = mce_rdmsrl(msr_ops.status(i)); - if (m->status & MCI_STATUS_VAL) { - __set_bit(i, validp); - if (quirk_no_way_out) - quirk_no_way_out(i, m, regs); - } + if (!(m->status & MCI_STATUS_VAL)) + continue; + + __set_bit(i, validp); + if (quirk_no_way_out) + quirk_no_way_out(i, m, regs); if (mce_severity(m, mca_cfg.tolerant, &tmp, true) >= MCE_PANIC_SEVERITY) { + mce_read_aux(m, i); *msg = tmp; - ret = 1; + return 1; } } - return ret; + return 0; } /* From 7a68dcdc9d22080c974bb5472cae7d1799ec8607 Mon Sep 17 00:00:00 2001 From: Siarhei Liakh Date: Thu, 14 Jun 2018 19:36:07 +0000 Subject: [PATCH 006/970] x86: Call fixup_exception() before notify_die() in math_error() commit 3ae6295ccb7cf6d344908209701badbbbb503e40 upstream. fpu__drop() has an explicit fwait which under some conditions can trigger a fixable FPU exception while in kernel. Thus, we should attempt to fixup the exception first, and only call notify_die() if the fixup failed just like in do_general_protection(). The original call sequence incorrectly triggers KDB entry on debug kernels under particular FPU-intensive workloads. Andy noted, that this makes the whole conditional irq enable thing even more inconsistent, but fixing that it outside the scope of this. Signed-off-by: Siarhei Liakh Signed-off-by: Thomas Gleixner Reviewed-by: Andy Lutomirski Cc: "H. Peter Anvin" Cc: "Borislav Petkov" Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/DM5PR11MB201156F1CAB2592B07C79A03B17D0@DM5PR11MB2011.namprd11.prod.outlook.com Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/traps.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index f2142932ff0b..5bbfa2f63b8c 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -799,16 +799,18 @@ static void math_error(struct pt_regs *regs, int error_code, int trapnr) char *str = (trapnr == X86_TRAP_MF) ? "fpu exception" : "simd exception"; - if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, SIGFPE) == NOTIFY_STOP) - return; cond_local_irq_enable(regs); if (!user_mode(regs)) { - if (!fixup_exception(regs, trapnr)) { - task->thread.error_code = error_code; - task->thread.trap_nr = trapnr; + if (fixup_exception(regs, trapnr)) + return; + + task->thread.error_code = error_code; + task->thread.trap_nr = trapnr; + + if (notify_die(DIE_TRAP, str, regs, error_code, + trapnr, SIGFPE) != NOTIFY_STOP) die(str, regs, error_code); - } return; } From 5692dcf90e6907e6ea09707b6c7426f36f905772 Mon Sep 17 00:00:00 2001 From: Michael Schmitz Date: Mon, 14 May 2018 23:10:53 +1200 Subject: [PATCH 007/970] m68k/mm: Adjust VM area to be unmapped by gap size for __iounmap() commit 3f90f9ef2dda316d64e420d5d51ba369587ccc55 upstream. If 020/030 support is enabled, get_io_area() leaves an IO_SIZE gap between mappings which is added to the vm_struct representing the mapping. __ioremap() uses the actual requested size (after alignment), while __iounmap() is passed the size from the vm_struct. On 020/030, early termination descriptors are used to set up mappings of extent 'size', which are validated on unmapping. The unmapped gap of size IO_SIZE defeats the sanity check of the pmd tables, causing __iounmap() to loop forever on 030. On 040/060, unmapping of page table entries does not check for a valid mapping, so the umapping loop always completes there. Adjust size to be unmapped by the gap that had been added in the vm_struct prior. This fixes the hang in atari_platform_init() reported a long time ago, and a similar one reported by Finn recently (addressed by removing ioremap() use from the SWIM driver. Tested on my Falcon in 030 mode - untested but should work the same on 040/060 (the extra page tables cleared there would never have been set up anyway). Signed-off-by: Michael Schmitz [geert: Minor commit description improvements] [geert: This was fixed in 2.4.23, but not in 2.5.x] Signed-off-by: Geert Uytterhoeven Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- arch/m68k/mm/kmap.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/m68k/mm/kmap.c b/arch/m68k/mm/kmap.c index 6e4955bc542b..fcd52cefee29 100644 --- a/arch/m68k/mm/kmap.c +++ b/arch/m68k/mm/kmap.c @@ -88,7 +88,8 @@ static inline void free_io_area(void *addr) for (p = &iolist ; (tmp = *p) ; p = &tmp->next) { if (tmp->addr == addr) { *p = tmp->next; - __iounmap(tmp->addr, tmp->size); + /* remove gap added in get_io_area() */ + __iounmap(tmp->addr, tmp->size - IO_SIZE); kfree(tmp); return; } From d9c202b269dd752c977fb1eb3cc6875a69899648 Mon Sep 17 00:00:00 2001 From: Daniel Wagner Date: Tue, 8 May 2018 10:55:09 +0200 Subject: [PATCH 008/970] serial: sh-sci: Use spin_{try}lock_irqsave instead of open coding version commit 8afb1d2c12163f77777f84616a8e9444d0050ebe upstream. Commit 40f70c03e33a ("serial: sh-sci: add locking to console write function to avoid SMP lockup") copied the strategy to avoid locking problems in conjuncture with the console from the UART8250 driver. Instead using directly spin_{try}lock_irqsave(), local_irq_save() followed by spin_{try}lock() was used. While this is correct on mainline, for -rt it is a problem. spin_{try}lock() will check if it is running in a valid context. Since the local_irq_save() has already been executed, the context has changed and spin_{try}lock() will complain. The reason why spin_{try}lock() complains is that on -rt the spin locks are turned into mutexes and therefore can sleep. Sleeping with interrupts disabled is not valid. BUG: sleeping function called from invalid context at /home/wagi/work/rt/v4.4-cip-rt/kernel/locking/rtmutex.c:995 in_atomic(): 0, irqs_disabled(): 128, pid: 778, name: irq/76-eth0 CPU: 0 PID: 778 Comm: irq/76-eth0 Not tainted 4.4.126-test-cip22-rt14-00403-gcd03665c8318 #12 Hardware name: Generic RZ/G1 (Flattened Device Tree) Backtrace: [] (dump_backtrace) from [] (show_stack+0x18/0x1c) r7:c06b01f0 r6:60010193 r5:00000000 r4:c06b01f0 [] (show_stack) from [] (dump_stack+0x78/0x94) [] (dump_stack) from [] (___might_sleep+0x134/0x194) r7:60010113 r6:c06d3559 r5:00000000 r4:ffffe000 [] (___might_sleep) from [] (rt_spin_lock+0x20/0x74) r5:c06f4d60 r4:c06f4d60 [] (rt_spin_lock) from [] (serial_console_write+0x100/0x118) r5:c06f4d60 r4:c06f4d60 [] (serial_console_write) from [] (call_console_drivers.constprop.15+0x10c/0x124) r10:c06d2894 r9:c04e18b0 r8:00000028 r7:00000000 r6:c06d3559 r5:c06d2798 r4:c06b9914 r3:c02576e4 [] (call_console_drivers.constprop.15) from [] (console_unlock+0x32c/0x430) r10:c06d30d8 r9:00000028 r8:c06dd518 r7:00000005 r6:00000000 r5:c06d2798 r4:c06d2798 r3:00000028 [] (console_unlock) from [] (vprintk_emit+0x394/0x4f0) r10:c06d2798 r9:c06d30ee r8:00000006 r7:00000005 r6:c06a78fc r5:00000027 r4:00000003 [] (vprintk_emit) from [] (vprintk+0x28/0x30) r10:c060bd46 r9:00001000 r8:c06b9a90 r7:c06b9a90 r6:c06b994c r5:c06b9a3c r4:c0062fa8 [] (vprintk) from [] (vprintk_default+0x10/0x14) [] (vprintk_default) from [] (printk+0x78/0x84) [] (printk) from [] (credit_entropy_bits+0x17c/0x2cc) r3:00000001 r2:decade60 r1:c061a5ee r0:c061a523 r4:00000006 [] (credit_entropy_bits) from [] (add_interrupt_randomness+0x160/0x178) r10:466e7196 r9:1f536000 r8:fffeef74 r7:00000000 r6:c06b9a60 r5:c06b9a3c r4:dfbcf680 [] (add_interrupt_randomness) from [] (irq_thread+0x1e8/0x248) r10:c006537c r9:c06cdf21 r8:c0064fcc r7:df791c24 r6:df791c00 r5:ffffe000 r4:df525180 [] (irq_thread) from [] (kthread+0x108/0x11c) r10:00000000 r9:00000000 r8:c0065184 r7:df791c00 r6:00000000 r5:df791d00 r4:decac000 [] (kthread) from [] (ret_from_fork+0x14/0x3c) r8:00000000 r7:00000000 r6:00000000 r5:c003fa9c r4:df791d00 Cc: Sebastian Andrzej Siewior Signed-off-by: Daniel Wagner Reviewed-by: Geert Uytterhoeven Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/sh-sci.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/drivers/tty/serial/sh-sci.c b/drivers/tty/serial/sh-sci.c index da46f0fba5da..6ff53b604ff6 100644 --- a/drivers/tty/serial/sh-sci.c +++ b/drivers/tty/serial/sh-sci.c @@ -2807,16 +2807,15 @@ static void serial_console_write(struct console *co, const char *s, unsigned long flags; int locked = 1; - local_irq_save(flags); #if defined(SUPPORT_SYSRQ) if (port->sysrq) locked = 0; else #endif if (oops_in_progress) - locked = spin_trylock(&port->lock); + locked = spin_trylock_irqsave(&port->lock, flags); else - spin_lock(&port->lock); + spin_lock_irqsave(&port->lock, flags); /* first save SCSCR then disable interrupts, keep clock source */ ctrl = serial_port_in(port, SCSCR); @@ -2835,8 +2834,7 @@ static void serial_console_write(struct console *co, const char *s, serial_port_out(port, SCSCR, ctrl); if (locked) - spin_unlock(&port->lock); - local_irq_restore(flags); + spin_unlock_irqrestore(&port->lock, flags); } static int serial_console_setup(struct console *co, char *options) From c82ccd7122be45daa107244240a2ede87e562b86 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Fri, 20 Apr 2018 09:14:56 -0500 Subject: [PATCH 009/970] signal/xtensa: Consistenly use SIGBUS in do_unaligned_user commit 7de712ccc096b81d23cc0a941cd9b8cb3956605d upstream. While working on changing this code to use force_sig_fault I discovered that do_unaliged_user is sets si_signo to SIGBUS and passes SIGSEGV to force_sig_info. Which is just b0rked. The code is reporting a SIGBUS error so replace the SIGSEGV with SIGBUS. Cc: Chris Zankel Cc: Max Filippov Cc: linux-xtensa@linux-xtensa.org Cc: stable@vger.kernel.org Acked-by: Max Filippov Fixes: 5a0015d62668 ("[PATCH] xtensa: Architecture support for Tensilica Xtensa Part 3") Signed-off-by: "Eric W. Biederman" Signed-off-by: Greg Kroah-Hartman --- arch/xtensa/kernel/traps.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/xtensa/kernel/traps.c b/arch/xtensa/kernel/traps.c index ce37d5b899fe..44bd9a377ad1 100644 --- a/arch/xtensa/kernel/traps.c +++ b/arch/xtensa/kernel/traps.c @@ -334,7 +334,7 @@ do_unaligned_user (struct pt_regs *regs) info.si_errno = 0; info.si_code = BUS_ADRALN; info.si_addr = (void *) regs->excvaddr; - force_sig_info(SIGSEGV, &info, current); + force_sig_info(SIGBUS, &info, current); } #endif From 55365ad775af75e1ea56a496e845bd43cbf276bf Mon Sep 17 00:00:00 2001 From: Maxim Moseychuk Date: Thu, 4 Jan 2018 21:43:03 +0300 Subject: [PATCH 010/970] usb: do not reset if a low-speed or full-speed device timed out commit 6e01827ed93947895680fbdad68c072a0f4e2450 upstream. Some low-speed and full-speed devices (for example, bluetooth) do not have time to initialize. For them, ETIMEDOUT is a valid error. We need to give them another try. Otherwise, they will never be initialized correctly and in dmesg will be messages "Bluetooth: hci0 command 0x1002 tx timeout" or similars. Fixes: 264904ccc33c ("usb: retry reset if a device times out") Cc: stable Signed-off-by: Maxim Moseychuk Signed-off-by: Greg Kroah-Hartman --- drivers/usb/core/hub.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index d8d992b73e88..8bf0090218dd 100644 --- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -4509,7 +4509,9 @@ hub_port_init(struct usb_hub *hub, struct usb_device *udev, int port1, * reset. But only on the first attempt, * lest we get into a time out/reset loop */ - if (r == 0 || (r == -ETIMEDOUT && retries == 0)) + if (r == 0 || (r == -ETIMEDOUT && + retries == 0 && + udev->speed > USB_SPEED_FULL)) break; } udev->descriptor.bMaxPacketSize0 = From cf05568cb828cbbe7c36c19187ff68aa1c2bd512 Mon Sep 17 00:00:00 2001 From: Ingo Flaschberger Date: Tue, 1 May 2018 16:10:33 +0200 Subject: [PATCH 011/970] 1wire: family module autoload fails because of upper/lower case mismatch. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 065c09563c872e52813a17218c52cd642be1dca6 upstream. 1wire family module autoload fails because of upper/lower  case mismatch. Signed-off-by: Ingo Flaschberger Acked-by: Evgeniy Polyakov Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/w1/w1.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c index ab0931e7a9bb..aa458f2fced1 100644 --- a/drivers/w1/w1.c +++ b/drivers/w1/w1.c @@ -741,7 +741,7 @@ int w1_attach_slave_device(struct w1_master *dev, struct w1_reg_num *rn) /* slave modules need to be loaded in a context with unlocked mutex */ mutex_unlock(&dev->mutex); - request_module("w1-family-0x%02x", rn->family); + request_module("w1-family-0x%02X", rn->family); mutex_lock(&dev->mutex); spin_lock(&w1_flock); From 1a1b2790f0bc991fcd21ad2e781dc7c1690446a8 Mon Sep 17 00:00:00 2001 From: Srinivas Kandagatla Date: Mon, 4 Jun 2018 12:13:26 +0100 Subject: [PATCH 012/970] ASoC: dapm: delete dapm_kcontrol_data paths list before freeing it commit ff2faf1289c1f81b5b26b9451dd1c2006aac8db8 upstream. dapm_kcontrol_data is freed as part of dapm_kcontrol_free(), leaving the paths pointer dangling in the list. This leads to system crash when we try to unload and reload sound card. I hit this bug during ADSP crash/reboot test case on Dragon board DB410c. Without this patch, on SLAB Poisoning enabled build, kernel crashes with "BUG kmalloc-128 (Tainted: G W ): Poison overwritten" Signed-off-by: Srinivas Kandagatla Signed-off-by: Mark Brown Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- sound/soc/soc-dapm.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/sound/soc/soc-dapm.c b/sound/soc/soc-dapm.c index 6780eba55ec2..0b5d132bc3dd 100644 --- a/sound/soc/soc-dapm.c +++ b/sound/soc/soc-dapm.c @@ -425,6 +425,8 @@ static int dapm_kcontrol_data_alloc(struct snd_soc_dapm_widget *widget, static void dapm_kcontrol_free(struct snd_kcontrol *kctl) { struct dapm_kcontrol_data *data = snd_kcontrol_chip(kctl); + + list_del(&data->paths); kfree(data->wlist); kfree(data); } From d6aa7326e812915f84098164f0699fdc2ace6a0c Mon Sep 17 00:00:00 2001 From: Alexander Sverdlin Date: Sat, 28 Apr 2018 22:51:38 +0200 Subject: [PATCH 013/970] ASoC: cirrus: i2s: Fix LRCLK configuration commit 2d534113be9a2aa532a1ae127a57e83558aed358 upstream. The bit responsible for LRCLK polarity is i2s_tlrs (0), not i2s_trel (2) (refer to "EP93xx User's Guide"). Previously card drivers which specified SND_SOC_DAIFMT_NB_IF actually got SND_SOC_DAIFMT_NB_NF, an adaptation is necessary to retain the old behavior. Signed-off-by: Alexander Sverdlin Signed-off-by: Mark Brown Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- sound/soc/cirrus/edb93xx.c | 2 +- sound/soc/cirrus/ep93xx-i2s.c | 8 ++++---- sound/soc/cirrus/snappercl15.c | 2 +- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/sound/soc/cirrus/edb93xx.c b/sound/soc/cirrus/edb93xx.c index 85962657aabe..517963ef4847 100644 --- a/sound/soc/cirrus/edb93xx.c +++ b/sound/soc/cirrus/edb93xx.c @@ -67,7 +67,7 @@ static struct snd_soc_dai_link edb93xx_dai = { .cpu_dai_name = "ep93xx-i2s", .codec_name = "spi0.0", .codec_dai_name = "cs4271-hifi", - .dai_fmt = SND_SOC_DAIFMT_I2S | SND_SOC_DAIFMT_NB_IF | + .dai_fmt = SND_SOC_DAIFMT_I2S | SND_SOC_DAIFMT_NB_NF | SND_SOC_DAIFMT_CBS_CFS, .ops = &edb93xx_ops, }; diff --git a/sound/soc/cirrus/ep93xx-i2s.c b/sound/soc/cirrus/ep93xx-i2s.c index 934f8aefdd90..38c240c97041 100644 --- a/sound/soc/cirrus/ep93xx-i2s.c +++ b/sound/soc/cirrus/ep93xx-i2s.c @@ -213,24 +213,24 @@ static int ep93xx_i2s_set_dai_fmt(struct snd_soc_dai *cpu_dai, switch (fmt & SND_SOC_DAIFMT_INV_MASK) { case SND_SOC_DAIFMT_NB_NF: /* Negative bit clock, lrclk low on left word */ - clk_cfg &= ~(EP93XX_I2S_CLKCFG_CKP | EP93XX_I2S_CLKCFG_REL); + clk_cfg &= ~(EP93XX_I2S_CLKCFG_CKP | EP93XX_I2S_CLKCFG_LRS); break; case SND_SOC_DAIFMT_NB_IF: /* Negative bit clock, lrclk low on right word */ clk_cfg &= ~EP93XX_I2S_CLKCFG_CKP; - clk_cfg |= EP93XX_I2S_CLKCFG_REL; + clk_cfg |= EP93XX_I2S_CLKCFG_LRS; break; case SND_SOC_DAIFMT_IB_NF: /* Positive bit clock, lrclk low on left word */ clk_cfg |= EP93XX_I2S_CLKCFG_CKP; - clk_cfg &= ~EP93XX_I2S_CLKCFG_REL; + clk_cfg &= ~EP93XX_I2S_CLKCFG_LRS; break; case SND_SOC_DAIFMT_IB_IF: /* Positive bit clock, lrclk low on right word */ - clk_cfg |= EP93XX_I2S_CLKCFG_CKP | EP93XX_I2S_CLKCFG_REL; + clk_cfg |= EP93XX_I2S_CLKCFG_CKP | EP93XX_I2S_CLKCFG_LRS; break; } diff --git a/sound/soc/cirrus/snappercl15.c b/sound/soc/cirrus/snappercl15.c index 98089df08df6..c6737a573bc0 100644 --- a/sound/soc/cirrus/snappercl15.c +++ b/sound/soc/cirrus/snappercl15.c @@ -72,7 +72,7 @@ static struct snd_soc_dai_link snappercl15_dai = { .codec_dai_name = "tlv320aic23-hifi", .codec_name = "tlv320aic23-codec.0-001a", .platform_name = "ep93xx-i2s", - .dai_fmt = SND_SOC_DAIFMT_I2S | SND_SOC_DAIFMT_NB_IF | + .dai_fmt = SND_SOC_DAIFMT_I2S | SND_SOC_DAIFMT_NB_NF | SND_SOC_DAIFMT_CBS_CFS, .ops = &snappercl15_ops, }; From a879f6c232029d08aac63865c4779117d5114d5e Mon Sep 17 00:00:00 2001 From: Alexander Sverdlin Date: Sat, 28 Apr 2018 22:51:39 +0200 Subject: [PATCH 014/970] ASoC: cirrus: i2s: Fix {TX|RX}LinCtrlData setup MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 5d302ed3cc80564fb835bed5fdba1e1250ecc9e5 upstream. According to "EP93xx User’s Guide", I2STXLinCtrlData and I2SRXLinCtrlData registers actually have different format. The only currently used bit (Left_Right_Justify) has different position. Fix this and simplify the whole setup taking into account the fact that both registers have zero default value. The practical effect of the above is repaired SND_SOC_DAIFMT_RIGHT_J support (currently unused). Signed-off-by: Alexander Sverdlin Signed-off-by: Mark Brown Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- sound/soc/cirrus/ep93xx-i2s.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/sound/soc/cirrus/ep93xx-i2s.c b/sound/soc/cirrus/ep93xx-i2s.c index 38c240c97041..0dc3852c4621 100644 --- a/sound/soc/cirrus/ep93xx-i2s.c +++ b/sound/soc/cirrus/ep93xx-i2s.c @@ -51,7 +51,9 @@ #define EP93XX_I2S_WRDLEN_24 (1 << 0) #define EP93XX_I2S_WRDLEN_32 (2 << 0) -#define EP93XX_I2S_LINCTRLDATA_R_JUST (1 << 2) /* Right justify */ +#define EP93XX_I2S_RXLINCTRLDATA_R_JUST BIT(1) /* Right justify */ + +#define EP93XX_I2S_TXLINCTRLDATA_R_JUST BIT(2) /* Right justify */ #define EP93XX_I2S_CLKCFG_LRS (1 << 0) /* lrclk polarity */ #define EP93XX_I2S_CLKCFG_CKP (1 << 1) /* Bit clock polarity */ @@ -170,25 +172,25 @@ static int ep93xx_i2s_set_dai_fmt(struct snd_soc_dai *cpu_dai, unsigned int fmt) { struct ep93xx_i2s_info *info = snd_soc_dai_get_drvdata(cpu_dai); - unsigned int clk_cfg, lin_ctrl; + unsigned int clk_cfg; + unsigned int txlin_ctrl = 0; + unsigned int rxlin_ctrl = 0; clk_cfg = ep93xx_i2s_read_reg(info, EP93XX_I2S_RXCLKCFG); - lin_ctrl = ep93xx_i2s_read_reg(info, EP93XX_I2S_RXLINCTRLDATA); switch (fmt & SND_SOC_DAIFMT_FORMAT_MASK) { case SND_SOC_DAIFMT_I2S: clk_cfg |= EP93XX_I2S_CLKCFG_REL; - lin_ctrl &= ~EP93XX_I2S_LINCTRLDATA_R_JUST; break; case SND_SOC_DAIFMT_LEFT_J: clk_cfg &= ~EP93XX_I2S_CLKCFG_REL; - lin_ctrl &= ~EP93XX_I2S_LINCTRLDATA_R_JUST; break; case SND_SOC_DAIFMT_RIGHT_J: clk_cfg &= ~EP93XX_I2S_CLKCFG_REL; - lin_ctrl |= EP93XX_I2S_LINCTRLDATA_R_JUST; + rxlin_ctrl |= EP93XX_I2S_RXLINCTRLDATA_R_JUST; + txlin_ctrl |= EP93XX_I2S_TXLINCTRLDATA_R_JUST; break; default: @@ -237,8 +239,8 @@ static int ep93xx_i2s_set_dai_fmt(struct snd_soc_dai *cpu_dai, /* Write new register values */ ep93xx_i2s_write_reg(info, EP93XX_I2S_RXCLKCFG, clk_cfg); ep93xx_i2s_write_reg(info, EP93XX_I2S_TXCLKCFG, clk_cfg); - ep93xx_i2s_write_reg(info, EP93XX_I2S_RXLINCTRLDATA, lin_ctrl); - ep93xx_i2s_write_reg(info, EP93XX_I2S_TXLINCTRLDATA, lin_ctrl); + ep93xx_i2s_write_reg(info, EP93XX_I2S_RXLINCTRLDATA, rxlin_ctrl); + ep93xx_i2s_write_reg(info, EP93XX_I2S_TXLINCTRLDATA, txlin_ctrl); return 0; } From 676b002f26f98214856dfebd3bba2673ed9037c9 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Fri, 1 Jun 2018 11:28:19 +0200 Subject: [PATCH 015/970] clk: renesas: cpg-mssr: Stop using printk format %pCr commit ef4b0be62641d296cf4c0ad8f75ab83ab066ed51 upstream. Printk format "%pCr" will be removed soon, as clk_get_rate() must not be called in atomic context. Replace it by open-coding the operation. This is safe here, as the code runs in task context. Link: http://lkml.kernel.org/r/1527845302-12159-2-git-send-email-geert+renesas@glider.be To: Jia-Ju Bai To: Jonathan Corbet To: Michael Turquette To: Stephen Boyd To: Zhang Rui To: Eduardo Valentin To: Eric Anholt To: Stefan Wahren To: Greg Kroah-Hartman Cc: Sergey Senozhatsky Cc: Petr Mladek Cc: Linus Torvalds Cc: Steven Rostedt Cc: linux-doc@vger.kernel.org Cc: linux-clk@vger.kernel.org Cc: linux-pm@vger.kernel.org Cc: linux-serial@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-renesas-soc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Geert Uytterhoeven Cc: stable@vger.kernel.org # 4.5+ Signed-off-by: Geert Uytterhoeven Acked-by: Stephen Boyd Signed-off-by: Petr Mladek Signed-off-by: Greg Kroah-Hartman --- drivers/clk/renesas/renesas-cpg-mssr.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/clk/renesas/renesas-cpg-mssr.c b/drivers/clk/renesas/renesas-cpg-mssr.c index 25c41cd9cdfc..7ecc5eac3d7f 100644 --- a/drivers/clk/renesas/renesas-cpg-mssr.c +++ b/drivers/clk/renesas/renesas-cpg-mssr.c @@ -243,8 +243,9 @@ struct clk *cpg_mssr_clk_src_twocell_get(struct of_phandle_args *clkspec, dev_err(dev, "Cannot get %s clock %u: %ld", type, clkidx, PTR_ERR(clk)); else - dev_dbg(dev, "clock (%u, %u) is %pC at %pCr Hz\n", - clkspec->args[0], clkspec->args[1], clk, clk); + dev_dbg(dev, "clock (%u, %u) is %pC at %lu Hz\n", + clkspec->args[0], clkspec->args[1], clk, + clk_get_rate(clk)); return clk; } @@ -304,7 +305,7 @@ static void __init cpg_mssr_register_core_clk(const struct cpg_core_clk *core, if (IS_ERR_OR_NULL(clk)) goto fail; - dev_dbg(dev, "Core clock %pC at %pCr Hz\n", clk, clk); + dev_dbg(dev, "Core clock %pC at %lu Hz\n", clk, clk_get_rate(clk)); priv->clks[id] = clk; return; @@ -372,7 +373,7 @@ static void __init cpg_mssr_register_mod_clk(const struct mssr_mod_clk *mod, if (IS_ERR(clk)) goto fail; - dev_dbg(dev, "Module clock %pC at %pCr Hz\n", clk, clk); + dev_dbg(dev, "Module clock %pC at %lu Hz\n", clk, clk_get_rate(clk)); priv->clks[id] = clk; return; From ec7bea37c833616b63a740d1ba54d2a030b62b32 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Fri, 1 Jun 2018 11:28:22 +0200 Subject: [PATCH 016/970] lib/vsprintf: Remove atomic-unsafe support for %pCr commit 666902e42fd8344b923c02dc5b0f37948ff4f225 upstream. "%pCr" formats the current rate of a clock, and calls clk_get_rate(). The latter obtains a mutex, hence it must not be called from atomic context. Remove support for this rarely-used format, as vsprintf() (and e.g. printk()) must be callable from any context. Any remaining out-of-tree users will start seeing the clock's name printed instead of its rate. Reported-by: Jia-Ju Bai Fixes: 900cca2944254edd ("lib/vsprintf: add %pC{,n,r} format specifiers for clocks") Link: http://lkml.kernel.org/r/1527845302-12159-5-git-send-email-geert+renesas@glider.be To: Jia-Ju Bai To: Jonathan Corbet To: Michael Turquette To: Stephen Boyd To: Zhang Rui To: Eduardo Valentin To: Eric Anholt To: Stefan Wahren To: Greg Kroah-Hartman Cc: Sergey Senozhatsky Cc: Petr Mladek Cc: Linus Torvalds Cc: Steven Rostedt Cc: linux-doc@vger.kernel.org Cc: linux-clk@vger.kernel.org Cc: linux-pm@vger.kernel.org Cc: linux-serial@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-renesas-soc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Geert Uytterhoeven Cc: stable@vger.kernel.org # 4.1+ Signed-off-by: Geert Uytterhoeven Signed-off-by: Petr Mladek Signed-off-by: Greg Kroah-Hartman --- Documentation/printk-formats.txt | 3 +-- lib/vsprintf.c | 3 --- 2 files changed, 1 insertion(+), 5 deletions(-) diff --git a/Documentation/printk-formats.txt b/Documentation/printk-formats.txt index 5962949944fd..d2fbeeb29582 100644 --- a/Documentation/printk-formats.txt +++ b/Documentation/printk-formats.txt @@ -279,11 +279,10 @@ struct clk: %pC pll1 %pCn pll1 - %pCr 1560000000 For printing struct clk structures. '%pC' and '%pCn' print the name (Common Clock Framework) or address (legacy clock framework) of the - structure; '%pCr' prints the current clock rate. + structure. Passed by reference. diff --git a/lib/vsprintf.c b/lib/vsprintf.c index 0967771d8f7f..79ba3cc07026 100644 --- a/lib/vsprintf.c +++ b/lib/vsprintf.c @@ -1391,9 +1391,6 @@ char *clock(char *buf, char *end, struct clk *clk, struct printf_spec spec, return string(buf, end, NULL, spec); switch (fmt[1]) { - case 'r': - return number(buf, end, clk_get_rate(clk), spec); - case 'n': default: #ifdef CONFIG_COMMON_CLK From 95f871342295f2a616fb28d5e0cf72939fe5db5d Mon Sep 17 00:00:00 2001 From: Matthias Schiffer Date: Sat, 24 Mar 2018 17:57:49 +0100 Subject: [PATCH 017/970] mips: ftrace: fix static function graph tracing commit 6fb8656646f996d1eef42e6d56203c4915cb9e08 upstream. ftrace_graph_caller was never run after calling ftrace_trace_function, breaking the function graph tracer. Fix this, bringing it in line with the x86 implementation. While we're at it, also streamline the control flow of _mcount a bit to reduce the number of branches. This issue was reported before: https://www.linux-mips.org/archives/linux-mips/2014-11/msg00295.html Signed-off-by: Matthias Schiffer Tested-by: Matt Redfearn Patchwork: https://patchwork.linux-mips.org/patch/18929/ Signed-off-by: Paul Burton Cc: stable@vger.kernel.org # v3.17+ Signed-off-by: Greg Kroah-Hartman --- arch/mips/kernel/mcount.S | 27 ++++++++++++--------------- 1 file changed, 12 insertions(+), 15 deletions(-) diff --git a/arch/mips/kernel/mcount.S b/arch/mips/kernel/mcount.S index 2f7c734771f4..0df911e772ae 100644 --- a/arch/mips/kernel/mcount.S +++ b/arch/mips/kernel/mcount.S @@ -116,10 +116,20 @@ ftrace_stub: NESTED(_mcount, PT_SIZE, ra) PTR_LA t1, ftrace_stub PTR_L t2, ftrace_trace_function /* Prepare t2 for (1) */ - bne t1, t2, static_trace + beq t1, t2, fgraph_trace nop + MCOUNT_SAVE_REGS + + move a0, ra /* arg1: self return address */ + jalr t2 /* (1) call *ftrace_trace_function */ + move a1, AT /* arg2: parent's return address */ + + MCOUNT_RESTORE_REGS + +fgraph_trace: #ifdef CONFIG_FUNCTION_GRAPH_TRACER + PTR_LA t1, ftrace_stub PTR_L t3, ftrace_graph_return bne t1, t3, ftrace_graph_caller nop @@ -128,24 +138,11 @@ NESTED(_mcount, PT_SIZE, ra) bne t1, t3, ftrace_graph_caller nop #endif - b ftrace_stub -#ifdef CONFIG_32BIT - addiu sp, sp, 8 -#else - nop -#endif -static_trace: - MCOUNT_SAVE_REGS - - move a0, ra /* arg1: self return address */ - jalr t2 /* (1) call *ftrace_trace_function */ - move a1, AT /* arg2: parent's return address */ - - MCOUNT_RESTORE_REGS #ifdef CONFIG_32BIT addiu sp, sp, 8 #endif + .globl ftrace_stub ftrace_stub: RETURN_BACK From 3e4fab744be24bf88dac651c5878c831743c0fbe Mon Sep 17 00:00:00 2001 From: Mikulas Patocka Date: Wed, 30 May 2018 08:19:22 -0400 Subject: [PATCH 018/970] branch-check: fix long->int truncation when profiling branches commit 2026d35741f2c3ece73c11eb7e4a15d7c2df9ebe upstream. The function __builtin_expect returns long type (see the gcc documentation), and so do macros likely and unlikely. Unfortunatelly, when CONFIG_PROFILE_ANNOTATED_BRANCHES is selected, the macros likely and unlikely expand to __branch_check__ and __branch_check__ truncates the long type to int. This unintended truncation may cause bugs in various kernel code (we found a bug in dm-writecache because of it), so it's better to fix __branch_check__ to return long. Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1805300818140.24812@file01.intranet.prod.int.rdu2.redhat.com Cc: Ingo Molnar Cc: stable@vger.kernel.org Fixes: 1f0d69a9fc815 ("tracing: profile likely and unlikely annotations") Signed-off-by: Mikulas Patocka Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Greg Kroah-Hartman --- include/linux/compiler.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/compiler.h b/include/linux/compiler.h index 5ce911db7d88..4f3dfabb680f 100644 --- a/include/linux/compiler.h +++ b/include/linux/compiler.h @@ -113,7 +113,7 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect); #define unlikely_notrace(x) __builtin_expect(!!(x), 0) #define __branch_check__(x, expect) ({ \ - int ______r; \ + long ______r; \ static struct ftrace_branch_data \ __attribute__((__aligned__(4))) \ __attribute__((section("_ftrace_annotated_branch"))) \ From d11ec041b2c4fddd7d963cf09895fbcbe14fec2d Mon Sep 17 00:00:00 2001 From: Corey Minyard Date: Tue, 22 May 2018 08:14:51 -0500 Subject: [PATCH 019/970] ipmi:bt: Set the timeout before doing a capabilities check commit fe50a7d0393a552e4539da2d31261a59d6415950 upstream. There was one place where the timeout value for an operation was not being set, if a capabilities request was done from idle. Move the timeout value setting to before where that change might be requested. IMHO the cause here is the invisible returns in the macros. Maybe that's a job for later, though. Reported-by: Nordmark Claes Signed-off-by: Corey Minyard Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- drivers/char/ipmi/ipmi_bt_sm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/char/ipmi/ipmi_bt_sm.c b/drivers/char/ipmi/ipmi_bt_sm.c index feafdab734ae..4835b588b783 100644 --- a/drivers/char/ipmi/ipmi_bt_sm.c +++ b/drivers/char/ipmi/ipmi_bt_sm.c @@ -522,11 +522,12 @@ static enum si_sm_result bt_event(struct si_sm_data *bt, long time) if (status & BT_H_BUSY) /* clear a leftover H_BUSY */ BT_CONTROL(BT_H_BUSY); + bt->timeout = bt->BT_CAP_req2rsp; + /* Read BT capabilities if it hasn't been done yet */ if (!bt->BT_CAP_outreqs) BT_STATE_CHANGE(BT_STATE_CAPABILITIES_BEGIN, SI_SM_CALL_WITHOUT_DELAY); - bt->timeout = bt->BT_CAP_req2rsp; BT_SI_SM_RETURN(SI_SM_IDLE); case BT_STATE_XACTION_START: From f1e9a633e660cbb792c2bc8409e7defd964bae95 Mon Sep 17 00:00:00 2001 From: Amit Pundir Date: Mon, 16 Apr 2018 12:10:24 +0530 Subject: [PATCH 020/970] Bluetooth: hci_qca: Avoid missing rampatch failure with userspace fw loader commit 7dc5fe0814c35ec4e7d2e8fa30abab72e0e6a172 upstream. AOSP use userspace firmware loader to load firmwares, which will return -EAGAIN in case qca/rampatch_00440302.bin is not found. Since there is no rampatch for dragonboard820c QCA controller revision, just make it work as is. CC: Loic Poulain CC: Nicolas Dechesne CC: Marcel Holtmann CC: Johan Hedberg CC: Stable Signed-off-by: Amit Pundir Signed-off-by: Marcel Holtmann Signed-off-by: Greg Kroah-Hartman --- drivers/bluetooth/hci_qca.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c index 74b2f4a14643..3a8b9aef96a6 100644 --- a/drivers/bluetooth/hci_qca.c +++ b/drivers/bluetooth/hci_qca.c @@ -939,6 +939,12 @@ static int qca_setup(struct hci_uart *hu) } else if (ret == -ENOENT) { /* No patch/nvm-config found, run with original fw/config */ ret = 0; + } else if (ret == -EAGAIN) { + /* + * Userspace firmware loader will return -EAGAIN in case no + * patch/nvm-config is found, so run with original fw/config. + */ + ret = 0; } /* Setup bdaddr */ From ebdc37febe594035d8cf3f5424ce97a926d2cddf Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Thu, 8 Feb 2018 15:17:38 +0100 Subject: [PATCH 021/970] fuse: atomic_o_trunc should truncate pagecache commit df0e91d488276086bc07da2e389986cae0048c37 upstream. Fuse has an "atomic_o_trunc" mode, where userspace filesystem uses the O_TRUNC flag in the OPEN request to truncate the file atomically with the open. In this mode there's no need to send a SETATTR request to userspace after the open, so fuse_do_setattr() checks this mode and returns. But this misses the important step of truncating the pagecache. Add the missing parts of truncation to the ATTR_OPEN branch. Reported-by: Chad Austin Fixes: 6ff958edbf39 ("fuse: add atomic open+truncate support") Signed-off-by: Miklos Szeredi Cc: Signed-off-by: Greg Kroah-Hartman --- fs/fuse/dir.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index 4bbad745415a..cca8dd3bda09 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -1633,8 +1633,19 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr, return err; if (attr->ia_valid & ATTR_OPEN) { - if (fc->atomic_o_trunc) + /* This is coming from open(..., ... | O_TRUNC); */ + WARN_ON(!(attr->ia_valid & ATTR_SIZE)); + WARN_ON(attr->ia_size != 0); + if (fc->atomic_o_trunc) { + /* + * No need to send request to userspace, since actual + * truncation has already been done by OPEN. But still + * need to truncate page cache. + */ + i_size_write(inode, 0); + truncate_pagecache(inode, 0); return 0; + } file = NULL; } From a0fbcaf9993ea291955ead3dac4b52457b3ec799 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa Date: Tue, 1 May 2018 13:12:14 +0900 Subject: [PATCH 022/970] fuse: don't keep dead fuse_conn at fuse_fill_super(). commit 543b8f8662fe6d21f19958b666ab0051af9db21a upstream. syzbot is reporting use-after-free at fuse_kill_sb_blk() [1]. Since sb->s_fs_info field is not cleared after fc was released by fuse_conn_put() when initialization failed, fuse_kill_sb_blk() finds already released fc and tries to hold the lock. Fix this by clearing sb->s_fs_info field after calling fuse_conn_put(). [1] https://syzkaller.appspot.com/bug?id=a07a680ed0a9290585ca424546860464dd9658db Signed-off-by: Tetsuo Handa Reported-by: syzbot Fixes: 3b463ae0c626 ("fuse: invalidation reverse calls") Cc: John Muir Cc: Csaba Henk Cc: Anand Avati Cc: # v2.6.31 Signed-off-by: Miklos Szeredi Signed-off-by: Greg Kroah-Hartman --- fs/fuse/inode.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 6fe6a88ecb4a..f95e1d49b048 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -1184,6 +1184,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) err_put_conn: fuse_bdi_destroy(fc); fuse_conn_put(fc); + sb->s_fs_info = NULL; err_fput: fput(file); err: From 12715f3ef147e56cb3c570ec169de80f6717838e Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Thu, 31 May 2018 12:26:10 +0200 Subject: [PATCH 023/970] fuse: fix control dir setup and teardown commit 6becdb601bae2a043d7fb9762c4d48699528ea6e upstream. syzbot is reporting NULL pointer dereference at fuse_ctl_remove_conn() [1]. Since fc->ctl_ndents is incremented by fuse_ctl_add_conn() when new_inode() failed, fuse_ctl_remove_conn() reaches an inode-less dentry and tries to clear d_inode(dentry)->i_private field. Fix by only adding the dentry to the array after being fully set up. When tearing down the control directory, do d_invalidate() on it to get rid of any mounts that might have been added. [1] https://syzkaller.appspot.com/bug?id=f396d863067238959c91c0b7cfc10b163638cac6 Reported-by: syzbot Fixes: bafa96541b25 ("[PATCH] fuse: add control filesystem") Cc: # v2.6.18 Signed-off-by: Miklos Szeredi Signed-off-by: Greg Kroah-Hartman --- fs/fuse/control.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/fs/fuse/control.c b/fs/fuse/control.c index 6e22748b0704..e25c40c10f4f 100644 --- a/fs/fuse/control.c +++ b/fs/fuse/control.c @@ -211,10 +211,11 @@ static struct dentry *fuse_ctl_add_dentry(struct dentry *parent, if (!dentry) return NULL; - fc->ctl_dentry[fc->ctl_ndents++] = dentry; inode = new_inode(fuse_control_sb); - if (!inode) + if (!inode) { + dput(dentry); return NULL; + } inode->i_ino = get_next_ino(); inode->i_mode = mode; @@ -228,6 +229,9 @@ static struct dentry *fuse_ctl_add_dentry(struct dentry *parent, set_nlink(inode, nlink); inode->i_private = fc; d_add(dentry, inode); + + fc->ctl_dentry[fc->ctl_ndents++] = dentry; + return dentry; } @@ -284,7 +288,10 @@ void fuse_ctl_remove_conn(struct fuse_conn *fc) for (i = fc->ctl_ndents - 1; i >= 0; i--) { struct dentry *dentry = fc->ctl_dentry[i]; d_inode(dentry)->i_private = NULL; - d_drop(dentry); + if (!i) { + /* Get rid of submounts: */ + d_invalidate(dentry); + } dput(dentry); } drop_nlink(d_inode(fuse_control_sb->s_root)); From 10e46042f27d6bebcba97fe1a17af1eb7add7db9 Mon Sep 17 00:00:00 2001 From: "Aneesh Kumar K.V" Date: Wed, 30 May 2018 18:48:04 +0530 Subject: [PATCH 024/970] powerpc/mm/hash: Add missing isync prior to kernel stack SLB switch commit 91d06971881f71d945910de128658038513d1b24 upstream. Currently we do not have an isync, or any other context synchronizing instruction prior to the slbie/slbmte in _switch() that updates the SLB entry for the kernel stack. However that is not correct as outlined in the ISA. From Power ISA Version 3.0B, Book III, Chapter 11, page 1133: "Changing the contents of ... the contents of SLB entries ... can have the side effect of altering the context in which data addresses and instruction addresses are interpreted, and in which instructions are executed and data accesses are performed. ... These side effects need not occur in program order, and therefore may require explicit synchronization by software. ... The synchronizing instruction before the context-altering instruction ensures that all instructions up to and including that synchronizing instruction are fetched and executed in the context that existed before the alteration." And page 1136: "For data accesses, the context synchronizing instruction before the slbie, slbieg, slbia, slbmte, tlbie, or tlbiel instruction ensures that all preceding instructions that access data storage have completed to a point at which they have reported all exceptions they will cause." We're not aware of any bugs caused by this, but it should be fixed regardless. Add the missing isync when updating kernel stack SLB entry. Cc: stable@vger.kernel.org Signed-off-by: Aneesh Kumar K.V [mpe: Flesh out change log with more ISA text & explanation] Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/kernel/entry_64.S | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 2dc52e6d2af4..e24ae0fa80ed 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -586,6 +586,7 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_1T_SEGMENT) * actually hit this code path. */ + isync slbie r6 slbie r6 /* Workaround POWER5 < DD2.1 issue */ slbmte r7,r0 From 5ea3b9bddf844e72f701ae8a8ebe75e431249435 Mon Sep 17 00:00:00 2001 From: Michael Neuling Date: Thu, 17 May 2018 15:37:15 +1000 Subject: [PATCH 025/970] powerpc/ptrace: Fix setting 512B aligned breakpoints with PTRACE_SET_DEBUGREG commit 4f7c06e26ec9cf7fe9f0c54dc90079b6a4f4b2c3 upstream. In commit e2a800beaca1 ("powerpc/hw_brk: Fix off by one error when validating DAWR region end") we fixed setting the DAWR end point to its max value via PPC_PTRACE_SETHWDEBUG. Unfortunately we broke PTRACE_SET_DEBUGREG when setting a 512 byte aligned breakpoint. PTRACE_SET_DEBUGREG currently sets the length of the breakpoint to zero (memset() in hw_breakpoint_init()). This worked with arch_validate_hwbkpt_settings() before the above patch was applied but is now broken if the breakpoint is 512byte aligned. This sets the length of the breakpoint to 8 bytes when using PTRACE_SET_DEBUGREG. Fixes: e2a800beaca1 ("powerpc/hw_brk: Fix off by one error when validating DAWR region end") Cc: stable@vger.kernel.org # v3.11+ Signed-off-by: Michael Neuling Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/kernel/ptrace.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c index d97370866a5f..adfa63e7df8c 100644 --- a/arch/powerpc/kernel/ptrace.c +++ b/arch/powerpc/kernel/ptrace.c @@ -2380,6 +2380,7 @@ static int ptrace_set_debugreg(struct task_struct *task, unsigned long addr, /* Create a new breakpoint request if one doesn't exist already */ hw_breakpoint_init(&attr); attr.bp_addr = hw_brk.address; + attr.bp_len = 8; arch_bp_generic_fields(hw_brk.type, &attr.bp_type); From 90f88f05d8775d30c68f7cd2058d55c1a9d868da Mon Sep 17 00:00:00 2001 From: Michael Neuling Date: Thu, 17 May 2018 15:37:14 +1000 Subject: [PATCH 026/970] powerpc/ptrace: Fix enforcement of DAWR constraints commit cd6ef7eebf171bfcba7dc2df719c2a4958775040 upstream. Back when we first introduced the DAWR, in commit 4ae7ebe9522a ("powerpc: Change hardware breakpoint to allow longer ranges"), we screwed up the constraint making it a 1024 byte boundary rather than a 512. This makes the check overly permissive. Fortunately GDB is the only real user and it always did they right thing, so we never noticed. This fixes the constraint to 512 bytes. Fixes: 4ae7ebe9522a ("powerpc: Change hardware breakpoint to allow longer ranges") Cc: stable@vger.kernel.org # v3.9+ Signed-off-by: Michael Neuling Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/kernel/hw_breakpoint.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index 469d86d1c2a5..532c585ec24b 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -175,8 +175,8 @@ int arch_validate_hwbkpt_settings(struct perf_event *bp) if (cpu_has_feature(CPU_FTR_DAWR)) { length_max = 512 ; /* 64 doublewords */ /* DAWR region can't cross 512 boundary */ - if ((bp->attr.bp_addr >> 10) != - ((bp->attr.bp_addr + bp->attr.bp_len - 1) >> 10)) + if ((bp->attr.bp_addr >> 9) != + ((bp->attr.bp_addr + bp->attr.bp_len - 1) >> 9)) return -EINVAL; } if (info->len > From f9b25660d64bd8f86db67b7cfa6d5c30f53627a0 Mon Sep 17 00:00:00 2001 From: Alexey Kardashevskiy Date: Wed, 30 May 2018 19:22:50 +1000 Subject: [PATCH 027/970] powerpc/powernv/ioda2: Remove redundant free of TCE pages commit 98fd72fe82527fd26618062b60cfd329451f2329 upstream. When IODA2 creates a PE, it creates an IOMMU table with it_ops::free set to pnv_ioda2_table_free() which calls pnv_pci_ioda2_table_free_pages(). Since iommu_tce_table_put() calls it_ops::free when the last reference to the table is released, explicit call to pnv_pci_ioda2_table_free_pages() is not needed so let's remove it. This should fix double free in the case of PCI hotuplug as pnv_pci_ioda2_table_free_pages() does not reset neither iommu_table::it_base nor ::it_size. This was not exposed by SRIOV as it uses different code path via pnv_pcibios_sriov_disable(). IODA1 does not inialize it_ops::free so it does not have this issue. Fixes: c5f7700bbd2e ("powerpc/powernv: Dynamically release PE") Cc: stable@vger.kernel.org # v4.8+ Signed-off-by: Alexey Kardashevskiy Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/platforms/powernv/pci-ioda.c | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index f602307a4386..9ed90c502005 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -3424,7 +3424,6 @@ static void pnv_pci_ioda2_release_pe_dma(struct pnv_ioda_pe *pe) WARN_ON(pe->table_group.group); } - pnv_pci_ioda2_table_free_pages(tbl); iommu_free_table(tbl, "pnv"); } From 443004a666ede252f85bcdf8faab85d8174f0367 Mon Sep 17 00:00:00 2001 From: "Gautham R. Shenoy" Date: Thu, 31 May 2018 17:45:09 +0530 Subject: [PATCH 028/970] cpuidle: powernv: Fix promotion from snooze if next state disabled commit 0a4ec6aa035a52c422eceb2ed51ed88392a3d6c2 upstream. The commit 78eaa10f027c ("cpuidle: powernv/pseries: Auto-promotion of snooze to deeper idle state") introduced a timeout for the snooze idle state so that it could be eventually be promoted to a deeper idle state. The snooze timeout value is static and set to the target residency of the next idle state, which would train the cpuidle governor to pick the next idle state eventually. The unfortunate side-effect of this is that if the next idle state(s) is disabled, the CPU will forever remain in snooze, despite the fact that the system is completely idle, and other deeper idle states are available. This patch fixes the issue by dynamically setting the snooze timeout to the target residency of the next enabled state on the device. Before Patch: POWER8 : Only nap disabled. $ cpupower monitor sleep 30 sleep took 30.01297 seconds and exited with status 0 |Idle_Stats PKG |CORE|CPU | snoo | Nap | Fast 0| 8| 0| 96.41| 0.00| 0.00 0| 8| 1| 96.43| 0.00| 0.00 0| 8| 2| 96.47| 0.00| 0.00 0| 8| 3| 96.35| 0.00| 0.00 0| 8| 4| 96.37| 0.00| 0.00 0| 8| 5| 96.37| 0.00| 0.00 0| 8| 6| 96.47| 0.00| 0.00 0| 8| 7| 96.47| 0.00| 0.00 POWER9: Shallow states (stop0lite, stop1lite, stop2lite, stop0, stop1, stop2) disabled: $ cpupower monitor sleep 30 sleep took 30.05033 seconds and exited with status 0 |Idle_Stats PKG |CORE|CPU | snoo | stop | stop | stop | stop | stop | stop | stop | stop 0| 16| 0| 89.79| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00 0| 16| 1| 90.12| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00 0| 16| 2| 90.21| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00 0| 16| 3| 90.29| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00 After Patch: POWER8 : Only nap disabled. $ cpupower monitor sleep 30 sleep took 30.01200 seconds and exited with status 0 |Idle_Stats PKG |CORE|CPU | snoo | Nap | Fast 0| 8| 0| 16.58| 0.00| 77.21 0| 8| 1| 18.42| 0.00| 75.38 0| 8| 2| 4.70| 0.00| 94.09 0| 8| 3| 17.06| 0.00| 81.73 0| 8| 4| 3.06| 0.00| 95.73 0| 8| 5| 7.00| 0.00| 96.80 0| 8| 6| 1.00| 0.00| 98.79 0| 8| 7| 5.62| 0.00| 94.17 POWER9: Shallow states (stop0lite, stop1lite, stop2lite, stop0, stop1, stop2) disabled: $ cpupower monitor sleep 30 sleep took 30.02110 seconds and exited with status 0 |Idle_Stats PKG |CORE|CPU | snoo | stop | stop | stop | stop | stop | stop | stop | stop 0| 0| 0| 0.69| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 9.39| 89.70 0| 0| 1| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.05| 93.21 0| 0| 2| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 89.93 0| 0| 3| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 0.00| 93.26 Fixes: 78eaa10f027c ("cpuidle: powernv/pseries: Auto-promotion of snooze to deeper idle state") Cc: stable@vger.kernel.org # v4.2+ Signed-off-by: Gautham R. Shenoy Reviewed-by: Balbir Singh Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman --- drivers/cpuidle/cpuidle-powernv.c | 32 +++++++++++++++++++++++++------ 1 file changed, 26 insertions(+), 6 deletions(-) diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c index 854a56781100..fd96af1d2ef0 100644 --- a/drivers/cpuidle/cpuidle-powernv.c +++ b/drivers/cpuidle/cpuidle-powernv.c @@ -32,9 +32,31 @@ static struct cpuidle_state *cpuidle_state_table; static u64 stop_psscr_table[CPUIDLE_STATE_MAX]; -static u64 snooze_timeout; +static u64 default_snooze_timeout; static bool snooze_timeout_en; +static u64 get_snooze_timeout(struct cpuidle_device *dev, + struct cpuidle_driver *drv, + int index) +{ + int i; + + if (unlikely(!snooze_timeout_en)) + return default_snooze_timeout; + + for (i = index + 1; i < drv->state_count; i++) { + struct cpuidle_state *s = &drv->states[i]; + struct cpuidle_state_usage *su = &dev->states_usage[i]; + + if (s->disabled || su->disable) + continue; + + return s->target_residency * tb_ticks_per_usec; + } + + return default_snooze_timeout; +} + static int snooze_loop(struct cpuidle_device *dev, struct cpuidle_driver *drv, int index) @@ -44,7 +66,7 @@ static int snooze_loop(struct cpuidle_device *dev, local_irq_enable(); set_thread_flag(TIF_POLLING_NRFLAG); - snooze_exit_time = get_tb() + snooze_timeout; + snooze_exit_time = get_tb() + get_snooze_timeout(dev, drv, index); ppc64_runlatch_off(); while (!need_resched()) { HMT_low(); @@ -337,11 +359,9 @@ static int powernv_idle_probe(void) cpuidle_state_table = powernv_states; /* Device tree can indicate more idle states */ max_idle_state = powernv_add_idle_states(); - if (max_idle_state > 1) { + default_snooze_timeout = TICK_USEC * tb_ticks_per_usec; + if (max_idle_state > 1) snooze_timeout_en = true; - snooze_timeout = powernv_states[1].target_residency * - tb_ticks_per_usec; - } } else return -ENODEV; From 81d6e715d16124862aaeb4b9d9648fb40e9ae779 Mon Sep 17 00:00:00 2001 From: Mahesh Salgaonkar Date: Fri, 27 Apr 2018 11:53:18 +0530 Subject: [PATCH 029/970] powerpc/fadump: Unregister fadump on kexec down path. commit 722cde76d68e8cc4f3de42e71c82fd40dea4f7b9 upstream. Unregister fadump on kexec down path otherwise the fadump registration in new kexec-ed kernel complains that fadump is already registered. This makes new kernel to continue using fadump registered by previous kernel which may lead to invalid vmcore generation. Hence this patch fixes this issue by un-registering fadump in fadump_cleanup() which is called during kexec path so that new kernel can register fadump with new valid values. Fixes: b500afff11f6 ("fadump: Invalidate registration and release reserved memory for general use.") Cc: stable@vger.kernel.org # v3.4+ Signed-off-by: Mahesh Salgaonkar Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/kernel/fadump.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index 8f0c7c5d93f2..93a6eeba3ace 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -1033,6 +1033,9 @@ void fadump_cleanup(void) init_fadump_mem_struct(&fdm, be64_to_cpu(fdm_active->cpu_state_data.destination_address)); fadump_invalidate_dump(&fdm); + } else if (fw_dump.dump_registered) { + /* Un-register Firmware-assisted dump if it was registered. */ + fadump_unregister_dump(&fdm); } } From 8f27499338a23a9e6eada6ab3a126c0616b21296 Mon Sep 17 00:00:00 2001 From: David Rivshin Date: Wed, 25 Apr 2018 21:15:01 +0100 Subject: [PATCH 030/970] ARM: 8764/1: kgdb: fix NUMREGBYTES so that gdb_regs[] is the correct size commit 76ed0b803a2ab793a1b27d1dfe0de7955282cd34 upstream. NUMREGBYTES (which is used as the size for gdb_regs[]) is incorrectly based on DBG_MAX_REG_NUM instead of GDB_MAX_REGS. DBG_MAX_REG_NUM is the number of total registers, while GDB_MAX_REGS is the number of 'unsigned longs' it takes to serialize those registers. Since FP registers require 3 'unsigned longs' each, DBG_MAX_REG_NUM is smaller than GDB_MAX_REGS. This causes GDB 8.0 give the following error on connect: "Truncated register 19 in remote 'g' packet" This also causes the register serialization/deserialization logic to overflow gdb_regs[], overwriting whatever follows. Fixes: 834b2964b7ab ("kgdb,arm: fix register dump") Cc: # 2.6.37+ Signed-off-by: David Rivshin Acked-by: Rabin Vincent Tested-by: Daniel Thompson Signed-off-by: Russell King Signed-off-by: Greg Kroah-Hartman --- arch/arm/include/asm/kgdb.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/include/asm/kgdb.h b/arch/arm/include/asm/kgdb.h index 0a9d5dd93294..6949c7d4481c 100644 --- a/arch/arm/include/asm/kgdb.h +++ b/arch/arm/include/asm/kgdb.h @@ -76,7 +76,7 @@ extern int kgdb_fault_expected; #define KGDB_MAX_NO_CPUS 1 #define BUFMAX 400 -#define NUMREGBYTES (DBG_MAX_REG_NUM << 2) +#define NUMREGBYTES (GDB_MAX_REGS << 2) #define NUMCRITREGBYTES (32 << 2) #define _R0 0 From 12942d52f23d79df25814730f559c54150ab3e84 Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Fri, 22 Jun 2018 10:25:25 +0100 Subject: [PATCH 031/970] arm64: kpti: Use early_param for kpti= command-line option commit b5b7dd647f2d21b93f734ce890671cd908e69b0a upstream. We inspect __kpti_forced early on as part of the cpufeature enable callback which remaps the swapper page table using non-global entries. Ensure that __kpti_forced has been updated to reflect the kpti= command-line option before we start using it. Fixes: ea1e3de85e94 ("arm64: entry: Add fake CPU feature for unmapping the kernel at EL0") Cc: # 4.16.x- Reported-by: Wei Xu Tested-by: Sudeep Holla Tested-by: Wei Xu Signed-off-by: Will Deacon Signed-off-by: Catalin Marinas Signed-off-by: Greg Kroah-Hartman --- arch/arm64/kernel/cpufeature.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 7959d2c92010..625c2b240ffb 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -826,7 +826,7 @@ static int __init parse_kpti(char *str) __kpti_forced = enabled ? 1 : -1; return 0; } -__setup("kpti=", parse_kpti); +early_param("kpti", parse_kpti); #endif /* CONFIG_UNMAP_KERNEL_AT_EL0 */ static const struct arm64_cpu_capabilities arm64_features[] = { From fb6786ce77ac4d65351832412703b8793e122f12 Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Fri, 22 Jun 2018 16:23:45 +0100 Subject: [PATCH 032/970] arm64: mm: Ensure writes to swapper are ordered wrt subsequent cache maintenance commit 71c8fc0c96abf8e53e74ed4d891d671e585f9076 upstream. When rewriting swapper using nG mappings, we must performance cache maintenance around each page table access in order to avoid coherency problems with the host's cacheable alias under KVM. To ensure correct ordering of the maintenance with respect to Device memory accesses made with the Stage-1 MMU disabled, DMBs need to be added between the maintenance and the corresponding memory access. This patch adds a missing DMB between writing a new page table entry and performing a clean+invalidate on the same line. Fixes: f992b4dfd58b ("arm64: kpti: Add ->enable callback to remap swapper using nG mappings") Cc: # 4.16.x- Acked-by: Mark Rutland Signed-off-by: Will Deacon Signed-off-by: Catalin Marinas Signed-off-by: Greg Kroah-Hartman --- arch/arm64/mm/proc.S | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S index 66cce2138f95..18d96d349a8b 100644 --- a/arch/arm64/mm/proc.S +++ b/arch/arm64/mm/proc.S @@ -186,8 +186,9 @@ ENDPROC(idmap_cpu_replace_ttbr1) .macro __idmap_kpti_put_pgtable_ent_ng, type orr \type, \type, #PTE_NG // Same bit for blocks and pages - str \type, [cur_\()\type\()p] // Update the entry and ensure it - dc civac, cur_\()\type\()p // is visible to all CPUs. + str \type, [cur_\()\type\()p] // Update the entry and ensure + dmb sy // that it is visible to all + dc civac, cur_\()\type\()p // CPUs. .endm /* From f92ec84c49f91756cbdd048c1a3136f15810cd94 Mon Sep 17 00:00:00 2001 From: Stefan M Schaeckeler Date: Mon, 21 May 2018 16:26:14 -0700 Subject: [PATCH 033/970] of: unittest: for strings, account for trailing \0 in property length field commit 3b9cf7905fe3ab35ab437b5072c883e609d3498d upstream. For strings, account for trailing \0 in property length field: This is consistent with how dtc builds string properties. Function __of_prop_dup() would misbehave on such properties as it duplicates properties based on the property length field creating new string values without trailing \0s. Signed-off-by: Stefan M Schaeckeler Reviewed-by: Frank Rowand Tested-by: Frank Rowand Cc: Signed-off-by: Rob Herring Signed-off-by: Greg Kroah-Hartman --- drivers/of/unittest.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c index 53c83d66eb7e..90b5a898d6b1 100644 --- a/drivers/of/unittest.c +++ b/drivers/of/unittest.c @@ -155,20 +155,20 @@ static void __init of_unittest_dynamic(void) /* Add a new property - should pass*/ prop->name = "new-property"; prop->value = "new-property-data"; - prop->length = strlen(prop->value); + prop->length = strlen(prop->value) + 1; unittest(of_add_property(np, prop) == 0, "Adding a new property failed\n"); /* Try to add an existing property - should fail */ prop++; prop->name = "new-property"; prop->value = "new-property-data-should-fail"; - prop->length = strlen(prop->value); + prop->length = strlen(prop->value) + 1; unittest(of_add_property(np, prop) != 0, "Adding an existing property should have failed\n"); /* Try to modify an existing property - should pass */ prop->value = "modify-property-data-should-pass"; - prop->length = strlen(prop->value); + prop->length = strlen(prop->value) + 1; unittest(of_update_property(np, prop) == 0, "Updating an existing property should have passed\n"); @@ -176,7 +176,7 @@ static void __init of_unittest_dynamic(void) prop++; prop->name = "modify-property"; prop->value = "modify-missing-property-data-should-pass"; - prop->length = strlen(prop->value); + prop->length = strlen(prop->value) + 1; unittest(of_update_property(np, prop) == 0, "Updating a missing property should have passed\n"); From 9321e8303406e2a194ba8e88fa8b57ee360f3f00 Mon Sep 17 00:00:00 2001 From: Mike Marciniszyn Date: Fri, 18 May 2018 17:07:01 -0700 Subject: [PATCH 034/970] IB/qib: Fix DMA api warning with debug kernel commit 0252f73334f9ef68868e4684200bea3565a4fcee upstream. The following error occurs in a debug build when running MPI PSM: [ 307.415911] WARNING: CPU: 4 PID: 23867 at lib/dma-debug.c:1158 check_unmap+0x4ee/0xa20 [ 307.455661] ib_qib 0000:05:00.0: DMA-API: device driver failed to check map error[device address=0x00000000df82b000] [size=4096 bytes] [mapped as page] [ 307.517494] Modules linked in: [ 307.531584] ib_isert iscsi_target_mod ib_srpt target_core_mod rpcrdma sunrpc ib_srp scsi_transport_srp scsi_tgt ib_iser libiscsi ib_ipoib scsi_transport_iscsi rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_qib intel_powerclamp coretemp rdmavt intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel ipmi_ssif ib_core aesni_intel sg ipmi_si lrw gf128mul dca glue_helper ipmi_devintf iTCO_wdt gpio_ich hpwdt iTCO_vendor_support ablk_helper hpilo acpi_power_meter cryptd ipmi_msghandler ie31200_edac shpchp pcc_cpufreq lpc_ich pcspkr ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci crct10dif_pclmul crct10dif_common drm crc32c_intel libahci tg3 libata serio_raw ptp i2c_core [ 307.846113] pps_core dm_mirror dm_region_hash dm_log dm_mod [ 307.866505] CPU: 4 PID: 23867 Comm: mpitests-IMB-MP Kdump: loaded Not tainted 3.10.0-862.el7.x86_64.debug #1 [ 307.911178] Hardware name: HP ProLiant DL320e Gen8, BIOS J05 11/09/2013 [ 307.944206] Call Trace: [ 307.956973] [] dump_stack+0x19/0x1b [ 307.982201] [] __warn+0xd8/0x100 [ 308.005999] [] warn_slowpath_fmt+0x5f/0x80 [ 308.034260] [] check_unmap+0x4ee/0xa20 [ 308.060801] [] ? page_add_file_rmap+0x2a/0x1d0 [ 308.090689] [] debug_dma_unmap_page+0x9d/0xb0 [ 308.120155] [] ? might_fault+0xa0/0xb0 [ 308.146656] [] qib_tid_free.isra.14+0x215/0x2a0 [ib_qib] [ 308.180739] [] qib_write+0x894/0x1280 [ib_qib] [ 308.210733] [] ? __inode_security_revalidate+0x70/0x80 [ 308.244837] [] ? security_file_permission+0x27/0xb0 [ 308.266025] qib_ib0.8006: multicast join failed for ff12:401b:8006:0000:0000:0000:ffff:ffff, status -22 [ 308.323421] [] vfs_write+0xc3/0x1f0 [ 308.347077] [] ? fget_light+0xfc/0x510 [ 308.372533] [] SyS_write+0x8a/0x100 [ 308.396456] [] system_call_fastpath+0x1c/0x21 The code calls a qib_map_page() which has never correctly tested for a mapping error. Fix by testing for pci_dma_mapping_error() in all cases and properly handling the failure in the caller. Additionally, streamline qib_map_page() arguments to satisfy just the single caller. Cc: Reviewed-by: Alex Estrin Tested-by: Don Dutile Reviewed-by: Don Dutile Signed-off-by: Mike Marciniszyn Signed-off-by: Dennis Dalessandro Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/hw/qib/qib.h | 3 +-- drivers/infiniband/hw/qib/qib_file_ops.c | 10 +++++++--- drivers/infiniband/hw/qib/qib_user_pages.c | 20 ++++++++++++-------- 3 files changed, 20 insertions(+), 13 deletions(-) diff --git a/drivers/infiniband/hw/qib/qib.h b/drivers/infiniband/hw/qib/qib.h index a3e21a25cea5..0078f2ad9b92 100644 --- a/drivers/infiniband/hw/qib/qib.h +++ b/drivers/infiniband/hw/qib/qib.h @@ -1448,8 +1448,7 @@ u64 qib_sps_ints(void); /* * dma_addr wrappers - all 0's invalid for hw */ -dma_addr_t qib_map_page(struct pci_dev *, struct page *, unsigned long, - size_t, int); +int qib_map_page(struct pci_dev *d, struct page *p, dma_addr_t *daddr); const char *qib_get_unit_name(int unit); const char *qib_get_card_name(struct rvt_dev_info *rdi); struct pci_dev *qib_get_pci_dev(struct rvt_dev_info *rdi); diff --git a/drivers/infiniband/hw/qib/qib_file_ops.c b/drivers/infiniband/hw/qib/qib_file_ops.c index 382466a90da7..cc6a92316c7c 100644 --- a/drivers/infiniband/hw/qib/qib_file_ops.c +++ b/drivers/infiniband/hw/qib/qib_file_ops.c @@ -364,6 +364,8 @@ static int qib_tid_update(struct qib_ctxtdata *rcd, struct file *fp, goto done; } for (i = 0; i < cnt; i++, vaddr += PAGE_SIZE) { + dma_addr_t daddr; + for (; ntids--; tid++) { if (tid == tidcnt) tid = 0; @@ -380,12 +382,14 @@ static int qib_tid_update(struct qib_ctxtdata *rcd, struct file *fp, ret = -ENOMEM; break; } + ret = qib_map_page(dd->pcidev, pagep[i], &daddr); + if (ret) + break; + tidlist[i] = tid + tidoff; /* we "know" system pages and TID pages are same size */ dd->pageshadow[ctxttid + tid] = pagep[i]; - dd->physshadow[ctxttid + tid] = - qib_map_page(dd->pcidev, pagep[i], 0, PAGE_SIZE, - PCI_DMA_FROMDEVICE); + dd->physshadow[ctxttid + tid] = daddr; /* * don't need atomic or it's overhead */ diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c index 75f08624ac05..4715edff5488 100644 --- a/drivers/infiniband/hw/qib/qib_user_pages.c +++ b/drivers/infiniband/hw/qib/qib_user_pages.c @@ -98,23 +98,27 @@ static int __qib_get_user_pages(unsigned long start_page, size_t num_pages, * * I'm sure we won't be so lucky with other iommu's, so FIXME. */ -dma_addr_t qib_map_page(struct pci_dev *hwdev, struct page *page, - unsigned long offset, size_t size, int direction) +int qib_map_page(struct pci_dev *hwdev, struct page *page, dma_addr_t *daddr) { dma_addr_t phys; - phys = pci_map_page(hwdev, page, offset, size, direction); + phys = pci_map_page(hwdev, page, 0, PAGE_SIZE, PCI_DMA_FROMDEVICE); + if (pci_dma_mapping_error(hwdev, phys)) + return -ENOMEM; - if (phys == 0) { - pci_unmap_page(hwdev, phys, size, direction); - phys = pci_map_page(hwdev, page, offset, size, direction); + if (!phys) { + pci_unmap_page(hwdev, phys, PAGE_SIZE, PCI_DMA_FROMDEVICE); + phys = pci_map_page(hwdev, page, 0, PAGE_SIZE, + PCI_DMA_FROMDEVICE); + if (pci_dma_mapping_error(hwdev, phys)) + return -ENOMEM; /* * FIXME: If we get 0 again, we should keep this page, * map another, then free the 0 page. */ } - - return phys; + *daddr = phys; + return 0; } /** From 9cac0a08e476775df3ae5f90730a766ea01b6232 Mon Sep 17 00:00:00 2001 From: Alex Estrin Date: Wed, 2 May 2018 06:43:15 -0700 Subject: [PATCH 035/970] IB/{hfi1, qib}: Add handling of kernel restart commit 8d3e71136a080d007620472f50c7b3e63ba0f5cf upstream. A warm restart will fail to unload the driver, leaving link state potentially flapping up to the point the BIOS resets the adapter. Correct the issue by hooking the shutdown pci method, which will bring port down. Cc: # 4.9.x Reviewed-by: Mike Marciniszyn Signed-off-by: Alex Estrin Signed-off-by: Dennis Dalessandro Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/hw/hfi1/hfi.h | 1 + drivers/infiniband/hw/hfi1/init.c | 13 +++++++++++++ drivers/infiniband/hw/qib/qib.h | 1 + drivers/infiniband/hw/qib/qib_init.c | 13 +++++++++++++ 4 files changed, 28 insertions(+) diff --git a/drivers/infiniband/hw/hfi1/hfi.h b/drivers/infiniband/hw/hfi1/hfi.h index a3279f3d2578..a79d9b340cfa 100644 --- a/drivers/infiniband/hw/hfi1/hfi.h +++ b/drivers/infiniband/hw/hfi1/hfi.h @@ -1631,6 +1631,7 @@ struct cc_state *get_cc_state_protected(struct hfi1_pportdata *ppd) #define HFI1_HAS_SDMA_TIMEOUT 0x8 #define HFI1_HAS_SEND_DMA 0x10 /* Supports Send DMA */ #define HFI1_FORCED_FREEZE 0x80 /* driver forced freeze mode */ +#define HFI1_SHUTDOWN 0x100 /* device is shutting down */ /* IB dword length mask in PBC (lower 11 bits); same for all chips */ #define HFI1_PBC_LENGTH_MASK ((1 << 11) - 1) diff --git a/drivers/infiniband/hw/hfi1/init.c b/drivers/infiniband/hw/hfi1/init.c index ae1f90ddd4e8..c81c44525dd5 100644 --- a/drivers/infiniband/hw/hfi1/init.c +++ b/drivers/infiniband/hw/hfi1/init.c @@ -857,6 +857,10 @@ static void shutdown_device(struct hfi1_devdata *dd) unsigned pidx; int i; + if (dd->flags & HFI1_SHUTDOWN) + return; + dd->flags |= HFI1_SHUTDOWN; + for (pidx = 0; pidx < dd->num_pports; ++pidx) { ppd = dd->pport + pidx; @@ -1168,6 +1172,7 @@ void hfi1_disable_after_error(struct hfi1_devdata *dd) static void remove_one(struct pci_dev *); static int init_one(struct pci_dev *, const struct pci_device_id *); +static void shutdown_one(struct pci_dev *); #define DRIVER_LOAD_MSG "Intel " DRIVER_NAME " loaded: " #define PFX DRIVER_NAME ": " @@ -1184,6 +1189,7 @@ static struct pci_driver hfi1_pci_driver = { .name = DRIVER_NAME, .probe = init_one, .remove = remove_one, + .shutdown = shutdown_one, .id_table = hfi1_pci_tbl, .err_handler = &hfi1_pci_err_handler, }; @@ -1590,6 +1596,13 @@ static void remove_one(struct pci_dev *pdev) postinit_cleanup(dd); } +static void shutdown_one(struct pci_dev *pdev) +{ + struct hfi1_devdata *dd = pci_get_drvdata(pdev); + + shutdown_device(dd); +} + /** * hfi1_create_rcvhdrq - create a receive header queue * @dd: the hfi1_ib device diff --git a/drivers/infiniband/hw/qib/qib.h b/drivers/infiniband/hw/qib/qib.h index 0078f2ad9b92..ef092cca092e 100644 --- a/drivers/infiniband/hw/qib/qib.h +++ b/drivers/infiniband/hw/qib/qib.h @@ -1250,6 +1250,7 @@ static inline struct qib_ibport *to_iport(struct ib_device *ibdev, u8 port) #define QIB_BADINTR 0x8000 /* severe interrupt problems */ #define QIB_DCA_ENABLED 0x10000 /* Direct Cache Access enabled */ #define QIB_HAS_QSFP 0x20000 /* device (card instance) has QSFP */ +#define QIB_SHUTDOWN 0x40000 /* device is shutting down */ /* * values for ppd->lflags (_ib_port_ related flags) diff --git a/drivers/infiniband/hw/qib/qib_init.c b/drivers/infiniband/hw/qib/qib_init.c index 1730aa839a47..caf7c5120b0a 100644 --- a/drivers/infiniband/hw/qib/qib_init.c +++ b/drivers/infiniband/hw/qib/qib_init.c @@ -878,6 +878,10 @@ static void qib_shutdown_device(struct qib_devdata *dd) struct qib_pportdata *ppd; unsigned pidx; + if (dd->flags & QIB_SHUTDOWN) + return; + dd->flags |= QIB_SHUTDOWN; + for (pidx = 0; pidx < dd->num_pports; ++pidx) { ppd = dd->pport + pidx; @@ -1223,6 +1227,7 @@ void qib_disable_after_error(struct qib_devdata *dd) static void qib_remove_one(struct pci_dev *); static int qib_init_one(struct pci_dev *, const struct pci_device_id *); +static void qib_shutdown_one(struct pci_dev *); #define DRIVER_LOAD_MSG "Intel " QIB_DRV_NAME " loaded: " #define PFX QIB_DRV_NAME ": " @@ -1240,6 +1245,7 @@ static struct pci_driver qib_driver = { .name = QIB_DRV_NAME, .probe = qib_init_one, .remove = qib_remove_one, + .shutdown = qib_shutdown_one, .id_table = qib_pci_tbl, .err_handler = &qib_pci_err_handler, }; @@ -1591,6 +1597,13 @@ static void qib_remove_one(struct pci_dev *pdev) qib_postinit_cleanup(dd); } +static void qib_shutdown_one(struct pci_dev *pdev) +{ + struct qib_devdata *dd = pci_get_drvdata(pdev); + + qib_shutdown_device(dd); +} + /** * qib_create_rcvhdrq - create a receive header queue * @dd: the qlogic_ib device From e355402cf19cf30e5f659ac21ddf40ccd7e5bcf0 Mon Sep 17 00:00:00 2001 From: Erez Shitrit Date: Mon, 21 May 2018 11:41:01 +0300 Subject: [PATCH 036/970] IB/mlx5: Fetch soft WQE's on fatal error state commit 7b74a83cf54a3747e22c57e25712bd70eef8acee upstream. On fatal error the driver simulates CQE's for ULPs that rely on completion of all their posted work-request. For the GSI traffic, the mlx5 has its own mechanism that sends the completions via software CQE's directly to the relevant CQ. This should be kept in fatal error too, so the driver should simulate such CQE's with the specified error state in order to complete GSI QP work requests. Without the fix the next deadlock might appears: schedule_timeout+0x274/0x350 wait_for_common+0xec/0x240 mcast_remove_one+0xd0/0x120 [ib_core] ib_unregister_device+0x12c/0x230 [ib_core] mlx5_ib_remove+0xc4/0x270 [mlx5_ib] mlx5_detach_device+0x184/0x1a0 [mlx5_core] mlx5_unload_one+0x308/0x340 [mlx5_core] mlx5_pci_err_detected+0x74/0xe0 [mlx5_core] Cc: # 4.7 Fixes: 89ea94a7b6c4 ("IB/mlx5: Reset flow support for IB kernel ULPs") Signed-off-by: Erez Shitrit Signed-off-by: Leon Romanovsky Signed-off-by: Jason Gunthorpe Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/hw/mlx5/cq.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c index fc62a7ded734..a19ebb19952e 100644 --- a/drivers/infiniband/hw/mlx5/cq.c +++ b/drivers/infiniband/hw/mlx5/cq.c @@ -645,7 +645,7 @@ static int mlx5_poll_one(struct mlx5_ib_cq *cq, } static int poll_soft_wc(struct mlx5_ib_cq *cq, int num_entries, - struct ib_wc *wc) + struct ib_wc *wc, bool is_fatal_err) { struct mlx5_ib_dev *dev = to_mdev(cq->ibcq.device); struct mlx5_ib_wc *soft_wc, *next; @@ -658,6 +658,10 @@ static int poll_soft_wc(struct mlx5_ib_cq *cq, int num_entries, mlx5_ib_dbg(dev, "polled software generated completion on CQ 0x%x\n", cq->mcq.cqn); + if (unlikely(is_fatal_err)) { + soft_wc->wc.status = IB_WC_WR_FLUSH_ERR; + soft_wc->wc.vendor_err = MLX5_CQE_SYNDROME_WR_FLUSH_ERR; + } wc[npolled++] = soft_wc->wc; list_del(&soft_wc->list); kfree(soft_wc); @@ -678,12 +682,17 @@ int mlx5_ib_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *wc) spin_lock_irqsave(&cq->lock, flags); if (mdev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR) { - mlx5_ib_poll_sw_comp(cq, num_entries, wc, &npolled); + /* make sure no soft wqe's are waiting */ + if (unlikely(!list_empty(&cq->wc_list))) + soft_polled = poll_soft_wc(cq, num_entries, wc, true); + + mlx5_ib_poll_sw_comp(cq, num_entries - soft_polled, + wc + soft_polled, &npolled); goto out; } if (unlikely(!list_empty(&cq->wc_list))) - soft_polled = poll_soft_wc(cq, num_entries, wc); + soft_polled = poll_soft_wc(cq, num_entries, wc, false); for (npolled = 0; npolled < num_entries - soft_polled; npolled++) { if (mlx5_poll_one(cq, &cur_qp, wc + soft_polled + npolled)) From a664281b85e01066336a0eb46bd09054c0705009 Mon Sep 17 00:00:00 2001 From: Alex Estrin Date: Tue, 15 May 2018 18:31:39 -0700 Subject: [PATCH 037/970] IB/isert: Fix for lib/dma_debug check_sync warning commit 763b69654bfb88ea3230d015e7d755ee8339f8ee upstream. The following error message occurs on a target host in a debug build during session login: [ 3524.411874] WARNING: CPU: 5 PID: 12063 at lib/dma-debug.c:1207 check_sync+0x4ec/0x5b0 [ 3524.421057] infiniband hfi1_0: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x0000000000000000] [size=76 bytes] ......snip ..... [ 3524.535846] CPU: 5 PID: 12063 Comm: iscsi_np Kdump: loaded Not tainted 3.10.0-862.el7.x86_64.debug #1 [ 3524.546764] Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.2.6 06/08/2015 [ 3524.555740] Call Trace: [ 3524.559102] [] dump_stack+0x19/0x1b [ 3524.565477] [] __warn+0xd8/0x100 [ 3524.571557] [] warn_slowpath_fmt+0x5f/0x80 [ 3524.578610] [] check_sync+0x4ec/0x5b0 [ 3524.585177] [] ? set_cpus_allowed_ptr+0x5f/0x1c0 [ 3524.592812] [] debug_dma_sync_single_for_cpu+0x80/0x90 [ 3524.601029] [] ? x2apic_send_IPI_mask+0x13/0x20 [ 3524.608574] [] ? native_smp_send_reschedule+0x5b/0x80 [ 3524.616699] [] ? resched_curr+0xf6/0x140 [ 3524.623567] [] isert_create_send_desc.isra.26+0xe0/0x110 [ib_isert] [ 3524.633060] [] isert_put_login_tx+0x55/0x8b0 [ib_isert] [ 3524.641383] [] ? try_to_wake_up+0x1a4/0x430 [ 3524.648561] [] iscsi_target_do_tx_login_io+0xdd/0x230 [iscsi_target_mod] [ 3524.658557] [] iscsi_target_do_login+0x1a7/0x600 [iscsi_target_mod] [ 3524.668084] [] ? kstrdup+0x49/0x60 [ 3524.674420] [] iscsi_target_start_negotiation+0x56/0xc0 [iscsi_target_mod] [ 3524.684656] [] __iscsi_target_login_thread+0x90e/0x1070 [iscsi_target_mod] [ 3524.694901] [] ? __iscsi_target_login_thread+0x1070/0x1070 [iscsi_target_mod] [ 3524.705446] [] ? __iscsi_target_login_thread+0x1070/0x1070 [iscsi_target_mod] [ 3524.715976] [] iscsi_target_login_thread+0x28/0x60 [iscsi_target_mod] [ 3524.725739] [] kthread+0xef/0x100 [ 3524.732007] [] ? insert_kthread_work+0x80/0x80 [ 3524.739540] [] ret_from_fork_nospec_begin+0x21/0x21 [ 3524.747558] [] ? insert_kthread_work+0x80/0x80 [ 3524.755088] ---[ end trace 23f8bf9238bd1ed8 ]--- [ 3595.510822] iSCSI/iqn.1994-05.com.redhat:537fa56299: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION. The code calls dma_sync on login_tx_desc->dma_addr prior to initializing it with dma-mapped address. login_tx_desc is a part of iser_conn structure and is used only once during login negotiation, so the issue is fixed by eliminating dma_sync call for this buffer using a special case routine. Cc: Reviewed-by: Mike Marciniszyn Reviewed-by: Don Dutile Signed-off-by: Alex Estrin Signed-off-by: Dennis Dalessandro Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/ulp/isert/ib_isert.c | 26 ++++++++++++++++--------- 1 file changed, 17 insertions(+), 9 deletions(-) diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c index b879d21b7548..16e56bfd4e2f 100644 --- a/drivers/infiniband/ulp/isert/ib_isert.c +++ b/drivers/infiniband/ulp/isert/ib_isert.c @@ -879,15 +879,9 @@ isert_login_post_send(struct isert_conn *isert_conn, struct iser_tx_desc *tx_des } static void -isert_create_send_desc(struct isert_conn *isert_conn, - struct isert_cmd *isert_cmd, - struct iser_tx_desc *tx_desc) +__isert_create_send_desc(struct isert_device *device, + struct iser_tx_desc *tx_desc) { - struct isert_device *device = isert_conn->device; - struct ib_device *ib_dev = device->ib_device; - - ib_dma_sync_single_for_cpu(ib_dev, tx_desc->dma_addr, - ISER_HEADERS_LEN, DMA_TO_DEVICE); memset(&tx_desc->iser_header, 0, sizeof(struct iser_ctrl)); tx_desc->iser_header.flags = ISCSI_CTRL; @@ -900,6 +894,20 @@ isert_create_send_desc(struct isert_conn *isert_conn, } } +static void +isert_create_send_desc(struct isert_conn *isert_conn, + struct isert_cmd *isert_cmd, + struct iser_tx_desc *tx_desc) +{ + struct isert_device *device = isert_conn->device; + struct ib_device *ib_dev = device->ib_device; + + ib_dma_sync_single_for_cpu(ib_dev, tx_desc->dma_addr, + ISER_HEADERS_LEN, DMA_TO_DEVICE); + + __isert_create_send_desc(device, tx_desc); +} + static int isert_init_tx_hdrs(struct isert_conn *isert_conn, struct iser_tx_desc *tx_desc) @@ -987,7 +995,7 @@ isert_put_login_tx(struct iscsi_conn *conn, struct iscsi_login *login, struct iser_tx_desc *tx_desc = &isert_conn->login_tx_desc; int ret; - isert_create_send_desc(isert_conn, NULL, tx_desc); + __isert_create_send_desc(device, tx_desc); memcpy(&tx_desc->iscsi_header, &login->rsp[0], sizeof(struct iscsi_hdr)); From 52e167187be8b584673204f6d87b07aff7725e47 Mon Sep 17 00:00:00 2001 From: Max Gurtovoy Date: Thu, 31 May 2018 11:05:23 +0300 Subject: [PATCH 038/970] IB/isert: fix T10-pi check mask setting commit 0e12af84cdd3056460f928adc164f9e87f4b303b upstream. A copy/paste bug (probably) caused setting of an app_tag check mask in case where a ref_tag check was needed. Fixes: 38a2d0d429f1 ("IB/isert: convert to the generic RDMA READ/WRITE API") Fixes: 9e961ae73c2c ("IB/isert: Support T10-PI protected transactions") Cc: stable@vger.kernel.org Reviewed-by: Christoph Hellwig Reviewed-by: Sagi Grimberg Reviewed-by: Martin K. Petersen Signed-off-by: Max Gurtovoy Signed-off-by: Jason Gunthorpe Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/ulp/isert/ib_isert.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c index 16e56bfd4e2f..02a5e2d7e574 100644 --- a/drivers/infiniband/ulp/isert/ib_isert.c +++ b/drivers/infiniband/ulp/isert/ib_isert.c @@ -2090,7 +2090,7 @@ isert_set_sig_attrs(struct se_cmd *se_cmd, struct ib_sig_attrs *sig_attrs) sig_attrs->check_mask = (se_cmd->prot_checks & TARGET_DIF_CHECK_GUARD ? 0xc0 : 0) | - (se_cmd->prot_checks & TARGET_DIF_CHECK_REFTAG ? 0x30 : 0) | + (se_cmd->prot_checks & TARGET_DIF_CHECK_APPTAG ? 0x30 : 0) | (se_cmd->prot_checks & TARGET_DIF_CHECK_REFTAG ? 0x0f : 0); return 0; } From afe249e3e38d51e58eba9b6992646a1f8070883d Mon Sep 17 00:00:00 2001 From: Leon Romanovsky Date: Tue, 29 May 2018 14:56:14 +0300 Subject: [PATCH 039/970] RDMA/mlx4: Discard unknown SQP work requests commit 6b1ca7ece15e94251d1d0d919f813943e4a58059 upstream. There is no need to crash the machine if unknown work request was received in SQP MAD. Cc: # 3.6 Fixes: 37bfc7c1e83f ("IB/mlx4: SR-IOV multiplex and demultiplex MADs") Signed-off-by: Leon Romanovsky Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/hw/mlx4/mad.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/infiniband/hw/mlx4/mad.c b/drivers/infiniband/hw/mlx4/mad.c index 18d309e40f1b..d9323d7c479c 100644 --- a/drivers/infiniband/hw/mlx4/mad.c +++ b/drivers/infiniband/hw/mlx4/mad.c @@ -1897,7 +1897,6 @@ static void mlx4_ib_sqp_comp_worker(struct work_struct *work) "buf:%lld\n", wc.wr_id); break; default: - BUG_ON(1); break; } } else { From e9dc5dce0925ea4412345ca034bba9a252de2783 Mon Sep 17 00:00:00 2001 From: Tokunori Ikegami Date: Wed, 30 May 2018 18:32:26 +0900 Subject: [PATCH 040/970] mtd: cfi_cmdset_0002: Change write buffer to check correct value commit dfeae1073583dc35c33b32150e18b7048bbb37e6 upstream. For the word write it is checked if the chip has the correct value. But it is not checked for the write buffer as only checked if ready. To make sure for the write buffer change to check the value. It is enough as this patch is only checking the last written word. Since it is described by data sheets to check the operation status. Signed-off-by: Tokunori Ikegami Reviewed-by: Joakim Tjernlund Cc: Chris Packham Cc: Brian Norris Cc: David Woodhouse Cc: Boris Brezillon Cc: Marek Vasut Cc: Richard Weinberger Cc: Cyrille Pitchen Cc: linux-mtd@lists.infradead.org Cc: stable@vger.kernel.org Signed-off-by: Boris Brezillon Signed-off-by: Greg Kroah-Hartman --- drivers/mtd/chips/cfi_cmdset_0002.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/mtd/chips/cfi_cmdset_0002.c b/drivers/mtd/chips/cfi_cmdset_0002.c index 107c05b3ddbb..68eaa5eb15d5 100644 --- a/drivers/mtd/chips/cfi_cmdset_0002.c +++ b/drivers/mtd/chips/cfi_cmdset_0002.c @@ -1876,7 +1876,7 @@ static int __xipram do_write_buffer(struct map_info *map, struct flchip *chip, if (time_after(jiffies, timeo) && !chip_ready(map, adr)) break; - if (chip_ready(map, adr)) { + if (chip_good(map, adr, datum)) { xip_enable(map, chip, adr); goto op_done; } From 552eacd58ee4c930f313f8e01f4cee75fc48afed Mon Sep 17 00:00:00 2001 From: Joakim Tjernlund Date: Wed, 6 Jun 2018 12:13:27 +0200 Subject: [PATCH 041/970] mtd: cfi_cmdset_0002: Use right chip in do_ppb_xxlock() commit f93aa8c4de307069c270b2d81741961162bead6c upstream. do_ppb_xxlock() fails to add chip->start when querying for lock status (and chip_ready test), which caused false status reports. Fix that by adding adr += chip->start and adjust call sites accordingly. Fixes: 1648eaaa1575 ("mtd: cfi_cmdset_0002: Support Persistent Protection Bits (PPB) locking") Cc: stable@vger.kernel.org Signed-off-by: Joakim Tjernlund Signed-off-by: Boris Brezillon Signed-off-by: Greg Kroah-Hartman --- drivers/mtd/chips/cfi_cmdset_0002.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/mtd/chips/cfi_cmdset_0002.c b/drivers/mtd/chips/cfi_cmdset_0002.c index 68eaa5eb15d5..2132bd76f972 100644 --- a/drivers/mtd/chips/cfi_cmdset_0002.c +++ b/drivers/mtd/chips/cfi_cmdset_0002.c @@ -2549,8 +2549,9 @@ static int __maybe_unused do_ppb_xxlock(struct map_info *map, unsigned long timeo; int ret; + adr += chip->start; mutex_lock(&chip->mutex); - ret = get_chip(map, chip, adr + chip->start, FL_LOCKING); + ret = get_chip(map, chip, adr, FL_LOCKING); if (ret) { mutex_unlock(&chip->mutex); return ret; @@ -2568,8 +2569,8 @@ static int __maybe_unused do_ppb_xxlock(struct map_info *map, if (thunk == DO_XXLOCK_ONEBLOCK_LOCK) { chip->state = FL_LOCKING; - map_write(map, CMD(0xA0), chip->start + adr); - map_write(map, CMD(0x00), chip->start + adr); + map_write(map, CMD(0xA0), adr); + map_write(map, CMD(0x00), adr); } else if (thunk == DO_XXLOCK_ONEBLOCK_UNLOCK) { /* * Unlocking of one specific sector is not supported, so we @@ -2607,7 +2608,7 @@ static int __maybe_unused do_ppb_xxlock(struct map_info *map, map_write(map, CMD(0x00), chip->start); chip->state = FL_READY; - put_chip(map, chip, adr + chip->start); + put_chip(map, chip, adr); mutex_unlock(&chip->mutex); return ret; From 0bf4e48c20ca447e577c77bfc9205f1c8890c65d Mon Sep 17 00:00:00 2001 From: Joakim Tjernlund Date: Wed, 6 Jun 2018 12:13:28 +0200 Subject: [PATCH 042/970] mtd: cfi_cmdset_0002: fix SEGV unlocking multiple chips commit 5fdfc3dbad099281bf027a353d5786c09408a8e5 upstream. cfi_ppb_unlock() tries to relock all sectors that were locked before unlocking the whole chip. This locking used the chip start address + the FULL offset from the first flash chip, thereby forming an illegal address. Fix that by using the chip offset(adr). Fixes: 1648eaaa1575 ("mtd: cfi_cmdset_0002: Support Persistent Protection Bits (PPB) locking") Cc: stable@vger.kernel.org Signed-off-by: Joakim Tjernlund Signed-off-by: Boris Brezillon Signed-off-by: Greg Kroah-Hartman --- drivers/mtd/chips/cfi_cmdset_0002.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/mtd/chips/cfi_cmdset_0002.c b/drivers/mtd/chips/cfi_cmdset_0002.c index 2132bd76f972..57b7fd8b930b 100644 --- a/drivers/mtd/chips/cfi_cmdset_0002.c +++ b/drivers/mtd/chips/cfi_cmdset_0002.c @@ -2531,7 +2531,7 @@ static int cfi_atmel_unlock(struct mtd_info *mtd, loff_t ofs, uint64_t len) struct ppb_lock { struct flchip *chip; - loff_t offset; + unsigned long adr; int locked; }; @@ -2667,7 +2667,7 @@ static int __maybe_unused cfi_ppb_unlock(struct mtd_info *mtd, loff_t ofs, */ if ((adr < ofs) || (adr >= (ofs + len))) { sect[sectors].chip = &cfi->chips[chipnum]; - sect[sectors].offset = offset; + sect[sectors].adr = adr; sect[sectors].locked = do_ppb_xxlock( map, &cfi->chips[chipnum], adr, 0, DO_XXLOCK_ONEBLOCK_GETLOCK); @@ -2711,7 +2711,7 @@ static int __maybe_unused cfi_ppb_unlock(struct mtd_info *mtd, loff_t ofs, */ for (i = 0; i < sectors; i++) { if (sect[i].locked) - do_ppb_xxlock(map, sect[i].chip, sect[i].offset, 0, + do_ppb_xxlock(map, sect[i].chip, sect[i].adr, 0, DO_XXLOCK_ONEBLOCK_LOCK); } From b4e24c2842e15f221ac3527af781d7b8940b5cfb Mon Sep 17 00:00:00 2001 From: Joakim Tjernlund Date: Wed, 6 Jun 2018 12:13:29 +0200 Subject: [PATCH 043/970] mtd: cfi_cmdset_0002: Fix unlocking requests crossing a chip boudary commit 0cd8116f172eed018907303dbff5c112690eeb91 upstream. The "sector is in requested range" test used to determine whether sectors should be re-locked or not is done on a variable that is reset everytime we cross a chip boundary, which can lead to some blocks being re-locked while the caller expect them to be unlocked. Fix the check to make sure this cannot happen. Fixes: 1648eaaa1575 ("mtd: cfi_cmdset_0002: Support Persistent Protection Bits (PPB) locking") Cc: stable@vger.kernel.org Signed-off-by: Joakim Tjernlund Signed-off-by: Boris Brezillon Signed-off-by: Greg Kroah-Hartman --- drivers/mtd/chips/cfi_cmdset_0002.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/mtd/chips/cfi_cmdset_0002.c b/drivers/mtd/chips/cfi_cmdset_0002.c index 57b7fd8b930b..6c042a2439e8 100644 --- a/drivers/mtd/chips/cfi_cmdset_0002.c +++ b/drivers/mtd/chips/cfi_cmdset_0002.c @@ -2665,7 +2665,7 @@ static int __maybe_unused cfi_ppb_unlock(struct mtd_info *mtd, loff_t ofs, * sectors shall be unlocked, so lets keep their locking * status at "unlocked" (locked=0) for the final re-locking. */ - if ((adr < ofs) || (adr >= (ofs + len))) { + if ((offset < ofs) || (offset >= (ofs + len))) { sect[sectors].chip = &cfi->chips[chipnum]; sect[sectors].adr = adr; sect[sectors].locked = do_ppb_xxlock( From 5fdb3c468b515694159583d01a3154e0ffa4a81f Mon Sep 17 00:00:00 2001 From: Joakim Tjernlund Date: Wed, 6 Jun 2018 12:13:30 +0200 Subject: [PATCH 044/970] mtd: cfi_cmdset_0002: Avoid walking all chips when unlocking. commit f1ce87f6080b1dda7e7b1eda3da332add19d87b9 upstream. cfi_ppb_unlock() walks all flash chips when unlocking sectors, avoid walking chips unaffected by the unlock operation. Fixes: 1648eaaa1575 ("mtd: cfi_cmdset_0002: Support Persistent Protection Bits (PPB) locking") Cc: stable@vger.kernel.org Signed-off-by: Joakim Tjernlund Signed-off-by: Boris Brezillon Signed-off-by: Greg Kroah-Hartman --- drivers/mtd/chips/cfi_cmdset_0002.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/mtd/chips/cfi_cmdset_0002.c b/drivers/mtd/chips/cfi_cmdset_0002.c index 6c042a2439e8..33d025e42793 100644 --- a/drivers/mtd/chips/cfi_cmdset_0002.c +++ b/drivers/mtd/chips/cfi_cmdset_0002.c @@ -2681,6 +2681,8 @@ static int __maybe_unused cfi_ppb_unlock(struct mtd_info *mtd, loff_t ofs, i++; if (adr >> cfi->chipshift) { + if (offset >= (ofs + len)) + break; adr = 0; chipnum++; From 83f9549d650b477b46fc070461f82ef536eae702 Mon Sep 17 00:00:00 2001 From: Tokunori Ikegami Date: Sun, 3 Jun 2018 23:02:01 +0900 Subject: [PATCH 045/970] MIPS: BCM47XX: Enable 74K Core ExternalSync for PCIe erratum MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 2a027b47dba6b77ab8c8e47b589ae9bbc5ac6175 upstream. The erratum and workaround are described by BCM5300X-ES300-RDS.pdf as below. R10: PCIe Transactions Periodically Fail Description: The BCM5300X PCIe does not maintain transaction ordering. This may cause PCIe transaction failure. Fix Comment: Add a dummy PCIe configuration read after a PCIe configuration write to ensure PCIe configuration access ordering. Set ES bit of CP0 configu7 register to enable sync function so that the sync instruction is functional. Resolution: hndpci.c: extpci_write_config() hndmips.c: si_mips_init() mipsinc.h CONF7_ES This is fixed by the CFE MIPS bcmsi chipset driver also for BCM47XX. Also the dummy PCIe configuration read is already implemented in the Linux BCMA driver. Enable ExternalSync in Config7 when CONFIG_BCMA_DRIVER_PCI_HOSTMODE=y too so that the sync instruction is externalised. Signed-off-by: Tokunori Ikegami Reviewed-by: Paul Burton Acked-by: Hauke Mehrtens Cc: Chris Packham Cc: Rafał Miłecki Cc: linux-mips@linux-mips.org Cc: stable@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/19461/ Signed-off-by: James Hogan Signed-off-by: Greg Kroah-Hartman --- arch/mips/bcm47xx/setup.c | 6 ++++++ arch/mips/include/asm/mipsregs.h | 3 +++ 2 files changed, 9 insertions(+) diff --git a/arch/mips/bcm47xx/setup.c b/arch/mips/bcm47xx/setup.c index 6054d49e608e..8c9cbf13d32a 100644 --- a/arch/mips/bcm47xx/setup.c +++ b/arch/mips/bcm47xx/setup.c @@ -212,6 +212,12 @@ static int __init bcm47xx_cpu_fixes(void) */ if (bcm47xx_bus.bcma.bus.chipinfo.id == BCMA_CHIP_ID_BCM4706) cpu_wait = NULL; + + /* + * BCM47XX Erratum "R10: PCIe Transactions Periodically Fail" + * Enable ExternalSync for sync instruction to take effect + */ + set_c0_config7(MIPS_CONF7_ES); break; #endif } diff --git a/arch/mips/include/asm/mipsregs.h b/arch/mips/include/asm/mipsregs.h index df78b2ca70eb..22a6782f84f5 100644 --- a/arch/mips/include/asm/mipsregs.h +++ b/arch/mips/include/asm/mipsregs.h @@ -663,6 +663,8 @@ #define MIPS_CONF7_WII (_ULCAST_(1) << 31) #define MIPS_CONF7_RPS (_ULCAST_(1) << 2) +/* ExternalSync */ +#define MIPS_CONF7_ES (_ULCAST_(1) << 8) #define MIPS_CONF7_IAR (_ULCAST_(1) << 10) #define MIPS_CONF7_AR (_ULCAST_(1) << 16) @@ -2641,6 +2643,7 @@ __BUILD_SET_C0(status) __BUILD_SET_C0(cause) __BUILD_SET_C0(config) __BUILD_SET_C0(config5) +__BUILD_SET_C0(config7) __BUILD_SET_C0(intcontrol) __BUILD_SET_C0(intctl) __BUILD_SET_C0(srsmap) From 5e1deade6064d088ad4852f9a2299eeeefdfe3d2 Mon Sep 17 00:00:00 2001 From: Alex Williamson Date: Wed, 25 Apr 2018 14:27:37 -0600 Subject: [PATCH 046/970] PCI: Add ACS quirk for Intel 7th & 8th Gen mobile commit e8440f4bfedc623bee40c84797ac78d9303d0db6 upstream. The specification update indicates these have the same errata for implementing non-standard ACS capabilities. Signed-off-by: Alex Williamson Signed-off-by: Bjorn Helgaas CC: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- drivers/pci/quirks.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index b55f9179c94e..c7a695c2303a 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -4225,11 +4225,24 @@ static int pci_quirk_qcom_rp_acs(struct pci_dev *dev, u16 acs_flags) * 0xa290-0xa29f PCI Express Root port #{0-16} * 0xa2e7-0xa2ee PCI Express Root port #{17-24} * + * Mobile chipsets are also affected, 7th & 8th Generation + * Specification update confirms ACS errata 22, status no fix: (7th Generation + * Intel Processor Family I/O for U/Y Platforms and 8th Generation Intel + * Processor Family I/O for U Quad Core Platforms Specification Update, + * August 2017, Revision 002, Document#: 334660-002)[6] + * Device IDs from I/O datasheet: (7th Generation Intel Processor Family I/O + * for U/Y Platforms and 8th Generation Intel ® Processor Family I/O for U + * Quad Core Platforms, Vol 1 of 2, August 2017, Document#: 334658-003)[7] + * + * 0x9d10-0x9d1b PCI Express Root port #{1-12} + * * [1] http://www.intel.com/content/www/us/en/chipsets/100-series-chipset-datasheet-vol-2.html * [2] http://www.intel.com/content/www/us/en/chipsets/100-series-chipset-datasheet-vol-1.html * [3] http://www.intel.com/content/www/us/en/chipsets/100-series-chipset-spec-update.html * [4] http://www.intel.com/content/www/us/en/chipsets/200-series-chipset-pch-spec-update.html * [5] http://www.intel.com/content/www/us/en/chipsets/200-series-chipset-pch-datasheet-vol-1.html + * [6] https://www.intel.com/content/www/us/en/processors/core/7th-gen-core-family-mobile-u-y-processor-lines-i-o-spec-update.html + * [7] https://www.intel.com/content/www/us/en/processors/core/7th-gen-core-family-mobile-u-y-processor-lines-i-o-datasheet-vol-1.html */ static bool pci_quirk_intel_spt_pch_acs_match(struct pci_dev *dev) { @@ -4239,6 +4252,7 @@ static bool pci_quirk_intel_spt_pch_acs_match(struct pci_dev *dev) switch (dev->device) { case 0xa110 ... 0xa11f: case 0xa167 ... 0xa16a: /* Sunrise Point */ case 0xa290 ... 0xa29f: case 0xa2e7 ... 0xa2ee: /* Union Point */ + case 0x9d10 ... 0x9d1b: /* 7th & 8th Gen Mobile */ return true; } From 0d3d58337d4bd414d069c1f4d6330e02ef2ecd1e Mon Sep 17 00:00:00 2001 From: Mika Westerberg Date: Fri, 27 Apr 2018 13:06:30 -0500 Subject: [PATCH 047/970] PCI: Add ACS quirk for Intel 300 series commit f154a718e6cc0d834f5ac4dc4c3b174e65f3659e upstream. Intel 300 series chipset still has the same ACS issue as the previous generations so extend the ACS quirk to cover it as well. Signed-off-by: Mika Westerberg Signed-off-by: Bjorn Helgaas CC: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- drivers/pci/quirks.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index c7a695c2303a..a05d143ac43b 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -4236,6 +4236,11 @@ static int pci_quirk_qcom_rp_acs(struct pci_dev *dev, u16 acs_flags) * * 0x9d10-0x9d1b PCI Express Root port #{1-12} * + * The 300 series chipset suffers from the same bug so include those root + * ports here as well. + * + * 0xa32c-0xa343 PCI Express Root port #{0-24} + * * [1] http://www.intel.com/content/www/us/en/chipsets/100-series-chipset-datasheet-vol-2.html * [2] http://www.intel.com/content/www/us/en/chipsets/100-series-chipset-datasheet-vol-1.html * [3] http://www.intel.com/content/www/us/en/chipsets/100-series-chipset-spec-update.html @@ -4253,6 +4258,7 @@ static bool pci_quirk_intel_spt_pch_acs_match(struct pci_dev *dev) case 0xa110 ... 0xa11f: case 0xa167 ... 0xa16a: /* Sunrise Point */ case 0xa290 ... 0xa29f: case 0xa2e7 ... 0xa2ee: /* Union Point */ case 0x9d10 ... 0x9d1b: /* 7th & 8th Gen Mobile */ + case 0xa32c ... 0xa343: /* 300 series */ return true; } From ca558fb836d3b04fbda144fa22e295d5f41bc2de Mon Sep 17 00:00:00 2001 From: Mika Westerberg Date: Wed, 23 May 2018 17:14:39 -0500 Subject: [PATCH 048/970] PCI: pciehp: Clear Presence Detect and Data Link Layer Status Changed on resume commit 13c65840feab8109194f9490c9870587173cb29d upstream. After a suspend/resume cycle the Presence Detect or Data Link Layer Status Changed bits might be set. If we don't clear them those events will not fire anymore and nothing happens for instance when a device is now hot-unplugged. Fix this by clearing those bits in a newly introduced function pcie_reenable_notification(). This should be fine because immediately after, we check if the adapter is still present by reading directly from the status register. Signed-off-by: Mika Westerberg Signed-off-by: Bjorn Helgaas Reviewed-by: Rafael J. Wysocki Reviewed-by: Andy Shevchenko Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- drivers/pci/hotplug/pciehp.h | 2 +- drivers/pci/hotplug/pciehp_core.c | 2 +- drivers/pci/hotplug/pciehp_hpc.c | 13 ++++++++++++- 3 files changed, 14 insertions(+), 3 deletions(-) diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h index 37d70b5ad22f..2bba8481beb1 100644 --- a/drivers/pci/hotplug/pciehp.h +++ b/drivers/pci/hotplug/pciehp.h @@ -134,7 +134,7 @@ struct controller *pcie_init(struct pcie_device *dev); int pcie_init_notification(struct controller *ctrl); int pciehp_enable_slot(struct slot *p_slot); int pciehp_disable_slot(struct slot *p_slot); -void pcie_enable_notification(struct controller *ctrl); +void pcie_reenable_notification(struct controller *ctrl); int pciehp_power_on_slot(struct slot *slot); void pciehp_power_off_slot(struct slot *slot); void pciehp_get_power_status(struct slot *slot, u8 *status); diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c index 7d32fa33dcef..6620b1095046 100644 --- a/drivers/pci/hotplug/pciehp_core.c +++ b/drivers/pci/hotplug/pciehp_core.c @@ -297,7 +297,7 @@ static int pciehp_resume(struct pcie_device *dev) ctrl = get_service_data(dev); /* reinitialize the chipset's event detection logic */ - pcie_enable_notification(ctrl); + pcie_reenable_notification(ctrl); slot = ctrl->slot; diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c index d08dfc8b9ba9..8d811ea353c8 100644 --- a/drivers/pci/hotplug/pciehp_hpc.c +++ b/drivers/pci/hotplug/pciehp_hpc.c @@ -673,7 +673,7 @@ static irqreturn_t pcie_isr(int irq, void *dev_id) return handled; } -void pcie_enable_notification(struct controller *ctrl) +static void pcie_enable_notification(struct controller *ctrl) { u16 cmd, mask; @@ -711,6 +711,17 @@ void pcie_enable_notification(struct controller *ctrl) pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, cmd); } +void pcie_reenable_notification(struct controller *ctrl) +{ + /* + * Clear both Presence and Data Link Layer Changed to make sure + * those events still fire after we have re-enabled them. + */ + pcie_capability_write_word(ctrl->pcie->port, PCI_EXP_SLTSTA, + PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC); + pcie_enable_notification(ctrl); +} + static void pcie_disable_notification(struct controller *ctrl) { u16 mask; From db2baeef79d1d0ff0fac4589f8d7dc215ea36889 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa Date: Fri, 11 May 2018 19:54:19 +0900 Subject: [PATCH 049/970] printk: fix possible reuse of va_list variable commit 988a35f8da1dec5a8cd2788054d1e717be61bf25 upstream. I noticed that there is a possibility that printk_safe_log_store() causes kernel oops because "args" parameter is passed to vsnprintf() again when atomic_cmpxchg() detected that we raced. Fix this by using va_copy(). Link: http://lkml.kernel.org/r/201805112002.GIF21216.OFVHFOMLJtQFSO@I-love.SAKURA.ne.jp Cc: Peter Zijlstra Cc: Steven Rostedt Cc: dvyukov@google.com Cc: syzkaller@googlegroups.com Cc: fengguang.wu@intel.com Cc: linux-kernel@vger.kernel.org Signed-off-by: Tetsuo Handa Fixes: 42a0bb3f71383b45 ("printk/nmi: generic solution for safe printk in NMI") Cc: 4.7+ # v4.7+ Reviewed-by: Sergey Senozhatsky Signed-off-by: Petr Mladek Signed-off-by: Greg Kroah-Hartman --- kernel/printk/nmi.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c index 16bab471c7e2..5fa65aa904d3 100644 --- a/kernel/printk/nmi.c +++ b/kernel/printk/nmi.c @@ -63,6 +63,7 @@ static int vprintk_nmi(const char *fmt, va_list args) struct nmi_seq_buf *s = this_cpu_ptr(&nmi_print_seq); int add = 0; size_t len; + va_list ap; again: len = atomic_read(&s->len); @@ -79,7 +80,9 @@ static int vprintk_nmi(const char *fmt, va_list args) if (!len) smp_rmb(); - add = vsnprintf(s->buffer + len, sizeof(s->buffer) - len, fmt, args); + va_copy(ap, args); + add = vsnprintf(s->buffer + len, sizeof(s->buffer) - len, fmt, ap); + va_end(ap); /* * Do it once again if the buffer has been flushed in the meantime. From 344d6159fede9a3f7c7b846c662d3744ae89d132 Mon Sep 17 00:00:00 2001 From: Huacai Chen Date: Tue, 12 Jun 2018 17:54:42 +0800 Subject: [PATCH 050/970] MIPS: io: Add barrier after register read in inX() commit 18f3e95b90b28318ef35910d21c39908de672331 upstream. While a barrier is present in the outX() functions before the register write, a similar barrier is missing in the inX() functions after the register read. This could allow memory accesses following inX() to observe stale data. This patch is very similar to commit a1cc7034e33d12dc1 ("MIPS: io: Add barrier after register read in readX()"). Because war_io_reorder_wmb() is both used by writeX() and outX(), if readX() need a barrier then so does inX(). Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen Patchwork: https://patchwork.linux-mips.org/patch/19516/ Signed-off-by: Paul Burton Cc: James Hogan Cc: linux-mips@linux-mips.org Cc: Fuxin Zhang Cc: Zhangjin Wu Cc: Huacai Chen Signed-off-by: Greg Kroah-Hartman --- arch/mips/include/asm/io.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/mips/include/asm/io.h b/arch/mips/include/asm/io.h index ecabc00c1e66..853b2f4954fa 100644 --- a/arch/mips/include/asm/io.h +++ b/arch/mips/include/asm/io.h @@ -412,6 +412,8 @@ static inline type pfx##in##bwlq##p(unsigned long port) \ __val = *__addr; \ slow; \ \ + /* prevent prefetching of coherent DMA data prematurely */ \ + rmb(); \ return pfx##ioswab##bwlq(__addr, __val); \ } From 8fd86587ea975c2dc324c7c5bf804619bcea22b6 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Fri, 22 Jun 2018 16:33:57 +0200 Subject: [PATCH 051/970] time: Make sure jiffies_to_msecs() preserves non-zero time periods MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit abcbcb80cd09cd40f2089d912764e315459b71f7 upstream. For the common cases where 1000 is a multiple of HZ, or HZ is a multiple of 1000, jiffies_to_msecs() never returns zero when passed a non-zero time period. However, if HZ > 1000 and not an integer multiple of 1000 (e.g. 1024 or 1200, as used on alpha and DECstation), jiffies_to_msecs() may return zero for small non-zero time periods. This may break code that relies on receiving back a non-zero value. jiffies_to_usecs() does not need such a fix: one jiffy can only be less than one µs if HZ > 1000000, and such large values of HZ are already rejected at build time, twice: - include/linux/jiffies.h does #error if HZ >= 12288, - kernel/time/time.c has BUILD_BUG_ON(HZ > USEC_PER_SEC). Broken since forever. Signed-off-by: Geert Uytterhoeven Signed-off-by: Thomas Gleixner Reviewed-by: Arnd Bergmann Cc: John Stultz Cc: Stephen Boyd Cc: linux-alpha@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180622143357.7495-1-geert@linux-m68k.org Signed-off-by: Greg Kroah-Hartman --- kernel/time/time.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/kernel/time/time.c b/kernel/time/time.c index bd62fb8e8e77..39468651a064 100644 --- a/kernel/time/time.c +++ b/kernel/time/time.c @@ -28,6 +28,7 @@ */ #include +#include #include #include #include @@ -258,9 +259,10 @@ unsigned int jiffies_to_msecs(const unsigned long j) return (j + (HZ / MSEC_PER_SEC) - 1)/(HZ / MSEC_PER_SEC); #else # if BITS_PER_LONG == 32 - return (HZ_TO_MSEC_MUL32 * j) >> HZ_TO_MSEC_SHR32; + return (HZ_TO_MSEC_MUL32 * j + (1ULL << HZ_TO_MSEC_SHR32) - 1) >> + HZ_TO_MSEC_SHR32; # else - return (j * HZ_TO_MSEC_NUM) / HZ_TO_MSEC_DEN; + return DIV_ROUND_UP(j * HZ_TO_MSEC_NUM, HZ_TO_MSEC_DEN); # endif #endif } From 41b1d57a672f45f1d3a158cf28c24f1885201c9e Mon Sep 17 00:00:00 2001 From: "Maciej S. Szmigiero" Date: Sat, 19 May 2018 14:23:54 +0200 Subject: [PATCH 052/970] X.509: unpack RSA signatureValue field from BIT STRING commit b65c32ec5a942ab3ada93a048089a938918aba7f upstream. The signatureValue field of a X.509 certificate is encoded as a BIT STRING. For RSA signatures this BIT STRING is of so-called primitive subtype, which contains a u8 prefix indicating a count of unused bits in the encoding. We have to strip this prefix from signature data, just as we already do for key data in x509_extract_key_data() function. This wasn't noticed earlier because this prefix byte is zero for RSA key sizes divisible by 8. Since BIT STRING is a big-endian encoding adding zero prefixes has no bearing on its value. The signature length, however was incorrect, which is a problem for RSA implementations that need it to be exactly correct (like AMD CCP). Signed-off-by: Maciej S. Szmigiero Fixes: c26fd69fa009 ("X.509: Add a crypto key parser for binary (DER) X.509 certificates") Cc: stable@vger.kernel.org Signed-off-by: James Morris Signed-off-by: Greg Kroah-Hartman --- crypto/asymmetric_keys/x509_cert_parser.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/crypto/asymmetric_keys/x509_cert_parser.c b/crypto/asymmetric_keys/x509_cert_parser.c index ce2df8c9c583..7e6a43ffdcbe 100644 --- a/crypto/asymmetric_keys/x509_cert_parser.c +++ b/crypto/asymmetric_keys/x509_cert_parser.c @@ -249,6 +249,15 @@ int x509_note_signature(void *context, size_t hdrlen, return -EINVAL; } + if (strcmp(ctx->cert->sig->pkey_algo, "rsa") == 0) { + /* Discard the BIT STRING metadata */ + if (vlen < 1 || *(const u8 *)value != 0) + return -EBADMSG; + + value++; + vlen--; + } + ctx->cert->raw_sig = value; ctx->cert->raw_sig_size = vlen; return 0; From 77c82917d533ce49fff2731ddaa1dbc3fbfd9551 Mon Sep 17 00:00:00 2001 From: Filipe Manana Date: Mon, 11 Jun 2018 19:24:16 +0100 Subject: [PATCH 053/970] Btrfs: fix return value on rename exchange failure commit c5b4a50b74018b3677098151ec5f4fce07d5e6a0 upstream. If we failed during a rename exchange operation after starting/joining a transaction, we would end up replacing the return value, stored in the local 'ret' variable, with the return value from btrfs_end_transaction(). So this could end up returning 0 (success) to user space despite the operation having failed and aborted the transaction, because if there are multiple tasks having a reference on the transaction at the time btrfs_end_transaction() is called by the rename exchange, that function returns 0 (otherwise it returns -EIO and not the original error value). So fix this by not overwriting the return value on error after getting a transaction handle. Fixes: cdd1fedf8261 ("btrfs: add support for RENAME_EXCHANGE and RENAME_WHITEOUT") CC: stable@vger.kernel.org # 4.9+ Signed-off-by: Filipe Manana Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman --- fs/btrfs/inode.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index f073de65e818..23ffe27e4851 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -9561,6 +9561,7 @@ static int btrfs_rename_exchange(struct inode *old_dir, u64 new_idx = 0; u64 root_objectid; int ret; + int ret2; bool root_log_pinned = false; bool dest_log_pinned = false; @@ -9751,7 +9752,8 @@ static int btrfs_rename_exchange(struct inode *old_dir, dest_log_pinned = false; } } - ret = btrfs_end_transaction(trans, root); + ret2 = btrfs_end_transaction(trans, root); + ret = ret ? ret : ret2; out_notrans: if (new_ino == BTRFS_FIRST_FREE_OBJECTID) up_read(&dest->fs_info->subvol_sem); From 3fd6a73da159049bade087487256af9030423975 Mon Sep 17 00:00:00 2001 From: Liu Bo Date: Wed, 31 Jan 2018 17:09:13 -0700 Subject: [PATCH 054/970] Btrfs: fix unexpected cow in run_delalloc_nocow commit 5811375325420052fcadd944792a416a43072b7f upstream. Fstests generic/475 provides a way to fail metadata reads while checking if checksum exists for the inode inside run_delalloc_nocow(), and csum_exist_in_range() interprets error (-EIO) as inode having checksum and makes its caller enter the cow path. In case of free space inode, this ends up with a warning in cow_file_range(). The same problem applies to btrfs_cross_ref_exist() since it may also read metadata in between. With this, run_delalloc_nocow() bails out when errors occur at the two places. cc: v2.6.28+ Fixes: 17d217fe970d ("Btrfs: fix nodatasum handling in balancing code") Signed-off-by: Liu Bo Signed-off-by: David Sterba Signed-off-by: Sudip Mukherjee Signed-off-by: Greg Kroah-Hartman --- fs/btrfs/inode.c | 33 ++++++++++++++++++++++++++++++--- 1 file changed, 30 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 23ffe27e4851..bd036557c6bc 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1230,6 +1230,8 @@ static noinline int csum_exist_in_range(struct btrfs_root *root, list_del(&sums->list); kfree(sums); } + if (ret < 0) + return ret; return 1; } @@ -1381,10 +1383,23 @@ static noinline int run_delalloc_nocow(struct inode *inode, goto out_check; if (btrfs_extent_readonly(root, disk_bytenr)) goto out_check; - if (btrfs_cross_ref_exist(trans, root, ino, + ret = btrfs_cross_ref_exist(trans, root, ino, found_key.offset - - extent_offset, disk_bytenr)) + extent_offset, disk_bytenr); + if (ret) { + /* + * ret could be -EIO if the above fails to read + * metadata. + */ + if (ret < 0) { + if (cow_start != (u64)-1) + cur_offset = cow_start; + goto error; + } + + WARN_ON_ONCE(nolock); goto out_check; + } disk_bytenr += extent_offset; disk_bytenr += cur_offset - found_key.offset; num_bytes = min(end + 1, extent_end) - cur_offset; @@ -1402,8 +1417,20 @@ static noinline int run_delalloc_nocow(struct inode *inode, * this ensure that csum for a given extent are * either valid or do not exist. */ - if (csum_exist_in_range(root, disk_bytenr, num_bytes)) + ret = csum_exist_in_range(root, disk_bytenr, num_bytes); + if (ret) { + /* + * ret could be -EIO if the above fails to read + * metadata. + */ + if (ret < 0) { + if (cow_start != (u64)-1) + cur_offset = cow_start; + goto error; + } + WARN_ON_ONCE(nolock); goto out_check; + } if (!btrfs_inc_nocow_writers(root->fs_info, disk_bytenr)) goto out_check; From 0400b066ea2f8730456e607bda3896e3985c2a1a Mon Sep 17 00:00:00 2001 From: Martin Kelly Date: Mon, 26 Mar 2018 14:27:51 -0700 Subject: [PATCH 055/970] iio:buffer: make length types match kfifo types commit c043ec1ca5baae63726aae32abbe003192bc6eec upstream. Currently, we use int for buffer length and bytes_per_datum. However, kfifo uses unsigned int for length and size_t for element size. We need to make sure these matches or we will have bugs related to overflow (in the range between INT_MAX and UINT_MAX for length, for example). In addition, set_bytes_per_datum uses size_t while bytes_per_datum is an int, which would cause bugs for large values of bytes_per_datum. Change buffer length to use unsigned int and bytes_per_datum to use size_t. Signed-off-by: Martin Kelly Cc: Signed-off-by: Jonathan Cameron [bwh: Backported to 4.9: - Drop change to iio_dma_buffer_set_length() - Adjust filename, context] Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- drivers/iio/buffer/kfifo_buf.c | 4 ++-- include/linux/iio/buffer.h | 6 +++--- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/iio/buffer/kfifo_buf.c b/drivers/iio/buffer/kfifo_buf.c index 7ef9b13262a8..e44181f9eb36 100644 --- a/drivers/iio/buffer/kfifo_buf.c +++ b/drivers/iio/buffer/kfifo_buf.c @@ -19,7 +19,7 @@ struct iio_kfifo { #define iio_to_kfifo(r) container_of(r, struct iio_kfifo, buffer) static inline int __iio_allocate_kfifo(struct iio_kfifo *buf, - int bytes_per_datum, int length) + size_t bytes_per_datum, unsigned int length) { if ((length == 0) || (bytes_per_datum == 0)) return -EINVAL; @@ -71,7 +71,7 @@ static int iio_set_bytes_per_datum_kfifo(struct iio_buffer *r, size_t bpd) return 0; } -static int iio_set_length_kfifo(struct iio_buffer *r, int length) +static int iio_set_length_kfifo(struct iio_buffer *r, unsigned int length) { /* Avoid an invalid state */ if (length < 2) diff --git a/include/linux/iio/buffer.h b/include/linux/iio/buffer.h index 70a5164f4728..821965c90070 100644 --- a/include/linux/iio/buffer.h +++ b/include/linux/iio/buffer.h @@ -61,7 +61,7 @@ struct iio_buffer_access_funcs { int (*request_update)(struct iio_buffer *buffer); int (*set_bytes_per_datum)(struct iio_buffer *buffer, size_t bpd); - int (*set_length)(struct iio_buffer *buffer, int length); + int (*set_length)(struct iio_buffer *buffer, unsigned int length); int (*enable)(struct iio_buffer *buffer, struct iio_dev *indio_dev); int (*disable)(struct iio_buffer *buffer, struct iio_dev *indio_dev); @@ -96,8 +96,8 @@ struct iio_buffer_access_funcs { * @watermark: [INTERN] number of datums to wait for poll/read. */ struct iio_buffer { - int length; - int bytes_per_datum; + unsigned int length; + size_t bytes_per_datum; struct attribute_group *scan_el_attrs; long *scan_mask; bool scan_timestamp; From f0c543159a4abad4412e55db33ac730c9f21684d Mon Sep 17 00:00:00 2001 From: Himanshu Madhani Date: Sun, 3 Jun 2018 22:09:53 -0700 Subject: [PATCH 056/970] scsi: qla2xxx: Fix setting lower transfer speed if GPSC fails commit 413c2f33489b134e3cc65d9c3ff7861e8fdfe899 upstream. This patch prevents driver from setting lower default speed of 1 GB/sec, if the switch does not support Get Port Speed Capabilities (GPSC) command. Setting this default speed results into much lower write performance for large sequential WRITE. This patch modifies driver to check for gpsc_supported flags and prevents driver from issuing MBC_SET_PORT_PARAM (001Ah) to set default speed of 1 GB/sec. If driver does not send this mailbox command, firmware assumes maximum supported link speed and will operate at the max speed. Cc: stable@vger.kernel.org Signed-off-by: Himanshu Madhani Reported-by: Eda Zhou Reviewed-by: Ewan D. Milne Tested-by: Ewan D. Milne Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/qla2xxx/qla_init.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c index 4441a559f139..34bbcfcae67c 100644 --- a/drivers/scsi/qla2xxx/qla_init.c +++ b/drivers/scsi/qla2xxx/qla_init.c @@ -3319,7 +3319,8 @@ qla2x00_iidma_fcport(scsi_qla_host_t *vha, fc_port_t *fcport) return; if (fcport->fp_speed == PORT_SPEED_UNKNOWN || - fcport->fp_speed > ha->link_data_rate) + fcport->fp_speed > ha->link_data_rate || + !ha->flags.gpsc_supported) return; rval = qla2x00_set_idma_speed(vha, fcport->loop_id, fcport->fp_speed, From 9779f499d88f55df72cc504b830115aecea9cfbe Mon Sep 17 00:00:00 2001 From: Steffen Maier Date: Thu, 17 May 2018 19:14:43 +0200 Subject: [PATCH 057/970] scsi: zfcp: fix missing SCSI trace for result of eh_host_reset_handler commit df30781699f53e4fd4c494c6f7dd16e3d5c21d30 upstream. For problem determination we need to see whether and why we were successful or not. This allows deduction of scsi_eh escalation. Example trace record formatted with zfcpdbf from s390-tools: Timestamp : ... Area : SCSI Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 Tag : schrh_r SCSI host reset handler result Request ID : 0x0000000000000000 none (invalid) SCSI ID : 0xffffffff none (invalid) SCSI LUN : 0xffffffff none (invalid) SCSI LUN high : 0xffffffff none (invalid) SCSI result : 0x00002002 field re-used for midlayer value: SUCCESS or in other cases: 0x2009 == FAST_IO_FAIL SCSI retries : 0xff none (invalid) SCSI allowed : 0xff none (invalid) SCSI scribble : 0xffffffffffffffff none (invalid) SCSI opcode : ffffffff ffffffff ffffffff ffffffff none (invalid) FCP rsp inf cod: 0xff none (invalid) FCP rsp IU : 00000000 00000000 00000000 00000000 none (invalid) 00000000 00000000 v2.6.35 commit a1dbfddd02d2 ("[SCSI] zfcp: Pass return code from fc_block_scsi_eh to scsi eh") introduced the first return with something other than the previously hardcoded single SUCCESS return path. Signed-off-by: Steffen Maier Fixes: a1dbfddd02d2 ("[SCSI] zfcp: Pass return code from fc_block_scsi_eh to scsi eh") Cc: #2.6.38+ Reviewed-by: Jens Remus Reviewed-by: Benjamin Block Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman --- drivers/s390/scsi/zfcp_dbf.c | 40 +++++++++++++++++++++++++++++++++++ drivers/s390/scsi/zfcp_ext.h | 2 ++ drivers/s390/scsi/zfcp_scsi.c | 11 +++++----- 3 files changed, 48 insertions(+), 5 deletions(-) diff --git a/drivers/s390/scsi/zfcp_dbf.c b/drivers/s390/scsi/zfcp_dbf.c index 4534a7ce77b8..b6caad0fee24 100644 --- a/drivers/s390/scsi/zfcp_dbf.c +++ b/drivers/s390/scsi/zfcp_dbf.c @@ -625,6 +625,46 @@ void zfcp_dbf_scsi(char *tag, int level, struct scsi_cmnd *sc, spin_unlock_irqrestore(&dbf->scsi_lock, flags); } +/** + * zfcp_dbf_scsi_eh() - Trace event for special cases of scsi_eh callbacks. + * @tag: Identifier for event. + * @adapter: Pointer to zfcp adapter as context for this event. + * @scsi_id: SCSI ID/target to indicate scope of task management function (TMF). + * @ret: Return value of calling function. + * + * This SCSI trace variant does not depend on any of: + * scsi_cmnd, zfcp_fsf_req, scsi_device. + */ +void zfcp_dbf_scsi_eh(char *tag, struct zfcp_adapter *adapter, + unsigned int scsi_id, int ret) +{ + struct zfcp_dbf *dbf = adapter->dbf; + struct zfcp_dbf_scsi *rec = &dbf->scsi_buf; + unsigned long flags; + static int const level = 1; + + if (unlikely(!debug_level_enabled(adapter->dbf->scsi, level))) + return; + + spin_lock_irqsave(&dbf->scsi_lock, flags); + memset(rec, 0, sizeof(*rec)); + + memcpy(rec->tag, tag, ZFCP_DBF_TAG_LEN); + rec->id = ZFCP_DBF_SCSI_CMND; + rec->scsi_result = ret; /* re-use field, int is 4 bytes and fits */ + rec->scsi_retries = ~0; + rec->scsi_allowed = ~0; + rec->fcp_rsp_info = ~0; + rec->scsi_id = scsi_id; + rec->scsi_lun = (u32)ZFCP_DBF_INVALID_LUN; + rec->scsi_lun_64_hi = (u32)(ZFCP_DBF_INVALID_LUN >> 32); + rec->host_scribble = ~0; + memset(rec->scsi_opcode, 0xff, ZFCP_DBF_SCSI_OPCODE); + + debug_event(dbf->scsi, level, rec, sizeof(*rec)); + spin_unlock_irqrestore(&dbf->scsi_lock, flags); +} + static debug_info_t *zfcp_dbf_reg(const char *name, int size, int rec_size) { struct debug_info *d; diff --git a/drivers/s390/scsi/zfcp_ext.h b/drivers/s390/scsi/zfcp_ext.h index 7a7984a50683..7eaa2e13721f 100644 --- a/drivers/s390/scsi/zfcp_ext.h +++ b/drivers/s390/scsi/zfcp_ext.h @@ -52,6 +52,8 @@ extern void zfcp_dbf_san_res(char *, struct zfcp_fsf_req *); extern void zfcp_dbf_san_in_els(char *, struct zfcp_fsf_req *); extern void zfcp_dbf_scsi(char *, int, struct scsi_cmnd *, struct zfcp_fsf_req *); +extern void zfcp_dbf_scsi_eh(char *tag, struct zfcp_adapter *adapter, + unsigned int scsi_id, int ret); /* zfcp_erp.c */ extern void zfcp_erp_set_adapter_status(struct zfcp_adapter *, u32); diff --git a/drivers/s390/scsi/zfcp_scsi.c b/drivers/s390/scsi/zfcp_scsi.c index bb99db2948ab..fbabf7e15cb7 100644 --- a/drivers/s390/scsi/zfcp_scsi.c +++ b/drivers/s390/scsi/zfcp_scsi.c @@ -322,15 +322,16 @@ static int zfcp_scsi_eh_host_reset_handler(struct scsi_cmnd *scpnt) { struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(scpnt->device); struct zfcp_adapter *adapter = zfcp_sdev->port->adapter; - int ret; + int ret = SUCCESS, fc_ret; zfcp_erp_adapter_reopen(adapter, 0, "schrh_1"); zfcp_erp_wait(adapter); - ret = fc_block_scsi_eh(scpnt); - if (ret) - return ret; + fc_ret = fc_block_scsi_eh(scpnt); + if (fc_ret) + ret = fc_ret; - return SUCCESS; + zfcp_dbf_scsi_eh("schrh_r", adapter, ~0, ret); + return ret; } struct scsi_transport_template *zfcp_scsi_transport_template; From 97d3625bdd43e6816b179ea56a7aa58060069eee Mon Sep 17 00:00:00 2001 From: Steffen Maier Date: Thu, 17 May 2018 19:14:44 +0200 Subject: [PATCH 058/970] scsi: zfcp: fix missing SCSI trace for retry of abort / scsi_eh TMF commit 81979ae63e872ef650a7197f6ce6590059d37172 upstream. We already have a SCSI trace for the end of abort and scsi_eh TMF. Due to zfcp_erp_wait() and fc_block_scsi_eh() time can pass between the start of our eh callback and an actual send/recv of an abort / TMF request. In order to see the temporal sequence including any abort / TMF send retries, add a trace before the above two blocking functions. This supports problem determination with scsi_eh and parallel zfcp ERP. No need to explicitly trace the beginning of our eh callback, since we typically can send an abort / TMF and see its HBA response (in the worst case, it's a pseudo response on dismiss all of adapter recovery, e.g. due to an FSF request timeout [fsrth_1] of the abort / TMF). If we cannot send, we now get a trace record for the first "abrt_wt" or "[lt]r_wait" which denotes almost the beginning of the callback. No need to explicitly trace the wakeup after the above two blocking functions because the next retry loop causes another trace in any case and that is sufficient. Example trace records formatted with zfcpdbf from s390-tools: Timestamp : ... Area : SCSI Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 Tag : abrt_wt abort, before zfcp_erp_wait() Request ID : 0x0000000000000000 none (invalid) SCSI ID : 0x SCSI LUN : 0x SCSI LUN high : 0x SCSI result : 0x SCSI retries : 0x SCSI allowed : 0x SCSI scribble : 0x SCSI opcode : FCP rsp inf cod: 0x.. none (invalid) FCP rsp IU : ... none (invalid) Timestamp : ... Area : SCSI Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 Tag : lr_wait LUN reset, before zfcp_erp_wait() Request ID : 0x0000000000000000 none (invalid) SCSI ID : 0x SCSI LUN : 0x SCSI LUN high : 0x SCSI result : 0x... unrelated SCSI retries : 0x.. unrelated SCSI allowed : 0x.. unrelated SCSI scribble : 0x... unrelated SCSI opcode : ... unrelated FCP rsp inf cod: 0x.. none (invalid) FCP rsp IU : ... none (invalid) Signed-off-by: Steffen Maier Fixes: 63caf367e1c9 ("[SCSI] zfcp: Improve reliability of SCSI eh handlers in zfcp") Fixes: af4de36d911a ("[SCSI] zfcp: Block scsi_eh thread for rport state BLOCKED") Cc: #2.6.38+ Reviewed-by: Benjamin Block Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman --- drivers/s390/scsi/zfcp_scsi.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/s390/scsi/zfcp_scsi.c b/drivers/s390/scsi/zfcp_scsi.c index fbabf7e15cb7..e7a36109c79c 100644 --- a/drivers/s390/scsi/zfcp_scsi.c +++ b/drivers/s390/scsi/zfcp_scsi.c @@ -180,6 +180,7 @@ static int zfcp_scsi_eh_abort_handler(struct scsi_cmnd *scpnt) if (abrt_req) break; + zfcp_dbf_scsi_abort("abrt_wt", scpnt, NULL); zfcp_erp_wait(adapter); ret = fc_block_scsi_eh(scpnt); if (ret) { @@ -276,6 +277,7 @@ static int zfcp_task_mgmt_function(struct scsi_cmnd *scpnt, u8 tm_flags) if (fsf_req) break; + zfcp_dbf_scsi_devreset("wait", scpnt, tm_flags, NULL); zfcp_erp_wait(adapter); ret = fc_block_scsi_eh(scpnt); if (ret) { From b0c2fc11ced965b01f174f69ecfac7feca48bc84 Mon Sep 17 00:00:00 2001 From: Steffen Maier Date: Thu, 17 May 2018 19:14:45 +0200 Subject: [PATCH 059/970] scsi: zfcp: fix misleading REC trigger trace where erp_action setup failed commit 512857a795cbbda5980efa4cdb3c0b6602330408 upstream. If a SCSI device is deleted during scsi_eh host reset, we cannot get a reference to the SCSI device anymore since scsi_device_get returns !=0 by design. Assuming the recovery of adapter and port(s) was successful, zfcp_erp_strategy_followup_success() attempts to trigger a LUN reset for the half-gone SCSI device. Unfortunately, it causes the following confusing trace record which states that zfcp will do a LUN recovery as "ERP need" is ZFCP_ERP_ACTION_REOPEN_LUN == 1 and equals "ERP want". Old example trace record formatted with zfcpdbf from s390-tools: Tag: : ersfs_3 ERP, trigger, unit reopen, port reopen succeeded LUN : 0x WWPN : 0x D_ID : 0x Adapter status : 0x5400050b Port status : 0x54000001 LUN status : 0x40000000 ZFCP_STATUS_COMMON_RUNNING but not ZFCP_STATUS_COMMON_UNBLOCKED as it was closed on close part of adapter reopen ERP want : 0x01 ERP need : 0x01 misleading However, zfcp_erp_setup_act() returns NULL as it cannot get the reference. Hence, zfcp_erp_action_enqueue() takes an early goto out and _NO_ recovery actually happens. We always do want the recovery trigger trace record even if no erp_action could be enqueued as in this case. For other cases where we did not enqueue an erp_action, 'need' has always been zero to indicate this. In order to indicate above goto out, introduce an eyecatcher "flag" to mark the "ERP need" as 'not needed' but still keep the information which erp_action type, that zfcp_erp_required_act() had decided upon, is needed. 0xc_ is chosen to be visibly different from 0x0_ in "ERP want". New example trace record formatted with zfcpdbf from s390-tools: Tag: : ersfs_3 ERP, trigger, unit reopen, port reopen succeeded LUN : 0x WWPN : 0x D_ID : 0x Adapter status : 0x5400050b Port status : 0x54000001 LUN status : 0x40000000 ERP want : 0x01 ERP need : 0xc1 would need LUN ERP, but no action set up ^ Before v2.6.38 commit ae0904f60fab ("[SCSI] zfcp: Redesign of the debug tracing for recovery actions.") we could detect this case because the "erp_action" field in the trace was NULL. The rework removed erp_action as argument and field from the trace. This patch here is for tracing. A fix to allow LUN recovery in the case at hand is a topic for a separate patch. See also commit fdbd1c5e27da ("[SCSI] zfcp: Allow running unit/LUN shutdown without acquiring reference") for a similar case and background info. Signed-off-by: Steffen Maier Fixes: ae0904f60fab ("[SCSI] zfcp: Redesign of the debug tracing for recovery actions.") Cc: #2.6.38+ Reviewed-by: Benjamin Block Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman --- drivers/s390/scsi/zfcp_erp.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/drivers/s390/scsi/zfcp_erp.c b/drivers/s390/scsi/zfcp_erp.c index 3b23d6754598..ae624789a6c4 100644 --- a/drivers/s390/scsi/zfcp_erp.c +++ b/drivers/s390/scsi/zfcp_erp.c @@ -34,11 +34,23 @@ enum zfcp_erp_steps { ZFCP_ERP_STEP_LUN_OPENING = 0x2000, }; +/** + * enum zfcp_erp_act_type - Type of ERP action object. + * @ZFCP_ERP_ACTION_REOPEN_LUN: LUN recovery. + * @ZFCP_ERP_ACTION_REOPEN_PORT: Port recovery. + * @ZFCP_ERP_ACTION_REOPEN_PORT_FORCED: Forced port recovery. + * @ZFCP_ERP_ACTION_REOPEN_ADAPTER: Adapter recovery. + * @ZFCP_ERP_ACTION_NONE: Eyecatcher pseudo flag to bitwise or-combine with + * either of the other enum values. + * Used to indicate that an ERP action could not be + * set up despite a detected need for some recovery. + */ enum zfcp_erp_act_type { ZFCP_ERP_ACTION_REOPEN_LUN = 1, ZFCP_ERP_ACTION_REOPEN_PORT = 2, ZFCP_ERP_ACTION_REOPEN_PORT_FORCED = 3, ZFCP_ERP_ACTION_REOPEN_ADAPTER = 4, + ZFCP_ERP_ACTION_NONE = 0xc0, }; enum zfcp_erp_act_state { @@ -256,8 +268,10 @@ static int zfcp_erp_action_enqueue(int want, struct zfcp_adapter *adapter, goto out; act = zfcp_erp_setup_act(need, act_status, adapter, port, sdev); - if (!act) + if (!act) { + need |= ZFCP_ERP_ACTION_NONE; /* marker for trace */ goto out; + } atomic_or(ZFCP_STATUS_ADAPTER_ERP_PENDING, &adapter->status); ++adapter->erp_total_count; list_add_tail(&act->list, &adapter->erp_ready_head); From 48ae373c57f009b32a03637e0809fabd9132fe81 Mon Sep 17 00:00:00 2001 From: Steffen Maier Date: Thu, 17 May 2018 19:14:46 +0200 Subject: [PATCH 060/970] scsi: zfcp: fix missing REC trigger trace on terminate_rport_io early return commit 96d9270499471545048ed8a6d7f425a49762283d upstream. get_device() and its internally used kobject_get() only return NULL if they get passed NULL as argument. zfcp_get_port_by_wwpn() loops over adapter->port_list so the iteration variable port is always non-NULL. Struct device is embedded in struct zfcp_port so &port->dev is always non-NULL. This is the argument to get_device(). However, if we get an fc_rport in terminate_rport_io() for which we cannot find a match within zfcp_get_port_by_wwpn(), the latter can return NULL. v2.6.30 commit 70932935b61e ("[SCSI] zfcp: Fix oops when port disappears") introduced an early return without adding a trace record for this case. Even if we don't need recovery in this case, for debugging we should still see that our callback was invoked originally by scsi_transport_fc. Example trace record formatted with zfcpdbf from s390-tools: Timestamp : ... Area : REC Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 Tag : sctrpin SCSI terminate rport I/O, no zfcp port LUN : 0xffffffffffffffff none (invalid) WWPN : 0x WWPN D_ID : 0x N_Port-ID Adapter status : 0x... Port status : 0xffffffff unknown (-1) LUN status : 0x00000000 none (invalid) Ready count : 0x... Running count : 0x... ERP want : 0x03 ZFCP_ERP_ACTION_REOPEN_PORT_FORCED ERP need : 0xc0 ZFCP_ERP_ACTION_NONE Signed-off-by: Steffen Maier Fixes: 70932935b61e ("[SCSI] zfcp: Fix oops when port disappears") Cc: #2.6.38+ Reviewed-by: Benjamin Block Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman --- drivers/s390/scsi/zfcp_erp.c | 20 ++++++++++++++++++++ drivers/s390/scsi/zfcp_ext.h | 3 +++ drivers/s390/scsi/zfcp_scsi.c | 5 +++++ 3 files changed, 28 insertions(+) diff --git a/drivers/s390/scsi/zfcp_erp.c b/drivers/s390/scsi/zfcp_erp.c index ae624789a6c4..e60c16b2ba45 100644 --- a/drivers/s390/scsi/zfcp_erp.c +++ b/drivers/s390/scsi/zfcp_erp.c @@ -282,6 +282,26 @@ static int zfcp_erp_action_enqueue(int want, struct zfcp_adapter *adapter, return retval; } +void zfcp_erp_port_forced_no_port_dbf(char *id, struct zfcp_adapter *adapter, + u64 port_name, u32 port_id) +{ + unsigned long flags; + static /* don't waste stack */ struct zfcp_port tmpport; + + write_lock_irqsave(&adapter->erp_lock, flags); + /* Stand-in zfcp port with fields just good enough for + * zfcp_dbf_rec_trig() and zfcp_dbf_set_common(). + * Under lock because tmpport is static. + */ + atomic_set(&tmpport.status, -1); /* unknown */ + tmpport.wwpn = port_name; + tmpport.d_id = port_id; + zfcp_dbf_rec_trig(id, adapter, &tmpport, NULL, + ZFCP_ERP_ACTION_REOPEN_PORT_FORCED, + ZFCP_ERP_ACTION_NONE); + write_unlock_irqrestore(&adapter->erp_lock, flags); +} + static int _zfcp_erp_adapter_reopen(struct zfcp_adapter *adapter, int clear_mask, char *id) { diff --git a/drivers/s390/scsi/zfcp_ext.h b/drivers/s390/scsi/zfcp_ext.h index 7eaa2e13721f..b326f05c7f89 100644 --- a/drivers/s390/scsi/zfcp_ext.h +++ b/drivers/s390/scsi/zfcp_ext.h @@ -58,6 +58,9 @@ extern void zfcp_dbf_scsi_eh(char *tag, struct zfcp_adapter *adapter, /* zfcp_erp.c */ extern void zfcp_erp_set_adapter_status(struct zfcp_adapter *, u32); extern void zfcp_erp_clear_adapter_status(struct zfcp_adapter *, u32); +extern void zfcp_erp_port_forced_no_port_dbf(char *id, + struct zfcp_adapter *adapter, + u64 port_name, u32 port_id); extern void zfcp_erp_adapter_reopen(struct zfcp_adapter *, int, char *); extern void zfcp_erp_adapter_shutdown(struct zfcp_adapter *, int, char *); extern void zfcp_erp_set_port_status(struct zfcp_port *, u32); diff --git a/drivers/s390/scsi/zfcp_scsi.c b/drivers/s390/scsi/zfcp_scsi.c index e7a36109c79c..3afb200b2829 100644 --- a/drivers/s390/scsi/zfcp_scsi.c +++ b/drivers/s390/scsi/zfcp_scsi.c @@ -603,6 +603,11 @@ static void zfcp_scsi_terminate_rport_io(struct fc_rport *rport) if (port) { zfcp_erp_port_forced_reopen(port, 0, "sctrpi1"); put_device(&port->dev); + } else { + zfcp_erp_port_forced_no_port_dbf( + "sctrpin", adapter, + rport->port_name /* zfcp_scsi_rport_register */, + rport->port_id /* zfcp_scsi_rport_register */); } } From 21224f6f135ab267b07729f3274fcfc028b3ace0 Mon Sep 17 00:00:00 2001 From: Steffen Maier Date: Thu, 17 May 2018 19:14:47 +0200 Subject: [PATCH 061/970] scsi: zfcp: fix missing REC trigger trace on terminate_rport_io for ERP_FAILED commit d70aab55924b44f213fec2b900b095430b33eec6 upstream. For problem determination we always want to see when we were invoked on the terminate_rport_io callback whether we perform something or not. Temporal event sequence of interest with a long fast_io_fail_tmo of 27 sec: loose remote port t workqueue [s] zfcp_q_ IRQ zfcperp === ================== =================== ============================ 0 recv RSCN q p.test_link_work block rport start fast_io_fail_tmo send ADISC ELS 4 recv ADISC fail block zfcp_port port forced reopen send open port 12 recv open port fail q p.gid_pn_work zfcp_erp_wakeup (zfcp_erp_wait would return) GID_PN fail Before this point, we got a SCSI trace with tag "sctrpi1" on fast_io_fail, e.g. with the typical 5 sec setting. port.status |= ERP_FAILED If fast_io_fail_tmo triggers after this point, we missed a SCSI trace. workqueue fc_dl_ ================== 27 fc_timeout_fail_rport_io fc_terminate_rport_io zfcp_scsi_terminate_rport_io zfcp_erp_port_forced_reopen _zfcp_erp_port_forced_reopen if (port.status & ERP_FAILED) return; Therefore, write a trace before above early return. Example trace record formatted with zfcpdbf from s390-tools: Timestamp : ... Area : REC Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 ZFCP_DBF_REC_TRIG Tag : sctrpi1 SCSI terminate rport I/O LUN : 0xffffffffffffffff none (invalid) WWPN : 0x D_ID : 0x Adapter status : 0x... Port status : 0x... LUN status : 0x00000000 none (invalid) Ready count : 0x... Running count : 0x... ERP want : 0x03 ZFCP_ERP_ACTION_REOPEN_PORT_FORCED ERP need : 0xe0 ZFCP_ERP_ACTION_FAILED Signed-off-by: Steffen Maier Cc: #2.6.38+ Reviewed-by: Benjamin Block Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman --- drivers/s390/scsi/zfcp_erp.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/s390/scsi/zfcp_erp.c b/drivers/s390/scsi/zfcp_erp.c index e60c16b2ba45..e695f95ac3c2 100644 --- a/drivers/s390/scsi/zfcp_erp.c +++ b/drivers/s390/scsi/zfcp_erp.c @@ -41,9 +41,13 @@ enum zfcp_erp_steps { * @ZFCP_ERP_ACTION_REOPEN_PORT_FORCED: Forced port recovery. * @ZFCP_ERP_ACTION_REOPEN_ADAPTER: Adapter recovery. * @ZFCP_ERP_ACTION_NONE: Eyecatcher pseudo flag to bitwise or-combine with - * either of the other enum values. + * either of the first four enum values. * Used to indicate that an ERP action could not be * set up despite a detected need for some recovery. + * @ZFCP_ERP_ACTION_FAILED: Eyecatcher pseudo flag to bitwise or-combine with + * either of the first four enum values. + * Used to indicate that ERP not needed because + * the object has ZFCP_STATUS_COMMON_ERP_FAILED. */ enum zfcp_erp_act_type { ZFCP_ERP_ACTION_REOPEN_LUN = 1, @@ -51,6 +55,7 @@ enum zfcp_erp_act_type { ZFCP_ERP_ACTION_REOPEN_PORT_FORCED = 3, ZFCP_ERP_ACTION_REOPEN_ADAPTER = 4, ZFCP_ERP_ACTION_NONE = 0xc0, + ZFCP_ERP_ACTION_FAILED = 0xe0, }; enum zfcp_erp_act_state { @@ -378,8 +383,12 @@ static void _zfcp_erp_port_forced_reopen(struct zfcp_port *port, int clear, zfcp_erp_port_block(port, clear); zfcp_scsi_schedule_rport_block(port); - if (atomic_read(&port->status) & ZFCP_STATUS_COMMON_ERP_FAILED) + if (atomic_read(&port->status) & ZFCP_STATUS_COMMON_ERP_FAILED) { + zfcp_dbf_rec_trig(id, port->adapter, port, NULL, + ZFCP_ERP_ACTION_REOPEN_PORT_FORCED, + ZFCP_ERP_ACTION_FAILED); return; + } zfcp_erp_action_enqueue(ZFCP_ERP_ACTION_REOPEN_PORT_FORCED, port->adapter, port, NULL, id, 0); From 2df7e6f33c64c61e9a33dbbed6bed4af78e94d67 Mon Sep 17 00:00:00 2001 From: Steffen Maier Date: Thu, 17 May 2018 19:14:48 +0200 Subject: [PATCH 062/970] scsi: zfcp: fix missing REC trigger trace for all objects in ERP_FAILED commit 8c3d20aada70042a39c6a6625be037c1472ca610 upstream. That other commit introduced an inconsistency because it would trace on ERP_FAILED for all callers of port forced reopen triggers (not just terminate_rport_io), but it would not trace on ERP_FAILED for all callers of other ERP triggers such as adapter, port regular, LUN. Therefore, generalize that other commit. zfcp_erp_action_enqueue() already had two early outs which re-used the one zfcp_dbf_rec_trig() call. All ERP trigger functions finally run through zfcp_erp_action_enqueue(). So move the special handling for ZFCP_STATUS_COMMON_ERP_FAILED into zfcp_erp_action_enqueue() and add another early out with new trace marker for pseudo ERP need in this case. This removes all early returns from all ERP trigger functions so we always end up at zfcp_dbf_rec_trig(). Example trace record formatted with zfcpdbf from s390-tools: Timestamp : ... Area : REC Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 ZFCP_DBF_REC_TRIG Tag : ....... LUN : 0x... WWPN : 0x... D_ID : 0x... Adapter status : 0x... Port status : 0x... LUN status : 0x... Ready count : 0x... Running count : 0x... ERP want : 0x0. ZFCP_ERP_ACTION_REOPEN_... ERP need : 0xe0 ZFCP_ERP_ACTION_FAILED Signed-off-by: Steffen Maier Cc: #2.6.38+ Reviewed-by: Benjamin Block Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman --- drivers/s390/scsi/zfcp_erp.c | 79 +++++++++++++++++++++++------------- 1 file changed, 51 insertions(+), 28 deletions(-) diff --git a/drivers/s390/scsi/zfcp_erp.c b/drivers/s390/scsi/zfcp_erp.c index e695f95ac3c2..0ac35f8e4429 100644 --- a/drivers/s390/scsi/zfcp_erp.c +++ b/drivers/s390/scsi/zfcp_erp.c @@ -142,6 +142,49 @@ static void zfcp_erp_action_dismiss_adapter(struct zfcp_adapter *adapter) } } +static int zfcp_erp_handle_failed(int want, struct zfcp_adapter *adapter, + struct zfcp_port *port, + struct scsi_device *sdev) +{ + int need = want; + struct zfcp_scsi_dev *zsdev; + + switch (want) { + case ZFCP_ERP_ACTION_REOPEN_LUN: + zsdev = sdev_to_zfcp(sdev); + if (atomic_read(&zsdev->status) & ZFCP_STATUS_COMMON_ERP_FAILED) + need = 0; + break; + case ZFCP_ERP_ACTION_REOPEN_PORT_FORCED: + if (atomic_read(&port->status) & ZFCP_STATUS_COMMON_ERP_FAILED) + need = 0; + break; + case ZFCP_ERP_ACTION_REOPEN_PORT: + if (atomic_read(&port->status) & + ZFCP_STATUS_COMMON_ERP_FAILED) { + need = 0; + /* ensure propagation of failed status to new devices */ + zfcp_erp_set_port_status( + port, ZFCP_STATUS_COMMON_ERP_FAILED); + } + break; + case ZFCP_ERP_ACTION_REOPEN_ADAPTER: + if (atomic_read(&adapter->status) & + ZFCP_STATUS_COMMON_ERP_FAILED) { + need = 0; + /* ensure propagation of failed status to new devices */ + zfcp_erp_set_adapter_status( + adapter, ZFCP_STATUS_COMMON_ERP_FAILED); + } + break; + default: + need = 0; + break; + } + + return need; +} + static int zfcp_erp_required_act(int want, struct zfcp_adapter *adapter, struct zfcp_port *port, struct scsi_device *sdev) @@ -265,6 +308,12 @@ static int zfcp_erp_action_enqueue(int want, struct zfcp_adapter *adapter, int retval = 1, need; struct zfcp_erp_action *act; + need = zfcp_erp_handle_failed(want, adapter, port, sdev); + if (!need) { + need = ZFCP_ERP_ACTION_FAILED; /* marker for trace */ + goto out; + } + if (!adapter->erp_thread) return -EIO; @@ -313,12 +362,6 @@ static int _zfcp_erp_adapter_reopen(struct zfcp_adapter *adapter, zfcp_erp_adapter_block(adapter, clear_mask); zfcp_scsi_schedule_rports_block(adapter); - /* ensure propagation of failed status to new devices */ - if (atomic_read(&adapter->status) & ZFCP_STATUS_COMMON_ERP_FAILED) { - zfcp_erp_set_adapter_status(adapter, - ZFCP_STATUS_COMMON_ERP_FAILED); - return -EIO; - } return zfcp_erp_action_enqueue(ZFCP_ERP_ACTION_REOPEN_ADAPTER, adapter, NULL, NULL, id, 0); } @@ -337,12 +380,8 @@ void zfcp_erp_adapter_reopen(struct zfcp_adapter *adapter, int clear, char *id) zfcp_scsi_schedule_rports_block(adapter); write_lock_irqsave(&adapter->erp_lock, flags); - if (atomic_read(&adapter->status) & ZFCP_STATUS_COMMON_ERP_FAILED) - zfcp_erp_set_adapter_status(adapter, - ZFCP_STATUS_COMMON_ERP_FAILED); - else - zfcp_erp_action_enqueue(ZFCP_ERP_ACTION_REOPEN_ADAPTER, adapter, - NULL, NULL, id, 0); + zfcp_erp_action_enqueue(ZFCP_ERP_ACTION_REOPEN_ADAPTER, adapter, + NULL, NULL, id, 0); write_unlock_irqrestore(&adapter->erp_lock, flags); } @@ -383,13 +422,6 @@ static void _zfcp_erp_port_forced_reopen(struct zfcp_port *port, int clear, zfcp_erp_port_block(port, clear); zfcp_scsi_schedule_rport_block(port); - if (atomic_read(&port->status) & ZFCP_STATUS_COMMON_ERP_FAILED) { - zfcp_dbf_rec_trig(id, port->adapter, port, NULL, - ZFCP_ERP_ACTION_REOPEN_PORT_FORCED, - ZFCP_ERP_ACTION_FAILED); - return; - } - zfcp_erp_action_enqueue(ZFCP_ERP_ACTION_REOPEN_PORT_FORCED, port->adapter, port, NULL, id, 0); } @@ -415,12 +447,6 @@ static int _zfcp_erp_port_reopen(struct zfcp_port *port, int clear, char *id) zfcp_erp_port_block(port, clear); zfcp_scsi_schedule_rport_block(port); - if (atomic_read(&port->status) & ZFCP_STATUS_COMMON_ERP_FAILED) { - /* ensure propagation of failed status to new devices */ - zfcp_erp_set_port_status(port, ZFCP_STATUS_COMMON_ERP_FAILED); - return -EIO; - } - return zfcp_erp_action_enqueue(ZFCP_ERP_ACTION_REOPEN_PORT, port->adapter, port, NULL, id, 0); } @@ -460,9 +486,6 @@ static void _zfcp_erp_lun_reopen(struct scsi_device *sdev, int clear, char *id, zfcp_erp_lun_block(sdev, clear); - if (atomic_read(&zfcp_sdev->status) & ZFCP_STATUS_COMMON_ERP_FAILED) - return; - zfcp_erp_action_enqueue(ZFCP_ERP_ACTION_REOPEN_LUN, adapter, zfcp_sdev->port, sdev, id, act_status); } From c6751cb1e828d7aa93cedfcee65534318278713e Mon Sep 17 00:00:00 2001 From: Steffen Maier Date: Thu, 17 May 2018 19:14:49 +0200 Subject: [PATCH 063/970] scsi: zfcp: fix missing REC trigger trace on enqueue without ERP thread commit 6a76550841d412330bd86aed3238d1888ba70f0e upstream. Example trace record formatted with zfcpdbf from s390-tools: Timestamp : ... Area : REC Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 ZFCP_DBF_REC_TRIG Tag : ....... LUN : 0x... WWPN : 0x... D_ID : 0x... Adapter status : 0x... Port status : 0x... LUN status : 0x... Ready count : 0x... Running count : 0x... ERP want : 0x0. ZFCP_ERP_ACTION_REOPEN_... ERP need : 0xc0 ZFCP_ERP_ACTION_NONE Signed-off-by: Steffen Maier Cc: #2.6.38+ Reviewed-by: Benjamin Block Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman --- drivers/s390/scsi/zfcp_erp.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/s390/scsi/zfcp_erp.c b/drivers/s390/scsi/zfcp_erp.c index 0ac35f8e4429..2abcd331b05d 100644 --- a/drivers/s390/scsi/zfcp_erp.c +++ b/drivers/s390/scsi/zfcp_erp.c @@ -314,8 +314,11 @@ static int zfcp_erp_action_enqueue(int want, struct zfcp_adapter *adapter, goto out; } - if (!adapter->erp_thread) - return -EIO; + if (!adapter->erp_thread) { + need = ZFCP_ERP_ACTION_NONE; /* marker for trace */ + retval = -EIO; + goto out; + } need = zfcp_erp_required_act(want, adapter, port, sdev); if (!need) From f216d1e9339dfb2981d2a0e44a15501cfa76f4ad Mon Sep 17 00:00:00 2001 From: Robert Elliott Date: Thu, 31 May 2018 18:36:36 -0500 Subject: [PATCH 064/970] linvdimm, pmem: Preserve read-only setting for pmem devices commit 254a4cd50b9fe2291a12b8902e08e56dcc4e9b10 upstream. The pmem driver does not honor a forced read-only setting for very long: $ blockdev --setro /dev/pmem0 $ blockdev --getro /dev/pmem0 1 followed by various commands like these: $ blockdev --rereadpt /dev/pmem0 or $ mkfs.ext4 /dev/pmem0 results in this in the kernel serial log: nd_pmem namespace0.0: region0 read-write, marking pmem0 read-write with the read-only setting lost: $ blockdev --getro /dev/pmem0 0 That's from bus.c nvdimm_revalidate_disk(), which always applies the setting from nd_region (which is initially based on the ACPI NFIT NVDIMM state flags not_armed bit). In contrast, commit 20bd1d026aac ("scsi: sd: Keep disk read-only when re-reading partition") fixed this issue for SCSI devices to preserve the previous setting if it was set to read-only. This patch modifies bus.c to preserve any previous read-only setting. It also eliminates the kernel serial log print except for cases where read-write is changed to read-only, so it doesn't print read-only to read-only non-changes. Cc: Fixes: 581388209405 ("libnvdimm, nfit: handle unarmed dimms, mark namespaces read-only") Signed-off-by: Robert Elliott Signed-off-by: Dan Williams Signed-off-by: Greg Kroah-Hartman --- drivers/nvdimm/bus.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c index 8311a93cabd8..c1a65ce31243 100644 --- a/drivers/nvdimm/bus.c +++ b/drivers/nvdimm/bus.c @@ -505,14 +505,18 @@ int nvdimm_revalidate_disk(struct gendisk *disk) { struct device *dev = disk_to_dev(disk)->parent; struct nd_region *nd_region = to_nd_region(dev->parent); - const char *pol = nd_region->ro ? "only" : "write"; + int disk_ro = get_disk_ro(disk); - if (nd_region->ro == get_disk_ro(disk)) + /* + * Upgrade to read-only if the region is read-only preserve as + * read-only if the disk is already read-only. + */ + if (disk_ro || nd_region->ro == disk_ro) return 0; - dev_info(dev, "%s read-%s, marking %s read-%s\n", - dev_name(&nd_region->dev), pol, disk->disk_name, pol); - set_disk_ro(disk, nd_region->ro); + dev_info(dev, "%s read-only, marking %s read-only\n", + dev_name(&nd_region->dev), disk->disk_name); + set_disk_ro(disk, 1); return 0; From c0eb205dfe159d02bd7821288765ac0a40a0695c Mon Sep 17 00:00:00 2001 From: Marcin Ziemianowicz Date: Sun, 29 Apr 2018 15:01:11 -0400 Subject: [PATCH 065/970] clk: at91: PLL recalc_rate() now using cached MUL and DIV values commit a982e45dc150da3a08907b6dd676b735391704b4 upstream. When a USB device is connected to the USB host port on the SAM9N12 then you get "-62" error which seems to indicate USB replies from the device are timing out. Based on a logic sniffer, I saw the USB bus was running at half speed. The PLL code uses cached MUL and DIV values which get set in set_rate() and applied in prepare(), but the recalc_rate() function instead queries the hardware instead of using these cached values. Therefore, if recalc_rate() is called between a set_rate() and prepare(), the wrong frequency is calculated and later the USB clock divider for the SAM9N12 SOC will be configured for an incorrect clock. In my case, the PLL hardware was set to 96 Mhz before the OHCI driver loads, and therefore the usb clock divider was being set to /2 even though the OHCI driver set the PLL to 48 Mhz. As an alternative explanation, I noticed this was fixed in the past by 87e2ed338f1b ("clk: at91: fix recalc_rate implementation of PLL driver") but the bug was later re-introduced by 1bdf02326b71 ("clk: at91: make use of syscon/regmap internally"). Fixes: 1bdf02326b71 ("clk: at91: make use of syscon/regmap internally) Cc: Signed-off-by: Marcin Ziemianowicz Acked-by: Boris Brezillon Signed-off-by: Stephen Boyd Signed-off-by: Greg Kroah-Hartman --- drivers/clk/at91/clk-pll.c | 13 +------------ 1 file changed, 1 insertion(+), 12 deletions(-) diff --git a/drivers/clk/at91/clk-pll.c b/drivers/clk/at91/clk-pll.c index 45ad168e1496..2bb2551c6245 100644 --- a/drivers/clk/at91/clk-pll.c +++ b/drivers/clk/at91/clk-pll.c @@ -132,19 +132,8 @@ static unsigned long clk_pll_recalc_rate(struct clk_hw *hw, unsigned long parent_rate) { struct clk_pll *pll = to_clk_pll(hw); - unsigned int pllr; - u16 mul; - u8 div; - regmap_read(pll->regmap, PLL_REG(pll->id), &pllr); - - div = PLL_DIV(pllr); - mul = PLL_MUL(pllr, pll->layout); - - if (!div || !mul) - return 0; - - return (parent_rate / div) * (mul + 1); + return (parent_rate / pll->div) * (pll->mul + 1); } static long clk_pll_get_best_div_mul(struct clk_pll *pll, unsigned long rate, From 486684887ab588309c3497785412bb67e162b7a6 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Thu, 26 Apr 2018 14:46:29 +1000 Subject: [PATCH 066/970] md: fix two problems with setting the "re-add" device state. commit 011abdc9df559ec75779bb7c53a744c69b2a94c6 upstream. If "re-add" is written to the "state" file for a device which is faulty, this has an effect similar to removing and re-adding the device. It should take up the same slot in the array that it previously had, and an accelerated (e.g. bitmap-based) rebuild should happen. The slot that "it previously had" is determined by rdev->saved_raid_disk. However this is not set when a device fails (only when a device is added), and it is cleared when resync completes. This means that "re-add" will normally work once, but may not work a second time. This patch includes two fixes. 1/ when a device fails, record the ->raid_disk value in ->saved_raid_disk before clearing ->raid_disk 2/ when "re-add" is written to a device for which ->saved_raid_disk is not set, fail. I think this is suitable for stable as it can cause re-adding a device to be forced to do a full resync which takes a lot longer and so puts data at more risk. Cc: (v4.1) Fixes: 97f6cd39da22 ("md-cluster: re-add capabilities") Signed-off-by: NeilBrown Reviewed-by: Goldwyn Rodrigues Signed-off-by: Shaohua Li Signed-off-by: Greg Kroah-Hartman --- drivers/md/md.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/md/md.c b/drivers/md/md.c index cae8f3c12e32..3bb985679f34 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -2694,7 +2694,8 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len) err = 0; } } else if (cmd_match(buf, "re-add")) { - if (test_bit(Faulty, &rdev->flags) && (rdev->raid_disk == -1)) { + if (test_bit(Faulty, &rdev->flags) && (rdev->raid_disk == -1) && + rdev->saved_raid_disk >= 0) { /* clear_bit is performed _after_ all the devices * have their local Faulty bit cleared. If any writes * happen in the meantime in the local node, they @@ -8272,6 +8273,7 @@ static int remove_and_add_spares(struct mddev *mddev, if (mddev->pers->hot_remove_disk( mddev, rdev) == 0) { sysfs_unlink_rdev(mddev, rdev); + rdev->saved_raid_disk = rdev->raid_disk; rdev->raid_disk = -1; removed++; } From ec7ee4d60f25f9a4ba264090b2671346d078ed2a Mon Sep 17 00:00:00 2001 From: Srinivas Kandagatla Date: Mon, 4 Jun 2018 10:39:01 +0100 Subject: [PATCH 067/970] rpmsg: smd: do not use mananged resources for endpoints and channels commit 4a2e84c6ed85434ce7843e4844b4d3263f7e233b upstream. All the managed resources would be freed by the time release function is invoked. Handling such memory in qcom_smd_edge_release() would do bad things. Found this issue while testing Audio usecase where the dsp is started up and shutdown in a loop. This patch fixes this issue by using simple kzalloc for allocating channel->name and channel which is then freed in qcom_smd_edge_release(). Without this patch restarting a remoteproc would crash the system. Fixes: 53e2822e56c7 ("rpmsg: Introduce Qualcomm SMD backend") Cc: Signed-off-by: Srinivas Kandagatla Signed-off-by: Bjorn Andersson Signed-off-by: Greg Kroah-Hartman --- drivers/rpmsg/qcom_smd.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/drivers/rpmsg/qcom_smd.c b/drivers/rpmsg/qcom_smd.c index 1d4770c02e57..fd3d9419c468 100644 --- a/drivers/rpmsg/qcom_smd.c +++ b/drivers/rpmsg/qcom_smd.c @@ -1006,12 +1006,12 @@ static struct qcom_smd_channel *qcom_smd_create_channel(struct qcom_smd_edge *ed void *info; int ret; - channel = devm_kzalloc(&edge->dev, sizeof(*channel), GFP_KERNEL); + channel = kzalloc(sizeof(*channel), GFP_KERNEL); if (!channel) return ERR_PTR(-ENOMEM); channel->edge = edge; - channel->name = devm_kstrdup(&edge->dev, name, GFP_KERNEL); + channel->name = kstrdup(name, GFP_KERNEL); if (!channel->name) return ERR_PTR(-ENOMEM); @@ -1061,8 +1061,8 @@ static struct qcom_smd_channel *qcom_smd_create_channel(struct qcom_smd_edge *ed return channel; free_name_and_channel: - devm_kfree(&edge->dev, channel->name); - devm_kfree(&edge->dev, channel); + kfree(channel->name); + kfree(channel); return ERR_PTR(ret); } @@ -1279,13 +1279,13 @@ static int qcom_smd_parse_edge(struct device *dev, */ static void qcom_smd_edge_release(struct device *dev) { - struct qcom_smd_channel *channel; + struct qcom_smd_channel *channel, *tmp; struct qcom_smd_edge *edge = to_smd_edge(dev); - list_for_each_entry(channel, &edge->channels, list) { - SET_RX_CHANNEL_INFO(channel, state, SMD_CHANNEL_CLOSED); - SET_RX_CHANNEL_INFO(channel, head, 0); - SET_RX_CHANNEL_INFO(channel, tail, 0); + list_for_each_entry_safe(channel, tmp, &edge->channels, list) { + list_del(&channel->list); + kfree(channel->name); + kfree(channel); } kfree(edge); From 9eb99e738beb405cb69ce0c244e6bb0ee9666588 Mon Sep 17 00:00:00 2001 From: Richard Weinberger Date: Wed, 16 May 2018 22:17:03 +0200 Subject: [PATCH 068/970] ubi: fastmap: Cancel work upon detach commit 6e7d80161066c99d12580d1b985cb1408bb58cf1 upstream. Ben Hutchings pointed out that 29b7a6fa1ec0 ("ubi: fastmap: Don't flush fastmap work on detach") does not really fix the problem, it just reduces the risk to hit the race window where fastmap work races against free()'ing ubi->volumes[]. The correct approach is making sure that no more fastmap work is in progress before we free ubi data structures. So we cancel fastmap work right after the ubi background thread is stopped. By setting ubi->thread_enabled to zero we make sure that no further work tries to wake the thread. Fixes: 29b7a6fa1ec0 ("ubi: fastmap: Don't flush fastmap work on detach") Fixes: 74cdaf24004a ("UBI: Fastmap: Fix memory leaks while closing the WL sub-system") Cc: stable@vger.kernel.org Cc: Ben Hutchings Cc: Martin Townsend Signed-off-by: Richard Weinberger Signed-off-by: Greg Kroah-Hartman --- drivers/mtd/ubi/build.c | 3 +++ drivers/mtd/ubi/wl.c | 4 +--- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/mtd/ubi/build.c b/drivers/mtd/ubi/build.c index 6cb5ca52cb5a..ad2b57c6b13f 100644 --- a/drivers/mtd/ubi/build.c +++ b/drivers/mtd/ubi/build.c @@ -1137,6 +1137,9 @@ int ubi_detach_mtd_dev(int ubi_num, int anyway) */ get_device(&ubi->dev); +#ifdef CONFIG_MTD_UBI_FASTMAP + cancel_work_sync(&ubi->fm_work); +#endif ubi_debugfs_exit_dev(ubi); uif_close(ubi); diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c index 668b46202507..23a6986d512b 100644 --- a/drivers/mtd/ubi/wl.c +++ b/drivers/mtd/ubi/wl.c @@ -1505,6 +1505,7 @@ int ubi_thread(void *u) } dbg_wl("background thread \"%s\" is killed", ubi->bgt_name); + ubi->thread_enabled = 0; return 0; } @@ -1514,9 +1515,6 @@ int ubi_thread(void *u) */ static void shutdown_work(struct ubi_device *ubi) { -#ifdef CONFIG_MTD_UBI_FASTMAP - flush_work(&ubi->fm_work); -#endif while (!list_empty(&ubi->works)) { struct ubi_work *wrk; From df15c6eeab46aac3622821e2659677c3eb4abf9d Mon Sep 17 00:00:00 2001 From: Richard Weinberger Date: Mon, 28 May 2018 22:04:32 +0200 Subject: [PATCH 069/970] ubi: fastmap: Correctly handle interrupted erasures in EBA commit 781932375ffc6411713ee0926ccae8596ed0261c upstream. Fastmap cannot track the LEB unmap operation, therefore it can happen that after an interrupted erasure the mapping still looks good from Fastmap's point of view, while reading from the PEB will cause an ECC error and confuses the upper layer. Instead of teaching users of UBI how to deal with that, we read back the VID header and check for errors. If the PEB is empty or shows ECC errors we fixup the mapping and schedule the PEB for erasure. Fixes: dbb7d2a88d2a ("UBI: Add fastmap core") Cc: Reported-by: martin bayern Signed-off-by: Richard Weinberger Signed-off-by: Greg Kroah-Hartman --- drivers/mtd/ubi/eba.c | 90 ++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 89 insertions(+), 1 deletion(-) diff --git a/drivers/mtd/ubi/eba.c b/drivers/mtd/ubi/eba.c index 388e46be6ad9..d0884bd9d955 100644 --- a/drivers/mtd/ubi/eba.c +++ b/drivers/mtd/ubi/eba.c @@ -490,6 +490,82 @@ int ubi_eba_unmap_leb(struct ubi_device *ubi, struct ubi_volume *vol, return err; } +#ifdef CONFIG_MTD_UBI_FASTMAP +/** + * check_mapping - check and fixup a mapping + * @ubi: UBI device description object + * @vol: volume description object + * @lnum: logical eraseblock number + * @pnum: physical eraseblock number + * + * Checks whether a given mapping is valid. Fastmap cannot track LEB unmap + * operations, if such an operation is interrupted the mapping still looks + * good, but upon first read an ECC is reported to the upper layer. + * Normaly during the full-scan at attach time this is fixed, for Fastmap + * we have to deal with it while reading. + * If the PEB behind a LEB shows this symthom we change the mapping to + * %UBI_LEB_UNMAPPED and schedule the PEB for erasure. + * + * Returns 0 on success, negative error code in case of failure. + */ +static int check_mapping(struct ubi_device *ubi, struct ubi_volume *vol, int lnum, + int *pnum) +{ + int err; + struct ubi_vid_io_buf *vidb; + + if (!ubi->fast_attach) + return 0; + + vidb = ubi_alloc_vid_buf(ubi, GFP_NOFS); + if (!vidb) + return -ENOMEM; + + err = ubi_io_read_vid_hdr(ubi, *pnum, vidb, 0); + if (err > 0 && err != UBI_IO_BITFLIPS) { + int torture = 0; + + switch (err) { + case UBI_IO_FF: + case UBI_IO_FF_BITFLIPS: + case UBI_IO_BAD_HDR: + case UBI_IO_BAD_HDR_EBADMSG: + break; + default: + ubi_assert(0); + } + + if (err == UBI_IO_BAD_HDR_EBADMSG || err == UBI_IO_FF_BITFLIPS) + torture = 1; + + down_read(&ubi->fm_eba_sem); + vol->eba_tbl->entries[lnum].pnum = UBI_LEB_UNMAPPED; + up_read(&ubi->fm_eba_sem); + ubi_wl_put_peb(ubi, vol->vol_id, lnum, *pnum, torture); + + *pnum = UBI_LEB_UNMAPPED; + } else if (err < 0) { + ubi_err(ubi, "unable to read VID header back from PEB %i: %i", + *pnum, err); + + goto out_free; + } + + err = 0; + +out_free: + ubi_free_vid_buf(vidb); + + return err; +} +#else +static int check_mapping(struct ubi_device *ubi, struct ubi_volume *vol, int lnum, + int *pnum) +{ + return 0; +} +#endif + /** * ubi_eba_read_leb - read data. * @ubi: UBI device description object @@ -522,7 +598,13 @@ int ubi_eba_read_leb(struct ubi_device *ubi, struct ubi_volume *vol, int lnum, return err; pnum = vol->eba_tbl->entries[lnum].pnum; - if (pnum < 0) { + if (pnum >= 0) { + err = check_mapping(ubi, vol, lnum, &pnum); + if (err < 0) + goto out_unlock; + } + + if (pnum == UBI_LEB_UNMAPPED) { /* * The logical eraseblock is not mapped, fill the whole buffer * with 0xFF bytes. The exception is static volumes for which @@ -930,6 +1012,12 @@ int ubi_eba_write_leb(struct ubi_device *ubi, struct ubi_volume *vol, int lnum, return err; pnum = vol->eba_tbl->entries[lnum].pnum; + if (pnum >= 0) { + err = check_mapping(ubi, vol, lnum, &pnum); + if (err < 0) + goto out; + } + if (pnum >= 0) { dbg_eba("write %d bytes at offset %d of LEB %d:%d, PEB %d", len, offset, vol_id, lnum, pnum); From da05be555697c3fdd3f1507b59917c1cc3b03566 Mon Sep 17 00:00:00 2001 From: Silvio Cesare Date: Fri, 4 May 2018 13:44:02 +1000 Subject: [PATCH 070/970] UBIFS: Fix potential integer overflow in allocation commit 353748a359f1821ee934afc579cf04572406b420 upstream. There is potential for the size and len fields in ubifs_data_node to be too large causing either a negative value for the length fields or an integer overflow leading to an incorrect memory allocation. Likewise, when the len field is small, an integer underflow may occur. Signed-off-by: Silvio Cesare Fixes: 1e51764a3c2ac ("UBIFS: add new flash file system") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook Signed-off-by: Greg Kroah-Hartman --- fs/ubifs/journal.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/ubifs/journal.c b/fs/ubifs/journal.c index 7d764e3b6c79..504658fd0d08 100644 --- a/fs/ubifs/journal.c +++ b/fs/ubifs/journal.c @@ -1265,7 +1265,7 @@ static int recomp_data_node(const struct ubifs_info *c, int err, len, compr_type, out_len; out_len = le32_to_cpu(dn->size); - buf = kmalloc(out_len * WORST_COMPR_FACTOR, GFP_NOFS); + buf = kmalloc_array(out_len, WORST_COMPR_FACTOR, GFP_NOFS); if (!buf) return -ENOMEM; From 47f764c65c561ba1dcf7c6a9688c82202ede58ca Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Mon, 20 Nov 2017 11:45:44 +0100 Subject: [PATCH 071/970] backlight: as3711_bl: Fix Device Tree node lookup commit 4a9c8bb2aca5b5a2a15744333729745dd9903562 upstream. Fix child-node lookup during probe, which ended up searching the whole device tree depth-first starting at the parent rather than just matching on its children. To make things worse, the parent mfd node was also prematurely freed. Cc: stable # 3.10 Fixes: 59eb2b5e57ea ("drivers/video/backlight/as3711_bl.c: add OF support") Signed-off-by: Johan Hovold Acked-by: Daniel Thompson Signed-off-by: Lee Jones Signed-off-by: Greg Kroah-Hartman --- drivers/video/backlight/as3711_bl.c | 33 ++++++++++++++++++++--------- 1 file changed, 23 insertions(+), 10 deletions(-) diff --git a/drivers/video/backlight/as3711_bl.c b/drivers/video/backlight/as3711_bl.c index 734a9158946b..e55304d5cf07 100644 --- a/drivers/video/backlight/as3711_bl.c +++ b/drivers/video/backlight/as3711_bl.c @@ -262,10 +262,10 @@ static int as3711_bl_register(struct platform_device *pdev, static int as3711_backlight_parse_dt(struct device *dev) { struct as3711_bl_pdata *pdata = dev_get_platdata(dev); - struct device_node *bl = - of_find_node_by_name(dev->parent->of_node, "backlight"), *fb; + struct device_node *bl, *fb; int ret; + bl = of_get_child_by_name(dev->parent->of_node, "backlight"); if (!bl) { dev_dbg(dev, "backlight node not found\n"); return -ENODEV; @@ -279,7 +279,7 @@ static int as3711_backlight_parse_dt(struct device *dev) if (pdata->su1_max_uA <= 0) ret = -EINVAL; if (ret < 0) - return ret; + goto err_put_bl; } fb = of_parse_phandle(bl, "su2-dev", 0); @@ -292,7 +292,7 @@ static int as3711_backlight_parse_dt(struct device *dev) if (pdata->su2_max_uA <= 0) ret = -EINVAL; if (ret < 0) - return ret; + goto err_put_bl; if (of_find_property(bl, "su2-feedback-voltage", NULL)) { pdata->su2_feedback = AS3711_SU2_VOLTAGE; @@ -314,8 +314,10 @@ static int as3711_backlight_parse_dt(struct device *dev) pdata->su2_feedback = AS3711_SU2_CURR_AUTO; count++; } - if (count != 1) - return -EINVAL; + if (count != 1) { + ret = -EINVAL; + goto err_put_bl; + } count = 0; if (of_find_property(bl, "su2-fbprot-lx-sd4", NULL)) { @@ -334,8 +336,10 @@ static int as3711_backlight_parse_dt(struct device *dev) pdata->su2_fbprot = AS3711_SU2_GPIO4; count++; } - if (count != 1) - return -EINVAL; + if (count != 1) { + ret = -EINVAL; + goto err_put_bl; + } count = 0; if (of_find_property(bl, "su2-auto-curr1", NULL)) { @@ -355,11 +359,20 @@ static int as3711_backlight_parse_dt(struct device *dev) * At least one su2-auto-curr* must be specified iff * AS3711_SU2_CURR_AUTO is used */ - if (!count ^ (pdata->su2_feedback != AS3711_SU2_CURR_AUTO)) - return -EINVAL; + if (!count ^ (pdata->su2_feedback != AS3711_SU2_CURR_AUTO)) { + ret = -EINVAL; + goto err_put_bl; + } } + of_node_put(bl); + return 0; + +err_put_bl: + of_node_put(bl); + + return ret; } static int as3711_backlight_probe(struct platform_device *pdev) From a89e596f129117b902210c57599e198d1441ee93 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Mon, 20 Nov 2017 11:45:45 +0100 Subject: [PATCH 072/970] backlight: max8925_bl: Fix Device Tree node lookup commit d1cc0ec3da23e44c23712579515494b374f111c9 upstream. Fix child-node lookup during probe, which ended up searching the whole device tree depth-first starting at the parent rather than just matching on its children. To make things worse, the parent mfd node was also prematurely freed, while the child backlight node was leaked. Cc: stable # 3.9 Fixes: 47ec340cb8e2 ("mfd: max8925: Support dt for backlight") Signed-off-by: Johan Hovold Acked-by: Daniel Thompson Signed-off-by: Lee Jones Signed-off-by: Greg Kroah-Hartman --- drivers/video/backlight/max8925_bl.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/video/backlight/max8925_bl.c b/drivers/video/backlight/max8925_bl.c index 7b738d60ecc2..f3aa6088f1d9 100644 --- a/drivers/video/backlight/max8925_bl.c +++ b/drivers/video/backlight/max8925_bl.c @@ -116,7 +116,7 @@ static void max8925_backlight_dt_init(struct platform_device *pdev) if (!pdata) return; - np = of_find_node_by_name(nproot, "backlight"); + np = of_get_child_by_name(nproot, "backlight"); if (!np) { dev_err(&pdev->dev, "failed to find backlight node\n"); return; @@ -125,6 +125,8 @@ static void max8925_backlight_dt_init(struct platform_device *pdev) if (!of_property_read_u32(np, "maxim,max8925-dual-string", &val)) pdata->dual_string = val; + of_node_put(np); + pdev->dev.platform_data = pdata; } From 099fae46d8dfed70744321bbbd265cc7af2982f3 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Mon, 20 Nov 2017 11:45:46 +0100 Subject: [PATCH 073/970] backlight: tps65217_bl: Fix Device Tree node lookup commit 2b12dfa124dbadf391cb9a616aaa6b056823bf75 upstream. Fix child-node lookup during probe, which ended up searching the whole device tree depth-first starting at the parent rather than just matching on its children. This would only cause trouble if the child node is missing while there is an unrelated node named "backlight" elsewhere in the tree. Cc: stable # 3.7 Fixes: eebfdc17cc6c ("backlight: Add TPS65217 WLED driver") Signed-off-by: Johan Hovold Acked-by: Daniel Thompson Signed-off-by: Lee Jones Signed-off-by: Greg Kroah-Hartman --- drivers/video/backlight/tps65217_bl.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/video/backlight/tps65217_bl.c b/drivers/video/backlight/tps65217_bl.c index fd524ad860a5..f45d0c9467db 100644 --- a/drivers/video/backlight/tps65217_bl.c +++ b/drivers/video/backlight/tps65217_bl.c @@ -184,11 +184,11 @@ static struct tps65217_bl_pdata * tps65217_bl_parse_dt(struct platform_device *pdev) { struct tps65217 *tps = dev_get_drvdata(pdev->dev.parent); - struct device_node *node = of_node_get(tps->dev->of_node); + struct device_node *node; struct tps65217_bl_pdata *pdata, *err; u32 val; - node = of_find_node_by_name(node, "backlight"); + node = of_get_child_by_name(tps->dev->of_node, "backlight"); if (!node) return ERR_PTR(-ENODEV); From 49d98a8e1f55f8406c45ce2b88eb4912d9877f67 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Tue, 24 Apr 2018 18:00:10 +0300 Subject: [PATCH 074/970] mfd: intel-lpss: Program REMAP register in PIO mode commit d28b62520830b2d0bffa2d98e81afc9f5e537e8b upstream. According to documentation REMAP register has to be programmed in either DMA or PIO mode of the slice. Move the DMA capability check below to let REMAP register be programmed in PIO mode. Cc: stable@vger.kernel.org # 4.3+ Fixes: 4b45efe85263 ("mfd: Add support for Intel Sunrisepoint LPSS devices") Signed-off-by: Andy Shevchenko Signed-off-by: Lee Jones Signed-off-by: Greg Kroah-Hartman --- drivers/mfd/intel-lpss.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/mfd/intel-lpss.c b/drivers/mfd/intel-lpss.c index 70c646b0097d..19ac8bc8e7ea 100644 --- a/drivers/mfd/intel-lpss.c +++ b/drivers/mfd/intel-lpss.c @@ -275,11 +275,11 @@ static void intel_lpss_init_dev(const struct intel_lpss *lpss) intel_lpss_deassert_reset(lpss); + intel_lpss_set_remap_addr(lpss); + if (!intel_lpss_has_idma(lpss)) return; - intel_lpss_set_remap_addr(lpss); - /* Make sure that SPI multiblock DMA transfers are re-enabled */ if (lpss->type == LPSS_DEV_SPI) writel(value, lpss->priv + LPSS_PRIV_SSP_REG); From dfd2eff6f457f2193f0dfd0de5cb3aa49df3d3e9 Mon Sep 17 00:00:00 2001 From: Adrian Hunter Date: Mon, 4 Jun 2018 15:56:54 +0300 Subject: [PATCH 075/970] perf tools: Fix symbol and object code resolution for vdso32 and vdsox32 commit aef4feace285f27c8ed35830a5d575bec7f3e90a upstream. Fix __kmod_path__parse() so that perf tools does not treat vdso32 and vdsox32 as kernel modules and fail to find the object. Signed-off-by: Adrian Hunter Cc: Jiri Olsa Cc: Wang Nan Cc: stable@vger.kernel.org Fixes: 1f121b03d058 ("perf tools: Deal with kernel module names in '[]' correctly") Link: http://lkml.kernel.org/r/1528117014-30032-3-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Greg Kroah-Hartman --- tools/perf/util/dso.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c index d2c6cdd9d42b..8bec05365aae 100644 --- a/tools/perf/util/dso.c +++ b/tools/perf/util/dso.c @@ -253,6 +253,8 @@ int __kmod_path__parse(struct kmod_path *m, const char *path, if ((strncmp(name, "[kernel.kallsyms]", 17) == 0) || (strncmp(name, "[guest.kernel.kallsyms", 22) == 0) || (strncmp(name, "[vdso]", 6) == 0) || + (strncmp(name, "[vdso32]", 8) == 0) || + (strncmp(name, "[vdsox32]", 9) == 0) || (strncmp(name, "[vsyscall]", 10) == 0)) { m->kmod = false; From 31606f7f56de573470f0b7037b764df381a87fa6 Mon Sep 17 00:00:00 2001 From: Adrian Hunter Date: Thu, 31 May 2018 13:23:42 +0300 Subject: [PATCH 076/970] perf intel-pt: Fix sync_switch INTEL_PT_SS_NOT_TRACING commit dbcb82b93f3e8322891e47472c89e63058b81e99 upstream. sync_switch is a facility to synchronize decoding more closely with the point in the kernel when the context actually switched. In one case, INTEL_PT_SS_NOT_TRACING state was not correctly transitioning to INTEL_PT_SS_TRACING state due to a missing case clause. Add it. Signed-off-by: Adrian Hunter Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1527762225-26024-2-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Greg Kroah-Hartman --- tools/perf/util/intel-pt.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c index b1161d725ce9..5fb78b661fd3 100644 --- a/tools/perf/util/intel-pt.c +++ b/tools/perf/util/intel-pt.c @@ -1344,6 +1344,7 @@ static int intel_pt_sample(struct intel_pt_queue *ptq) if (intel_pt_is_switch_ip(ptq, state->to_ip)) { switch (ptq->switch_state) { + case INTEL_PT_SS_NOT_TRACING: case INTEL_PT_SS_UNKNOWN: case INTEL_PT_SS_EXPECTING_SWITCH_IP: err = intel_pt_next_tid(pt, ptq); From 282f1f66b5a0f4875f2562144f5e5c80bc86dfc1 Mon Sep 17 00:00:00 2001 From: Adrian Hunter Date: Thu, 31 May 2018 13:23:43 +0300 Subject: [PATCH 077/970] perf intel-pt: Fix decoding to accept CBR between FUP and corresponding TIP commit bd2e49ec48feb1855f7624198849eea4610e2286 upstream. It is possible to have a CBR packet between a FUP packet and corresponding TIP packet. Stop treating it as an error. Signed-off-by: Adrian Hunter Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1527762225-26024-3-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Greg Kroah-Hartman --- tools/perf/util/intel-pt-decoder/intel-pt-decoder.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c index cac39532c057..0fd7697b9d35 100644 --- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c +++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c @@ -1517,7 +1517,6 @@ static int intel_pt_walk_fup_tip(struct intel_pt_decoder *decoder) case INTEL_PT_PSB: case INTEL_PT_TSC: case INTEL_PT_TMA: - case INTEL_PT_CBR: case INTEL_PT_MODE_TSX: case INTEL_PT_BAD: case INTEL_PT_PSBEND: @@ -1526,6 +1525,10 @@ static int intel_pt_walk_fup_tip(struct intel_pt_decoder *decoder) decoder->pkt_step = 0; return -ENOENT; + case INTEL_PT_CBR: + intel_pt_calc_cbr(decoder); + break; + case INTEL_PT_OVF: return intel_pt_overflow(decoder); From 4213d9b8cdb10d4d29fd232898692a1a56c1fd58 Mon Sep 17 00:00:00 2001 From: Adrian Hunter Date: Thu, 31 May 2018 13:23:44 +0300 Subject: [PATCH 078/970] perf intel-pt: Fix MTC timing after overflow commit dd27b87ab5fcf3ea1c060b5e3ab5d31cc78e9f4c upstream. On some platforms, overflows will clear before MTC wraparound, and there is no following TSC/TMA packet. In that case the previous TMA is valid. Since there will be a valid TMA either way, stop setting 'have_tma' to false upon overflow. Signed-off-by: Adrian Hunter Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1527762225-26024-4-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Greg Kroah-Hartman --- tools/perf/util/intel-pt-decoder/intel-pt-decoder.c | 1 - 1 file changed, 1 deletion(-) diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c index 0fd7697b9d35..21c81da190b2 100644 --- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c +++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c @@ -1298,7 +1298,6 @@ static int intel_pt_overflow(struct intel_pt_decoder *decoder) { intel_pt_log("ERROR: Buffer overflow\n"); intel_pt_clear_tx_flags(decoder); - decoder->have_tma = false; decoder->cbr = 0; decoder->timestamp_insn_cnt = 0; decoder->pkt_state = INTEL_PT_STATE_ERR_RESYNC; From d129ab791de910a81bb53f0053d3476c26982aea Mon Sep 17 00:00:00 2001 From: Adrian Hunter Date: Thu, 31 May 2018 13:23:45 +0300 Subject: [PATCH 079/970] perf intel-pt: Fix "Unexpected indirect branch" error commit 9fb523363f6e3984457fee95bb7019395384ffa7 upstream. Some Atom CPUs can produce FUP packets that contain NLIP (next linear instruction pointer) instead of CLIP (current linear instruction pointer). That will result in "Unexpected indirect branch" errors. Fix by comparing IP to NLIP in that case. Signed-off-by: Adrian Hunter Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1527762225-26024-5-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Greg Kroah-Hartman --- .../util/intel-pt-decoder/intel-pt-decoder.c | 17 +++++++++++++++-- .../util/intel-pt-decoder/intel-pt-decoder.h | 9 +++++++++ tools/perf/util/intel-pt.c | 4 ++++ 3 files changed, 28 insertions(+), 2 deletions(-) diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c index 21c81da190b2..d27715ff9a5f 100644 --- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c +++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c @@ -112,6 +112,7 @@ struct intel_pt_decoder { bool have_cyc; bool fixup_last_mtc; bool have_last_ip; + enum intel_pt_param_flags flags; uint64_t pos; uint64_t last_ip; uint64_t ip; @@ -215,6 +216,8 @@ struct intel_pt_decoder *intel_pt_decoder_new(struct intel_pt_params *params) decoder->data = params->data; decoder->return_compression = params->return_compression; + decoder->flags = params->flags; + decoder->period = params->period; decoder->period_type = params->period_type; @@ -1012,6 +1015,15 @@ static int intel_pt_walk_insn(struct intel_pt_decoder *decoder, return err; } +static inline bool intel_pt_fup_with_nlip(struct intel_pt_decoder *decoder, + struct intel_pt_insn *intel_pt_insn, + uint64_t ip, int err) +{ + return decoder->flags & INTEL_PT_FUP_WITH_NLIP && !err && + intel_pt_insn->branch == INTEL_PT_BR_INDIRECT && + ip == decoder->ip + intel_pt_insn->length; +} + static int intel_pt_walk_fup(struct intel_pt_decoder *decoder) { struct intel_pt_insn intel_pt_insn; @@ -1024,7 +1036,8 @@ static int intel_pt_walk_fup(struct intel_pt_decoder *decoder) err = intel_pt_walk_insn(decoder, &intel_pt_insn, ip); if (err == INTEL_PT_RETURN) return 0; - if (err == -EAGAIN) { + if (err == -EAGAIN || + intel_pt_fup_with_nlip(decoder, &intel_pt_insn, ip, err)) { if (decoder->set_fup_tx_flags) { decoder->set_fup_tx_flags = false; decoder->tx_flags = decoder->fup_tx_flags; @@ -1034,7 +1047,7 @@ static int intel_pt_walk_fup(struct intel_pt_decoder *decoder) decoder->state.flags = decoder->fup_tx_flags; return 0; } - return err; + return -EAGAIN; } decoder->set_fup_tx_flags = false; if (err) diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h index 9ae4df1dcedc..2fe8f4c5aeb5 100644 --- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h +++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h @@ -53,6 +53,14 @@ enum { INTEL_PT_ERR_MAX, }; +enum intel_pt_param_flags { + /* + * FUP packet can contain next linear instruction pointer instead of + * current linear instruction pointer. + */ + INTEL_PT_FUP_WITH_NLIP = 1 << 0, +}; + struct intel_pt_state { enum intel_pt_sample_type type; int err; @@ -92,6 +100,7 @@ struct intel_pt_params { unsigned int mtc_period; uint32_t tsc_ctc_ratio_n; uint32_t tsc_ctc_ratio_d; + enum intel_pt_param_flags flags; }; struct intel_pt_decoder; diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c index 5fb78b661fd3..d40ab4cf8932 100644 --- a/tools/perf/util/intel-pt.c +++ b/tools/perf/util/intel-pt.c @@ -752,6 +752,7 @@ static struct intel_pt_queue *intel_pt_alloc_queue(struct intel_pt *pt, unsigned int queue_nr) { struct intel_pt_params params = { .get_trace = 0, }; + struct perf_env *env = pt->machine->env; struct intel_pt_queue *ptq; ptq = zalloc(sizeof(struct intel_pt_queue)); @@ -832,6 +833,9 @@ static struct intel_pt_queue *intel_pt_alloc_queue(struct intel_pt *pt, } } + if (env->cpuid && !strncmp(env->cpuid, "GenuineIntel,6,92,", 18)) + params.flags |= INTEL_PT_FUP_WITH_NLIP; + ptq->decoder = intel_pt_decoder_new(¶ms); if (!ptq->decoder) goto out_free; From d6a267b4c5f9bb3db42e3fe6092d1059f08d03bf Mon Sep 17 00:00:00 2001 From: Adrian Hunter Date: Thu, 7 Jun 2018 14:30:02 +0300 Subject: [PATCH 080/970] perf intel-pt: Fix packet decoding of CYC packets commit 621a5a327c1e36ffd7bb567f44a559f64f76358f upstream. Use a 64-bit type so that the cycle count is not limited to 32-bits. Signed-off-by: Adrian Hunter Cc: Jiri Olsa Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1528371002-8862-1-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Greg Kroah-Hartman --- tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c index 7528ae4f7e28..e5c6caf913f3 100644 --- a/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c +++ b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c @@ -281,7 +281,7 @@ static int intel_pt_get_cyc(unsigned int byte, const unsigned char *buf, if (len < offs) return INTEL_PT_NEED_MORE_BYTES; byte = buf[offs++]; - payload |= (byte >> 1) << shift; + payload |= ((uint64_t)byte >> 1) << shift; } packet->type = INTEL_PT_CYC; From 1e6b50b6b68e25a8ff972a1e1279a40cd7adc4fd Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Wed, 11 Apr 2018 11:47:32 -0400 Subject: [PATCH 081/970] media: v4l2-compat-ioctl32: prevent go past max size commit ea72fbf588ac9c017224dcdaa2019ff52ca56fee upstream. As warned by smatch: drivers/media/v4l2-core/v4l2-compat-ioctl32.c:879 put_v4l2_ext_controls32() warn: check for integer overflow 'count' The access_ok() logic should check for too big arrays too. Cc: stable@vger.kernel.org Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman --- drivers/media/v4l2-core/v4l2-compat-ioctl32.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/media/v4l2-core/v4l2-compat-ioctl32.c b/drivers/media/v4l2-core/v4l2-compat-ioctl32.c index a9fc64557c53..f1f697296ca0 100644 --- a/drivers/media/v4l2-core/v4l2-compat-ioctl32.c +++ b/drivers/media/v4l2-core/v4l2-compat-ioctl32.c @@ -864,7 +864,7 @@ static int put_v4l2_ext_controls32(struct file *file, get_user(kcontrols, &kp->controls)) return -EFAULT; - if (!count) + if (!count || count > (U32_MAX/sizeof(*ucontrols))) return 0; if (get_user(p, &up->controls)) return -EFAULT; From 1a4726ba1dedcdb92bc15a70e65150e9222ceed1 Mon Sep 17 00:00:00 2001 From: Kai-Heng Feng Date: Mon, 26 Mar 2018 02:06:16 -0400 Subject: [PATCH 082/970] media: cx231xx: Add support for AverMedia DVD EZMaker 7 commit 29e61d6ef061b012d320327af7dbb3990e75be45 upstream. User reports AverMedia DVD EZMaker 7 can be driven by VIDEO_GRABBER. Add the device to the id_table to make it work. BugLink: https://bugs.launchpad.net/bugs/1620762 Cc: stable@vger.kernel.org Signed-off-by: Kai-Heng Feng Signed-off-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman --- drivers/media/usb/cx231xx/cx231xx-cards.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/media/usb/cx231xx/cx231xx-cards.c b/drivers/media/usb/cx231xx/cx231xx-cards.c index 921cf1edb3b1..69156affd0ae 100644 --- a/drivers/media/usb/cx231xx/cx231xx-cards.c +++ b/drivers/media/usb/cx231xx/cx231xx-cards.c @@ -864,6 +864,9 @@ struct usb_device_id cx231xx_id_table[] = { .driver_info = CX231XX_BOARD_CNXT_RDE_250}, {USB_DEVICE(0x0572, 0x58A0), .driver_info = CX231XX_BOARD_CNXT_RDU_250}, + /* AverMedia DVD EZMaker 7 */ + {USB_DEVICE(0x07ca, 0xc039), + .driver_info = CX231XX_BOARD_CNXT_VIDEO_GRABBER}, {USB_DEVICE(0x2040, 0xb110), .driver_info = CX231XX_BOARD_HAUPPAUGE_USB2_FM_PAL}, {USB_DEVICE(0x2040, 0xb111), From dc00f08645be1f6a1319af6c6c1117137f787efc Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Thu, 5 Apr 2018 05:30:52 -0400 Subject: [PATCH 083/970] media: dvb_frontend: fix locking issues at dvb_frontend_get_event() commit 76d81243a487c09619822ef8e7201a756e58a87d upstream. As warned by smatch: drivers/media/dvb-core/dvb_frontend.c:314 dvb_frontend_get_event() warn: inconsistent returns 'sem:&fepriv->sem'. Locked on: line 288 line 295 line 306 line 314 Unlocked on: line 303 The lock implementation for get event is wrong, as, if an interrupt occurs, down_interruptible() will fail, and the routine will call up() twice when userspace calls the ioctl again. The bad code is there since when Linux migrated to git, in 2005. Cc: stable@vger.kernel.org Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman --- drivers/media/dvb-core/dvb_frontend.c | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/drivers/media/dvb-core/dvb_frontend.c b/drivers/media/dvb-core/dvb_frontend.c index 01511e5a5566..2f054db8807b 100644 --- a/drivers/media/dvb-core/dvb_frontend.c +++ b/drivers/media/dvb-core/dvb_frontend.c @@ -251,8 +251,20 @@ static void dvb_frontend_add_event(struct dvb_frontend *fe, wake_up_interruptible (&events->wait_queue); } +static int dvb_frontend_test_event(struct dvb_frontend_private *fepriv, + struct dvb_fe_events *events) +{ + int ret; + + up(&fepriv->sem); + ret = events->eventw != events->eventr; + down(&fepriv->sem); + + return ret; +} + static int dvb_frontend_get_event(struct dvb_frontend *fe, - struct dvb_frontend_event *event, int flags) + struct dvb_frontend_event *event, int flags) { struct dvb_frontend_private *fepriv = fe->frontend_priv; struct dvb_fe_events *events = &fepriv->events; @@ -270,13 +282,8 @@ static int dvb_frontend_get_event(struct dvb_frontend *fe, if (flags & O_NONBLOCK) return -EWOULDBLOCK; - up(&fepriv->sem); - - ret = wait_event_interruptible (events->wait_queue, - events->eventw != events->eventr); - - if (down_interruptible (&fepriv->sem)) - return -ERESTARTSYS; + ret = wait_event_interruptible(events->wait_queue, + dvb_frontend_test_event(fepriv, events)); if (ret < 0) return ret; From 40d79a61957a7d59b079d30fbbc5e51c3b9b239a Mon Sep 17 00:00:00 2001 From: Scott Mayhew Date: Mon, 7 May 2018 09:01:08 -0400 Subject: [PATCH 084/970] nfsd: restrict rd_maxcount to svc_max_payload in nfsd_encode_readdir commit 9c2ece6ef67e9d376f32823086169b489c422ed0 upstream. nfsd4_readdir_rsize restricts rd_maxcount to svc_max_payload when estimating the size of the readdir reply, but nfsd_encode_readdir restricts it to INT_MAX when encoding the reply. This can result in log messages like "kernel: RPC request reserved 32896 but used 1049444". Restrict rd_dircount similarly (no reason it should be larger than svc_max_payload). Signed-off-by: Scott Mayhew Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields Signed-off-by: Greg Kroah-Hartman --- fs/nfsd/nfs4xdr.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 2c4f7a22e128..bdbd9e6d1ace 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -3638,7 +3638,8 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4 nfserr = nfserr_resource; goto err_no_verf; } - maxcount = min_t(u32, readdir->rd_maxcount, INT_MAX); + maxcount = svc_max_payload(resp->rqstp); + maxcount = min_t(u32, readdir->rd_maxcount, maxcount); /* * Note the rfc defines rd_maxcount as the size of the * READDIR4resok structure, which includes the verifier above @@ -3652,7 +3653,7 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4 /* RFC 3530 14.2.24 allows us to ignore dircount when it's 0: */ if (!readdir->rd_dircount) - readdir->rd_dircount = INT_MAX; + readdir->rd_dircount = svc_max_payload(resp->rqstp); readdir->xdr = xdr; readdir->rd_maxcount = maxcount; From 5b7f582e808d6b39eb0751deb1b6685813b05847 Mon Sep 17 00:00:00 2001 From: Dave Wysochanski Date: Tue, 29 May 2018 17:47:30 -0400 Subject: [PATCH 085/970] NFSv4: Fix possible 1-byte stack overflow in nfs_idmap_read_and_verify_message commit d68894800ec5712d7ddf042356f11e36f87d7f78 upstream. In nfs_idmap_read_and_verify_message there is an incorrect sprintf '%d' that converts the __u32 'im_id' from struct idmap_msg to 'id_str', which is a stack char array variable of length NFS_UINT_MAXLEN == 11. If a uid or gid value is > 2147483647 = 0x7fffffff, the conversion overflows into a negative value, for example: crash> p (unsigned) (0x80000000) $1 = 2147483648 crash> p (signed) (0x80000000) $2 = -2147483648 The '-' sign is written to the buffer and this causes a 1 byte overflow when the NULL byte is written, which corrupts kernel stack memory. If CONFIG_CC_STACKPROTECTOR_STRONG is set we see a stack-protector panic: [11558053.616565] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffffa05b8a8c [11558053.639063] CPU: 6 PID: 9423 Comm: rpc.idmapd Tainted: G W ------------ T 3.10.0-514.el7.x86_64 #1 [11558053.641990] Hardware name: Red Hat OpenStack Compute, BIOS 1.10.2-3.el7_4.1 04/01/2014 [11558053.644462] ffffffff818c7bc0 00000000b1f3aec1 ffff880de0f9bd48 ffffffff81685eac [11558053.646430] ffff880de0f9bdc8 ffffffff8167f2b3 ffffffff00000010 ffff880de0f9bdd8 [11558053.648313] ffff880de0f9bd78 00000000b1f3aec1 ffffffff811dcb03 ffffffffa05b8a8c [11558053.650107] Call Trace: [11558053.651347] [] dump_stack+0x19/0x1b [11558053.653013] [] panic+0xe3/0x1f2 [11558053.666240] [] ? kfree+0x103/0x140 [11558053.682589] [] ? idmap_pipe_downcall+0x1cc/0x1e0 [nfsv4] [11558053.689710] [] __stack_chk_fail+0x1b/0x30 [11558053.691619] [] idmap_pipe_downcall+0x1cc/0x1e0 [nfsv4] [11558053.693867] [] rpc_pipe_write+0x56/0x70 [sunrpc] [11558053.695763] [] vfs_write+0xbd/0x1e0 [11558053.702236] [] ? task_work_run+0xac/0xe0 [11558053.704215] [] SyS_write+0x7f/0xe0 [11558053.709674] [] system_call_fastpath+0x16/0x1b Fix this by calling the internally defined nfs_map_numeric_to_string() function which properly uses '%u' to convert this __u32. For consistency, also replace the one other place where snprintf is called. Signed-off-by: Dave Wysochanski Reported-by: Stephen Johnston Fixes: cf4ab538f1516 ("NFSv4: Fix the string length returned by the idmapper") Cc: stable@vger.kernel.org # v3.4+ Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman --- fs/nfs/nfs4idmap.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/nfs/nfs4idmap.c b/fs/nfs/nfs4idmap.c index f1160cdd4682..3d4602d66068 100644 --- a/fs/nfs/nfs4idmap.c +++ b/fs/nfs/nfs4idmap.c @@ -343,7 +343,7 @@ static ssize_t nfs_idmap_lookup_name(__u32 id, const char *type, char *buf, int id_len; ssize_t ret; - id_len = snprintf(id_str, sizeof(id_str), "%u", id); + id_len = nfs_map_numeric_to_string(id, id_str, sizeof(id_str)); ret = nfs_idmap_get_key(id_str, id_len, type, buf, buflen, idmap); if (ret < 0) return -EINVAL; @@ -626,7 +626,8 @@ static int nfs_idmap_read_and_verify_message(struct idmap_msg *im, if (strcmp(upcall->im_name, im->im_name) != 0) break; /* Note: here we store the NUL terminator too */ - len = sprintf(id_str, "%d", im->im_id) + 1; + len = 1 + nfs_map_numeric_to_string(im->im_id, id_str, + sizeof(id_str)); ret = nfs_idmap_instantiate(key, authkey, id_str, len); break; case IDMAP_CONV_IDTONAME: From cdc83c366977044d7717d56f6bb1aa247e1dc582 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Sat, 9 Jun 2018 12:43:06 -0400 Subject: [PATCH 086/970] NFSv4: Revert commit 5f83d86cf531d ("NFSv4.x: Fix wraparound issues..") commit fc40724fc6731d90cc7fb6d62d66135f85a33dd2 upstream. The correct behaviour for NFSv4 sequence IDs is to wrap around to the value 0 after 0xffffffff. See https://tools.ietf.org/html/rfc5661#section-2.10.6.1 Fixes: 5f83d86cf531d ("NFSv4.x: Fix wraparound issues when validing...") Cc: stable@vger.kernel.org # 4.6+ Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman --- fs/nfs/callback_proc.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c index e9aa235e9d10..2e7ebd9d7168 100644 --- a/fs/nfs/callback_proc.c +++ b/fs/nfs/callback_proc.c @@ -402,11 +402,8 @@ validate_seqid(const struct nfs4_slot_table *tbl, const struct nfs4_slot *slot, return htonl(NFS4ERR_SEQ_FALSE_RETRY); } - /* Wraparound */ - if (unlikely(slot->seq_nr == 0xFFFFFFFFU)) { - if (args->csa_sequenceid == 1) - return htonl(NFS4_OK); - } else if (likely(args->csa_sequenceid == slot->seq_nr + 1)) + /* Note: wraparound relies on seq_nr being of type u32 */ + if (likely(args->csa_sequenceid == slot->seq_nr + 1)) return htonl(NFS4_OK); /* Misordered request */ From 7673ca3c93414faf90fa2a3c339f1f625415fecb Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Fri, 11 May 2018 18:24:12 +1000 Subject: [PATCH 087/970] video: uvesafb: Fix integer overflow in allocation commit 9f645bcc566a1e9f921bdae7528a01ced5bc3713 upstream. cmap->len can get close to INT_MAX/2, allowing for an integer overflow in allocation. This uses kmalloc_array() instead to catch the condition. Reported-by: Dr Silvio Cesare of InfoSect Fixes: 8bdb3a2d7df48 ("uvesafb: the driver core") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook Signed-off-by: Greg Kroah-Hartman --- drivers/video/fbdev/uvesafb.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/video/fbdev/uvesafb.c b/drivers/video/fbdev/uvesafb.c index 98af9e02959b..9fe0d0bcdf62 100644 --- a/drivers/video/fbdev/uvesafb.c +++ b/drivers/video/fbdev/uvesafb.c @@ -1059,7 +1059,8 @@ static int uvesafb_setcmap(struct fb_cmap *cmap, struct fb_info *info) info->cmap.len || cmap->start < info->cmap.start) return -EINVAL; - entries = kmalloc(sizeof(*entries) * cmap->len, GFP_KERNEL); + entries = kmalloc_array(cmap->len, sizeof(*entries), + GFP_KERNEL); if (!entries) return -ENOMEM; From c38bac75d1c21b09764a882b520b0ad33749ac10 Mon Sep 17 00:00:00 2001 From: Alexandr Savca Date: Thu, 21 Jun 2018 17:12:54 -0700 Subject: [PATCH 088/970] Input: elan_i2c - add ELAN0618 (Lenovo v330 15IKB) ACPI ID commit 8938fc7b8fe9ccfa11751ead502a8d385b607967 upstream. Add ELAN0618 to the list of supported touchpads; this ID is used in Lenovo v330 15IKB devices. Signed-off-by: Alexandr Savca Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov Signed-off-by: Greg Kroah-Hartman --- drivers/input/mouse/elan_i2c_core.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/input/mouse/elan_i2c_core.c b/drivers/input/mouse/elan_i2c_core.c index aeb8250ab079..2d6481f3e89b 100644 --- a/drivers/input/mouse/elan_i2c_core.c +++ b/drivers/input/mouse/elan_i2c_core.c @@ -1250,6 +1250,7 @@ static const struct acpi_device_id elan_acpi_id[] = { { "ELAN060C", 0 }, { "ELAN0611", 0 }, { "ELAN0612", 0 }, + { "ELAN0618", 0 }, { "ELAN1000", 0 }, { } }; From 037aca0e2fc22dcff17c38b38d9efab5e0d4b96e Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Thu, 26 Apr 2018 14:10:23 +0200 Subject: [PATCH 089/970] pwm: lpss: platform: Save/restore the ctrl register over a suspend/resume commit 1d375b58c12f08d8570b30b865def4734517f04f upstream. On some devices the contents of the ctrl register get lost over a suspend/resume and the PWM comes back up disabled after the resume. This is seen on some Bay Trail devices with the PWM in ACPI enumerated mode, so it shows up as a platform device instead of a PCI device. If we still think it is enabled and then try to change the duty-cycle after this, we end up with a "PWM_SW_UPDATE was not cleared" error and the PWM is stuck in that state from then on. This commit adds suspend and resume pm callbacks to the pwm-lpss-platform code, which save/restore the ctrl register over a suspend/resume, fixing this. Note that: 1) There is no need to do this over a runtime suspend, since we only runtime suspend when disabled and then we properly set the enable bit and reprogram the timings when we re-enable the PWM. 2) This may be happening on more systems then we realize, but has been covered up sofar by a bug in the acpi-lpss.c code which was save/restoring the regular device registers instead of the lpss private registers due to lpss_device_desc.prv_offset not being set. This is fixed by a later patch in this series. Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede Reviewed-by: Andy Shevchenko Signed-off-by: Thierry Reding Signed-off-by: Greg Kroah-Hartman --- drivers/pwm/pwm-lpss-platform.c | 5 +++++ drivers/pwm/pwm-lpss.c | 30 ++++++++++++++++++++++++++++++ drivers/pwm/pwm-lpss.h | 2 ++ 3 files changed, 37 insertions(+) diff --git a/drivers/pwm/pwm-lpss-platform.c b/drivers/pwm/pwm-lpss-platform.c index 54433fc6d1a4..e4eaefc2a2ef 100644 --- a/drivers/pwm/pwm-lpss-platform.c +++ b/drivers/pwm/pwm-lpss-platform.c @@ -52,6 +52,10 @@ static int pwm_lpss_remove_platform(struct platform_device *pdev) return pwm_lpss_remove(lpwm); } +static SIMPLE_DEV_PM_OPS(pwm_lpss_platform_pm_ops, + pwm_lpss_suspend, + pwm_lpss_resume); + static const struct acpi_device_id pwm_lpss_acpi_match[] = { { "80860F09", (unsigned long)&pwm_lpss_byt_info }, { "80862288", (unsigned long)&pwm_lpss_bsw_info }, @@ -64,6 +68,7 @@ static struct platform_driver pwm_lpss_driver_platform = { .driver = { .name = "pwm-lpss", .acpi_match_table = pwm_lpss_acpi_match, + .pm = &pwm_lpss_platform_pm_ops, }, .probe = pwm_lpss_probe_platform, .remove = pwm_lpss_remove_platform, diff --git a/drivers/pwm/pwm-lpss.c b/drivers/pwm/pwm-lpss.c index 72c0bce5a75c..5208b3f80ad8 100644 --- a/drivers/pwm/pwm-lpss.c +++ b/drivers/pwm/pwm-lpss.c @@ -31,10 +31,13 @@ /* Size of each PWM register space if multiple */ #define PWM_SIZE 0x400 +#define MAX_PWMS 4 + struct pwm_lpss_chip { struct pwm_chip chip; void __iomem *regs; const struct pwm_lpss_boardinfo *info; + u32 saved_ctrl[MAX_PWMS]; }; /* BayTrail */ @@ -168,6 +171,9 @@ struct pwm_lpss_chip *pwm_lpss_probe(struct device *dev, struct resource *r, unsigned long c; int ret; + if (WARN_ON(info->npwm > MAX_PWMS)) + return ERR_PTR(-ENODEV); + lpwm = devm_kzalloc(dev, sizeof(*lpwm), GFP_KERNEL); if (!lpwm) return ERR_PTR(-ENOMEM); @@ -203,6 +209,30 @@ int pwm_lpss_remove(struct pwm_lpss_chip *lpwm) } EXPORT_SYMBOL_GPL(pwm_lpss_remove); +int pwm_lpss_suspend(struct device *dev) +{ + struct pwm_lpss_chip *lpwm = dev_get_drvdata(dev); + int i; + + for (i = 0; i < lpwm->info->npwm; i++) + lpwm->saved_ctrl[i] = readl(lpwm->regs + i * PWM_SIZE + PWM); + + return 0; +} +EXPORT_SYMBOL_GPL(pwm_lpss_suspend); + +int pwm_lpss_resume(struct device *dev) +{ + struct pwm_lpss_chip *lpwm = dev_get_drvdata(dev); + int i; + + for (i = 0; i < lpwm->info->npwm; i++) + writel(lpwm->saved_ctrl[i], lpwm->regs + i * PWM_SIZE + PWM); + + return 0; +} +EXPORT_SYMBOL_GPL(pwm_lpss_resume); + MODULE_DESCRIPTION("PWM driver for Intel LPSS"); MODULE_AUTHOR("Mika Westerberg "); MODULE_LICENSE("GPL v2"); diff --git a/drivers/pwm/pwm-lpss.h b/drivers/pwm/pwm-lpss.h index 04766e0d41aa..27d5081ec218 100644 --- a/drivers/pwm/pwm-lpss.h +++ b/drivers/pwm/pwm-lpss.h @@ -31,5 +31,7 @@ extern const struct pwm_lpss_boardinfo pwm_lpss_bxt_info; struct pwm_lpss_chip *pwm_lpss_probe(struct device *dev, struct resource *r, const struct pwm_lpss_boardinfo *info); int pwm_lpss_remove(struct pwm_lpss_chip *lpwm); +int pwm_lpss_suspend(struct device *dev); +int pwm_lpss_resume(struct device *dev); #endif /* __PWM_LPSS_H */ From 1f00b1fc77752bb27b6b696fd7fceeda921ac0fa Mon Sep 17 00:00:00 2001 From: Dongsheng Yang Date: Mon, 4 Jun 2018 06:24:37 -0400 Subject: [PATCH 090/970] rbd: flush rbd_dev->watch_dwork after watch is unregistered commit 23edca864951250af845a11da86bb3ea63522ed2 upstream. There is a problem if we are going to unmap a rbd device and the watch_dwork is going to queue delayed work for watch: unmap Thread watch Thread timer do_rbd_remove cancel_tasks_sync(rbd_dev) queue_delayed_work for watch destroy_workqueue(rbd_dev->task_wq) drain_workqueue(wq) destroy other resources in wq call_timer_fn __queue_work() Then the delayed work escape the cancel_tasks_sync() and destroy_workqueue() and we will get an user-after-free call trace: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI Modules linked in: CPU: 7 PID: 0 Comm: swapper/7 Tainted: G OE 4.17.0-rc6+ #13 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:__queue_work+0x6a/0x3b0 RSP: 0018:ffff9427df1c3e90 EFLAGS: 00010086 RAX: ffff9427deca8400 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff9427deca8400 RSI: ffff9427df1c3e50 RDI: 0000000000000000 RBP: ffff942783e39e00 R08: ffff9427deca8400 R09: ffff9427df1c3f00 R10: 0000000000000004 R11: 0000000000000005 R12: ffff9427cfb85970 R13: 0000000000002000 R14: 000000000001eca0 R15: 0000000000000007 FS: 0000000000000000(0000) GS:ffff9427df1c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000004c900a005 CR4: 00000000000206e0 Call Trace: ? __queue_work+0x3b0/0x3b0 call_timer_fn+0x2d/0x130 run_timer_softirq+0x16e/0x430 ? tick_sched_timer+0x37/0x70 __do_softirq+0xd2/0x280 irq_exit+0xd5/0xe0 smp_apic_timer_interrupt+0x6c/0x130 apic_timer_interrupt+0xf/0x20 [ Move rbd_dev->watch_dwork cancellation so that rbd_reregister_watch() either bails out early because the watch is UNREGISTERED at that point or just gets cancelled. ] Cc: stable@vger.kernel.org Fixes: 99d1694310df ("rbd: retry watch re-registration periodically") Signed-off-by: Dongsheng Yang Reviewed-by: Ilya Dryomov Signed-off-by: Ilya Dryomov Signed-off-by: Greg Kroah-Hartman --- drivers/block/rbd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index 343cad9bf7e7..ef3016a467a0 100644 --- a/drivers/block/rbd.c +++ b/drivers/block/rbd.c @@ -3900,7 +3900,6 @@ static void cancel_tasks_sync(struct rbd_device *rbd_dev) { dout("%s rbd_dev %p\n", __func__, rbd_dev); - cancel_delayed_work_sync(&rbd_dev->watch_dwork); cancel_work_sync(&rbd_dev->acquired_lock_work); cancel_work_sync(&rbd_dev->released_lock_work); cancel_delayed_work_sync(&rbd_dev->lock_dwork); @@ -3918,6 +3917,7 @@ static void rbd_unregister_watch(struct rbd_device *rbd_dev) rbd_dev->watch_state = RBD_WATCH_STATE_UNREGISTERED; mutex_unlock(&rbd_dev->watch_mutex); + cancel_delayed_work_sync(&rbd_dev->watch_dwork); ceph_osdc_flush_notifies(&rbd_dev->rbd_client->client->osdc); } From 6d28f2d64cf5ddf6c55f7a3a954dd9a34b72862b Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Thu, 14 Jun 2018 15:26:24 -0700 Subject: [PATCH 091/970] mm: fix devmem_is_allowed() for sub-page System RAM intersections commit 2bdce74412c249ac01dfe36b6b0043ffd7a5361e upstream. Hussam reports: I was poking around and for no real reason, I did cat /dev/mem and strings /dev/mem. Then I saw the following warning in dmesg. I saved it and rebooted immediately. memremap attempted on mixed range 0x000000000009c000 size: 0x1000 ------------[ cut here ]------------ WARNING: CPU: 0 PID: 11810 at kernel/memremap.c:98 memremap+0x104/0x170 [..] Call Trace: xlate_dev_mem_ptr+0x25/0x40 read_mem+0x89/0x1a0 __vfs_read+0x36/0x170 The memremap() implementation checks for attempts to remap System RAM with MEMREMAP_WB and instead redirects those mapping attempts to the linear map. However, that only works if the physical address range being remapped is page aligned. In low memory we have situations like the following: 00000000-00000fff : Reserved 00001000-0009fbff : System RAM 0009fc00-0009ffff : Reserved ...where System RAM intersects Reserved ranges on a sub-page page granularity. Given that devmem_is_allowed() special cases any attempt to map System RAM in the first 1MB of memory, replace page_is_ram() with the more precise region_intersects() to trap attempts to map disallowed ranges. Link: https://bugzilla.kernel.org/show_bug.cgi?id=199999 Link: http://lkml.kernel.org/r/152856436164.18127.2847888121707136898.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: 92281dee825f ("arch: introduce memremap()") Signed-off-by: Dan Williams Reported-by: Hussam Al-Tayeb Tested-by: Hussam Al-Tayeb Cc: Christoph Hellwig Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- arch/x86/mm/init.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index f92bdb9f4e46..ae9b84cae57c 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -653,7 +653,9 @@ void __init init_mem_mapping(void) */ int devmem_is_allowed(unsigned long pagenr) { - if (page_is_ram(pagenr)) { + if (region_intersects(PFN_PHYS(pagenr), PAGE_SIZE, + IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE) + != REGION_DISJOINT) { /* * For disallowed memory regions in the low 1MB range, * request that the page be shown as all zeros. From 3cac26f2a2c66f755e033ca944d02433be684556 Mon Sep 17 00:00:00 2001 From: Boris Ostrovsky Date: Thu, 21 Jun 2018 13:29:44 -0400 Subject: [PATCH 092/970] xen: Remove unnecessary BUG_ON from __unbind_from_irq() commit eef04c7b3786ff0c9cb1019278b6c6c2ea0ad4ff upstream. Commit 910f8befdf5b ("xen/pirq: fix error path cleanup when binding MSIs") fixed a couple of errors in error cleanup path of xen_bind_pirq_msi_to_irq(). This cleanup allowed a call to __unbind_from_irq() with an unbound irq, which would result in triggering the BUG_ON there. Since there is really no reason for the BUG_ON (xen_free_irq() can operate on unbound irqs) we can remove it. Reported-by: Ben Hutchings Signed-off-by: Boris Ostrovsky Cc: stable@vger.kernel.org Reviewed-by: Juergen Gross Signed-off-by: Juergen Gross Signed-off-by: Greg Kroah-Hartman --- drivers/xen/events/events_base.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c index 6d3b32ccc2c4..1435d8c58ea0 100644 --- a/drivers/xen/events/events_base.c +++ b/drivers/xen/events/events_base.c @@ -637,8 +637,6 @@ static void __unbind_from_irq(unsigned int irq) xen_irq_info_cleanup(info); } - BUG_ON(info_for_irq(irq)->type == IRQT_UNBOUND); - xen_free_irq(irq); } From 2a1b1234d0502237872f6a11016061328528b86d Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Wed, 13 Jun 2018 12:09:22 +0200 Subject: [PATCH 093/970] udf: Detect incorrect directory size commit fa65653e575fbd958bdf5fb9c4a71a324e39510d upstream. Detect when a directory entry is (possibly partially) beyond directory size and return EIO in that case since it means the filesystem is corrupted. Otherwise directory operations can further corrupt the directory and possibly also oops the kernel. CC: Anatoly Trosinenko CC: stable@vger.kernel.org Reported-and-tested-by: Anatoly Trosinenko Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman --- fs/udf/directory.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/udf/directory.c b/fs/udf/directory.c index 988d5352bdb8..48ef184929ec 100644 --- a/fs/udf/directory.c +++ b/fs/udf/directory.c @@ -150,6 +150,9 @@ struct fileIdentDesc *udf_fileident_read(struct inode *dir, loff_t *nf_pos, sizeof(struct fileIdentDesc)); } } + /* Got last entry outside of dir size - fs is corrupted! */ + if (*nf_pos > dir->i_size) + return NULL; return fi; } From 54ae564b35423f25b763bc9e01b489d9f11c034e Mon Sep 17 00:00:00 2001 From: Ben Hutchings Date: Tue, 19 Jun 2018 11:17:32 -0700 Subject: [PATCH 094/970] Input: elan_i2c_smbus - fix more potential stack buffer overflows commit 50fc7b61959af4b95fafce7fe5dd565199e0b61a upstream. Commit 40f7090bb1b4 ("Input: elan_i2c_smbus - fix corrupted stack") fixed most of the functions using i2c_smbus_read_block_data() to allocate a buffer with the maximum block size. However three functions were left unchanged: * In elan_smbus_initialize(), increase the buffer size in the same way. * In elan_smbus_calibrate_result(), the buffer is provided by the caller (calibrate_store()), so introduce a bounce buffer. Also name the result buffer size. * In elan_smbus_get_report(), the buffer is provided by the caller but happens to be the right length. Add a compile-time assertion to ensure this remains the case. Cc: # 3.19+ Signed-off-by: Ben Hutchings Reviewed-by: Benjamin Tissoires Signed-off-by: Dmitry Torokhov Signed-off-by: Greg Kroah-Hartman --- drivers/input/mouse/elan_i2c.h | 2 ++ drivers/input/mouse/elan_i2c_core.c | 2 +- drivers/input/mouse/elan_i2c_smbus.c | 10 ++++++++-- 3 files changed, 11 insertions(+), 3 deletions(-) diff --git a/drivers/input/mouse/elan_i2c.h b/drivers/input/mouse/elan_i2c.h index c0ec26118732..83dd0ce3ad2a 100644 --- a/drivers/input/mouse/elan_i2c.h +++ b/drivers/input/mouse/elan_i2c.h @@ -27,6 +27,8 @@ #define ETP_DISABLE_POWER 0x0001 #define ETP_PRESSURE_OFFSET 25 +#define ETP_CALIBRATE_MAX_LEN 3 + /* IAP Firmware handling */ #define ETP_PRODUCT_ID_FORMAT_STRING "%d.0" #define ETP_FW_NAME "elan_i2c_" ETP_PRODUCT_ID_FORMAT_STRING ".bin" diff --git a/drivers/input/mouse/elan_i2c_core.c b/drivers/input/mouse/elan_i2c_core.c index 2d6481f3e89b..97f6e05cffce 100644 --- a/drivers/input/mouse/elan_i2c_core.c +++ b/drivers/input/mouse/elan_i2c_core.c @@ -595,7 +595,7 @@ static ssize_t calibrate_store(struct device *dev, int tries = 20; int retval; int error; - u8 val[3]; + u8 val[ETP_CALIBRATE_MAX_LEN]; retval = mutex_lock_interruptible(&data->sysfs_mutex); if (retval) diff --git a/drivers/input/mouse/elan_i2c_smbus.c b/drivers/input/mouse/elan_i2c_smbus.c index 05b8695a6369..d21bd55f3c42 100644 --- a/drivers/input/mouse/elan_i2c_smbus.c +++ b/drivers/input/mouse/elan_i2c_smbus.c @@ -56,7 +56,7 @@ static int elan_smbus_initialize(struct i2c_client *client) { u8 check[ETP_SMBUS_HELLOPACKET_LEN] = { 0x55, 0x55, 0x55, 0x55, 0x55 }; - u8 values[ETP_SMBUS_HELLOPACKET_LEN] = { 0, 0, 0, 0, 0 }; + u8 values[I2C_SMBUS_BLOCK_MAX] = {0}; int len, error; /* Get hello packet */ @@ -117,12 +117,16 @@ static int elan_smbus_calibrate(struct i2c_client *client) static int elan_smbus_calibrate_result(struct i2c_client *client, u8 *val) { int error; + u8 buf[I2C_SMBUS_BLOCK_MAX] = {0}; + + BUILD_BUG_ON(ETP_CALIBRATE_MAX_LEN > sizeof(buf)); error = i2c_smbus_read_block_data(client, - ETP_SMBUS_CALIBRATE_QUERY, val); + ETP_SMBUS_CALIBRATE_QUERY, buf); if (error < 0) return error; + memcpy(val, buf, ETP_CALIBRATE_MAX_LEN); return 0; } @@ -470,6 +474,8 @@ static int elan_smbus_get_report(struct i2c_client *client, u8 *report) { int len; + BUILD_BUG_ON(I2C_SMBUS_BLOCK_MAX > ETP_SMBUS_REPORT_LEN); + len = i2c_smbus_read_block_data(client, ETP_SMBUS_PACKET_QUERY, &report[ETP_SMBUS_REPORT_OFFSET]); From 465e965f64358fdeeb35287bd7a6dc06b261fb7b Mon Sep 17 00:00:00 2001 From: Aaron Ma Date: Thu, 21 Jun 2018 17:14:01 -0700 Subject: [PATCH 095/970] Input: elantech - enable middle button of touchpads on ThinkPad P52 commit 24bb555e6e46d96e2a954aa0295029a81cc9bbaa upstream. PNPID is better way to identify the type of touchpads. Enable middle button support on 2 types of touchpads on Lenovo P52. Cc: stable@vger.kernel.org Signed-off-by: Aaron Ma Reviewed-by: Benjamin Tissoires Signed-off-by: Dmitry Torokhov Signed-off-by: Greg Kroah-Hartman --- drivers/input/mouse/elantech.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/input/mouse/elantech.c b/drivers/input/mouse/elantech.c index c519c0b09568..a6333f942e12 100644 --- a/drivers/input/mouse/elantech.c +++ b/drivers/input/mouse/elantech.c @@ -1173,6 +1173,12 @@ static const struct dmi_system_id elantech_dmi_has_middle_button[] = { { } }; +static const char * const middle_button_pnp_ids[] = { + "LEN2131", /* ThinkPad P52 w/ NFC */ + "LEN2132", /* ThinkPad P52 */ + NULL +}; + /* * Set the appropriate event bits for the input subsystem */ @@ -1192,7 +1198,8 @@ static int elantech_set_input_params(struct psmouse *psmouse) __clear_bit(EV_REL, dev->evbit); __set_bit(BTN_LEFT, dev->keybit); - if (dmi_check_system(elantech_dmi_has_middle_button)) + if (dmi_check_system(elantech_dmi_has_middle_button) || + psmouse_matches_pnp_id(psmouse, middle_button_pnp_ids)) __set_bit(BTN_MIDDLE, dev->keybit); __set_bit(BTN_RIGHT, dev->keybit); From 58d8103113ea911ed23cfd31c866f14c43773fb7 Mon Sep 17 00:00:00 2001 From: ??? Date: Thu, 21 Jun 2018 17:15:32 -0700 Subject: [PATCH 096/970] Input: elantech - fix V4 report decoding for module with middle key commit e0ae2519ca004a628fa55aeef969c37edce522d3 upstream. Some touchpad has middle key and it will be indicated in bit 2 of packet[0]. We need to fix V4 formation's byte mask to prevent error decoding. Signed-off-by: KT Liao Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov Signed-off-by: Greg Kroah-Hartman --- drivers/input/mouse/elantech.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/input/mouse/elantech.c b/drivers/input/mouse/elantech.c index a6333f942e12..4e77adbfa835 100644 --- a/drivers/input/mouse/elantech.c +++ b/drivers/input/mouse/elantech.c @@ -800,7 +800,7 @@ static int elantech_packet_check_v4(struct psmouse *psmouse) else if (ic_version == 7 && etd->samples[1] == 0x2A) sanity_check = ((packet[3] & 0x1c) == 0x10); else - sanity_check = ((packet[0] & 0x0c) == 0x04 && + sanity_check = ((packet[0] & 0x08) == 0x00 && (packet[3] & 0x1c) == 0x10); if (!sanity_check) From 6008de291a2beab8028d04df9b0ea930e4fe4ce0 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Wed, 13 Jun 2018 12:43:10 +0200 Subject: [PATCH 097/970] ALSA: hda/realtek - Fix pop noise on Lenovo P50 & co commit d5a6cabf02210b896a60eee7c04c670ee9ba6dca upstream. Some Lenovo laptops, e.g. Lenovo P50, showed the pop noise at resume or runtime resume. It turned out to be reduced by applying alc_no_shutup() just like TPT440 quirk does. Since there are many Lenovo models showing the same behavior, put this workaround in ALC269_FIXUP_THINKPAD_ACPI entry so that it's applied commonly to all such Lenovo machines. Reported-by: Hans de Goede Tested-by: Benjamin Berg Cc: Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/pci/hda/patch_realtek.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 183436e4a8c1..60b989e20203 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -4473,7 +4473,6 @@ static void alc_fixup_tpt440_dock(struct hda_codec *codec, struct alc_spec *spec = codec->spec; if (action == HDA_FIXUP_ACT_PRE_PROBE) { - spec->shutup = alc_no_shutup; /* reduce click noise */ spec->reboot_notify = alc_d3_at_reboot; /* reduce noise */ spec->parse_flags = HDA_PINCFG_NO_HP_FIXUP; codec->power_save_node = 0; /* avoid click noises */ @@ -4835,6 +4834,13 @@ static void alc280_fixup_hp_9480m(struct hda_codec *codec, /* for hda_fixup_thinkpad_acpi() */ #include "thinkpad_helper.c" +static void alc_fixup_thinkpad_acpi(struct hda_codec *codec, + const struct hda_fixup *fix, int action) +{ + alc_fixup_no_shutup(codec, fix, action); /* reduce click noise */ + hda_fixup_thinkpad_acpi(codec, fix, action); +} + /* for dell wmi mic mute led */ #include "dell_wmi_helper.c" @@ -5350,7 +5356,7 @@ static const struct hda_fixup alc269_fixups[] = { }, [ALC269_FIXUP_THINKPAD_ACPI] = { .type = HDA_FIXUP_FUNC, - .v.func = hda_fixup_thinkpad_acpi, + .v.func = alc_fixup_thinkpad_acpi, .chained = true, .chain_id = ALC269_FIXUP_SKU_IGNORE, }, From afd82d0757b37a346fbc557c5974c0f0de1bb0d2 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Fri, 22 Jun 2018 12:17:45 +0200 Subject: [PATCH 098/970] ALSA: hda/realtek - Add a quirk for FSC ESPRIMO U9210 commit 275ec0cb946cb75ac8977f662e608fce92f8b8a8 upstream. Fujitsu Seimens ESPRIMO Mobile U9210 requires the same fixup as H270 for the correct pin configs. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200107 Cc: Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 60b989e20203..f03a1430a3cb 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -2448,6 +2448,7 @@ static const struct snd_pci_quirk alc262_fixup_tbl[] = { SND_PCI_QUIRK(0x10cf, 0x1397, "Fujitsu Lifebook S7110", ALC262_FIXUP_FSC_S7110), SND_PCI_QUIRK(0x10cf, 0x142d, "Fujitsu Lifebook E8410", ALC262_FIXUP_BENQ), SND_PCI_QUIRK(0x10f1, 0x2915, "Tyan Thunder n6650W", ALC262_FIXUP_TYAN), + SND_PCI_QUIRK(0x1734, 0x1141, "FSC ESPRIMO U9210", ALC262_FIXUP_FSC_H270), SND_PCI_QUIRK(0x1734, 0x1147, "FSC Celsius H270", ALC262_FIXUP_FSC_H270), SND_PCI_QUIRK(0x17aa, 0x384e, "Lenovo 3000", ALC262_FIXUP_LENOVO_3000), SND_PCI_QUIRK(0x17ff, 0x0560, "Benq ED8", ALC262_FIXUP_BENQ), From 17057c59bd11911cff0f62715efcb556bab05ade Mon Sep 17 00:00:00 2001 From: Keith Busch Date: Tue, 26 Jun 2018 09:14:58 -0600 Subject: [PATCH 099/970] block: Fix transfer when chunk sectors exceeds max commit 15bfd21fbc5d35834b9ea383dc458a1f0c9e3434 upstream. A device may have boundary restrictions where the number of sectors between boundaries exceeds its max transfer size. In this case, we need to cap the max size to the smaller of the two limits. Reported-by: Jitendra Bhivare Tested-by: Jitendra Bhivare Cc: Reviewed-by: Martin K. Petersen Signed-off-by: Keith Busch Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman --- include/linux/blkdev.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index f6a816129856..bd738aafd432 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -901,8 +901,8 @@ static inline unsigned int blk_max_size_offset(struct request_queue *q, if (!q->limits.chunk_sectors) return q->limits.max_sectors; - return q->limits.chunk_sectors - - (offset & (q->limits.chunk_sectors - 1)); + return min(q->limits.max_sectors, (unsigned int)(q->limits.chunk_sectors - + (offset & (q->limits.chunk_sectors - 1)))); } static inline unsigned int blk_rq_get_max_sectors(struct request *rq, From f2bc5d18d26350423446fcc073ad8750c20ec498 Mon Sep 17 00:00:00 2001 From: Mike Snitzer Date: Tue, 26 Jun 2018 12:04:23 -0400 Subject: [PATCH 100/970] dm thin: handle running out of data space vs concurrent discard commit a685557fbbc3122ed11e8ad3fa63a11ebc5de8c3 upstream. Discards issued to a DM thin device can complete to userspace (via fstrim) _before_ the metadata changes associated with the discards is reflected in the thinp superblock (e.g. free blocks). As such, if a user constructs a test that loops repeatedly over these steps, block allocation can fail due to discards not having completed yet: 1) fill thin device via filesystem file 2) remove file 3) fstrim From initial report, here: https://www.redhat.com/archives/dm-devel/2018-April/msg00022.html "The root cause of this issue is that dm-thin will first remove mapping and increase corresponding blocks' reference count to prevent them from being reused before DISCARD bios get processed by the underlying layers. However. increasing blocks' reference count could also increase the nr_allocated_this_transaction in struct sm_disk which makes smd->old_ll.nr_allocated + smd->nr_allocated_this_transaction bigger than smd->old_ll.nr_blocks. In this case, alloc_data_block() will never commit metadata to reset the begin pointer of struct sm_disk, because sm_disk_get_nr_free() always return an underflow value." While there is room for improvement to the space-map accounting that thinp is making use of: the reality is this test is inherently racey and will result in the previous iteration's fstrim's discard(s) completing vs concurrent block allocation, via dd, in the next iteration of the loop. No amount of space map accounting improvements will be able to allow user's to use a block before a discard of that block has completed. So the best we can really do is allow DM thinp to gracefully handle such aggressive use of all the pool's data by degrading the pool into out-of-data-space (OODS) mode. We _should_ get that behaviour already (if space map accounting didn't falsely cause alloc_data_block() to believe free space was available).. but short of that we handle the current reality that dm_pool_alloc_data_block() can return -ENOSPC. Reported-by: Dennis Yang Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman --- drivers/md/dm-thin.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c index 0b678b5da4c4..0f0374a4ac6e 100644 --- a/drivers/md/dm-thin.c +++ b/drivers/md/dm-thin.c @@ -1384,6 +1384,8 @@ static void schedule_external_copy(struct thin_c *tc, dm_block_t virt_block, static void set_pool_mode(struct pool *pool, enum pool_mode new_mode); +static void requeue_bios(struct pool *pool); + static void check_for_space(struct pool *pool) { int r; @@ -1396,8 +1398,10 @@ static void check_for_space(struct pool *pool) if (r) return; - if (nr_free) + if (nr_free) { set_pool_mode(pool, PM_WRITE); + requeue_bios(pool); + } } /* @@ -1474,7 +1478,10 @@ static int alloc_data_block(struct thin_c *tc, dm_block_t *result) r = dm_pool_alloc_data_block(pool->pmd, result); if (r) { - metadata_operation_failed(pool, "dm_pool_alloc_data_block", r); + if (r == -ENOSPC) + set_pool_mode(pool, PM_OUT_OF_DATA_SPACE); + else + metadata_operation_failed(pool, "dm_pool_alloc_data_block", r); return r; } From 35fd10aeb2248cc7f8d3d48ccc2eff1cf19918f4 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= Date: Fri, 8 Jun 2018 09:15:24 +0200 Subject: [PATCH 101/970] cdc_ncm: avoid padding beyond end of skb MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 49c2c3f246e2fc3009039e31a826333dcd0283cd upstream. Commit 4a0e3e989d66 ("cdc_ncm: Add support for moving NDP to end of NCM frame") added logic to reserve space for the NDP at the end of the NTB/skb. This reservation did not take the final alignment of the NDP into account, causing us to reserve too little space. Additionally the padding prior to NDP addition did not ensure there was enough space for the NDP. The NTB/skb with the NDP appended would then exceed the configured max size. This caused the final padding of the NTB to use a negative count, padding to almost INT_MAX, and resulting in: [60103.825970] BUG: unable to handle kernel paging request at ffff9641f2004000 [60103.825998] IP: __memset+0x24/0x30 [60103.826001] PGD a6a06067 P4D a6a06067 PUD 4f65a063 PMD 72003063 PTE 0 [60103.826013] Oops: 0002 [#1] SMP NOPTI [60103.826018] Modules linked in: (removed( [60103.826158] CPU: 0 PID: 5990 Comm: Chrome_DevTools Tainted: G O 4.14.0-3-amd64 #1 Debian 4.14.17-1 [60103.826162] Hardware name: LENOVO 20081 BIOS 41CN28WW(V2.04) 05/03/2012 [60103.826166] task: ffff964193484fc0 task.stack: ffffb2890137c000 [60103.826171] RIP: 0010:__memset+0x24/0x30 [60103.826174] RSP: 0000:ffff964316c03b68 EFLAGS: 00010216 [60103.826178] RAX: 0000000000000000 RBX: 00000000fffffffd RCX: 000000001ffa5000 [60103.826181] RDX: 0000000000000005 RSI: 0000000000000000 RDI: ffff9641f2003ffc [60103.826184] RBP: ffff964192f6c800 R08: 00000000304d434e R09: ffff9641f1d2c004 [60103.826187] R10: 0000000000000002 R11: 00000000000005ae R12: ffff9642e6957a80 [60103.826190] R13: ffff964282ff2ee8 R14: 000000000000000d R15: ffff9642e4843900 [60103.826194] FS: 00007f395aaf6700(0000) GS:ffff964316c00000(0000) knlGS:0000000000000000 [60103.826197] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [60103.826200] CR2: ffff9641f2004000 CR3: 0000000013b0c000 CR4: 00000000000006f0 [60103.826204] Call Trace: [60103.826212] [60103.826225] cdc_ncm_fill_tx_frame+0x5e3/0x740 [cdc_ncm] [60103.826236] cdc_ncm_tx_fixup+0x57/0x70 [cdc_ncm] [60103.826246] usbnet_start_xmit+0x5d/0x710 [usbnet] [60103.826254] ? netif_skb_features+0x119/0x250 [60103.826259] dev_hard_start_xmit+0xa1/0x200 [60103.826267] sch_direct_xmit+0xf2/0x1b0 [60103.826273] __dev_queue_xmit+0x5e3/0x7c0 [60103.826280] ? ip_finish_output2+0x263/0x3c0 [60103.826284] ip_finish_output2+0x263/0x3c0 [60103.826289] ? ip_output+0x6c/0xe0 [60103.826293] ip_output+0x6c/0xe0 [60103.826298] ? ip_forward_options+0x1a0/0x1a0 [60103.826303] tcp_transmit_skb+0x516/0x9b0 [60103.826309] tcp_write_xmit+0x1aa/0xee0 [60103.826313] ? sch_direct_xmit+0x71/0x1b0 [60103.826318] tcp_tasklet_func+0x177/0x180 [60103.826325] tasklet_action+0x5f/0x110 [60103.826332] __do_softirq+0xde/0x2b3 [60103.826337] irq_exit+0xae/0xb0 [60103.826342] do_IRQ+0x81/0xd0 [60103.826347] common_interrupt+0x98/0x98 [60103.826351] [60103.826355] RIP: 0033:0x7f397bdf2282 [60103.826358] RSP: 002b:00007f395aaf57d8 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff6e [60103.826362] RAX: 0000000000000000 RBX: 00002f07bc6d0900 RCX: 00007f39752d7fe7 [60103.826365] RDX: 0000000000000022 RSI: 0000000000000147 RDI: 00002f07baea02c0 [60103.826368] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 [60103.826371] R10: 00000000ffffffff R11: 0000000000000000 R12: 00002f07baea02c0 [60103.826373] R13: 00002f07bba227a0 R14: 00002f07bc6d090c R15: 0000000000000000 [60103.826377] Code: 90 90 90 90 90 90 90 0f 1f 44 00 00 49 89 f9 48 89 d1 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 [60103.826442] RIP: __memset+0x24/0x30 RSP: ffff964316c03b68 [60103.826444] CR2: ffff9641f2004000 Commit e1069bbfcf3b ("net: cdc_ncm: Reduce memory use when kernel memory low") made this bug much more likely to trigger by reducing the NTB size under memory pressure. Link: https://bugs.debian.org/893393 Reported-by: Горбешко Богдан Reported-and-tested-by: Dennis Wassenberg Cc: Enrico Mioso Fixes: 4a0e3e989d66 ("cdc_ncm: Add support for moving NDP to end of NCM frame") [ bmork: tx_curr_size => tx_max and context fixup for v4.12 and older ] Signed-off-by: Bjørn Mork Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/usb/cdc_ncm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c index feb61eaffe32..3086cae62fdc 100644 --- a/drivers/net/usb/cdc_ncm.c +++ b/drivers/net/usb/cdc_ncm.c @@ -1124,7 +1124,7 @@ cdc_ncm_fill_tx_frame(struct usbnet *dev, struct sk_buff *skb, __le32 sign) * accordingly. Otherwise, we should check here. */ if (ctx->drvflags & CDC_NCM_FLAG_NDP_TO_END) - delayed_ndp_size = ctx->max_ndp_size; + delayed_ndp_size = ALIGN(ctx->max_ndp_size, ctx->tx_ndp_modulus); else delayed_ndp_size = 0; @@ -1257,7 +1257,7 @@ cdc_ncm_fill_tx_frame(struct usbnet *dev, struct sk_buff *skb, __le32 sign) /* If requested, put NDP at end of frame. */ if (ctx->drvflags & CDC_NCM_FLAG_NDP_TO_END) { nth16 = (struct usb_cdc_ncm_nth16 *)skb_out->data; - cdc_ncm_align_tail(skb_out, ctx->tx_ndp_modulus, 0, ctx->tx_max); + cdc_ncm_align_tail(skb_out, ctx->tx_ndp_modulus, 0, ctx->tx_max - ctx->max_ndp_size); nth16->wNdpIndex = cpu_to_le16(skb_out->len); memcpy(skb_put(skb_out, ctx->max_ndp_size), ctx->delayed_ndp16, ctx->max_ndp_size); From e692f66fab3019ca8f45463df165177505f38caa Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Tue, 3 Jul 2018 11:23:18 +0200 Subject: [PATCH 102/970] Linux 4.9.111 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 2fcfe1147eaa..b10646531fcd 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 4 PATCHLEVEL = 9 -SUBLEVEL = 110 +SUBLEVEL = 111 EXTRAVERSION = NAME = Roaring Lionus From e8aa3b401dd803e6ea444803e2e5a29efb6c06cc Mon Sep 17 00:00:00 2001 From: Houston Yaroschoff Date: Mon, 11 Jun 2018 12:39:09 +0200 Subject: [PATCH 103/970] usb: cdc_acm: Add quirk for Uniden UBC125 scanner commit 4a762569a2722b8a48066c7bacf0e1dc67d17fa1 upstream. Uniden UBC125 radio scanner has USB interface which fails to work with cdc_acm driver: usb 1-1.5: new full-speed USB device number 4 using xhci_hcd cdc_acm 1-1.5:1.0: Zero length descriptor references cdc_acm: probe of 1-1.5:1.0 failed with error -22 Adding the NO_UNION_NORMAL quirk for the device fixes the issue: usb 1-4: new full-speed USB device number 15 using xhci_hcd usb 1-4: New USB device found, idVendor=1965, idProduct=0018 usb 1-4: New USB device strings: Mfr=1, Product=2, SerialNumber=3 usb 1-4: Product: UBC125XLT usb 1-4: Manufacturer: Uniden Corp. usb 1-4: SerialNumber: 0001 cdc_acm 1-4:1.0: ttyACM0: USB ACM device `lsusb -v` of the device: Bus 001 Device 015: ID 1965:0018 Uniden Corporation Device Descriptor: bLength 18 bDescriptorType 1 bcdUSB 2.00 bDeviceClass 2 Communications bDeviceSubClass 0 bDeviceProtocol 0 bMaxPacketSize0 64 idVendor 0x1965 Uniden Corporation idProduct 0x0018 bcdDevice 0.01 iManufacturer 1 Uniden Corp. iProduct 2 UBC125XLT iSerial 3 0001 bNumConfigurations 1 Configuration Descriptor: bLength 9 bDescriptorType 2 wTotalLength 48 bNumInterfaces 2 bConfigurationValue 1 iConfiguration 0 bmAttributes 0x80 (Bus Powered) MaxPower 500mA Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 0 bAlternateSetting 0 bNumEndpoints 1 bInterfaceClass 2 Communications bInterfaceSubClass 2 Abstract (modem) bInterfaceProtocol 0 None iInterface 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x87 EP 7 IN bmAttributes 3 Transfer Type Interrupt Synch Type None Usage Type Data wMaxPacketSize 0x0008 1x 8 bytes bInterval 10 Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 1 bAlternateSetting 0 bNumEndpoints 2 bInterfaceClass 10 CDC Data bInterfaceSubClass 0 Unused bInterfaceProtocol 0 iInterface 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x81 EP 1 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0040 1x 64 bytes bInterval 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x02 EP 2 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0040 1x 64 bytes bInterval 0 Device Status: 0x0000 (Bus Powered) Signed-off-by: Houston Yaroschoff Cc: stable Acked-by: Oliver Neukum Signed-off-by: Greg Kroah-Hartman --- drivers/usb/class/cdc-acm.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c index fe22ac7c760a..08bef18372ea 100644 --- a/drivers/usb/class/cdc-acm.c +++ b/drivers/usb/class/cdc-acm.c @@ -1712,6 +1712,9 @@ static const struct usb_device_id acm_ids[] = { { USB_DEVICE(0x11ca, 0x0201), /* VeriFone Mx870 Gadget Serial */ .driver_info = SINGLE_RX_URB, }, + { USB_DEVICE(0x1965, 0x0018), /* Uniden UBC125XLT */ + .driver_info = NO_UNION_NORMAL, /* has no union descriptor */ + }, { USB_DEVICE(0x22b8, 0x7000), /* Motorola Q Phone */ .driver_info = NO_UNION_NORMAL, /* has no union descriptor */ }, From b9a0ce3b8422a1043192bf6b9ba6106a06fc0d43 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Mon, 18 Jun 2018 10:24:03 +0200 Subject: [PATCH 104/970] USB: serial: cp210x: add CESINEL device ids commit 24160628a34af962ac99f2f58e547ac3c4cbd26f upstream. Add device ids for CESINEL products. Reported-by: Carlos Barcala Lara Cc: stable Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/usb/serial/cp210x.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/drivers/usb/serial/cp210x.c b/drivers/usb/serial/cp210x.c index 46b4dea7a0ec..b6e4d3414045 100644 --- a/drivers/usb/serial/cp210x.c +++ b/drivers/usb/serial/cp210x.c @@ -92,6 +92,9 @@ static const struct usb_device_id id_table[] = { { USB_DEVICE(0x10C4, 0x8156) }, /* B&G H3000 link cable */ { USB_DEVICE(0x10C4, 0x815E) }, /* Helicomm IP-Link 1220-DVM */ { USB_DEVICE(0x10C4, 0x815F) }, /* Timewave HamLinkUSB */ + { USB_DEVICE(0x10C4, 0x817C) }, /* CESINEL MEDCAL N Power Quality Monitor */ + { USB_DEVICE(0x10C4, 0x817D) }, /* CESINEL MEDCAL NT Power Quality Monitor */ + { USB_DEVICE(0x10C4, 0x817E) }, /* CESINEL MEDCAL S Power Quality Monitor */ { USB_DEVICE(0x10C4, 0x818B) }, /* AVIT Research USB to TTL */ { USB_DEVICE(0x10C4, 0x819F) }, /* MJS USB Toslink Switcher */ { USB_DEVICE(0x10C4, 0x81A6) }, /* ThinkOptics WavIt */ @@ -109,6 +112,9 @@ static const struct usb_device_id id_table[] = { { USB_DEVICE(0x10C4, 0x826B) }, /* Cygnal Integrated Products, Inc., Fasttrax GPS demonstration module */ { USB_DEVICE(0x10C4, 0x8281) }, /* Nanotec Plug & Drive */ { USB_DEVICE(0x10C4, 0x8293) }, /* Telegesis ETRX2USB */ + { USB_DEVICE(0x10C4, 0x82EF) }, /* CESINEL FALCO 6105 AC Power Supply */ + { USB_DEVICE(0x10C4, 0x82F1) }, /* CESINEL MEDCAL EFD Earth Fault Detector */ + { USB_DEVICE(0x10C4, 0x82F2) }, /* CESINEL MEDCAL ST Network Analyzer */ { USB_DEVICE(0x10C4, 0x82F4) }, /* Starizona MicroTouch */ { USB_DEVICE(0x10C4, 0x82F9) }, /* Procyon AVS */ { USB_DEVICE(0x10C4, 0x8341) }, /* Siemens MC35PU GPRS Modem */ @@ -121,7 +127,9 @@ static const struct usb_device_id id_table[] = { { USB_DEVICE(0x10C4, 0x8470) }, /* Juniper Networks BX Series System Console */ { USB_DEVICE(0x10C4, 0x8477) }, /* Balluff RFID */ { USB_DEVICE(0x10C4, 0x84B6) }, /* Starizona Hyperion */ + { USB_DEVICE(0x10C4, 0x851E) }, /* CESINEL MEDCAL PT Network Analyzer */ { USB_DEVICE(0x10C4, 0x85A7) }, /* LifeScan OneTouch Verio IQ */ + { USB_DEVICE(0x10C4, 0x85B8) }, /* CESINEL ReCon T Energy Logger */ { USB_DEVICE(0x10C4, 0x85EA) }, /* AC-Services IBUS-IF */ { USB_DEVICE(0x10C4, 0x85EB) }, /* AC-Services CIS-IBUS */ { USB_DEVICE(0x10C4, 0x85F8) }, /* Virtenio Preon32 */ @@ -131,10 +139,13 @@ static const struct usb_device_id id_table[] = { { USB_DEVICE(0x10C4, 0x8857) }, /* CEL EM357 ZigBee USB Stick */ { USB_DEVICE(0x10C4, 0x88A4) }, /* MMB Networks ZigBee USB Device */ { USB_DEVICE(0x10C4, 0x88A5) }, /* Planet Innovation Ingeni ZigBee USB Device */ + { USB_DEVICE(0x10C4, 0x88FB) }, /* CESINEL MEDCAL STII Network Analyzer */ + { USB_DEVICE(0x10C4, 0x8938) }, /* CESINEL MEDCAL S II Network Analyzer */ { USB_DEVICE(0x10C4, 0x8946) }, /* Ketra N1 Wireless Interface */ { USB_DEVICE(0x10C4, 0x8962) }, /* Brim Brothers charging dock */ { USB_DEVICE(0x10C4, 0x8977) }, /* CEL MeshWorks DevKit Device */ { USB_DEVICE(0x10C4, 0x8998) }, /* KCF Technologies PRN */ + { USB_DEVICE(0x10C4, 0x89A4) }, /* CESINEL FTBC Flexible Thyristor Bridge Controller */ { USB_DEVICE(0x10C4, 0x8A2A) }, /* HubZ dual ZigBee and Z-Wave dongle */ { USB_DEVICE(0x10C4, 0x8A5E) }, /* CEL EM3588 ZigBee USB Stick Long Range */ { USB_DEVICE(0x10C4, 0x8B34) }, /* Qivicon ZigBee USB Radio Stick */ From 1b9f7d27057bb310e97aa5602da53cc6fff86b99 Mon Sep 17 00:00:00 2001 From: Karoly Pados Date: Sat, 9 Jun 2018 13:26:08 +0200 Subject: [PATCH 105/970] USB: serial: cp210x: add Silicon Labs IDs for Windows Update commit 2f839823382748664b643daa73f41ee0cc01ced6 upstream. Silicon Labs defines alternative VID/PID pairs for some chips that when used will automatically install drivers for Windows users without manual intervention. Unfortunately, these IDs are not recognized by the Linux module, so using these IDs improves user experience on one platform but degrades it on Linux. This patch addresses this problem. Signed-off-by: Karoly Pados Cc: stable Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/usb/serial/cp210x.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/usb/serial/cp210x.c b/drivers/usb/serial/cp210x.c index b6e4d3414045..6f2c77a7c08e 100644 --- a/drivers/usb/serial/cp210x.c +++ b/drivers/usb/serial/cp210x.c @@ -151,8 +151,11 @@ static const struct usb_device_id id_table[] = { { USB_DEVICE(0x10C4, 0x8B34) }, /* Qivicon ZigBee USB Radio Stick */ { USB_DEVICE(0x10C4, 0xEA60) }, /* Silicon Labs factory default */ { USB_DEVICE(0x10C4, 0xEA61) }, /* Silicon Labs factory default */ + { USB_DEVICE(0x10C4, 0xEA63) }, /* Silicon Labs Windows Update (CP2101-4/CP2102N) */ { USB_DEVICE(0x10C4, 0xEA70) }, /* Silicon Labs factory default */ { USB_DEVICE(0x10C4, 0xEA71) }, /* Infinity GPS-MIC-1 Radio Monophone */ + { USB_DEVICE(0x10C4, 0xEA7A) }, /* Silicon Labs Windows Update (CP2105) */ + { USB_DEVICE(0x10C4, 0xEA7B) }, /* Silicon Labs Windows Update (CP2108) */ { USB_DEVICE(0x10C4, 0xF001) }, /* Elan Digital Systems USBscope50 */ { USB_DEVICE(0x10C4, 0xF002) }, /* Elan Digital Systems USBwave12 */ { USB_DEVICE(0x10C4, 0xF003) }, /* Elan Digital Systems USBpulse100 */ From 42525f7a25067b78dfa8ed4f899ebcec6636b476 Mon Sep 17 00:00:00 2001 From: William Wu Date: Mon, 21 May 2018 18:12:00 +0800 Subject: [PATCH 106/970] usb: dwc2: fix the incorrect bitmaps for the ports of multi_tt hub commit 8760675932ddb614e83702117d36ea644050c609 upstream. The dwc2_get_ls_map() use ttport to reference into the bitmap if we're on a multi_tt hub. But the bitmaps index from 0 to (hub->maxchild - 1), while the ttport index from 1 to hub->maxchild. This will cause invalid memory access when the number of ttport is hub->maxchild. Without this patch, I can easily meet a Kernel panic issue if connect a low-speed USB mouse with the max port of FE2.1 multi-tt hub (1a40:0201) on rk3288 platform. Fixes: 9f9f09b048f5 ("usb: dwc2: host: Totally redo the microframe scheduler") Cc: Reviewed-by: Douglas Anderson Acked-by: Minas Harutyunyan hminas@synopsys.com> Signed-off-by: William Wu Signed-off-by: Felipe Balbi Signed-off-by: Greg Kroah-Hartman --- drivers/usb/dwc2/hcd_queue.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/usb/dwc2/hcd_queue.c b/drivers/usb/dwc2/hcd_queue.c index 13754353251f..9669184fb1fe 100644 --- a/drivers/usb/dwc2/hcd_queue.c +++ b/drivers/usb/dwc2/hcd_queue.c @@ -479,7 +479,7 @@ static unsigned long *dwc2_get_ls_map(struct dwc2_hsotg *hsotg, /* Get the map and adjust if this is a multi_tt hub */ map = qh->dwc_tt->periodic_bitmaps; if (qh->dwc_tt->usb_tt->multi) - map += DWC2_ELEMENTS_PER_LS_BITMAP * qh->ttport; + map += DWC2_ELEMENTS_PER_LS_BITMAP * (qh->ttport - 1); return map; } From 947dead99ef1f1ee0aa41b3ccde99bbd9900cd9d Mon Sep 17 00:00:00 2001 From: Tetsuo Handa Date: Sat, 26 May 2018 09:53:13 +0900 Subject: [PATCH 107/970] n_tty: Fix stall at n_tty_receive_char_special(). commit 3d63b7e4ae0dc5e02d28ddd2fa1f945defc68d81 upstream. syzbot is reporting stalls at n_tty_receive_char_special() [1]. This is because comparison is not working as expected since ldata->read_head can change at any moment. Mitigate this by explicitly masking with buffer size when checking condition for "while" loops. [1] https://syzkaller.appspot.com/bug?id=3d7481a346958d9469bebbeb0537d5f056bdd6e8 Signed-off-by: Tetsuo Handa Reported-by: syzbot Fixes: bc5a5e3f45d04784 ("n_tty: Don't wrap input buffer indices at buffer size") Cc: stable Cc: Peter Hurley Signed-off-by: Greg Kroah-Hartman --- drivers/tty/n_tty.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c index 1c70541a1467..a8a7a13c8683 100644 --- a/drivers/tty/n_tty.c +++ b/drivers/tty/n_tty.c @@ -126,6 +126,8 @@ struct n_tty_data { struct mutex output_lock; }; +#define MASK(x) ((x) & (N_TTY_BUF_SIZE - 1)) + static inline size_t read_cnt(struct n_tty_data *ldata) { return ldata->read_head - ldata->read_tail; @@ -980,14 +982,15 @@ static void eraser(unsigned char c, struct tty_struct *tty) } seen_alnums = 0; - while (ldata->read_head != ldata->canon_head) { + while (MASK(ldata->read_head) != MASK(ldata->canon_head)) { head = ldata->read_head; /* erase a single possibly multibyte character */ do { head--; c = read_buf(ldata, head); - } while (is_continuation(c, tty) && head != ldata->canon_head); + } while (is_continuation(c, tty) && + MASK(head) != MASK(ldata->canon_head)); /* do not partially erase */ if (is_continuation(c, tty)) @@ -1029,7 +1032,7 @@ static void eraser(unsigned char c, struct tty_struct *tty) * This info is used to go back the correct * number of columns. */ - while (tail != ldata->canon_head) { + while (MASK(tail) != MASK(ldata->canon_head)) { tail--; c = read_buf(ldata, tail); if (c == '\t') { @@ -1304,7 +1307,7 @@ n_tty_receive_char_special(struct tty_struct *tty, unsigned char c) finish_erasing(ldata); echo_char(c, tty); echo_char_raw('\n', ldata); - while (tail != ldata->read_head) { + while (MASK(tail) != MASK(ldata->read_head)) { echo_char(read_buf(ldata, tail), tty); tail++; } @@ -2413,7 +2416,7 @@ static unsigned long inq_canon(struct n_tty_data *ldata) tail = ldata->read_tail; nr = head - tail; /* Skip EOF-chars.. */ - while (head != tail) { + while (MASK(head) != MASK(tail)) { if (test_bit(tail & (N_TTY_BUF_SIZE - 1), ldata->read_flags) && read_buf(ldata, tail) == __DISABLED_CHAR) nr--; From 9264e9864a6175259cf94177facbbe1a67b6ce49 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa Date: Sat, 26 May 2018 09:53:14 +0900 Subject: [PATCH 108/970] n_tty: Access echo_* variables carefully. commit ebec3f8f5271139df618ebdf8427e24ba102ba94 upstream. syzbot is reporting stalls at __process_echoes() [1]. This is because since ldata->echo_commit < ldata->echo_tail becomes true for some reason, the discard loop is serving as almost infinite loop. This patch tries to avoid falling into ldata->echo_commit < ldata->echo_tail situation by making access to echo_* variables more carefully. Since reset_buffer_flags() is called without output_lock held, it should not touch echo_* variables. And omit a call to reset_buffer_flags() from n_tty_open() by using vzalloc(). Since add_echo_byte() is called without output_lock held, it needs memory barrier between storing into echo_buf[] and incrementing echo_head counter. echo_buf() needs corresponding memory barrier before reading echo_buf[]. Lack of handling the possibility of not-yet-stored multi-byte operation might be the reason of falling into ldata->echo_commit < ldata->echo_tail situation, for if I do WARN_ON(ldata->echo_commit == tail + 1) prior to echo_buf(ldata, tail + 1), the WARN_ON() fires. Also, explicitly masking with buffer for the former "while" loop, and use ldata->echo_commit > tail for the latter "while" loop. [1] https://syzkaller.appspot.com/bug?id=17f23b094cd80df750e5b0f8982c521ee6bcbf40 Signed-off-by: Tetsuo Handa Reported-by: syzbot Cc: Peter Hurley Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/tty/n_tty.c | 42 ++++++++++++++++++++++++------------------ 1 file changed, 24 insertions(+), 18 deletions(-) diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c index a8a7a13c8683..0475f9685a41 100644 --- a/drivers/tty/n_tty.c +++ b/drivers/tty/n_tty.c @@ -145,6 +145,7 @@ static inline unsigned char *read_buf_addr(struct n_tty_data *ldata, size_t i) static inline unsigned char echo_buf(struct n_tty_data *ldata, size_t i) { + smp_rmb(); /* Matches smp_wmb() in add_echo_byte(). */ return ldata->echo_buf[i & (N_TTY_BUF_SIZE - 1)]; } @@ -320,9 +321,7 @@ static inline void put_tty_queue(unsigned char c, struct n_tty_data *ldata) static void reset_buffer_flags(struct n_tty_data *ldata) { ldata->read_head = ldata->canon_head = ldata->read_tail = 0; - ldata->echo_head = ldata->echo_tail = ldata->echo_commit = 0; ldata->commit_head = 0; - ldata->echo_mark = 0; ldata->line_start = 0; ldata->erasing = 0; @@ -621,12 +620,19 @@ static size_t __process_echoes(struct tty_struct *tty) old_space = space = tty_write_room(tty); tail = ldata->echo_tail; - while (ldata->echo_commit != tail) { + while (MASK(ldata->echo_commit) != MASK(tail)) { c = echo_buf(ldata, tail); if (c == ECHO_OP_START) { unsigned char op; int no_space_left = 0; + /* + * Since add_echo_byte() is called without holding + * output_lock, we might see only portion of multi-byte + * operation. + */ + if (MASK(ldata->echo_commit) == MASK(tail + 1)) + goto not_yet_stored; /* * If the buffer byte is the start of a multi-byte * operation, get the next byte, which is either the @@ -638,6 +644,8 @@ static size_t __process_echoes(struct tty_struct *tty) unsigned int num_chars, num_bs; case ECHO_OP_ERASE_TAB: + if (MASK(ldata->echo_commit) == MASK(tail + 2)) + goto not_yet_stored; num_chars = echo_buf(ldata, tail + 2); /* @@ -732,7 +740,8 @@ static size_t __process_echoes(struct tty_struct *tty) /* If the echo buffer is nearly full (so that the possibility exists * of echo overrun before the next commit), then discard enough * data at the tail to prevent a subsequent overrun */ - while (ldata->echo_commit - tail >= ECHO_DISCARD_WATERMARK) { + while (ldata->echo_commit > tail && + ldata->echo_commit - tail >= ECHO_DISCARD_WATERMARK) { if (echo_buf(ldata, tail) == ECHO_OP_START) { if (echo_buf(ldata, tail + 1) == ECHO_OP_ERASE_TAB) tail += 3; @@ -742,6 +751,7 @@ static size_t __process_echoes(struct tty_struct *tty) tail++; } + not_yet_stored: ldata->echo_tail = tail; return old_space - space; } @@ -752,6 +762,7 @@ static void commit_echoes(struct tty_struct *tty) size_t nr, old, echoed; size_t head; + mutex_lock(&ldata->output_lock); head = ldata->echo_head; ldata->echo_mark = head; old = ldata->echo_commit - ldata->echo_tail; @@ -760,10 +771,12 @@ static void commit_echoes(struct tty_struct *tty) * is over the threshold (and try again each time another * block is accumulated) */ nr = head - ldata->echo_tail; - if (nr < ECHO_COMMIT_WATERMARK || (nr % ECHO_BLOCK > old % ECHO_BLOCK)) + if (nr < ECHO_COMMIT_WATERMARK || + (nr % ECHO_BLOCK > old % ECHO_BLOCK)) { + mutex_unlock(&ldata->output_lock); return; + } - mutex_lock(&ldata->output_lock); ldata->echo_commit = head; echoed = __process_echoes(tty); mutex_unlock(&ldata->output_lock); @@ -814,7 +827,9 @@ static void flush_echoes(struct tty_struct *tty) static inline void add_echo_byte(unsigned char c, struct n_tty_data *ldata) { - *echo_buf_addr(ldata, ldata->echo_head++) = c; + *echo_buf_addr(ldata, ldata->echo_head) = c; + smp_wmb(); /* Matches smp_rmb() in echo_buf(). */ + ldata->echo_head++; } /** @@ -1883,30 +1898,21 @@ static int n_tty_open(struct tty_struct *tty) struct n_tty_data *ldata; /* Currently a malloc failure here can panic */ - ldata = vmalloc(sizeof(*ldata)); + ldata = vzalloc(sizeof(*ldata)); if (!ldata) - goto err; + return -ENOMEM; ldata->overrun_time = jiffies; mutex_init(&ldata->atomic_read_lock); mutex_init(&ldata->output_lock); tty->disc_data = ldata; - reset_buffer_flags(tty->disc_data); - ldata->column = 0; - ldata->canon_column = 0; - ldata->num_overrun = 0; - ldata->no_room = 0; - ldata->lnext = 0; tty->closing = 0; /* indicate buffer work may resume */ clear_bit(TTY_LDISC_HALTED, &tty->flags); n_tty_set_termios(tty, NULL); tty_unthrottle(tty); - return 0; -err: - return -ENOMEM; } static inline int input_available_p(struct tty_struct *tty, int poll) From 06bef9eebec3aa9543ac6196d983632d5fafe6f0 Mon Sep 17 00:00:00 2001 From: Laura Abbott Date: Mon, 11 Jun 2018 11:06:53 -0700 Subject: [PATCH 109/970] staging: android: ion: Return an ERR_PTR in ion_map_kernel commit 0a2bc00341dcfcc793c0dbf4f8d43adf60458b05 upstream. The expected return value from ion_map_kernel is an ERR_PTR. The error path for a vmalloc failure currently just returns NULL, triggering a warning in ion_buffer_kmap_get. Encode the vmalloc failure as an ERR_PTR. Reported-by: syzbot+55b1d9f811650de944c6@syzkaller.appspotmail.com Signed-off-by: Laura Abbott Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/staging/android/ion/ion_heap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/staging/android/ion/ion_heap.c b/drivers/staging/android/ion/ion_heap.c index 4e5c0f17f579..c2a7cb95725b 100644 --- a/drivers/staging/android/ion/ion_heap.c +++ b/drivers/staging/android/ion/ion_heap.c @@ -38,7 +38,7 @@ void *ion_heap_map_kernel(struct ion_heap *heap, struct page **tmp = pages; if (!pages) - return NULL; + return ERR_PTR(-ENOMEM); if (buffer->flags & ION_FLAG_CACHED) pgprot = PAGE_KERNEL; From 3bf351b89186181f9869612bd502ff115209abf0 Mon Sep 17 00:00:00 2001 From: Alexander Potapenko Date: Thu, 14 Jun 2018 12:23:09 +0200 Subject: [PATCH 110/970] vt: prevent leaking uninitialized data to userspace via /dev/vcs* commit 21eff69aaaa0e766ca0ce445b477698dc6a9f55a upstream. KMSAN reported an infoleak when reading from /dev/vcs*: BUG: KMSAN: kernel-infoleak in vcs_read+0x18ba/0x1cc0 Call Trace: ... kmsan_copy_to_user+0x7a/0x160 mm/kmsan/kmsan.c:1253 copy_to_user ./include/linux/uaccess.h:184 vcs_read+0x18ba/0x1cc0 drivers/tty/vt/vc_screen.c:352 __vfs_read+0x1b2/0x9d0 fs/read_write.c:416 vfs_read+0x36c/0x6b0 fs/read_write.c:452 ... Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:189 kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:315 __kmalloc+0x13a/0x350 mm/slub.c:3818 kmalloc ./include/linux/slab.h:517 vc_allocate+0x438/0x800 drivers/tty/vt/vt.c:787 con_install+0x8c/0x640 drivers/tty/vt/vt.c:2880 tty_driver_install_tty drivers/tty/tty_io.c:1224 tty_init_dev+0x1b5/0x1020 drivers/tty/tty_io.c:1324 tty_open_by_driver drivers/tty/tty_io.c:1959 tty_open+0x17b4/0x2ed0 drivers/tty/tty_io.c:2007 chrdev_open+0xc25/0xd90 fs/char_dev.c:417 do_dentry_open+0xccc/0x1440 fs/open.c:794 vfs_open+0x1b6/0x2f0 fs/open.c:908 ... Bytes 0-79 of 240 are uninitialized Consistently allocating |vc_screenbuf| with kzalloc() fixes the problem Reported-by: syzbot+17a8efdf800000@syzkaller.appspotmail.com Signed-off-by: Alexander Potapenko Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/tty/vt/vt.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c index 9e1ac58e269e..9d3e413f48c6 100644 --- a/drivers/tty/vt/vt.c +++ b/drivers/tty/vt/vt.c @@ -785,7 +785,7 @@ int vc_allocate(unsigned int currcons) /* return 0 on success */ if (!*vc->vc_uni_pagedir_loc) con_set_default_unimap(vc); - vc->vc_screenbuf = kmalloc(vc->vc_screenbuf_size, GFP_KERNEL); + vc->vc_screenbuf = kzalloc(vc->vc_screenbuf_size, GFP_KERNEL); if (!vc->vc_screenbuf) goto err_free; @@ -872,7 +872,7 @@ static int vc_do_resize(struct tty_struct *tty, struct vc_data *vc, if (new_screen_size > (4 << 20)) return -EINVAL; - newscreen = kmalloc(new_screen_size, GFP_USER); + newscreen = kzalloc(new_screen_size, GFP_USER); if (!newscreen) return -ENOMEM; From e581746bc737ce8d72982359bf479f19d84bee09 Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Tue, 18 Apr 2017 20:38:35 +0200 Subject: [PATCH 111/970] i2c: rcar: fix resume by always initializing registers before transfer commit ae481cc139658e89eb3ea671dd00b67bd87f01a3 upstream. Resume failed because of uninitialized registers. Instead of adding a resume callback, we simply initialize registers before every transfer. This lightweight change is more robust and will keep us safe if we ever need support for power domains or dynamic frequency changes. Signed-off-by: Wolfram Sang Acked-by: Kuninori Morimoto Signed-off-by: Wolfram Sang Cc: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- drivers/i2c/busses/i2c-rcar.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/i2c/busses/i2c-rcar.c b/drivers/i2c/busses/i2c-rcar.c index 726615e54f2a..c7592fe30e6e 100644 --- a/drivers/i2c/busses/i2c-rcar.c +++ b/drivers/i2c/busses/i2c-rcar.c @@ -700,6 +700,8 @@ static int rcar_i2c_master_xfer(struct i2c_adapter *adap, pm_runtime_get_sync(dev); + rcar_i2c_init(priv); + ret = rcar_i2c_bus_barrier(priv); if (ret < 0) goto out; @@ -857,8 +859,6 @@ static int rcar_i2c_probe(struct platform_device *pdev) if (ret < 0) goto out_pm_put; - rcar_i2c_init(priv); - /* Don't suspend when multi-master to keep arbitration working */ if (of_property_read_bool(dev->of_node, "multi-master")) priv->flags |= ID_P_PM_BLOCKED; From 58d7ac7d3078e64c787350985c76bdd14f2d3557 Mon Sep 17 00:00:00 2001 From: Ben Hutchings Date: Tue, 19 Jun 2018 18:47:52 +0100 Subject: [PATCH 112/970] ipv4: Fix error return value in fib_convert_metrics() The validation code modified by commit 5b5e7a0de2bb ("net: metrics: add proper netlink validation") is organised differently in older kernel versions. The fib_convert_metrics() function that is modified in the backports to 4.4 and 4.9 needs to returns an error code, not a success flag. Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- net/ipv4/fib_semantics.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index d476b7950adf..a88dab33cdf6 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -980,7 +980,7 @@ fib_convert_metrics(struct fib_info *fi, const struct fib_config *cfg) return -EINVAL; } else { if (nla_len(nla) != sizeof(u32)) - return false; + return -EINVAL; val = nla_get_u32(nla); } if (type == RTAX_ADVMSS && val > 65535 - 40) From 8391d38ca80ed3a6c0c769196351cb496b353aa8 Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Wed, 29 Mar 2017 14:00:25 +0900 Subject: [PATCH 113/970] kprobes/x86: Do not modify singlestep buffer while resuming commit 804dec5bda9b4fcdab5f67fe61db4a0498af5221 upstream. Do not modify singlestep execution buffer (kprobe.ainsn.insn) while resuming from single-stepping, instead, modifies the buffer to add a jump back instruction at preparing buffer. Signed-off-by: Masami Hiramatsu Cc: Ananth N Mavinakayanahalli Cc: Andrey Ryabinin Cc: Anil S Keshavamurthy Cc: Borislav Petkov Cc: Brian Gerst Cc: David S . Miller Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Ye Xiaolong Link: http://lkml.kernel.org/r/149076361560.22469.1610155860343077495.stgit@devbox Signed-off-by: Ingo Molnar Reviewed-by: "Steven Rostedt (VMware)" Signed-off-by: Alexey Makhalov Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/kprobes/core.c | 42 ++++++++++++++++------------------ 1 file changed, 20 insertions(+), 22 deletions(-) diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c index 91c48cdfe81f..516be613bd41 100644 --- a/arch/x86/kernel/kprobes/core.c +++ b/arch/x86/kernel/kprobes/core.c @@ -414,25 +414,38 @@ void free_insn_page(void *page) module_memfree(page); } +/* Prepare reljump right after instruction to boost */ +static void prepare_boost(struct kprobe *p, int length) +{ + if (can_boost(p->ainsn.insn, p->addr) && + MAX_INSN_SIZE - length >= RELATIVEJUMP_SIZE) { + /* + * These instructions can be executed directly if it + * jumps back to correct address. + */ + synthesize_reljump(p->ainsn.insn + length, p->addr + length); + p->ainsn.boostable = 1; + } else { + p->ainsn.boostable = -1; + } +} + static int arch_copy_kprobe(struct kprobe *p) { - int ret; + int len; set_memory_rw((unsigned long)p->ainsn.insn & PAGE_MASK, 1); /* Copy an instruction with recovering if other optprobe modifies it.*/ - ret = __copy_instruction(p->ainsn.insn, p->addr); - if (!ret) + len = __copy_instruction(p->ainsn.insn, p->addr); + if (!len) return -EINVAL; /* * __copy_instruction can modify the displacement of the instruction, * but it doesn't affect boostable check. */ - if (can_boost(p->ainsn.insn, p->addr)) - p->ainsn.boostable = 0; - else - p->ainsn.boostable = -1; + prepare_boost(p, len); set_memory_ro((unsigned long)p->ainsn.insn & PAGE_MASK, 1); @@ -897,21 +910,6 @@ static void resume_execution(struct kprobe *p, struct pt_regs *regs, break; } - if (p->ainsn.boostable == 0) { - if ((regs->ip > copy_ip) && - (regs->ip - copy_ip) + 5 < MAX_INSN_SIZE) { - /* - * These instructions can be executed directly if it - * jumps back to correct address. - */ - synthesize_reljump((void *)regs->ip, - (void *)orig_ip + (regs->ip - copy_ip)); - p->ainsn.boostable = 1; - } else { - p->ainsn.boostable = -1; - } - } - regs->ip += orig_ip - copy_ip; no_change: From 440bf5ac49c578a0c56070573a28a522a87e75df Mon Sep 17 00:00:00 2001 From: Taehee Yoo Date: Mon, 11 Jun 2018 22:16:33 +0900 Subject: [PATCH 114/970] netfilter: nf_tables: use WARN_ON_ONCE instead of BUG_ON in nft_do_chain() commit adc972c5b88829d38ede08b1069718661c7330ae upstream. When depth of chain is bigger than NFT_JUMP_STACK_SIZE, the nft_do_chain crashes. But there is no need to crash hard here. Suggested-by: Florian Westphal Signed-off-by: Taehee Yoo Acked-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman --- net/netfilter/nf_tables_core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c index 0dd5c695482f..9d593ecd8e87 100644 --- a/net/netfilter/nf_tables_core.c +++ b/net/netfilter/nf_tables_core.c @@ -185,7 +185,8 @@ nft_do_chain(struct nft_pktinfo *pkt, void *priv) switch (regs.verdict.code) { case NFT_JUMP: - BUG_ON(stackptr >= NFT_JUMP_STACK_SIZE); + if (WARN_ON_ONCE(stackptr >= NFT_JUMP_STACK_SIZE)) + return NF_DROP; jumpstack[stackptr].chain = chain; jumpstack[stackptr].rule = rule; jumpstack[stackptr].rulenum = rulenum; From 5b8fcc075714b86fb8fe194bb463fed2998a8e85 Mon Sep 17 00:00:00 2001 From: "David S. Miller" Date: Tue, 6 Jun 2017 11:34:06 -0400 Subject: [PATCH 115/970] Revert "sit: reload iphdr in ipip6_rcv" commit f4eb17e1efe538d4da7d574bedb00a8dafcc26b7 upstream. This reverts commit b699d0035836f6712917a41e7ae58d84359b8ff9. As per Eric Dumazet, the pskb_may_pull() is a NOP in this particular case, so the 'iph' reload is unnecessary. Signed-off-by: David S. Miller Cc: Luca Boccassi Signed-off-by: Greg Kroah-Hartman --- net/ipv6/sit.c | 1 - 1 file changed, 1 deletion(-) diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c index ae0485d776f4..fc7ca1e46908 100644 --- a/net/ipv6/sit.c +++ b/net/ipv6/sit.c @@ -659,7 +659,6 @@ static int ipip6_rcv(struct sk_buff *skb) if (iptunnel_pull_header(skb, 0, htons(ETH_P_IPV6), !net_eq(tunnel->net, dev_net(tunnel->dev)))) goto out; - iph = ip_hdr(skb); err = IP_ECN_decapsulate(iph, skb); if (unlikely(err)) { From 7dafda5bf29f06c1d62f1a25bd4c1f13d0242b3f Mon Sep 17 00:00:00 2001 From: Grygorii Strashko Date: Thu, 13 Apr 2017 14:11:27 -0500 Subject: [PATCH 116/970] net: phy: micrel: fix crash when statistic requested for KSZ9031 phy commit bfe72442578bb112626e476ffe1f276504d85b95 upstream. Now the command: ethtool --phy-statistics eth0 will cause system crash with meassage "Unable to handle kernel NULL pointer dereference at virtual address 00000010" from: (kszphy_get_stats) from [] (ethtool_get_phy_stats+0xd8/0x210) (ethtool_get_phy_stats) from [] (dev_ethtool+0x5b8/0x228c) (dev_ethtool) from [] (dev_ioctl+0x3fc/0x964) (dev_ioctl) from [] (sock_ioctl+0x170/0x2c0) (sock_ioctl) from [] (do_vfs_ioctl+0xa8/0x95c) (do_vfs_ioctl) from [] (SyS_ioctl+0x3c/0x64) (SyS_ioctl) from [] (ret_fast_syscall+0x0/0x44) The reason: phy_driver structure for KSZ9031 phy has no .probe() callback defined. As result, struct phy_device *phydev->priv pointer will not be initializes (null). This issue will affect also following phys: KSZ8795, KSZ886X, KSZ8873MLL, KSZ9031, KSZ9021, KSZ8061, KS8737 Fix it by: - adding .probe() = kszphy_probe() callback to KSZ9031, KSZ9021 phys. The kszphy_probe() can be re-used as it doesn't do any phy specific settings. - removing statistic callbacks from other phys (KSZ8795, KSZ886X, KSZ8873MLL, KSZ8061, KS8737) as they doesn't have corresponding statistic counters. Fixes: 2b2427d06426 ("phy: micrel: Add ethtool statistics counters") Signed-off-by: Grygorii Strashko Reviewed-by: Andrew Lunn Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller Cc: Dan Rue Signed-off-by: Greg Kroah-Hartman --- drivers/net/phy/micrel.c | 17 ++--------------- 1 file changed, 2 insertions(+), 15 deletions(-) diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c index 2032a6de026b..707190d3ada0 100644 --- a/drivers/net/phy/micrel.c +++ b/drivers/net/phy/micrel.c @@ -801,9 +801,6 @@ static struct phy_driver ksphy_driver[] = { .read_status = genphy_read_status, .ack_interrupt = kszphy_ack_interrupt, .config_intr = kszphy_config_intr, - .get_sset_count = kszphy_get_sset_count, - .get_strings = kszphy_get_strings, - .get_stats = kszphy_get_stats, .suspend = genphy_suspend, .resume = genphy_resume, }, { @@ -948,9 +945,6 @@ static struct phy_driver ksphy_driver[] = { .read_status = genphy_read_status, .ack_interrupt = kszphy_ack_interrupt, .config_intr = kszphy_config_intr, - .get_sset_count = kszphy_get_sset_count, - .get_strings = kszphy_get_strings, - .get_stats = kszphy_get_stats, .suspend = genphy_suspend, .resume = genphy_resume, }, { @@ -960,6 +954,7 @@ static struct phy_driver ksphy_driver[] = { .features = (PHY_GBIT_FEATURES | SUPPORTED_Pause), .flags = PHY_HAS_MAGICANEG | PHY_HAS_INTERRUPT, .driver_data = &ksz9021_type, + .probe = kszphy_probe, .config_init = ksz9021_config_init, .config_aneg = genphy_config_aneg, .read_status = genphy_read_status, @@ -979,6 +974,7 @@ static struct phy_driver ksphy_driver[] = { .features = (PHY_GBIT_FEATURES | SUPPORTED_Pause), .flags = PHY_HAS_MAGICANEG | PHY_HAS_INTERRUPT, .driver_data = &ksz9021_type, + .probe = kszphy_probe, .config_init = ksz9031_config_init, .config_aneg = genphy_config_aneg, .read_status = ksz9031_read_status, @@ -998,9 +994,6 @@ static struct phy_driver ksphy_driver[] = { .config_init = kszphy_config_init, .config_aneg = ksz8873mll_config_aneg, .read_status = ksz8873mll_read_status, - .get_sset_count = kszphy_get_sset_count, - .get_strings = kszphy_get_strings, - .get_stats = kszphy_get_stats, .suspend = genphy_suspend, .resume = genphy_resume, }, { @@ -1012,9 +1005,6 @@ static struct phy_driver ksphy_driver[] = { .config_init = kszphy_config_init, .config_aneg = genphy_config_aneg, .read_status = genphy_read_status, - .get_sset_count = kszphy_get_sset_count, - .get_strings = kszphy_get_strings, - .get_stats = kszphy_get_stats, .suspend = genphy_suspend, .resume = genphy_resume, }, { @@ -1026,9 +1016,6 @@ static struct phy_driver ksphy_driver[] = { .config_init = kszphy_config_init, .config_aneg = ksz8873mll_config_aneg, .read_status = ksz8873mll_read_status, - .get_sset_count = kszphy_get_sset_count, - .get_strings = kszphy_get_strings, - .get_stats = kszphy_get_stats, .suspend = genphy_suspend, .resume = genphy_resume, } }; From 0e76f4db40893ca36a184d8921777194704cf81a Mon Sep 17 00:00:00 2001 From: Sean Nyekjaer Date: Tue, 22 May 2018 19:45:09 +0200 Subject: [PATCH 117/970] ARM: dts: imx6q: Use correct SDMA script for SPI5 core commit df07101e1c4a29e820df02f9989a066988b160e6 upstream. According to the reference manual the shp_2_mcu / mcu_2_shp scripts must be used for devices connected through the SPBA. This fixes an issue we saw with DMA transfers. Sometimes the SPI controller RX FIFO was not empty after a DMA transfer and the driver got stuck in the next PIO transfer when it read one word more than expected. commit dd4b487b32a35 ("ARM: dts: imx6: Use correct SDMA script for SPI cores") is fixing the same issue but only for SPI1 - 4. Fixes: 677940258dd8e ("ARM: dts: imx6q: enable dma for ecspi5") Signed-off-by: Sean Nyekjaer Reviewed-by: Fabio Estevam Signed-off-by: Shawn Guo Signed-off-by: Greg Kroah-Hartman --- arch/arm/boot/dts/imx6q.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/boot/dts/imx6q.dtsi b/arch/arm/boot/dts/imx6q.dtsi index e9a5d0b8c7b0..908b269a016b 100644 --- a/arch/arm/boot/dts/imx6q.dtsi +++ b/arch/arm/boot/dts/imx6q.dtsi @@ -96,7 +96,7 @@ clocks = <&clks IMX6Q_CLK_ECSPI5>, <&clks IMX6Q_CLK_ECSPI5>; clock-names = "ipg", "per"; - dmas = <&sdma 11 7 1>, <&sdma 12 7 2>; + dmas = <&sdma 11 8 1>, <&sdma 12 8 2>; dma-names = "rx", "tx"; status = "disabled"; }; From 389a3fcb3472c58ed13a7a5421cdf870f2b7b013 Mon Sep 17 00:00:00 2001 From: Mike Marciniszyn Date: Thu, 31 May 2018 11:30:09 -0700 Subject: [PATCH 118/970] IB/hfi1: Fix user context tail allocation for DMA_RTAIL commit 1bc0299d976e000ececc6acd76e33b4582646cb7 upstream. The following code fails to allocate a buffer for the tail address that the hardware DMAs into when the user context DMA_RTAIL is set. if (HFI1_CAP_KGET_MASK(rcd->flags, DMA_RTAIL)) { rcd->rcvhdrtail_kvaddr = dma_zalloc_coherent( &dd->pcidev->dev, PAGE_SIZE, &dma_hdrqtail, gfp_flags); if (!rcd->rcvhdrtail_kvaddr) goto bail_free; rcd->rcvhdrqtailaddr_dma = dma_hdrqtail; } So the rcvhdrtail_kvaddr would then be NULL. The mmap logic fails to check for a NULL rcvhdrtail_kvaddr. The fix is to test for both user and kernel DMA_TAIL options during the allocation as well as testing for a NULL rcvhdrtail_kvaddr during the mmap processing. Additionally, all downstream testing of the capmask for DMA_RTAIL have been eliminated in favor of testing rcvhdrtail_kvaddr. Cc: # 4.9.x Reviewed-by: Michael J. Ruhl Signed-off-by: Mike Marciniszyn Signed-off-by: Dennis Dalessandro Signed-off-by: Jason Gunthorpe Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/hw/hfi1/chip.c | 8 ++++---- drivers/infiniband/hw/hfi1/file_ops.c | 2 +- drivers/infiniband/hw/hfi1/init.c | 9 ++++----- 3 files changed, 9 insertions(+), 10 deletions(-) diff --git a/drivers/infiniband/hw/hfi1/chip.c b/drivers/infiniband/hw/hfi1/chip.c index 148b313c6471..d30b3b908621 100644 --- a/drivers/infiniband/hw/hfi1/chip.c +++ b/drivers/infiniband/hw/hfi1/chip.c @@ -6717,7 +6717,7 @@ static void rxe_kernel_unfreeze(struct hfi1_devdata *dd) for (i = 0; i < dd->n_krcv_queues; i++) { rcvmask = HFI1_RCVCTRL_CTXT_ENB; /* HFI1_RCVCTRL_TAILUPD_[ENB|DIS] needs to be set explicitly */ - rcvmask |= HFI1_CAP_KGET_MASK(dd->rcd[i]->flags, DMA_RTAIL) ? + rcvmask |= dd->rcd[i]->rcvhdrtail_kvaddr ? HFI1_RCVCTRL_TAILUPD_ENB : HFI1_RCVCTRL_TAILUPD_DIS; hfi1_rcvctrl(dd, rcvmask, i); } @@ -8211,7 +8211,7 @@ static inline int check_packet_present(struct hfi1_ctxtdata *rcd) u32 tail; int present; - if (!HFI1_CAP_IS_KSET(DMA_RTAIL)) + if (!rcd->rcvhdrtail_kvaddr) present = (rcd->seq_cnt == rhf_rcv_seq(rhf_to_cpu(get_rhf_addr(rcd)))); else /* is RDMA rtail */ @@ -11550,7 +11550,7 @@ void hfi1_rcvctrl(struct hfi1_devdata *dd, unsigned int op, int ctxt) /* reset the tail and hdr addresses, and sequence count */ write_kctxt_csr(dd, ctxt, RCV_HDR_ADDR, rcd->rcvhdrq_dma); - if (HFI1_CAP_KGET_MASK(rcd->flags, DMA_RTAIL)) + if (rcd->rcvhdrtail_kvaddr) write_kctxt_csr(dd, ctxt, RCV_HDR_TAIL_ADDR, rcd->rcvhdrqtailaddr_dma); rcd->seq_cnt = 1; @@ -11630,7 +11630,7 @@ void hfi1_rcvctrl(struct hfi1_devdata *dd, unsigned int op, int ctxt) rcvctrl |= RCV_CTXT_CTRL_INTR_AVAIL_SMASK; if (op & HFI1_RCVCTRL_INTRAVAIL_DIS) rcvctrl &= ~RCV_CTXT_CTRL_INTR_AVAIL_SMASK; - if (op & HFI1_RCVCTRL_TAILUPD_ENB && rcd->rcvhdrqtailaddr_dma) + if ((op & HFI1_RCVCTRL_TAILUPD_ENB) && rcd->rcvhdrtail_kvaddr) rcvctrl |= RCV_CTXT_CTRL_TAIL_UPD_SMASK; if (op & HFI1_RCVCTRL_TAILUPD_DIS) { /* See comment on RcvCtxtCtrl.TailUpd above */ diff --git a/drivers/infiniband/hw/hfi1/file_ops.c b/drivers/infiniband/hw/hfi1/file_ops.c index bb729764a799..d612f9d94083 100644 --- a/drivers/infiniband/hw/hfi1/file_ops.c +++ b/drivers/infiniband/hw/hfi1/file_ops.c @@ -609,7 +609,7 @@ static int hfi1_file_mmap(struct file *fp, struct vm_area_struct *vma) ret = -EINVAL; goto done; } - if (flags & VM_WRITE) { + if ((flags & VM_WRITE) || !uctxt->rcvhdrtail_kvaddr) { ret = -EPERM; goto done; } diff --git a/drivers/infiniband/hw/hfi1/init.c b/drivers/infiniband/hw/hfi1/init.c index c81c44525dd5..9dc8cf096e2e 100644 --- a/drivers/infiniband/hw/hfi1/init.c +++ b/drivers/infiniband/hw/hfi1/init.c @@ -1618,7 +1618,6 @@ int hfi1_create_rcvhdrq(struct hfi1_devdata *dd, struct hfi1_ctxtdata *rcd) u64 reg; if (!rcd->rcvhdrq) { - dma_addr_t dma_hdrqtail; gfp_t gfp_flags; /* @@ -1641,13 +1640,13 @@ int hfi1_create_rcvhdrq(struct hfi1_devdata *dd, struct hfi1_ctxtdata *rcd) goto bail; } - if (HFI1_CAP_KGET_MASK(rcd->flags, DMA_RTAIL)) { + if (HFI1_CAP_KGET_MASK(rcd->flags, DMA_RTAIL) || + HFI1_CAP_UGET_MASK(rcd->flags, DMA_RTAIL)) { rcd->rcvhdrtail_kvaddr = dma_zalloc_coherent( - &dd->pcidev->dev, PAGE_SIZE, &dma_hdrqtail, - gfp_flags); + &dd->pcidev->dev, PAGE_SIZE, + &rcd->rcvhdrqtailaddr_dma, gfp_flags); if (!rcd->rcvhdrtail_kvaddr) goto bail_free; - rcd->rcvhdrqtailaddr_dma = dma_hdrqtail; } rcd->rcvhdrq_size = amt; From 05a5d4baac9eb67de53df12d7c287492d80d674a Mon Sep 17 00:00:00 2001 From: Juergen Gross Date: Thu, 21 Jun 2018 10:43:31 +0200 Subject: [PATCH 119/970] x86/xen: Add call of speculative_store_bypass_ht_init() to PV paths commit 74899d92e66663dc7671a8017b3146dcd4735f3b upstream. Commit: 1f50ddb4f418 ("x86/speculation: Handle HT correctly on AMD") ... added speculative_store_bypass_ht_init() to the per-CPU initialization sequence. speculative_store_bypass_ht_init() needs to be called on each CPU for PV guests, too. Reported-by: Brian Woods Tested-by: Brian Woods Signed-off-by: Juergen Gross Cc: Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: boris.ostrovsky@oracle.com Cc: xen-devel@lists.xenproject.org Fixes: 1f50ddb4f4189243c05926b842dc1a0332195f31 ("x86/speculation: Handle HT correctly on AMD") Link: https://lore.kernel.org/lkml/20180621084331.21228-1-jgross@suse.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/xen/smp.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c index a11540e51f62..8eca26ef6471 100644 --- a/arch/x86/xen/smp.c +++ b/arch/x86/xen/smp.c @@ -28,6 +28,7 @@ #include #include +#include #include #include @@ -87,6 +88,8 @@ static void cpu_bringup(void) cpu_data(cpu).x86_max_cores = 1; set_cpu_sibling_map(cpu); + speculative_store_bypass_ht_init(); + xen_setup_cpu_clockevents(); notify_cpu_starting(cpu); @@ -375,6 +378,8 @@ static void __init xen_smp_prepare_cpus(unsigned int max_cpus) } set_cpu_sibling_map(0); + speculative_store_bypass_ht_init(); + xen_pmu_init(0); if (xen_smp_intr_init(0)) From 1adc34adc3447c34926994b87db5d929f5ab45b5 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Wed, 18 Jan 2017 11:15:39 -0800 Subject: [PATCH 120/970] x86/cpu: Re-apply forced caps every time CPU caps are re-read commit 60d3450167433f2d099ce2869dc52dd9e7dc9b29 upstream. Calling get_cpu_cap() will reset a bunch of CPU features. This will cause the system to lose track of force-set and force-cleared features in the words that are reset until the end of CPU initialization. This can cause X86_FEATURE_FPU, for example, to change back and forth during boot and potentially confuse CPU setup. To minimize the chance of confusion, re-apply forced caps every time get_cpu_cap() is called. Signed-off-by: Andy Lutomirski Reviewed-by: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Fenghua Yu Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Matthew Whitehead Cc: Oleg Nesterov Cc: One Thousand Gnomes Cc: Peter Zijlstra Cc: Rik van Riel Cc: Thomas Gleixner Cc: Yu-cheng Yu Link: http://lkml.kernel.org/r/c817eb373d2c67c2c81413a70fc9b845fa34a37e.1484705016.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/common.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index b0fd028b2eee..7a4279d8a902 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -848,6 +848,13 @@ void get_cpu_cap(struct cpuinfo_x86 *c) init_scattered_cpuid_features(c); init_speculation_control(c); + + /* + * Clear/Set all flags overridden by options, after probe. + * This needs to happen each time we re-probe, which may happen + * several times during CPU initialization. + */ + apply_forced_caps(c); } static void identify_cpu_without_cpuid(struct cpuinfo_x86 *c) From 433c183fa2478055e7c6171184e1c72e23836700 Mon Sep 17 00:00:00 2001 From: Cannon Matthews Date: Tue, 3 Jul 2018 17:02:43 -0700 Subject: [PATCH 121/970] mm: hugetlb: yield when prepping struct pages commit 520495fe96d74e05db585fc748351e0504d8f40d upstream. When booting with very large numbers of gigantic (i.e. 1G) pages, the operations in the loop of gather_bootmem_prealloc, and specifically prep_compound_gigantic_page, takes a very long time, and can cause a softlockup if enough pages are requested at boot. For example booting with 3844 1G pages requires prepping (set_compound_head, init the count) over 1 billion 4K tail pages, which takes considerable time. Add a cond_resched() to the outer loop in gather_bootmem_prealloc() to prevent this lockup. Tested: Booted with softlockup_panic=1 hugepagesz=1G hugepages=3844 and no softlockup is reported, and the hugepages are reported as successfully setup. Link: http://lkml.kernel.org/r/20180627214447.260804-1-cannonmatthews@google.com Signed-off-by: Cannon Matthews Reviewed-by: Andrew Morton Reviewed-by: Mike Kravetz Acked-by: Michal Hocko Cc: Andres Lagar-Cavilla Cc: Peter Feiner Cc: Greg Thelen Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- mm/hugetlb.c | 1 + 1 file changed, 1 insertion(+) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 6ff65c405243..f9e735537c37 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2171,6 +2171,7 @@ static void __init gather_bootmem_prealloc(void) */ if (hstate_is_gigantic(h)) adjust_managed_page_count(page, 1 << h->order); + cond_resched(); } } From 07cd8167aa703efc15fe01da8211d585ac6d3be0 Mon Sep 17 00:00:00 2001 From: Changbin Du Date: Wed, 31 Jan 2018 23:48:49 +0800 Subject: [PATCH 122/970] tracing: Fix missing return symbol in function_graph output commit 1fe4293f4b8de75824935f8d8e9a99c7fc6873da upstream. The function_graph tracer does not show the interrupt return marker for the leaf entry. On leaf entries, we see an unbalanced interrupt marker (the interrupt was entered, but nevern left). Before: 1) | SyS_write() { 1) | __fdget_pos() { 1) 0.061 us | __fget_light(); 1) 0.289 us | } 1) | vfs_write() { 1) 0.049 us | rw_verify_area(); 1) + 15.424 us | __vfs_write(); 1) ==========> | 1) 6.003 us | smp_apic_timer_interrupt(); 1) 0.055 us | __fsnotify_parent(); 1) 0.073 us | fsnotify(); 1) + 23.665 us | } 1) + 24.501 us | } After: 0) | SyS_write() { 0) | __fdget_pos() { 0) 0.052 us | __fget_light(); 0) 0.328 us | } 0) | vfs_write() { 0) 0.057 us | rw_verify_area(); 0) | __vfs_write() { 0) ==========> | 0) 8.548 us | smp_apic_timer_interrupt(); 0) <========== | 0) + 36.507 us | } /* __vfs_write */ 0) 0.049 us | __fsnotify_parent(); 0) 0.066 us | fsnotify(); 0) + 50.064 us | } 0) + 50.952 us | } Link: http://lkml.kernel.org/r/1517413729-20411-1-git-send-email-changbin.du@intel.com Cc: stable@vger.kernel.org Fixes: f8b755ac8e0cc ("tracing/function-graph-tracer: Output arrows signal on hardirq call/return") Signed-off-by: Changbin Du Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Greg Kroah-Hartman --- kernel/trace/trace_functions_graph.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c index a17cb1d8415c..01e71812e174 100644 --- a/kernel/trace/trace_functions_graph.c +++ b/kernel/trace/trace_functions_graph.c @@ -830,6 +830,7 @@ print_graph_entry_leaf(struct trace_iterator *iter, struct ftrace_graph_ret *graph_ret; struct ftrace_graph_ent *call; unsigned long long duration; + int cpu = iter->cpu; int i; graph_ret = &ret_entry->ret; @@ -838,7 +839,6 @@ print_graph_entry_leaf(struct trace_iterator *iter, if (data) { struct fgraph_cpu_data *cpu_data; - int cpu = iter->cpu; cpu_data = per_cpu_ptr(data->cpu_data, cpu); @@ -868,6 +868,9 @@ print_graph_entry_leaf(struct trace_iterator *iter, trace_seq_printf(s, "%ps();\n", (void *)call->func); + print_graph_irq(iter, graph_ret->func, TRACE_GRAPH_RET, + cpu, iter->ent->pid, flags); + return trace_handle_return(s); } From b6db8af7e34edfa1bf1d7b0797da15c3811a2a98 Mon Sep 17 00:00:00 2001 From: Jann Horn Date: Mon, 25 Jun 2018 16:25:44 +0200 Subject: [PATCH 123/970] scsi: sg: mitigate read/write abuse commit 26b5b874aff5659a7e26e5b1997e3df2c41fa7fd upstream. As Al Viro noted in commit 128394eff343 ("sg_write()/bsg_write() is not fit to be called under KERNEL_DS"), sg improperly accesses userspace memory outside the provided buffer, permitting kernel memory corruption via splice(). But it doesn't just do it on ->write(), also on ->read(). As a band-aid, make sure that the ->read() and ->write() handlers can not be called in weird contexts (kernel context or credentials different from file opener), like for ib_safe_file_access(). If someone needs to use these interfaces from different security contexts, a new interface should be written that goes through the ->ioctl() handler. I've mostly copypasted ib_safe_file_access() over as sg_safe_file_access() because I couldn't find a good common header - please tell me if you know a better way. [mkp: s/_safe_/_check_/] Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: Signed-off-by: Jann Horn Acked-by: Douglas Gilbert Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/sg.c | 42 ++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 40 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index 0f81d739f9e9..2065a0f9dca6 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -51,6 +51,7 @@ static int sg_version_num = 30536; /* 2 digits for each component */ #include #include #include +#include /* for sg_check_file_access() */ #include "scsi.h" #include @@ -210,6 +211,33 @@ static void sg_device_destroy(struct kref *kref); sdev_prefix_printk(prefix, (sdp)->device, \ (sdp)->disk->disk_name, fmt, ##a) +/* + * The SCSI interfaces that use read() and write() as an asynchronous variant of + * ioctl(..., SG_IO, ...) are fundamentally unsafe, since there are lots of ways + * to trigger read() and write() calls from various contexts with elevated + * privileges. This can lead to kernel memory corruption (e.g. if these + * interfaces are called through splice()) and privilege escalation inside + * userspace (e.g. if a process with access to such a device passes a file + * descriptor to a SUID binary as stdin/stdout/stderr). + * + * This function provides protection for the legacy API by restricting the + * calling context. + */ +static int sg_check_file_access(struct file *filp, const char *caller) +{ + if (filp->f_cred != current_real_cred()) { + pr_err_once("%s: process %d (%s) changed security contexts after opening file descriptor, this is not allowed.\n", + caller, task_tgid_vnr(current), current->comm); + return -EPERM; + } + if (unlikely(segment_eq(get_fs(), KERNEL_DS))) { + pr_err_once("%s: process %d (%s) called from kernel context, this is not allowed.\n", + caller, task_tgid_vnr(current), current->comm); + return -EACCES; + } + return 0; +} + static int sg_allow_access(struct file *filp, unsigned char *cmd) { struct sg_fd *sfp = filp->private_data; @@ -394,6 +422,14 @@ sg_read(struct file *filp, char __user *buf, size_t count, loff_t * ppos) struct sg_header *old_hdr = NULL; int retval = 0; + /* + * This could cause a response to be stranded. Close the associated + * file descriptor to free up any resources being held. + */ + retval = sg_check_file_access(filp, __func__); + if (retval) + return retval; + if ((!(sfp = (Sg_fd *) filp->private_data)) || (!(sdp = sfp->parentdp))) return -ENXIO; SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp, @@ -581,9 +617,11 @@ sg_write(struct file *filp, const char __user *buf, size_t count, loff_t * ppos) struct sg_header old_hdr; sg_io_hdr_t *hp; unsigned char cmnd[SG_MAX_CDB_SIZE]; + int retval; - if (unlikely(segment_eq(get_fs(), KERNEL_DS))) - return -EINVAL; + retval = sg_check_file_access(filp, __func__); + if (retval) + return retval; if ((!(sfp = (Sg_fd *) filp->private_data)) || (!(sdp = sfp->parentdp))) return -ENXIO; From 0cab67a1ed6bbae481dae0ea7047f880a43fd017 Mon Sep 17 00:00:00 2001 From: Christian Borntraeger Date: Thu, 21 Jun 2018 14:49:38 +0200 Subject: [PATCH 124/970] s390: Correct register corruption in critical section cleanup MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 891f6a726cacbb87e5b06076693ffab53bd378d7 upstream. In the critical section cleanup we must not mess with r1. For march=z9 or older, larl + ex (instead of exrl) are used with r1 as a temporary register. This can clobber r1 in several interrupt handlers. Fix this by using r11 as a temp register. r11 is being saved by all callers of cleanup_critical. Fixes: 6dd85fbb87 ("s390: move expoline assembler macros to a header") Cc: stable@vger.kernel.org #v4.16 Reported-by: Oliver Kurz Reported-by: Petr Tesařík Signed-off-by: Christian Borntraeger Reviewed-by: Hendrik Brueckner Signed-off-by: Martin Schwidefsky Signed-off-by: Greg Kroah-Hartman --- arch/s390/kernel/entry.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S index a4fd00064c80..771cfd2e1e6d 100644 --- a/arch/s390/kernel/entry.S +++ b/arch/s390/kernel/entry.S @@ -1187,7 +1187,7 @@ cleanup_critical: jl 0f clg %r9,BASED(.Lcleanup_table+104) # .Lload_fpu_regs_end jl .Lcleanup_load_fpu_regs -0: BR_EX %r14 +0: BR_EX %r14,%r11 .align 8 .Lcleanup_table: @@ -1217,7 +1217,7 @@ cleanup_critical: ni __SIE_PROG0C+3(%r9),0xfe # no longer in SIE lctlg %c1,%c1,__LC_USER_ASCE # load primary asce larl %r9,sie_exit # skip forward to sie_exit - BR_EX %r14 + BR_EX %r14,%r11 #endif .Lcleanup_system_call: From f9b1cd6e7490e3aed85b915cd4a5e3a10ac2bd58 Mon Sep 17 00:00:00 2001 From: Lars Ellenberg Date: Mon, 25 Jun 2018 11:39:52 +0200 Subject: [PATCH 125/970] drbd: fix access after free commit 64dafbc9530c10300acffc57fae3269d95fa8f93 upstream. We have struct drbd_requests { ... struct bio *private_bio; ... } to hold a bio clone for local submission. On local IO completion, we put that bio, and in case we want to use the result later, we overload that member to hold the ERR_PTR() of the completion result, Which, before v4.3, used to be the passed in "int error", so we could first bio_put(), then assign. v4.3-rc1~100^2~21 4246a0b63bd8 block: add a bi_error field to struct bio changed that: bio_put(req->private_bio); - req->private_bio = ERR_PTR(error); + req->private_bio = ERR_PTR(bio->bi_error); Which introduces an access after free, because it was non obvious that req->private_bio == bio. Impact of that was mostly unnoticable, because we only use that value in a multiple-failure case, and even then map any "unexpected" error code to EIO, so worst case we could potentially mask a more specific error with EIO in a multiple failure case. Unless the pointed to memory region was unmapped, as is the case with CONFIG_DEBUG_PAGEALLOC, in which case this results in BUG: unable to handle kernel paging request v4.13-rc1~70^2~75 4e4cbee93d56 block: switch bios to blk_status_t changes it further to bio_put(req->private_bio); req->private_bio = ERR_PTR(blk_status_to_errno(bio->bi_status)); And blk_status_to_errno() now contains a WARN_ON_ONCE() for unexpected values, which catches this "sometimes", if the memory has been reused quickly enough for other things. Should also go into stable since 4.3, with the trivial change around 4.13. Cc: stable@vger.kernel.org Fixes: 4246a0b63bd8 block: add a bi_error field to struct bio Reported-by: Sarah Newman Signed-off-by: Lars Ellenberg Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman --- drivers/block/drbd/drbd_worker.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/block/drbd/drbd_worker.c b/drivers/block/drbd/drbd_worker.c index c6755c9a0aea..51c233c4e058 100644 --- a/drivers/block/drbd/drbd_worker.c +++ b/drivers/block/drbd/drbd_worker.c @@ -269,8 +269,8 @@ void drbd_request_endio(struct bio *bio) what = COMPLETED_OK; } - bio_put(req->private_bio); req->private_bio = ERR_PTR(bio->bi_error); + bio_put(bio); /* not req_mod(), we need irqsave here! */ spin_lock_irqsave(&device->resource->req_lock, flags); From 2c4f6b710bcc109d9d7063ee8114bbd1ac7dc45b Mon Sep 17 00:00:00 2001 From: Paulo Alcantara Date: Thu, 5 Jul 2018 13:46:34 -0300 Subject: [PATCH 126/970] cifs: Fix infinite loop when using hard mount option commit 7ffbe65578b44fafdef577a360eb0583929f7c6e upstream. For every request we send, whether it is SMB1 or SMB2+, we attempt to reconnect tcon (cifs_reconnect_tcon or smb2_reconnect) before carrying out the request. So, while server->tcpStatus != CifsNeedReconnect, we wait for the reconnection to succeed on wait_event_interruptible_timeout(). If it returns, that means that either the condition was evaluated to true, or timeout elapsed, or it was interrupted by a signal. Since we're not handling the case where the process woke up due to a received signal (-ERESTARTSYS), the next call to wait_event_interruptible_timeout() will _always_ fail and we end up looping forever inside either cifs_reconnect_tcon() or smb2_reconnect(). Here's an example of how to trigger that: $ mount.cifs //foo/share /mnt/test -o username=foo,password=foo,vers=1.0,hard (break connection to server before executing bellow cmd) $ stat -f /mnt/test & sleep 140 [1] 2511 $ ps -aux -q 2511 USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 2511 0.0 0.0 12892 1008 pts/0 S 12:24 0:00 stat -f /mnt/test $ kill -9 2511 (wait for a while; process is stuck in the kernel) $ ps -aux -q 2511 USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 2511 83.2 0.0 12892 1008 pts/0 R 12:24 30:01 stat -f /mnt/test By using 'hard' mount point means that cifs.ko will keep retrying indefinitely, however we must allow the process to be killed otherwise it would hang the system. Signed-off-by: Paulo Alcantara Cc: stable@vger.kernel.org Reviewed-by: Aurelien Aptel Signed-off-by: Steve French Signed-off-by: Greg Kroah-Hartman --- fs/cifs/cifssmb.c | 10 ++++++++-- fs/cifs/smb2pdu.c | 18 ++++++++++++------ 2 files changed, 20 insertions(+), 8 deletions(-) diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c index d57222894892..8407b07428a6 100644 --- a/fs/cifs/cifssmb.c +++ b/fs/cifs/cifssmb.c @@ -150,8 +150,14 @@ cifs_reconnect_tcon(struct cifs_tcon *tcon, int smb_command) * greater than cifs socket timeout which is 7 seconds */ while (server->tcpStatus == CifsNeedReconnect) { - wait_event_interruptible_timeout(server->response_q, - (server->tcpStatus != CifsNeedReconnect), 10 * HZ); + rc = wait_event_interruptible_timeout(server->response_q, + (server->tcpStatus != CifsNeedReconnect), + 10 * HZ); + if (rc < 0) { + cifs_dbg(FYI, "%s: aborting reconnect due to a received" + " signal by the process\n", __func__); + return -ERESTARTSYS; + } /* are we still trying to reconnect? */ if (server->tcpStatus != CifsNeedReconnect) diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c index e0214334769b..4ded64b8b43b 100644 --- a/fs/cifs/smb2pdu.c +++ b/fs/cifs/smb2pdu.c @@ -155,7 +155,7 @@ smb2_hdr_assemble(struct smb2_hdr *hdr, __le16 smb2_cmd /* command */ , static int smb2_reconnect(__le16 smb2_command, struct cifs_tcon *tcon) { - int rc = 0; + int rc; struct nls_table *nls_codepage; struct cifs_ses *ses; struct TCP_Server_Info *server; @@ -166,10 +166,10 @@ smb2_reconnect(__le16 smb2_command, struct cifs_tcon *tcon) * for those three - in the calling routine. */ if (tcon == NULL) - return rc; + return 0; if (smb2_command == SMB2_TREE_CONNECT) - return rc; + return 0; if (tcon->tidStatus == CifsExiting) { /* @@ -212,8 +212,14 @@ smb2_reconnect(__le16 smb2_command, struct cifs_tcon *tcon) return -EAGAIN; } - wait_event_interruptible_timeout(server->response_q, - (server->tcpStatus != CifsNeedReconnect), 10 * HZ); + rc = wait_event_interruptible_timeout(server->response_q, + (server->tcpStatus != CifsNeedReconnect), + 10 * HZ); + if (rc < 0) { + cifs_dbg(FYI, "%s: aborting reconnect due to a received" + " signal by the process\n", __func__); + return -ERESTARTSYS; + } /* are we still trying to reconnect? */ if (server->tcpStatus != CifsNeedReconnect) @@ -231,7 +237,7 @@ smb2_reconnect(__le16 smb2_command, struct cifs_tcon *tcon) } if (!tcon->ses->need_reconnect && !tcon->need_reconnect) - return rc; + return 0; nls_codepage = load_nls_default(); From 0f80447d031d91748878e002b9506e323d932467 Mon Sep 17 00:00:00 2001 From: Mikulas Patocka Date: Sun, 3 Jun 2018 16:40:54 +0200 Subject: [PATCH 127/970] drm/udl: fix display corruption of the last line commit 99ec9e77511dea55d81729fc80b6c63a61bfa8e0 upstream. The displaylink hardware has such a peculiarity that it doesn't render a command until next command is received. This produces occasional corruption, such as when setting 22x11 font on the console, only the first line of the cursor will be blinking if the cursor is located at some specific columns. When we end up with a repeating pixel, the driver has a bug that it leaves one uninitialized byte after the command (and this byte is enough to flush the command and render it - thus it fixes the screen corruption), however whe we end up with a non-repeating pixel, there is no byte appended and this results in temporary screen corruption. This patch fixes the screen corruption by always appending a byte 0xAF at the end of URB. It also removes the uninitialized byte. Signed-off-by: Mikulas Patocka Cc: stable@vger.kernel.org Signed-off-by: Dave Airlie Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/udl/udl_fb.c | 5 ++++- drivers/gpu/drm/udl/udl_transfer.c | 11 +++++++---- 2 files changed, 11 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/udl/udl_fb.c b/drivers/gpu/drm/udl/udl_fb.c index 67ea2ce03a23..39d0fdcb17d2 100644 --- a/drivers/gpu/drm/udl/udl_fb.c +++ b/drivers/gpu/drm/udl/udl_fb.c @@ -136,7 +136,10 @@ int udl_handle_damage(struct udl_framebuffer *fb, int x, int y, if (cmd > (char *) urb->transfer_buffer) { /* Send partial buffer remaining before exiting */ - int len = cmd - (char *) urb->transfer_buffer; + int len; + if (cmd < (char *) urb->transfer_buffer + urb->transfer_buffer_length) + *cmd++ = 0xAF; + len = cmd - (char *) urb->transfer_buffer; ret = udl_submit_urb(dev, urb, len); bytes_sent += len; } else diff --git a/drivers/gpu/drm/udl/udl_transfer.c b/drivers/gpu/drm/udl/udl_transfer.c index 917dcb978c2c..9259a2f8bf3a 100644 --- a/drivers/gpu/drm/udl/udl_transfer.c +++ b/drivers/gpu/drm/udl/udl_transfer.c @@ -152,11 +152,11 @@ static void udl_compress_hline16( raw_pixels_count_byte = cmd++; /* we'll know this later */ raw_pixel_start = pixel; - cmd_pixel_end = pixel + (min(MAX_CMD_PIXELS + 1, - min((int)(pixel_end - pixel) / bpp, - (int)(cmd_buffer_end - cmd) / 2))) * bpp; + cmd_pixel_end = pixel + min3(MAX_CMD_PIXELS + 1UL, + (unsigned long)(pixel_end - pixel) / bpp, + (unsigned long)(cmd_buffer_end - 1 - cmd) / 2) * bpp; - prefetch_range((void *) pixel, (cmd_pixel_end - pixel) * bpp); + prefetch_range((void *) pixel, cmd_pixel_end - pixel); pixel_val16 = get_pixel_val16(pixel, bpp); while (pixel < cmd_pixel_end) { @@ -192,6 +192,9 @@ static void udl_compress_hline16( if (pixel > raw_pixel_start) { /* finalize last RAW span */ *raw_pixels_count_byte = ((pixel-raw_pixel_start) / bpp) & 0xFF; + } else { + /* undo unused byte */ + cmd--; } *cmd_pixels_count_byte = ((pixel - cmd_pixel_start) / bpp) & 0xFF; From 8ef97ef67ce0f8fc3d32c7218e6b412e479ee2ab Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Sat, 16 Jun 2018 20:21:45 -0400 Subject: [PATCH 128/970] jbd2: don't mark block as modified if the handle is out of credits commit e09463f220ca9a1a1ecfda84fcda658f99a1f12a upstream. Do not set the b_modified flag in block's journal head should not until after we're sure that jbd2_journal_dirty_metadat() will not abort with an error due to there not being enough space reserved in the jbd2 handle. Otherwise, future attempts to modify the buffer may lead a large number of spurious errors and warnings. This addresses CVE-2018-10883. https://bugzilla.kernel.org/show_bug.cgi?id=200071 Signed-off-by: Theodore Ts'o Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman --- fs/jbd2/transaction.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c index 9e9e0936138b..b320c1ba7fdc 100644 --- a/fs/jbd2/transaction.c +++ b/fs/jbd2/transaction.c @@ -1353,6 +1353,13 @@ int jbd2_journal_dirty_metadata(handle_t *handle, struct buffer_head *bh) if (jh->b_transaction == transaction && jh->b_jlist != BJ_Metadata) { jbd_lock_bh_state(bh); + if (jh->b_transaction == transaction && + jh->b_jlist != BJ_Metadata) + pr_err("JBD2: assertion failure: h_type=%u " + "h_line_no=%u block_no=%llu jlist=%u\n", + handle->h_type, handle->h_line_no, + (unsigned long long) bh->b_blocknr, + jh->b_jlist); J_ASSERT_JH(jh, jh->b_transaction != transaction || jh->b_jlist == BJ_Metadata); jbd_unlock_bh_state(bh); @@ -1372,11 +1379,11 @@ int jbd2_journal_dirty_metadata(handle_t *handle, struct buffer_head *bh) * of the transaction. This needs to be done * once a transaction -bzzz */ - jh->b_modified = 1; if (handle->h_buffer_credits <= 0) { ret = -ENOSPC; goto out_unlock_bh; } + jh->b_modified = 1; handle->h_buffer_credits--; } From 9e4842f2aa6c4b4340669730c90cb6fbf630ee42 Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Wed, 13 Jun 2018 23:08:26 -0400 Subject: [PATCH 129/970] ext4: make sure bitmaps and the inode table don't overlap with bg descriptors commit 77260807d1170a8cf35dbb06e07461a655f67eee upstream. It's really bad when the allocation bitmaps and the inode table overlap with the block group descriptors, since it causes random corruption of the bg descriptors. So we really want to head those off at the pass. https://bugzilla.kernel.org/show_bug.cgi?id=199865 Signed-off-by: Theodore Ts'o Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman --- fs/ext4/super.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index bfb83d76d128..6525127133e8 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -2231,6 +2231,7 @@ static int ext4_check_descriptors(struct super_block *sb, struct ext4_sb_info *sbi = EXT4_SB(sb); ext4_fsblk_t first_block = le32_to_cpu(sbi->s_es->s_first_data_block); ext4_fsblk_t last_block; + ext4_fsblk_t last_bg_block = sb_block + ext4_bg_num_gdb(sb, 0) + 1; ext4_fsblk_t block_bitmap; ext4_fsblk_t inode_bitmap; ext4_fsblk_t inode_table; @@ -2263,6 +2264,14 @@ static int ext4_check_descriptors(struct super_block *sb, if (!(sb->s_flags & MS_RDONLY)) return 0; } + if (block_bitmap >= sb_block + 1 && + block_bitmap <= last_bg_block) { + ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: " + "Block bitmap for group %u overlaps " + "block group descriptors", i); + if (!(sb->s_flags & MS_RDONLY)) + return 0; + } if (block_bitmap < first_block || block_bitmap > last_block) { ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: " "Block bitmap for group %u not in group " @@ -2277,6 +2286,14 @@ static int ext4_check_descriptors(struct super_block *sb, if (!(sb->s_flags & MS_RDONLY)) return 0; } + if (inode_bitmap >= sb_block + 1 && + inode_bitmap <= last_bg_block) { + ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: " + "Inode bitmap for group %u overlaps " + "block group descriptors", i); + if (!(sb->s_flags & MS_RDONLY)) + return 0; + } if (inode_bitmap < first_block || inode_bitmap > last_block) { ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: " "Inode bitmap for group %u not in group " @@ -2291,6 +2308,14 @@ static int ext4_check_descriptors(struct super_block *sb, if (!(sb->s_flags & MS_RDONLY)) return 0; } + if (inode_table >= sb_block + 1 && + inode_table <= last_bg_block) { + ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: " + "Inode table for group %u overlaps " + "block group descriptors", i); + if (!(sb->s_flags & MS_RDONLY)) + return 0; + } if (inode_table < first_block || inode_table + sbi->s_itb_per_group - 1 > last_block) { ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: " From cdde876fce2501828af33d5e4faa36c8919fc96a Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Wed, 13 Jun 2018 23:00:48 -0400 Subject: [PATCH 130/970] ext4: always check block group bounds in ext4_init_block_bitmap() commit 819b23f1c501b17b9694325471789e6b5cc2d0d2 upstream. Regardless of whether the flex_bg feature is set, we should always check to make sure the bits we are setting in the block bitmap are within the block group bounds. https://bugzilla.kernel.org/show_bug.cgi?id=199865 Signed-off-by: Theodore Ts'o Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman --- fs/ext4/balloc.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c index 6776f4aa3d12..badb4e075225 100644 --- a/fs/ext4/balloc.c +++ b/fs/ext4/balloc.c @@ -183,7 +183,6 @@ static int ext4_init_block_bitmap(struct super_block *sb, unsigned int bit, bit_max; struct ext4_sb_info *sbi = EXT4_SB(sb); ext4_fsblk_t start, tmp; - int flex_bg = 0; struct ext4_group_info *grp; J_ASSERT_BH(bh, buffer_locked(bh)); @@ -216,22 +215,19 @@ static int ext4_init_block_bitmap(struct super_block *sb, start = ext4_group_first_block_no(sb, block_group); - if (ext4_has_feature_flex_bg(sb)) - flex_bg = 1; - /* Set bits for block and inode bitmaps, and inode table */ tmp = ext4_block_bitmap(sb, gdp); - if (!flex_bg || ext4_block_in_group(sb, tmp, block_group)) + if (ext4_block_in_group(sb, tmp, block_group)) ext4_set_bit(EXT4_B2C(sbi, tmp - start), bh->b_data); tmp = ext4_inode_bitmap(sb, gdp); - if (!flex_bg || ext4_block_in_group(sb, tmp, block_group)) + if (ext4_block_in_group(sb, tmp, block_group)) ext4_set_bit(EXT4_B2C(sbi, tmp - start), bh->b_data); tmp = ext4_inode_table(sb, gdp); for (; tmp < ext4_inode_table(sb, gdp) + sbi->s_itb_per_group; tmp++) { - if (!flex_bg || ext4_block_in_group(sb, tmp, block_group)) + if (ext4_block_in_group(sb, tmp, block_group)) ext4_set_bit(EXT4_B2C(sbi, tmp - start), bh->b_data); } From 5ae57329580d6ceca97559ff030a5f0e91fa66fe Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Thu, 14 Jun 2018 00:58:00 -0400 Subject: [PATCH 131/970] ext4: only look at the bg_flags field if it is valid commit 8844618d8aa7a9973e7b527d038a2a589665002c upstream. The bg_flags field in the block group descripts is only valid if the uninit_bg or metadata_csum feature is enabled. We were not consistently looking at this field; fix this. Also block group #0 must never have uninitialized allocation bitmaps, or need to be zeroed, since that's where the root inode, and other special inodes are set up. Check for these conditions and mark the file system as corrupted if they are detected. This addresses CVE-2018-10876. https://bugzilla.kernel.org/show_bug.cgi?id=199403 Signed-off-by: Theodore Ts'o Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman --- fs/ext4/balloc.c | 11 ++++++++++- fs/ext4/ialloc.c | 14 ++++++++++++-- fs/ext4/mballoc.c | 6 ++++-- fs/ext4/super.c | 11 ++++++++++- 4 files changed, 36 insertions(+), 6 deletions(-) diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c index badb4e075225..ad13f07cf0d3 100644 --- a/fs/ext4/balloc.c +++ b/fs/ext4/balloc.c @@ -450,7 +450,16 @@ ext4_read_block_bitmap_nowait(struct super_block *sb, ext4_group_t block_group) goto verify; } ext4_lock_group(sb, block_group); - if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) { + if (ext4_has_group_desc_csum(sb) && + (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) { + if (block_group == 0) { + ext4_unlock_group(sb, block_group); + unlock_buffer(bh); + ext4_error(sb, "Block bitmap for bg 0 marked " + "uninitialized"); + err = -EFSCORRUPTED; + goto out; + } err = ext4_init_block_bitmap(sb, bh, block_group, desc); set_bitmap_uptodate(bh); set_buffer_uptodate(bh); diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index dcf63daefee0..460866b2166d 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -152,7 +152,16 @@ ext4_read_inode_bitmap(struct super_block *sb, ext4_group_t block_group) } ext4_lock_group(sb, block_group); - if (desc->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) { + if (ext4_has_group_desc_csum(sb) && + (desc->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT))) { + if (block_group == 0) { + ext4_unlock_group(sb, block_group); + unlock_buffer(bh); + ext4_error(sb, "Inode bitmap for bg 0 marked " + "uninitialized"); + err = -EFSCORRUPTED; + goto out; + } memset(bh->b_data, 0, (EXT4_INODES_PER_GROUP(sb) + 7) / 8); ext4_mark_bitmap_end(EXT4_INODES_PER_GROUP(sb), sb->s_blocksize * 8, bh->b_data); @@ -926,7 +935,8 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir, /* recheck and clear flag under lock if we still need to */ ext4_lock_group(sb, group); - if (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) { + if (ext4_has_group_desc_csum(sb) && + (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) { gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT); ext4_free_group_clusters_set(sb, gdp, ext4_free_clusters_after_init(sb, group, gdp)); diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 14bd37041e1a..53e1890660a2 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -2444,7 +2444,8 @@ int ext4_mb_add_groupinfo(struct super_block *sb, ext4_group_t group, * initialize bb_free to be able to skip * empty groups without initialization */ - if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) { + if (ext4_has_group_desc_csum(sb) && + (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) { meta_group_info[i]->bb_free = ext4_free_clusters_after_init(sb, group, desc); } else { @@ -2969,7 +2970,8 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac, #endif ext4_set_bits(bitmap_bh->b_data, ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len); - if (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) { + if (ext4_has_group_desc_csum(sb) && + (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) { gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT); ext4_free_group_clusters_set(sb, gdp, ext4_free_clusters_after_init(sb, diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 6525127133e8..4c5d9df1790a 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -3023,13 +3023,22 @@ static ext4_group_t ext4_has_uninit_itable(struct super_block *sb) ext4_group_t group, ngroups = EXT4_SB(sb)->s_groups_count; struct ext4_group_desc *gdp = NULL; + if (!ext4_has_group_desc_csum(sb)) + return ngroups; + for (group = 0; group < ngroups; group++) { gdp = ext4_get_group_desc(sb, group, NULL); if (!gdp) continue; - if (!(gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_ZEROED))) + if (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_ZEROED)) + continue; + if (group != 0) break; + ext4_error(sb, "Inode table for bg 0 marked as " + "needing zeroing"); + if (sb->s_flags & MS_RDONLY) + return ngroups; } return group; From 87dad44faabd45683fba94443471298f8809e8a8 Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Thu, 14 Jun 2018 12:55:10 -0400 Subject: [PATCH 132/970] ext4: verify the depth of extent tree in ext4_find_extent() commit bc890a60247171294acc0bd67d211fa4b88d40ba upstream. If there is a corupted file system where the claimed depth of the extent tree is -1, this can cause a massive buffer overrun leading to sadness. This addresses CVE-2018-10877. https://bugzilla.kernel.org/show_bug.cgi?id=199417 Signed-off-by: Theodore Ts'o Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman --- fs/ext4/ext4_extents.h | 1 + fs/ext4/extents.c | 6 ++++++ 2 files changed, 7 insertions(+) diff --git a/fs/ext4/ext4_extents.h b/fs/ext4/ext4_extents.h index 8ecf84b8f5a1..a284fb28944b 100644 --- a/fs/ext4/ext4_extents.h +++ b/fs/ext4/ext4_extents.h @@ -103,6 +103,7 @@ struct ext4_extent_header { }; #define EXT4_EXT_MAGIC cpu_to_le16(0xf30a) +#define EXT4_MAX_EXTENT_DEPTH 5 #define EXT4_EXTENT_TAIL_OFFSET(hdr) \ (sizeof(struct ext4_extent_header) + \ diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 63c702b4b24c..106a5bb3ae68 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -881,6 +881,12 @@ ext4_find_extent(struct inode *inode, ext4_lblk_t block, eh = ext_inode_hdr(inode); depth = ext_depth(inode); + if (depth < 0 || depth > EXT4_MAX_EXTENT_DEPTH) { + EXT4_ERROR_INODE(inode, "inode has invalid extent depth: %d", + depth); + ret = -EFSCORRUPTED; + goto err; + } if (path) { ext4_ext_drop_refs(path); From 2f135cc8c094152d361b3172d3f757369de28a08 Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Fri, 15 Jun 2018 12:27:16 -0400 Subject: [PATCH 133/970] ext4: include the illegal physical block in the bad map ext4_error msg commit bdbd6ce01a70f02e9373a584d0ae9538dcf0a121 upstream. Signed-off-by: Theodore Ts'o Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman --- fs/ext4/inode.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 7c025ee1276f..81b16c3eccd4 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -377,9 +377,9 @@ static int __check_block_validity(struct inode *inode, const char *func, if (!ext4_data_block_valid(EXT4_SB(inode->i_sb), map->m_pblk, map->m_len)) { ext4_error_inode(inode, func, line, map->m_pblk, - "lblock %lu mapped to illegal pblock " + "lblock %lu mapped to illegal pblock %llu " "(length %d)", (unsigned long) map->m_lblk, - map->m_len); + map->m_pblk, map->m_len); return -EFSCORRUPTED; } return 0; From a5e063d348bd2ef14fff96b129749409a8991ea5 Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Fri, 15 Jun 2018 12:28:16 -0400 Subject: [PATCH 134/970] ext4: clear i_data in ext4_inode_info when removing inline data commit 6e8ab72a812396996035a37e5ca4b3b99b5d214b upstream. When converting from an inode from storing the data in-line to a data block, ext4_destroy_inline_data_nolock() was only clearing the on-disk copy of the i_blocks[] array. It was not clearing copy of the i_blocks[] in ext4_inode_info, in i_data[], which is the copy actually used by ext4_map_blocks(). This didn't matter much if we are using extents, since the extents header would be invalid and thus the extents could would re-initialize the extents tree. But if we are using indirect blocks, the previous contents of the i_blocks array will be treated as block numbers, with potentially catastrophic results to the file system integrity and/or user data. This gets worse if the file system is using a 1k block size and s_first_data is zero, but even without this, the file system can get quite badly corrupted. This addresses CVE-2018-10881. https://bugzilla.kernel.org/show_bug.cgi?id=200015 Signed-off-by: Theodore Ts'o Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman --- fs/ext4/inline.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c index 73cbc01ef5ad..e6ac24de119d 100644 --- a/fs/ext4/inline.c +++ b/fs/ext4/inline.c @@ -434,6 +434,7 @@ static int ext4_destroy_inline_data_nolock(handle_t *handle, memset((void *)ext4_raw_inode(&is.iloc)->i_block, 0, EXT4_MIN_INLINE_DATA_SIZE); + memset(ei->i_data, 0, EXT4_MIN_INLINE_DATA_SIZE); if (ext4_has_feature_extents(inode->i_sb)) { if (S_ISDIR(inode->i_mode) || From 425dc465de3725210162da9b1e9062e86cc2de27 Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Sun, 17 Jun 2018 00:41:14 -0400 Subject: [PATCH 135/970] ext4: add more inode number paranoia checks commit c37e9e013469521d9adb932d17a1795c139b36db upstream. If there is a directory entry pointing to a system inode (such as a journal inode), complain and declare the file system to be corrupted. Also, if the superblock's first inode number field is too small, refuse to mount the file system. This addresses CVE-2018-10882. https://bugzilla.kernel.org/show_bug.cgi?id=200069 Signed-off-by: Theodore Ts'o Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman --- fs/ext4/ext4.h | 5 ----- fs/ext4/inode.c | 3 ++- fs/ext4/super.c | 5 +++++ 3 files changed, 7 insertions(+), 6 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index a8a750f59621..43e27d8ec770 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1542,11 +1542,6 @@ static inline struct timespec ext4_current_time(struct inode *inode) static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino) { return ino == EXT4_ROOT_INO || - ino == EXT4_USR_QUOTA_INO || - ino == EXT4_GRP_QUOTA_INO || - ino == EXT4_BOOT_LOADER_INO || - ino == EXT4_JOURNAL_INO || - ino == EXT4_RESIZE_INO || (ino >= EXT4_FIRST_INO(sb) && ino <= le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count)); } diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 81b16c3eccd4..5c4c9af4aaf4 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4242,7 +4242,8 @@ static int __ext4_get_inode_loc(struct inode *inode, int inodes_per_block, inode_offset; iloc->bh = NULL; - if (!ext4_valid_inum(sb, inode->i_ino)) + if (inode->i_ino < EXT4_ROOT_INO || + inode->i_ino > le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count)) return -EFSCORRUPTED; iloc->block_group = (inode->i_ino - 1) / EXT4_INODES_PER_GROUP(sb); diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 4c5d9df1790a..25e078e839fb 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -3713,6 +3713,11 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) } else { sbi->s_inode_size = le16_to_cpu(es->s_inode_size); sbi->s_first_ino = le32_to_cpu(es->s_first_ino); + if (sbi->s_first_ino < EXT4_GOOD_OLD_FIRST_INO) { + ext4_msg(sb, KERN_ERR, "invalid first ino: %u", + sbi->s_first_ino); + goto failed_mount; + } if ((sbi->s_inode_size < EXT4_GOOD_OLD_INODE_SIZE) || (!is_power_of_2(sbi->s_inode_size)) || (sbi->s_inode_size > blocksize)) { From eb13a42605ab8560b02fdeb117f119f327e4776d Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Sun, 17 Jun 2018 18:11:20 -0400 Subject: [PATCH 136/970] ext4: add more mount time checks of the superblock commit bfe0a5f47ada40d7984de67e59a7d3390b9b9ecc upstream. The kernel's ext4 mount-time checks were more permissive than e2fsprogs's libext2fs checks when opening a file system. The superblock is considered too insane for debugfs or e2fsck to operate on it, the kernel has no business trying to mount it. This will make file system fuzzing tools work harder, but the failure cases that they find will be more useful and be easier to evaluate. Signed-off-by: Theodore Ts'o Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman --- fs/ext4/super.c | 37 ++++++++++++++++++++++++++----------- 1 file changed, 26 insertions(+), 11 deletions(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 25e078e839fb..fff215a1f1a9 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -3656,6 +3656,13 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) le32_to_cpu(es->s_log_block_size)); goto failed_mount; } + if (le32_to_cpu(es->s_log_cluster_size) > + (EXT4_MAX_CLUSTER_LOG_SIZE - EXT4_MIN_BLOCK_LOG_SIZE)) { + ext4_msg(sb, KERN_ERR, + "Invalid log cluster size: %u", + le32_to_cpu(es->s_log_cluster_size)); + goto failed_mount; + } if (le16_to_cpu(sbi->s_es->s_reserved_gdt_blocks) > (blocksize / 4)) { ext4_msg(sb, KERN_ERR, @@ -3794,13 +3801,6 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) "block size (%d)", clustersize, blocksize); goto failed_mount; } - if (le32_to_cpu(es->s_log_cluster_size) > - (EXT4_MAX_CLUSTER_LOG_SIZE - EXT4_MIN_BLOCK_LOG_SIZE)) { - ext4_msg(sb, KERN_ERR, - "Invalid log cluster size: %u", - le32_to_cpu(es->s_log_cluster_size)); - goto failed_mount; - } sbi->s_cluster_bits = le32_to_cpu(es->s_log_cluster_size) - le32_to_cpu(es->s_log_block_size); sbi->s_clusters_per_group = @@ -3821,10 +3821,10 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) } } else { if (clustersize != blocksize) { - ext4_warning(sb, "fragment/cluster size (%d) != " - "block size (%d)", clustersize, - blocksize); - clustersize = blocksize; + ext4_msg(sb, KERN_ERR, + "fragment/cluster size (%d) != " + "block size (%d)", clustersize, blocksize); + goto failed_mount; } if (sbi->s_blocks_per_group > blocksize * 8) { ext4_msg(sb, KERN_ERR, @@ -3878,6 +3878,13 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) ext4_blocks_count(es)); goto failed_mount; } + if ((es->s_first_data_block == 0) && (es->s_log_block_size == 0) && + (sbi->s_cluster_ratio == 1)) { + ext4_msg(sb, KERN_WARNING, "bad geometry: first data " + "block is 0 with a 1k block and cluster size"); + goto failed_mount; + } + blocks_count = (ext4_blocks_count(es) - le32_to_cpu(es->s_first_data_block) + EXT4_BLOCKS_PER_GROUP(sb) - 1); @@ -3913,6 +3920,14 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) ret = -ENOMEM; goto failed_mount; } + if (((u64)sbi->s_groups_count * sbi->s_inodes_per_group) != + le32_to_cpu(es->s_inodes_count)) { + ext4_msg(sb, KERN_ERR, "inodes count not valid: %u vs %llu", + le32_to_cpu(es->s_inodes_count), + ((u64)sbi->s_groups_count * sbi->s_inodes_per_group)); + ret = -EINVAL; + goto failed_mount; + } bgl_lock_init(sbi->s_blockgroup_lock); From 917692c9cdf5ea62e783d08dfef40bfb982a9dca Mon Sep 17 00:00:00 2001 From: Jon Derrick Date: Mon, 2 Jul 2018 18:45:18 -0400 Subject: [PATCH 137/970] ext4: check superblock mapped prior to committing commit a17712c8e4be4fa5404d20e9cd3b2b21eae7bc56 upstream. This patch attempts to close a hole leading to a BUG seen with hot removals during writes [1]. A block device (NVME namespace in this test case) is formatted to EXT4 without partitions. It's mounted and write I/O is run to a file, then the device is hot removed from the slot. The superblock attempts to be written to the drive which is no longer present. The typical chain of events leading to the BUG: ext4_commit_super() __sync_dirty_buffer() submit_bh() submit_bh_wbc() BUG_ON(!buffer_mapped(bh)); This fix checks for the superblock's buffer head being mapped prior to syncing. [1] https://www.spinics.net/lists/linux-ext4/msg56527.html Signed-off-by: Jon Derrick Signed-off-by: Theodore Ts'o Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman --- fs/ext4/super.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index fff215a1f1a9..41ef83471ea5 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -4629,6 +4629,14 @@ static int ext4_commit_super(struct super_block *sb, int sync) if (!sbh || block_device_ejected(sb)) return error; + + /* + * The superblock bh should be mapped, but it might not be if the + * device was hot-removed. Not much we can do but fail the I/O. + */ + if (!buffer_mapped(sbh)) + return error; + /* * If the file system is mounted read-only, don't update the * superblock write time. This avoids updating the superblock From 2f1a56ef237d91bb649526292434fd60220d6b41 Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Sun, 8 Jul 2018 14:23:19 +0300 Subject: [PATCH 138/970] mlxsw: spectrum: Forbid linking of VLAN devices to devices that have uppers Jiri Slaby noticed that the backport of upstream commit 25cc72a33835 ("mlxsw: spectrum: Forbid linking to devices that have uppers") to kernel 4.9.y introduced the same check twice in the same function instead of in two different places. Fix this by relocating one of the checks to its intended place, thus preventing unsupported configurations as described in the original commit. Fixes: 73ee5a73e75f ("mlxsw: spectrum: Forbid linking to devices that have uppers") Signed-off-by: Ido Schimmel Reported-by: Jiri Slaby Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c index d50350c7adc4..22a5916e477e 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c @@ -4187,10 +4187,6 @@ static int mlxsw_sp_netdevice_port_upper_event(struct net_device *dev, if (netif_is_lag_port(dev) && is_vlan_dev(upper_dev) && !netif_is_lag_master(vlan_dev_real_dev(upper_dev))) return -EINVAL; - if (!info->linking) - break; - if (netdev_has_any_upper_dev(upper_dev)) - return -EINVAL; break; case NETDEV_CHANGEUPPER: upper_dev = info->upper_dev; @@ -4566,6 +4562,8 @@ static int mlxsw_sp_netdevice_vport_event(struct net_device *dev, return -EINVAL; if (!info->linking) break; + if (netdev_has_any_upper_dev(upper_dev)) + return -EINVAL; /* We can't have multiple VLAN interfaces configured on * the same port and being members in the same bridge. */ From 814b4302fda0c4fc09b023e18ca813d88c3350b6 Mon Sep 17 00:00:00 2001 From: Jason Andryuk Date: Fri, 22 Jun 2018 12:25:49 -0400 Subject: [PATCH 139/970] HID: i2c-hid: Fix "incomplete report" noise commit ef6eaf27274c0351f7059163918f3795da13199c upstream. Commit ac75a041048b ("HID: i2c-hid: fix size check and type usage") started writing messages when the ret_size is <= 2 from i2c_master_recv. However, my device i2c-DLL07D1 returns 2 for a short period of time (~0.5s) after I stop moving the pointing stick or touchpad. It varies, but you get ~50 messages each time which spams the log hard. [ 95.925055] i2c_hid i2c-DLL07D1:01: i2c_hid_get_input: incomplete report (83/2) This has also been observed with a i2c-ALP0017. [ 1781.266353] i2c_hid i2c-ALP0017:00: i2c_hid_get_input: incomplete report (30/2) Only print the message when ret_size is totally invalid and less than 2 to cut down on the log spam. Fixes: ac75a041048b ("HID: i2c-hid: fix size check and type usage") Reported-by: John Smith Cc: stable@vger.kernel.org Signed-off-by: Jason Andryuk Signed-off-by: Jiri Kosina Signed-off-by: Greg Kroah-Hartman --- drivers/hid/i2c-hid/i2c-hid.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/hid/i2c-hid/i2c-hid.c b/drivers/hid/i2c-hid/i2c-hid.c index 2548c5dbdc75..00bce002b357 100644 --- a/drivers/hid/i2c-hid/i2c-hid.c +++ b/drivers/hid/i2c-hid/i2c-hid.c @@ -477,7 +477,7 @@ static void i2c_hid_get_input(struct i2c_hid *ihid) return; } - if ((ret_size > size) || (ret_size <= 2)) { + if ((ret_size > size) || (ret_size < 2)) { dev_err(&ihid->client->dev, "%s: incomplete report (%d/%d)\n", __func__, size, ret_size); return; From 82e360cd6fb2055cb178f92934eb71d511b1aa47 Mon Sep 17 00:00:00 2001 From: "Gustavo A. R. Silva" Date: Fri, 29 Jun 2018 17:08:44 -0500 Subject: [PATCH 140/970] HID: hiddev: fix potential Spectre v1 commit 4f65245f2d178b9cba48350620d76faa4a098841 upstream. uref->field_index, uref->usage_index, finfo.field_index and cinfo.index can be indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: drivers/hid/usbhid/hiddev.c:473 hiddev_ioctl_usage() warn: potential spectre issue 'report->field' (local cap) drivers/hid/usbhid/hiddev.c:477 hiddev_ioctl_usage() warn: potential spectre issue 'field->usage' (local cap) drivers/hid/usbhid/hiddev.c:757 hiddev_ioctl() warn: potential spectre issue 'report->field' (local cap) drivers/hid/usbhid/hiddev.c:801 hiddev_ioctl() warn: potential spectre issue 'hid->collection' (local cap) Fix this by sanitizing such structure fields before using them to index report->field, field->usage and hid->collection Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2 Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva Signed-off-by: Jiri Kosina Signed-off-by: Greg Kroah-Hartman --- drivers/hid/usbhid/hiddev.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/drivers/hid/usbhid/hiddev.c b/drivers/hid/usbhid/hiddev.c index 700145b15088..b59b15d4caa9 100644 --- a/drivers/hid/usbhid/hiddev.c +++ b/drivers/hid/usbhid/hiddev.c @@ -35,6 +35,7 @@ #include #include #include +#include #include "usbhid.h" #ifdef CONFIG_USB_DYNAMIC_MINORS @@ -478,10 +479,14 @@ static noinline int hiddev_ioctl_usage(struct hiddev *hiddev, unsigned int cmd, if (uref->field_index >= report->maxfield) goto inval; + uref->field_index = array_index_nospec(uref->field_index, + report->maxfield); field = report->field[uref->field_index]; if (uref->usage_index >= field->maxusage) goto inval; + uref->usage_index = array_index_nospec(uref->usage_index, + field->maxusage); uref->usage_code = field->usage[uref->usage_index].hid; @@ -508,6 +513,8 @@ static noinline int hiddev_ioctl_usage(struct hiddev *hiddev, unsigned int cmd, if (uref->field_index >= report->maxfield) goto inval; + uref->field_index = array_index_nospec(uref->field_index, + report->maxfield); field = report->field[uref->field_index]; @@ -761,6 +768,8 @@ static long hiddev_ioctl(struct file *file, unsigned int cmd, unsigned long arg) if (finfo.field_index >= report->maxfield) break; + finfo.field_index = array_index_nospec(finfo.field_index, + report->maxfield); field = report->field[finfo.field_index]; memset(&finfo, 0, sizeof(finfo)); @@ -801,6 +810,8 @@ static long hiddev_ioctl(struct file *file, unsigned int cmd, unsigned long arg) if (cinfo.index >= hid->maxcollection) break; + cinfo.index = array_index_nospec(cinfo.index, + hid->maxcollection); cinfo.type = hid->collection[cinfo.index].type; cinfo.usage = hid->collection[cinfo.index].usage; From 4a30c12542290f1def08b9ef0d677c024c500589 Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Mon, 2 Jul 2018 16:59:37 -0700 Subject: [PATCH 141/970] HID: debug: check length before copy_to_user() commit 717adfdaf14704fd3ec7fa2c04520c0723247eac upstream. If our length is greater than the size of the buffer, we overflow the buffer Cc: stable@vger.kernel.org Signed-off-by: Daniel Rosenberg Reviewed-by: Benjamin Tissoires Signed-off-by: Jiri Kosina Signed-off-by: Greg Kroah-Hartman --- drivers/hid/hid-debug.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/hid/hid-debug.c b/drivers/hid/hid-debug.c index acfb522a432a..29423691c105 100644 --- a/drivers/hid/hid-debug.c +++ b/drivers/hid/hid-debug.c @@ -1152,6 +1152,8 @@ static ssize_t hid_debug_events_read(struct file *file, char __user *buffer, goto out; if (list->tail > list->head) { len = list->tail - list->head; + if (len > count) + len = count; if (copy_to_user(buffer + ret, &list->hid_debug_buf[list->head], len)) { ret = -EFAULT; @@ -1161,6 +1163,8 @@ static ssize_t hid_debug_events_read(struct file *file, char __user *buffer, list->head += len; } else { len = HID_DEBUG_BUFSIZE - list->head; + if (len > count) + len = count; if (copy_to_user(buffer, &list->hid_debug_buf[list->head], len)) { ret = -EFAULT; @@ -1168,7 +1172,9 @@ static ssize_t hid_debug_events_read(struct file *file, char __user *buffer, } list->head = 0; ret += len; - goto copy_rest; + count -= len; + if (count > 0) + goto copy_rest; } } From 6989d4079d60c20753413d1fc01547e098417cc7 Mon Sep 17 00:00:00 2001 From: Waldemar Rymarkiewicz Date: Thu, 14 Jun 2018 15:56:08 +0200 Subject: [PATCH 142/970] PM / OPP: Update voltage in case freq == old_freq commit c5c2a97b3ac7d1ec19e7cff9e38caca6afefc3de upstream. This commit fixes a rare but possible case when the clk rate is updated without update of the regulator voltage. At boot up, CPUfreq checks if the system is running at the right freq. This is a sanity check in case a bootloader set clk rate that is outside of freq table present with cpufreq core. In such cases system can be unstable so better to change it to a freq that is preset in freq-table. The CPUfreq takes next freq that is >= policy->cur and this is our target_freq that needs to be set now. dev_pm_opp_set_rate(dev, target_freq) checks the target_freq and the old_freq (a current rate). If these are equal it returns early. If not, it searches for OPP (old_opp) that fits best to old_freq (not listed in the table) and updates old_freq (!). Here, we can end up with old_freq = old_opp.rate = target_freq, which is not handled in _generic_set_opp_regulator(). It's supposed to update voltage only when freq > old_freq || freq > old_freq. if (freq > old_freq) { ret = _set_opp_voltage(dev, reg, new_supply); [...] if (freq < old_freq) { ret = _set_opp_voltage(dev, reg, new_supply); if (ret) It results in, no voltage update while clk rate is updated. Example: freq-table = { 1000MHz 1.15V 666MHZ 1.10V 333MHz 1.05V } boot-up-freq = 800MHz # not listed in freq-table freq = target_freq = 1GHz old_freq = 800Mhz old_opp = _find_freq_ceil(opp_table, &old_freq); #(old_freq is modified!) old_freq = 1GHz Fixes: 6a0712f6f199 ("PM / OPP: Add dev_pm_opp_set_rate()") Cc: 4.6+ # v4.6+ Signed-off-by: Waldemar Rymarkiewicz Signed-off-by: Viresh Kumar Signed-off-by: Greg Kroah-Hartman --- drivers/base/power/opp/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/base/power/opp/core.c b/drivers/base/power/opp/core.c index a7c5b79371a7..23ee46a0c78c 100644 --- a/drivers/base/power/opp/core.c +++ b/drivers/base/power/opp/core.c @@ -651,7 +651,7 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq) rcu_read_unlock(); /* Scaling up? Scale voltage before frequency */ - if (freq > old_freq) { + if (freq >= old_freq) { ret = _set_opp_voltage(dev, reg, u_volt, u_volt_min, u_volt_max); if (ret) From b5d7d7d919f1708f2645739d37dc19747ac0ab5b Mon Sep 17 00:00:00 2001 From: Rasmus Villemoes Date: Sun, 8 Apr 2018 23:35:28 +0200 Subject: [PATCH 143/970] Kbuild: fix # escaping in .cmd files for future Make MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 9564a8cf422d7b58f6e857e3546d346fa970191e upstream. I tried building using a freshly built Make (4.2.1-69-g8a731d1), but already the objtool build broke with orc_dump.c: In function ‘orc_dump’: orc_dump.c:106:2: error: ‘elf_getshnum’ is deprecated [-Werror=deprecated-declarations] if (elf_getshdrnum(elf, &nr_sections)) { Turns out that with that new Make, the backslash was not removed, so cpp didn't see a #include directive, grep found nothing, and -DLIBELF_USE_DEPRECATED was wrongly put in CFLAGS. Now, that new Make behaviour is documented in their NEWS file: * WARNING: Backward-incompatibility! Number signs (#) appearing inside a macro reference or function invocation no longer introduce comments and should not be escaped with backslashes: thus a call such as: foo := $(shell echo '#') is legal. Previously the number sign needed to be escaped, for example: foo := $(shell echo '\#') Now this latter will resolve to "\#". If you want to write makefiles portable to both versions, assign the number sign to a variable: C := \# foo := $(shell echo '$C') This was claimed to be fixed in 3.81, but wasn't, for some reason. To detect this change search for 'nocomment' in the .FEATURES variable. This also fixes up the two make-cmd instances to replace # with $(pound) rather than with \#. There might very well be other places that need similar fixup in preparation for whatever future Make release contains the above change, but at least this builds an x86_64 defconfig with the new make. Link: https://bugzilla.kernel.org/show_bug.cgi?id=197847 Cc: Randy Dunlap Signed-off-by: Rasmus Villemoes Signed-off-by: Masahiro Yamada Signed-off-by: Greg Kroah-Hartman --- scripts/Kbuild.include | 5 +++-- tools/build/Build.include | 5 +++-- tools/objtool/Makefile | 2 +- tools/scripts/Makefile.include | 2 ++ 4 files changed, 9 insertions(+), 5 deletions(-) diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include index 179219845dfc..63774307a751 100644 --- a/scripts/Kbuild.include +++ b/scripts/Kbuild.include @@ -8,6 +8,7 @@ squote := ' empty := space := $(empty) $(empty) space_escape := _-_SPACE_-_ +pound := \# ### # Name of target with a '.' as filename prefix. foo/bar.o => foo/.bar.o @@ -241,11 +242,11 @@ endif # Replace >$< with >$$< to preserve $ when reloading the .cmd file # (needed for make) -# Replace >#< with >\#< to avoid starting a comment in the .cmd file +# Replace >#< with >$(pound)< to avoid starting a comment in the .cmd file # (needed for make) # Replace >'< with >'\''< to be able to enclose the whole string in '...' # (needed for the shell) -make-cmd = $(call escsq,$(subst \#,\\\#,$(subst $$,$$$$,$(cmd_$(1))))) +make-cmd = $(call escsq,$(subst $(pound),$$(pound),$(subst $$,$$$$,$(cmd_$(1))))) # Find any prerequisites that is newer than target or that does not exist. # PHONY targets skipped in both cases. diff --git a/tools/build/Build.include b/tools/build/Build.include index 1dcb95e76f70..b8165545ddf6 100644 --- a/tools/build/Build.include +++ b/tools/build/Build.include @@ -12,6 +12,7 @@ # Convenient variables comma := , squote := ' +pound := \# ### # Name of target with a '.' as filename prefix. foo/bar.o => foo/.bar.o @@ -43,11 +44,11 @@ echo-cmd = $(if $($(quiet)cmd_$(1)),\ ### # Replace >$< with >$$< to preserve $ when reloading the .cmd file # (needed for make) -# Replace >#< with >\#< to avoid starting a comment in the .cmd file +# Replace >#< with >$(pound)< to avoid starting a comment in the .cmd file # (needed for make) # Replace >'< with >'\''< to be able to enclose the whole string in '...' # (needed for the shell) -make-cmd = $(call escsq,$(subst \#,\\\#,$(subst $$,$$$$,$(cmd_$(1))))) +make-cmd = $(call escsq,$(subst $(pound),$$(pound),$(subst $$,$$$$,$(cmd_$(1))))) ### # Find any prerequisites that is newer than target or that does not exist. diff --git a/tools/objtool/Makefile b/tools/objtool/Makefile index e6acc281dd37..8ae824dbfca3 100644 --- a/tools/objtool/Makefile +++ b/tools/objtool/Makefile @@ -35,7 +35,7 @@ CFLAGS += -Wall -Werror $(WARNINGS) -fomit-frame-pointer -O2 -g $(INCLUDES) LDFLAGS += -lelf $(LIBSUBCMD) # Allow old libelf to be used: -elfshdr := $(shell echo '\#include ' | $(CC) $(CFLAGS) -x c -E - | grep elf_getshdr) +elfshdr := $(shell echo '$(pound)include ' | $(CC) $(CFLAGS) -x c -E - | grep elf_getshdr) CFLAGS += $(if $(elfshdr),,-DLIBELF_USE_DEPRECATED) AWK = awk diff --git a/tools/scripts/Makefile.include b/tools/scripts/Makefile.include index 19edc1a7a232..7ea4438b801d 100644 --- a/tools/scripts/Makefile.include +++ b/tools/scripts/Makefile.include @@ -92,3 +92,5 @@ ifneq ($(silent),1) QUIET_INSTALL = @printf ' INSTALL %s\n' $1; endif endif + +pound := \# From d96a0d3cd5395aa0e4deb20e06e1ec2d723c426c Mon Sep 17 00:00:00 2001 From: Brad Love Date: Tue, 6 Mar 2018 14:15:34 -0500 Subject: [PATCH 144/970] media: cx25840: Use subdev host data for PLL override commit 3ee9bc12342cf546313d300808ff47d7dbb8e7db upstream. The cx25840 driver currently configures 885, 887, and 888 using default divisors for each chip. This check to see if the cx23885 driver has passed the cx25840 a non-default clock rate for a specific chip. If a cx23885 board has left clk_freq at 0, the clock default values will be used to configure the PLLs. This patch only has effect on 888 boards who set clk_freq to 25M. Signed-off-by: Brad Love Signed-off-by: Mauro Carvalho Chehab Cc: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- drivers/media/i2c/cx25840/cx25840-core.c | 28 +++++++++++++++++++----- 1 file changed, 22 insertions(+), 6 deletions(-) diff --git a/drivers/media/i2c/cx25840/cx25840-core.c b/drivers/media/i2c/cx25840/cx25840-core.c index d558ed3e59c6..cc5666050282 100644 --- a/drivers/media/i2c/cx25840/cx25840-core.c +++ b/drivers/media/i2c/cx25840/cx25840-core.c @@ -467,8 +467,13 @@ static void cx23885_initialize(struct i2c_client *client) { DEFINE_WAIT(wait); struct cx25840_state *state = to_state(i2c_get_clientdata(client)); + u32 clk_freq = 0; struct workqueue_struct *q; + /* cx23885 sets hostdata to clk_freq pointer */ + if (v4l2_get_subdev_hostdata(&state->sd)) + clk_freq = *((u32 *)v4l2_get_subdev_hostdata(&state->sd)); + /* * Come out of digital power down * The CX23888, at least, needs this, otherwise registers aside from @@ -504,8 +509,13 @@ static void cx23885_initialize(struct i2c_client *client) * 50.0 MHz * (0xb + 0xe8ba26/0x2000000)/4 = 5 * 28.636363 MHz * 572.73 MHz before post divide */ - /* HVR1850 or 50MHz xtal */ - cx25840_write(client, 0x2, 0x71); + if (clk_freq == 25000000) { + /* 888/ImpactVCBe or 25Mhz xtal */ + ; /* nothing to do */ + } else { + /* HVR1850 or 50MHz xtal */ + cx25840_write(client, 0x2, 0x71); + } cx25840_write4(client, 0x11c, 0x01d1744c); cx25840_write4(client, 0x118, 0x00000416); cx25840_write4(client, 0x404, 0x0010253e); @@ -548,9 +558,15 @@ static void cx23885_initialize(struct i2c_client *client) /* HVR1850 */ switch (state->id) { case CX23888_AV: - /* 888/HVR1250 specific */ - cx25840_write4(client, 0x10c, 0x13333333); - cx25840_write4(client, 0x108, 0x00000515); + if (clk_freq == 25000000) { + /* 888/ImpactVCBe or 25MHz xtal */ + cx25840_write4(client, 0x10c, 0x01b6db7b); + cx25840_write4(client, 0x108, 0x00000512); + } else { + /* 888/HVR1250 or 50MHz xtal */ + cx25840_write4(client, 0x10c, 0x13333333); + cx25840_write4(client, 0x108, 0x00000515); + } break; default: cx25840_write4(client, 0x10c, 0x002be2c9); @@ -580,7 +596,7 @@ static void cx23885_initialize(struct i2c_client *client) * 368.64 MHz before post divide * 122.88 MHz / 0xa = 12.288 MHz */ - /* HVR1850 or 50MHz xtal */ + /* HVR1850 or 50MHz xtal or 25MHz xtal */ cx25840_write4(client, 0x114, 0x017dbf48); cx25840_write4(client, 0x110, 0x000a030e); break; From 6cfbbdd2bcc950e21c293019a1f9c40458405891 Mon Sep 17 00:00:00 2001 From: Vlastimil Babka Date: Thu, 7 Jun 2018 17:09:29 -0700 Subject: [PATCH 145/970] mm, page_alloc: do not break __GFP_THISNODE by zonelist reset commit 7810e6781e0fcbca78b91cf65053f895bf59e85f upstream. In __alloc_pages_slowpath() we reset zonelist and preferred_zoneref for allocations that can ignore memory policies. The zonelist is obtained from current CPU's node. This is a problem for __GFP_THISNODE allocations that want to allocate on a different node, e.g. because the allocating thread has been migrated to a different CPU. This has been observed to break SLAB in our 4.4-based kernel, because there it relies on __GFP_THISNODE working as intended. If a slab page is put on wrong node's list, then further list manipulations may corrupt the list because page_to_nid() is used to determine which node's list_lock should be locked and thus we may take a wrong lock and race. Current SLAB implementation seems to be immune by luck thanks to commit 511e3a058812 ("mm/slab: make cache_grow() handle the page allocated on arbitrary node") but there may be others assuming that __GFP_THISNODE works as promised. We can fix it by simply removing the zonelist reset completely. There is actually no reason to reset it, because memory policies and cpusets don't affect the zonelist choice in the first place. This was different when commit 183f6371aac2 ("mm: ignore mempolicies when using ALLOC_NO_WATERMARK") introduced the code, as mempolicies provided their own restricted zonelists. We might consider this for 4.17 although I don't know if there's anything currently broken. SLAB is currently not affected, but in kernels older than 4.7 that don't yet have 511e3a058812 ("mm/slab: make cache_grow() handle the page allocated on arbitrary node") it is. That's at least 4.4 LTS. Older ones I'll have to check. So stable backports should be more important, but will have to be reviewed carefully, as the code went through many changes. BTW I think that also the ac->preferred_zoneref reset is currently useless if we don't also reset ac->nodemask from a mempolicy to NULL first (which we probably should for the OOM victims etc?), but I would leave that for a separate patch. Link: http://lkml.kernel.org/r/20180525130853.13915-1-vbabka@suse.cz Signed-off-by: Vlastimil Babka Fixes: 183f6371aac2 ("mm: ignore mempolicies when using ALLOC_NO_WATERMARK") Acked-by: Mel Gorman Cc: Michal Hocko Cc: David Rientjes Cc: Joonsoo Kim Cc: Vlastimil Babka Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- mm/page_alloc.c | 1 - 1 file changed, 1 deletion(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 94018ea5f935..28240ce475d6 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3642,7 +3642,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, * orientated. */ if (!(alloc_flags & ALLOC_CPUSET) || (alloc_flags & ALLOC_NO_WATERMARKS)) { - ac->zonelist = node_zonelist(numa_node_id(), gfp_mask); ac->preferred_zoneref = first_zones_zonelist(ac->zonelist, ac->high_zoneidx, ac->nodemask); } From 0758c35b535fc166e743b05287bf395e2241fa60 Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Thu, 17 Nov 2016 11:24:20 -0800 Subject: [PATCH 146/970] dm bufio: avoid sleeping while holding the dm_bufio lock commit 9ea61cac0b1ad0c09022f39fd97e9b99a2cfc2dc upstream. We've seen in-field reports showing _lots_ (18 in one case, 41 in another) of tasks all sitting there blocked on: mutex_lock+0x4c/0x68 dm_bufio_shrink_count+0x38/0x78 shrink_slab.part.54.constprop.65+0x100/0x464 shrink_zone+0xa8/0x198 In the two cases analyzed, we see one task that looks like this: Workqueue: kverityd verity_prefetch_io __switch_to+0x9c/0xa8 __schedule+0x440/0x6d8 schedule+0x94/0xb4 schedule_timeout+0x204/0x27c schedule_timeout_uninterruptible+0x44/0x50 wait_iff_congested+0x9c/0x1f0 shrink_inactive_list+0x3a0/0x4cc shrink_lruvec+0x418/0x5cc shrink_zone+0x88/0x198 try_to_free_pages+0x51c/0x588 __alloc_pages_nodemask+0x648/0xa88 __get_free_pages+0x34/0x7c alloc_buffer+0xa4/0x144 __bufio_new+0x84/0x278 dm_bufio_prefetch+0x9c/0x154 verity_prefetch_io+0xe8/0x10c process_one_work+0x240/0x424 worker_thread+0x2fc/0x424 kthread+0x10c/0x114 ...and that looks to be the one holding the mutex. The problem has been reproduced on fairly easily: 0. Be running Chrome OS w/ verity enabled on the root filesystem 1. Pick test patch: http://crosreview.com/412360 2. Install launchBalloons.sh and balloon.arm from http://crbug.com/468342 ...that's just a memory stress test app. 3. On a 4GB rk3399 machine, run nice ./launchBalloons.sh 4 900 100000 ...that tries to eat 4 * 900 MB of memory and keep accessing. 4. Login to the Chrome web browser and restore many tabs With that, I've seen printouts like: DOUG: long bufio 90758 ms ...and stack trace always show's we're in dm_bufio_prefetch(). The problem is that we try to allocate memory with GFP_NOIO while we're holding the dm_bufio lock. Instead we should be using GFP_NOWAIT. Using GFP_NOIO can cause us to sleep while holding the lock and that causes the above problems. The current behavior explained by David Rientjes: It will still try reclaim initially because __GFP_WAIT (or __GFP_KSWAPD_RECLAIM) is set by GFP_NOIO. This is the cause of contention on dm_bufio_lock() that the thread holds. You want to pass GFP_NOWAIT instead of GFP_NOIO to alloc_buffer() when holding a mutex that can be contended by a concurrent slab shrinker (if count_objects didn't use a trylock, this pattern would trivially deadlock). This change significantly increases responsiveness of the system while in this state. It makes a real difference because it unblocks kswapd. In the bug report analyzed, kswapd was hung: kswapd0 D ffffffc000204fd8 0 72 2 0x00000000 Call trace: [] __switch_to+0x9c/0xa8 [] __schedule+0x440/0x6d8 [] schedule+0x94/0xb4 [] schedule_preempt_disabled+0x28/0x44 [] __mutex_lock_slowpath+0x120/0x1ac [] mutex_lock+0x4c/0x68 [] dm_bufio_shrink_count+0x38/0x78 [] shrink_slab.part.54.constprop.65+0x100/0x464 [] shrink_zone+0xa8/0x198 [] balance_pgdat+0x328/0x508 [] kswapd+0x424/0x51c [] kthread+0x10c/0x114 [] ret_from_fork+0x10/0x40 By unblocking kswapd memory pressure should be reduced. Suggested-by: David Rientjes Reviewed-by: Guenter Roeck Signed-off-by: Douglas Anderson Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman --- drivers/md/dm-bufio.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c index 35fd57fdeba9..6deb1feff5c7 100644 --- a/drivers/md/dm-bufio.c +++ b/drivers/md/dm-bufio.c @@ -824,7 +824,8 @@ static struct dm_buffer *__alloc_buffer_wait_no_callback(struct dm_bufio_client * dm-bufio is resistant to allocation failures (it just keeps * one buffer reserved in cases all the allocations fail). * So set flags to not try too hard: - * GFP_NOIO: don't recurse into the I/O layer + * GFP_NOWAIT: don't wait; if we need to sleep we'll release our + * mutex and wait ourselves. * __GFP_NORETRY: don't retry and rather return failure * __GFP_NOMEMALLOC: don't use emergency reserves * __GFP_NOWARN: don't print a warning in case of failure @@ -834,7 +835,7 @@ static struct dm_buffer *__alloc_buffer_wait_no_callback(struct dm_bufio_client */ while (1) { if (dm_bufio_cache_size_latch != 1) { - b = alloc_buffer(c, GFP_NOIO | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN); + b = alloc_buffer(c, GFP_NOWAIT | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN); if (b) return b; } From 34d2fe724aeed1d61dd524d1062cf428f5bca2f1 Mon Sep 17 00:00:00 2001 From: Mikulas Patocka Date: Wed, 23 Nov 2016 17:04:00 -0500 Subject: [PATCH 147/970] dm bufio: drop the lock when doing GFP_NOIO allocation commit 41c73a49df31151f4ff868f28fe4f129f113fa2c upstream. If the first allocation attempt using GFP_NOWAIT fails, drop the lock and retry using GFP_NOIO allocation (lock is dropped because the allocation can take some time). Note that we won't do GFP_NOIO allocation when we loop for the second time, because the lock shouldn't be dropped between __wait_for_free_buffer and __get_unclaimed_buffer. Signed-off-by: Mikulas Patocka Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman --- drivers/md/dm-bufio.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c index 6deb1feff5c7..57b01bc3c70b 100644 --- a/drivers/md/dm-bufio.c +++ b/drivers/md/dm-bufio.c @@ -819,6 +819,7 @@ enum new_flag { static struct dm_buffer *__alloc_buffer_wait_no_callback(struct dm_bufio_client *c, enum new_flag nf) { struct dm_buffer *b; + bool tried_noio_alloc = false; /* * dm-bufio is resistant to allocation failures (it just keeps @@ -843,6 +844,15 @@ static struct dm_buffer *__alloc_buffer_wait_no_callback(struct dm_bufio_client if (nf == NF_PREFETCH) return NULL; + if (dm_bufio_cache_size_latch != 1 && !tried_noio_alloc) { + dm_bufio_unlock(c); + b = alloc_buffer(c, GFP_NOIO | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN); + dm_bufio_lock(c); + if (b) + return b; + tried_noio_alloc = true; + } + if (!list_empty(&c->reserved_buffers)) { b = list_entry(c->reserved_buffers.next, struct dm_buffer, lru_list); From 9d1304f5816db4cc96cab6e5cceb75054cb841c5 Mon Sep 17 00:00:00 2001 From: Martin Kaiser Date: Mon, 18 Jun 2018 22:41:03 +0200 Subject: [PATCH 148/970] mtd: rawnand: mxc: set spare area size register explicitly commit 3f77f244d8ec28e3a0a81240ffac7d626390060c upstream. The v21 version of the NAND flash controller contains a Spare Area Size Register (SPAS) at offset 0x10. Its setting defaults to the maximum spare area size of 218 bytes. The size that is set in this register is used by the controller when it calculates the ECC bytes internally in hardware. Usually, this register is updated from settings in the IIM fuses when the system is booting from NAND flash. For other boot media, however, the SPAS register remains at the default setting, which may not work for the particular flash chip on the board. The same goes for flash chips whose configuration cannot be set in the IIM fuses (e.g. chips with 2k sector size and 128 bytes spare area size can't be configured in the IIM fuses on imx25 systems). Set the SPAS register explicitly during the preset operation. Derive the register value from mtd->oobsize that was detected during probe by decoding the flash chip's ID bytes. While at it, rename the define for the spare area register's offset to NFC_V21_RSLTSPARE_AREA. The register at offset 0x10 on v1 controllers is different from the register on v21 controllers. Fixes: d484018 ("mtd: mxc_nand: set NFC registers after reset") Cc: stable@vger.kernel.org Signed-off-by: Martin Kaiser Reviewed-by: Sascha Hauer Reviewed-by: Miquel Raynal Signed-off-by: Boris Brezillon Signed-off-by: Greg Kroah-Hartman --- drivers/mtd/nand/mxc_nand.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/mtd/nand/mxc_nand.c b/drivers/mtd/nand/mxc_nand.c index 0c84ee80e5b6..5c44eb57885b 100644 --- a/drivers/mtd/nand/mxc_nand.c +++ b/drivers/mtd/nand/mxc_nand.c @@ -48,7 +48,7 @@ #define NFC_V1_V2_CONFIG (host->regs + 0x0a) #define NFC_V1_V2_ECC_STATUS_RESULT (host->regs + 0x0c) #define NFC_V1_V2_RSLTMAIN_AREA (host->regs + 0x0e) -#define NFC_V1_V2_RSLTSPARE_AREA (host->regs + 0x10) +#define NFC_V21_RSLTSPARE_AREA (host->regs + 0x10) #define NFC_V1_V2_WRPROT (host->regs + 0x12) #define NFC_V1_UNLOCKSTART_BLKADDR (host->regs + 0x14) #define NFC_V1_UNLOCKEND_BLKADDR (host->regs + 0x16) @@ -1121,6 +1121,9 @@ static void preset_v2(struct mtd_info *mtd) writew(config1, NFC_V1_V2_CONFIG1); /* preset operation */ + /* spare area size in 16-bit half-words */ + writew(mtd->oobsize / 2, NFC_V21_RSLTSPARE_AREA); + /* Unlock the internal RAM Buffer */ writew(0x2, NFC_V1_V2_CONFIG); From 4779184af75437c83c7ac5f257fa0db1fe1ed9b7 Mon Sep 17 00:00:00 2001 From: Mikulas Patocka Date: Wed, 23 Nov 2016 16:52:01 -0500 Subject: [PATCH 149/970] dm bufio: don't take the lock in dm_bufio_shrink_count commit d12067f428c037b4575aaeb2be00847fc214c24a upstream. dm_bufio_shrink_count() is called from do_shrink_slab to find out how many freeable objects are there. The reported value doesn't have to be precise, so we don't need to take the dm-bufio lock. Suggested-by: David Rientjes Signed-off-by: Mikulas Patocka Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman --- drivers/md/dm-bufio.c | 16 ++++------------ 1 file changed, 4 insertions(+), 12 deletions(-) diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c index 57b01bc3c70b..c837defb5e4d 100644 --- a/drivers/md/dm-bufio.c +++ b/drivers/md/dm-bufio.c @@ -1598,19 +1598,11 @@ dm_bufio_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) static unsigned long dm_bufio_shrink_count(struct shrinker *shrink, struct shrink_control *sc) { - struct dm_bufio_client *c; - unsigned long count; - unsigned long retain_target; + struct dm_bufio_client *c = container_of(shrink, struct dm_bufio_client, shrinker); + unsigned long count = READ_ONCE(c->n_buffers[LIST_CLEAN]) + + READ_ONCE(c->n_buffers[LIST_DIRTY]); + unsigned long retain_target = get_retain_buffers(c); - c = container_of(shrink, struct dm_bufio_client, shrinker); - if (sc->gfp_mask & __GFP_FS) - dm_bufio_lock(c); - else if (!dm_bufio_trylock(c)) - return 0; - - count = c->n_buffers[LIST_CLEAN] + c->n_buffers[LIST_DIRTY]; - retain_target = get_retain_buffers(c); - dm_bufio_unlock(c); return (count < retain_target) ? 0 : (count - retain_target); } From c2f163e35f2eab7d38c5e3fb14d5a7e902fb6d8a Mon Sep 17 00:00:00 2001 From: Tokunori Ikegami Date: Wed, 30 May 2018 18:32:27 +0900 Subject: [PATCH 150/970] mtd: cfi_cmdset_0002: Change definition naming to retry write operation commit 85a82e28b023de9b259a86824afbd6ba07bd6475 upstream. The definition can be used for other program and erase operations also. So change the naming to MAX_RETRIES from MAX_WORD_RETRIES. Signed-off-by: Tokunori Ikegami Reviewed-by: Joakim Tjernlund Cc: Chris Packham Cc: Brian Norris Cc: David Woodhouse Cc: Boris Brezillon Cc: Marek Vasut Cc: Richard Weinberger Cc: Cyrille Pitchen Cc: linux-mtd@lists.infradead.org Cc: stable@vger.kernel.org Signed-off-by: Boris Brezillon Signed-off-by: Greg Kroah-Hartman --- drivers/mtd/chips/cfi_cmdset_0002.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/mtd/chips/cfi_cmdset_0002.c b/drivers/mtd/chips/cfi_cmdset_0002.c index 33d025e42793..1065388e6287 100644 --- a/drivers/mtd/chips/cfi_cmdset_0002.c +++ b/drivers/mtd/chips/cfi_cmdset_0002.c @@ -42,7 +42,7 @@ #define AMD_BOOTLOC_BUG #define FORCE_WORD_WRITE 0 -#define MAX_WORD_RETRIES 3 +#define MAX_RETRIES 3 #define SST49LF004B 0x0060 #define SST49LF040B 0x0050 @@ -1643,7 +1643,7 @@ static int __xipram do_write_oneword(struct map_info *map, struct flchip *chip, map_write( map, CMD(0xF0), chip->start ); /* FIXME - should have reset delay before continuing */ - if (++retry_cnt <= MAX_WORD_RETRIES) + if (++retry_cnt <= MAX_RETRIES) goto retry; ret = -EIO; @@ -2102,7 +2102,7 @@ static int do_panic_write_oneword(struct map_info *map, struct flchip *chip, map_write(map, CMD(0xF0), chip->start); /* FIXME - should have reset delay before continuing */ - if (++retry_cnt <= MAX_WORD_RETRIES) + if (++retry_cnt <= MAX_RETRIES) goto retry; ret = -EIO; From ed1746148b3eabb8d9019dfdbbfc8f59f2917980 Mon Sep 17 00:00:00 2001 From: Tokunori Ikegami Date: Wed, 30 May 2018 18:32:28 +0900 Subject: [PATCH 151/970] mtd: cfi_cmdset_0002: Change erase functions to retry for error commit 45f75b8a919a4255f52df454f1ffdee0e42443b2 upstream. For the word write functions it is retried for error. But it is not implemented to retry for the erase functions. To make sure for the erase functions change to retry as same. This is needed to prevent the flash erase error caused only once. It was caused by the error case of chip_good() in the do_erase_oneblock(). Also it was confirmed on the MACRONIX flash device MX29GL512FHT2I-11G. But the error issue behavior is not able to reproduce at this moment. The flash controller is parallel Flash interface integrated on BCM53003. Signed-off-by: Tokunori Ikegami Reviewed-by: Joakim Tjernlund Cc: Chris Packham Cc: Brian Norris Cc: David Woodhouse Cc: Boris Brezillon Cc: Marek Vasut Cc: Richard Weinberger Cc: Cyrille Pitchen Cc: linux-mtd@lists.infradead.org Cc: stable@vger.kernel.org Signed-off-by: Boris Brezillon Signed-off-by: Greg Kroah-Hartman --- drivers/mtd/chips/cfi_cmdset_0002.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/drivers/mtd/chips/cfi_cmdset_0002.c b/drivers/mtd/chips/cfi_cmdset_0002.c index 1065388e6287..ccfbb41ed322 100644 --- a/drivers/mtd/chips/cfi_cmdset_0002.c +++ b/drivers/mtd/chips/cfi_cmdset_0002.c @@ -2237,6 +2237,7 @@ static int __xipram do_erase_chip(struct map_info *map, struct flchip *chip) unsigned long int adr; DECLARE_WAITQUEUE(wait, current); int ret = 0; + int retry_cnt = 0; adr = cfi->addr_unlock1; @@ -2254,6 +2255,7 @@ static int __xipram do_erase_chip(struct map_info *map, struct flchip *chip) ENABLE_VPP(map); xip_disable(map, chip, adr); + retry: cfi_send_gen_cmd(0xAA, cfi->addr_unlock1, chip->start, map, cfi, cfi->device_type, NULL); cfi_send_gen_cmd(0x55, cfi->addr_unlock2, chip->start, map, cfi, cfi->device_type, NULL); cfi_send_gen_cmd(0x80, cfi->addr_unlock1, chip->start, map, cfi, cfi->device_type, NULL); @@ -2308,6 +2310,9 @@ static int __xipram do_erase_chip(struct map_info *map, struct flchip *chip) map_write( map, CMD(0xF0), chip->start ); /* FIXME - should have reset delay before continuing */ + if (++retry_cnt <= MAX_RETRIES) + goto retry; + ret = -EIO; } @@ -2327,6 +2332,7 @@ static int __xipram do_erase_oneblock(struct map_info *map, struct flchip *chip, unsigned long timeo = jiffies + HZ; DECLARE_WAITQUEUE(wait, current); int ret = 0; + int retry_cnt = 0; adr += chip->start; @@ -2344,6 +2350,7 @@ static int __xipram do_erase_oneblock(struct map_info *map, struct flchip *chip, ENABLE_VPP(map); xip_disable(map, chip, adr); + retry: cfi_send_gen_cmd(0xAA, cfi->addr_unlock1, chip->start, map, cfi, cfi->device_type, NULL); cfi_send_gen_cmd(0x55, cfi->addr_unlock2, chip->start, map, cfi, cfi->device_type, NULL); cfi_send_gen_cmd(0x80, cfi->addr_unlock1, chip->start, map, cfi, cfi->device_type, NULL); @@ -2401,6 +2408,9 @@ static int __xipram do_erase_oneblock(struct map_info *map, struct flchip *chip, map_write( map, CMD(0xF0), chip->start ); /* FIXME - should have reset delay before continuing */ + if (++retry_cnt <= MAX_RETRIES) + goto retry; + ret = -EIO; } From a0239d83e1cb60de5e78452d4708c083b9e3dcbe Mon Sep 17 00:00:00 2001 From: Tokunori Ikegami Date: Wed, 30 May 2018 18:32:29 +0900 Subject: [PATCH 152/970] mtd: cfi_cmdset_0002: Change erase functions to check chip good only commit 79ca484b613041ca223f74b34608bb6f5221724b upstream. Currently the functions use to check both chip ready and good. But the chip ready is not enough to check the operation status. So change this to check the chip good instead of this. About the retry functions to make sure the error handling remain it. Signed-off-by: Tokunori Ikegami Reviewed-by: Joakim Tjernlund Cc: Chris Packham Cc: Brian Norris Cc: David Woodhouse Cc: Boris Brezillon Cc: Marek Vasut Cc: Richard Weinberger Cc: Cyrille Pitchen Cc: linux-mtd@lists.infradead.org Cc: stable@vger.kernel.org Signed-off-by: Boris Brezillon Signed-off-by: Greg Kroah-Hartman --- drivers/mtd/chips/cfi_cmdset_0002.c | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/drivers/mtd/chips/cfi_cmdset_0002.c b/drivers/mtd/chips/cfi_cmdset_0002.c index ccfbb41ed322..de35a2a362f9 100644 --- a/drivers/mtd/chips/cfi_cmdset_0002.c +++ b/drivers/mtd/chips/cfi_cmdset_0002.c @@ -2292,12 +2292,13 @@ static int __xipram do_erase_chip(struct map_info *map, struct flchip *chip) chip->erase_suspended = 0; } - if (chip_ready(map, adr)) + if (chip_good(map, adr, map_word_ff(map))) break; if (time_after(jiffies, timeo)) { printk(KERN_WARNING "MTD %s(): software timeout\n", __func__ ); + ret = -EIO; break; } @@ -2305,15 +2306,15 @@ static int __xipram do_erase_chip(struct map_info *map, struct flchip *chip) UDELAY(map, chip, adr, 1000000/HZ); } /* Did we succeed? */ - if (!chip_good(map, adr, map_word_ff(map))) { + if (ret) { /* reset on all failures. */ map_write( map, CMD(0xF0), chip->start ); /* FIXME - should have reset delay before continuing */ - if (++retry_cnt <= MAX_RETRIES) + if (++retry_cnt <= MAX_RETRIES) { + ret = 0; goto retry; - - ret = -EIO; + } } chip->state = FL_READY; @@ -2387,7 +2388,7 @@ static int __xipram do_erase_oneblock(struct map_info *map, struct flchip *chip, chip->erase_suspended = 0; } - if (chip_ready(map, adr)) { + if (chip_good(map, adr, map_word_ff(map))) { xip_enable(map, chip, adr); break; } @@ -2396,6 +2397,7 @@ static int __xipram do_erase_oneblock(struct map_info *map, struct flchip *chip, xip_enable(map, chip, adr); printk(KERN_WARNING "MTD %s(): software timeout\n", __func__ ); + ret = -EIO; break; } @@ -2403,15 +2405,15 @@ static int __xipram do_erase_oneblock(struct map_info *map, struct flchip *chip, UDELAY(map, chip, adr, 1000000/HZ); } /* Did we succeed? */ - if (!chip_good(map, adr, map_word_ff(map))) { + if (ret) { /* reset on all failures. */ map_write( map, CMD(0xF0), chip->start ); /* FIXME - should have reset delay before continuing */ - if (++retry_cnt <= MAX_RETRIES) + if (++retry_cnt <= MAX_RETRIES) { + ret = 0; goto retry; - - ret = -EIO; + } } chip->state = FL_READY; From 1712fae9489eaf16de9592e654d2d92a863fc06e Mon Sep 17 00:00:00 2001 From: Jann Horn Date: Mon, 25 Jun 2018 17:22:00 +0200 Subject: [PATCH 153/970] netfilter: nf_log: don't hold nf_log_mutex during user access commit ce00bf07cc95a57cd20b208e02b3c2604e532ae8 upstream. The old code would indefinitely block other users of nf_log_mutex if a userspace access in proc_dostring() blocked e.g. due to a userfaultfd region. Fix it by moving proc_dostring() out of the locked region. This is a followup to commit 266d07cb1c9a ("netfilter: nf_log: fix sleeping function called from invalid context"), which changed this code from using rcu_read_lock() to taking nf_log_mutex. Fixes: 266d07cb1c9a ("netfilter: nf_log: fix sleeping function calle[...]") Signed-off-by: Jann Horn Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman --- net/netfilter/nf_log.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/net/netfilter/nf_log.c b/net/netfilter/nf_log.c index ffb9e8ada899..e02fed784cd0 100644 --- a/net/netfilter/nf_log.c +++ b/net/netfilter/nf_log.c @@ -444,14 +444,17 @@ static int nf_log_proc_dostring(struct ctl_table *table, int write, rcu_assign_pointer(net->nf.nf_loggers[tindex], logger); mutex_unlock(&nf_log_mutex); } else { + struct ctl_table tmp = *table; + + tmp.data = buf; mutex_lock(&nf_log_mutex); logger = nft_log_dereference(net->nf.nf_loggers[tindex]); if (!logger) - table->data = "NONE"; + strlcpy(buf, "NONE", sizeof(buf)); else - table->data = logger->name; - r = proc_dostring(table, write, buffer, lenp, ppos); + strlcpy(buf, logger->name, sizeof(buf)); mutex_unlock(&nf_log_mutex); + r = proc_dostring(&tmp, write, buffer, lenp, ppos); } return r; From e31cd420e1be4e701292078e0840df0fe4c0cd8b Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Tue, 5 Jun 2018 12:36:30 +0300 Subject: [PATCH 154/970] staging: comedi: quatech_daqp_cs: fix no-op loop daqp_ao_insn_write() commit 1376b0a2160319125c3a2822e8c09bd283cd8141 upstream. There is a '>' vs '<' typo so this loop is a no-op. Fixes: d35dcc89fc93 ("staging: comedi: quatech_daqp_cs: fix daqp_ao_insn_write()") Signed-off-by: Dan Carpenter Reviewed-by: Ian Abbott Signed-off-by: Greg Kroah-Hartman --- drivers/staging/comedi/drivers/quatech_daqp_cs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/staging/comedi/drivers/quatech_daqp_cs.c b/drivers/staging/comedi/drivers/quatech_daqp_cs.c index 802f51e46405..171960568356 100644 --- a/drivers/staging/comedi/drivers/quatech_daqp_cs.c +++ b/drivers/staging/comedi/drivers/quatech_daqp_cs.c @@ -642,7 +642,7 @@ static int daqp_ao_insn_write(struct comedi_device *dev, /* Make sure D/A update mode is direct update */ outb(0, dev->iobase + DAQP_AUX_REG); - for (i = 0; i > insn->n; i++) { + for (i = 0; i < insn->n; i++) { unsigned int val = data[i]; int ret; From 060744011e93679f03932f050619744be895b772 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Wed, 11 Jul 2018 16:26:46 +0200 Subject: [PATCH 155/970] Linux 4.9.112 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index b10646531fcd..c4544293db10 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 4 PATCHLEVEL = 9 -SUBLEVEL = 111 +SUBLEVEL = 112 EXTRAVERSION = NAME = Roaring Lionus From 93e54f40c89336adc28b4125529bb75d4b565a70 Mon Sep 17 00:00:00 2001 From: Scott Bauer Date: Tue, 25 Jul 2017 10:27:06 -0600 Subject: [PATCH 156/970] nvme: validate admin queue before unquiesce commit 7dd1ab163c17e11473a65b11f7e748db30618ebb upstream. With a misbehaving controller it's possible we'll never enter the live state and create an admin queue. When we fail out of reset work it's possible we failed out early enough without setting up the admin queue. We tear down queues after a failed reset, but needed to do some more sanitization. Fixes 443bd90f2cca: "nvme: host: unquiesce queue in nvme_kill_queues()" [ 189.650995] nvme nvme1: pci function 0000:0b:00.0 [ 317.680055] nvme nvme0: Device not ready; aborting reset [ 317.680183] nvme nvme0: Removing after probe failure status: -19 [ 317.681258] kasan: GPF could be caused by NULL-ptr deref or user memory access [ 317.681397] general protection fault: 0000 [#1] SMP KASAN [ 317.682984] CPU: 3 PID: 477 Comm: kworker/3:2 Not tainted 4.13.0-rc1+ #5 [ 317.683112] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD5/Z170X-UD5-CF, BIOS F5 03/07/2016 [ 317.683284] Workqueue: events nvme_remove_dead_ctrl_work [nvme] [ 317.683398] task: ffff8803b0990000 task.stack: ffff8803c2ef0000 [ 317.683516] RIP: 0010:blk_mq_unquiesce_queue+0x2b/0xa0 [ 317.683614] RSP: 0018:ffff8803c2ef7d40 EFLAGS: 00010282 [ 317.683716] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 1ffff1006fbdcde3 [ 317.683847] RDX: 0000000000000038 RSI: 1ffff1006f5a9245 RDI: 0000000000000000 [ 317.683978] RBP: ffff8803c2ef7d58 R08: 1ffff1007bcdc974 R09: 0000000000000000 [ 317.684108] R10: 1ffff1007bcdc975 R11: 0000000000000000 R12: 00000000000001c0 [ 317.684239] R13: ffff88037ad49228 R14: ffff88037ad492d0 R15: ffff88037ad492e0 [ 317.684371] FS: 0000000000000000(0000) GS:ffff8803de6c0000(0000) knlGS:0000000000000000 [ 317.684519] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 317.684627] CR2: 0000002d1860c000 CR3: 000000045b40d000 CR4: 00000000003406e0 [ 317.684758] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 317.684888] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 317.685018] Call Trace: [ 317.685084] nvme_kill_queues+0x4d/0x170 [nvme_core] [ 317.685185] nvme_remove_dead_ctrl_work+0x3a/0x90 [nvme] [ 317.685289] process_one_work+0x771/0x1170 [ 317.685372] worker_thread+0xde/0x11e0 [ 317.685452] ? pci_mmcfg_check_reserved+0x110/0x110 [ 317.685550] kthread+0x2d3/0x3d0 [ 317.685617] ? process_one_work+0x1170/0x1170 [ 317.685704] ? kthread_create_on_node+0xc0/0xc0 [ 317.685785] ret_from_fork+0x25/0x30 [ 317.685798] Code: 0f 1f 44 00 00 55 48 b8 00 00 00 00 00 fc ff df 48 89 e5 41 54 4c 8d a7 c0 01 00 00 53 48 89 fb 4c 89 e2 48 c1 ea 03 48 83 ec 08 <80> 3c 02 00 75 50 48 8b bb c0 01 00 00 e8 33 8a f9 00 0f ba b3 [ 317.685872] RIP: blk_mq_unquiesce_queue+0x2b/0xa0 RSP: ffff8803c2ef7d40 [ 317.685908] ---[ end trace a3f8704150b1e8b4 ]--- Signed-off-by: Scott Bauer Signed-off-by: Christoph Hellwig [ adapted for 4.9: added check around blk_mq_start_hw_queues() call instead of upstream blk_mq_unquiesce_queue() ] Fixes: 4aae4388165a2611fa42 ("nvme: fix hang in remove path") Signed-off-by: Simon Veith Signed-off-by: David Woodhouse Signed-off-by: Amit Shah Signed-off-by: Greg Kroah-Hartman --- drivers/nvme/host/core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index c823e9346389..979c6ecc6446 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -2042,7 +2042,8 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl) mutex_lock(&ctrl->namespaces_mutex); /* Forcibly start all queues to avoid having stuck requests */ - blk_mq_start_hw_queues(ctrl->admin_q); + if (ctrl->admin_q) + blk_mq_start_hw_queues(ctrl->admin_q); list_for_each_entry(ns, &ctrl->namespaces, list) { /* From 473b33dd615f35c0bad13737def7751fb089ce60 Mon Sep 17 00:00:00 2001 From: Paul Burton Date: Fri, 22 Jun 2018 10:55:45 -0700 Subject: [PATCH 157/970] MIPS: Call dump_stack() from show_regs() commit 5a267832c2ec47b2dad0fdb291a96bb5b8869315 upstream. The generic nmi_cpu_backtrace() function calls show_regs() when a struct pt_regs is available, and dump_stack() otherwise. If we were to make use of the generic nmi_cpu_backtrace() with MIPS' current implementation of show_regs() this would mean that we see only register data with no accompanying stack information, in contrast with our current implementation which calls dump_stack() regardless of whether register state is available. In preparation for making use of the generic nmi_cpu_backtrace() to implement arch_trigger_cpumask_backtrace(), have our implementation of show_regs() call dump_stack() and drop the explicit dump_stack() call in arch_dump_stack() which is invoked by arch_trigger_cpumask_backtrace(). This will allow the output we produce to remain the same after a later patch switches to using nmi_cpu_backtrace(). It may mean that we produce extra stack output in other uses of show_regs(), but this: 1) Seems harmless. 2) Is good for consistency between arch_trigger_cpumask_backtrace() and other users of show_regs(). 3) Matches the behaviour of the ARM & PowerPC architectures. Marked for stable back to v4.9 as a prerequisite of the following patch "MIPS: Call dump_stack() from show_regs()". Signed-off-by: Paul Burton Patchwork: https://patchwork.linux-mips.org/patch/19596/ Cc: James Hogan Cc: Ralf Baechle Cc: Huacai Chen Cc: linux-mips@linux-mips.org Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Greg Kroah-Hartman --- arch/mips/kernel/process.c | 4 ++-- arch/mips/kernel/traps.c | 1 + 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c index ebb575c4231b..cb1e9c184b5a 100644 --- a/arch/mips/kernel/process.c +++ b/arch/mips/kernel/process.c @@ -641,8 +641,8 @@ static void arch_dump_stack(void *info) if (regs) show_regs(regs); - - dump_stack(); + else + dump_stack(); } void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self) diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c index bb1d9ff1be5c..8e0749665934 100644 --- a/arch/mips/kernel/traps.c +++ b/arch/mips/kernel/traps.c @@ -351,6 +351,7 @@ static void __show_regs(const struct pt_regs *regs) void show_regs(struct pt_regs *regs) { __show_regs((struct pt_regs *)regs); + dump_stack(); } void show_registers(struct pt_regs *regs) From 92cb1184ae064dfd9d15575dc64b3b0a88a198ad Mon Sep 17 00:00:00 2001 From: Paul Burton Date: Thu, 5 Jul 2018 14:37:52 -0700 Subject: [PATCH 158/970] MIPS: Fix ioremap() RAM check commit 523402fa9101090c91d2033b7ebdfdcf65880488 upstream. We currently attempt to check whether a physical address range provided to __ioremap() may be in use by the page allocator by examining the value of PageReserved for each page in the region - lowmem pages not marked reserved are presumed to be in use by the page allocator, and requests to ioremap them fail. The way we check this has been broken since commit 92923ca3aace ("mm: meminit: only set page reserved in the memblock region"), because memblock will typically not have any knowledge of non-RAM pages and therefore those pages will not have the PageReserved flag set. Thus when we attempt to ioremap a region outside of RAM we incorrectly fail believing that the region is RAM that may be in use. In most cases ioremap() on MIPS will take a fast-path to use the unmapped kseg1 or xkphys virtual address spaces and never hit this path, so the only way to hit it is for a MIPS32 system to attempt to ioremap() an address range in lowmem with flags other than _CACHE_UNCACHED. Perhaps the most straightforward way to do this is using ioremap_uncached_accelerated(), which is how the problem was discovered. Fix this by making use of walk_system_ram_range() to test the address range provided to __ioremap() against only RAM pages, rather than all lowmem pages. This means that if we have a lowmem I/O region, which is very common for MIPS systems, we're free to ioremap() address ranges within it. A nice bonus is that the test is no longer limited to lowmem. The approach here matches the way x86 performed the same test after commit c81c8a1eeede ("x86, ioremap: Speed up check for RAM pages") until x86 moved towards a slightly more complicated check using walk_mem_res() for unrelated reasons with commit 0e4c12b45aa8 ("x86/mm, resource: Use PAGE_KERNEL protection for ioremap of memory pages"). Signed-off-by: Paul Burton Reported-by: Serge Semin Tested-by: Serge Semin Fixes: 92923ca3aace ("mm: meminit: only set page reserved in the memblock region") Cc: James Hogan Cc: Ralf Baechle Cc: linux-mips@linux-mips.org Cc: stable@vger.kernel.org # v4.2+ Patchwork: https://patchwork.linux-mips.org/patch/19786/ Signed-off-by: Greg Kroah-Hartman --- arch/mips/mm/ioremap.c | 37 +++++++++++++++++++++++++------------ 1 file changed, 25 insertions(+), 12 deletions(-) diff --git a/arch/mips/mm/ioremap.c b/arch/mips/mm/ioremap.c index 1f189627440f..0dbcd90b19b5 100644 --- a/arch/mips/mm/ioremap.c +++ b/arch/mips/mm/ioremap.c @@ -9,6 +9,7 @@ #include #include #include +#include #include #include #include @@ -97,6 +98,20 @@ static int remap_area_pages(unsigned long address, phys_addr_t phys_addr, return error; } +static int __ioremap_check_ram(unsigned long start_pfn, unsigned long nr_pages, + void *arg) +{ + unsigned long i; + + for (i = 0; i < nr_pages; i++) { + if (pfn_valid(start_pfn + i) && + !PageReserved(pfn_to_page(start_pfn + i))) + return 1; + } + + return 0; +} + /* * Generic mapping function (not visible outside): */ @@ -115,8 +130,8 @@ static int remap_area_pages(unsigned long address, phys_addr_t phys_addr, void __iomem * __ioremap(phys_addr_t phys_addr, phys_addr_t size, unsigned long flags) { + unsigned long offset, pfn, last_pfn; struct vm_struct * area; - unsigned long offset; phys_addr_t last_addr; void * addr; @@ -136,18 +151,16 @@ void __iomem * __ioremap(phys_addr_t phys_addr, phys_addr_t size, unsigned long return (void __iomem *) CKSEG1ADDR(phys_addr); /* - * Don't allow anybody to remap normal RAM that we're using.. + * Don't allow anybody to remap RAM that may be allocated by the page + * allocator, since that could lead to races & data clobbering. */ - if (phys_addr < virt_to_phys(high_memory)) { - char *t_addr, *t_end; - struct page *page; - - t_addr = __va(phys_addr); - t_end = t_addr + (size - 1); - - for(page = virt_to_page(t_addr); page <= virt_to_page(t_end); page++) - if(!PageReserved(page)) - return NULL; + pfn = PFN_DOWN(phys_addr); + last_pfn = PFN_DOWN(last_addr); + if (walk_system_ram_range(pfn, last_pfn - pfn + 1, NULL, + __ioremap_check_ram) == 1) { + WARN_ONCE(1, "ioremap on RAM at %pa - %pa\n", + &phys_addr, &last_addr); + return NULL; } /* From 35479c22ff2159387f93b368c5d416ddc1cc437d Mon Sep 17 00:00:00 2001 From: x00270170 Date: Tue, 3 Jul 2018 15:06:27 +0800 Subject: [PATCH 159/970] mmc: dw_mmc: fix card threshold control configuration commit 7a6b9f4d601dfce8cb68f0dcfd834270280e31e6 upstream. Card write threshold control is supposed to be set since controller version 2.80a for data write in HS400 mode and data read in HS200/HS400/SDR104 mode. However the current code returns without configuring it in the case of data writing in HS400 mode. Meanwhile the patch fixes that the current code goes to 'disable' when doing data reading in HS400 mode. Fixes: 7e4bf1bc9543 ("mmc: dw_mmc: add the card write threshold for HS400 mode") Signed-off-by: Qing Xia Cc: stable@vger.kernel.org # v4.8+ Signed-off-by: Ulf Hansson Signed-off-by: Greg Kroah-Hartman --- drivers/mmc/host/dw_mmc.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c index d382dbd44635..1a1501fde010 100644 --- a/drivers/mmc/host/dw_mmc.c +++ b/drivers/mmc/host/dw_mmc.c @@ -981,8 +981,8 @@ static void dw_mci_ctrl_thld(struct dw_mci *host, struct mmc_data *data) * It's used when HS400 mode is enabled. */ if (data->flags & MMC_DATA_WRITE && - !(host->timing != MMC_TIMING_MMC_HS400)) - return; + host->timing != MMC_TIMING_MMC_HS400) + goto disable; if (data->flags & MMC_DATA_WRITE) enable = SDMMC_CARD_WR_THR_EN; @@ -990,7 +990,8 @@ static void dw_mci_ctrl_thld(struct dw_mci *host, struct mmc_data *data) enable = SDMMC_CARD_RD_THR_EN; if (host->timing != MMC_TIMING_MMC_HS200 && - host->timing != MMC_TIMING_UHS_SDR104) + host->timing != MMC_TIMING_UHS_SDR104 && + host->timing != MMC_TIMING_MMC_HS400) goto disable; blksz_depth = blksz / (1 << host->data_shift); From 2823345cd41f30d39483aebb9e3eb5341a2d4f06 Mon Sep 17 00:00:00 2001 From: Jann Horn Date: Sat, 7 Jul 2018 04:16:33 +0200 Subject: [PATCH 160/970] ibmasm: don't write out of bounds in read handler commit a0341fc1981a950c1e902ab901e98f60e0e243f3 upstream. This read handler had a lot of custom logic and wrote outside the bounds of the provided buffer. This could lead to kernel and userspace memory corruption. Just use simple_read_from_buffer() with a stack buffer. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Jann Horn Signed-off-by: Greg Kroah-Hartman --- drivers/misc/ibmasm/ibmasmfs.c | 27 +++------------------------ 1 file changed, 3 insertions(+), 24 deletions(-) diff --git a/drivers/misc/ibmasm/ibmasmfs.c b/drivers/misc/ibmasm/ibmasmfs.c index 520f58439080..65ad7e5261bf 100644 --- a/drivers/misc/ibmasm/ibmasmfs.c +++ b/drivers/misc/ibmasm/ibmasmfs.c @@ -507,35 +507,14 @@ static int remote_settings_file_close(struct inode *inode, struct file *file) static ssize_t remote_settings_file_read(struct file *file, char __user *buf, size_t count, loff_t *offset) { void __iomem *address = (void __iomem *)file->private_data; - unsigned char *page; - int retval; int len = 0; unsigned int value; - - if (*offset < 0) - return -EINVAL; - if (count == 0 || count > 1024) - return 0; - if (*offset != 0) - return 0; - - page = (unsigned char *)__get_free_page(GFP_KERNEL); - if (!page) - return -ENOMEM; + char lbuf[20]; value = readl(address); - len = sprintf(page, "%d\n", value); + len = snprintf(lbuf, sizeof(lbuf), "%d\n", value); - if (copy_to_user(buf, page, len)) { - retval = -EFAULT; - goto exit; - } - *offset += len; - retval = len; - -exit: - free_page((unsigned long)page); - return retval; + return simple_read_from_buffer(buf, count, offset, lbuf, len); } static ssize_t remote_settings_file_write(struct file *file, const char __user *ubuff, size_t count, loff_t *offset) From 51bacd848cc1bcd8fb82d413cf27f407973536f5 Mon Sep 17 00:00:00 2001 From: Damien Le Moal Date: Tue, 26 Jun 2018 20:56:54 +0900 Subject: [PATCH 161/970] ata: Fix ZBC_OUT command block check commit b320a0a9f23c98f21631eb27bcbbca91c79b1c6e upstream. The block (LBA) specified must not exceed the last addressable LBA, which is dev->nr_sectors - 1. So fix the correct check is "if (block >= dev->n_sectors)" and not "if (block > dev->n_sectords)". Additionally, the asc/ascq to return for an LBA that is not a zone start LBA should be ILLEGAL REQUEST, regardless if the bad LBA is out of range. Reported-by: David Butterfield Signed-off-by: Damien Le Moal Cc: stable@vger.kernel.org Signed-off-by: Tejun Heo Signed-off-by: Greg Kroah-Hartman --- drivers/ata/libata-scsi.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c index fb2c00fce8f9..ad7e1d0dd6ab 100644 --- a/drivers/ata/libata-scsi.c +++ b/drivers/ata/libata-scsi.c @@ -3772,8 +3772,13 @@ static unsigned int ata_scsi_zbc_out_xlat(struct ata_queued_cmd *qc) */ goto invalid_param_len; } - if (block > dev->n_sectors) - goto out_of_range; + if (block >= dev->n_sectors) { + /* + * Block must be a valid zone ID (a zone start LBA). + */ + fp = 2; + goto invalid_fld; + } all = cdb[14] & 0x1; @@ -3804,10 +3809,6 @@ static unsigned int ata_scsi_zbc_out_xlat(struct ata_queued_cmd *qc) invalid_fld: ata_scsi_set_invalid_field(qc->dev, scmd, fp, 0xff); return 1; - out_of_range: - /* "Logical Block Address out of range" */ - ata_scsi_set_sense(qc->dev, scmd, ILLEGAL_REQUEST, 0x21, 0x00); - return 1; invalid_param_len: /* "Parameter list length error" */ ata_scsi_set_sense(qc->dev, scmd, ILLEGAL_REQUEST, 0x1a, 0x0); From 3f205d7a89d96d50fcf9a44754c180908388da25 Mon Sep 17 00:00:00 2001 From: Damien Le Moal Date: Tue, 26 Jun 2018 20:56:55 +0900 Subject: [PATCH 162/970] ata: Fix ZBC_OUT all bit handling commit 6edf1d4cb0acde3a0a5dac849f33031bd7abb7b1 upstream. If the ALL bit is set in the ZBC_OUT command, the command zone ID field (block) should be ignored. Reported-by: David Butterfield Signed-off-by: Damien Le Moal Cc: stable@vger.kernel.org Signed-off-by: Tejun Heo Signed-off-by: Greg Kroah-Hartman --- drivers/ata/libata-scsi.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c index ad7e1d0dd6ab..a3d60ccafd9a 100644 --- a/drivers/ata/libata-scsi.c +++ b/drivers/ata/libata-scsi.c @@ -3772,7 +3772,14 @@ static unsigned int ata_scsi_zbc_out_xlat(struct ata_queued_cmd *qc) */ goto invalid_param_len; } - if (block >= dev->n_sectors) { + + all = cdb[14] & 0x1; + if (all) { + /* + * Ignore the block address (zone ID) as defined by ZBC. + */ + block = 0; + } else if (block >= dev->n_sectors) { /* * Block must be a valid zone ID (a zone start LBA). */ @@ -3780,8 +3787,6 @@ static unsigned int ata_scsi_zbc_out_xlat(struct ata_queued_cmd *qc) goto invalid_fld; } - all = cdb[14] & 0x1; - if (ata_ncq_enabled(qc->dev) && ata_fpdma_zac_mgmt_out_supported(qc->dev)) { tf->protocol = ATA_PROT_NCQ_NODATA; From 63c003e3fff7e3f450375bdbf34a8465cb824d16 Mon Sep 17 00:00:00 2001 From: Nadav Amit Date: Mon, 2 Jul 2018 19:27:13 -0700 Subject: [PATCH 163/970] vmw_balloon: fix inflation with batching commit 90d72ce079791399ac255c75728f3c9e747b093d upstream. Embarrassingly, the recent fix introduced worse problem than it solved, causing the balloon not to inflate. The VM informed the hypervisor that the pages for lock/unlock are sitting in the wrong address, as it used the page that is used the uninitialized page variable. Fixes: b23220fe054e9 ("vmw_balloon: fixing double free when batching mode is off") Cc: stable@vger.kernel.org Reviewed-by: Xavier Deguillard Signed-off-by: Nadav Amit Signed-off-by: Greg Kroah-Hartman --- drivers/misc/vmw_balloon.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/misc/vmw_balloon.c b/drivers/misc/vmw_balloon.c index fe90b7e04427..5e047bfc0cc4 100644 --- a/drivers/misc/vmw_balloon.c +++ b/drivers/misc/vmw_balloon.c @@ -467,7 +467,7 @@ static int vmballoon_send_batched_lock(struct vmballoon *b, unsigned int num_pages, bool is_2m_pages, unsigned int *target) { unsigned long status; - unsigned long pfn = page_to_pfn(b->page); + unsigned long pfn = PHYS_PFN(virt_to_phys(b->batch_page)); STATS_INC(b->stats.lock[is_2m_pages]); @@ -515,7 +515,7 @@ static bool vmballoon_send_batched_unlock(struct vmballoon *b, unsigned int num_pages, bool is_2m_pages, unsigned int *target) { unsigned long status; - unsigned long pfn = page_to_pfn(b->page); + unsigned long pfn = PHYS_PFN(virt_to_phys(b->batch_page)); STATS_INC(b->stats.unlock[is_2m_pages]); From f510cc3a2f314b299b6dd7bce93a11dfe7adaabe Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Sun, 1 Jul 2018 12:15:46 +0200 Subject: [PATCH 164/970] ahci: Disable LPM on Lenovo 50 series laptops with a too old BIOS commit 240630e61870e62e39a97225048f9945848fa5f5 upstream. There have been several reports of LPM related hard freezes about once a day on multiple Lenovo 50 series models. Strange enough these reports where not disk model specific as LPM issues usually are and some users with the exact same disk + laptop where seeing them while other users where not seeing these issues. It turns out that enabling LPM triggers a firmware bug somewhere, which has been fixed in later BIOS versions. This commit adds a new ahci_broken_lpm() function and a new ATA_FLAG_NO_LPM for dealing with this. The ahci_broken_lpm() function contains DMI match info for the 4 models which are known to be affected by this and the DMI BIOS date field for known good BIOS versions. If the BIOS date is older then the one in the table LPM will be disabled and a warning will be printed. Note the BIOS dates are for known good versions, some older versions may work too, but we don't know for sure, the table is using dates from BIOS versions for which users have confirmed that upgrading to that version makes the problem go away. Unfortunately I've been unable to get hold of the reporter who reported that BIOS version 2.35 fixed the problems on the W541 for him. I've been able to verify the DMI_SYS_VENDOR and DMI_PRODUCT_VERSION from an older dmidecode, but I don't know the exact BIOS date as reported in the DMI. Lenovo keeps a changelog with dates in their release notes, but the dates there are the release dates not the build dates which are in DMI. So I've chosen to set the date to which we compare to one day past the release date of the 2.34 BIOS. I plan to fix this with a follow up commit once I've the necessary info. Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede Signed-off-by: Tejun Heo Signed-off-by: Greg Kroah-Hartman --- drivers/ata/ahci.c | 59 +++++++++++++++++++++++++++++++++++++++ drivers/ata/libata-core.c | 3 ++ include/linux/libata.h | 1 + 3 files changed, 63 insertions(+) diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c index 4d4b5f607b81..faa91f8a17a5 100644 --- a/drivers/ata/ahci.c +++ b/drivers/ata/ahci.c @@ -1260,6 +1260,59 @@ static bool ahci_broken_suspend(struct pci_dev *pdev) return strcmp(buf, dmi->driver_data) < 0; } +static bool ahci_broken_lpm(struct pci_dev *pdev) +{ + static const struct dmi_system_id sysids[] = { + /* Various Lenovo 50 series have LPM issues with older BIOSen */ + { + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), + DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad X250"), + }, + .driver_data = "20180406", /* 1.31 */ + }, + { + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), + DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad L450"), + }, + .driver_data = "20180420", /* 1.28 */ + }, + { + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), + DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad T450s"), + }, + .driver_data = "20180315", /* 1.33 */ + }, + { + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), + DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad W541"), + }, + /* + * Note date based on release notes, 2.35 has been + * reported to be good, but I've been unable to get + * a hold of the reporter to get the DMI BIOS date. + * TODO: fix this. + */ + .driver_data = "20180310", /* 2.35 */ + }, + { } /* terminate list */ + }; + const struct dmi_system_id *dmi = dmi_first_match(sysids); + int year, month, date; + char buf[9]; + + if (!dmi) + return false; + + dmi_get_date(DMI_BIOS_DATE, &year, &month, &date); + snprintf(buf, sizeof(buf), "%04d%02d%02d", year, month, date); + + return strcmp(buf, dmi->driver_data) < 0; +} + static bool ahci_broken_online(struct pci_dev *pdev) { #define ENCODE_BUSDEVFN(bus, slot, func) \ @@ -1626,6 +1679,12 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) "quirky BIOS, skipping spindown on poweroff\n"); } + if (ahci_broken_lpm(pdev)) { + pi.flags |= ATA_FLAG_NO_LPM; + dev_warn(&pdev->dev, + "BIOS update required for Link Power Management support\n"); + } + if (ahci_broken_suspend(pdev)) { hpriv->flags |= AHCI_HFLAG_NO_SUSPEND; dev_warn(&pdev->dev, diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c index 82c59a143a14..73d636d35961 100644 --- a/drivers/ata/libata-core.c +++ b/drivers/ata/libata-core.c @@ -2385,6 +2385,9 @@ int ata_dev_configure(struct ata_device *dev) (id[ATA_ID_SATA_CAPABILITY] & 0xe) == 0x2) dev->horkage |= ATA_HORKAGE_NOLPM; + if (ap->flags & ATA_FLAG_NO_LPM) + dev->horkage |= ATA_HORKAGE_NOLPM; + if (dev->horkage & ATA_HORKAGE_NOLPM) { ata_dev_warn(dev, "LPM support broken, forcing max_power\n"); dev->link->ap->target_lpm_policy = ATA_LPM_MAX_POWER; diff --git a/include/linux/libata.h b/include/linux/libata.h index 616eef4d81ea..df58b01e6962 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -208,6 +208,7 @@ enum { ATA_FLAG_SLAVE_POSS = (1 << 0), /* host supports slave dev */ /* (doesn't imply presence) */ ATA_FLAG_SATA = (1 << 1), + ATA_FLAG_NO_LPM = (1 << 2), /* host not happy with LPM */ ATA_FLAG_NO_LOG_PAGE = (1 << 5), /* do not issue log page read */ ATA_FLAG_NO_ATAPI = (1 << 6), /* No ATAPI support */ ATA_FLAG_PIO_DMA = (1 << 7), /* PIO cmds via DMA */ From 4c73f193b320d8463c280ef8b4df314fb291f76b Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Wed, 4 Jul 2018 12:29:38 +0300 Subject: [PATCH 165/970] USB: serial: ch341: fix type promotion bug in ch341_control_in() commit e33eab9ded328ccc14308afa51b5be7cbe78d30b upstream. The "r" variable is an int and "bufsize" is an unsigned int so the comparison is type promoted to unsigned. If usb_control_msg() returns a negative that is treated as a high positive value and the error handling doesn't work. Fixes: 2d5a9c72d0c4 ("USB: serial: ch341: fix control-message error handling") Signed-off-by: Dan Carpenter Cc: stable Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/usb/serial/ch341.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/usb/serial/ch341.c b/drivers/usb/serial/ch341.c index e98590aab633..9a2c0c76d11b 100644 --- a/drivers/usb/serial/ch341.c +++ b/drivers/usb/serial/ch341.c @@ -118,7 +118,7 @@ static int ch341_control_in(struct usb_device *dev, r = usb_control_msg(dev, usb_rcvctrlpipe(dev, 0), request, USB_TYPE_VENDOR | USB_RECIP_DEVICE | USB_DIR_IN, value, index, buf, bufsize, DEFAULT_TIMEOUT); - if (r < bufsize) { + if (r < (int)bufsize) { if (r >= 0) { dev_err(&dev->dev, "short control message received (%d < %u)\n", From 4115045f9588e794c091c5fbc2a85f2e1f03e260 Mon Sep 17 00:00:00 2001 From: Olli Salonen Date: Wed, 4 Jul 2018 14:07:42 +0300 Subject: [PATCH 166/970] USB: serial: cp210x: add another USB ID for Qivicon ZigBee stick commit 367b160fe4717c14a2a978b6f9ffb75a7762d3ed upstream. There are two versions of the Qivicon Zigbee stick in circulation. This adds the second USB ID to the cp210x driver. Signed-off-by: Olli Salonen Cc: stable Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/usb/serial/cp210x.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/usb/serial/cp210x.c b/drivers/usb/serial/cp210x.c index 6f2c77a7c08e..c2b120021443 100644 --- a/drivers/usb/serial/cp210x.c +++ b/drivers/usb/serial/cp210x.c @@ -146,6 +146,7 @@ static const struct usb_device_id id_table[] = { { USB_DEVICE(0x10C4, 0x8977) }, /* CEL MeshWorks DevKit Device */ { USB_DEVICE(0x10C4, 0x8998) }, /* KCF Technologies PRN */ { USB_DEVICE(0x10C4, 0x89A4) }, /* CESINEL FTBC Flexible Thyristor Bridge Controller */ + { USB_DEVICE(0x10C4, 0x89FB) }, /* Qivicon ZigBee USB Radio Stick */ { USB_DEVICE(0x10C4, 0x8A2A) }, /* HubZ dual ZigBee and Z-Wave dongle */ { USB_DEVICE(0x10C4, 0x8A5E) }, /* CEL EM3588 ZigBee USB Stick Long Range */ { USB_DEVICE(0x10C4, 0x8B34) }, /* Qivicon ZigBee USB Radio Stick */ From 7e7c86d2757067f52fa1a81f5d472f5662bd0e7f Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Wed, 4 Jul 2018 17:02:16 +0200 Subject: [PATCH 167/970] USB: serial: keyspan_pda: fix modem-status error handling commit 01b3cdfca263a17554f7b249d20a247b2a751521 upstream. Fix broken modem-status error handling which could lead to bits of slab data leaking to user space. Fixes: 3b36a8fd6777 ("usb: fix uninitialized variable warning in keyspan_pda") Cc: stable # 2.6.27 Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/usb/serial/keyspan_pda.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/usb/serial/keyspan_pda.c b/drivers/usb/serial/keyspan_pda.c index d2dab2a341b8..d17f7872f95a 100644 --- a/drivers/usb/serial/keyspan_pda.c +++ b/drivers/usb/serial/keyspan_pda.c @@ -373,8 +373,10 @@ static int keyspan_pda_get_modem_info(struct usb_serial *serial, 3, /* get pins */ USB_TYPE_VENDOR|USB_RECIP_INTERFACE|USB_DIR_IN, 0, 0, data, 1, 2000); - if (rc >= 0) + if (rc == 1) *value = *data; + else if (rc >= 0) + rc = -EIO; kfree(data); return rc; From 0fdef3142f99430b94f5d394ca2b181d20d87e77 Mon Sep 17 00:00:00 2001 From: Jann Horn Date: Fri, 6 Jul 2018 17:12:56 +0200 Subject: [PATCH 168/970] USB: yurex: fix out-of-bounds uaccess in read handler commit f1e255d60ae66a9f672ff9a207ee6cd8e33d2679 upstream. In general, accessing userspace memory beyond the length of the supplied buffer in VFS read/write handlers can lead to both kernel memory corruption (via kernel_read()/kernel_write(), which can e.g. be triggered via sys_splice()) and privilege escalation inside userspace. Fix it by using simple_read_from_buffer() instead of custom logic. Fixes: 6bc235a2e24a ("USB: add driver for Meywa-Denki & Kayac YUREX") Signed-off-by: Jann Horn Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/usb/misc/yurex.c | 23 ++++++----------------- 1 file changed, 6 insertions(+), 17 deletions(-) diff --git a/drivers/usb/misc/yurex.c b/drivers/usb/misc/yurex.c index 54e53ac4c08f..f36968ee2e70 100644 --- a/drivers/usb/misc/yurex.c +++ b/drivers/usb/misc/yurex.c @@ -406,8 +406,7 @@ static ssize_t yurex_read(struct file *file, char __user *buffer, size_t count, loff_t *ppos) { struct usb_yurex *dev; - int retval = 0; - int bytes_read = 0; + int len = 0; char in_buffer[20]; unsigned long flags; @@ -415,26 +414,16 @@ static ssize_t yurex_read(struct file *file, char __user *buffer, size_t count, mutex_lock(&dev->io_mutex); if (!dev->interface) { /* already disconnected */ - retval = -ENODEV; - goto exit; + mutex_unlock(&dev->io_mutex); + return -ENODEV; } spin_lock_irqsave(&dev->lock, flags); - bytes_read = snprintf(in_buffer, 20, "%lld\n", dev->bbu); + len = snprintf(in_buffer, 20, "%lld\n", dev->bbu); spin_unlock_irqrestore(&dev->lock, flags); - - if (*ppos < bytes_read) { - if (copy_to_user(buffer, in_buffer + *ppos, bytes_read - *ppos)) - retval = -EFAULT; - else { - retval = bytes_read - *ppos; - *ppos += bytes_read; - } - } - -exit: mutex_unlock(&dev->io_mutex); - return retval; + + return simple_read_from_buffer(buffer, count, ppos, in_buffer, len); } static ssize_t yurex_write(struct file *file, const char __user *user_buffer, From 7675c7b78e34ece3829e85308f5e37f2fb62c11a Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Wed, 4 Jul 2018 17:02:17 +0200 Subject: [PATCH 169/970] USB: serial: mos7840: fix status-register error handling commit 794744abfffef8b1f3c0c8a4896177d6d13d653d upstream. Add missing transfer-length sanity check to the status-register completion handler to avoid leaking bits of uninitialised slab data to user space. Fixes: 3f5429746d91 ("USB: Moschip 7840 USB-Serial Driver") Cc: stable # 2.6.19 Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/usb/serial/mos7840.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/usb/serial/mos7840.c b/drivers/usb/serial/mos7840.c index 6baacf64b21e..03d63bad6be4 100644 --- a/drivers/usb/serial/mos7840.c +++ b/drivers/usb/serial/mos7840.c @@ -482,6 +482,9 @@ static void mos7840_control_callback(struct urb *urb) } dev_dbg(dev, "%s urb buffer size is %d\n", __func__, urb->actual_length); + if (urb->actual_length < 1) + goto out; + dev_dbg(dev, "%s mos7840_port->MsrLsr is %d port %d\n", __func__, mos7840_port->MsrLsr, mos7840_port->port_num); data = urb->transfer_buffer; From cac38ab7d4ff3b97eea76b7a59e22ced0dbb85a6 Mon Sep 17 00:00:00 2001 From: Nico Sneck Date: Mon, 2 Jul 2018 19:26:07 +0300 Subject: [PATCH 170/970] usb: quirks: add delay quirks for Corsair Strafe commit bba57eddadda936c94b5dccf73787cb9e159d0a5 upstream. Corsair Strafe appears to suffer from the same issues as the Corsair Strafe RGB. Apply the same quirks (control message delay and init delay) that the RGB version has to 1b1c:1b15. With these quirks in place the keyboard works correctly upon booting the system, and no longer requires reattaching the device. Signed-off-by: Nico Sneck Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/usb/core/quirks.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/usb/core/quirks.c b/drivers/usb/core/quirks.c index 40ce175655e6..99f67764765f 100644 --- a/drivers/usb/core/quirks.c +++ b/drivers/usb/core/quirks.c @@ -231,6 +231,10 @@ static const struct usb_device_id usb_quirk_list[] = { /* Corsair K70 RGB */ { USB_DEVICE(0x1b1c, 0x1b13), .driver_info = USB_QUIRK_DELAY_INIT }, + /* Corsair Strafe */ + { USB_DEVICE(0x1b1c, 0x1b15), .driver_info = USB_QUIRK_DELAY_INIT | + USB_QUIRK_DELAY_CTRL_MSG }, + /* Corsair Strafe RGB */ { USB_DEVICE(0x1b1c, 0x1b20), .driver_info = USB_QUIRK_DELAY_INIT | USB_QUIRK_DELAY_CTRL_MSG }, From 268476c9d3cbd9af1e879536e99a045301d7c83b Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Wed, 4 Jul 2018 12:48:53 +0300 Subject: [PATCH 171/970] xhci: xhci-mem: off by one in xhci_stream_id_to_ring() commit 313db3d6488bb03b61b99de9dbca061f1fd838e1 upstream. The > should be >= here so that we don't read one element beyond the end of the ep->stream_info->stream_rings[] array. Fixes: e9df17eb1408 ("USB: xhci: Correct assumptions about number of rings per endpoint.") Signed-off-by: Dan Carpenter Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/usb/host/xhci-mem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c index 48bdab4fdc8f..7199e400fbac 100644 --- a/drivers/usb/host/xhci-mem.c +++ b/drivers/usb/host/xhci-mem.c @@ -650,7 +650,7 @@ struct xhci_ring *xhci_stream_id_to_ring( if (!ep->stream_info) return NULL; - if (stream_id > ep->stream_info->num_streams) + if (stream_id >= ep->stream_info->num_streams) return NULL; return ep->stream_info->stream_rings[stream_id]; } From 16387eb51caa0828de1d78e605523a6d738a7f3f Mon Sep 17 00:00:00 2001 From: Tomasz Kramkowski Date: Tue, 14 Feb 2017 23:14:33 +0000 Subject: [PATCH 172/970] HID: usbhid: add quirk for innomedia INNEX GENESIS/ATARI adapter commit 9547837bdccb4af127528b36a73377150658b4ac upstream. The (1292:4745) Innomedia INNEX GENESIS/ATARI adapter needs HID_QUIRK_MULTI_INPUT to split the device up into two controllers instead of inputs from both being merged into one. Signed-off-by: Tomasz Kramkowski Acked-By: Benjamin Tissoires Signed-off-by: Jiri Kosina Signed-off-by: Greg Kroah-Hartman --- drivers/hid/hid-ids.h | 3 +++ drivers/hid/usbhid/hid-quirks.c | 1 + 2 files changed, 4 insertions(+) diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h index 9347b37a1303..019ee9181f2b 100644 --- a/drivers/hid/hid-ids.h +++ b/drivers/hid/hid-ids.h @@ -549,6 +549,9 @@ #define USB_VENDOR_ID_IRTOUCHSYSTEMS 0x6615 #define USB_DEVICE_ID_IRTOUCH_INFRARED_USB 0x0070 +#define USB_VENDOR_ID_INNOMEDIA 0x1292 +#define USB_DEVICE_ID_INNEX_GENESIS_ATARI 0x4745 + #define USB_VENDOR_ID_ITE 0x048d #define USB_DEVICE_ID_ITE_LENOVO_YOGA 0x8386 #define USB_DEVICE_ID_ITE_LENOVO_YOGA2 0x8350 diff --git a/drivers/hid/usbhid/hid-quirks.c b/drivers/hid/usbhid/hid-quirks.c index 1916f80a692d..617ae294a318 100644 --- a/drivers/hid/usbhid/hid-quirks.c +++ b/drivers/hid/usbhid/hid-quirks.c @@ -170,6 +170,7 @@ static const struct hid_blacklist { { USB_VENDOR_ID_MULTIPLE_1781, USB_DEVICE_ID_RAPHNET_4NES4SNES_OLD, HID_QUIRK_MULTI_INPUT }, { USB_VENDOR_ID_DRACAL_RAPHNET, USB_DEVICE_ID_RAPHNET_2NES2SNES, HID_QUIRK_MULTI_INPUT }, { USB_VENDOR_ID_DRACAL_RAPHNET, USB_DEVICE_ID_RAPHNET_4NES4SNES, HID_QUIRK_MULTI_INPUT }, + { USB_VENDOR_ID_INNOMEDIA, USB_DEVICE_ID_INNEX_GENESIS_ATARI, HID_QUIRK_MULTI_INPUT }, { 0, 0 } }; From d2c7c52431819aa05d76fae77bb3f95dd0955da1 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Tue, 3 Jul 2018 17:10:19 -0700 Subject: [PATCH 173/970] Fix up non-directory creation in SGID directories commit 0fa3ecd87848c9c93c2c828ef4c3a8ca36ce46c7 upstream. sgid directories have special semantics, making newly created files in the directory belong to the group of the directory, and newly created subdirectories will also become sgid. This is historically used for group-shared directories. But group directories writable by non-group members should not imply that such non-group members can magically join the group, so make sure to clear the sgid bit on non-directories for non-members (but remember that sgid without group execute means "mandatory locking", just to confuse things even more). Reported-by: Jann Horn Cc: Andy Lutomirski Cc: Al Viro Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/inode.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/fs/inode.c b/fs/inode.c index 920aa0b1c6b0..2071ff5343c5 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -2003,8 +2003,14 @@ void inode_init_owner(struct inode *inode, const struct inode *dir, inode->i_uid = current_fsuid(); if (dir && dir->i_mode & S_ISGID) { inode->i_gid = dir->i_gid; + + /* Directories are special, and always inherit S_ISGID */ if (S_ISDIR(mode)) mode |= S_ISGID; + else if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP) && + !in_group_p(inode->i_gid) && + !capable_wrt_inode_uidgid(dir, CAP_FSETID)) + mode &= ~S_ISGID; } else inode->i_gid = current_fsgid(); inode->i_mode = mode; From bc193057d4885693374fd286e6e39b421a0ed758 Mon Sep 17 00:00:00 2001 From: Chris Wilson Date: Wed, 27 Jun 2018 07:25:32 +0100 Subject: [PATCH 174/970] ALSA: hda - Handle pm failure during hotplug commit aaa23f86001bdb82d2f937c5c7bce0a1e11a6c5b upstream. Obtaining the runtime pm wakeref can fail, especially in a hotplug scenario where i915.ko has been unloaded. If we do not catch the failure, we end up with an unbalanced pm. v2 additions by tiwai: hdmi_present_sense() checks the return value and handle only a negative error case and bails out only if it's really still suspended. Also, snd_hda_power_down() is called at the error path so that the refcount is balanced. Along with it, the spec->pcm_lock is taken outside hdmi_present_sense() in the caller side, so that it won't cause deadlock at reentrace via runtime resume. v3 fix by tiwai: Missing linux/pm_runtime.h is included. References: 222bde03881c ("ALSA: hda - Fix mutex deadlock at HDMI/DP hotplug") Signed-off-by: Chris Wilson Cc: Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/pci/hda/patch_hdmi.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/sound/pci/hda/patch_hdmi.c b/sound/pci/hda/patch_hdmi.c index bd650222e711..76ae627e3f93 100644 --- a/sound/pci/hda/patch_hdmi.c +++ b/sound/pci/hda/patch_hdmi.c @@ -33,6 +33,7 @@ #include #include #include +#include #include #include #include @@ -731,8 +732,10 @@ static void check_presence_and_report(struct hda_codec *codec, hda_nid_t nid) if (pin_idx < 0) return; + mutex_lock(&spec->pcm_lock); if (hdmi_present_sense(get_pin(spec, pin_idx), 1)) snd_hda_jack_report_sync(codec); + mutex_unlock(&spec->pcm_lock); } static void jack_callback(struct hda_codec *codec, @@ -1521,21 +1524,23 @@ static void sync_eld_via_acomp(struct hda_codec *codec, static bool hdmi_present_sense(struct hdmi_spec_per_pin *per_pin, int repoll) { struct hda_codec *codec = per_pin->codec; - struct hdmi_spec *spec = codec->spec; int ret; /* no temporary power up/down needed for component notifier */ - if (!codec_has_acomp(codec)) - snd_hda_power_up_pm(codec); + if (!codec_has_acomp(codec)) { + ret = snd_hda_power_up_pm(codec); + if (ret < 0 && pm_runtime_suspended(hda_codec_dev(codec))) { + snd_hda_power_down_pm(codec); + return false; + } + } - mutex_lock(&spec->pcm_lock); if (codec_has_acomp(codec)) { sync_eld_via_acomp(codec, per_pin); ret = false; /* don't call snd_hda_jack_report_sync() */ } else { ret = hdmi_present_sense_via_verbs(per_pin, repoll); } - mutex_unlock(&spec->pcm_lock); if (!codec_has_acomp(codec)) snd_hda_power_down_pm(codec); @@ -1547,12 +1552,16 @@ static void hdmi_repoll_eld(struct work_struct *work) { struct hdmi_spec_per_pin *per_pin = container_of(to_delayed_work(work), struct hdmi_spec_per_pin, work); + struct hda_codec *codec = per_pin->codec; + struct hdmi_spec *spec = codec->spec; if (per_pin->repoll_count++ > 6) per_pin->repoll_count = 0; + mutex_lock(&spec->pcm_lock); if (hdmi_present_sense(per_pin, per_pin->repoll_count)) snd_hda_jack_report_sync(per_pin->codec); + mutex_unlock(&spec->pcm_lock); } static void intel_haswell_fixup_connect_list(struct hda_codec *codec, From db2858f193dd9b3808fb180f1dbada87b3e41ba4 Mon Sep 17 00:00:00 2001 From: Oscar Salvador Date: Fri, 13 Jul 2018 16:59:13 -0700 Subject: [PATCH 175/970] fs, elf: make sure to page align bss in load_elf_library commit 24962af7e1041b7e50c1bc71d8d10dc678c556b5 upstream. The current code does not make sure to page align bss before calling vm_brk(), and this can lead to a VM_BUG_ON() in __mm_populate() due to the requested lenght not being correctly aligned. Let us make sure to align it properly. Kees: only applicable to CONFIG_USELIB kernels: 32-bit and configured for libc5. Link: http://lkml.kernel.org/r/20180705145539.9627-1-osalvador@techadventures.net Signed-off-by: Oscar Salvador Reported-by: syzbot+5dcb560fe12aa5091c06@syzkaller.appspotmail.com Tested-by: Tetsuo Handa Acked-by: Kees Cook Cc: Michal Hocko Cc: Nicolas Pitre Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/binfmt_elf.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index 1fdf4e5bf8c6..a4fabf60d5ee 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -1217,9 +1217,8 @@ static int load_elf_library(struct file *file) goto out_free_ph; } - len = ELF_PAGESTART(eppnt->p_filesz + eppnt->p_vaddr + - ELF_MIN_ALIGN - 1); - bss = eppnt->p_memsz + eppnt->p_vaddr; + len = ELF_PAGEALIGN(eppnt->p_filesz + eppnt->p_vaddr); + bss = ELF_PAGEALIGN(eppnt->p_memsz + eppnt->p_vaddr); if (bss > len) { error = vm_brk(len, bss - len); if (error) From 36c038f0a97edc9f76000f6238a37c8dd16781cb Mon Sep 17 00:00:00 2001 From: Paul Menzel Date: Tue, 5 Jun 2018 19:00:22 +0200 Subject: [PATCH 176/970] tools build: fix # escaping in .cmd files for future Make commit 9feeb638cde083c737e295c0547f1b4f28e99583 upstream. In 2016 GNU Make made a backwards incompatible change to the way '#' characters were handled in Makefiles when used inside functions or macros: http://git.savannah.gnu.org/cgit/make.git/commit/?id=c6966b323811c37acedff05b57 Due to this change, when attempting to run `make prepare' I get a spurious make syntax error: /home/earnest/linux/tools/objtool/.fixdep.o.cmd:1: *** missing separator. Stop. When inspecting `.fixdep.o.cmd' it includes two lines which use unescaped comment characters at the top: \# cannot find fixdep (/home/earnest/linux/tools/objtool//fixdep) \# using basic dep data This is because `tools/build/Build.include' prints these '\#' characters: printf '\# cannot find fixdep (%s)\n' $(fixdep) > $(dot-target).cmd; \ printf '\# using basic dep data\n\n' >> $(dot-target).cmd; \ This completes commit 9564a8cf422d ("Kbuild: fix # escaping in .cmd files for future Make"). Link: https://bugzilla.kernel.org/show_bug.cgi?id=197847 Cc: Randy Dunlap Cc: Rasmus Villemoes Cc: stable@vger.kernel.org Signed-off-by: Paul Menzel Signed-off-by: Masahiro Yamada Signed-off-by: Greg Kroah-Hartman --- tools/build/Build.include | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/build/Build.include b/tools/build/Build.include index b8165545ddf6..ab02f8df0d56 100644 --- a/tools/build/Build.include +++ b/tools/build/Build.include @@ -63,8 +63,8 @@ dep-cmd = $(if $(wildcard $(fixdep)), $(fixdep) $(depfile) $@ '$(make-cmd)' > $(dot-target).tmp; \ rm -f $(depfile); \ mv -f $(dot-target).tmp $(dot-target).cmd, \ - printf '\# cannot find fixdep (%s)\n' $(fixdep) > $(dot-target).cmd; \ - printf '\# using basic dep data\n\n' >> $(dot-target).cmd; \ + printf '$(pound) cannot find fixdep (%s)\n' $(fixdep) > $(dot-target).cmd; \ + printf '$(pound) using basic dep data\n\n' >> $(dot-target).cmd; \ cat $(depfile) >> $(dot-target).cmd; \ printf '%s\n' 'cmd_$@ := $(make-cmd)' >> $(dot-target).cmd) From e78e3706bfba8b7865b7a52b3d1f4643e31c2197 Mon Sep 17 00:00:00 2001 From: Jon Hunter Date: Tue, 3 Jul 2018 09:55:43 +0100 Subject: [PATCH 177/970] i2c: tegra: Fix NACK error handling commit 54836e2d03e76d80aec3399368ffaf5b7caadd1b upstream. On Tegra30 Cardhu the PCA9546 I2C mux is not ACK'ing I2C commands on resume from suspend (which is caused by the reset signal for the I2C mux not being configured correctl). However, this NACK is causing the Tegra30 to hang on resuming from suspend which is not expected as we detect NACKs and handle them. The hang observed appears to occur when resetting the I2C controller to recover from the NACK. Commit 77821b4678f9 ("i2c: tegra: proper handling of error cases") added additional error handling for some error cases including NACK, however, it appears that this change conflicts with an early fix by commit f70893d08338 ("i2c: tegra: Add delay before resetting the controller after NACK"). After commit 77821b4678f9 was made we now disable 'packet mode' before the delay from commit f70893d08338 happens. Testing shows that moving the delay to before disabling 'packet mode' fixes the hang observed on Tegra30. The delay was added to give the I2C controller chance to send a stop condition and so it makes sense to move this to before we disable packet mode. Please note that packet mode is always enabled for Tegra. Fixes: 77821b4678f9 ("i2c: tegra: proper handling of error cases") Signed-off-by: Jon Hunter Acked-by: Thierry Reding Signed-off-by: Wolfram Sang Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- drivers/i2c/busses/i2c-tegra.c | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/drivers/i2c/busses/i2c-tegra.c b/drivers/i2c/busses/i2c-tegra.c index 4af9bbae20df..586e557e113a 100644 --- a/drivers/i2c/busses/i2c-tegra.c +++ b/drivers/i2c/busses/i2c-tegra.c @@ -547,6 +547,14 @@ static int tegra_i2c_disable_packet_mode(struct tegra_i2c_dev *i2c_dev) { u32 cnfg; + /* + * NACK interrupt is generated before the I2C controller generates + * the STOP condition on the bus. So wait for 2 clock periods + * before disabling the controller so that the STOP condition has + * been delivered properly. + */ + udelay(DIV_ROUND_UP(2 * 1000000, i2c_dev->bus_clk_rate)); + cnfg = i2c_readl(i2c_dev, I2C_CNFG); if (cnfg & I2C_CNFG_PACKET_MODE_EN) i2c_writel(i2c_dev, cnfg & ~I2C_CNFG_PACKET_MODE_EN, I2C_CNFG); @@ -708,15 +716,6 @@ static int tegra_i2c_xfer_msg(struct tegra_i2c_dev *i2c_dev, if (likely(i2c_dev->msg_err == I2C_ERR_NONE)) return 0; - /* - * NACK interrupt is generated before the I2C controller generates - * the STOP condition on the bus. So wait for 2 clock periods - * before resetting the controller so that the STOP condition has - * been delivered properly. - */ - if (i2c_dev->msg_err == I2C_ERR_NO_ACK) - udelay(DIV_ROUND_UP(2 * 1000000, i2c_dev->bus_clk_rate)); - tegra_i2c_init(i2c_dev); if (i2c_dev->msg_err == I2C_ERR_NO_ACK) { if (msg->flags & I2C_M_IGNORE_NAK) From 70c89bccb516af8694c50ba73b553278d7afaed4 Mon Sep 17 00:00:00 2001 From: Steve Wise Date: Thu, 21 Jun 2018 07:43:21 -0700 Subject: [PATCH 178/970] iw_cxgb4: correctly enforce the max reg_mr depth commit 7b72717a20bba8bdd01b14c0460be7d15061cd6b upstream. The code was mistakenly using the length of the page array memory instead of the depth of the page array. This would cause MR creation to fail in some cases. Fixes: 8376b86de7d3 ("iw_cxgb4: Support the new memory registration API") Cc: stable@vger.kernel.org Signed-off-by: Steve Wise Signed-off-by: Jason Gunthorpe Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/hw/cxgb4/mem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c index 410408f886c1..0c215353adb9 100644 --- a/drivers/infiniband/hw/cxgb4/mem.c +++ b/drivers/infiniband/hw/cxgb4/mem.c @@ -724,7 +724,7 @@ static int c4iw_set_page(struct ib_mr *ibmr, u64 addr) { struct c4iw_mr *mhp = to_c4iw_mr(ibmr); - if (unlikely(mhp->mpl_len == mhp->max_mpl_len)) + if (unlikely(mhp->mpl_len == mhp->attr.pbl_size)) return -ENOMEM; mhp->mpl[mhp->mpl_len++] = addr; From 062c4965a1e66361f3840fe8234871500af3549e Mon Sep 17 00:00:00 2001 From: Keith Busch Date: Tue, 13 Feb 2018 05:44:44 -0700 Subject: [PATCH 179/970] nvme-pci: Remap CMB SQ entries on every controller reset commit 815c6704bf9f1c59f3a6be380a4032b9c57b12f1 upstream. The controller memory buffer is remapped into a kernel address on each reset, but the driver was setting the submission queue base address only on the very first queue creation. The remapped address is likely to change after a reset, so accessing the old address will hit a kernel bug. This patch fixes that by setting the queue's CMB base address each time the queue is created. Fixes: f63572dff1421 ("nvme: unmap CMB and remove sysfs file in reset path") Reported-by: Christian Black Cc: Jon Derrick Cc: # 4.9+ Signed-off-by: Keith Busch Reviewed-by: Christoph Hellwig Signed-off-by: Scott Bauer Reviewed-by: Jon Derrick Signed-off-by: Greg Kroah-Hartman --- drivers/nvme/host/pci.c | 27 ++++++++++++++++----------- 1 file changed, 16 insertions(+), 11 deletions(-) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index a55d112583bd..fadf151ce830 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -1034,17 +1034,15 @@ static int nvme_cmb_qdepth(struct nvme_dev *dev, int nr_io_queues, static int nvme_alloc_sq_cmds(struct nvme_dev *dev, struct nvme_queue *nvmeq, int qid, int depth) { - if (qid && dev->cmb && use_cmb_sqes && NVME_CMB_SQS(dev->cmbsz)) { - unsigned offset = (qid - 1) * roundup(SQ_SIZE(depth), - dev->ctrl.page_size); - nvmeq->sq_dma_addr = dev->cmb_bus_addr + offset; - nvmeq->sq_cmds_io = dev->cmb + offset; - } else { - nvmeq->sq_cmds = dma_alloc_coherent(dev->dev, SQ_SIZE(depth), - &nvmeq->sq_dma_addr, GFP_KERNEL); - if (!nvmeq->sq_cmds) - return -ENOMEM; - } + + /* CMB SQEs will be mapped before creation */ + if (qid && dev->cmb && use_cmb_sqes && NVME_CMB_SQS(dev->cmbsz)) + return 0; + + nvmeq->sq_cmds = dma_alloc_coherent(dev->dev, SQ_SIZE(depth), + &nvmeq->sq_dma_addr, GFP_KERNEL); + if (!nvmeq->sq_cmds) + return -ENOMEM; return 0; } @@ -1117,6 +1115,13 @@ static int nvme_create_queue(struct nvme_queue *nvmeq, int qid) struct nvme_dev *dev = nvmeq->dev; int result; + if (qid && dev->cmb && use_cmb_sqes && NVME_CMB_SQS(dev->cmbsz)) { + unsigned offset = (qid - 1) * roundup(SQ_SIZE(nvmeq->q_depth), + dev->ctrl.page_size); + nvmeq->sq_dma_addr = dev->cmb_bus_addr + offset; + nvmeq->sq_cmds_io = dev->cmb + offset; + } + nvmeq->cq_vector = qid - 1; result = adapter_alloc_cq(dev, qid, nvmeq); if (result < 0) From 377fb3d894d61421dadafd2cc9c4a5765d0ea241 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Fri, 18 May 2018 18:27:39 +0200 Subject: [PATCH 180/970] uprobes/x86: Remove incorrect WARN_ON() in uprobe_init_insn() commit 90718e32e1dcc2479acfa208ccfc6442850b594c upstream. insn_get_length() has the side-effect of processing the entire instruction but only if it was decoded successfully, otherwise insn_complete() can fail and in this case we need to just return an error without warning. Reported-by: syzbot+30d675e3ca03c1c351e7@syzkaller.appspotmail.com Signed-off-by: Oleg Nesterov Reviewed-by: Masami Hiramatsu Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: syzkaller-bugs@googlegroups.com Link: https://lkml.kernel.org/lkml/20180518162739.GA5559@redhat.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/uprobes.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c index 495c776de4b4..e78a6b1db74b 100644 --- a/arch/x86/kernel/uprobes.c +++ b/arch/x86/kernel/uprobes.c @@ -290,7 +290,7 @@ static int uprobe_init_insn(struct arch_uprobe *auprobe, struct insn *insn, bool insn_init(insn, auprobe->insn, sizeof(auprobe->insn), x86_64); /* has the side-effect of processing the entire instruction */ insn_get_length(insn); - if (WARN_ON_ONCE(!insn_complete(insn))) + if (!insn_complete(insn)) return -ENOEXEC; if (is_prefix_bad(insn)) From ac378e6ade31d5426c6225afa147252ce9170e63 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 13 Jun 2018 09:13:39 -0700 Subject: [PATCH 181/970] netfilter: nf_queue: augment nfqa_cfg_policy commit ba062ebb2cd561d404e0fba8ee4b3f5ebce7cbfc upstream. Three attributes are currently not verified, thus can trigger KMSAN warnings such as : BUG: KMSAN: uninit-value in __arch_swab32 arch/x86/include/uapi/asm/swab.h:10 [inline] BUG: KMSAN: uninit-value in __fswab32 include/uapi/linux/swab.h:59 [inline] BUG: KMSAN: uninit-value in nfqnl_recv_config+0x939/0x17d0 net/netfilter/nfnetlink_queue.c:1268 CPU: 1 PID: 4521 Comm: syz-executor120 Not tainted 4.17.0+ #5 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:113 kmsan_report+0x188/0x2a0 mm/kmsan/kmsan.c:1117 __msan_warning_32+0x70/0xc0 mm/kmsan/kmsan_instr.c:620 __arch_swab32 arch/x86/include/uapi/asm/swab.h:10 [inline] __fswab32 include/uapi/linux/swab.h:59 [inline] nfqnl_recv_config+0x939/0x17d0 net/netfilter/nfnetlink_queue.c:1268 nfnetlink_rcv_msg+0xb2e/0xc80 net/netfilter/nfnetlink.c:212 netlink_rcv_skb+0x37e/0x600 net/netlink/af_netlink.c:2448 nfnetlink_rcv+0x2fe/0x680 net/netfilter/nfnetlink.c:513 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline] netlink_unicast+0x1680/0x1750 net/netlink/af_netlink.c:1336 netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg net/socket.c:639 [inline] ___sys_sendmsg+0xec8/0x1320 net/socket.c:2117 __sys_sendmsg net/socket.c:2155 [inline] __do_sys_sendmsg net/socket.c:2164 [inline] __se_sys_sendmsg net/socket.c:2162 [inline] __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162 do_syscall_64+0x15b/0x230 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x43fd59 RSP: 002b:00007ffde0e30d28 EFLAGS: 00000213 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043fd59 RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000003 RBP: 00000000006ca018 R08: 00000000004002c8 R09: 00000000004002c8 R10: 00000000004002c8 R11: 0000000000000213 R12: 0000000000401680 R13: 0000000000401710 R14: 0000000000000000 R15: 0000000000000000 Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline] kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:189 kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:315 kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan.c:322 slab_post_alloc_hook mm/slab.h:446 [inline] slab_alloc_node mm/slub.c:2753 [inline] __kmalloc_node_track_caller+0xb35/0x11b0 mm/slub.c:4395 __kmalloc_reserve net/core/skbuff.c:138 [inline] __alloc_skb+0x2cb/0x9e0 net/core/skbuff.c:206 alloc_skb include/linux/skbuff.h:988 [inline] netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline] netlink_sendmsg+0x76e/0x1350 net/netlink/af_netlink.c:1876 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg net/socket.c:639 [inline] ___sys_sendmsg+0xec8/0x1320 net/socket.c:2117 __sys_sendmsg net/socket.c:2155 [inline] __do_sys_sendmsg net/socket.c:2164 [inline] __se_sys_sendmsg net/socket.c:2162 [inline] __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162 do_syscall_64+0x15b/0x230 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: fdb694a01f1f ("netfilter: Add fail-open support") Fixes: 829e17a1a602 ("[NETFILTER]: nfnetlink_queue: allow changing queue length through netlink") Signed-off-by: Eric Dumazet Reported-by: syzbot Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman --- net/netfilter/nfnetlink_queue.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c index 5efb40291ac3..2a811b5634d4 100644 --- a/net/netfilter/nfnetlink_queue.c +++ b/net/netfilter/nfnetlink_queue.c @@ -1210,6 +1210,9 @@ static int nfqnl_recv_unsupp(struct net *net, struct sock *ctnl, static const struct nla_policy nfqa_cfg_policy[NFQA_CFG_MAX+1] = { [NFQA_CFG_CMD] = { .len = sizeof(struct nfqnl_msg_config_cmd) }, [NFQA_CFG_PARAMS] = { .len = sizeof(struct nfqnl_msg_config_params) }, + [NFQA_CFG_QUEUE_MAXLEN] = { .type = NLA_U32 }, + [NFQA_CFG_MASK] = { .type = NLA_U32 }, + [NFQA_CFG_FLAGS] = { .type = NLA_U32 }, }; static const struct nf_queue_handler nfqh = { From 40352e791c0c407ae260c294e04d81bd989e00a5 Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Thu, 7 Jun 2018 21:34:43 +0200 Subject: [PATCH 182/970] netfilter: x_tables: initialise match/target check parameter struct commit c568503ef02030f169c9e19204def610a3510918 upstream. syzbot reports following splat: BUG: KMSAN: uninit-value in ebt_stp_mt_check+0x24b/0x450 net/bridge/netfilter/ebt_stp.c:162 ebt_stp_mt_check+0x24b/0x450 net/bridge/netfilter/ebt_stp.c:162 xt_check_match+0x1438/0x1650 net/netfilter/x_tables.c:506 ebt_check_match net/bridge/netfilter/ebtables.c:372 [inline] ebt_check_entry net/bridge/netfilter/ebtables.c:702 [inline] The uninitialised access is xt_mtchk_param->nft_compat ... which should be set to 0. Fix it by zeroing the struct beforehand, same for tgchk. ip(6)tables targetinfo uses c99-style initialiser, so no change needed there. Reported-by: syzbot+da4494182233c23a5fcf@syzkaller.appspotmail.com Fixes: 55917a21d0cc0 ("netfilter: x_tables: add context to know if extension runs from nft_compat") Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman --- net/bridge/netfilter/ebtables.c | 2 ++ net/ipv4/netfilter/ip_tables.c | 1 + net/ipv6/netfilter/ip6_tables.c | 1 + 3 files changed, 4 insertions(+) diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c index da3d373eb5bd..cb6fbb525ba6 100644 --- a/net/bridge/netfilter/ebtables.c +++ b/net/bridge/netfilter/ebtables.c @@ -704,6 +704,8 @@ ebt_check_entry(struct ebt_entry *e, struct net *net, } i = 0; + memset(&mtpar, 0, sizeof(mtpar)); + memset(&tgpar, 0, sizeof(tgpar)); mtpar.net = tgpar.net = net; mtpar.table = tgpar.table = name; mtpar.entryinfo = tgpar.entryinfo = e; diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c index e78f6521823f..06aa4948d0c0 100644 --- a/net/ipv4/netfilter/ip_tables.c +++ b/net/ipv4/netfilter/ip_tables.c @@ -554,6 +554,7 @@ find_check_entry(struct ipt_entry *e, struct net *net, const char *name, return -ENOMEM; j = 0; + memset(&mtpar, 0, sizeof(mtpar)); mtpar.net = net; mtpar.table = name; mtpar.entryinfo = &e->ip; diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c index e26becc9a43d..180f19526a80 100644 --- a/net/ipv6/netfilter/ip6_tables.c +++ b/net/ipv6/netfilter/ip6_tables.c @@ -584,6 +584,7 @@ find_check_entry(struct ip6t_entry *e, struct net *net, const char *name, return -ENOMEM; j = 0; + memset(&mtpar, 0, sizeof(mtpar)); mtpar.net = net; mtpar.table = name; mtpar.entryinfo = &e->ipv6; From e3cf1cc9ed9230560e61306e13c88eccb5189c1b Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Mon, 7 May 2018 11:37:58 -0400 Subject: [PATCH 183/970] loop: add recursion validation to LOOP_CHANGE_FD commit d2ac838e4cd7e5e9891ecc094d626734b0245c99 upstream. Refactor the validation code used in LOOP_SET_FD so it is also used in LOOP_CHANGE_FD. Otherwise it is possible to construct a set of loop devices that all refer to each other. This can lead to a infinite loop in starting with "while (is_loop_device(f)) .." in loop_set_fd(). Fix this by refactoring out the validation code and using it for LOOP_CHANGE_FD as well as LOOP_SET_FD. Reported-by: syzbot+4349872271ece473a7c91190b68b4bac7c5dbc87@syzkaller.appspotmail.com Reported-by: syzbot+40bd32c4d9a3cc12a339@syzkaller.appspotmail.com Reported-by: syzbot+769c54e66f994b041be7@syzkaller.appspotmail.com Reported-by: syzbot+0a89a9ce473936c57065@syzkaller.appspotmail.com Signed-off-by: Theodore Ts'o Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman --- drivers/block/loop.c | 68 +++++++++++++++++++++++++------------------- 1 file changed, 38 insertions(+), 30 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index ff1c4d7aa025..29c847137e42 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -640,6 +640,36 @@ static void loop_reread_partitions(struct loop_device *lo, __func__, lo->lo_number, lo->lo_file_name, rc); } +static inline int is_loop_device(struct file *file) +{ + struct inode *i = file->f_mapping->host; + + return i && S_ISBLK(i->i_mode) && MAJOR(i->i_rdev) == LOOP_MAJOR; +} + +static int loop_validate_file(struct file *file, struct block_device *bdev) +{ + struct inode *inode = file->f_mapping->host; + struct file *f = file; + + /* Avoid recursion */ + while (is_loop_device(f)) { + struct loop_device *l; + + if (f->f_mapping->host->i_bdev == bdev) + return -EBADF; + + l = f->f_mapping->host->i_bdev->bd_disk->private_data; + if (l->lo_state == Lo_unbound) { + return -EINVAL; + } + f = l->lo_backing_file; + } + if (!S_ISREG(inode->i_mode) && !S_ISBLK(inode->i_mode)) + return -EINVAL; + return 0; +} + /* * loop_change_fd switched the backing store of a loopback device to * a new file. This is useful for operating system installers to free up @@ -669,14 +699,15 @@ static int loop_change_fd(struct loop_device *lo, struct block_device *bdev, if (!file) goto out; + error = loop_validate_file(file, bdev); + if (error) + goto out_putf; + inode = file->f_mapping->host; old_file = lo->lo_backing_file; error = -EINVAL; - if (!S_ISREG(inode->i_mode) && !S_ISBLK(inode->i_mode)) - goto out_putf; - /* size of the new backing store needs to be the same */ if (get_loop_size(lo, file) != get_loop_size(lo, old_file)) goto out_putf; @@ -697,13 +728,6 @@ static int loop_change_fd(struct loop_device *lo, struct block_device *bdev, return error; } -static inline int is_loop_device(struct file *file) -{ - struct inode *i = file->f_mapping->host; - - return i && S_ISBLK(i->i_mode) && MAJOR(i->i_rdev) == LOOP_MAJOR; -} - /* loop sysfs attributes */ static ssize_t loop_attr_show(struct device *dev, char *page, @@ -861,7 +885,7 @@ static int loop_prepare_queue(struct loop_device *lo) static int loop_set_fd(struct loop_device *lo, fmode_t mode, struct block_device *bdev, unsigned int arg) { - struct file *file, *f; + struct file *file; struct inode *inode; struct address_space *mapping; unsigned lo_blocksize; @@ -881,29 +905,13 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode, if (lo->lo_state != Lo_unbound) goto out_putf; - /* Avoid recursion */ - f = file; - while (is_loop_device(f)) { - struct loop_device *l; - - if (f->f_mapping->host->i_bdev == bdev) - goto out_putf; - - l = f->f_mapping->host->i_bdev->bd_disk->private_data; - if (l->lo_state == Lo_unbound) { - error = -EINVAL; - goto out_putf; - } - f = l->lo_backing_file; - } + error = loop_validate_file(file, bdev); + if (error) + goto out_putf; mapping = file->f_mapping; inode = mapping->host; - error = -EINVAL; - if (!S_ISREG(inode->i_mode) && !S_ISBLK(inode->i_mode)) - goto out_putf; - if (!(file->f_mode & FMODE_WRITE) || !(mode & FMODE_WRITE) || !file->f_op->write_iter) lo_flags |= LO_FLAGS_READ_ONLY; From 34f841a3c3db03dd7de1157c3b1bda07b50d3259 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa Date: Sat, 26 May 2018 09:59:36 +0900 Subject: [PATCH 184/970] PM / hibernate: Fix oops at snapshot_write() commit fc14eebfc20854a38fd9f1d93a42b1783dad4d17 upstream. syzbot is reporting NULL pointer dereference at snapshot_write() [1]. This is because data->handle is zero-cleared by ioctl(SNAPSHOT_FREE). Fix this by checking data_of(data->handle) != NULL before using it. [1] https://syzkaller.appspot.com/bug?id=828a3c71bd344a6de8b6a31233d51a72099f27fd Signed-off-by: Tetsuo Handa Reported-by: syzbot Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman --- kernel/power/user.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/kernel/power/user.c b/kernel/power/user.c index 35310b627388..bc6dde1f1567 100644 --- a/kernel/power/user.c +++ b/kernel/power/user.c @@ -186,6 +186,11 @@ static ssize_t snapshot_write(struct file *filp, const char __user *buf, res = PAGE_SIZE - pg_offp; } + if (!data_of(data->handle)) { + res = -EINVAL; + goto unlock; + } + res = simple_write_to_buffer(data_of(data->handle), res, &pg_offp, buf, count); if (res > 0) From 684db31e7471c39aad887820ff62804dac60b444 Mon Sep 17 00:00:00 2001 From: Leon Romanovsky Date: Wed, 23 May 2018 08:22:11 +0300 Subject: [PATCH 185/970] RDMA/ucm: Mark UCM interface as BROKEN commit 7a8690ed6f5346f6738971892205e91d39b6b901 upstream. In commit 357d23c811a7 ("Remove the obsolete libibcm library") in rdma-core [1], we removed obsolete library which used the /dev/infiniband/ucmX interface. Following multiple syzkaller reports about non-sanitized user input in the UCMA module, the short audit reveals the same issues in UCM module too. It is better to disable this interface in the kernel, before syzkaller team invests time and energy to harden this unused interface. [1] https://github.com/linux-rdma/rdma-core/pull/279 Signed-off-by: Leon Romanovsky Signed-off-by: Jason Gunthorpe Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/Kconfig | 12 ++++++++++++ drivers/infiniband/core/Makefile | 4 ++-- 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig index fb3fb89640e5..5d5368a3c819 100644 --- a/drivers/infiniband/Kconfig +++ b/drivers/infiniband/Kconfig @@ -34,6 +34,18 @@ config INFINIBAND_USER_ACCESS libibverbs, libibcm and a hardware driver library from . +config INFINIBAND_USER_ACCESS_UCM + bool "Userspace CM (UCM, DEPRECATED)" + depends on BROKEN + depends on INFINIBAND_USER_ACCESS + help + The UCM module has known security flaws, which no one is + interested to fix. The user-space part of this code was + dropped from the upstream a long time ago. + + This option is DEPRECATED and planned to be removed. + + config INFINIBAND_USER_MEM bool depends on INFINIBAND_USER_ACCESS != n diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile index edaae9f9853c..33dc00c721b9 100644 --- a/drivers/infiniband/core/Makefile +++ b/drivers/infiniband/core/Makefile @@ -4,8 +4,8 @@ user_access-$(CONFIG_INFINIBAND_ADDR_TRANS) := rdma_ucm.o obj-$(CONFIG_INFINIBAND) += ib_core.o ib_cm.o iw_cm.o \ $(infiniband-y) obj-$(CONFIG_INFINIBAND_USER_MAD) += ib_umad.o -obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o \ - $(user_access-y) +obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o $(user_access-y) +obj-$(CONFIG_INFINIBAND_USER_ACCESS_UCM) += ib_ucm.o $(user_access-y) ib_core-y := packer.o ud_header.o verbs.o cq.o rw.o sysfs.o \ device.o fmr_pool.o cache.o netlink.o \ From b2660f35d3da4d960b935c9b683b92511d4d54b4 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa Date: Fri, 4 May 2018 10:58:09 -0600 Subject: [PATCH 186/970] loop: remember whether sysfs_create_group() was done commit d3349b6b3c373ac1fbfb040b810fcee5e2adc7e0 upstream. syzbot is hitting WARN() triggered by memory allocation fault injection [1] because loop module is calling sysfs_remove_group() when sysfs_create_group() failed. Fix this by remembering whether sysfs_create_group() succeeded. [1] https://syzkaller.appspot.com/bug?id=3f86c0edf75c86d2633aeb9dd69eccc70bc7e90b Signed-off-by: Tetsuo Handa Reported-by: syzbot Reviewed-by: Greg Kroah-Hartman Signed-off-by: Greg Kroah-Hartman Renamed sysfs_ready -> sysfs_inited. Signed-off-by: Jens Axboe --- drivers/block/loop.c | 11 ++++++----- drivers/block/loop.h | 1 + 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 29c847137e42..9f840d9fdfcb 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -824,16 +824,17 @@ static struct attribute_group loop_attribute_group = { .attrs= loop_attrs, }; -static int loop_sysfs_init(struct loop_device *lo) +static void loop_sysfs_init(struct loop_device *lo) { - return sysfs_create_group(&disk_to_dev(lo->lo_disk)->kobj, - &loop_attribute_group); + lo->sysfs_inited = !sysfs_create_group(&disk_to_dev(lo->lo_disk)->kobj, + &loop_attribute_group); } static void loop_sysfs_exit(struct loop_device *lo) { - sysfs_remove_group(&disk_to_dev(lo->lo_disk)->kobj, - &loop_attribute_group); + if (lo->sysfs_inited) + sysfs_remove_group(&disk_to_dev(lo->lo_disk)->kobj, + &loop_attribute_group); } static void loop_config_discard(struct loop_device *lo) diff --git a/drivers/block/loop.h b/drivers/block/loop.h index fb2237c73e61..60f0fd2c0c65 100644 --- a/drivers/block/loop.h +++ b/drivers/block/loop.h @@ -59,6 +59,7 @@ struct loop_device { struct kthread_worker worker; struct task_struct *worker_task; bool use_dio; + bool sysfs_inited; struct request_queue *lo_queue; struct blk_mq_tag_set tag_set; From f77982e6911294725180897ff7244446ab708381 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Tue, 17 Jul 2018 11:37:54 +0200 Subject: [PATCH 187/970] Linux 4.9.113 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index c4544293db10..3884afb2850f 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 4 PATCHLEVEL = 9 -SUBLEVEL = 112 +SUBLEVEL = 113 EXTRAVERSION = NAME = Roaring Lionus From dc9e795b080bab8f3268d585f406922e07f4be46 Mon Sep 17 00:00:00 2001 From: Paul Burton Date: Fri, 22 Jun 2018 10:55:46 -0700 Subject: [PATCH 188/970] MIPS: Use async IPIs for arch_trigger_cpumask_backtrace() commit b63e132b6433a41cf311e8bc382d33fd2b73b505 upstream. The current MIPS implementation of arch_trigger_cpumask_backtrace() is broken because it attempts to use synchronous IPIs despite the fact that it may be run with interrupts disabled. This means that when arch_trigger_cpumask_backtrace() is invoked, for example by the RCU CPU stall watchdog, we may: - Deadlock due to use of synchronous IPIs with interrupts disabled, causing the CPU that's attempting to generate the backtrace output to hang itself. - Not succeed in generating the desired output from remote CPUs. - Produce warnings about this from smp_call_function_many(), for example: [42760.526910] INFO: rcu_sched detected stalls on CPUs/tasks: [42760.535755] 0-...!: (1 GPs behind) idle=ade/140000000000000/0 softirq=526944/526945 fqs=0 [42760.547874] 1-...!: (0 ticks this GP) idle=e4a/140000000000000/0 softirq=547885/547885 fqs=0 [42760.559869] (detected by 2, t=2162 jiffies, g=266689, c=266688, q=33) [42760.568927] ------------[ cut here ]------------ [42760.576146] WARNING: CPU: 2 PID: 1216 at kernel/smp.c:416 smp_call_function_many+0x88/0x20c [42760.587839] Modules linked in: [42760.593152] CPU: 2 PID: 1216 Comm: sh Not tainted 4.15.4-00373-gee058bb4d0c2 #2 [42760.603767] Stack : 8e09bd20 8e09bd20 8e09bd20 fffffff0 00000007 00000006 00000000 8e09bca8 [42760.616937] 95b2b379 95b2b379 807a0080 00000007 81944518 0000018a 00000032 00000000 [42760.630095] 00000000 00000030 80000000 00000000 806eca74 00000009 8017e2b8 000001a0 [42760.643169] 00000000 00000002 00000000 8e09baa4 00000008 808b8008 86d69080 8e09bca0 [42760.656282] 8e09ad50 805e20aa 00000000 00000000 00000000 8017e2b8 00000009 801070ca [42760.669424] ... [42760.673919] Call Trace: [42760.678672] [<27fde568>] show_stack+0x70/0xf0 [42760.685417] [<84751641>] dump_stack+0xaa/0xd0 [42760.692188] [<699d671c>] __warn+0x80/0x92 [42760.698549] [<68915d41>] warn_slowpath_null+0x28/0x36 [42760.705912] [] smp_call_function_many+0x88/0x20c [42760.713696] [<6bbdfc2a>] arch_trigger_cpumask_backtrace+0x30/0x4a [42760.722216] [] rcu_dump_cpu_stacks+0x6a/0x98 [42760.729580] [<796e7629>] rcu_check_callbacks+0x672/0x6ac [42760.737476] [<059b3b43>] update_process_times+0x18/0x34 [42760.744981] [<6eb94941>] tick_sched_handle.isra.5+0x26/0x38 [42760.752793] [<478d3d70>] tick_sched_timer+0x1c/0x50 [42760.759882] [] __hrtimer_run_queues+0xc6/0x226 [42760.767418] [] hrtimer_interrupt+0x88/0x19a [42760.775031] [<6765a19e>] gic_compare_interrupt+0x2e/0x3a [42760.782761] [<0558bf5f>] handle_percpu_devid_irq+0x78/0x168 [42760.790795] [<90c11ba2>] generic_handle_irq+0x1e/0x2c [42760.798117] [<1b6d462c>] gic_handle_local_int+0x38/0x86 [42760.805545] [] gic_irq_dispatch+0xa/0x14 [42760.812534] [<90c11ba2>] generic_handle_irq+0x1e/0x2c [42760.820086] [] do_IRQ+0x16/0x20 [42760.826274] [<9aef3ce6>] plat_irq_dispatch+0x62/0x94 [42760.833458] [<6a94b53c>] except_vec_vi_end+0x70/0x78 [42760.840655] [<22284043>] smp_call_function_many+0x1ba/0x20c [42760.848501] [<54022b58>] smp_call_function+0x1e/0x2c [42760.855693] [] flush_tlb_mm+0x2a/0x98 [42760.862730] [<0844cdd0>] tlb_flush_mmu+0x1c/0x44 [42760.869628] [] arch_tlb_finish_mmu+0x26/0x3e [42760.877021] [<1aeaaf74>] tlb_finish_mmu+0x18/0x66 [42760.883907] [] exit_mmap+0x76/0xea [42760.890428] [] mmput+0x80/0x11a [42760.896632] [] do_exit+0x1f4/0x80c [42760.903158] [] do_group_exit+0x20/0x7e [42760.909990] [<13fa8d54>] __wake_up_parent+0x0/0x1e [42760.917045] [<46cf89d0>] smp_call_function_many+0x1a2/0x20c [42760.924893] [<8c21a93b>] syscall_common+0x14/0x1c [42760.931765] ---[ end trace 02aa09da9dc52a60 ]--- [42760.938342] ------------[ cut here ]------------ [42760.945311] WARNING: CPU: 2 PID: 1216 at kernel/smp.c:291 smp_call_function_single+0xee/0xf8 ... This patch switches MIPS' arch_trigger_cpumask_backtrace() to use async IPIs & smp_call_function_single_async() in order to resolve this problem. We ensure use of the pre-allocated call_single_data_t structures is serialized by maintaining a cpumask indicating that they're busy, and refusing to attempt to send an IPI when a CPU's bit is set in this mask. This should only happen if a CPU hasn't responded to a previous backtrace IPI - ie. if it's hung - and we print a warning to the console in this case. I've marked this for stable branches as far back as v4.9, to which it applies cleanly. Strictly speaking the faulty MIPS implementation can be traced further back to commit 856839b76836 ("MIPS: Add arch_trigger_all_cpu_backtrace() function") in v3.19, but kernel versions v3.19 through v4.8 will require further work to backport due to the rework performed in commit 9a01c3ed5cdb ("nmi_backtrace: add more trigger_*_cpu_backtrace() methods"). Signed-off-by: Paul Burton Patchwork: https://patchwork.linux-mips.org/patch/19597/ Cc: James Hogan Cc: Ralf Baechle Cc: Huacai Chen Cc: linux-mips@linux-mips.org Cc: stable@vger.kernel.org # v4.9+ Fixes: 856839b76836 ("MIPS: Add arch_trigger_all_cpu_backtrace() function") Fixes: 9a01c3ed5cdb ("nmi_backtrace: add more trigger_*_cpu_backtrace() methods") [ Huacai: backported to 4.9: Replace "call_single_data_t" with "struct call_single_data" ] Signed-off-by: Huacai Chen Signed-off-by: Greg Kroah-Hartman --- arch/mips/kernel/process.c | 45 +++++++++++++++++++++++++------------- 1 file changed, 30 insertions(+), 15 deletions(-) diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c index cb1e9c184b5a..513a63b9b991 100644 --- a/arch/mips/kernel/process.c +++ b/arch/mips/kernel/process.c @@ -26,6 +26,7 @@ #include #include #include +#include #include #include @@ -633,28 +634,42 @@ unsigned long arch_align_stack(unsigned long sp) return sp & ALMASK; } -static void arch_dump_stack(void *info) +static DEFINE_PER_CPU(struct call_single_data, backtrace_csd); +static struct cpumask backtrace_csd_busy; + +static void handle_backtrace(void *info) { - struct pt_regs *regs; + nmi_cpu_backtrace(get_irq_regs()); + cpumask_clear_cpu(smp_processor_id(), &backtrace_csd_busy); +} - regs = get_irq_regs(); +static void raise_backtrace(cpumask_t *mask) +{ + struct call_single_data *csd; + int cpu; - if (regs) - show_regs(regs); - else - dump_stack(); + for_each_cpu(cpu, mask) { + /* + * If we previously sent an IPI to the target CPU & it hasn't + * cleared its bit in the busy cpumask then it didn't handle + * our previous IPI & it's not safe for us to reuse the + * call_single_data_t. + */ + if (cpumask_test_and_set_cpu(cpu, &backtrace_csd_busy)) { + pr_warn("Unable to send backtrace IPI to CPU%u - perhaps it hung?\n", + cpu); + continue; + } + + csd = &per_cpu(backtrace_csd, cpu); + csd->func = handle_backtrace; + smp_call_function_single_async(cpu, csd); + } } void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self) { - long this_cpu = get_cpu(); - - if (cpumask_test_cpu(this_cpu, mask) && !exclude_self) - dump_stack(); - - smp_call_function_many(mask, arch_dump_stack, NULL, 1); - - put_cpu(); + nmi_trigger_cpumask_backtrace(mask, exclude_self, raise_backtrace); } int mips_get_process_fp_mode(struct task_struct *task) From 94cc698fdaa7250c117343b34432ea6431b3856b Mon Sep 17 00:00:00 2001 From: David Rientjes Date: Tue, 6 Jun 2017 13:36:24 -0700 Subject: [PATCH 189/970] compiler, clang: suppress warning for unused static inline functions commit abb2ea7dfd82451d85ce669b811310c05ab5ca46 upstream. GCC explicitly does not warn for unused static inline functions for -Wunused-function. The manual states: Warn whenever a static function is declared but not defined or a non-inline static function is unused. Clang does warn for static inline functions that are unused. It turns out that suppressing the warnings avoids potentially complex #ifdef directives, which also reduces LOC. Suppress the warning for clang. Signed-off-by: David Rientjes Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- include/linux/compiler-clang.h | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h index 01225b0059b1..5ce62d3e69b5 100644 --- a/include/linux/compiler-clang.h +++ b/include/linux/compiler-clang.h @@ -16,6 +16,13 @@ */ #define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __COUNTER__) +/* + * GCC does not warn about unused static inline functions for + * -Wunused-function. This turns out to avoid the need for complex #ifdef + * directives. Suppress the warning in clang as well. + */ +#define inline inline __attribute__((unused)) + /* Clang doesn't have a way to turn it off per-function, yet. */ #ifdef __noretpoline #undef __noretpoline From f276b50c3a5b01c1b6636adb3e4eba67d187d218 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sun, 11 Jun 2017 15:51:56 -0700 Subject: [PATCH 190/970] compiler, clang: properly override 'inline' for clang commit 6d53cefb18e4646fb4bf62ccb6098fb3808486df upstream. Commit abb2ea7dfd82 ("compiler, clang: suppress warning for unused static inline functions") just caused more warnings due to re-defining the 'inline' macro. So undef it before re-defining it, and also add the 'notrace' attribute like the gcc version that this is overriding does. Maybe this makes clang happier. Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- include/linux/compiler-clang.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h index 5ce62d3e69b5..8a97da12ea26 100644 --- a/include/linux/compiler-clang.h +++ b/include/linux/compiler-clang.h @@ -21,7 +21,8 @@ * -Wunused-function. This turns out to avoid the need for complex #ifdef * directives. Suppress the warning in clang as well. */ -#define inline inline __attribute__((unused)) +#undef inline +#define inline inline __attribute__((unused)) notrace /* Clang doesn't have a way to turn it off per-function, yet. */ #ifdef __noretpoline From 29524a9d42f683ca7b67f07f1ac3f6049c8675cc Mon Sep 17 00:00:00 2001 From: David Rientjes Date: Thu, 6 Jul 2017 15:35:24 -0700 Subject: [PATCH 191/970] compiler, clang: always inline when CONFIG_OPTIMIZE_INLINING is disabled commit 9a04dbcfb33b4012d0ce8c0282f1e3ca694675b1 upstream. The motivation for commit abb2ea7dfd82 ("compiler, clang: suppress warning for unused static inline functions") was to suppress clang's warnings about unused static inline functions. For configs without CONFIG_OPTIMIZE_INLINING enabled, such as any non-x86 architecture, `inline' in the kernel implies that __attribute__((always_inline)) is used. Some code depends on that behavior, see https://lkml.org/lkml/2017/6/13/918: net/built-in.o: In function `__xchg_mb': arch/arm64/include/asm/cmpxchg.h:99: undefined reference to `__compiletime_assert_99' arch/arm64/include/asm/cmpxchg.h:99: undefined reference to `__compiletime_assert_99 The full fix would be to identify these breakages and annotate the functions with __always_inline instead of `inline'. But since we are late in the 4.12-rc cycle, simply carry forward the forced inlining behavior and work toward moving arm64, and other architectures, toward CONFIG_OPTIMIZE_INLINING behavior. Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1706261552200.1075@chino.kir.corp.google.com Signed-off-by: David Rientjes Reported-by: Sodagudi Prasad Tested-by: Sodagudi Prasad Tested-by: Matthias Kaehlcke Cc: Mark Rutland Cc: Will Deacon Cc: Catalin Marinas Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- include/linux/compiler-clang.h | 8 -------- include/linux/compiler-gcc.h | 18 +++++++++++------- 2 files changed, 11 insertions(+), 15 deletions(-) diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h index 8a97da12ea26..01225b0059b1 100644 --- a/include/linux/compiler-clang.h +++ b/include/linux/compiler-clang.h @@ -16,14 +16,6 @@ */ #define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __COUNTER__) -/* - * GCC does not warn about unused static inline functions for - * -Wunused-function. This turns out to avoid the need for complex #ifdef - * directives. Suppress the warning in clang as well. - */ -#undef inline -#define inline inline __attribute__((unused)) notrace - /* Clang doesn't have a way to turn it off per-function, yet. */ #ifdef __noretpoline #undef __noretpoline diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h index ad793c69cc46..a6d1bf2a5b8e 100644 --- a/include/linux/compiler-gcc.h +++ b/include/linux/compiler-gcc.h @@ -66,18 +66,22 @@ /* * Force always-inline if the user requests it so via the .config, - * or if gcc is too old: + * or if gcc is too old. + * GCC does not warn about unused static inline functions for + * -Wunused-function. This turns out to avoid the need for complex #ifdef + * directives. Suppress the warning in clang as well by using "unused" + * function attribute, which is redundant but not harmful for gcc. */ #if !defined(CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING) || \ !defined(CONFIG_OPTIMIZE_INLINING) || (__GNUC__ < 4) -#define inline inline __attribute__((always_inline)) notrace -#define __inline__ __inline__ __attribute__((always_inline)) notrace -#define __inline __inline __attribute__((always_inline)) notrace +#define inline inline __attribute__((always_inline,unused)) notrace +#define __inline__ __inline__ __attribute__((always_inline,unused)) notrace +#define __inline __inline __attribute__((always_inline,unused)) notrace #else /* A lot of inline functions can cause havoc with function tracing */ -#define inline inline notrace -#define __inline__ __inline__ notrace -#define __inline __inline notrace +#define inline inline __attribute__((unused)) notrace +#define __inline__ __inline__ __attribute__((unused)) notrace +#define __inline __inline __attribute__((unused)) notrace #endif #define __always_inline inline __attribute__((always_inline)) From 02c89527b056301846bfdec4fabbad924d1ad59d Mon Sep 17 00:00:00 2001 From: Nick Desaulniers Date: Thu, 21 Jun 2018 09:23:22 -0700 Subject: [PATCH 192/970] compiler-gcc.h: Add __attribute__((gnu_inline)) to all inline declarations commit d03db2bc26f0e4a6849ad649a09c9c73fccdc656 upstream. Functions marked extern inline do not emit an externally visible function when the gnu89 C standard is used. Some KBUILD Makefiles overwrite KBUILD_CFLAGS. This is an issue for GCC 5.1+ users as without an explicit C standard specified, the default is gnu11. Since c99, the semantics of extern inline have changed such that an externally visible function is always emitted. This can lead to multiple definition errors of extern inline functions at link time of compilation units whose build files have removed an explicit C standard compiler flag for users of GCC 5.1+ or Clang. Suggested-by: Arnd Bergmann Suggested-by: H. Peter Anvin Suggested-by: Joe Perches Signed-off-by: Nick Desaulniers Acked-by: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: acme@redhat.com Cc: akataria@vmware.com Cc: akpm@linux-foundation.org Cc: andrea.parri@amarulasolutions.com Cc: ard.biesheuvel@linaro.org Cc: aryabinin@virtuozzo.com Cc: astrachan@google.com Cc: boris.ostrovsky@oracle.com Cc: brijesh.singh@amd.com Cc: caoj.fnst@cn.fujitsu.com Cc: geert@linux-m68k.org Cc: ghackmann@google.com Cc: gregkh@linuxfoundation.org Cc: jan.kiszka@siemens.com Cc: jarkko.sakkinen@linux.intel.com Cc: jpoimboe@redhat.com Cc: keescook@google.com Cc: kirill.shutemov@linux.intel.com Cc: kstewart@linuxfoundation.org Cc: linux-efi@vger.kernel.org Cc: linux-kbuild@vger.kernel.org Cc: manojgupta@google.com Cc: mawilcox@microsoft.com Cc: michal.lkml@markovi.net Cc: mjg59@google.com Cc: mka@chromium.org Cc: pombredanne@nexb.com Cc: rientjes@google.com Cc: rostedt@goodmis.org Cc: sedat.dilek@gmail.com Cc: thomas.lendacky@amd.com Cc: tstellar@redhat.com Cc: tweek@google.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: yamada.masahiro@socionext.com Link: http://lkml.kernel.org/r/20180621162324.36656-2-ndesaulniers@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- include/linux/compiler-gcc.h | 29 ++++++++++++++++++++++------- 1 file changed, 22 insertions(+), 7 deletions(-) diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h index a6d1bf2a5b8e..8e82e3373eaf 100644 --- a/include/linux/compiler-gcc.h +++ b/include/linux/compiler-gcc.h @@ -64,6 +64,18 @@ #define __must_be_array(a) BUILD_BUG_ON_ZERO(__same_type((a), &(a)[0])) #endif +/* + * Feature detection for gnu_inline (gnu89 extern inline semantics). Either + * __GNUC_STDC_INLINE__ is defined (not using gnu89 extern inline semantics, + * and we opt in to the gnu89 semantics), or __GNUC_STDC_INLINE__ is not + * defined so the gnu89 semantics are the default. + */ +#ifdef __GNUC_STDC_INLINE__ +# define __gnu_inline __attribute__((gnu_inline)) +#else +# define __gnu_inline +#endif + /* * Force always-inline if the user requests it so via the .config, * or if gcc is too old. @@ -71,19 +83,22 @@ * -Wunused-function. This turns out to avoid the need for complex #ifdef * directives. Suppress the warning in clang as well by using "unused" * function attribute, which is redundant but not harmful for gcc. + * Prefer gnu_inline, so that extern inline functions do not emit an + * externally visible function. This makes extern inline behave as per gnu89 + * semantics rather than c99. This prevents multiple symbol definition errors + * of extern inline functions at link time. + * A lot of inline functions can cause havoc with function tracing. */ #if !defined(CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING) || \ !defined(CONFIG_OPTIMIZE_INLINING) || (__GNUC__ < 4) -#define inline inline __attribute__((always_inline,unused)) notrace -#define __inline__ __inline__ __attribute__((always_inline,unused)) notrace -#define __inline __inline __attribute__((always_inline,unused)) notrace +#define inline \ + inline __attribute__((always_inline, unused)) notrace __gnu_inline #else -/* A lot of inline functions can cause havoc with function tracing */ -#define inline inline __attribute__((unused)) notrace -#define __inline__ __inline__ __attribute__((unused)) notrace -#define __inline __inline __attribute__((unused)) notrace +#define inline inline __attribute__((unused)) notrace __gnu_inline #endif +#define __inline__ inline +#define __inline inline #define __always_inline inline __attribute__((always_inline)) #define noinline __attribute__((noinline)) From cb877e4763d4b6276568858156621558c8e40037 Mon Sep 17 00:00:00 2001 From: "H. Peter Anvin" Date: Thu, 21 Jun 2018 09:23:23 -0700 Subject: [PATCH 193/970] x86/asm: Add _ASM_ARG* constants for argument registers to commit 0e2e160033283e20f688d8bad5b89460cc5bfcc4 upstream. i386 and x86-64 uses different registers for arguments; make them available so we don't have to #ifdef in the actual code. Native size and specified size (q, l, w, b) versions are provided. Signed-off-by: H. Peter Anvin Signed-off-by: Nick Desaulniers Reviewed-by: Sedat Dilek Acked-by: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: acme@redhat.com Cc: akataria@vmware.com Cc: akpm@linux-foundation.org Cc: andrea.parri@amarulasolutions.com Cc: ard.biesheuvel@linaro.org Cc: arnd@arndb.de Cc: aryabinin@virtuozzo.com Cc: astrachan@google.com Cc: boris.ostrovsky@oracle.com Cc: brijesh.singh@amd.com Cc: caoj.fnst@cn.fujitsu.com Cc: geert@linux-m68k.org Cc: ghackmann@google.com Cc: gregkh@linuxfoundation.org Cc: jan.kiszka@siemens.com Cc: jarkko.sakkinen@linux.intel.com Cc: joe@perches.com Cc: jpoimboe@redhat.com Cc: keescook@google.com Cc: kirill.shutemov@linux.intel.com Cc: kstewart@linuxfoundation.org Cc: linux-efi@vger.kernel.org Cc: linux-kbuild@vger.kernel.org Cc: manojgupta@google.com Cc: mawilcox@microsoft.com Cc: michal.lkml@markovi.net Cc: mjg59@google.com Cc: mka@chromium.org Cc: pombredanne@nexb.com Cc: rientjes@google.com Cc: rostedt@goodmis.org Cc: thomas.lendacky@amd.com Cc: tstellar@redhat.com Cc: tweek@google.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: yamada.masahiro@socionext.com Link: http://lkml.kernel.org/r/20180621162324.36656-3-ndesaulniers@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/asm.h | 59 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 59 insertions(+) diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h index 8d8c24f3a963..742712b4bdc3 100644 --- a/arch/x86/include/asm/asm.h +++ b/arch/x86/include/asm/asm.h @@ -45,6 +45,65 @@ #define _ASM_SI __ASM_REG(si) #define _ASM_DI __ASM_REG(di) +#ifndef __x86_64__ +/* 32 bit */ + +#define _ASM_ARG1 _ASM_AX +#define _ASM_ARG2 _ASM_DX +#define _ASM_ARG3 _ASM_CX + +#define _ASM_ARG1L eax +#define _ASM_ARG2L edx +#define _ASM_ARG3L ecx + +#define _ASM_ARG1W ax +#define _ASM_ARG2W dx +#define _ASM_ARG3W cx + +#define _ASM_ARG1B al +#define _ASM_ARG2B dl +#define _ASM_ARG3B cl + +#else +/* 64 bit */ + +#define _ASM_ARG1 _ASM_DI +#define _ASM_ARG2 _ASM_SI +#define _ASM_ARG3 _ASM_DX +#define _ASM_ARG4 _ASM_CX +#define _ASM_ARG5 r8 +#define _ASM_ARG6 r9 + +#define _ASM_ARG1Q rdi +#define _ASM_ARG2Q rsi +#define _ASM_ARG3Q rdx +#define _ASM_ARG4Q rcx +#define _ASM_ARG5Q r8 +#define _ASM_ARG6Q r9 + +#define _ASM_ARG1L edi +#define _ASM_ARG2L esi +#define _ASM_ARG3L edx +#define _ASM_ARG4L ecx +#define _ASM_ARG5L r8d +#define _ASM_ARG6L r9d + +#define _ASM_ARG1W di +#define _ASM_ARG2W si +#define _ASM_ARG3W dx +#define _ASM_ARG4W cx +#define _ASM_ARG5W r8w +#define _ASM_ARG6W r9w + +#define _ASM_ARG1B dil +#define _ASM_ARG2B sil +#define _ASM_ARG3B dl +#define _ASM_ARG4B cl +#define _ASM_ARG5B r8b +#define _ASM_ARG6B r9b + +#endif + /* * Macros to generate condition code outputs from inline assembly, * The output operand must be type "bool". From 1919f3fd551aeeb547b8ebff5ab758a3fd615c31 Mon Sep 17 00:00:00 2001 From: Nick Desaulniers Date: Thu, 21 Jun 2018 09:23:24 -0700 Subject: [PATCH 194/970] x86/paravirt: Make native_save_fl() extern inline commit d0a8d9378d16eb3c69bd8e6d23779fbdbee3a8c7 upstream. native_save_fl() is marked static inline, but by using it as a function pointer in arch/x86/kernel/paravirt.c, it MUST be outlined. paravirt's use of native_save_fl() also requires that no GPRs other than %rax are clobbered. Compilers have different heuristics which they use to emit stack guard code, the emittance of which can break paravirt's callee saved assumption by clobbering %rcx. Marking a function definition extern inline means that if this version cannot be inlined, then the out-of-line version will be preferred. By having the out-of-line version be implemented in assembly, it cannot be instrumented with a stack protector, which might violate custom calling conventions that code like paravirt rely on. The semantics of extern inline has changed since gnu89. This means that folks using GCC versions >= 5.1 may see symbol redefinition errors at link time for subdirs that override KBUILD_CFLAGS (making the C standard used implicit) regardless of this patch. This has been cleaned up earlier in the patch set, but is left as a note in the commit message for future travelers. Reports: https://lkml.org/lkml/2018/5/7/534 https://github.com/ClangBuiltLinux/linux/issues/16 Discussion: https://bugs.llvm.org/show_bug.cgi?id=37512 https://lkml.org/lkml/2018/5/24/1371 Thanks to the many folks that participated in the discussion. Debugged-by: Alistair Strachan Debugged-by: Matthias Kaehlcke Suggested-by: Arnd Bergmann Suggested-by: H. Peter Anvin Suggested-by: Tom Stellar Reported-by: Sedat Dilek Tested-by: Sedat Dilek Signed-off-by: Nick Desaulniers Acked-by: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: acme@redhat.com Cc: akataria@vmware.com Cc: akpm@linux-foundation.org Cc: andrea.parri@amarulasolutions.com Cc: ard.biesheuvel@linaro.org Cc: aryabinin@virtuozzo.com Cc: astrachan@google.com Cc: boris.ostrovsky@oracle.com Cc: brijesh.singh@amd.com Cc: caoj.fnst@cn.fujitsu.com Cc: geert@linux-m68k.org Cc: ghackmann@google.com Cc: gregkh@linuxfoundation.org Cc: jan.kiszka@siemens.com Cc: jarkko.sakkinen@linux.intel.com Cc: joe@perches.com Cc: jpoimboe@redhat.com Cc: keescook@google.com Cc: kirill.shutemov@linux.intel.com Cc: kstewart@linuxfoundation.org Cc: linux-efi@vger.kernel.org Cc: linux-kbuild@vger.kernel.org Cc: manojgupta@google.com Cc: mawilcox@microsoft.com Cc: michal.lkml@markovi.net Cc: mjg59@google.com Cc: mka@chromium.org Cc: pombredanne@nexb.com Cc: rientjes@google.com Cc: rostedt@goodmis.org Cc: thomas.lendacky@amd.com Cc: tweek@google.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: yamada.masahiro@socionext.com Link: http://lkml.kernel.org/r/20180621162324.36656-4-ndesaulniers@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/irqflags.h | 2 +- arch/x86/kernel/Makefile | 1 + arch/x86/kernel/irqflags.S | 26 ++++++++++++++++++++++++++ 3 files changed, 28 insertions(+), 1 deletion(-) create mode 100644 arch/x86/kernel/irqflags.S diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h index ac7692dcfa2e..8a8a6c66be9a 100644 --- a/arch/x86/include/asm/irqflags.h +++ b/arch/x86/include/asm/irqflags.h @@ -12,7 +12,7 @@ * Interrupt control: */ -static inline unsigned long native_save_fl(void) +extern inline unsigned long native_save_fl(void) { unsigned long flags; diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 4c9c61517613..a9ba968621cb 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -56,6 +56,7 @@ obj-y += alternative.o i8253.o pci-nommu.o hw_breakpoint.o obj-y += tsc.o tsc_msr.o io_delay.o rtc.o obj-y += pci-iommu_table.o obj-y += resource.o +obj-y += irqflags.o obj-y += process.o obj-y += fpu/ diff --git a/arch/x86/kernel/irqflags.S b/arch/x86/kernel/irqflags.S new file mode 100644 index 000000000000..ddeeaac8adda --- /dev/null +++ b/arch/x86/kernel/irqflags.S @@ -0,0 +1,26 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#include +#include +#include + +/* + * unsigned long native_save_fl(void) + */ +ENTRY(native_save_fl) + pushf + pop %_ASM_AX + ret +ENDPROC(native_save_fl) +EXPORT_SYMBOL(native_save_fl) + +/* + * void native_restore_fl(unsigned long flags) + * %eax/%rdi: flags + */ +ENTRY(native_restore_fl) + push %_ASM_ARG1 + popf + ret +ENDPROC(native_restore_fl) +EXPORT_SYMBOL(native_restore_fl) From 32a1733cf823011266dac0ea34c13555ff35dde5 Mon Sep 17 00:00:00 2001 From: alex chen Date: Wed, 15 Nov 2017 17:31:48 -0800 Subject: [PATCH 195/970] ocfs2: subsystem.su_mutex is required while accessing the item->ci_parent commit 853bc26a7ea39e354b9f8889ae7ad1492ffa28d2 upstream. The subsystem.su_mutex is required while accessing the item->ci_parent, otherwise, NULL pointer dereference to the item->ci_parent will be triggered in the following situation: add node delete node sys_write vfs_write configfs_write_file o2nm_node_store o2nm_node_local_write do_rmdir vfs_rmdir configfs_rmdir mutex_lock(&subsys->su_mutex); unlink_obj item->ci_group = NULL; item->ci_parent = NULL; to_o2nm_cluster_from_node node->nd_item.ci_parent->ci_parent BUG since of NULL pointer dereference to nd_item.ci_parent Moreover, the o2nm_cluster also should be protected by the subsystem.su_mutex. [alex.chen@huawei.com: v2] Link: http://lkml.kernel.org/r/59EEAA69.9080703@huawei.com Link: http://lkml.kernel.org/r/59E9B36A.10700@huawei.com Signed-off-by: Alex Chen Reviewed-by: Jun Piao Reviewed-by: Joseph Qi Cc: Mark Fasheh Cc: Joel Becker Cc: Junxiao Bi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Cc: Salvatore Bonaccorso Signed-off-by: Greg Kroah-Hartman --- fs/ocfs2/cluster/nodemanager.c | 63 +++++++++++++++++++++++++++++----- 1 file changed, 55 insertions(+), 8 deletions(-) diff --git a/fs/ocfs2/cluster/nodemanager.c b/fs/ocfs2/cluster/nodemanager.c index b17d180bdc16..c204ac9b49e5 100644 --- a/fs/ocfs2/cluster/nodemanager.c +++ b/fs/ocfs2/cluster/nodemanager.c @@ -40,6 +40,9 @@ char *o2nm_fence_method_desc[O2NM_FENCE_METHODS] = { "panic", /* O2NM_FENCE_PANIC */ }; +static inline void o2nm_lock_subsystem(void); +static inline void o2nm_unlock_subsystem(void); + struct o2nm_node *o2nm_get_node_by_num(u8 node_num) { struct o2nm_node *node = NULL; @@ -181,7 +184,10 @@ static struct o2nm_cluster *to_o2nm_cluster_from_node(struct o2nm_node *node) { /* through the first node_set .parent * mycluster/nodes/mynode == o2nm_cluster->o2nm_node_group->o2nm_node */ - return to_o2nm_cluster(node->nd_item.ci_parent->ci_parent); + if (node->nd_item.ci_parent) + return to_o2nm_cluster(node->nd_item.ci_parent->ci_parent); + else + return NULL; } enum { @@ -194,7 +200,7 @@ static ssize_t o2nm_node_num_store(struct config_item *item, const char *page, size_t count) { struct o2nm_node *node = to_o2nm_node(item); - struct o2nm_cluster *cluster = to_o2nm_cluster_from_node(node); + struct o2nm_cluster *cluster; unsigned long tmp; char *p = (char *)page; int ret = 0; @@ -214,6 +220,13 @@ static ssize_t o2nm_node_num_store(struct config_item *item, const char *page, !test_bit(O2NM_NODE_ATTR_PORT, &node->nd_set_attributes)) return -EINVAL; /* XXX */ + o2nm_lock_subsystem(); + cluster = to_o2nm_cluster_from_node(node); + if (!cluster) { + o2nm_unlock_subsystem(); + return -EINVAL; + } + write_lock(&cluster->cl_nodes_lock); if (cluster->cl_nodes[tmp]) ret = -EEXIST; @@ -226,6 +239,8 @@ static ssize_t o2nm_node_num_store(struct config_item *item, const char *page, set_bit(tmp, cluster->cl_nodes_bitmap); } write_unlock(&cluster->cl_nodes_lock); + o2nm_unlock_subsystem(); + if (ret) return ret; @@ -269,7 +284,7 @@ static ssize_t o2nm_node_ipv4_address_store(struct config_item *item, size_t count) { struct o2nm_node *node = to_o2nm_node(item); - struct o2nm_cluster *cluster = to_o2nm_cluster_from_node(node); + struct o2nm_cluster *cluster; int ret, i; struct rb_node **p, *parent; unsigned int octets[4]; @@ -286,6 +301,13 @@ static ssize_t o2nm_node_ipv4_address_store(struct config_item *item, be32_add_cpu(&ipv4_addr, octets[i] << (i * 8)); } + o2nm_lock_subsystem(); + cluster = to_o2nm_cluster_from_node(node); + if (!cluster) { + o2nm_unlock_subsystem(); + return -EINVAL; + } + ret = 0; write_lock(&cluster->cl_nodes_lock); if (o2nm_node_ip_tree_lookup(cluster, ipv4_addr, &p, &parent)) @@ -298,6 +320,8 @@ static ssize_t o2nm_node_ipv4_address_store(struct config_item *item, rb_insert_color(&node->nd_ip_node, &cluster->cl_node_ip_tree); } write_unlock(&cluster->cl_nodes_lock); + o2nm_unlock_subsystem(); + if (ret) return ret; @@ -315,7 +339,7 @@ static ssize_t o2nm_node_local_store(struct config_item *item, const char *page, size_t count) { struct o2nm_node *node = to_o2nm_node(item); - struct o2nm_cluster *cluster = to_o2nm_cluster_from_node(node); + struct o2nm_cluster *cluster; unsigned long tmp; char *p = (char *)page; ssize_t ret; @@ -333,17 +357,26 @@ static ssize_t o2nm_node_local_store(struct config_item *item, const char *page, !test_bit(O2NM_NODE_ATTR_PORT, &node->nd_set_attributes)) return -EINVAL; /* XXX */ + o2nm_lock_subsystem(); + cluster = to_o2nm_cluster_from_node(node); + if (!cluster) { + ret = -EINVAL; + goto out; + } + /* the only failure case is trying to set a new local node * when a different one is already set */ if (tmp && tmp == cluster->cl_has_local && - cluster->cl_local_node != node->nd_num) - return -EBUSY; + cluster->cl_local_node != node->nd_num) { + ret = -EBUSY; + goto out; + } /* bring up the rx thread if we're setting the new local node. */ if (tmp && !cluster->cl_has_local) { ret = o2net_start_listening(node); if (ret) - return ret; + goto out; } if (!tmp && cluster->cl_has_local && @@ -358,7 +391,11 @@ static ssize_t o2nm_node_local_store(struct config_item *item, const char *page, cluster->cl_local_node = node->nd_num; } - return count; + ret = count; + +out: + o2nm_unlock_subsystem(); + return ret; } CONFIGFS_ATTR(o2nm_node_, num); @@ -738,6 +775,16 @@ static struct o2nm_cluster_group o2nm_cluster_group = { }, }; +static inline void o2nm_lock_subsystem(void) +{ + mutex_lock(&o2nm_cluster_group.cs_subsys.su_mutex); +} + +static inline void o2nm_unlock_subsystem(void) +{ + mutex_unlock(&o2nm_cluster_group.cs_subsys.su_mutex); +} + int o2nm_depend_item(struct config_item *item) { return configfs_depend_item(&o2nm_cluster_group.cs_subsys, item); From 78a65505cdf7b7392c963d3715269516bc812ef2 Mon Sep 17 00:00:00 2001 From: alex chen Date: Wed, 15 Nov 2017 17:31:44 -0800 Subject: [PATCH 196/970] ocfs2: ip_alloc_sem should be taken in ocfs2_get_block() commit 3e4c56d41eef5595035872a2ec5a483f42e8917f upstream. ip_alloc_sem should be taken in ocfs2_get_block() when reading file in DIRECT mode to prevent concurrent access to extent tree with ocfs2_dio_end_io_write(), which may cause BUGON in the following situation: read file 'A' end_io of writing file 'A' vfs_read __vfs_read ocfs2_file_read_iter generic_file_read_iter ocfs2_direct_IO __blockdev_direct_IO do_blockdev_direct_IO do_direct_IO get_more_blocks ocfs2_get_block ocfs2_extent_map_get_blocks ocfs2_get_clusters ocfs2_get_clusters_nocache() ocfs2_search_extent_list return the index of record which contains the v_cluster, that is v_cluster > rec[i]->e_cpos. ocfs2_dio_end_io ocfs2_dio_end_io_write down_write(&oi->ip_alloc_sem); ocfs2_mark_extent_written ocfs2_change_extent_flag ocfs2_split_extent ... --> modify the rec[i]->e_cpos, resulting in v_cluster < rec[i]->e_cpos. BUG_ON(v_cluster < le32_to_cpu(rec->e_cpos)) [alex.chen@huawei.com: v3] Link: http://lkml.kernel.org/r/59EF3614.6050008@huawei.com Link: http://lkml.kernel.org/r/59EF3614.6050008@huawei.com Fixes: c15471f79506 ("ocfs2: fix sparse file & data ordering issue in direct io") Signed-off-by: Alex Chen Reviewed-by: Jun Piao Reviewed-by: Joseph Qi Reviewed-by: Gang He Acked-by: Changwei Ge Cc: Mark Fasheh Cc: Joel Becker Cc: Junxiao Bi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Cc: Salvatore Bonaccorso Signed-off-by: Greg Kroah-Hartman --- fs/ocfs2/aops.c | 26 ++++++++++++++++++-------- 1 file changed, 18 insertions(+), 8 deletions(-) diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c index f2961b13e8c5..c26d046adaaa 100644 --- a/fs/ocfs2/aops.c +++ b/fs/ocfs2/aops.c @@ -134,6 +134,19 @@ static int ocfs2_symlink_get_block(struct inode *inode, sector_t iblock, return err; } +static int ocfs2_lock_get_block(struct inode *inode, sector_t iblock, + struct buffer_head *bh_result, int create) +{ + int ret = 0; + struct ocfs2_inode_info *oi = OCFS2_I(inode); + + down_read(&oi->ip_alloc_sem); + ret = ocfs2_get_block(inode, iblock, bh_result, create); + up_read(&oi->ip_alloc_sem); + + return ret; +} + int ocfs2_get_block(struct inode *inode, sector_t iblock, struct buffer_head *bh_result, int create) { @@ -2120,7 +2133,7 @@ static void ocfs2_dio_free_write_ctx(struct inode *inode, * called like this: dio->get_blocks(dio->inode, fs_startblk, * fs_count, map_bh, dio->rw == WRITE); */ -static int ocfs2_dio_get_block(struct inode *inode, sector_t iblock, +static int ocfs2_dio_wr_get_block(struct inode *inode, sector_t iblock, struct buffer_head *bh_result, int create) { struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); @@ -2146,12 +2159,9 @@ static int ocfs2_dio_get_block(struct inode *inode, sector_t iblock, * while file size will be changed. */ if (pos + total_len <= i_size_read(inode)) { - down_read(&oi->ip_alloc_sem); + /* This is the fast path for re-write. */ - ret = ocfs2_get_block(inode, iblock, bh_result, create); - - up_read(&oi->ip_alloc_sem); - + ret = ocfs2_lock_get_block(inode, iblock, bh_result, create); if (buffer_mapped(bh_result) && !buffer_new(bh_result) && ret == 0) @@ -2416,9 +2426,9 @@ static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter) return 0; if (iov_iter_rw(iter) == READ) - get_block = ocfs2_get_block; + get_block = ocfs2_lock_get_block; else - get_block = ocfs2_dio_get_block; + get_block = ocfs2_dio_wr_get_block; return __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev, iter, get_block, From f61de8ef5c905ccc959099118986c4a18ba92f1b Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Fri, 28 Oct 2016 07:58:46 +0200 Subject: [PATCH 197/970] mtd: m25p80: consider max message size in m25p80_read commit 9e276de6a367cde07c1a63522152985d4e5cca8b upstream. Consider a message size limit when calculating the maximum amount of data that can be read. The message size limit has been introduced with 4.9, so cc it to stable. Signed-off-by: Heiner Kallweit Signed-off-by: Cyrille Pitchen Signed-off-by: Amit Pundir Signed-off-by: Greg Kroah-Hartman --- drivers/mtd/devices/m25p80.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/mtd/devices/m25p80.c b/drivers/mtd/devices/m25p80.c index 9cf7fcd28034..16a7df2a0246 100644 --- a/drivers/mtd/devices/m25p80.c +++ b/drivers/mtd/devices/m25p80.c @@ -172,7 +172,8 @@ static ssize_t m25p80_read(struct spi_nor *nor, loff_t from, size_t len, t[1].rx_buf = buf; t[1].rx_nbits = m25p80_rx_nbits(nor); - t[1].len = min(len, spi_max_transfer_size(spi)); + t[1].len = min3(len, spi_max_transfer_size(spi), + spi_max_message_size(spi) - t[0].len); spi_message_add_tail(&t[1], &m); ret = spi_sync(spi, &m); From f5490a6ec598f201cccdd1de3e6356d51ed154b7 Mon Sep 17 00:00:00 2001 From: Jonas Gorski Date: Sun, 1 Oct 2017 13:02:15 +0200 Subject: [PATCH 198/970] bcm63xx_enet: correct clock usage commit 9c86b846ce02f7e35d7234cf090b80553eba5389 upstream. Check the return code of prepare_enable and change one last instance of enable only to prepare_enable. Also properly disable and release the clock in error paths and on remove for enetsw. Signed-off-by: Jonas Gorski Signed-off-by: David S. Miller Signed-off-by: Amit Pundir Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/broadcom/bcm63xx_enet.c | 33 ++++++++++++++------ 1 file changed, 24 insertions(+), 9 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bcm63xx_enet.c b/drivers/net/ethernet/broadcom/bcm63xx_enet.c index 08d91efceed0..3760a2be5dbc 100644 --- a/drivers/net/ethernet/broadcom/bcm63xx_enet.c +++ b/drivers/net/ethernet/broadcom/bcm63xx_enet.c @@ -1790,7 +1790,9 @@ static int bcm_enet_probe(struct platform_device *pdev) ret = PTR_ERR(priv->mac_clk); goto out; } - clk_prepare_enable(priv->mac_clk); + ret = clk_prepare_enable(priv->mac_clk); + if (ret) + goto out_put_clk_mac; /* initialize default and fetch platform data */ priv->rx_ring_size = BCMENET_DEF_RX_DESC; @@ -1822,9 +1824,11 @@ static int bcm_enet_probe(struct platform_device *pdev) if (IS_ERR(priv->phy_clk)) { ret = PTR_ERR(priv->phy_clk); priv->phy_clk = NULL; - goto out_put_clk_mac; + goto out_disable_clk_mac; } - clk_prepare_enable(priv->phy_clk); + ret = clk_prepare_enable(priv->phy_clk); + if (ret) + goto out_put_clk_phy; } /* do minimal hardware init to be able to probe mii bus */ @@ -1915,13 +1919,16 @@ static int bcm_enet_probe(struct platform_device *pdev) out_uninit_hw: /* turn off mdc clock */ enet_writel(priv, 0, ENET_MIISC_REG); - if (priv->phy_clk) { + if (priv->phy_clk) clk_disable_unprepare(priv->phy_clk); - clk_put(priv->phy_clk); - } -out_put_clk_mac: +out_put_clk_phy: + if (priv->phy_clk) + clk_put(priv->phy_clk); + +out_disable_clk_mac: clk_disable_unprepare(priv->mac_clk); +out_put_clk_mac: clk_put(priv->mac_clk); out: free_netdev(dev); @@ -2766,7 +2773,9 @@ static int bcm_enetsw_probe(struct platform_device *pdev) ret = PTR_ERR(priv->mac_clk); goto out_unmap; } - clk_enable(priv->mac_clk); + ret = clk_prepare_enable(priv->mac_clk); + if (ret) + goto out_put_clk; priv->rx_chan = 0; priv->tx_chan = 1; @@ -2787,7 +2796,7 @@ static int bcm_enetsw_probe(struct platform_device *pdev) ret = register_netdev(dev); if (ret) - goto out_put_clk; + goto out_disable_clk; netif_carrier_off(dev); platform_set_drvdata(pdev, dev); @@ -2796,6 +2805,9 @@ static int bcm_enetsw_probe(struct platform_device *pdev) return 0; +out_disable_clk: + clk_disable_unprepare(priv->mac_clk); + out_put_clk: clk_put(priv->mac_clk); @@ -2827,6 +2839,9 @@ static int bcm_enetsw_remove(struct platform_device *pdev) res = platform_get_resource(pdev, IORESOURCE_MEM, 0); release_mem_region(res->start, resource_size(res)); + clk_disable_unprepare(priv->mac_clk); + clk_put(priv->mac_clk); + free_netdev(dev); return 0; } From 68bf812b35e1999656a1cd3ab1ffe07d023a1dfd Mon Sep 17 00:00:00 2001 From: Jonas Gorski Date: Sun, 1 Oct 2017 13:02:16 +0200 Subject: [PATCH 199/970] bcm63xx_enet: do not write to random DMA channel on BCM6345 commit d6213c1f2ad54a964b77471690264ed685718928 upstream. The DMA controller regs actually point to DMA channel 0, so the write to ENETDMA_CFG_REG will actually modify a random DMA channel. Since DMA controller registers do not exist on BCM6345, guard the write with the usual check for dma_has_sram. Signed-off-by: Jonas Gorski Signed-off-by: David S. Miller Signed-off-by: Amit Pundir Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/broadcom/bcm63xx_enet.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/broadcom/bcm63xx_enet.c b/drivers/net/ethernet/broadcom/bcm63xx_enet.c index 3760a2be5dbc..c4078401b7de 100644 --- a/drivers/net/ethernet/broadcom/bcm63xx_enet.c +++ b/drivers/net/ethernet/broadcom/bcm63xx_enet.c @@ -1063,7 +1063,8 @@ static int bcm_enet_open(struct net_device *dev) val = enet_readl(priv, ENET_CTL_REG); val |= ENET_CTL_ENABLE_MASK; enet_writel(priv, val, ENET_CTL_REG); - enet_dma_writel(priv, ENETDMA_CFG_EN_MASK, ENETDMA_CFG_REG); + if (priv->dma_has_sram) + enet_dma_writel(priv, ENETDMA_CFG_EN_MASK, ENETDMA_CFG_REG); enet_dmac_writel(priv, priv->dma_chan_en_mask, ENETDMAC_CHANCFG, priv->rx_chan); From af4b765a780b71f8d823bfd9123bd101fd12eb03 Mon Sep 17 00:00:00 2001 From: Christian Lamparter Date: Fri, 25 Aug 2017 15:47:14 +0200 Subject: [PATCH 200/970] crypto: crypto4xx - remove bad list_del commit a728a196d253530f17da5c86dc7dfbe58c5f7094 upstream. alg entries are only added to the list, after the registration was successful. If the registration failed, it was never added to the list in the first place. Signed-off-by: Christian Lamparter Signed-off-by: Herbert Xu Signed-off-by: Amit Pundir Signed-off-by: Greg Kroah-Hartman --- drivers/crypto/amcc/crypto4xx_core.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/crypto/amcc/crypto4xx_core.c b/drivers/crypto/amcc/crypto4xx_core.c index dae1e39139e9..ba80e8a5d73b 100644 --- a/drivers/crypto/amcc/crypto4xx_core.c +++ b/drivers/crypto/amcc/crypto4xx_core.c @@ -1034,12 +1034,10 @@ int crypto4xx_register_alg(struct crypto4xx_device *sec_dev, break; } - if (rc) { - list_del(&alg->entry); + if (rc) kfree(alg); - } else { + else list_add_tail(&alg->entry, &sec_dev->alg_list); - } } return 0; From e77e7d8f6bc00ec5d0b1d5194ae8093921f0c058 Mon Sep 17 00:00:00 2001 From: Christian Lamparter Date: Fri, 25 Aug 2017 15:47:24 +0200 Subject: [PATCH 201/970] crypto: crypto4xx - fix crypto4xx_build_pdr, crypto4xx_build_sdr leak commit 5d59ad6eea82ef8df92b4109615a0dde9d8093e9 upstream. If one of the later memory allocations in rypto4xx_build_pdr() fails: dev->pdr (and/or) dev->pdr_uinfo wouldn't be freed. crypto4xx_build_sdr() has the same issue with dev->sdr. Signed-off-by: Christian Lamparter Signed-off-by: Herbert Xu Signed-off-by: Amit Pundir Signed-off-by: Greg Kroah-Hartman --- drivers/crypto/amcc/crypto4xx_core.c | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/drivers/crypto/amcc/crypto4xx_core.c b/drivers/crypto/amcc/crypto4xx_core.c index ba80e8a5d73b..c7524bbbaf98 100644 --- a/drivers/crypto/amcc/crypto4xx_core.c +++ b/drivers/crypto/amcc/crypto4xx_core.c @@ -208,7 +208,7 @@ static u32 crypto4xx_build_pdr(struct crypto4xx_device *dev) dev->pdr_pa); return -ENOMEM; } - memset(dev->pdr, 0, sizeof(struct ce_pd) * PPC4XX_NUM_PD); + memset(dev->pdr, 0, sizeof(struct ce_pd) * PPC4XX_NUM_PD); dev->shadow_sa_pool = dma_alloc_coherent(dev->core_dev->device, 256 * PPC4XX_NUM_PD, &dev->shadow_sa_pool_pa, @@ -241,13 +241,15 @@ static u32 crypto4xx_build_pdr(struct crypto4xx_device *dev) static void crypto4xx_destroy_pdr(struct crypto4xx_device *dev) { - if (dev->pdr != NULL) + if (dev->pdr) dma_free_coherent(dev->core_dev->device, sizeof(struct ce_pd) * PPC4XX_NUM_PD, dev->pdr, dev->pdr_pa); + if (dev->shadow_sa_pool) dma_free_coherent(dev->core_dev->device, 256 * PPC4XX_NUM_PD, dev->shadow_sa_pool, dev->shadow_sa_pool_pa); + if (dev->shadow_sr_pool) dma_free_coherent(dev->core_dev->device, sizeof(struct sa_state_record) * PPC4XX_NUM_PD, @@ -417,12 +419,12 @@ static u32 crypto4xx_build_sdr(struct crypto4xx_device *dev) static void crypto4xx_destroy_sdr(struct crypto4xx_device *dev) { - if (dev->sdr != NULL) + if (dev->sdr) dma_free_coherent(dev->core_dev->device, sizeof(struct ce_sd) * PPC4XX_NUM_SD, dev->sdr, dev->sdr_pa); - if (dev->scatter_buffer_va != NULL) + if (dev->scatter_buffer_va) dma_free_coherent(dev->core_dev->device, dev->scatter_buffer_size * PPC4XX_NUM_SD, dev->scatter_buffer_va, @@ -1191,7 +1193,7 @@ static int crypto4xx_probe(struct platform_device *ofdev) rc = crypto4xx_build_gdr(core_dev->dev); if (rc) - goto err_build_gdr; + goto err_build_pdr; rc = crypto4xx_build_sdr(core_dev->dev); if (rc) @@ -1234,12 +1236,11 @@ static int crypto4xx_probe(struct platform_device *ofdev) err_request_irq: irq_dispose_mapping(core_dev->irq); tasklet_kill(&core_dev->tasklet); - crypto4xx_destroy_sdr(core_dev->dev); err_build_sdr: + crypto4xx_destroy_sdr(core_dev->dev); crypto4xx_destroy_gdr(core_dev->dev); -err_build_gdr: - crypto4xx_destroy_pdr(core_dev->dev); err_build_pdr: + crypto4xx_destroy_pdr(core_dev->dev); kfree(core_dev->dev); err_alloc_dev: kfree(core_dev); From b76942ac26f6d94d7d50abc9646d754b9ca4c428 Mon Sep 17 00:00:00 2001 From: "Gustavo A. R. Silva" Date: Fri, 29 Jun 2018 13:28:07 -0500 Subject: [PATCH 202/970] atm: zatm: Fix potential Spectre v1 [ Upstream commit ced9e191501e52b95e1b57b8e0db00943869eed0 ] pool can be indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: drivers/atm/zatm.c:1491 zatm_ioctl() warn: potential spectre issue 'zatm_dev->pool_info' (local cap) Fix this by sanitizing pool before using it to index zatm_dev->pool_info Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2 Signed-off-by: Gustavo A. R. Silva Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/atm/zatm.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/atm/zatm.c b/drivers/atm/zatm.c index d0fac641e717..a0b88f148990 100644 --- a/drivers/atm/zatm.c +++ b/drivers/atm/zatm.c @@ -1483,6 +1483,8 @@ static int zatm_ioctl(struct atm_dev *dev,unsigned int cmd,void __user *arg) return -EFAULT; if (pool < 0 || pool > ZATM_LAST_POOL) return -EINVAL; + pool = array_index_nospec(pool, + ZATM_LAST_POOL + 1); if (copy_from_user(&info, &((struct zatm_pool_req __user *) arg)->info, sizeof(info))) return -EFAULT; From d7adadbf09463110a0bc9561250395c07f9011fb Mon Sep 17 00:00:00 2001 From: Xin Long Date: Thu, 21 Jun 2018 12:56:04 +0800 Subject: [PATCH 203/970] ipvlan: fix IFLA_MTU ignored on NEWLINK [ Upstream commit 30877961b1cdd6fdca783c2e8c4f0f47e95dc58c ] Commit 296d48568042 ("ipvlan: inherit MTU from master device") adjusted the mtu from the master device when creating a ipvlan device, but it would also override the mtu value set in rtnl_create_link. It causes IFLA_MTU param not to take effect. So this patch is to not adjust the mtu if IFLA_MTU param is set when creating a ipvlan device. Fixes: 296d48568042 ("ipvlan: inherit MTU from master device") Reported-by: Jianlin Shi Signed-off-by: Xin Long Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ipvlan/ipvlan_main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c index dfbc4ef6d507..24eb5755604f 100644 --- a/drivers/net/ipvlan/ipvlan_main.c +++ b/drivers/net/ipvlan/ipvlan_main.c @@ -525,7 +525,8 @@ static int ipvlan_link_new(struct net *src_net, struct net_device *dev, ipvlan->dev = dev; ipvlan->port = port; ipvlan->sfeatures = IPVLAN_FEATURES; - ipvlan_adjust_mtu(ipvlan, phy_dev); + if (!tb[IFLA_MTU]) + ipvlan_adjust_mtu(ipvlan, phy_dev); INIT_LIST_HEAD(&ipvlan->addrs); /* TODO Probably put random address here to be presented to the From 87cd5e4acdbebe0a4ec47ef5b0d31fd1f435067c Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 22 Jun 2018 06:44:14 -0700 Subject: [PATCH 204/970] net: dccp: avoid crash in ccid3_hc_rx_send_feedback() [ Upstream commit 74174fe5634ffbf645a7ca5a261571f700b2f332 ] On fast hosts or malicious bots, we trigger a DCCP_BUG() which seems excessive. syzbot reported : BUG: delta (-6195) <= 0 at net/dccp/ccids/ccid3.c:628/ccid3_hc_rx_send_feedback() CPU: 1 PID: 18 Comm: ksoftirqd/1 Not tainted 4.18.0-rc1+ #112 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113 ccid3_hc_rx_send_feedback net/dccp/ccids/ccid3.c:628 [inline] ccid3_hc_rx_packet_recv.cold.16+0x38/0x71 net/dccp/ccids/ccid3.c:793 ccid_hc_rx_packet_recv net/dccp/ccid.h:185 [inline] dccp_deliver_input_to_ccids+0xf0/0x280 net/dccp/input.c:180 dccp_rcv_established+0x87/0xb0 net/dccp/input.c:378 dccp_v4_do_rcv+0x153/0x180 net/dccp/ipv4.c:654 sk_backlog_rcv include/net/sock.h:914 [inline] __sk_receive_skb+0x3ba/0xd80 net/core/sock.c:517 dccp_v4_rcv+0x10f9/0x1f58 net/dccp/ipv4.c:875 ip_local_deliver_finish+0x2eb/0xda0 net/ipv4/ip_input.c:215 NF_HOOK include/linux/netfilter.h:287 [inline] ip_local_deliver+0x1e9/0x750 net/ipv4/ip_input.c:256 dst_input include/net/dst.h:450 [inline] ip_rcv_finish+0x823/0x2220 net/ipv4/ip_input.c:396 NF_HOOK include/linux/netfilter.h:287 [inline] ip_rcv+0xa18/0x1284 net/ipv4/ip_input.c:492 __netif_receive_skb_core+0x2488/0x3680 net/core/dev.c:4628 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693 process_backlog+0x219/0x760 net/core/dev.c:5373 napi_poll net/core/dev.c:5771 [inline] net_rx_action+0x7da/0x1980 net/core/dev.c:5837 __do_softirq+0x2e8/0xb17 kernel/softirq.c:284 run_ksoftirqd+0x86/0x100 kernel/softirq.c:645 smpboot_thread_fn+0x417/0x870 kernel/smpboot.c:164 kthread+0x345/0x410 kernel/kthread.c:240 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412 Signed-off-by: Eric Dumazet Reported-by: syzbot Cc: Gerrit Renker Cc: dccp@vger.kernel.org Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/dccp/ccids/ccid3.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c index 119c04317d48..b913ee062a81 100644 --- a/net/dccp/ccids/ccid3.c +++ b/net/dccp/ccids/ccid3.c @@ -624,9 +624,8 @@ static void ccid3_hc_rx_send_feedback(struct sock *sk, case CCID3_FBACK_PERIODIC: delta = ktime_us_delta(now, hc->rx_tstamp_last_feedback); if (delta <= 0) - DCCP_BUG("delta (%ld) <= 0", (long)delta); - else - hc->rx_x_recv = scaled_div32(hc->rx_bytes_recv, delta); + delta = 1; + hc->rx_x_recv = scaled_div32(hc->rx_bytes_recv, delta); break; default: return; From e555ae018ba40c8259fa8bc3952e2f01d9bec89c Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 22 Jun 2018 06:44:15 -0700 Subject: [PATCH 205/970] net: dccp: switch rx_tstamp_last_feedback to monotonic clock [ Upstream commit 0ce4e70ff00662ad7490e545ba0cd8c1fa179fca ] To compute delays, better not use time of the day which can be changed by admins or malicious programs. Also change ccid3_first_li() to use s64 type for delta variable to avoid potential overflows. Signed-off-by: Eric Dumazet Cc: Gerrit Renker Cc: dccp@vger.kernel.org Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/dccp/ccids/ccid3.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c index b913ee062a81..03fcf3ee1534 100644 --- a/net/dccp/ccids/ccid3.c +++ b/net/dccp/ccids/ccid3.c @@ -599,7 +599,7 @@ static void ccid3_hc_rx_send_feedback(struct sock *sk, { struct ccid3_hc_rx_sock *hc = ccid3_hc_rx_sk(sk); struct dccp_sock *dp = dccp_sk(sk); - ktime_t now = ktime_get_real(); + ktime_t now = ktime_get(); s64 delta = 0; switch (fbtype) { @@ -631,7 +631,7 @@ static void ccid3_hc_rx_send_feedback(struct sock *sk, return; } - ccid3_pr_debug("Interval %ldusec, X_recv=%u, 1/p=%u\n", (long)delta, + ccid3_pr_debug("Interval %lldusec, X_recv=%u, 1/p=%u\n", delta, hc->rx_x_recv, hc->rx_pinv); hc->rx_tstamp_last_feedback = now; @@ -678,7 +678,8 @@ static int ccid3_hc_rx_insert_options(struct sock *sk, struct sk_buff *skb) static u32 ccid3_first_li(struct sock *sk) { struct ccid3_hc_rx_sock *hc = ccid3_hc_rx_sk(sk); - u32 x_recv, p, delta; + u32 x_recv, p; + s64 delta; u64 fval; if (hc->rx_rtt == 0) { @@ -686,7 +687,9 @@ static u32 ccid3_first_li(struct sock *sk) hc->rx_rtt = DCCP_FALLBACK_RTT; } - delta = ktime_to_us(net_timedelta(hc->rx_tstamp_last_feedback)); + delta = ktime_us_delta(ktime_get(), hc->rx_tstamp_last_feedback); + if (delta <= 0) + delta = 1; x_recv = scaled_div32(hc->rx_bytes_recv, delta); if (x_recv == 0) { /* would also trigger divide-by-zero */ DCCP_WARN("X_recv==0\n"); From 5b3cc7f9b39a02a981a89e42df12dcbc610bff3f Mon Sep 17 00:00:00 2001 From: Alex Vesker Date: Fri, 25 May 2018 20:25:59 +0300 Subject: [PATCH 206/970] net/mlx5: Fix incorrect raw command length parsing [ Upstream commit 603b7bcff824740500ddfa001d7a7168b0b38542 ] The NULL character was not set correctly for the string containing the command length, this caused failures reading the output of the command due to a random length. The fix is to initialize the output length string. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Alex Vesker Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c index 6631fb0782d7..05de3bb7f7c0 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c @@ -1256,7 +1256,7 @@ static ssize_t outlen_write(struct file *filp, const char __user *buf, { struct mlx5_core_dev *dev = filp->private_data; struct mlx5_cmd_debug *dbg = &dev->cmd.dbg; - char outlen_str[8]; + char outlen_str[8] = {0}; int outlen; void *ptr; int err; @@ -1271,8 +1271,6 @@ static ssize_t outlen_write(struct file *filp, const char __user *buf, if (copy_from_user(outlen_str, buf, count)) return -EFAULT; - outlen_str[7] = 0; - err = sscanf(outlen_str, "%d", &outlen); if (err < 0) return err; From 14e9e6527a21eb37cecdc682c96979dbe7cf7e32 Mon Sep 17 00:00:00 2001 From: Shay Agroskin Date: Tue, 22 May 2018 14:14:02 +0300 Subject: [PATCH 207/970] net/mlx5: Fix wrong size allocation for QoS ETC TC regitster [ Upstream commit d14fcb8d877caf1b8d6bd65d444bf62b21f2070c ] The driver allocates wrong size (due to wrong struct name) when issuing a query/set request to NIC's register. Fixes: d8880795dabf ("net/mlx5e: Implement DCBNL IEEE max rate") Signed-off-by: Shay Agroskin Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/mellanox/mlx5/core/port.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c b/drivers/net/ethernet/mellanox/mlx5/core/port.c index 34e7184e23c9..43d7c8378fb4 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/port.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c @@ -575,7 +575,7 @@ EXPORT_SYMBOL_GPL(mlx5_set_port_prio_tc); static int mlx5_set_port_qetcr_reg(struct mlx5_core_dev *mdev, u32 *in, int inlen) { - u32 out[MLX5_ST_SZ_DW(qtct_reg)]; + u32 out[MLX5_ST_SZ_DW(qetc_reg)]; if (!MLX5_CAP_GEN(mdev, ets)) return -ENOTSUPP; @@ -587,7 +587,7 @@ static int mlx5_set_port_qetcr_reg(struct mlx5_core_dev *mdev, u32 *in, static int mlx5_query_port_qetcr_reg(struct mlx5_core_dev *mdev, u32 *out, int outlen) { - u32 in[MLX5_ST_SZ_DW(qtct_reg)]; + u32 in[MLX5_ST_SZ_DW(qetc_reg)]; if (!MLX5_CAP_GEN(mdev, ets)) return -ENOTSUPP; From 1f1fbe1692afda4800290644606a9175175a84bd Mon Sep 17 00:00:00 2001 From: Konstantin Khlebnikov Date: Fri, 15 Jun 2018 13:27:31 +0300 Subject: [PATCH 208/970] net_sched: blackhole: tell upper qdisc about dropped packets [ Upstream commit 7e85dc8cb35abf16455f1511f0670b57c1a84608 ] When blackhole is used on top of classful qdisc like hfsc it breaks qlen and backlog counters because packets are disappear without notice. In HFSC non-zero qlen while all classes are inactive triggers warning: WARNING: ... at net/sched/sch_hfsc.c:1393 hfsc_dequeue+0xba4/0xe90 [sch_hfsc] and schedules watchdog work endlessly. This patch return __NET_XMIT_BYPASS in addition to NET_XMIT_SUCCESS, this flag tells upper layer: this packet is gone and isn't queued. Signed-off-by: Konstantin Khlebnikov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/sched/sch_blackhole.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/sched/sch_blackhole.c b/net/sched/sch_blackhole.c index c98a61e980ba..9c4c2bb547d7 100644 --- a/net/sched/sch_blackhole.c +++ b/net/sched/sch_blackhole.c @@ -21,7 +21,7 @@ static int blackhole_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free) { qdisc_drop(skb, sch, to_free); - return NET_XMIT_SUCCESS; + return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; } static struct sk_buff *blackhole_dequeue(struct Qdisc *sch) From 32490f4d76ed005b35d826ee4807dafad95f9ba8 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 19 Jun 2018 19:18:50 -0700 Subject: [PATCH 209/970] net: sungem: fix rx checksum support [ Upstream commit 12b03558cef6d655d0d394f5e98a6fd07c1f6c0f ] After commit 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"), sungem owners reported the infamous "eth0: hw csum failure" message. CHECKSUM_COMPLETE has in fact never worked for this driver, but this was masked by the fact that upper stacks had to strip the FCS, and therefore skb->ip_summed was set back to CHECKSUM_NONE before my recent change. Driver configures a number of bytes to skip when the chip computes the checksum, and for some reason only half of the Ethernet header was skipped. Then a second problem is that we should strip the FCS by default, unless the driver is updated to eventually support NETIF_F_RXFCS in the future. Finally, a driver should check if NETIF_F_RXCSUM feature is enabled or not, so that the admin can turn off rx checksum if wanted. Many thanks to Andreas Schwab and Mathieu Malaterre for their help in debugging this issue. Signed-off-by: Eric Dumazet Reported-by: Meelis Roos Reported-by: Mathieu Malaterre Reported-by: Andreas Schwab Tested-by: Andreas Schwab Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/sun/sungem.c | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/drivers/net/ethernet/sun/sungem.c b/drivers/net/ethernet/sun/sungem.c index d6ad0fbd054e..920321bf4bb6 100644 --- a/drivers/net/ethernet/sun/sungem.c +++ b/drivers/net/ethernet/sun/sungem.c @@ -59,8 +59,7 @@ #include #include "sungem.h" -/* Stripping FCS is causing problems, disabled for now */ -#undef STRIP_FCS +#define STRIP_FCS #define DEFAULT_MSG (NETIF_MSG_DRV | \ NETIF_MSG_PROBE | \ @@ -434,7 +433,7 @@ static int gem_rxmac_reset(struct gem *gp) writel(desc_dma & 0xffffffff, gp->regs + RXDMA_DBLOW); writel(RX_RING_SIZE - 4, gp->regs + RXDMA_KICK); val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) | - ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128); + (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128); writel(val, gp->regs + RXDMA_CFG); if (readl(gp->regs + GREG_BIFCFG) & GREG_BIFCFG_M66EN) writel(((5 & RXDMA_BLANK_IPKTS) | @@ -759,7 +758,6 @@ static int gem_rx(struct gem *gp, int work_to_do) struct net_device *dev = gp->dev; int entry, drops, work_done = 0; u32 done; - __sum16 csum; if (netif_msg_rx_status(gp)) printk(KERN_DEBUG "%s: rx interrupt, done: %d, rx_new: %d\n", @@ -854,9 +852,13 @@ static int gem_rx(struct gem *gp, int work_to_do) skb = copy_skb; } - csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0xffff); - skb->csum = csum_unfold(csum); - skb->ip_summed = CHECKSUM_COMPLETE; + if (likely(dev->features & NETIF_F_RXCSUM)) { + __sum16 csum; + + csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0xffff); + skb->csum = csum_unfold(csum); + skb->ip_summed = CHECKSUM_COMPLETE; + } skb->protocol = eth_type_trans(skb, gp->dev); napi_gro_receive(&gp->napi, skb); @@ -1754,7 +1756,7 @@ static void gem_init_dma(struct gem *gp) writel(0, gp->regs + TXDMA_KICK); val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) | - ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128); + (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128); writel(val, gp->regs + RXDMA_CFG); writel(desc_dma >> 32, gp->regs + RXDMA_DBHI); @@ -2972,8 +2974,8 @@ static int gem_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) pci_set_drvdata(pdev, dev); /* We can do scatter/gather and HW checksum */ - dev->hw_features = NETIF_F_SG | NETIF_F_HW_CSUM; - dev->features |= dev->hw_features | NETIF_F_RXCSUM; + dev->hw_features = NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_RXCSUM; + dev->features = dev->hw_features; if (pci_using_dac) dev->features |= NETIF_F_HIGHDMA; From a648a4636d17810f008529ec181af932cf4684b7 Mon Sep 17 00:00:00 2001 From: Sudarsana Reddy Kalluru Date: Sun, 1 Jul 2018 20:03:07 -0700 Subject: [PATCH 210/970] qed: Fix use of incorrect size in memcpy call. [ Upstream commit cc9b27cdf7bd3c86df73439758ac1564bc8f5bbe ] Use the correct size value while copying chassis/port id values. Fixes: 6ad8c632e ("qed: Add support for query/config dcbx.") Signed-off-by: Sudarsana Reddy Kalluru Signed-off-by: Michal Kalderon Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/qlogic/qed/qed_dcbx.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed_dcbx.c b/drivers/net/ethernet/qlogic/qed/qed_dcbx.c index 9d59cb85c012..7b6824e560d2 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_dcbx.c +++ b/drivers/net/ethernet/qlogic/qed/qed_dcbx.c @@ -677,9 +677,9 @@ qed_dcbx_get_local_lldp_params(struct qed_hwfn *p_hwfn, p_local = &p_hwfn->p_dcbx_info->lldp_local[LLDP_NEAREST_BRIDGE]; memcpy(params->lldp_local.local_chassis_id, p_local->local_chassis_id, - ARRAY_SIZE(p_local->local_chassis_id)); + sizeof(p_local->local_chassis_id)); memcpy(params->lldp_local.local_port_id, p_local->local_port_id, - ARRAY_SIZE(p_local->local_port_id)); + sizeof(p_local->local_port_id)); } static void @@ -692,9 +692,9 @@ qed_dcbx_get_remote_lldp_params(struct qed_hwfn *p_hwfn, p_remote = &p_hwfn->p_dcbx_info->lldp_remote[LLDP_NEAREST_BRIDGE]; memcpy(params->lldp_remote.peer_chassis_id, p_remote->peer_chassis_id, - ARRAY_SIZE(p_remote->peer_chassis_id)); + sizeof(p_remote->peer_chassis_id)); memcpy(params->lldp_remote.peer_port_id, p_remote->peer_port_id, - ARRAY_SIZE(p_remote->peer_port_id)); + sizeof(p_remote->peer_port_id)); } static int From 0b796049605886c51dc12a646e7de8e39fa1aedf Mon Sep 17 00:00:00 2001 From: Sudarsana Reddy Kalluru Date: Sun, 1 Jul 2018 20:03:05 -0700 Subject: [PATCH 211/970] qed: Limit msix vectors in kdump kernel to the minimum required count. [ Upstream commit bb7858ba1102f82470a917e041fd23e6385c31be ] Memory size is limited in the kdump kernel environment. Allocation of more msix-vectors (or queues) consumes few tens of MBs of memory, which might lead to the kdump kernel failure. This patch adds changes to limit the number of MSI-X vectors in kdump kernel to minimum required value (i.e., 2 per engine). Fixes: fe56b9e6a ("qed: Add module with basic common support") Signed-off-by: Sudarsana Reddy Kalluru Signed-off-by: Michal Kalderon Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/qlogic/qed/qed_main.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c index 0b949c6d83fc..f36bd0bd37da 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_main.c +++ b/drivers/net/ethernet/qlogic/qed/qed_main.c @@ -23,6 +23,7 @@ #include #include #include +#include #include "qed.h" #include "qed_sriov.h" @@ -701,6 +702,14 @@ static int qed_slowpath_setup_int(struct qed_dev *cdev, /* We want a minimum of one slowpath and one fastpath vector per hwfn */ cdev->int_params.in.min_msix_cnt = cdev->num_hwfns * 2; + if (is_kdump_kernel()) { + DP_INFO(cdev, + "Kdump kernel: Limit the max number of requested MSI-X vectors to %hd\n", + cdev->int_params.in.min_msix_cnt); + cdev->int_params.in.num_vectors = + cdev->int_params.in.min_msix_cnt; + } + rc = qed_set_int_mode(cdev, false); if (rc) { DP_ERR(cdev, "qed_slowpath_setup_int ERR\n"); From b0a508a5c854d5c06eb0f0577f286f553bf45cfb Mon Sep 17 00:00:00 2001 From: Aleksander Morgado Date: Sat, 23 Jun 2018 23:22:52 +0200 Subject: [PATCH 212/970] qmi_wwan: add support for the Dell Wireless 5821e module MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit e7e197edd09c25774b4f12cab19f9d5462f240f4 ] This module exposes two USB configurations: a QMI+AT capable setup on USB config #1 and a MBIM capable setup on USB config #2. By default the kernel will choose the MBIM capable configuration as long as the cdc_mbim driver is available. This patch adds support for the QMI port in the secondary configuration. Signed-off-by: Aleksander Morgado Acked-by: Bjørn Mork Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/usb/qmi_wwan.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 85bc0ca61389..6d654d65f8a0 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -946,6 +946,7 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x413c, 0x81b3, 8)}, /* Dell Wireless 5809e Gobi(TM) 4G LTE Mobile Broadband Card (rev3) */ {QMI_FIXED_INTF(0x413c, 0x81b6, 8)}, /* Dell Wireless 5811e */ {QMI_FIXED_INTF(0x413c, 0x81b6, 10)}, /* Dell Wireless 5811e */ + {QMI_FIXED_INTF(0x413c, 0x81d7, 1)}, /* Dell Wireless 5821e */ {QMI_FIXED_INTF(0x03f0, 0x4e1d, 8)}, /* HP lt4111 LTE/EV-DO/HSPA+ Gobi 4G Module */ {QMI_FIXED_INTF(0x03f0, 0x9d1d, 1)}, /* HP lt4120 Snapdragon X5 LTE */ {QMI_FIXED_INTF(0x22de, 0x9061, 3)}, /* WeTelecom WPD-600N */ From 3e056369903c8a5a6ae34ed29debfd45dc01fc08 Mon Sep 17 00:00:00 2001 From: Jiri Slaby Date: Mon, 25 Jun 2018 09:26:27 +0200 Subject: [PATCH 213/970] r8152: napi hangup fix after disconnect [ Upstream commit 0ee1f4734967af8321ecebaf9c74221ace34f2d5 ] When unplugging an r8152 adapter while the interface is UP, the NIC becomes unusable. usb->disconnect (aka rtl8152_disconnect) deletes napi. Then, rtl8152_disconnect calls unregister_netdev and that invokes netdev->ndo_stop (aka rtl8152_close). rtl8152_close tries to napi_disable, but the napi is already deleted by disconnect above. So the first while loop in napi_disable never finishes. This results in complete deadlock of the network layer as there is rtnl_mutex held by unregister_netdev. So avoid the call to napi_disable in rtl8152_close when the device is already gone. The other calls to usb_kill_urb, cancel_delayed_work_sync, netif_stop_queue etc. seem to be fine. The urb and netdev is not destroyed yet. Signed-off-by: Jiri Slaby Cc: linux-usb@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/usb/r8152.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c index d3d89b05f66e..5988674818ed 100644 --- a/drivers/net/usb/r8152.c +++ b/drivers/net/usb/r8152.c @@ -3327,7 +3327,8 @@ static int rtl8152_close(struct net_device *netdev) #ifdef CONFIG_PM_SLEEP unregister_pm_notifier(&tp->pm_notifier); #endif - napi_disable(&tp->napi); + if (!test_bit(RTL8152_UNPLUG, &tp->flags)) + napi_disable(&tp->napi); clear_bit(WORK_ENABLE, &tp->flags); usb_kill_urb(tp->intr_urb); cancel_delayed_work_sync(&tp->schedule); From 63253726a5154b0a456667afc75a7cf17b7818d0 Mon Sep 17 00:00:00 2001 From: Yuchung Cheng Date: Wed, 27 Jun 2018 16:04:48 -0700 Subject: [PATCH 214/970] tcp: fix Fast Open key endianness [ Upstream commit c860e997e9170a6d68f9d1e6e2cf61f572191aaf ] Fast Open key could be stored in different endian based on the CPU. Previously hosts in different endianness in a server farm using the same key config (sysctl value) would produce different cookies. This patch fixes it by always storing it as little endian to keep same API for LE hosts. Reported-by: Daniele Iamartino Signed-off-by: Yuchung Cheng Signed-off-by: Eric Dumazet Signed-off-by: Neal Cardwell Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/sysctl_net_ipv4.c | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c index 566cfc50f7cf..51a0039cb318 100644 --- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -212,8 +212,9 @@ static int proc_tcp_fastopen_key(struct ctl_table *ctl, int write, { struct ctl_table tbl = { .maxlen = (TCP_FASTOPEN_KEY_LENGTH * 2 + 10) }; struct tcp_fastopen_context *ctxt; - int ret; u32 user_key[4]; /* 16 bytes, matching TCP_FASTOPEN_KEY_LENGTH */ + __le32 key[4]; + int ret, i; tbl.data = kmalloc(tbl.maxlen, GFP_KERNEL); if (!tbl.data) @@ -222,11 +223,14 @@ static int proc_tcp_fastopen_key(struct ctl_table *ctl, int write, rcu_read_lock(); ctxt = rcu_dereference(tcp_fastopen_ctx); if (ctxt) - memcpy(user_key, ctxt->key, TCP_FASTOPEN_KEY_LENGTH); + memcpy(key, ctxt->key, TCP_FASTOPEN_KEY_LENGTH); else - memset(user_key, 0, sizeof(user_key)); + memset(key, 0, sizeof(key)); rcu_read_unlock(); + for (i = 0; i < ARRAY_SIZE(key); i++) + user_key[i] = le32_to_cpu(key[i]); + snprintf(tbl.data, tbl.maxlen, "%08x-%08x-%08x-%08x", user_key[0], user_key[1], user_key[2], user_key[3]); ret = proc_dostring(&tbl, write, buffer, lenp, ppos); @@ -242,12 +246,16 @@ static int proc_tcp_fastopen_key(struct ctl_table *ctl, int write, * first invocation of tcp_fastopen_cookie_gen */ tcp_fastopen_init_key_once(false); - tcp_fastopen_reset_cipher(user_key, TCP_FASTOPEN_KEY_LENGTH); + + for (i = 0; i < ARRAY_SIZE(user_key); i++) + key[i] = cpu_to_le32(user_key[i]); + + tcp_fastopen_reset_cipher(key, TCP_FASTOPEN_KEY_LENGTH); } bad_key: pr_debug("proc FO key set 0x%x-%x-%x-%x <- 0x%s: %u\n", - user_key[0], user_key[1], user_key[2], user_key[3], + user_key[0], user_key[1], user_key[2], user_key[3], (char *)tbl.data, ret); kfree(tbl.data); return ret; From 65fb77c3bab33eae0077247f330c6097c32217f9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ilpo=20J=C3=A4rvinen?= Date: Fri, 29 Jun 2018 13:07:53 +0300 Subject: [PATCH 215/970] tcp: prevent bogus FRTO undos with non-SACK flows MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 1236f22fbae15df3736ab4a984c64c0c6ee6254c ] If SACK is not enabled and the first cumulative ACK after the RTO retransmission covers more than the retransmitted skb, a spurious FRTO undo will trigger (assuming FRTO is enabled for that RTO). The reason is that any non-retransmitted segment acknowledged will set FLAG_ORIG_SACK_ACKED in tcp_clean_rtx_queue even if there is no indication that it would have been delivered for real (the scoreboard is not kept with TCPCB_SACKED_ACKED bits in the non-SACK case so the check for that bit won't help like it does with SACK). Having FLAG_ORIG_SACK_ACKED set results in the spurious FRTO undo in tcp_process_loss. We need to use more strict condition for non-SACK case and check that none of the cumulatively ACKed segments were retransmitted to prove that progress is due to original transmissions. Only then keep FLAG_ORIG_SACK_ACKED set, allowing FRTO undo to proceed in non-SACK case. (FLAG_ORIG_SACK_ACKED is planned to be renamed to FLAG_ORIG_PROGRESS to better indicate its purpose but to keep this change minimal, it will be done in another patch). Besides burstiness and congestion control violations, this problem can result in RTO loop: When the loss recovery is prematurely undoed, only new data will be transmitted (if available) and the next retransmission can occur only after a new RTO which in case of multiple losses (that are not for consecutive packets) requires one RTO per loss to recover. Signed-off-by: Ilpo Järvinen Tested-by: Neal Cardwell Acked-by: Neal Cardwell Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/tcp_input.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 8999e25fd0e1..be453aa8fce8 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -3236,6 +3236,15 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets, if (tcp_is_reno(tp)) { tcp_remove_reno_sacks(sk, pkts_acked); + + /* If any of the cumulatively ACKed segments was + * retransmitted, non-SACK case cannot confirm that + * progress was due to original transmission due to + * lack of TCPCB_SACKED_ACKED bits even if some of + * the packets may have been never retransmitted. + */ + if (flag & FLAG_RETRANS_DATA_ACKED) + flag &= ~FLAG_ORIG_SACK_ACKED; } else { int delta; From e11eb6a3f96e7c05450a8bbc908bce0c9c13c77e Mon Sep 17 00:00:00 2001 From: Jason Wang Date: Thu, 21 Jun 2018 13:11:31 +0800 Subject: [PATCH 216/970] vhost_net: validate sock before trying to put its fd [ Upstream commit b8f1f65882f07913157c44673af7ec0b308d03eb ] Sock will be NULL if we pass -1 to vhost_net_set_backend(), but when we meet errors during ubuf allocation, the code does not check for NULL before calling sockfd_put(), this will lead NULL dereferencing. Fixing by checking sock pointer before. Fixes: bab632d69ee4 ("vhost: vhost TX zero-copy support") Reported-by: Dan Carpenter Signed-off-by: Jason Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/vhost/net.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 487586e2d8b9..353c93bc459b 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -1052,7 +1052,8 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd) if (ubufs) vhost_net_ubuf_put_wait_and_free(ubufs); err_ubufs: - sockfd_put(sock); + if (sock) + sockfd_put(sock); err_vq: mutex_unlock(&vq->mutex); err: From 9dc96f7205d4bc8c99187c29dcdca5d31d6e7383 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 21 Jun 2018 14:16:02 -0700 Subject: [PATCH 217/970] net/packet: fix use-after-free [ Upstream commit 945d015ee0c3095d2290e845565a23dedfd8027c ] We should put copy_skb in receive_queue only after a successful call to virtio_net_hdr_from_skb(). syzbot report : BUG: KASAN: use-after-free in __skb_unlink include/linux/skbuff.h:1843 [inline] BUG: KASAN: use-after-free in __skb_dequeue include/linux/skbuff.h:1863 [inline] BUG: KASAN: use-after-free in skb_dequeue+0x16a/0x180 net/core/skbuff.c:2815 Read of size 8 at addr ffff8801b044ecc0 by task syz-executor217/4553 CPU: 0 PID: 4553 Comm: syz-executor217 Not tainted 4.18.0-rc1+ #111 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113 print_address_description+0x6c/0x20b mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433 __skb_unlink include/linux/skbuff.h:1843 [inline] __skb_dequeue include/linux/skbuff.h:1863 [inline] skb_dequeue+0x16a/0x180 net/core/skbuff.c:2815 skb_queue_purge+0x26/0x40 net/core/skbuff.c:2852 packet_set_ring+0x675/0x1da0 net/packet/af_packet.c:4331 packet_release+0x630/0xd90 net/packet/af_packet.c:2991 __sock_release+0xd7/0x260 net/socket.c:603 sock_close+0x19/0x20 net/socket.c:1186 __fput+0x35b/0x8b0 fs/file_table.c:209 ____fput+0x15/0x20 fs/file_table.c:243 task_work_run+0x1ec/0x2a0 kernel/task_work.c:113 exit_task_work include/linux/task_work.h:22 [inline] do_exit+0x1b08/0x2750 kernel/exit.c:865 do_group_exit+0x177/0x440 kernel/exit.c:968 __do_sys_exit_group kernel/exit.c:979 [inline] __se_sys_exit_group kernel/exit.c:977 [inline] __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:977 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4448e9 Code: Bad RIP value. RSP: 002b:00007ffd5f777ca8 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004448e9 RDX: 00000000004448e9 RSI: 000000000000fcfb RDI: 0000000000000001 RBP: 00000000006cf018 R08: 00007ffd0000a45b R09: 0000000000000000 R10: 00007ffd5f777e48 R11: 0000000000000202 R12: 00000000004021f0 R13: 0000000000402280 R14: 0000000000000000 R15: 0000000000000000 Allocated by task 4553: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490 kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554 skb_clone+0x1f5/0x500 net/core/skbuff.c:1282 tpacket_rcv+0x28f7/0x3200 net/packet/af_packet.c:2221 deliver_skb net/core/dev.c:1925 [inline] deliver_ptype_list_skb net/core/dev.c:1940 [inline] __netif_receive_skb_core+0x1bfb/0x3680 net/core/dev.c:4611 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693 netif_receive_skb_internal+0x12e/0x7d0 net/core/dev.c:4767 netif_receive_skb+0xbf/0x420 net/core/dev.c:4791 tun_rx_batched.isra.55+0x4ba/0x8c0 drivers/net/tun.c:1571 tun_get_user+0x2af1/0x42f0 drivers/net/tun.c:1981 tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:2009 call_write_iter include/linux/fs.h:1795 [inline] new_sync_write fs/read_write.c:474 [inline] __vfs_write+0x6c6/0x9f0 fs/read_write.c:487 vfs_write+0x1f8/0x560 fs/read_write.c:549 ksys_write+0x101/0x260 fs/read_write.c:598 __do_sys_write fs/read_write.c:610 [inline] __se_sys_write fs/read_write.c:607 [inline] __x64_sys_write+0x73/0xb0 fs/read_write.c:607 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 4553: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528 __cache_free mm/slab.c:3498 [inline] kmem_cache_free+0x86/0x2d0 mm/slab.c:3756 kfree_skbmem+0x154/0x230 net/core/skbuff.c:582 __kfree_skb net/core/skbuff.c:642 [inline] kfree_skb+0x1a5/0x580 net/core/skbuff.c:659 tpacket_rcv+0x189e/0x3200 net/packet/af_packet.c:2385 deliver_skb net/core/dev.c:1925 [inline] deliver_ptype_list_skb net/core/dev.c:1940 [inline] __netif_receive_skb_core+0x1bfb/0x3680 net/core/dev.c:4611 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693 netif_receive_skb_internal+0x12e/0x7d0 net/core/dev.c:4767 netif_receive_skb+0xbf/0x420 net/core/dev.c:4791 tun_rx_batched.isra.55+0x4ba/0x8c0 drivers/net/tun.c:1571 tun_get_user+0x2af1/0x42f0 drivers/net/tun.c:1981 tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:2009 call_write_iter include/linux/fs.h:1795 [inline] new_sync_write fs/read_write.c:474 [inline] __vfs_write+0x6c6/0x9f0 fs/read_write.c:487 vfs_write+0x1f8/0x560 fs/read_write.c:549 ksys_write+0x101/0x260 fs/read_write.c:598 __do_sys_write fs/read_write.c:610 [inline] __se_sys_write fs/read_write.c:607 [inline] __x64_sys_write+0x73/0xb0 fs/read_write.c:607 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff8801b044ecc0 which belongs to the cache skbuff_head_cache of size 232 The buggy address is located 0 bytes inside of 232-byte region [ffff8801b044ecc0, ffff8801b044eda8) The buggy address belongs to the page: page:ffffea0006c11380 count:1 mapcount:0 mapping:ffff8801d9be96c0 index:0x0 flags: 0x2fffc0000000100(slab) raw: 02fffc0000000100 ffffea0006c17988 ffff8801d9bec248 ffff8801d9be96c0 raw: 0000000000000000 ffff8801b044e040 000000010000000c 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8801b044eb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8801b044ec00: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc >ffff8801b044ec80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb ^ ffff8801b044ed00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8801b044ed80: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc Fixes: 58d19b19cd99 ("packet: vnet_hdr support for tpacket_rcv") Signed-off-by: Eric Dumazet Reported-by: syzbot Cc: Willem de Bruijn Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/packet/af_packet.c | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 2c4a47f29f36..ea601f7ca2f8 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -2265,6 +2265,12 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, if (po->stats.stats1.tp_drops) status |= TP_STATUS_LOSING; } + + if (do_vnet && + __packet_rcv_vnet(skb, h.raw + macoff - + sizeof(struct virtio_net_hdr))) + goto drop_n_account; + po->stats.stats1.tp_packets++; if (copy_skb) { status |= TP_STATUS_COPY; @@ -2272,14 +2278,6 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, } spin_unlock(&sk->sk_receive_queue.lock); - if (do_vnet) { - if (__packet_rcv_vnet(skb, h.raw + macoff - - sizeof(struct virtio_net_hdr))) { - spin_lock(&sk->sk_receive_queue.lock); - goto drop_n_account; - } - } - skb_copy_bits(skb, 0, h.raw + macoff, snaplen); if (!(ts_status = tpacket_get_timestamp(skb, &ts, po->tp_tstamp))) From 224d2337c000090fca8e364e9e7b616c2f67adda Mon Sep 17 00:00:00 2001 From: Alex Vesker Date: Tue, 12 Jun 2018 16:14:31 +0300 Subject: [PATCH 218/970] net/mlx5: Fix command interface race in polling mode [ Upstream commit d412c31dae053bf30a1bc15582a9990df297a660 ] The command interface can work in two modes: Events and Polling. In the general case, each time we invoke a command, a work is queued to handle it. When working in events, the interrupt handler completes the command execution. On the other hand, when working in polling mode, the work itself completes it. Due to a bug in the work handler, a command could have been completed by the interrupt handler, while the work handler hasn't finished yet, causing the it to complete once again if the command interface mode was changed from Events to polling after the interrupt handler was called. mlx5_unload_one() mlx5_stop_eqs() // Destroy the EQ before cmd EQ ...cmd_work_handler() write_doorbell() --> EVENT_TYPE_CMD mlx5_cmd_comp_handler() // First free free_ent(cmd, ent->idx) complete(&ent->done) <-- mlx5_stop_eqs //cmd was complete // move to polling before destroying the last cmd EQ mlx5_cmd_use_polling() cmd->mode = POLL; --> cmd_work_handler (continues) if (cmd->mode == POLL) mlx5_cmd_comp_handler() // Double free The solution is to store the cmd->mode before writing the doorbell. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Alex Vesker Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c index 05de3bb7f7c0..9680c8805178 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c @@ -784,6 +784,7 @@ static void cmd_work_handler(struct work_struct *work) struct semaphore *sem; unsigned long flags; int alloc_ret; + int cmd_mode; sem = ent->page_queue ? &cmd->pages_sem : &cmd->sem; down(sem); @@ -830,6 +831,7 @@ static void cmd_work_handler(struct work_struct *work) set_signature(ent, !cmd->checksum_disabled); dump_command(dev, ent, 1); ent->ts1 = ktime_get_ns(); + cmd_mode = cmd->mode; if (ent->callback) schedule_delayed_work(&ent->cb_timeout_work, cb_timeout); @@ -854,7 +856,7 @@ static void cmd_work_handler(struct work_struct *work) iowrite32be(1 << ent->idx, &dev->iseg->cmd_dbell); mmiowb(); /* if not in polling don't use ent after this point */ - if (cmd->mode == CMD_MODE_POLLING) { + if (cmd_mode == CMD_MODE_POLLING) { poll_timeout(ent); /* make sure we read the descriptor after ownership is SW */ rmb(); From 53e795c755de6e20516ae4387b78c4d92f841e2b Mon Sep 17 00:00:00 2001 From: "Gustavo A. R. Silva" Date: Mon, 16 Jul 2018 20:59:58 -0500 Subject: [PATCH 219/970] net: cxgb3_main: fix potential Spectre v1 commit 676bcfece19f83621e905aa55b5ed2d45cc4f2d3 upstream. t.qset_idx can be indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c:2286 cxgb_extension_ioctl() warn: potential spectre issue 'adapter->msix_info' Fix this by sanitizing t.qset_idx before using it to index adapter->msix_info Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2 Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c index 43da891fab97..dc0efbd91c32 100644 --- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c +++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c @@ -50,6 +50,7 @@ #include #include #include +#include #include #include "common.h" @@ -2259,6 +2260,7 @@ static int cxgb_extension_ioctl(struct net_device *dev, void __user *useraddr) if (t.qset_idx >= nqsets) return -EINVAL; + t.qset_idx = array_index_nospec(t.qset_idx, nqsets); q = &adapter->params.sge.qset[q1 + t.qset_idx]; t.rspq_size = q->rspq_size; From 254f52df9b1ecbd0ee2a9f31b506924cb61ca850 Mon Sep 17 00:00:00 2001 From: Ping-Ke Shih Date: Thu, 28 Jun 2018 10:02:27 +0800 Subject: [PATCH 220/970] rtlwifi: rtl8821ae: fix firmware is not ready to run commit 9a98302de19991d51e067b88750585203b2a3ab6 upstream. Without this patch, firmware will not run properly on rtl8821ae, and it causes bad user experience. For example, bad connection performance with low rate, higher power consumption, and so on. rtl8821ae uses two kinds of firmwares for normal and WoWlan cases, and each firmware has firmware data buffer and size individually. Original code always overwrite size of normal firmware rtlpriv->rtlhal.fwsize, and this mismatch causes firmware checksum error, then firmware can't start. In this situation, driver gives message "Firmware is not ready to run!". Fixes: fe89707f0afa ("rtlwifi: rtl8821ae: Simplify loading of WOWLAN firmware") Signed-off-by: Ping-Ke Shih Cc: Stable # 4.0+ Reviewed-by: Larry Finger Signed-off-by: Kalle Valo Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/realtek/rtlwifi/core.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/net/wireless/realtek/rtlwifi/core.c b/drivers/net/wireless/realtek/rtlwifi/core.c index 4da4e458142c..9526643312d9 100644 --- a/drivers/net/wireless/realtek/rtlwifi/core.c +++ b/drivers/net/wireless/realtek/rtlwifi/core.c @@ -131,7 +131,6 @@ static void rtl_fw_do_work(const struct firmware *firmware, void *context, firmware->size); rtlpriv->rtlhal.wowlan_fwsize = firmware->size; } - rtlpriv->rtlhal.fwsize = firmware->size; release_firmware(firmware); } From f6ed63bc39e08fe761f9d100937d01fba9b098c1 Mon Sep 17 00:00:00 2001 From: Stefan Wahren Date: Sun, 15 Jul 2018 21:53:20 +0200 Subject: [PATCH 221/970] net: lan78xx: Fix race in tx pending skb size calculation commit dea39aca1d7aef1e2b95b07edeacf04cc8863a2e upstream. The skb size calculation in lan78xx_tx_bh is in race with the start_xmit, which could lead to rare kernel oopses. So protect the whole skb walk with a spin lock. As a benefit we can unlink the skb directly. This patch was tested on Raspberry Pi 3B+ Link: https://github.com/raspberrypi/linux/issues/2608 Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet") Cc: stable Signed-off-by: Floris Bos Signed-off-by: Stefan Wahren Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/usb/lan78xx.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c index f5a96678494b..5e0626c80b81 100644 --- a/drivers/net/usb/lan78xx.c +++ b/drivers/net/usb/lan78xx.c @@ -2964,6 +2964,7 @@ static void lan78xx_tx_bh(struct lan78xx_net *dev) pkt_cnt = 0; count = 0; length = 0; + spin_lock_irqsave(&tqp->lock, flags); for (skb = tqp->next; pkt_cnt < tqp->qlen; skb = skb->next) { if (skb_is_gso(skb)) { if (pkt_cnt) { @@ -2972,7 +2973,8 @@ static void lan78xx_tx_bh(struct lan78xx_net *dev) } count = 1; length = skb->len - TX_OVERHEAD; - skb2 = skb_dequeue(tqp); + __skb_unlink(skb, tqp); + spin_unlock_irqrestore(&tqp->lock, flags); goto gso_skb; } @@ -2981,6 +2983,7 @@ static void lan78xx_tx_bh(struct lan78xx_net *dev) skb_totallen = skb->len + roundup(skb_totallen, sizeof(u32)); pkt_cnt++; } + spin_unlock_irqrestore(&tqp->lock, flags); /* copy to a single skb */ skb = alloc_skb(skb_totallen, GFP_ATOMIC); From 064d9e9744728d0c10ef5e22e955e8886f3dfcca Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Wed, 6 Jun 2018 12:14:56 +0200 Subject: [PATCH 222/970] netfilter: ebtables: reject non-bridge targets commit 11ff7288beb2b7da889a014aff0a7b80bf8efcf3 upstream. the ebtables evaluation loop expects targets to return positive values (jumps), or negative values (absolute verdicts). This is completely different from what xtables does. In xtables, targets are expected to return the standard netfilter verdicts, i.e. NF_DROP, NF_ACCEPT, etc. ebtables will consider these as jumps. Therefore reject any target found due to unspec fallback. v2: also reject watchers. ebtables ignores their return value, so a target that assumes skb ownership (and returns NF_STOLEN) causes use-after-free. The only watchers in the 'ebtables' front-end are log and nflog; both have AF_BRIDGE specific wrappers on kernel side. Reported-by: syzbot+2b43f681169a2a0d306a@syzkaller.appspotmail.com Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman --- net/bridge/netfilter/ebtables.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c index cb6fbb525ba6..18c1f07e4f3b 100644 --- a/net/bridge/netfilter/ebtables.c +++ b/net/bridge/netfilter/ebtables.c @@ -406,6 +406,12 @@ ebt_check_watcher(struct ebt_entry_watcher *w, struct xt_tgchk_param *par, watcher = xt_request_find_target(NFPROTO_BRIDGE, w->u.name, 0); if (IS_ERR(watcher)) return PTR_ERR(watcher); + + if (watcher->family != NFPROTO_BRIDGE) { + module_put(watcher->me); + return -ENOENT; + } + w->u.watcher = watcher; par->target = watcher; @@ -727,6 +733,13 @@ ebt_check_entry(struct ebt_entry *e, struct net *net, goto cleanup_watchers; } + /* Reject UNSPEC, xtables verdicts/return values are incompatible */ + if (target->family != NFPROTO_BRIDGE) { + module_put(target->me); + ret = -ENOENT; + goto cleanup_watchers; + } + t->u.target = target; if (t->u.target == &ebt_standard_target) { if (gap < sizeof(struct ebt_standard_target)) { From ec5e52a881fe7df745a45262e486f73fc6bd6b88 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Fri, 13 Jul 2018 16:59:27 -0700 Subject: [PATCH 223/970] reiserfs: fix buffer overflow with long warning messages commit fe10e398e860955bac4d28ec031b701d358465e4 upstream. ReiserFS prepares log messages into a 1024-byte buffer with no bounds checks. Long messages, such as the "unknown mount option" warning when userspace passes a crafted mount options string, overflow this buffer. This causes KASAN to report a global-out-of-bounds write. Fix it by truncating messages to the buffer size. Link: http://lkml.kernel.org/r/20180707203621.30922-1-ebiggers3@gmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+b890b3335a4d8c608963@syzkaller.appspotmail.com Signed-off-by: Eric Biggers Reviewed-by: Andrew Morton Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/reiserfs/prints.c | 141 +++++++++++++++++++++++++------------------ 1 file changed, 81 insertions(+), 60 deletions(-) diff --git a/fs/reiserfs/prints.c b/fs/reiserfs/prints.c index 4f3f928076f3..92470e5973f8 100644 --- a/fs/reiserfs/prints.c +++ b/fs/reiserfs/prints.c @@ -76,83 +76,99 @@ static char *le_type(struct reiserfs_key *key) } /* %k */ -static void sprintf_le_key(char *buf, struct reiserfs_key *key) +static int scnprintf_le_key(char *buf, size_t size, struct reiserfs_key *key) { if (key) - sprintf(buf, "[%d %d %s %s]", le32_to_cpu(key->k_dir_id), - le32_to_cpu(key->k_objectid), le_offset(key), - le_type(key)); + return scnprintf(buf, size, "[%d %d %s %s]", + le32_to_cpu(key->k_dir_id), + le32_to_cpu(key->k_objectid), le_offset(key), + le_type(key)); else - sprintf(buf, "[NULL]"); + return scnprintf(buf, size, "[NULL]"); } /* %K */ -static void sprintf_cpu_key(char *buf, struct cpu_key *key) +static int scnprintf_cpu_key(char *buf, size_t size, struct cpu_key *key) { if (key) - sprintf(buf, "[%d %d %s %s]", key->on_disk_key.k_dir_id, - key->on_disk_key.k_objectid, reiserfs_cpu_offset(key), - cpu_type(key)); + return scnprintf(buf, size, "[%d %d %s %s]", + key->on_disk_key.k_dir_id, + key->on_disk_key.k_objectid, + reiserfs_cpu_offset(key), cpu_type(key)); else - sprintf(buf, "[NULL]"); + return scnprintf(buf, size, "[NULL]"); } -static void sprintf_de_head(char *buf, struct reiserfs_de_head *deh) +static int scnprintf_de_head(char *buf, size_t size, + struct reiserfs_de_head *deh) { if (deh) - sprintf(buf, - "[offset=%d dir_id=%d objectid=%d location=%d state=%04x]", - deh_offset(deh), deh_dir_id(deh), deh_objectid(deh), - deh_location(deh), deh_state(deh)); + return scnprintf(buf, size, + "[offset=%d dir_id=%d objectid=%d location=%d state=%04x]", + deh_offset(deh), deh_dir_id(deh), + deh_objectid(deh), deh_location(deh), + deh_state(deh)); else - sprintf(buf, "[NULL]"); + return scnprintf(buf, size, "[NULL]"); } -static void sprintf_item_head(char *buf, struct item_head *ih) +static int scnprintf_item_head(char *buf, size_t size, struct item_head *ih) { if (ih) { - strcpy(buf, - (ih_version(ih) == KEY_FORMAT_3_6) ? "*3.6* " : "*3.5*"); - sprintf_le_key(buf + strlen(buf), &(ih->ih_key)); - sprintf(buf + strlen(buf), ", item_len %d, item_location %d, " - "free_space(entry_count) %d", - ih_item_len(ih), ih_location(ih), ih_free_space(ih)); + char *p = buf; + char * const end = buf + size; + + p += scnprintf(p, end - p, "%s", + (ih_version(ih) == KEY_FORMAT_3_6) ? + "*3.6* " : "*3.5*"); + + p += scnprintf_le_key(p, end - p, &ih->ih_key); + + p += scnprintf(p, end - p, + ", item_len %d, item_location %d, free_space(entry_count) %d", + ih_item_len(ih), ih_location(ih), + ih_free_space(ih)); + return p - buf; } else - sprintf(buf, "[NULL]"); + return scnprintf(buf, size, "[NULL]"); } -static void sprintf_direntry(char *buf, struct reiserfs_dir_entry *de) +static int scnprintf_direntry(char *buf, size_t size, + struct reiserfs_dir_entry *de) { char name[20]; memcpy(name, de->de_name, de->de_namelen > 19 ? 19 : de->de_namelen); name[de->de_namelen > 19 ? 19 : de->de_namelen] = 0; - sprintf(buf, "\"%s\"==>[%d %d]", name, de->de_dir_id, de->de_objectid); + return scnprintf(buf, size, "\"%s\"==>[%d %d]", + name, de->de_dir_id, de->de_objectid); } -static void sprintf_block_head(char *buf, struct buffer_head *bh) +static int scnprintf_block_head(char *buf, size_t size, struct buffer_head *bh) { - sprintf(buf, "level=%d, nr_items=%d, free_space=%d rdkey ", - B_LEVEL(bh), B_NR_ITEMS(bh), B_FREE_SPACE(bh)); + return scnprintf(buf, size, + "level=%d, nr_items=%d, free_space=%d rdkey ", + B_LEVEL(bh), B_NR_ITEMS(bh), B_FREE_SPACE(bh)); } -static void sprintf_buffer_head(char *buf, struct buffer_head *bh) +static int scnprintf_buffer_head(char *buf, size_t size, struct buffer_head *bh) { - sprintf(buf, - "dev %pg, size %zd, blocknr %llu, count %d, state 0x%lx, page %p, (%s, %s, %s)", - bh->b_bdev, bh->b_size, - (unsigned long long)bh->b_blocknr, atomic_read(&(bh->b_count)), - bh->b_state, bh->b_page, - buffer_uptodate(bh) ? "UPTODATE" : "!UPTODATE", - buffer_dirty(bh) ? "DIRTY" : "CLEAN", - buffer_locked(bh) ? "LOCKED" : "UNLOCKED"); + return scnprintf(buf, size, + "dev %pg, size %zd, blocknr %llu, count %d, state 0x%lx, page %p, (%s, %s, %s)", + bh->b_bdev, bh->b_size, + (unsigned long long)bh->b_blocknr, + atomic_read(&(bh->b_count)), + bh->b_state, bh->b_page, + buffer_uptodate(bh) ? "UPTODATE" : "!UPTODATE", + buffer_dirty(bh) ? "DIRTY" : "CLEAN", + buffer_locked(bh) ? "LOCKED" : "UNLOCKED"); } -static void sprintf_disk_child(char *buf, struct disk_child *dc) +static int scnprintf_disk_child(char *buf, size_t size, struct disk_child *dc) { - sprintf(buf, "[dc_number=%d, dc_size=%u]", dc_block_number(dc), - dc_size(dc)); + return scnprintf(buf, size, "[dc_number=%d, dc_size=%u]", + dc_block_number(dc), dc_size(dc)); } static char *is_there_reiserfs_struct(char *fmt, int *what) @@ -189,55 +205,60 @@ static void prepare_error_buf(const char *fmt, va_list args) char *fmt1 = fmt_buf; char *k; char *p = error_buf; + char * const end = &error_buf[sizeof(error_buf)]; int what; spin_lock(&error_lock); - strcpy(fmt1, fmt); + if (WARN_ON(strscpy(fmt_buf, fmt, sizeof(fmt_buf)) < 0)) { + strscpy(error_buf, "format string too long", end - error_buf); + goto out_unlock; + } while ((k = is_there_reiserfs_struct(fmt1, &what)) != NULL) { *k = 0; - p += vsprintf(p, fmt1, args); + p += vscnprintf(p, end - p, fmt1, args); switch (what) { case 'k': - sprintf_le_key(p, va_arg(args, struct reiserfs_key *)); + p += scnprintf_le_key(p, end - p, + va_arg(args, struct reiserfs_key *)); break; case 'K': - sprintf_cpu_key(p, va_arg(args, struct cpu_key *)); + p += scnprintf_cpu_key(p, end - p, + va_arg(args, struct cpu_key *)); break; case 'h': - sprintf_item_head(p, va_arg(args, struct item_head *)); + p += scnprintf_item_head(p, end - p, + va_arg(args, struct item_head *)); break; case 't': - sprintf_direntry(p, - va_arg(args, - struct reiserfs_dir_entry *)); + p += scnprintf_direntry(p, end - p, + va_arg(args, struct reiserfs_dir_entry *)); break; case 'y': - sprintf_disk_child(p, - va_arg(args, struct disk_child *)); + p += scnprintf_disk_child(p, end - p, + va_arg(args, struct disk_child *)); break; case 'z': - sprintf_block_head(p, - va_arg(args, struct buffer_head *)); + p += scnprintf_block_head(p, end - p, + va_arg(args, struct buffer_head *)); break; case 'b': - sprintf_buffer_head(p, - va_arg(args, struct buffer_head *)); + p += scnprintf_buffer_head(p, end - p, + va_arg(args, struct buffer_head *)); break; case 'a': - sprintf_de_head(p, - va_arg(args, - struct reiserfs_de_head *)); + p += scnprintf_de_head(p, end - p, + va_arg(args, struct reiserfs_de_head *)); break; } - p += strlen(p); fmt1 = k + 2; } - vsprintf(p, fmt1, args); + p += vscnprintf(p, end - p, fmt1, args); +out_unlock: spin_unlock(&error_lock); } From 3d0ce44dafd3d2540fbef3deeb5355faf009862c Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Wed, 11 Jul 2018 10:46:29 -0700 Subject: [PATCH 224/970] KEYS: DNS: fix parsing multiple options commit c604cb767049b78b3075497b80ebb8fd530ea2cc upstream. My recent fix for dns_resolver_preparse() printing very long strings was incomplete, as shown by syzbot which still managed to hit the WARN_ONCE() in set_precision() by adding a crafted "dns_resolver" key: precision 50001 too large WARNING: CPU: 7 PID: 864 at lib/vsprintf.c:2164 vsnprintf+0x48a/0x5a0 The bug this time isn't just a printing bug, but also a logical error when multiple options ("#"-separated strings) are given in the key payload. Specifically, when separating an option string into name and value, if there is no value then the name is incorrectly considered to end at the end of the key payload, rather than the end of the current option. This bypasses validation of the option length, and also means that specifying multiple options is broken -- which presumably has gone unnoticed as there is currently only one valid option anyway. A similar problem also applied to option values, as the kstrtoul() when parsing the "dnserror" option will read past the end of the current option and into the next option. Fix these bugs by correctly computing the length of the option name and by copying the option value, null-terminated, into a temporary buffer. Reproducer for the WARN_ONCE() that syzbot hit: perl -e 'print "#A#", "\0" x 50000' | keyctl padd dns_resolver desc @s Reproducer for "dnserror" option being parsed incorrectly (expected behavior is to fail when seeing the unknown option "foo", actual behavior was to read the dnserror value as "1#foo" and fail there): perl -e 'print "#dnserror=1#foo\0"' | keyctl padd dns_resolver desc @s Reported-by: syzbot Fixes: 4a2d789267e0 ("DNS: If the DNS server returns an error, allow that to be cached [ver #2]") Signed-off-by: Eric Biggers Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/dns_resolver/dns_key.c | 28 ++++++++++++++++------------ 1 file changed, 16 insertions(+), 12 deletions(-) diff --git a/net/dns_resolver/dns_key.c b/net/dns_resolver/dns_key.c index f0252768ecf4..5f5d9eafccf5 100644 --- a/net/dns_resolver/dns_key.c +++ b/net/dns_resolver/dns_key.c @@ -87,35 +87,39 @@ dns_resolver_preparse(struct key_preparsed_payload *prep) opt++; kdebug("options: '%s'", opt); do { + int opt_len, opt_nlen; const char *eq; - int opt_len, opt_nlen, opt_vlen, tmp; + char optval[128]; next_opt = memchr(opt, '#', end - opt) ?: end; opt_len = next_opt - opt; - if (opt_len <= 0 || opt_len > 128) { + if (opt_len <= 0 || opt_len > sizeof(optval)) { pr_warn_ratelimited("Invalid option length (%d) for dns_resolver key\n", opt_len); return -EINVAL; } - eq = memchr(opt, '=', opt_len) ?: end; - opt_nlen = eq - opt; - eq++; - opt_vlen = next_opt - eq; /* will be -1 if no value */ + eq = memchr(opt, '=', opt_len); + if (eq) { + opt_nlen = eq - opt; + eq++; + memcpy(optval, eq, next_opt - eq); + optval[next_opt - eq] = '\0'; + } else { + opt_nlen = opt_len; + optval[0] = '\0'; + } - tmp = opt_vlen >= 0 ? opt_vlen : 0; - kdebug("option '%*.*s' val '%*.*s'", - opt_nlen, opt_nlen, opt, tmp, tmp, eq); + kdebug("option '%*.*s' val '%s'", + opt_nlen, opt_nlen, opt, optval); /* see if it's an error number representing a DNS error * that's to be recorded as the result in this key */ if (opt_nlen == sizeof(DNS_ERRORNO_OPTION) - 1 && memcmp(opt, DNS_ERRORNO_OPTION, opt_nlen) == 0) { kdebug("dns error number option"); - if (opt_vlen <= 0) - goto bad_option_value; - ret = kstrtoul(eq, 10, &derrno); + ret = kstrtoul(optval, 10, &derrno); if (ret < 0) goto bad_option_value; From ad8b1ffc3efae2f65080bdb11145c87d299b8f9a Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Mon, 9 Jul 2018 13:43:38 +0200 Subject: [PATCH 225/970] netfilter: ipv6: nf_defrag: drop skb dst before queueing commit 84379c9afe011020e797e3f50a662b08a6355dcf upstream. Eric Dumazet reports: Here is a reproducer of an annoying bug detected by syzkaller on our production kernel [..] ./b78305423 enable_conntrack Then : sleep 60 dmesg | tail -10 [ 171.599093] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 181.631024] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 191.687076] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 201.703037] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 211.711072] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 221.959070] unregister_netdevice: waiting for lo to become free. Usage count = 2 Reproducer sends ipv6 fragment that hits nfct defrag via LOCAL_OUT hook. skb gets queued until frag timer expiry -- 1 minute. Normally nf_conntrack_reasm gets called during prerouting, so skb has no dst yet which might explain why this wasn't spotted earlier. Reported-by: Eric Dumazet Reported-by: John Sperbeck Signed-off-by: Florian Westphal Tested-by: Eric Dumazet Reported-by: syzbot Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman --- net/ipv6/netfilter/nf_conntrack_reasm.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c index 64ec23388450..722a9db8c6a7 100644 --- a/net/ipv6/netfilter/nf_conntrack_reasm.c +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c @@ -618,6 +618,8 @@ int nf_ct_frag6_gather(struct net *net, struct sk_buff *skb, u32 user) fq->q.meat == fq->q.len && nf_ct_frag6_reasm(fq, skb, dev)) ret = 0; + else + skb_dst_drop(skb); out_unlock: spin_unlock_bh(&fq->q.lock); From 863d5568b74171ad7e703fe0b98ceea11d3632ec Mon Sep 17 00:00:00 2001 From: Santosh Shilimkar Date: Thu, 14 Jun 2018 11:52:34 -0700 Subject: [PATCH 226/970] rds: avoid unenecessary cong_update in loop transport commit f1693c63ab133d16994cc50f773982b5905af264 upstream. Loop transport which is self loopback, remote port congestion update isn't relevant. Infact the xmit path already ignores it. Receive path needs to do the same. Reported-by: syzbot+4c20b3866171ce8441d2@syzkaller.appspotmail.com Reviewed-by: Sowmini Varadhan Signed-off-by: Santosh Shilimkar Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/rds/loop.c | 1 + net/rds/rds.h | 5 +++++ net/rds/recv.c | 5 +++++ 3 files changed, 11 insertions(+) diff --git a/net/rds/loop.c b/net/rds/loop.c index f2bf78de5688..dac6218a460e 100644 --- a/net/rds/loop.c +++ b/net/rds/loop.c @@ -193,4 +193,5 @@ struct rds_transport rds_loop_transport = { .inc_copy_to_user = rds_message_inc_copy_to_user, .inc_free = rds_loop_inc_free, .t_name = "loopback", + .t_type = RDS_TRANS_LOOP, }; diff --git a/net/rds/rds.h b/net/rds/rds.h index 30a51fec0f63..edfc3397aa24 100644 --- a/net/rds/rds.h +++ b/net/rds/rds.h @@ -440,6 +440,11 @@ struct rds_notifier { int n_status; }; +/* Available as part of RDS core, so doesn't need to participate + * in get_preferred transport etc + */ +#define RDS_TRANS_LOOP 3 + /** * struct rds_transport - transport specific behavioural hooks * diff --git a/net/rds/recv.c b/net/rds/recv.c index cbfabdf3ff48..f16ee1b13b8d 100644 --- a/net/rds/recv.c +++ b/net/rds/recv.c @@ -94,6 +94,11 @@ static void rds_recv_rcvbuf_delta(struct rds_sock *rs, struct sock *sk, return; rs->rs_rcv_bytes += delta; + + /* loop transport doesn't send/recv congestion updates */ + if (rs->rs_transport->t_type == RDS_TRANS_LOOP) + return; + now_congested = rs->rs_rcv_bytes > rds_sk_rcvbuf(rs); rdsdebug("rs %p (%pI4:%u) recv bytes %d buf %d " From d31a56d23eac2cc53559731b5a7c17fd1a1b0450 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa Date: Wed, 18 Jul 2018 18:57:27 +0900 Subject: [PATCH 227/970] net/nfc: Avoid stalls when nfc_alloc_send_skb() returned NULL. commit 3bc53be9db21040b5d2de4d455f023c8c494aa68 upstream. syzbot is reporting stalls at nfc_llcp_send_ui_frame() [1]. This is because nfc_llcp_send_ui_frame() is retrying the loop without any delay when nonblocking nfc_alloc_send_skb() returned NULL. Since there is no need to use MSG_DONTWAIT if we retry until sock_alloc_send_pskb() succeeds, let's use blocking call. Also, in case an unexpected error occurred, let's break the loop if blocking nfc_alloc_send_skb() failed. [1] https://syzkaller.appspot.com/bug?id=4a131cc571c3733e0eff6bc673f4e36ae48f19c6 Signed-off-by: Tetsuo Handa Reported-by: syzbot Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/nfc/llcp_commands.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/net/nfc/llcp_commands.c b/net/nfc/llcp_commands.c index 3f266115294f..04759a0c3273 100644 --- a/net/nfc/llcp_commands.c +++ b/net/nfc/llcp_commands.c @@ -753,11 +753,14 @@ int nfc_llcp_send_ui_frame(struct nfc_llcp_sock *sock, u8 ssap, u8 dsap, pr_debug("Fragment %zd bytes remaining %zd", frag_len, remaining_len); - pdu = nfc_alloc_send_skb(sock->dev, &sock->sk, MSG_DONTWAIT, + pdu = nfc_alloc_send_skb(sock->dev, &sock->sk, 0, frag_len + LLCP_HEADER_SIZE, &err); if (pdu == NULL) { - pr_err("Could not allocate PDU\n"); - continue; + pr_err("Could not allocate PDU (error=%d)\n", err); + len -= remaining_len; + if (len == 0) + len = err; + break; } pdu = llcp_add_header(pdu, dsap, ssap, LLCP_PDU_UI); From c488ae439ded49d20fcd264dec39f30300bbd2c1 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Fri, 20 Jul 2018 10:56:12 +0100 Subject: [PATCH 228/970] arm64: assembler: introduce ldr_this_cpu Commit 1b7e2296a822dfd2349960addc42a139360ce769 upstream. Shortly we will want to load a percpu variable in the return from userspace path. We can save an instruction by folding the addition of the percpu offset into the load instruction, and this patch adds a new helper to do so. At the same time, we clean up this_cpu_ptr for consistency. As with {adr,ldr,str}_l, we change the template to take the destination register first, and name this dst. Secondly, we rename the macro to adr_this_cpu, following the scheme of adr_l, and matching the newly added ldr_this_cpu. Signed-off-by: Mark Rutland Tested-by: Laura Abbott Cc: Ard Biesheuvel Cc: James Morse Cc: Will Deacon Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman --- arch/arm64/include/asm/assembler.h | 19 +++++++++++++++---- arch/arm64/kernel/entry.S | 2 +- 2 files changed, 16 insertions(+), 5 deletions(-) diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h index bfcfec3590f6..a4f227f6a400 100644 --- a/arch/arm64/include/asm/assembler.h +++ b/arch/arm64/include/asm/assembler.h @@ -239,14 +239,25 @@ lr .req x30 // link register .endm /* + * @dst: Result of per_cpu(sym, smp_processor_id()) * @sym: The name of the per-cpu variable - * @reg: Result of per_cpu(sym, smp_processor_id()) * @tmp: scratch register */ - .macro this_cpu_ptr, sym, reg, tmp - adr_l \reg, \sym + .macro adr_this_cpu, dst, sym, tmp + adr_l \dst, \sym mrs \tmp, tpidr_el1 - add \reg, \reg, \tmp + add \dst, \dst, \tmp + .endm + + /* + * @dst: Result of READ_ONCE(per_cpu(sym, smp_processor_id())) + * @sym: The name of the per-cpu variable + * @tmp: scratch register + */ + .macro ldr_this_cpu dst, sym, tmp + adr_l \dst, \sym + mrs \tmp, tpidr_el1 + ldr \dst, [\dst, \tmp] .endm /* diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S index b79e302d2a3e..7fed00b47692 100644 --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -243,7 +243,7 @@ alternative_insn eret, nop, ARM64_UNMAP_KERNEL_AT_EL0 cmp x25, tsk b.ne 9998f - this_cpu_ptr irq_stack, x25, x26 + adr_this_cpu x25, irq_stack, x26 mov x26, #IRQ_STACK_START_SP add x26, x25, x26 From 02891fdbfd1e61feec902b640d9eb415f04de693 Mon Sep 17 00:00:00 2001 From: James Morse Date: Fri, 20 Jul 2018 10:56:13 +0100 Subject: [PATCH 229/970] KVM: arm64: Store vcpu on the stack during __guest_enter() Commit 32b03d1059667a39e089c45ee38ec9c16332430f upstream. KVM uses tpidr_el2 as its private vcpu register, which makes sense for non-vhe world switch as only KVM can access this register. This means vhe Linux has to use tpidr_el1, which KVM has to save/restore as part of the host context. If the SDEI handler code runs behind KVMs back, it mustn't access any per-cpu variables. To allow this on systems with vhe we need to make the host use tpidr_el2, saving KVM from save/restoring it. __guest_enter() stores the host_ctxt on the stack, do the same with the vcpu. Signed-off-by: James Morse Reviewed-by: Christoffer Dall Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman --- arch/arm64/kvm/hyp/entry.S | 10 +++++++--- arch/arm64/kvm/hyp/hyp-entry.S | 6 +++--- 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S index 12ee62d6d410..9a8ab5dddd9e 100644 --- a/arch/arm64/kvm/hyp/entry.S +++ b/arch/arm64/kvm/hyp/entry.S @@ -62,8 +62,8 @@ ENTRY(__guest_enter) // Store the host regs save_callee_saved_regs x1 - // Store the host_ctxt for use at exit time - str x1, [sp, #-16]! + // Store host_ctxt and vcpu for use at exit time + stp x1, x0, [sp, #-16]! add x18, x0, #VCPU_CONTEXT @@ -159,6 +159,10 @@ abort_guest_exit_end: ENDPROC(__guest_exit) ENTRY(__fpsimd_guest_restore) + // x0: esr + // x1: vcpu + // x2-x29,lr: vcpu regs + // vcpu x0-x1 on the stack stp x2, x3, [sp, #-16]! stp x4, lr, [sp, #-16]! @@ -173,7 +177,7 @@ alternative_else alternative_endif isb - mrs x3, tpidr_el2 + mov x3, x1 ldr x0, [x3, #VCPU_HOST_CONTEXT] kern_hyp_va x0 diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S index 4e9d50c3e658..acd75c905084 100644 --- a/arch/arm64/kvm/hyp/hyp-entry.S +++ b/arch/arm64/kvm/hyp/hyp-entry.S @@ -121,24 +121,24 @@ el1_trap: /* * x0: ESR_EC */ + ldr x1, [sp, #16 + 8] // vcpu stored by __guest_enter /* Guest accessed VFP/SIMD registers, save host, restore Guest */ cmp x0, #ESR_ELx_EC_FP_ASIMD b.eq __fpsimd_guest_restore - mrs x1, tpidr_el2 mov x0, #ARM_EXCEPTION_TRAP b __guest_exit el1_irq: stp x0, x1, [sp, #-16]! - mrs x1, tpidr_el2 + ldr x1, [sp, #16 + 8] mov x0, #ARM_EXCEPTION_IRQ b __guest_exit el1_error: stp x0, x1, [sp, #-16]! - mrs x1, tpidr_el2 + ldr x1, [sp, #16 + 8] mov x0, #ARM_EXCEPTION_EL1_SERROR b __guest_exit From 6a654e6939152cea11ad339a9976b4d57bee5928 Mon Sep 17 00:00:00 2001 From: James Morse Date: Fri, 20 Jul 2018 10:56:14 +0100 Subject: [PATCH 230/970] KVM: arm/arm64: Convert kvm_host_cpu_state to a static per-cpu allocation Commit 36989e7fd386a9a5822c48691473863f8fbb404d upstream. kvm_host_cpu_state is a per-cpu allocation made from kvm_arch_init() used to store the host EL1 registers when KVM switches to a guest. Make it easier for ASM to generate pointers into this per-cpu memory by making it a static allocation. Signed-off-by: James Morse Acked-by: Christoffer Dall Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman --- arch/arm/kvm/arm.c | 18 +++--------------- 1 file changed, 3 insertions(+), 15 deletions(-) diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index ef6595c7d697..d7cd5410ab5c 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -51,8 +51,8 @@ __asm__(".arch_extension virt"); #endif +DEFINE_PER_CPU(kvm_cpu_context_t, kvm_host_cpu_state); static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page); -static kvm_cpu_context_t __percpu *kvm_host_cpu_state; static unsigned long hyp_default_vectors; /* Per-CPU variable containing the currently running vcpu. */ @@ -338,7 +338,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) } vcpu->cpu = cpu; - vcpu->arch.host_cpu_context = this_cpu_ptr(kvm_host_cpu_state); + vcpu->arch.host_cpu_context = this_cpu_ptr(&kvm_host_cpu_state); kvm_arm_set_running_vcpu(vcpu); } @@ -1199,19 +1199,8 @@ static inline void hyp_cpu_pm_exit(void) } #endif -static void teardown_common_resources(void) -{ - free_percpu(kvm_host_cpu_state); -} - static int init_common_resources(void) { - kvm_host_cpu_state = alloc_percpu(kvm_cpu_context_t); - if (!kvm_host_cpu_state) { - kvm_err("Cannot allocate host CPU state\n"); - return -ENOMEM; - } - /* set size of VMID supported by CPU */ kvm_vmid_bits = kvm_get_vmid_bits(); kvm_info("%d-bit VMID\n", kvm_vmid_bits); @@ -1369,7 +1358,7 @@ static int init_hyp_mode(void) for_each_possible_cpu(cpu) { kvm_cpu_context_t *cpu_ctxt; - cpu_ctxt = per_cpu_ptr(kvm_host_cpu_state, cpu); + cpu_ctxt = per_cpu_ptr(&kvm_host_cpu_state, cpu); err = create_hyp_mappings(cpu_ctxt, cpu_ctxt + 1, PAGE_HYP); if (err) { @@ -1447,7 +1436,6 @@ int kvm_arch_init(void *opaque) out_hyp: teardown_hyp_mode(); out_err: - teardown_common_resources(); return err; } From fa043b975c9aa7dc3a770217115d114cc3ca0d48 Mon Sep 17 00:00:00 2001 From: James Morse Date: Fri, 20 Jul 2018 10:56:15 +0100 Subject: [PATCH 231/970] KVM: arm64: Change hyp_panic()s dependency on tpidr_el2 Commit c97e166e54b662717d20ec2e36761758d2b6a7c2 upstream. Make tpidr_el2 a cpu-offset for per-cpu variables in the same way the host uses tpidr_el1. This lets tpidr_el{1,2} have the same value, and on VHE they can be the same register. KVM calls hyp_panic() when anything unexpected happens. This may occur while a guest owns the EL1 registers. KVM stashes the vcpu pointer in tpidr_el2, which it uses to find the host context in order to restore the host EL1 registers before parachuting into the host's panic(). The host context is a struct kvm_cpu_context allocated in the per-cpu area, and mapped to hyp. Given the per-cpu offset for this CPU, this is easy to find. Change hyp_panic() to take a pointer to the struct kvm_cpu_context. Wrap these calls with an asm function that retrieves the struct kvm_cpu_context from the host's per-cpu area. Copy the per-cpu offset from the hosts tpidr_el1 into tpidr_el2 during kvm init. (Later patches will make this unnecessary for VHE hosts) We print out the vcpu pointer as part of the panic message. Add a back reference to the 'running vcpu' in the host cpu context to preserve this. Signed-off-by: James Morse Reviewed-by: Christoffer Dall Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman --- arch/arm64/include/asm/kvm_host.h | 2 ++ arch/arm64/kvm/hyp/hyp-entry.S | 12 ++++++++++++ arch/arm64/kvm/hyp/s2-setup.c | 3 +++ arch/arm64/kvm/hyp/switch.c | 25 +++++++++++++------------ 4 files changed, 30 insertions(+), 12 deletions(-) diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 2abb4493f4f6..04631b0efbd6 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -197,6 +197,8 @@ struct kvm_cpu_context { u64 sys_regs[NR_SYS_REGS]; u32 copro[NR_COPRO_REGS]; }; + + struct kvm_vcpu *__hyp_running_vcpu; }; typedef struct kvm_cpu_context kvm_cpu_context_t; diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S index acd75c905084..f1b48ab02cd5 100644 --- a/arch/arm64/kvm/hyp/hyp-entry.S +++ b/arch/arm64/kvm/hyp/hyp-entry.S @@ -173,6 +173,18 @@ ENTRY(__hyp_do_panic) eret ENDPROC(__hyp_do_panic) +ENTRY(__hyp_panic) + /* + * '=kvm_host_cpu_state' is a host VA from the constant pool, it may + * not be accessible by this address from EL2, hyp_panic() converts + * it with kern_hyp_va() before use. + */ + ldr x0, =kvm_host_cpu_state + mrs x1, tpidr_el2 + add x0, x0, x1 + b hyp_panic +ENDPROC(__hyp_panic) + .macro invalid_vector label, target = __hyp_panic .align 2 \label: diff --git a/arch/arm64/kvm/hyp/s2-setup.c b/arch/arm64/kvm/hyp/s2-setup.c index b81f4091c909..eb401dbb285e 100644 --- a/arch/arm64/kvm/hyp/s2-setup.c +++ b/arch/arm64/kvm/hyp/s2-setup.c @@ -84,5 +84,8 @@ u32 __hyp_text __init_stage2_translation(void) write_sysreg(val, vtcr_el2); + /* copy tpidr_el1 into tpidr_el2 for use by HYP */ + write_sysreg(read_sysreg(tpidr_el1), tpidr_el2); + return parange; } diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c index c49d09387192..c8c918302153 100644 --- a/arch/arm64/kvm/hyp/switch.c +++ b/arch/arm64/kvm/hyp/switch.c @@ -275,9 +275,9 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu) u64 exit_code; vcpu = kern_hyp_va(vcpu); - write_sysreg(vcpu, tpidr_el2); host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context); + host_ctxt->__hyp_running_vcpu = vcpu; guest_ctxt = &vcpu->arch.ctxt; __sysreg_save_host_state(host_ctxt); @@ -364,7 +364,8 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu) static const char __hyp_panic_string[] = "HYP panic:\nPS:%08llx PC:%016llx ESR:%08llx\nFAR:%016llx HPFAR:%016llx PAR:%016llx\nVCPU:%p\n"; -static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par) +static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par, + struct kvm_vcpu *vcpu) { unsigned long str_va; @@ -378,35 +379,35 @@ static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par) __hyp_do_panic(str_va, spsr, elr, read_sysreg(esr_el2), read_sysreg_el2(far), - read_sysreg(hpfar_el2), par, - (void *)read_sysreg(tpidr_el2)); + read_sysreg(hpfar_el2), par, vcpu); } -static void __hyp_text __hyp_call_panic_vhe(u64 spsr, u64 elr, u64 par) +static void __hyp_text __hyp_call_panic_vhe(u64 spsr, u64 elr, u64 par, + struct kvm_vcpu *vcpu) { panic(__hyp_panic_string, spsr, elr, read_sysreg_el2(esr), read_sysreg_el2(far), - read_sysreg(hpfar_el2), par, - (void *)read_sysreg(tpidr_el2)); + read_sysreg(hpfar_el2), par, vcpu); } static hyp_alternate_select(__hyp_call_panic, __hyp_call_panic_nvhe, __hyp_call_panic_vhe, ARM64_HAS_VIRT_HOST_EXTN); -void __hyp_text __noreturn __hyp_panic(void) +void __hyp_text __noreturn hyp_panic(struct kvm_cpu_context *__host_ctxt) { + struct kvm_vcpu *vcpu = NULL; + u64 spsr = read_sysreg_el2(spsr); u64 elr = read_sysreg_el2(elr); u64 par = read_sysreg(par_el1); if (read_sysreg(vttbr_el2)) { - struct kvm_vcpu *vcpu; struct kvm_cpu_context *host_ctxt; - vcpu = (struct kvm_vcpu *)read_sysreg(tpidr_el2); - host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context); + host_ctxt = kern_hyp_va(__host_ctxt); + vcpu = host_ctxt->__hyp_running_vcpu; __timer_save_state(vcpu); __deactivate_traps(vcpu); __deactivate_vm(vcpu); @@ -414,7 +415,7 @@ void __hyp_text __noreturn __hyp_panic(void) } /* Call panic for real */ - __hyp_call_panic()(spsr, elr, par); + __hyp_call_panic()(spsr, elr, par, vcpu); unreachable(); } From eea59020a7f2993018ccde317387031c04c62036 Mon Sep 17 00:00:00 2001 From: James Morse Date: Fri, 20 Jul 2018 10:56:16 +0100 Subject: [PATCH 232/970] arm64: alternatives: use tpidr_el2 on VHE hosts Commit 6d99b68933fbcf51f84fcbba49246ce1209ec193 upstream. Now that KVM uses tpidr_el2 in the same way as Linux's cpu_offset in tpidr_el1, merge the two. This saves KVM from save/restoring tpidr_el1 on VHE hosts, and allows future code to blindly access per-cpu variables without triggering world-switch. Signed-off-by: James Morse Reviewed-by: Christoffer Dall Reviewed-by: Catalin Marinas Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman --- arch/arm64/include/asm/alternative.h | 2 ++ arch/arm64/include/asm/assembler.h | 8 ++++++++ arch/arm64/include/asm/percpu.h | 12 ++++++++++-- arch/arm64/kernel/alternative.c | 9 +++++---- arch/arm64/kernel/cpufeature.c | 17 +++++++++++++++++ 5 files changed, 42 insertions(+), 6 deletions(-) diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h index 6e1cb8c5af4d..f9e2f69f296e 100644 --- a/arch/arm64/include/asm/alternative.h +++ b/arch/arm64/include/asm/alternative.h @@ -11,6 +11,8 @@ #include #include +extern int alternatives_applied; + struct alt_instr { s32 orig_offset; /* offset to original instruction */ s32 alt_offset; /* offset to replacement instruction */ diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h index a4f227f6a400..3f85bbcd7e40 100644 --- a/arch/arm64/include/asm/assembler.h +++ b/arch/arm64/include/asm/assembler.h @@ -245,7 +245,11 @@ lr .req x30 // link register */ .macro adr_this_cpu, dst, sym, tmp adr_l \dst, \sym +alternative_if_not ARM64_HAS_VIRT_HOST_EXTN mrs \tmp, tpidr_el1 +alternative_else + mrs \tmp, tpidr_el2 +alternative_endif add \dst, \dst, \tmp .endm @@ -256,7 +260,11 @@ lr .req x30 // link register */ .macro ldr_this_cpu dst, sym, tmp adr_l \dst, \sym +alternative_if_not ARM64_HAS_VIRT_HOST_EXTN mrs \tmp, tpidr_el1 +alternative_else + mrs \tmp, tpidr_el2 +alternative_endif ldr \dst, [\dst, \tmp] .endm diff --git a/arch/arm64/include/asm/percpu.h b/arch/arm64/include/asm/percpu.h index 5394c8405e66..0d551576eb57 100644 --- a/arch/arm64/include/asm/percpu.h +++ b/arch/arm64/include/asm/percpu.h @@ -16,9 +16,14 @@ #ifndef __ASM_PERCPU_H #define __ASM_PERCPU_H +#include + static inline void set_my_cpu_offset(unsigned long off) { - asm volatile("msr tpidr_el1, %0" :: "r" (off) : "memory"); + asm volatile(ALTERNATIVE("msr tpidr_el1, %0", + "msr tpidr_el2, %0", + ARM64_HAS_VIRT_HOST_EXTN) + :: "r" (off) : "memory"); } static inline unsigned long __my_cpu_offset(void) @@ -29,7 +34,10 @@ static inline unsigned long __my_cpu_offset(void) * We want to allow caching the value, so avoid using volatile and * instead use a fake stack read to hazard against barrier(). */ - asm("mrs %0, tpidr_el1" : "=r" (off) : + asm(ALTERNATIVE("mrs %0, tpidr_el1", + "mrs %0, tpidr_el2", + ARM64_HAS_VIRT_HOST_EXTN) + : "=r" (off) : "Q" (*(const unsigned long *)current_stack_pointer)); return off; diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c index 06d650f61da7..f56f9894ef6b 100644 --- a/arch/arm64/kernel/alternative.c +++ b/arch/arm64/kernel/alternative.c @@ -32,6 +32,8 @@ #define ALT_ORIG_PTR(a) __ALT_PTR(a, orig_offset) #define ALT_REPL_PTR(a) __ALT_PTR(a, alt_offset) +int alternatives_applied; + struct alt_region { struct alt_instr *begin; struct alt_instr *end; @@ -142,7 +144,6 @@ static void __apply_alternatives(void *alt_region) */ static int __apply_alternatives_multi_stop(void *unused) { - static int patched = 0; struct alt_region region = { .begin = (struct alt_instr *)__alt_instructions, .end = (struct alt_instr *)__alt_instructions_end, @@ -150,14 +151,14 @@ static int __apply_alternatives_multi_stop(void *unused) /* We always have a CPU 0 at this point (__init) */ if (smp_processor_id()) { - while (!READ_ONCE(patched)) + while (!READ_ONCE(alternatives_applied)) cpu_relax(); isb(); } else { - BUG_ON(patched); + BUG_ON(alternatives_applied); __apply_alternatives(®ion); /* Barriers provided by the cache flushing */ - WRITE_ONCE(patched, 1); + WRITE_ONCE(alternatives_applied, 1); } return 0; diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 625c2b240ffb..ab15747a49d4 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -829,6 +829,22 @@ static int __init parse_kpti(char *str) early_param("kpti", parse_kpti); #endif /* CONFIG_UNMAP_KERNEL_AT_EL0 */ +static int cpu_copy_el2regs(void *__unused) +{ + /* + * Copy register values that aren't redirected by hardware. + * + * Before code patching, we only set tpidr_el1, all CPUs need to copy + * this value to tpidr_el2 before we patch the code. Once we've done + * that, freshly-onlined CPUs will set tpidr_el2, so we don't need to + * do anything here. + */ + if (!alternatives_applied) + write_sysreg(read_sysreg(tpidr_el1), tpidr_el2); + + return 0; +} + static const struct arm64_cpu_capabilities arm64_features[] = { { .desc = "GIC system register CPU interface", @@ -895,6 +911,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .capability = ARM64_HAS_VIRT_HOST_EXTN, .def_scope = SCOPE_SYSTEM, .matches = runs_at_el2, + .enable = cpu_copy_el2regs, }, { .desc = "32-bit EL0 Support", From 8bace8ac81580d0e1b95563411bbf898b1d19787 Mon Sep 17 00:00:00 2001 From: James Morse Date: Fri, 20 Jul 2018 10:56:17 +0100 Subject: [PATCH 233/970] KVM: arm64: Stop save/restoring host tpidr_el1 on VHE Commit 1f742679c33bc083722cb0b442a95d458c491b56 upstream. Now that a VHE host uses tpidr_el2 for the cpu offset we no longer need KVM to save/restore tpidr_el1. Move this from the 'common' code into the non-vhe code. While we're at it, on VHE we don't need to save the ELR or SPSR as kernel_entry in entry.S will have pushed these onto the kernel stack, and will restore them from there. Move these to the non-vhe code as we need them to get back to the host. Finally remove the always-copy-tpidr we hid in the stage2 setup code, cpufeature's enable callback will do this for VHE, we only need KVM to do it for non-vhe. Add the copy into kvm-init instead. Signed-off-by: James Morse Reviewed-by: Christoffer Dall Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman --- arch/arm64/kvm/hyp-init.S | 4 ++++ arch/arm64/kvm/hyp/s2-setup.c | 3 --- arch/arm64/kvm/hyp/sysreg-sr.c | 16 ++++++++-------- 3 files changed, 12 insertions(+), 11 deletions(-) diff --git a/arch/arm64/kvm/hyp-init.S b/arch/arm64/kvm/hyp-init.S index 4bbff904169d..db5efaf2a985 100644 --- a/arch/arm64/kvm/hyp-init.S +++ b/arch/arm64/kvm/hyp-init.S @@ -118,6 +118,10 @@ CPU_BE( orr x4, x4, #SCTLR_ELx_EE) kern_hyp_va x2 msr vbar_el2, x2 + /* copy tpidr_el1 into tpidr_el2 for use by HYP */ + mrs x1, tpidr_el1 + msr tpidr_el2, x1 + /* Hello, World! */ eret ENDPROC(__kvm_hyp_init) diff --git a/arch/arm64/kvm/hyp/s2-setup.c b/arch/arm64/kvm/hyp/s2-setup.c index eb401dbb285e..b81f4091c909 100644 --- a/arch/arm64/kvm/hyp/s2-setup.c +++ b/arch/arm64/kvm/hyp/s2-setup.c @@ -84,8 +84,5 @@ u32 __hyp_text __init_stage2_translation(void) write_sysreg(val, vtcr_el2); - /* copy tpidr_el1 into tpidr_el2 for use by HYP */ - write_sysreg(read_sysreg(tpidr_el1), tpidr_el2); - return parange; } diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c index 934137647837..c54cc2afb92b 100644 --- a/arch/arm64/kvm/hyp/sysreg-sr.c +++ b/arch/arm64/kvm/hyp/sysreg-sr.c @@ -27,8 +27,8 @@ static void __hyp_text __sysreg_do_nothing(struct kvm_cpu_context *ctxt) { } /* * Non-VHE: Both host and guest must save everything. * - * VHE: Host must save tpidr*_el[01], actlr_el1, mdscr_el1, sp0, pc, - * pstate, and guest must save everything. + * VHE: Host must save tpidr*_el0, actlr_el1, mdscr_el1, sp_el0, + * and guest must save everything. */ static void __hyp_text __sysreg_save_common_state(struct kvm_cpu_context *ctxt) @@ -36,11 +36,8 @@ static void __hyp_text __sysreg_save_common_state(struct kvm_cpu_context *ctxt) ctxt->sys_regs[ACTLR_EL1] = read_sysreg(actlr_el1); ctxt->sys_regs[TPIDR_EL0] = read_sysreg(tpidr_el0); ctxt->sys_regs[TPIDRRO_EL0] = read_sysreg(tpidrro_el0); - ctxt->sys_regs[TPIDR_EL1] = read_sysreg(tpidr_el1); ctxt->sys_regs[MDSCR_EL1] = read_sysreg(mdscr_el1); ctxt->gp_regs.regs.sp = read_sysreg(sp_el0); - ctxt->gp_regs.regs.pc = read_sysreg_el2(elr); - ctxt->gp_regs.regs.pstate = read_sysreg_el2(spsr); } static void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt) @@ -62,10 +59,13 @@ static void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt) ctxt->sys_regs[AMAIR_EL1] = read_sysreg_el1(amair); ctxt->sys_regs[CNTKCTL_EL1] = read_sysreg_el1(cntkctl); ctxt->sys_regs[PAR_EL1] = read_sysreg(par_el1); + ctxt->sys_regs[TPIDR_EL1] = read_sysreg(tpidr_el1); ctxt->gp_regs.sp_el1 = read_sysreg(sp_el1); ctxt->gp_regs.elr_el1 = read_sysreg_el1(elr); ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg_el1(spsr); + ctxt->gp_regs.regs.pc = read_sysreg_el2(elr); + ctxt->gp_regs.regs.pstate = read_sysreg_el2(spsr); } static hyp_alternate_select(__sysreg_call_save_host_state, @@ -89,11 +89,8 @@ static void __hyp_text __sysreg_restore_common_state(struct kvm_cpu_context *ctx write_sysreg(ctxt->sys_regs[ACTLR_EL1], actlr_el1); write_sysreg(ctxt->sys_regs[TPIDR_EL0], tpidr_el0); write_sysreg(ctxt->sys_regs[TPIDRRO_EL0], tpidrro_el0); - write_sysreg(ctxt->sys_regs[TPIDR_EL1], tpidr_el1); write_sysreg(ctxt->sys_regs[MDSCR_EL1], mdscr_el1); write_sysreg(ctxt->gp_regs.regs.sp, sp_el0); - write_sysreg_el2(ctxt->gp_regs.regs.pc, elr); - write_sysreg_el2(ctxt->gp_regs.regs.pstate, spsr); } static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt) @@ -115,10 +112,13 @@ static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt) write_sysreg_el1(ctxt->sys_regs[AMAIR_EL1], amair); write_sysreg_el1(ctxt->sys_regs[CNTKCTL_EL1], cntkctl); write_sysreg(ctxt->sys_regs[PAR_EL1], par_el1); + write_sysreg(ctxt->sys_regs[TPIDR_EL1], tpidr_el1); write_sysreg(ctxt->gp_regs.sp_el1, sp_el1); write_sysreg_el1(ctxt->gp_regs.elr_el1, elr); write_sysreg_el1(ctxt->gp_regs.spsr[KVM_SPSR_EL1],spsr); + write_sysreg_el2(ctxt->gp_regs.regs.pc, elr); + write_sysreg_el2(ctxt->gp_regs.regs.pstate, spsr); } static hyp_alternate_select(__sysreg_call_restore_host_state, From 3e75f25aadb52f64e124f88e4eb1e6128900c06a Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Fri, 20 Jul 2018 10:56:18 +0100 Subject: [PATCH 234/970] arm64: alternatives: Add dynamic patching feature Commit dea5e2a4c5bcf196f879a66cebdcca07793e8ba4 upstream. We've so far relied on a patching infrastructure that only gave us a single alternative, without any way to provide a range of potential replacement instructions. For a single feature, this is an all or nothing thing. It would be interesting to have a more flexible grained way of patching the kernel though, where we could dynamically tune the code that gets injected. In order to achive this, let's introduce a new form of dynamic patching, assiciating a callback to a patching site. This callback gets source and target locations of the patching request, as well as the number of instructions to be patched. Dynamic patching is declared with the new ALTERNATIVE_CB and alternative_cb directives: asm volatile(ALTERNATIVE_CB("mov %0, #0\n", callback) : "r" (v)); or alternative_cb callback mov x0, #0 alternative_cb_end where callback is the C function computing the alternative. Reviewed-by: Christoffer Dall Reviewed-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman --- arch/arm64/include/asm/alternative.h | 41 ++++++++++++++++++++++--- arch/arm64/kernel/alternative.c | 45 ++++++++++++++++++++-------- 2 files changed, 70 insertions(+), 16 deletions(-) diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h index f9e2f69f296e..7e842dcae450 100644 --- a/arch/arm64/include/asm/alternative.h +++ b/arch/arm64/include/asm/alternative.h @@ -4,6 +4,8 @@ #include #include +#define ARM64_CB_PATCH ARM64_NCAPS + #ifndef __ASSEMBLY__ #include @@ -21,12 +23,19 @@ struct alt_instr { u8 alt_len; /* size of new instruction(s), <= orig_len */ }; +typedef void (*alternative_cb_t)(struct alt_instr *alt, + __le32 *origptr, __le32 *updptr, int nr_inst); + void __init apply_alternatives_all(void); void apply_alternatives(void *start, size_t length); -#define ALTINSTR_ENTRY(feature) \ +#define ALTINSTR_ENTRY(feature,cb) \ " .word 661b - .\n" /* label */ \ + " .if " __stringify(cb) " == 0\n" \ " .word 663f - .\n" /* new instruction */ \ + " .else\n" \ + " .word " __stringify(cb) "- .\n" /* callback */ \ + " .endif\n" \ " .hword " __stringify(feature) "\n" /* feature bit */ \ " .byte 662b-661b\n" /* source len */ \ " .byte 664f-663f\n" /* replacement len */ @@ -44,15 +53,18 @@ void apply_alternatives(void *start, size_t length); * but most assemblers die if insn1 or insn2 have a .inst. This should * be fixed in a binutils release posterior to 2.25.51.0.2 (anything * containing commit 4e4d08cf7399b606 or c1baaddf8861). + * + * Alternatives with callbacks do not generate replacement instructions. */ -#define __ALTERNATIVE_CFG(oldinstr, newinstr, feature, cfg_enabled) \ +#define __ALTERNATIVE_CFG(oldinstr, newinstr, feature, cfg_enabled, cb) \ ".if "__stringify(cfg_enabled)" == 1\n" \ "661:\n\t" \ oldinstr "\n" \ "662:\n" \ ".pushsection .altinstructions,\"a\"\n" \ - ALTINSTR_ENTRY(feature) \ + ALTINSTR_ENTRY(feature,cb) \ ".popsection\n" \ + " .if " __stringify(cb) " == 0\n" \ ".pushsection .altinstr_replacement, \"a\"\n" \ "663:\n\t" \ newinstr "\n" \ @@ -60,11 +72,17 @@ void apply_alternatives(void *start, size_t length); ".popsection\n\t" \ ".org . - (664b-663b) + (662b-661b)\n\t" \ ".org . - (662b-661b) + (664b-663b)\n" \ + ".else\n\t" \ + "663:\n\t" \ + "664:\n\t" \ + ".endif\n" \ ".endif\n" #define _ALTERNATIVE_CFG(oldinstr, newinstr, feature, cfg, ...) \ - __ALTERNATIVE_CFG(oldinstr, newinstr, feature, IS_ENABLED(cfg)) + __ALTERNATIVE_CFG(oldinstr, newinstr, feature, IS_ENABLED(cfg), 0) +#define ALTERNATIVE_CB(oldinstr, cb) \ + __ALTERNATIVE_CFG(oldinstr, "NOT_AN_INSTRUCTION", ARM64_CB_PATCH, 1, cb) #else #include @@ -131,6 +149,14 @@ void apply_alternatives(void *start, size_t length); 661: .endm +.macro alternative_cb cb + .set .Lasm_alt_mode, 0 + .pushsection .altinstructions, "a" + altinstruction_entry 661f, \cb, ARM64_CB_PATCH, 662f-661f, 0 + .popsection +661: +.endm + /* * Provide the other half of the alternative code sequence. */ @@ -156,6 +182,13 @@ void apply_alternatives(void *start, size_t length); .org . - (662b-661b) + (664b-663b) .endm +/* + * Callback-based alternative epilogue + */ +.macro alternative_cb_end +662: +.endm + /* * Provides a trivial alternative or default sequence consisting solely * of NOPs. The number of NOPs is chosen automatically to match the diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c index f56f9894ef6b..091748095140 100644 --- a/arch/arm64/kernel/alternative.c +++ b/arch/arm64/kernel/alternative.c @@ -28,7 +28,7 @@ #include #include -#define __ALT_PTR(a,f) (u32 *)((void *)&(a)->f + (a)->f) +#define __ALT_PTR(a,f) ((void *)&(a)->f + (a)->f) #define ALT_ORIG_PTR(a) __ALT_PTR(a, orig_offset) #define ALT_REPL_PTR(a) __ALT_PTR(a, alt_offset) @@ -107,31 +107,52 @@ static u32 get_alt_insn(struct alt_instr *alt, u32 *insnptr, u32 *altinsnptr) return insn; } +static void patch_alternative(struct alt_instr *alt, + __le32 *origptr, __le32 *updptr, int nr_inst) +{ + __le32 *replptr; + int i; + + replptr = ALT_REPL_PTR(alt); + for (i = 0; i < nr_inst; i++) { + u32 insn; + + insn = get_alt_insn(alt, origptr + i, replptr + i); + updptr[i] = cpu_to_le32(insn); + } +} + static void __apply_alternatives(void *alt_region) { struct alt_instr *alt; struct alt_region *region = alt_region; - u32 *origptr, *replptr; + __le32 *origptr; + alternative_cb_t alt_cb; for (alt = region->begin; alt < region->end; alt++) { - u32 insn; - int i, nr_inst; + int nr_inst; - if (!cpus_have_cap(alt->cpufeature)) + /* Use ARM64_CB_PATCH as an unconditional patch */ + if (alt->cpufeature < ARM64_CB_PATCH && + !cpus_have_cap(alt->cpufeature)) continue; - BUG_ON(alt->alt_len != alt->orig_len); + if (alt->cpufeature == ARM64_CB_PATCH) + BUG_ON(alt->alt_len != 0); + else + BUG_ON(alt->alt_len != alt->orig_len); pr_info_once("patching kernel code\n"); origptr = ALT_ORIG_PTR(alt); - replptr = ALT_REPL_PTR(alt); - nr_inst = alt->alt_len / sizeof(insn); + nr_inst = alt->orig_len / AARCH64_INSN_SIZE; - for (i = 0; i < nr_inst; i++) { - insn = get_alt_insn(alt, origptr + i, replptr + i); - *(origptr + i) = cpu_to_le32(insn); - } + if (alt->cpufeature < ARM64_CB_PATCH) + alt_cb = patch_alternative; + else + alt_cb = ALT_REPL_PTR(alt); + + alt_cb(alt, origptr, origptr, nr_inst); flush_icache_range((uintptr_t)origptr, (uintptr_t)(origptr + nr_inst)); From 42768259386bd29562954cb815237bf0608ebbae Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Fri, 20 Jul 2018 10:56:19 +0100 Subject: [PATCH 235/970] KVM: arm/arm64: Do not use kern_hyp_va() with kvm_vgic_global_state Commit 44a497abd621a71c645f06d3d545ae2f46448830 upstream. kvm_vgic_global_state is part of the read-only section, and is usually accessed using a PC-relative address generation (adrp + add). It is thus useless to use kern_hyp_va() on it, and actively problematic if kern_hyp_va() becomes non-idempotent. On the other hand, there is no way that the compiler is going to guarantee that such access is always PC relative. So let's bite the bullet and provide our own accessor. Acked-by: Catalin Marinas Reviewed-by: James Morse Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman --- arch/arm/include/asm/kvm_mmu.h | 7 +++++++ arch/arm64/include/asm/kvm_mmu.h | 20 ++++++++++++++++++++ virt/kvm/arm/hyp/vgic-v2-sr.c | 2 +- 3 files changed, 28 insertions(+), 1 deletion(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 7f66b1b3aca1..6a0d496c4145 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -28,6 +28,13 @@ */ #define kern_hyp_va(kva) (kva) +/* Contrary to arm64, there is no need to generate a PC-relative address */ +#define hyp_symbol_addr(s) \ + ({ \ + typeof(s) *addr = &(s); \ + addr; \ + }) + /* * KVM_MMU_CACHE_MIN_PAGES is the number of stage2 page table translation levels. */ diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 824c83db9b47..cd7c9a3e0a03 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -130,6 +130,26 @@ static inline unsigned long __kern_hyp_va(unsigned long v) #define kern_hyp_va(v) ((typeof(v))(__kern_hyp_va((unsigned long)(v)))) +/* + * Obtain the PC-relative address of a kernel symbol + * s: symbol + * + * The goal of this macro is to return a symbol's address based on a + * PC-relative computation, as opposed to a loading the VA from a + * constant pool or something similar. This works well for HYP, as an + * absolute VA is guaranteed to be wrong. Only use this if trying to + * obtain the address of a symbol (i.e. not something you obtained by + * following a pointer). + */ +#define hyp_symbol_addr(s) \ + ({ \ + typeof(s) *addr; \ + asm("adrp %0, %1\n" \ + "add %0, %0, :lo12:%1\n" \ + : "=r" (addr) : "S" (&s)); \ + addr; \ + }) + /* * We currently only support a 40bit IPA. */ diff --git a/virt/kvm/arm/hyp/vgic-v2-sr.c b/virt/kvm/arm/hyp/vgic-v2-sr.c index 95021246ee26..3d6dbdf850aa 100644 --- a/virt/kvm/arm/hyp/vgic-v2-sr.c +++ b/virt/kvm/arm/hyp/vgic-v2-sr.c @@ -203,7 +203,7 @@ int __hyp_text __vgic_v2_perform_cpuif_access(struct kvm_vcpu *vcpu) return -1; rd = kvm_vcpu_dabt_get_rd(vcpu); - addr = kern_hyp_va((kern_hyp_va(&kvm_vgic_global_state))->vcpu_base_va); + addr = kern_hyp_va(hyp_symbol_addr(kvm_vgic_global_state)->vcpu_base_va); addr += fault_ipa - vgic->vgic_cpu_base; if (kvm_vcpu_dabt_iswrite(vcpu)) { From cab367c1c9b7797d7747a523935a90dc05ff24a1 Mon Sep 17 00:00:00 2001 From: Christoffer Dall Date: Fri, 20 Jul 2018 10:56:20 +0100 Subject: [PATCH 236/970] KVM: arm64: Avoid storing the vcpu pointer on the stack Commit 4464e210de9e80e38de59df052fe09ea2ff80b1b upstream. We already have the percpu area for the host cpu state, which points to the VCPU, so there's no need to store the VCPU pointer on the stack on every context switch. We can be a little more clever and just use tpidr_el2 for the percpu offset and load the VCPU pointer from the host context. This has the benefit of being able to retrieve the host context even when our stack is corrupted, and it has a potential performance benefit because we trade a store plus a load for an mrs and a load on a round trip to the guest. This does require us to calculate the percpu offset without including the offset from the kernel mapping of the percpu array to the linear mapping of the array (which is what we store in tpidr_el1), because a PC-relative generated address in EL2 is already giving us the hyp alias of the linear mapping of a kernel address. We do this in __cpu_init_hyp_mode() by using kvm_ksym_ref(). The code that accesses ESR_EL2 was previously using an alternative to use the _EL1 accessor on VHE systems, but this was actually unnecessary as the _EL1 accessor aliases the ESR_EL2 register on VHE, and the _EL2 accessor does the same thing on both systems. Cc: Ard Biesheuvel Reviewed-by: Marc Zyngier Reviewed-by: Andrew Jones Signed-off-by: Christoffer Dall Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman --- arch/arm64/include/asm/kvm_asm.h | 15 +++++++++++++++ arch/arm64/include/asm/kvm_host.h | 15 +++++++++++++++ arch/arm64/kernel/asm-offsets.c | 1 + arch/arm64/kvm/hyp/entry.S | 6 +----- arch/arm64/kvm/hyp/hyp-entry.S | 28 ++++++++++------------------ arch/arm64/kvm/hyp/switch.c | 5 +---- arch/arm64/kvm/hyp/sysreg-sr.c | 5 +++++ 7 files changed, 48 insertions(+), 27 deletions(-) diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h index ec3553eb9349..217630dc51c9 100644 --- a/arch/arm64/include/asm/kvm_asm.h +++ b/arch/arm64/include/asm/kvm_asm.h @@ -33,6 +33,7 @@ #define KVM_ARM64_DEBUG_DIRTY_SHIFT 0 #define KVM_ARM64_DEBUG_DIRTY (1 << KVM_ARM64_DEBUG_DIRTY_SHIFT) +/* Translate a kernel address of @sym into its equivalent linear mapping */ #define kvm_ksym_ref(sym) \ ({ \ void *val = &sym; \ @@ -65,6 +66,20 @@ extern u32 __kvm_get_mdcr_el2(void); extern u32 __init_stage2_translation(void); +#else /* __ASSEMBLY__ */ + +.macro get_host_ctxt reg, tmp + adr_l \reg, kvm_host_cpu_state + mrs \tmp, tpidr_el2 + add \reg, \reg, \tmp +.endm + +.macro get_vcpu_ptr vcpu, ctxt + get_host_ctxt \ctxt, \vcpu + ldr \vcpu, [\ctxt, #HOST_CONTEXT_VCPU] + kern_hyp_va \vcpu +.endm + #endif #endif /* __ARM_KVM_ASM_H__ */ diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 04631b0efbd6..834ae734db41 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -356,10 +356,15 @@ int kvm_perf_teardown(void); struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr); +void __kvm_set_tpidr_el2(u64 tpidr_el2); +DECLARE_PER_CPU(kvm_cpu_context_t, kvm_host_cpu_state); + static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr, unsigned long hyp_stack_ptr, unsigned long vector_ptr) { + u64 tpidr_el2; + /* * Call initialization code, and switch to the full blown HYP code. * If the cpucaps haven't been finalized yet, something has gone very @@ -368,6 +373,16 @@ static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr, */ BUG_ON(!static_branch_likely(&arm64_const_caps_ready)); __kvm_call_hyp((void *)pgd_ptr, hyp_stack_ptr, vector_ptr); + + /* + * Calculate the raw per-cpu offset without a translation from the + * kernel's mapping to the linear mapping, and store it in tpidr_el2 + * so that we can use adr_l to access per-cpu variables in EL2. + */ + tpidr_el2 = (u64)this_cpu_ptr(&kvm_host_cpu_state) + - (u64)kvm_ksym_ref(kvm_host_cpu_state); + + kvm_call_hyp(__kvm_set_tpidr_el2, tpidr_el2); } void __kvm_hyp_teardown(void); diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c index 5f4bf3c6f016..6e6375e07778 100644 --- a/arch/arm64/kernel/asm-offsets.c +++ b/arch/arm64/kernel/asm-offsets.c @@ -132,6 +132,7 @@ int main(void) DEFINE(CPU_FP_REGS, offsetof(struct kvm_regs, fp_regs)); DEFINE(VCPU_FPEXC32_EL2, offsetof(struct kvm_vcpu, arch.ctxt.sys_regs[FPEXC32_EL2])); DEFINE(VCPU_HOST_CONTEXT, offsetof(struct kvm_vcpu, arch.host_cpu_context)); + DEFINE(HOST_CONTEXT_VCPU, offsetof(struct kvm_cpu_context, __hyp_running_vcpu)); #endif #ifdef CONFIG_CPU_PM DEFINE(CPU_SUSPEND_SZ, sizeof(struct cpu_suspend_ctx)); diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S index 9a8ab5dddd9e..a360ac6e89e9 100644 --- a/arch/arm64/kvm/hyp/entry.S +++ b/arch/arm64/kvm/hyp/entry.S @@ -62,9 +62,6 @@ ENTRY(__guest_enter) // Store the host regs save_callee_saved_regs x1 - // Store host_ctxt and vcpu for use at exit time - stp x1, x0, [sp, #-16]! - add x18, x0, #VCPU_CONTEXT // Restore guest regs x0-x17 @@ -118,8 +115,7 @@ ENTRY(__guest_exit) // Store the guest regs x19-x29, lr save_callee_saved_regs x1 - // Restore the host_ctxt from the stack - ldr x2, [sp], #16 + get_host_ctxt x2, x3 // Now restore the host regs restore_callee_saved_regs x2 diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S index f1b48ab02cd5..3418c1dda236 100644 --- a/arch/arm64/kvm/hyp/hyp-entry.S +++ b/arch/arm64/kvm/hyp/hyp-entry.S @@ -72,13 +72,8 @@ ENDPROC(__kvm_hyp_teardown) el1_sync: // Guest trapped into EL2 stp x0, x1, [sp, #-16]! -alternative_if_not ARM64_HAS_VIRT_HOST_EXTN - mrs x1, esr_el2 -alternative_else - mrs x1, esr_el1 -alternative_endif - lsr x0, x1, #ESR_ELx_EC_SHIFT - + mrs x0, esr_el2 + lsr x0, x0, #ESR_ELx_EC_SHIFT cmp x0, #ESR_ELx_EC_HVC64 ccmp x0, #ESR_ELx_EC_HVC32, #4, ne b.ne el1_trap @@ -118,10 +113,14 @@ el1_hvc_guest: eret el1_trap: + get_vcpu_ptr x1, x0 + + mrs x0, esr_el2 + lsr x0, x0, #ESR_ELx_EC_SHIFT /* * x0: ESR_EC + * x1: vcpu pointer */ - ldr x1, [sp, #16 + 8] // vcpu stored by __guest_enter /* Guest accessed VFP/SIMD registers, save host, restore Guest */ cmp x0, #ESR_ELx_EC_FP_ASIMD @@ -132,13 +131,13 @@ el1_trap: el1_irq: stp x0, x1, [sp, #-16]! - ldr x1, [sp, #16 + 8] + get_vcpu_ptr x1, x0 mov x0, #ARM_EXCEPTION_IRQ b __guest_exit el1_error: stp x0, x1, [sp, #-16]! - ldr x1, [sp, #16 + 8] + get_vcpu_ptr x1, x0 mov x0, #ARM_EXCEPTION_EL1_SERROR b __guest_exit @@ -174,14 +173,7 @@ ENTRY(__hyp_do_panic) ENDPROC(__hyp_do_panic) ENTRY(__hyp_panic) - /* - * '=kvm_host_cpu_state' is a host VA from the constant pool, it may - * not be accessible by this address from EL2, hyp_panic() converts - * it with kern_hyp_va() before use. - */ - ldr x0, =kvm_host_cpu_state - mrs x1, tpidr_el2 - add x0, x0, x1 + get_host_ctxt x0, x1 b hyp_panic ENDPROC(__hyp_panic) diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c index c8c918302153..57f4e6145664 100644 --- a/arch/arm64/kvm/hyp/switch.c +++ b/arch/arm64/kvm/hyp/switch.c @@ -395,7 +395,7 @@ static hyp_alternate_select(__hyp_call_panic, __hyp_call_panic_nvhe, __hyp_call_panic_vhe, ARM64_HAS_VIRT_HOST_EXTN); -void __hyp_text __noreturn hyp_panic(struct kvm_cpu_context *__host_ctxt) +void __hyp_text __noreturn hyp_panic(struct kvm_cpu_context *host_ctxt) { struct kvm_vcpu *vcpu = NULL; @@ -404,9 +404,6 @@ void __hyp_text __noreturn hyp_panic(struct kvm_cpu_context *__host_ctxt) u64 par = read_sysreg(par_el1); if (read_sysreg(vttbr_el2)) { - struct kvm_cpu_context *host_ctxt; - - host_ctxt = kern_hyp_va(__host_ctxt); vcpu = host_ctxt->__hyp_running_vcpu; __timer_save_state(vcpu); __deactivate_traps(vcpu); diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c index c54cc2afb92b..e19d89cabf2a 100644 --- a/arch/arm64/kvm/hyp/sysreg-sr.c +++ b/arch/arm64/kvm/hyp/sysreg-sr.c @@ -183,3 +183,8 @@ void __hyp_text __sysreg32_restore_state(struct kvm_vcpu *vcpu) if (vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY) write_sysreg(sysreg[DBGVCR32_EL2], dbgvcr32_el2); } + +void __hyp_text __kvm_set_tpidr_el2(u64 tpidr_el2) +{ + asm("msr tpidr_el2, %0": : "r" (tpidr_el2)); +} From d1b5c1958391b921bce2ffd95ddfcd3f1e950c99 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Fri, 20 Jul 2018 10:56:21 +0100 Subject: [PATCH 237/970] arm/arm64: smccc: Add SMCCC-specific return codes commit eff0e9e1078ea7dc1d794dc50e31baef984c46d7 upstream. We've so far used the PSCI return codes for SMCCC because they were extremely similar. But with the new ARM DEN 0070A specification, "NOT_REQUIRED" (-2) is clashing with PSCI's "PSCI_RET_INVALID_PARAMS". Let's bite the bullet and add SMCCC specific return codes. Users can be repainted as and when required. Acked-by: Will Deacon Reviewed-by: Mark Rutland Signed-off-by: Marc Zyngier Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman --- include/linux/arm-smccc.h | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h index a031897fca76..c89da86de99f 100644 --- a/include/linux/arm-smccc.h +++ b/include/linux/arm-smccc.h @@ -291,5 +291,10 @@ asmlinkage void __arm_smccc_hvc(unsigned long a0, unsigned long a1, */ #define arm_smccc_1_1_hvc(...) __arm_smccc_1_1(SMCCC_HVC_INST, __VA_ARGS__) +/* Return codes defined in ARM DEN 0070A */ +#define SMCCC_RET_SUCCESS 0 +#define SMCCC_RET_NOT_SUPPORTED -1 +#define SMCCC_RET_NOT_REQUIRED -2 + #endif /*__ASSEMBLY__*/ #endif /*__LINUX_ARM_SMCCC_H*/ From be331630903b1c6355178d116d3bfe52f20a3ac7 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Fri, 20 Jul 2018 10:56:22 +0100 Subject: [PATCH 238/970] arm64: Call ARCH_WORKAROUND_2 on transitions between EL0 and EL1 commit 8e2906245f1e3b0d027169d9f2e55ce0548cb96e upstream. In order for the kernel to protect itself, let's call the SSBD mitigation implemented by the higher exception level (either hypervisor or firmware) on each transition between userspace and kernel. We must take the PSCI conduit into account in order to target the right exception level, hence the introduction of a runtime patching callback. Reviewed-by: Mark Rutland Reviewed-by: Julien Grall Acked-by: Will Deacon Signed-off-by: Marc Zyngier Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman --- arch/arm64/kernel/cpu_errata.c | 24 ++++++++++++++++++++++++ arch/arm64/kernel/entry.S | 22 ++++++++++++++++++++++ include/linux/arm-smccc.h | 5 +++++ 3 files changed, 51 insertions(+) diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c index 2de62aa91303..d4cbdbe1d1eb 100644 --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -187,6 +187,30 @@ static int enable_smccc_arch_workaround_1(void *data) } #endif /* CONFIG_HARDEN_BRANCH_PREDICTOR */ +#ifdef CONFIG_ARM64_SSBD +void __init arm64_update_smccc_conduit(struct alt_instr *alt, + __le32 *origptr, __le32 *updptr, + int nr_inst) +{ + u32 insn; + + BUG_ON(nr_inst != 1); + + switch (psci_ops.conduit) { + case PSCI_CONDUIT_HVC: + insn = aarch64_insn_get_hvc_value(); + break; + case PSCI_CONDUIT_SMC: + insn = aarch64_insn_get_smc_value(); + break; + default: + return; + } + + *updptr = cpu_to_le32(insn); +} +#endif /* CONFIG_ARM64_SSBD */ + #define MIDR_RANGE(model, min, max) \ .def_scope = SCOPE_LOCAL_CPU, \ .matches = is_affected_midr_range, \ diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S index 7fed00b47692..7651e9164943 100644 --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -18,6 +18,7 @@ * along with this program. If not, see . */ +#include #include #include @@ -95,6 +96,18 @@ alternative_else_nop_endif add \dst, \dst, #(\sym - .entry.tramp.text) .endm + // This macro corrupts x0-x3. It is the caller's duty + // to save/restore them if required. + .macro apply_ssbd, state +#ifdef CONFIG_ARM64_SSBD + mov w0, #ARM_SMCCC_ARCH_WORKAROUND_2 + mov w1, #\state +alternative_cb arm64_update_smccc_conduit + nop // Patched to SMC/HVC #0 +alternative_cb_end +#endif + .endm + .macro kernel_entry, el, regsize = 64 .if \regsize == 32 mov w0, w0 // zero upper 32 bits of x0 @@ -122,6 +135,13 @@ alternative_else_nop_endif ldr x19, [tsk, #TI_FLAGS] // since we can unmask debug disable_step_tsk x19, x20 // exceptions when scheduling. + apply_ssbd 1 + +#ifdef CONFIG_ARM64_SSBD + ldp x0, x1, [sp, #16 * 0] + ldp x2, x3, [sp, #16 * 1] +#endif + mov x29, xzr // fp pointed to user-space .else add x21, sp, #S_FRAME_SIZE @@ -190,6 +210,8 @@ alternative_if ARM64_WORKAROUND_845719 alternative_else_nop_endif #endif 3: + apply_ssbd 0 + .endif msr elr_el1, x21 // set up the return data msr spsr_el1, x22 diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h index c89da86de99f..ca1d2cc2cdfa 100644 --- a/include/linux/arm-smccc.h +++ b/include/linux/arm-smccc.h @@ -80,6 +80,11 @@ ARM_SMCCC_SMC_32, \ 0, 0x8000) +#define ARM_SMCCC_ARCH_WORKAROUND_2 \ + ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \ + ARM_SMCCC_SMC_32, \ + 0, 0x7fff) + #ifndef __ASSEMBLY__ #include From d8174bd75c8b9d3ed5598f39f0a3c7050e9cbbe6 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Fri, 20 Jul 2018 10:56:23 +0100 Subject: [PATCH 239/970] arm64: Add per-cpu infrastructure to call ARCH_WORKAROUND_2 commit 5cf9ce6e5ea50f805c6188c04ed0daaec7b6887d upstream. In a heterogeneous system, we can end up with both affected and unaffected CPUs. Let's check their status before calling into the firmware. Reviewed-by: Julien Grall Reviewed-by: Mark Rutland Acked-by: Will Deacon Signed-off-by: Marc Zyngier Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman --- arch/arm64/kernel/cpu_errata.c | 2 ++ arch/arm64/kernel/entry.S | 11 +++++++---- 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c index d4cbdbe1d1eb..a093907214bf 100644 --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -188,6 +188,8 @@ static int enable_smccc_arch_workaround_1(void *data) #endif /* CONFIG_HARDEN_BRANCH_PREDICTOR */ #ifdef CONFIG_ARM64_SSBD +DEFINE_PER_CPU_READ_MOSTLY(u64, arm64_ssbd_callback_required); + void __init arm64_update_smccc_conduit(struct alt_instr *alt, __le32 *origptr, __le32 *updptr, int nr_inst) diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S index 7651e9164943..0f2510bd1ef9 100644 --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -98,8 +98,10 @@ alternative_else_nop_endif // This macro corrupts x0-x3. It is the caller's duty // to save/restore them if required. - .macro apply_ssbd, state + .macro apply_ssbd, state, targ, tmp1, tmp2 #ifdef CONFIG_ARM64_SSBD + ldr_this_cpu \tmp2, arm64_ssbd_callback_required, \tmp1 + cbz \tmp2, \targ mov w0, #ARM_SMCCC_ARCH_WORKAROUND_2 mov w1, #\state alternative_cb arm64_update_smccc_conduit @@ -135,12 +137,13 @@ alternative_cb_end ldr x19, [tsk, #TI_FLAGS] // since we can unmask debug disable_step_tsk x19, x20 // exceptions when scheduling. - apply_ssbd 1 + apply_ssbd 1, 1f, x22, x23 #ifdef CONFIG_ARM64_SSBD ldp x0, x1, [sp, #16 * 0] ldp x2, x3, [sp, #16 * 1] #endif +1: mov x29, xzr // fp pointed to user-space .else @@ -210,8 +213,8 @@ alternative_if ARM64_WORKAROUND_845719 alternative_else_nop_endif #endif 3: - apply_ssbd 0 - + apply_ssbd 0, 5f, x0, x1 +5: .endif msr elr_el1, x21 // set up the return data msr spsr_el1, x22 From e7037bd9fc07329f8a2e6383c65876a35ba33a12 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Fri, 20 Jul 2018 10:56:24 +0100 Subject: [PATCH 240/970] arm64: Add ARCH_WORKAROUND_2 probing commit a725e3dda1813ed306734823ac4c65ca04e38500 upstream. As for Spectre variant-2, we rely on SMCCC 1.1 to provide the discovery mechanism for detecting the SSBD mitigation. A new capability is also allocated for that purpose, and a config option. Reviewed-by: Julien Grall Reviewed-by: Mark Rutland Acked-by: Will Deacon Reviewed-by: Suzuki K Poulose Signed-off-by: Marc Zyngier Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman --- arch/arm64/Kconfig | 9 +++++ arch/arm64/include/asm/cpucaps.h | 3 +- arch/arm64/kernel/cpu_errata.c | 69 ++++++++++++++++++++++++++++++++ 3 files changed, 80 insertions(+), 1 deletion(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index d0df3611d1e2..3e43874568f9 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -776,6 +776,15 @@ config HARDEN_BRANCH_PREDICTOR If unsure, say Y. +config ARM64_SSBD + bool "Speculative Store Bypass Disable" if EXPERT + default y + help + This enables mitigation of the bypassing of previous stores + by speculative loads. + + If unsure, say Y. + menuconfig ARMV8_DEPRECATED bool "Emulate deprecated/obsolete ARMv8 instructions" depends on COMPAT diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h index ce67bf6a0886..7010779a1429 100644 --- a/arch/arm64/include/asm/cpucaps.h +++ b/arch/arm64/include/asm/cpucaps.h @@ -36,7 +36,8 @@ #define ARM64_MISMATCHED_CACHE_LINE_SIZE 15 #define ARM64_UNMAP_KERNEL_AT_EL0 16 #define ARM64_HARDEN_BRANCH_PREDICTOR 17 +#define ARM64_SSBD 18 -#define ARM64_NCAPS 18 +#define ARM64_NCAPS 19 #endif /* __ASM_CPUCAPS_H */ diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c index a093907214bf..e57331726ac7 100644 --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -211,6 +211,67 @@ void __init arm64_update_smccc_conduit(struct alt_instr *alt, *updptr = cpu_to_le32(insn); } + +static void arm64_set_ssbd_mitigation(bool state) +{ + switch (psci_ops.conduit) { + case PSCI_CONDUIT_HVC: + arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_WORKAROUND_2, state, NULL); + break; + + case PSCI_CONDUIT_SMC: + arm_smccc_1_1_smc(ARM_SMCCC_ARCH_WORKAROUND_2, state, NULL); + break; + + default: + WARN_ON_ONCE(1); + break; + } +} + +static bool has_ssbd_mitigation(const struct arm64_cpu_capabilities *entry, + int scope) +{ + struct arm_smccc_res res; + bool supported = true; + + WARN_ON(scope != SCOPE_LOCAL_CPU || preemptible()); + + if (psci_ops.smccc_version == SMCCC_VERSION_1_0) + return false; + + /* + * The probe function return value is either negative + * (unsupported or mitigated), positive (unaffected), or zero + * (requires mitigation). We only need to do anything in the + * last case. + */ + switch (psci_ops.conduit) { + case PSCI_CONDUIT_HVC: + arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID, + ARM_SMCCC_ARCH_WORKAROUND_2, &res); + if ((int)res.a0 != 0) + supported = false; + break; + + case PSCI_CONDUIT_SMC: + arm_smccc_1_1_smc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID, + ARM_SMCCC_ARCH_WORKAROUND_2, &res); + if ((int)res.a0 != 0) + supported = false; + break; + + default: + supported = false; + } + + if (supported) { + __this_cpu_write(arm64_ssbd_callback_required, 1); + arm64_set_ssbd_mitigation(true); + } + + return supported; +} #endif /* CONFIG_ARM64_SSBD */ #define MIDR_RANGE(model, min, max) \ @@ -335,6 +396,14 @@ const struct arm64_cpu_capabilities arm64_errata[] = { MIDR_ALL_VERSIONS(MIDR_CAVIUM_THUNDERX2), .enable = enable_smccc_arch_workaround_1, }, +#endif +#ifdef CONFIG_ARM64_SSBD + { + .desc = "Speculative Store Bypass Disable", + .def_scope = SCOPE_LOCAL_CPU, + .capability = ARM64_SSBD, + .matches = has_ssbd_mitigation, + }, #endif { } From 3a64e6a9989e0748ff17b6deeec659cb1a3e1938 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Fri, 20 Jul 2018 10:56:25 +0100 Subject: [PATCH 241/970] arm64: Add 'ssbd' command-line option commit a43ae4dfe56a01f5b98ba0cb2f784b6a43bafcc6 upstream. On a system where the firmware implements ARCH_WORKAROUND_2, it may be useful to either permanently enable or disable the workaround for cases where the user decides that they'd rather not get a trap overhead, and keep the mitigation permanently on or off instead of switching it on exception entry/exit. In any case, default to the mitigation being enabled. Reviewed-by: Julien Grall Reviewed-by: Mark Rutland Acked-by: Will Deacon Signed-off-by: Marc Zyngier Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman --- Documentation/kernel-parameters.txt | 17 +++++ arch/arm64/include/asm/cpufeature.h | 6 ++ arch/arm64/kernel/cpu_errata.c | 103 +++++++++++++++++++++++----- 3 files changed, 110 insertions(+), 16 deletions(-) diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 52240a63132e..a16f87e4dd10 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -4023,6 +4023,23 @@ bytes respectively. Such letter suffixes can also be entirely omitted. spia_pedr= spia_peddr= + ssbd= [ARM64,HW] + Speculative Store Bypass Disable control + + On CPUs that are vulnerable to the Speculative + Store Bypass vulnerability and offer a + firmware based mitigation, this parameter + indicates how the mitigation should be used: + + force-on: Unconditionally enable mitigation for + for both kernel and userspace + force-off: Unconditionally disable mitigation for + for both kernel and userspace + kernel: Always enable mitigation in the + kernel, and offer a prctl interface + to allow userspace to register its + interest in being mitigated too. + stack_guard_gap= [MM] override the default stack gap protection. The value is in page units and it defines how many pages prior diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index 4ea85ebdf4df..87a6e47047ff 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -221,6 +221,12 @@ static inline bool system_supports_mixed_endian_el0(void) return id_aa64mmfr0_mixed_endian_el0(read_system_reg(SYS_ID_AA64MMFR0_EL1)); } +#define ARM64_SSBD_UNKNOWN -1 +#define ARM64_SSBD_FORCE_DISABLE 0 +#define ARM64_SSBD_KERNEL 1 +#define ARM64_SSBD_FORCE_ENABLE 2 +#define ARM64_SSBD_MITIGATED 3 + #endif /* __ASSEMBLY__ */ #endif diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c index e57331726ac7..4e9fa2a34b88 100644 --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -190,6 +190,38 @@ static int enable_smccc_arch_workaround_1(void *data) #ifdef CONFIG_ARM64_SSBD DEFINE_PER_CPU_READ_MOSTLY(u64, arm64_ssbd_callback_required); +int ssbd_state __read_mostly = ARM64_SSBD_KERNEL; + +static const struct ssbd_options { + const char *str; + int state; +} ssbd_options[] = { + { "force-on", ARM64_SSBD_FORCE_ENABLE, }, + { "force-off", ARM64_SSBD_FORCE_DISABLE, }, + { "kernel", ARM64_SSBD_KERNEL, }, +}; + +static int __init ssbd_cfg(char *buf) +{ + int i; + + if (!buf || !buf[0]) + return -EINVAL; + + for (i = 0; i < ARRAY_SIZE(ssbd_options); i++) { + int len = strlen(ssbd_options[i].str); + + if (strncmp(buf, ssbd_options[i].str, len)) + continue; + + ssbd_state = ssbd_options[i].state; + return 0; + } + + return -EINVAL; +} +early_param("ssbd", ssbd_cfg); + void __init arm64_update_smccc_conduit(struct alt_instr *alt, __le32 *origptr, __le32 *updptr, int nr_inst) @@ -233,44 +265,83 @@ static bool has_ssbd_mitigation(const struct arm64_cpu_capabilities *entry, int scope) { struct arm_smccc_res res; - bool supported = true; + bool required = true; + s32 val; WARN_ON(scope != SCOPE_LOCAL_CPU || preemptible()); - if (psci_ops.smccc_version == SMCCC_VERSION_1_0) + if (psci_ops.smccc_version == SMCCC_VERSION_1_0) { + ssbd_state = ARM64_SSBD_UNKNOWN; return false; + } - /* - * The probe function return value is either negative - * (unsupported or mitigated), positive (unaffected), or zero - * (requires mitigation). We only need to do anything in the - * last case. - */ switch (psci_ops.conduit) { case PSCI_CONDUIT_HVC: arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID, ARM_SMCCC_ARCH_WORKAROUND_2, &res); - if ((int)res.a0 != 0) - supported = false; break; case PSCI_CONDUIT_SMC: arm_smccc_1_1_smc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID, ARM_SMCCC_ARCH_WORKAROUND_2, &res); - if ((int)res.a0 != 0) - supported = false; break; default: - supported = false; + ssbd_state = ARM64_SSBD_UNKNOWN; + return false; } - if (supported) { - __this_cpu_write(arm64_ssbd_callback_required, 1); + val = (s32)res.a0; + + switch (val) { + case SMCCC_RET_NOT_SUPPORTED: + ssbd_state = ARM64_SSBD_UNKNOWN; + return false; + + case SMCCC_RET_NOT_REQUIRED: + pr_info_once("%s mitigation not required\n", entry->desc); + ssbd_state = ARM64_SSBD_MITIGATED; + return false; + + case SMCCC_RET_SUCCESS: + required = true; + break; + + case 1: /* Mitigation not required on this CPU */ + required = false; + break; + + default: + WARN_ON(1); + return false; + } + + switch (ssbd_state) { + case ARM64_SSBD_FORCE_DISABLE: + pr_info_once("%s disabled from command-line\n", entry->desc); + arm64_set_ssbd_mitigation(false); + required = false; + break; + + case ARM64_SSBD_KERNEL: + if (required) { + __this_cpu_write(arm64_ssbd_callback_required, 1); + arm64_set_ssbd_mitigation(true); + } + break; + + case ARM64_SSBD_FORCE_ENABLE: + pr_info_once("%s forced from command-line\n", entry->desc); arm64_set_ssbd_mitigation(true); + required = true; + break; + + default: + WARN_ON(1); + break; } - return supported; + return required; } #endif /* CONFIG_ARM64_SSBD */ From 242bff3816ad7b48d3db5d7aa9b6c2c961e08d16 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Fri, 20 Jul 2018 10:56:26 +0100 Subject: [PATCH 242/970] arm64: ssbd: Add global mitigation state accessor commit c32e1736ca03904c03de0e4459a673be194f56fd upstream. We're about to need the mitigation state in various parts of the kernel in order to do the right thing for userspace and guests. Let's expose an accessor that will let other subsystems know about the state. Reviewed-by: Julien Grall Reviewed-by: Mark Rutland Acked-by: Will Deacon Signed-off-by: Marc Zyngier Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman --- arch/arm64/include/asm/cpufeature.h | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index 87a6e47047ff..0c1529e020a7 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -227,6 +227,16 @@ static inline bool system_supports_mixed_endian_el0(void) #define ARM64_SSBD_FORCE_ENABLE 2 #define ARM64_SSBD_MITIGATED 3 +static inline int arm64_get_ssbd_state(void) +{ +#ifdef CONFIG_ARM64_SSBD + extern int ssbd_state; + return ssbd_state; +#else + return ARM64_SSBD_UNKNOWN; +#endif +} + #endif /* __ASSEMBLY__ */ #endif From 42f967dede3e8dbe34935fd7612e0eba0bc3a666 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Fri, 20 Jul 2018 10:56:27 +0100 Subject: [PATCH 243/970] arm64: ssbd: Skip apply_ssbd if not using dynamic mitigation commit 986372c4367f46b34a3c0f6918d7fb95cbdf39d6 upstream. In order to avoid checking arm64_ssbd_callback_required on each kernel entry/exit even if no mitigation is required, let's add yet another alternative that by default jumps over the mitigation, and that gets nop'ed out if we're doing dynamic mitigation. Think of it as a poor man's static key... Reviewed-by: Julien Grall Reviewed-by: Mark Rutland Acked-by: Will Deacon Signed-off-by: Marc Zyngier Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman --- arch/arm64/kernel/cpu_errata.c | 14 ++++++++++++++ arch/arm64/kernel/entry.S | 3 +++ 2 files changed, 17 insertions(+) diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c index 4e9fa2a34b88..9e0b311873ef 100644 --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -244,6 +244,20 @@ void __init arm64_update_smccc_conduit(struct alt_instr *alt, *updptr = cpu_to_le32(insn); } +void __init arm64_enable_wa2_handling(struct alt_instr *alt, + __le32 *origptr, __le32 *updptr, + int nr_inst) +{ + BUG_ON(nr_inst != 1); + /* + * Only allow mitigation on EL1 entry/exit and guest + * ARCH_WORKAROUND_2 handling if the SSBD state allows it to + * be flipped. + */ + if (arm64_get_ssbd_state() == ARM64_SSBD_KERNEL) + *updptr = cpu_to_le32(aarch64_insn_gen_nop()); +} + static void arm64_set_ssbd_mitigation(bool state) { switch (psci_ops.conduit) { diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S index 0f2510bd1ef9..75f1d9f051e8 100644 --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -100,6 +100,9 @@ alternative_else_nop_endif // to save/restore them if required. .macro apply_ssbd, state, targ, tmp1, tmp2 #ifdef CONFIG_ARM64_SSBD +alternative_cb arm64_enable_wa2_handling + b \targ +alternative_cb_end ldr_this_cpu \tmp2, arm64_ssbd_callback_required, \tmp1 cbz \tmp2, \targ mov w0, #ARM_SMCCC_ARCH_WORKAROUND_2 From d8fbc84469f3b796c574740b3f4a0a8df51cb9b6 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Fri, 20 Jul 2018 10:56:28 +0100 Subject: [PATCH 244/970] arm64: ssbd: Restore mitigation status on CPU resume commit 647d0519b53f440a55df163de21c52a8205431cc upstream. On a system where firmware can dynamically change the state of the mitigation, the CPU will always come up with the mitigation enabled, including when coming back from suspend. If the user has requested "no mitigation" via a command line option, let's enforce it by calling into the firmware again to disable it. Similarily, for a resume from hibernate, the mitigation could have been disabled by the boot kernel. Let's ensure that it is set back on in that case. Acked-by: Will Deacon Reviewed-by: Mark Rutland Signed-off-by: Marc Zyngier Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman --- arch/arm64/include/asm/cpufeature.h | 6 ++++++ arch/arm64/kernel/cpu_errata.c | 2 +- arch/arm64/kernel/hibernate.c | 11 +++++++++++ arch/arm64/kernel/suspend.c | 8 ++++++++ 4 files changed, 26 insertions(+), 1 deletion(-) diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index 0c1529e020a7..15868eca58de 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -237,6 +237,12 @@ static inline int arm64_get_ssbd_state(void) #endif } +#ifdef CONFIG_ARM64_SSBD +void arm64_set_ssbd_mitigation(bool state); +#else +static inline void arm64_set_ssbd_mitigation(bool state) {} +#endif + #endif /* __ASSEMBLY__ */ #endif diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c index 9e0b311873ef..1db97ad7b58b 100644 --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -258,7 +258,7 @@ void __init arm64_enable_wa2_handling(struct alt_instr *alt, *updptr = cpu_to_le32(aarch64_insn_gen_nop()); } -static void arm64_set_ssbd_mitigation(bool state) +void arm64_set_ssbd_mitigation(bool state) { switch (psci_ops.conduit) { case PSCI_CONDUIT_HVC: diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c index d55a7b09959b..f6e71c73cceb 100644 --- a/arch/arm64/kernel/hibernate.c +++ b/arch/arm64/kernel/hibernate.c @@ -308,6 +308,17 @@ int swsusp_arch_suspend(void) sleep_cpu = -EINVAL; __cpu_suspend_exit(); + + /* + * Just in case the boot kernel did turn the SSBD + * mitigation off behind our back, let's set the state + * to what we expect it to be. + */ + switch (arm64_get_ssbd_state()) { + case ARM64_SSBD_FORCE_ENABLE: + case ARM64_SSBD_KERNEL: + arm64_set_ssbd_mitigation(true); + } } local_dbg_restore(flags); diff --git a/arch/arm64/kernel/suspend.c b/arch/arm64/kernel/suspend.c index bb0cd787a9d3..1dbf6099e2a5 100644 --- a/arch/arm64/kernel/suspend.c +++ b/arch/arm64/kernel/suspend.c @@ -67,6 +67,14 @@ void notrace __cpu_suspend_exit(void) */ if (hw_breakpoint_restore) hw_breakpoint_restore(cpu); + + /* + * On resume, firmware implementing dynamic mitigation will + * have turned the mitigation on. If the user has forcefully + * disabled it, make sure their wishes are obeyed. + */ + if (arm64_get_ssbd_state() == ARM64_SSBD_FORCE_DISABLE) + arm64_set_ssbd_mitigation(false); } /* From cf14b896e77685eb89c8b02951f3b8d428e8dce0 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Fri, 20 Jul 2018 10:56:29 +0100 Subject: [PATCH 245/970] arm64: ssbd: Introduce thread flag to control userspace mitigation commit 9dd9614f5476687abbff8d4b12cd08ae70d7c2ad upstream. In order to allow userspace to be mitigated on demand, let's introduce a new thread flag that prevents the mitigation from being turned off when exiting to userspace, and doesn't turn it on on entry into the kernel (with the assumption that the mitigation is always enabled in the kernel itself). This will be used by a prctl interface introduced in a later patch. Reviewed-by: Mark Rutland Acked-by: Will Deacon Signed-off-by: Marc Zyngier Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman --- arch/arm64/include/asm/thread_info.h | 1 + arch/arm64/kernel/entry.S | 2 ++ 2 files changed, 3 insertions(+) diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h index e9ea5a6bd449..0dd1bc13f942 100644 --- a/arch/arm64/include/asm/thread_info.h +++ b/arch/arm64/include/asm/thread_info.h @@ -122,6 +122,7 @@ static inline struct thread_info *current_thread_info(void) #define TIF_RESTORE_SIGMASK 20 #define TIF_SINGLESTEP 21 #define TIF_32BIT 22 /* 32bit process */ +#define TIF_SSBD 23 /* Wants SSB mitigation */ #define _TIF_SIGPENDING (1 << TIF_SIGPENDING) #define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED) diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S index 75f1d9f051e8..ca978d7d98eb 100644 --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -105,6 +105,8 @@ alternative_cb arm64_enable_wa2_handling alternative_cb_end ldr_this_cpu \tmp2, arm64_ssbd_callback_required, \tmp1 cbz \tmp2, \targ + ldr \tmp2, [tsk, #TI_FLAGS] + tbnz \tmp2, #TIF_SSBD, \targ mov w0, #ARM_SMCCC_ARCH_WORKAROUND_2 mov w1, #\state alternative_cb arm64_update_smccc_conduit From 9c06aab19b4421d22a43dffb599bae6696669d67 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Fri, 20 Jul 2018 10:56:30 +0100 Subject: [PATCH 246/970] arm64: ssbd: Add prctl interface for per-thread mitigation commit 9cdc0108baa8ef87c76ed834619886a46bd70cbe upstream. If running on a system that performs dynamic SSBD mitigation, allow userspace to request the mitigation for itself. This is implemented as a prctl call, allowing the mitigation to be enabled or disabled at will for this particular thread. Acked-by: Will Deacon Signed-off-by: Marc Zyngier Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman --- arch/arm64/kernel/Makefile | 1 + arch/arm64/kernel/ssbd.c | 108 +++++++++++++++++++++++++++++++++++++ 2 files changed, 109 insertions(+) create mode 100644 arch/arm64/kernel/ssbd.c diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile index 74b8fd860714..6dadaaee796d 100644 --- a/arch/arm64/kernel/Makefile +++ b/arch/arm64/kernel/Makefile @@ -50,6 +50,7 @@ arm64-obj-$(CONFIG_RANDOMIZE_BASE) += kaslr.o arm64-obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o arm64-obj-$(CONFIG_KEXEC) += machine_kexec.o relocate_kernel.o \ cpu-reset.o +arm64-obj-$(CONFIG_ARM64_SSBD) += ssbd.o ifeq ($(CONFIG_KVM),y) arm64-obj-$(CONFIG_HARDEN_BRANCH_PREDICTOR) += bpi.o diff --git a/arch/arm64/kernel/ssbd.c b/arch/arm64/kernel/ssbd.c new file mode 100644 index 000000000000..0560738c1d5c --- /dev/null +++ b/arch/arm64/kernel/ssbd.c @@ -0,0 +1,108 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2018 ARM Ltd, All Rights Reserved. + */ + +#include +#include +#include +#include + +#include + +/* + * prctl interface for SSBD + */ +static int ssbd_prctl_set(struct task_struct *task, unsigned long ctrl) +{ + int state = arm64_get_ssbd_state(); + + /* Unsupported */ + if (state == ARM64_SSBD_UNKNOWN) + return -EINVAL; + + /* Treat the unaffected/mitigated state separately */ + if (state == ARM64_SSBD_MITIGATED) { + switch (ctrl) { + case PR_SPEC_ENABLE: + return -EPERM; + case PR_SPEC_DISABLE: + case PR_SPEC_FORCE_DISABLE: + return 0; + } + } + + /* + * Things are a bit backward here: the arm64 internal API + * *enables the mitigation* when the userspace API *disables + * speculation*. So much fun. + */ + switch (ctrl) { + case PR_SPEC_ENABLE: + /* If speculation is force disabled, enable is not allowed */ + if (state == ARM64_SSBD_FORCE_ENABLE || + task_spec_ssb_force_disable(task)) + return -EPERM; + task_clear_spec_ssb_disable(task); + clear_tsk_thread_flag(task, TIF_SSBD); + break; + case PR_SPEC_DISABLE: + if (state == ARM64_SSBD_FORCE_DISABLE) + return -EPERM; + task_set_spec_ssb_disable(task); + set_tsk_thread_flag(task, TIF_SSBD); + break; + case PR_SPEC_FORCE_DISABLE: + if (state == ARM64_SSBD_FORCE_DISABLE) + return -EPERM; + task_set_spec_ssb_disable(task); + task_set_spec_ssb_force_disable(task); + set_tsk_thread_flag(task, TIF_SSBD); + break; + default: + return -ERANGE; + } + + return 0; +} + +int arch_prctl_spec_ctrl_set(struct task_struct *task, unsigned long which, + unsigned long ctrl) +{ + switch (which) { + case PR_SPEC_STORE_BYPASS: + return ssbd_prctl_set(task, ctrl); + default: + return -ENODEV; + } +} + +static int ssbd_prctl_get(struct task_struct *task) +{ + switch (arm64_get_ssbd_state()) { + case ARM64_SSBD_UNKNOWN: + return -EINVAL; + case ARM64_SSBD_FORCE_ENABLE: + return PR_SPEC_DISABLE; + case ARM64_SSBD_KERNEL: + if (task_spec_ssb_force_disable(task)) + return PR_SPEC_PRCTL | PR_SPEC_FORCE_DISABLE; + if (task_spec_ssb_disable(task)) + return PR_SPEC_PRCTL | PR_SPEC_DISABLE; + return PR_SPEC_PRCTL | PR_SPEC_ENABLE; + case ARM64_SSBD_FORCE_DISABLE: + return PR_SPEC_ENABLE; + default: + return PR_SPEC_NOT_AFFECTED; + } +} + +int arch_prctl_spec_ctrl_get(struct task_struct *task, unsigned long which) +{ + switch (which) { + case PR_SPEC_STORE_BYPASS: + return ssbd_prctl_get(task); + default: + return -ENODEV; + } +} From 7b62e8503fbbf7702a55b1abbfdb2c290c96609e Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Fri, 20 Jul 2018 10:56:31 +0100 Subject: [PATCH 247/970] arm64: KVM: Add HYP per-cpu accessors commit 85478bab409171de501b719971fd25a3d5d639f9 upstream. As we're going to require to access per-cpu variables at EL2, let's craft the minimum set of accessors required to implement reading a per-cpu variable, relying on tpidr_el2 to contain the per-cpu offset. Reviewed-by: Christoffer Dall Reviewed-by: Mark Rutland Signed-off-by: Marc Zyngier Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman --- arch/arm64/include/asm/kvm_asm.h | 27 +++++++++++++++++++++++++-- 1 file changed, 25 insertions(+), 2 deletions(-) diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h index 217630dc51c9..b3b4a03f2d2f 100644 --- a/arch/arm64/include/asm/kvm_asm.h +++ b/arch/arm64/include/asm/kvm_asm.h @@ -66,14 +66,37 @@ extern u32 __kvm_get_mdcr_el2(void); extern u32 __init_stage2_translation(void); +/* Home-grown __this_cpu_{ptr,read} variants that always work at HYP */ +#define __hyp_this_cpu_ptr(sym) \ + ({ \ + void *__ptr = hyp_symbol_addr(sym); \ + __ptr += read_sysreg(tpidr_el2); \ + (typeof(&sym))__ptr; \ + }) + +#define __hyp_this_cpu_read(sym) \ + ({ \ + *__hyp_this_cpu_ptr(sym); \ + }) + #else /* __ASSEMBLY__ */ -.macro get_host_ctxt reg, tmp - adr_l \reg, kvm_host_cpu_state +.macro hyp_adr_this_cpu reg, sym, tmp + adr_l \reg, \sym mrs \tmp, tpidr_el2 add \reg, \reg, \tmp .endm +.macro hyp_ldr_this_cpu reg, sym, tmp + adr_l \reg, \sym + mrs \tmp, tpidr_el2 + ldr \reg, [\reg, \tmp] +.endm + +.macro get_host_ctxt reg, tmp + hyp_adr_this_cpu \reg, kvm_host_cpu_state, \tmp +.endm + .macro get_vcpu_ptr vcpu, ctxt get_host_ctxt \ctxt, \vcpu ldr \vcpu, [\ctxt, #HOST_CONTEXT_VCPU] From 68240e9bb16d52f5bf2c88f5984374ac7c6939fa Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Fri, 20 Jul 2018 10:56:32 +0100 Subject: [PATCH 248/970] arm64: KVM: Add ARCH_WORKAROUND_2 support for guests commit 55e3748e8902ff641e334226bdcb432f9a5d78d3 upstream. In order to offer ARCH_WORKAROUND_2 support to guests, we need a bit of infrastructure. Let's add a flag indicating whether or not the guest uses SSBD mitigation. Depending on the state of this flag, allow KVM to disable ARCH_WORKAROUND_2 before entering the guest, and enable it when exiting it. Reviewed-by: Christoffer Dall Reviewed-by: Mark Rutland Signed-off-by: Marc Zyngier Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman --- arch/arm/include/asm/kvm_mmu.h | 5 ++++ arch/arm/kvm/arm.c | 6 +++++ arch/arm64/include/asm/kvm_asm.h | 3 +++ arch/arm64/include/asm/kvm_host.h | 3 +++ arch/arm64/include/asm/kvm_mmu.h | 24 +++++++++++++++++++ arch/arm64/kvm/hyp/switch.c | 38 +++++++++++++++++++++++++++++++ 6 files changed, 79 insertions(+) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 6a0d496c4145..e2f05cedaf97 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -256,6 +256,11 @@ static inline int kvm_map_vectors(void) return 0; } +static inline int hyp_map_aux_data(void) +{ + return 0; +} + #endif /* !__ASSEMBLY__ */ #endif /* __ARM_KVM_MMU_H__ */ diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index d7cd5410ab5c..20436972537f 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -1367,6 +1367,12 @@ static int init_hyp_mode(void) } } + err = hyp_map_aux_data(); + if (err) { + kvm_err("Cannot map host auxilary data: %d\n", err); + goto out_err; + } + kvm_info("Hyp mode initialized successfully\n"); return 0; diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h index b3b4a03f2d2f..8f5cf83b2339 100644 --- a/arch/arm64/include/asm/kvm_asm.h +++ b/arch/arm64/include/asm/kvm_asm.h @@ -33,6 +33,9 @@ #define KVM_ARM64_DEBUG_DIRTY_SHIFT 0 #define KVM_ARM64_DEBUG_DIRTY (1 << KVM_ARM64_DEBUG_DIRTY_SHIFT) +#define VCPU_WORKAROUND_2_FLAG_SHIFT 0 +#define VCPU_WORKAROUND_2_FLAG (_AC(1, UL) << VCPU_WORKAROUND_2_FLAG_SHIFT) + /* Translate a kernel address of @sym into its equivalent linear mapping */ #define kvm_ksym_ref(sym) \ ({ \ diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 834ae734db41..beeafda89567 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -213,6 +213,9 @@ struct kvm_vcpu_arch { /* Exception Information */ struct kvm_vcpu_fault_info fault; + /* State of various workarounds, see kvm_asm.h for bit assignment */ + u64 workaround_flags; + /* Guest debug state */ u64 debug_flags; diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index cd7c9a3e0a03..547519abc751 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -387,5 +387,29 @@ static inline int kvm_map_vectors(void) } #endif +#ifdef CONFIG_ARM64_SSBD +DECLARE_PER_CPU_READ_MOSTLY(u64, arm64_ssbd_callback_required); + +static inline int hyp_map_aux_data(void) +{ + int cpu, err; + + for_each_possible_cpu(cpu) { + u64 *ptr; + + ptr = per_cpu_ptr(&arm64_ssbd_callback_required, cpu); + err = create_hyp_mappings(ptr, ptr + 1, PAGE_HYP); + if (err) + return err; + } + return 0; +} +#else +static inline int hyp_map_aux_data(void) +{ + return 0; +} +#endif + #endif /* __ASSEMBLY__ */ #endif /* __ARM64_KVM_MMU_H__ */ diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c index 57f4e6145664..12f9d1ecdf4c 100644 --- a/arch/arm64/kvm/hyp/switch.c +++ b/arch/arm64/kvm/hyp/switch.c @@ -15,6 +15,7 @@ * along with this program. If not, see . */ +#include #include #include #include @@ -267,6 +268,39 @@ static void __hyp_text __skip_instr(struct kvm_vcpu *vcpu) write_sysreg_el2(*vcpu_pc(vcpu), elr); } +static inline bool __hyp_text __needs_ssbd_off(struct kvm_vcpu *vcpu) +{ + if (!cpus_have_cap(ARM64_SSBD)) + return false; + + return !(vcpu->arch.workaround_flags & VCPU_WORKAROUND_2_FLAG); +} + +static void __hyp_text __set_guest_arch_workaround_state(struct kvm_vcpu *vcpu) +{ +#ifdef CONFIG_ARM64_SSBD + /* + * The host runs with the workaround always present. If the + * guest wants it disabled, so be it... + */ + if (__needs_ssbd_off(vcpu) && + __hyp_this_cpu_read(arm64_ssbd_callback_required)) + arm_smccc_1_1_smc(ARM_SMCCC_ARCH_WORKAROUND_2, 0, NULL); +#endif +} + +static void __hyp_text __set_host_arch_workaround_state(struct kvm_vcpu *vcpu) +{ +#ifdef CONFIG_ARM64_SSBD + /* + * If the guest has disabled the workaround, bring it back on. + */ + if (__needs_ssbd_off(vcpu) && + __hyp_this_cpu_read(arm64_ssbd_callback_required)) + arm_smccc_1_1_smc(ARM_SMCCC_ARCH_WORKAROUND_2, 1, NULL); +#endif +} + int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu) { struct kvm_cpu_context *host_ctxt; @@ -297,6 +331,8 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu) __sysreg_restore_guest_state(guest_ctxt); __debug_restore_state(vcpu, kern_hyp_va(vcpu->arch.debug_ptr), guest_ctxt); + __set_guest_arch_workaround_state(vcpu); + /* Jump in the fire! */ again: exit_code = __guest_enter(vcpu, host_ctxt); @@ -339,6 +375,8 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu) } } + __set_host_arch_workaround_state(vcpu); + fp_enabled = __fpsimd_enabled(); __sysreg_save_guest_state(guest_ctxt); From f99e40649147bb49e6dc7c1ce748617363c51434 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Fri, 20 Jul 2018 10:56:33 +0100 Subject: [PATCH 249/970] arm64: KVM: Handle guest's ARCH_WORKAROUND_2 requests commit b4f18c063a13dfb33e3a63fe1844823e19c2265e upstream. In order to forward the guest's ARCH_WORKAROUND_2 calls to EL3, add a small(-ish) sequence to handle it at EL2. Special care must be taken to track the state of the guest itself by updating the workaround flags. We also rely on patching to enable calls into the firmware. Note that since we need to execute branches, this always executes after the Spectre-v2 mitigation has been applied. Reviewed-by: Mark Rutland Signed-off-by: Marc Zyngier Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman --- arch/arm64/kernel/asm-offsets.c | 1 + arch/arm64/kvm/hyp/hyp-entry.S | 38 ++++++++++++++++++++++++++++++++- 2 files changed, 38 insertions(+), 1 deletion(-) diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c index 6e6375e07778..bd239b1b7a68 100644 --- a/arch/arm64/kernel/asm-offsets.c +++ b/arch/arm64/kernel/asm-offsets.c @@ -127,6 +127,7 @@ int main(void) BLANK(); #ifdef CONFIG_KVM_ARM_HOST DEFINE(VCPU_CONTEXT, offsetof(struct kvm_vcpu, arch.ctxt)); + DEFINE(VCPU_WORKAROUND_FLAGS, offsetof(struct kvm_vcpu, arch.workaround_flags)); DEFINE(CPU_GP_REGS, offsetof(struct kvm_cpu_context, gp_regs)); DEFINE(CPU_USER_PT_REGS, offsetof(struct kvm_regs, regs)); DEFINE(CPU_FP_REGS, offsetof(struct kvm_regs, fp_regs)); diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S index 3418c1dda236..bf4988f9dae8 100644 --- a/arch/arm64/kvm/hyp/hyp-entry.S +++ b/arch/arm64/kvm/hyp/hyp-entry.S @@ -107,8 +107,44 @@ el1_hvc_guest: */ ldr x1, [sp] // Guest's x0 eor w1, w1, #ARM_SMCCC_ARCH_WORKAROUND_1 + cbz w1, wa_epilogue + + /* ARM_SMCCC_ARCH_WORKAROUND_2 handling */ + eor w1, w1, #(ARM_SMCCC_ARCH_WORKAROUND_1 ^ \ + ARM_SMCCC_ARCH_WORKAROUND_2) cbnz w1, el1_trap - mov x0, x1 + +#ifdef CONFIG_ARM64_SSBD +alternative_cb arm64_enable_wa2_handling + b wa2_end +alternative_cb_end + get_vcpu_ptr x2, x0 + ldr x0, [x2, #VCPU_WORKAROUND_FLAGS] + + // Sanitize the argument and update the guest flags + ldr x1, [sp, #8] // Guest's x1 + clz w1, w1 // Murphy's device: + lsr w1, w1, #5 // w1 = !!w1 without using + eor w1, w1, #1 // the flags... + bfi x0, x1, #VCPU_WORKAROUND_2_FLAG_SHIFT, #1 + str x0, [x2, #VCPU_WORKAROUND_FLAGS] + + /* Check that we actually need to perform the call */ + hyp_ldr_this_cpu x0, arm64_ssbd_callback_required, x2 + cbz x0, wa2_end + + mov w0, #ARM_SMCCC_ARCH_WORKAROUND_2 + smc #0 + + /* Don't leak data from the SMC call */ + mov x3, xzr +wa2_end: + mov x2, xzr + mov x1, xzr +#endif + +wa_epilogue: + mov x0, xzr add sp, sp, #16 eret From ba3fe91cba38aaf55c421fe1bae3240aecd34944 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Fri, 20 Jul 2018 10:56:34 +0100 Subject: [PATCH 250/970] arm64: KVM: Add ARCH_WORKAROUND_2 discovery through ARCH_FEATURES_FUNC_ID commit 5d81f7dc9bca4f4963092433e27b508cbe524a32 upstream. Now that all our infrastructure is in place, let's expose the availability of ARCH_WORKAROUND_2 to guests. We take this opportunity to tidy up a couple of SMCCC constants. Acked-by: Christoffer Dall Reviewed-by: Mark Rutland Signed-off-by: Marc Zyngier Signed-off-by: Catalin Marinas Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman --- arch/arm/include/asm/kvm_host.h | 12 ++++++++++++ arch/arm/kvm/psci.c | 18 ++++++++++++++++-- arch/arm64/include/asm/kvm_host.h | 23 +++++++++++++++++++++++ arch/arm64/kvm/reset.c | 4 ++++ 4 files changed, 55 insertions(+), 2 deletions(-) diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h index f4dab20ac9f3..0833d8a1dbbb 100644 --- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -327,4 +327,16 @@ static inline bool kvm_arm_harden_branch_predictor(void) return false; } +#define KVM_SSBD_UNKNOWN -1 +#define KVM_SSBD_FORCE_DISABLE 0 +#define KVM_SSBD_KERNEL 1 +#define KVM_SSBD_FORCE_ENABLE 2 +#define KVM_SSBD_MITIGATED 3 + +static inline int kvm_arm_have_ssbd(void) +{ + /* No way to detect it yet, pretend it is not there. */ + return KVM_SSBD_UNKNOWN; +} + #endif /* __ARM_KVM_HOST_H__ */ diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c index 8a9c654f4f87..83365bec04b6 100644 --- a/arch/arm/kvm/psci.c +++ b/arch/arm/kvm/psci.c @@ -403,7 +403,7 @@ static int kvm_psci_call(struct kvm_vcpu *vcpu) int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) { u32 func_id = smccc_get_function(vcpu); - u32 val = PSCI_RET_NOT_SUPPORTED; + u32 val = SMCCC_RET_NOT_SUPPORTED; u32 feature; switch (func_id) { @@ -415,7 +415,21 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) switch(feature) { case ARM_SMCCC_ARCH_WORKAROUND_1: if (kvm_arm_harden_branch_predictor()) - val = 0; + val = SMCCC_RET_SUCCESS; + break; + case ARM_SMCCC_ARCH_WORKAROUND_2: + switch (kvm_arm_have_ssbd()) { + case KVM_SSBD_FORCE_DISABLE: + case KVM_SSBD_UNKNOWN: + break; + case KVM_SSBD_KERNEL: + val = SMCCC_RET_SUCCESS; + break; + case KVM_SSBD_FORCE_ENABLE: + case KVM_SSBD_MITIGATED: + val = SMCCC_RET_NOT_REQUIRED; + break; + } break; } break; diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index beeafda89567..4cdfbd01b2de 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -425,4 +425,27 @@ static inline bool kvm_arm_harden_branch_predictor(void) return cpus_have_const_cap(ARM64_HARDEN_BRANCH_PREDICTOR); } +#define KVM_SSBD_UNKNOWN -1 +#define KVM_SSBD_FORCE_DISABLE 0 +#define KVM_SSBD_KERNEL 1 +#define KVM_SSBD_FORCE_ENABLE 2 +#define KVM_SSBD_MITIGATED 3 + +static inline int kvm_arm_have_ssbd(void) +{ + switch (arm64_get_ssbd_state()) { + case ARM64_SSBD_FORCE_DISABLE: + return KVM_SSBD_FORCE_DISABLE; + case ARM64_SSBD_KERNEL: + return KVM_SSBD_KERNEL; + case ARM64_SSBD_FORCE_ENABLE: + return KVM_SSBD_FORCE_ENABLE; + case ARM64_SSBD_MITIGATED: + return KVM_SSBD_MITIGATED; + case ARM64_SSBD_UNKNOWN: + default: + return KVM_SSBD_UNKNOWN; + } +} + #endif /* __ARM64_KVM_HOST_H__ */ diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c index 5bc460884639..29a27a09f21f 100644 --- a/arch/arm64/kvm/reset.c +++ b/arch/arm64/kvm/reset.c @@ -135,6 +135,10 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu) /* Reset PMU */ kvm_pmu_vcpu_reset(vcpu); + /* Default workaround setup is enabled (if supported) */ + if (kvm_arm_have_ssbd() == KVM_SSBD_KERNEL) + vcpu->arch.workaround_flags |= VCPU_WORKAROUND_2_FLAG; + /* Reset timer */ return kvm_timer_vcpu_reset(vcpu, cpu_vtimer_irq); } From 5c067898febcbe6f10ebc2d3c576eba629739326 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Tue, 9 Jan 2018 07:21:15 -0800 Subject: [PATCH 251/970] string: drop __must_check from strscpy() and restore strscpy() usages in cgroup MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 08a77676f9c5fc69a681ccd2cd8140e65dcb26c7 upstream. e7fd37ba1217 ("cgroup: avoid copying strings longer than the buffers") converted possibly unsafe strncpy() usages in cgroup to strscpy(). However, although the callsites are completely fine with truncated copied, because strscpy() is marked __must_check, it led to the following warnings. kernel/cgroup/cgroup.c: In function ‘cgroup_file_name’: kernel/cgroup/cgroup.c:1400:10: warning: ignoring return value of ‘strscpy’, declared with attribute warn_unused_result [-Wunused-result] strscpy(buf, cft->name, CGROUP_FILE_NAME_MAX); ^ To avoid the warnings, 50034ed49645 ("cgroup: use strlcpy() instead of strscpy() to avoid spurious warning") switched them to strlcpy(). strlcpy() is worse than strlcpy() because it unconditionally runs strlen() on the source string, and the only reason we switched to strlcpy() here was because it was lacking __must_check, which doesn't reflect any material differences between the two function. It's just that someone added __must_check to strscpy() and not to strlcpy(). These basic string copy operations are used in variety of ways, and one of not-so-uncommon use cases is safely handling truncated copies, where the caller naturally doesn't care about the return value. The __must_check doesn't match the actual use cases and forces users to opt for inferior variants which lack __must_check by happenstance or spread ugly (void) casts. Remove __must_check from strscpy() and restore strscpy() usages in cgroup. Signed-off-by: Tejun Heo Suggested-by: Linus Torvalds Cc: Ma Shimiao Cc: Arnd Bergmann Cc: Chris Metcalf [backport only the string.h portion to remove build warnings starting to show up - gregkh] Signed-off-by: Greg Kroah-Hartman --- include/linux/string.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/string.h b/include/linux/string.h index 0c88c0a1a72b..60042e5e88ff 100644 --- a/include/linux/string.h +++ b/include/linux/string.h @@ -27,7 +27,7 @@ extern char * strncpy(char *,const char *, __kernel_size_t); size_t strlcpy(char *, const char *, size_t); #endif #ifndef __HAVE_ARCH_STRSCPY -ssize_t __must_check strscpy(char *, const char *, size_t); +ssize_t strscpy(char *, const char *, size_t); #endif #ifndef __HAVE_ARCH_STRCAT extern char * strcat(char *, const char *); From 19e5f4da1240dcc32cda9b9022308c1b4c72e1f1 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Sun, 22 Jul 2018 14:27:43 +0200 Subject: [PATCH 252/970] Linux 4.9.114 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 3884afb2850f..f4cd42c9b940 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 4 PATCHLEVEL = 9 -SUBLEVEL = 113 +SUBLEVEL = 114 EXTRAVERSION = NAME = Roaring Lionus From 76267a8a19cdc40094796a2bd032eaf437774c35 Mon Sep 17 00:00:00 2001 From: Lan Tianyu Date: Thu, 21 Dec 2017 21:10:36 -0500 Subject: [PATCH 253/970] KVM/Eventfd: Avoid crash when assign and deassign specific eventfd in parallel. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit b5020a8e6b54d2ece80b1e7dedb33c79a40ebd47 upstream. Syzbot reports crashes in kvm_irqfd_assign(), caused by use-after-free when kvm_irqfd_assign() and kvm_irqfd_deassign() run in parallel for one specific eventfd. When the assign path hasn't finished but irqfd has been added to kvm->irqfds.items list, another thead may deassign the eventfd and free struct kvm_kernel_irqfd(). The assign path then uses the struct kvm_kernel_irqfd that has been freed by deassign path. To avoid such issue, keep irqfd under kvm->irq_srcu protection after the irqfd has been added to kvm->irqfds.items list, and call synchronize_srcu() in irq_shutdown() to make sure that irqfd has been fully initialized in the assign path. Reported-by: Dmitry Vyukov Cc: Paolo Bonzini Cc: Radim Krčmář Cc: Dmitry Vyukov Signed-off-by: Tianyu Lan Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman --- virt/kvm/eventfd.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 4d28a9ddbee0..3f24eb1e8554 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -119,8 +119,12 @@ irqfd_shutdown(struct work_struct *work) { struct kvm_kernel_irqfd *irqfd = container_of(work, struct kvm_kernel_irqfd, shutdown); + struct kvm *kvm = irqfd->kvm; u64 cnt; + /* Make sure irqfd has been initalized in assign path. */ + synchronize_srcu(&kvm->irq_srcu); + /* * Synchronize with the wait-queue and unhook ourselves to prevent * further events. @@ -387,7 +391,6 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args) idx = srcu_read_lock(&kvm->irq_srcu); irqfd_update(kvm, irqfd); - srcu_read_unlock(&kvm->irq_srcu, idx); list_add_tail(&irqfd->list, &kvm->irqfds.items); @@ -421,6 +424,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args) } #endif + srcu_read_unlock(&kvm->irq_srcu, idx); return 0; fail: From 6ac85d22332118d2dc449536110a31a5ca3956b0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= Date: Mon, 9 Jul 2018 16:35:34 +0300 Subject: [PATCH 254/970] x86/apm: Don't access __preempt_count with zeroed fs MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 6f6060a5c9cc76fdbc22748264e6aa3779ec2427 upstream. APM_DO_POP_SEGS does not restore fs/gs which were zeroed by APM_DO_ZERO_SEGS. Trying to access __preempt_count with zeroed fs doesn't really work. Move the ibrs call outside the APM_DO_SAVE_SEGS/APM_DO_RESTORE_SEGS invocations so that fs is actually restored before calling preempt_enable(). Fixes the following sort of oopses: [ 0.313581] general protection fault: 0000 [#1] PREEMPT SMP [ 0.313803] Modules linked in: [ 0.314040] CPU: 0 PID: 268 Comm: kapmd Not tainted 4.16.0-rc1-triton-bisect-00090-gdd84441a7971 #19 [ 0.316161] EIP: __apm_bios_call_simple+0xc8/0x170 [ 0.316161] EFLAGS: 00210016 CPU: 0 [ 0.316161] EAX: 00000102 EBX: 00000000 ECX: 00000102 EDX: 00000000 [ 0.316161] ESI: 0000530e EDI: dea95f64 EBP: dea95f18 ESP: dea95ef0 [ 0.316161] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 [ 0.316161] CR0: 80050033 CR2: 00000000 CR3: 015d3000 CR4: 000006d0 [ 0.316161] Call Trace: [ 0.316161] ? cpumask_weight.constprop.15+0x20/0x20 [ 0.316161] on_cpu0+0x44/0x70 [ 0.316161] apm+0x54e/0x720 [ 0.316161] ? __switch_to_asm+0x26/0x40 [ 0.316161] ? __schedule+0x17d/0x590 [ 0.316161] kthread+0xc0/0xf0 [ 0.316161] ? proc_apm_show+0x150/0x150 [ 0.316161] ? kthread_create_worker_on_cpu+0x20/0x20 [ 0.316161] ret_from_fork+0x2e/0x38 [ 0.316161] Code: da 8e c2 8e e2 8e ea 57 55 2e ff 1d e0 bb 5d b1 0f 92 c3 5d 5f 07 1f 89 47 0c 90 8d b4 26 00 00 00 00 90 8d b4 26 00 00 00 00 90 <64> ff 0d 84 16 5c b1 74 7f 8b 45 dc 8e e0 8b 45 d8 8e e8 8b 45 [ 0.316161] EIP: __apm_bios_call_simple+0xc8/0x170 SS:ESP: 0068:dea95ef0 [ 0.316161] ---[ end trace 656253db2deaa12c ]--- Fixes: dd84441a7971 ("x86/speculation: Use IBRS if available before calling into firmware") Signed-off-by: Ville Syrjälä Signed-off-by: Thomas Gleixner Cc: stable@vger.kernel.org Cc: David Woodhouse Cc: "H. Peter Anvin" Cc: x86@kernel.org Cc: David Woodhouse Cc: "H. Peter Anvin" Link: https://lkml.kernel.org/r/20180709133534.5963-1-ville.syrjala@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/apm.h | 6 ------ arch/x86/kernel/apm_32.c | 5 +++++ 2 files changed, 5 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/apm.h b/arch/x86/include/asm/apm.h index 46e40aeae446..93eebc636c76 100644 --- a/arch/x86/include/asm/apm.h +++ b/arch/x86/include/asm/apm.h @@ -6,8 +6,6 @@ #ifndef _ASM_X86_MACH_DEFAULT_APM_H #define _ASM_X86_MACH_DEFAULT_APM_H -#include - #ifdef APM_ZERO_SEGS # define APM_DO_ZERO_SEGS \ "pushl %%ds\n\t" \ @@ -33,7 +31,6 @@ static inline void apm_bios_call_asm(u32 func, u32 ebx_in, u32 ecx_in, * N.B. We do NOT need a cld after the BIOS call * because we always save and restore the flags. */ - firmware_restrict_branch_speculation_start(); __asm__ __volatile__(APM_DO_ZERO_SEGS "pushl %%edi\n\t" "pushl %%ebp\n\t" @@ -46,7 +43,6 @@ static inline void apm_bios_call_asm(u32 func, u32 ebx_in, u32 ecx_in, "=S" (*esi) : "a" (func), "b" (ebx_in), "c" (ecx_in) : "memory", "cc"); - firmware_restrict_branch_speculation_end(); } static inline bool apm_bios_call_simple_asm(u32 func, u32 ebx_in, @@ -59,7 +55,6 @@ static inline bool apm_bios_call_simple_asm(u32 func, u32 ebx_in, * N.B. We do NOT need a cld after the BIOS call * because we always save and restore the flags. */ - firmware_restrict_branch_speculation_start(); __asm__ __volatile__(APM_DO_ZERO_SEGS "pushl %%edi\n\t" "pushl %%ebp\n\t" @@ -72,7 +67,6 @@ static inline bool apm_bios_call_simple_asm(u32 func, u32 ebx_in, "=S" (si) : "a" (func), "b" (ebx_in), "c" (ecx_in) : "memory", "cc"); - firmware_restrict_branch_speculation_end(); return error; } diff --git a/arch/x86/kernel/apm_32.c b/arch/x86/kernel/apm_32.c index 51287cd90bf6..313a85a7b222 100644 --- a/arch/x86/kernel/apm_32.c +++ b/arch/x86/kernel/apm_32.c @@ -239,6 +239,7 @@ #include #include #include +#include #if defined(CONFIG_APM_DISPLAY_BLANK) && defined(CONFIG_VT) extern int (*console_blank_hook)(int); @@ -613,11 +614,13 @@ static long __apm_bios_call(void *_call) gdt[0x40 / 8] = bad_bios_desc; apm_irq_save(flags); + firmware_restrict_branch_speculation_start(); APM_DO_SAVE_SEGS; apm_bios_call_asm(call->func, call->ebx, call->ecx, &call->eax, &call->ebx, &call->ecx, &call->edx, &call->esi); APM_DO_RESTORE_SEGS; + firmware_restrict_branch_speculation_end(); apm_irq_restore(flags); gdt[0x40 / 8] = save_desc_40; put_cpu(); @@ -689,10 +692,12 @@ static long __apm_bios_call_simple(void *_call) gdt[0x40 / 8] = bad_bios_desc; apm_irq_save(flags); + firmware_restrict_branch_speculation_start(); APM_DO_SAVE_SEGS; error = apm_bios_call_simple_asm(call->func, call->ebx, call->ecx, &call->eax); APM_DO_RESTORE_SEGS; + firmware_restrict_branch_speculation_end(); apm_irq_restore(flags); gdt[0x40 / 8] = save_desc_40; put_cpu(); From 91b6b9d0bf39443d708352c193c597a01e9b0004 Mon Sep 17 00:00:00 2001 From: Dewet Thibaut Date: Mon, 16 Jul 2018 10:49:27 +0200 Subject: [PATCH 255/970] x86/MCE: Remove min interval polling limitation commit fbdb328c6bae0a7c78d75734a738b66b86dffc96 upstream. commit b3b7c4795c ("x86/MCE: Serialize sysfs changes") introduced a min interval limitation when setting the check interval for polled MCEs. However, the logic is that 0 disables polling for corrected MCEs, see Documentation/x86/x86_64/machinecheck. The limitation prevents disabling. Remove this limitation and allow the value 0 to disable polling again. Fixes: b3b7c4795c ("x86/MCE: Serialize sysfs changes") Signed-off-by: Dewet Thibaut Signed-off-by: Alexander Sverdlin [ Massage commit message. ] Signed-off-by: Borislav Petkov Signed-off-by: Thomas Gleixner Cc: Tony Luck Cc: linux-edac Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20180716084927.24869-1-alexander.sverdlin@nokia.com Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/mcheck/mce.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index c49e146d4332..7e6163c9d434 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -2397,9 +2397,6 @@ static ssize_t store_int_with_restart(struct device *s, if (check_interval == old_check_interval) return ret; - if (check_interval < 1) - check_interval = 1; - mutex_lock(&mce_sysfs_mutex); mce_restart(); mutex_unlock(&mce_sysfs_mutex); From 6fc87cc95b4de93348821f7a23c927b58deab593 Mon Sep 17 00:00:00 2001 From: OGAWA Hirofumi Date: Fri, 20 Jul 2018 17:53:42 -0700 Subject: [PATCH 256/970] fat: fix memory allocation failure handling of match_strdup() commit 35033ab988c396ad7bce3b6d24060c16a9066db8 upstream. In parse_options(), if match_strdup() failed, parse_options() leaves opts->iocharset in unexpected state (i.e. still pointing the freed string). And this can be the cause of double free. To fix, this initialize opts->iocharset always when freeing. Link: http://lkml.kernel.org/r/8736wp9dzc.fsf@mail.parknet.co.jp Signed-off-by: OGAWA Hirofumi Reported-by: syzbot+90b8e10515ae88228a92@syzkaller.appspotmail.com Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/fat/inode.c | 20 +++++++++++++------- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/fs/fat/inode.c b/fs/fat/inode.c index a2c05f2ada6d..88720011a6eb 100644 --- a/fs/fat/inode.c +++ b/fs/fat/inode.c @@ -696,13 +696,21 @@ static void fat_set_state(struct super_block *sb, brelse(bh); } +static void fat_reset_iocharset(struct fat_mount_options *opts) +{ + if (opts->iocharset != fat_default_iocharset) { + /* Note: opts->iocharset can be NULL here */ + kfree(opts->iocharset); + opts->iocharset = fat_default_iocharset; + } +} + static void delayed_free(struct rcu_head *p) { struct msdos_sb_info *sbi = container_of(p, struct msdos_sb_info, rcu); unload_nls(sbi->nls_disk); unload_nls(sbi->nls_io); - if (sbi->options.iocharset != fat_default_iocharset) - kfree(sbi->options.iocharset); + fat_reset_iocharset(&sbi->options); kfree(sbi); } @@ -1117,7 +1125,7 @@ static int parse_options(struct super_block *sb, char *options, int is_vfat, opts->fs_fmask = opts->fs_dmask = current_umask(); opts->allow_utime = -1; opts->codepage = fat_default_codepage; - opts->iocharset = fat_default_iocharset; + fat_reset_iocharset(opts); if (is_vfat) { opts->shortname = VFAT_SFN_DISPLAY_WINNT|VFAT_SFN_CREATE_WIN95; opts->rodir = 0; @@ -1274,8 +1282,7 @@ static int parse_options(struct super_block *sb, char *options, int is_vfat, /* vfat specific */ case Opt_charset: - if (opts->iocharset != fat_default_iocharset) - kfree(opts->iocharset); + fat_reset_iocharset(opts); iocharset = match_strdup(&args[0]); if (!iocharset) return -ENOMEM; @@ -1866,8 +1873,7 @@ int fat_fill_super(struct super_block *sb, void *data, int silent, int isvfat, iput(fat_inode); unload_nls(sbi->nls_io); unload_nls(sbi->nls_disk); - if (sbi->options.iocharset != fat_default_iocharset) - kfree(sbi->options.iocharset); + fat_reset_iocharset(&sbi->options); sb->s_fs_info = NULL; kfree(sbi); return error; From c4f094deb3d69dcc8b4e3dc6c056c1e62a72c33e Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Tue, 17 Jul 2018 17:26:43 +0200 Subject: [PATCH 257/970] ALSA: rawmidi: Change resized buffers atomically commit 39675f7a7c7e7702f7d5341f1e0d01db746543a0 upstream. The SNDRV_RAWMIDI_IOCTL_PARAMS ioctl may resize the buffers and the current code is racy. For example, the sequencer client may write to buffer while it being resized. As a simple workaround, let's switch to the resized buffer inside the stream runtime lock. Reported-by: syzbot+52f83f0ea8df16932f7f@syzkaller.appspotmail.com Cc: Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/core/rawmidi.c | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/sound/core/rawmidi.c b/sound/core/rawmidi.c index 16f8124b1150..59111cadaec2 100644 --- a/sound/core/rawmidi.c +++ b/sound/core/rawmidi.c @@ -635,7 +635,7 @@ static int snd_rawmidi_info_select_user(struct snd_card *card, int snd_rawmidi_output_params(struct snd_rawmidi_substream *substream, struct snd_rawmidi_params * params) { - char *newbuf; + char *newbuf, *oldbuf; struct snd_rawmidi_runtime *runtime = substream->runtime; if (substream->append && substream->use_count > 1) @@ -648,13 +648,17 @@ int snd_rawmidi_output_params(struct snd_rawmidi_substream *substream, return -EINVAL; } if (params->buffer_size != runtime->buffer_size) { - newbuf = krealloc(runtime->buffer, params->buffer_size, - GFP_KERNEL); + newbuf = kmalloc(params->buffer_size, GFP_KERNEL); if (!newbuf) return -ENOMEM; + spin_lock_irq(&runtime->lock); + oldbuf = runtime->buffer; runtime->buffer = newbuf; runtime->buffer_size = params->buffer_size; runtime->avail = runtime->buffer_size; + runtime->appl_ptr = runtime->hw_ptr = 0; + spin_unlock_irq(&runtime->lock); + kfree(oldbuf); } runtime->avail_min = params->avail_min; substream->active_sensing = !params->no_active_sensing; @@ -665,7 +669,7 @@ EXPORT_SYMBOL(snd_rawmidi_output_params); int snd_rawmidi_input_params(struct snd_rawmidi_substream *substream, struct snd_rawmidi_params * params) { - char *newbuf; + char *newbuf, *oldbuf; struct snd_rawmidi_runtime *runtime = substream->runtime; snd_rawmidi_drain_input(substream); @@ -676,12 +680,16 @@ int snd_rawmidi_input_params(struct snd_rawmidi_substream *substream, return -EINVAL; } if (params->buffer_size != runtime->buffer_size) { - newbuf = krealloc(runtime->buffer, params->buffer_size, - GFP_KERNEL); + newbuf = kmalloc(params->buffer_size, GFP_KERNEL); if (!newbuf) return -ENOMEM; + spin_lock_irq(&runtime->lock); + oldbuf = runtime->buffer; runtime->buffer = newbuf; runtime->buffer_size = params->buffer_size; + runtime->appl_ptr = runtime->hw_ptr = 0; + spin_unlock_irq(&runtime->lock); + kfree(oldbuf); } runtime->avail_min = params->avail_min; return 0; From 3a80fb0d7752e6ae163d72736b9401076c7b75c9 Mon Sep 17 00:00:00 2001 From: Alexey Brodkin Date: Thu, 28 Jun 2018 16:59:14 -0700 Subject: [PATCH 258/970] ARC: Fix CONFIG_SWAP commit 6e3761145a9ba3ce267c330b6bff51cf6a057b06 upstream. swap was broken on ARC due to silly copy-paste issue. We encode offset from swapcache page in __swp_entry() as (off << 13) but were not decoding back in __swp_offset() as (off >> 13) - it was still (off << 13). This finally fixes swap usage on ARC. | # mkswap /dev/sda2 | | # swapon -a -e /dev/sda2 | Adding 500728k swap on /dev/sda2. Priority:-2 extents:1 across:500728k | | # free | total used free shared buffers cached | Mem: 765104 13456 751648 4736 8 4736 | -/+ buffers/cache: 8712 756392 | Swap: 500728 0 500728 Cc: stable@vger.kernel.org Signed-off-by: Alexey Brodkin Signed-off-by: Vineet Gupta Signed-off-by: Greg Kroah-Hartman --- arch/arc/include/asm/pgtable.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arc/include/asm/pgtable.h b/arch/arc/include/asm/pgtable.h index e94ca72b974e..c10f5cb203e6 100644 --- a/arch/arc/include/asm/pgtable.h +++ b/arch/arc/include/asm/pgtable.h @@ -378,7 +378,7 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, /* Decode a PTE containing swap "identifier "into constituents */ #define __swp_type(pte_lookalike) (((pte_lookalike).val) & 0x1f) -#define __swp_offset(pte_lookalike) ((pte_lookalike).val << 13) +#define __swp_offset(pte_lookalike) ((pte_lookalike).val >> 13) /* NOPs, to keep generic kernel happy */ #define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) }) From 2ee7d6f173308488c4ea190cbedd435b2a166e46 Mon Sep 17 00:00:00 2001 From: Vineet Gupta Date: Wed, 11 Jul 2018 10:42:20 -0700 Subject: [PATCH 259/970] ARC: mm: allow mprotect to make stack mappings executable commit 93312b6da4df31e4102ce5420e6217135a16c7ea upstream. mprotect(EXEC) was failing for stack mappings as default vm flags was missing MAYEXEC. This was triggered by glibc test suite nptl/tst-execstack testcase What is surprising is that despite running LTP for years on, we didn't catch this issue as it lacks a directed test case. gcc dejagnu tests with nested functions also requiring exec stack work fine though because they rely on the GNU_STACK segment spit out by compiler and handled in kernel elf loader. This glibc case is different as the stack is non exec to begin with and a dlopen of shared lib with GNU_STACK segment triggers the exec stack proceedings using a mprotect(PROT_EXEC) which was broken. CC: stable@vger.kernel.org Signed-off-by: Vineet Gupta Signed-off-by: Greg Kroah-Hartman --- arch/arc/include/asm/page.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arc/include/asm/page.h b/arch/arc/include/asm/page.h index 296c3426a6ad..ffb5f33475f1 100644 --- a/arch/arc/include/asm/page.h +++ b/arch/arc/include/asm/page.h @@ -105,7 +105,7 @@ typedef pte_t * pgtable_t; #define virt_addr_valid(kaddr) pfn_valid(virt_to_pfn(kaddr)) /* Default Permissions for stack/heaps pages (Non Executable) */ -#define VM_DATA_DEFAULT_FLAGS (VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE) +#define VM_DATA_DEFAULT_FLAGS (VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) #define WANT_PAGE_VIRTUAL 1 From 1be686fe508d32521efd4fa7bcda581cb97813e0 Mon Sep 17 00:00:00 2001 From: Alexey Brodkin Date: Wed, 6 Jun 2018 15:59:38 +0300 Subject: [PATCH 260/970] ARC: configs: Remove CONFIG_INITRAMFS_SOURCE from defconfigs commit 64234961c145606b36eaa82c47b11be842b21049 upstream. We used to have pre-set CONFIG_INITRAMFS_SOURCE with local path to intramfs in ARC defconfigs. This was quite convenient for in-house development but not that convenient for newcomers who obviusly don't have folders like "arc_initramfs" next to the Linux source tree. Which leads to quite surprising failure of defconfig building: ------------------------------->8----------------------------- ../scripts/gen_initramfs_list.sh: Cannot open '../../arc_initramfs_hs/' ../usr/Makefile:57: recipe for target 'usr/initramfs_data.cpio.gz' failed make[2]: *** [usr/initramfs_data.cpio.gz] Error 1 ------------------------------->8----------------------------- So now when more and more people start to deal with our defconfigs let's make their life easier with removal of CONFIG_INITRAMFS_SOURCE. Signed-off-by: Alexey Brodkin Cc: Kevin Hilman Cc: stable@vger.kernel.org Signed-off-by: Alexey Brodkin Signed-off-by: Vineet Gupta Signed-off-by: Greg Kroah-Hartman --- arch/arc/configs/axs101_defconfig | 1 - arch/arc/configs/axs103_defconfig | 1 - arch/arc/configs/axs103_smp_defconfig | 1 - arch/arc/configs/nsim_700_defconfig | 1 - arch/arc/configs/nsim_hs_defconfig | 1 - arch/arc/configs/nsim_hs_smp_defconfig | 1 - arch/arc/configs/nsimosci_defconfig | 1 - arch/arc/configs/nsimosci_hs_defconfig | 1 - arch/arc/configs/nsimosci_hs_smp_defconfig | 1 - 9 files changed, 9 deletions(-) diff --git a/arch/arc/configs/axs101_defconfig b/arch/arc/configs/axs101_defconfig index 0a0eaf09aac7..5367fe36b69d 100644 --- a/arch/arc/configs/axs101_defconfig +++ b/arch/arc/configs/axs101_defconfig @@ -11,7 +11,6 @@ CONFIG_NAMESPACES=y # CONFIG_UTS_NS is not set # CONFIG_PID_NS is not set CONFIG_BLK_DEV_INITRD=y -CONFIG_INITRAMFS_SOURCE="../arc_initramfs/" CONFIG_EMBEDDED=y CONFIG_PERF_EVENTS=y # CONFIG_VM_EVENT_COUNTERS is not set diff --git a/arch/arc/configs/axs103_defconfig b/arch/arc/configs/axs103_defconfig index 2233f5777a71..d40fad485982 100644 --- a/arch/arc/configs/axs103_defconfig +++ b/arch/arc/configs/axs103_defconfig @@ -11,7 +11,6 @@ CONFIG_NAMESPACES=y # CONFIG_UTS_NS is not set # CONFIG_PID_NS is not set CONFIG_BLK_DEV_INITRD=y -CONFIG_INITRAMFS_SOURCE="../../arc_initramfs_hs/" CONFIG_EMBEDDED=y CONFIG_PERF_EVENTS=y # CONFIG_VM_EVENT_COUNTERS is not set diff --git a/arch/arc/configs/axs103_smp_defconfig b/arch/arc/configs/axs103_smp_defconfig index 110874705085..06cbd27ef860 100644 --- a/arch/arc/configs/axs103_smp_defconfig +++ b/arch/arc/configs/axs103_smp_defconfig @@ -11,7 +11,6 @@ CONFIG_NAMESPACES=y # CONFIG_UTS_NS is not set # CONFIG_PID_NS is not set CONFIG_BLK_DEV_INITRD=y -CONFIG_INITRAMFS_SOURCE="../../arc_initramfs_hs/" CONFIG_EMBEDDED=y CONFIG_PERF_EVENTS=y # CONFIG_VM_EVENT_COUNTERS is not set diff --git a/arch/arc/configs/nsim_700_defconfig b/arch/arc/configs/nsim_700_defconfig index b0066a749d4c..df609fce999b 100644 --- a/arch/arc/configs/nsim_700_defconfig +++ b/arch/arc/configs/nsim_700_defconfig @@ -11,7 +11,6 @@ CONFIG_NAMESPACES=y # CONFIG_UTS_NS is not set # CONFIG_PID_NS is not set CONFIG_BLK_DEV_INITRD=y -CONFIG_INITRAMFS_SOURCE="../arc_initramfs/" CONFIG_KALLSYMS_ALL=y CONFIG_EMBEDDED=y CONFIG_PERF_EVENTS=y diff --git a/arch/arc/configs/nsim_hs_defconfig b/arch/arc/configs/nsim_hs_defconfig index ebe9ebb92933..1dbb661b4c86 100644 --- a/arch/arc/configs/nsim_hs_defconfig +++ b/arch/arc/configs/nsim_hs_defconfig @@ -11,7 +11,6 @@ CONFIG_NAMESPACES=y # CONFIG_UTS_NS is not set # CONFIG_PID_NS is not set CONFIG_BLK_DEV_INITRD=y -CONFIG_INITRAMFS_SOURCE="../../arc_initramfs_hs/" CONFIG_KALLSYMS_ALL=y CONFIG_EMBEDDED=y CONFIG_PERF_EVENTS=y diff --git a/arch/arc/configs/nsim_hs_smp_defconfig b/arch/arc/configs/nsim_hs_smp_defconfig index 4bde43278be6..cb36a69e5ed1 100644 --- a/arch/arc/configs/nsim_hs_smp_defconfig +++ b/arch/arc/configs/nsim_hs_smp_defconfig @@ -9,7 +9,6 @@ CONFIG_NAMESPACES=y # CONFIG_UTS_NS is not set # CONFIG_PID_NS is not set CONFIG_BLK_DEV_INITRD=y -CONFIG_INITRAMFS_SOURCE="../arc_initramfs_hs/" CONFIG_KALLSYMS_ALL=y CONFIG_EMBEDDED=y CONFIG_PERF_EVENTS=y diff --git a/arch/arc/configs/nsimosci_defconfig b/arch/arc/configs/nsimosci_defconfig index f6fb3d26557e..5680daa65471 100644 --- a/arch/arc/configs/nsimosci_defconfig +++ b/arch/arc/configs/nsimosci_defconfig @@ -11,7 +11,6 @@ CONFIG_NAMESPACES=y # CONFIG_UTS_NS is not set # CONFIG_PID_NS is not set CONFIG_BLK_DEV_INITRD=y -CONFIG_INITRAMFS_SOURCE="../arc_initramfs/" CONFIG_KALLSYMS_ALL=y CONFIG_EMBEDDED=y CONFIG_PERF_EVENTS=y diff --git a/arch/arc/configs/nsimosci_hs_defconfig b/arch/arc/configs/nsimosci_hs_defconfig index b9f0fe00044b..87decc491c58 100644 --- a/arch/arc/configs/nsimosci_hs_defconfig +++ b/arch/arc/configs/nsimosci_hs_defconfig @@ -11,7 +11,6 @@ CONFIG_NAMESPACES=y # CONFIG_UTS_NS is not set # CONFIG_PID_NS is not set CONFIG_BLK_DEV_INITRD=y -CONFIG_INITRAMFS_SOURCE="../arc_initramfs_hs/" CONFIG_KALLSYMS_ALL=y CONFIG_EMBEDDED=y CONFIG_PERF_EVENTS=y diff --git a/arch/arc/configs/nsimosci_hs_smp_defconfig b/arch/arc/configs/nsimosci_hs_smp_defconfig index 6da71ba253a9..4d14684dc74a 100644 --- a/arch/arc/configs/nsimosci_hs_smp_defconfig +++ b/arch/arc/configs/nsimosci_hs_smp_defconfig @@ -9,7 +9,6 @@ CONFIG_IKCONFIG_PROC=y # CONFIG_UTS_NS is not set # CONFIG_PID_NS is not set CONFIG_BLK_DEV_INITRD=y -CONFIG_INITRAMFS_SOURCE="../arc_initramfs_hs/" CONFIG_PERF_EVENTS=y # CONFIG_COMPAT_BRK is not set CONFIG_KPROBES=y From f46b054eca657260236fca84748dc8946b0a24f3 Mon Sep 17 00:00:00 2001 From: Jing Xia Date: Fri, 20 Jul 2018 17:53:48 -0700 Subject: [PATCH 261/970] mm: memcg: fix use after free in mem_cgroup_iter() commit 9f15bde671355c351cf20d9f879004b234353100 upstream. It was reported that a kernel crash happened in mem_cgroup_iter(), which can be triggered if the legacy cgroup-v1 non-hierarchical mode is used. Unable to handle kernel paging request at virtual address 6b6b6b6b6b6b8f ...... Call trace: mem_cgroup_iter+0x2e0/0x6d4 shrink_zone+0x8c/0x324 balance_pgdat+0x450/0x640 kswapd+0x130/0x4b8 kthread+0xe8/0xfc ret_from_fork+0x10/0x20 mem_cgroup_iter(): ...... if (css_tryget(css)) <-- crash here break; ...... The crashing reason is that mem_cgroup_iter() uses the memcg object whose pointer is stored in iter->position, which has been freed before and filled with POISON_FREE(0x6b). And the root cause of the use-after-free issue is that invalidate_reclaim_iterators() fails to reset the value of iter->position to NULL when the css of the memcg is released in non- hierarchical mode. Link: http://lkml.kernel.org/r/1531994807-25639-1-git-send-email-jing.xia@unisoc.com Fixes: 6df38689e0e9 ("mm: memcontrol: fix possible memcg leak due to interrupted reclaim") Signed-off-by: Jing Xia Acked-by: Michal Hocko Cc: Johannes Weiner Cc: Vladimir Davydov Cc: Cc: Shakeel Butt Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- mm/memcontrol.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 50088150fc17..349f4a8e3c4f 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -895,7 +895,7 @@ static void invalidate_reclaim_iterators(struct mem_cgroup *dead_memcg) int nid; int i; - while ((memcg = parent_mem_cgroup(memcg))) { + for (; memcg; memcg = parent_mem_cgroup(memcg)) { for_each_node(nid) { mz = mem_cgroup_nodeinfo(memcg, nid); for (i = 0; i <= DEF_PRIORITY; i++) { From 3472e37379f15a1a19c4ee66b14e56e07613a309 Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Fri, 20 Jul 2018 17:53:45 -0700 Subject: [PATCH 262/970] mm/huge_memory.c: fix data loss when splitting a file pmd commit e1f1b1572e8db87a56609fd05bef76f98f0e456a upstream. __split_huge_pmd_locked() must check if the cleared huge pmd was dirty, and propagate that to PageDirty: otherwise, data may be lost when a huge tmpfs page is modified then split then reclaimed. How has this taken so long to be noticed? Because there was no problem when the huge page is written by a write system call (shmem_write_end() calls set_page_dirty()), nor when the page is allocated for a write fault (fault_dirty_shared_page() calls set_page_dirty()); but when allocated for a read fault (which MAP_POPULATE simulates), no set_page_dirty(). Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1807111741430.1106@eggly.anvils Fixes: d21b9e57c74c ("thp: handle file pages in split_huge_pmd()") Signed-off-by: Hugh Dickins Reported-by: Ashwin Chaugule Reviewed-by: Yang Shi Reviewed-by: Kirill A. Shutemov Cc: "Huang, Ying" Cc: [4.8+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- mm/huge_memory.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 724372866e67..9efe88ef9702 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1642,6 +1642,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, if (vma_is_dax(vma)) return; page = pmd_page(_pmd); + if (!PageDirty(page) && pmd_dirty(_pmd)) + set_page_dirty(page); if (!PageReferenced(page) && pmd_young(_pmd)) SetPageReferenced(page); page_remove_rmap(page, true); From 40974672ae870e02b781bf060b04688cf922e0f9 Mon Sep 17 00:00:00 2001 From: "Gustavo A. R. Silva" Date: Tue, 17 Jul 2018 12:39:00 -0500 Subject: [PATCH 263/970] vfio/pci: Fix potential Spectre v1 commit 0e714d27786ce1fb3efa9aac58abc096e68b1c2a upstream. info.index can be indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: drivers/vfio/pci/vfio_pci.c:734 vfio_pci_ioctl() warn: potential spectre issue 'vdev->region' Fix this by sanitizing info.index before indirectly using it to index vdev->region Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2 Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva Signed-off-by: Alex Williamson Signed-off-by: Greg Kroah-Hartman --- drivers/vfio/pci/vfio_pci.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index 43559bed7822..7338e43faa17 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -28,6 +28,7 @@ #include #include #include +#include #include "vfio_pci_private.h" @@ -755,6 +756,9 @@ static long vfio_pci_ioctl(void *device_data, if (info.index >= VFIO_PCI_NUM_REGIONS + vdev->num_regions) return -EINVAL; + info.index = array_index_nospec(info.index, + VFIO_PCI_NUM_REGIONS + + vdev->num_regions); i = info.index - VFIO_PCI_NUM_REGIONS; From 1e02c4f403c0b7f53e9b31ca76352e5351163c7e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= Date: Thu, 14 Jun 2018 20:56:25 +0300 Subject: [PATCH 264/970] drm/i915: Fix hotplug irq ack on i965/g4x MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 96a85cc517a9ee4ae5e8d7f5a36cba05023784eb upstream. Just like with PIPESTAT, the edge triggered IIR on i965/g4x also causes problems for hotplug interrupts. To make sure we don't get the IIR port interrupt bit stuck low with the ISR bit high we must force an edge in ISR. Unfortunately we can't borrow the PIPESTAT trick and toggle the enable bits in PORT_HOTPLUG_EN as that act itself generates hotplug interrupts. Instead we just have to loop until we've cleared PORT_HOTPLUG_STAT, or we just give up and WARN. v2: Don't frob with PORT_HOTPLUG_EN Cc: stable@vger.kernel.org Signed-off-by: Ville Syrjälä Link: https://patchwork.freedesktop.org/patch/msgid/20180614175625.1615-1-ville.syrjala@linux.intel.com Reviewed-by: Imre Deak (cherry picked from commit 0ba7c51a6fd80a89236f6ceb52e63f8a7f62bfd3) Signed-off-by: Rodrigo Vivi Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/i915/i915_irq.c | 32 ++++++++++++++++++++++++++++++-- 1 file changed, 30 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 02908e37c228..279d1e021421 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -1684,10 +1684,38 @@ static void valleyview_pipestat_irq_handler(struct drm_i915_private *dev_priv, static u32 i9xx_hpd_irq_ack(struct drm_i915_private *dev_priv) { - u32 hotplug_status = I915_READ(PORT_HOTPLUG_STAT); + u32 hotplug_status = 0, hotplug_status_mask; + int i; - if (hotplug_status) + if (IS_G4X(dev_priv) || + IS_VALLEYVIEW(dev_priv) || IS_CHERRYVIEW(dev_priv)) + hotplug_status_mask = HOTPLUG_INT_STATUS_G4X | + DP_AUX_CHANNEL_MASK_INT_STATUS_G4X; + else + hotplug_status_mask = HOTPLUG_INT_STATUS_I915; + + /* + * We absolutely have to clear all the pending interrupt + * bits in PORT_HOTPLUG_STAT. Otherwise the ISR port + * interrupt bit won't have an edge, and the i965/g4x + * edge triggered IIR will not notice that an interrupt + * is still pending. We can't use PORT_HOTPLUG_EN to + * guarantee the edge as the act of toggling the enable + * bits can itself generate a new hotplug interrupt :( + */ + for (i = 0; i < 10; i++) { + u32 tmp = I915_READ(PORT_HOTPLUG_STAT) & hotplug_status_mask; + + if (tmp == 0) + return hotplug_status; + + hotplug_status |= tmp; I915_WRITE(PORT_HOTPLUG_STAT, hotplug_status); + } + + WARN_ONCE(1, + "PORT_HOTPLUG_STAT did not clear (0x%08x)\n", + I915_READ(PORT_HOTPLUG_STAT)); return hotplug_status; } From ec6a6039d7323abd1f28dafcf435700f072a1537 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= Date: Mon, 2 Jul 2018 22:52:20 +0200 Subject: [PATCH 265/970] gen_stats: Fix netlink stats dumping in the presence of padding MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit d5a672ac9f48f81b20b1cad1d9ed7bbf4e418d4c ] The gen_stats facility will add a header for the toplevel nlattr of type TCA_STATS2 that contains all stats added by qdisc callbacks. A reference to this header is stored in the gnet_dump struct, and when all the per-qdisc callbacks have finished adding their stats, the length of the containing header will be adjusted to the right value. However, on architectures that need padding (i.e., that don't set CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS), the padding nlattr is added before the stats, which means that the stored pointer will point to the padding, and so when the header is fixed up, the result is just a very big padding nlattr. Because most qdiscs also supply the legacy TCA_STATS struct, this problem has been mostly invisible, but we exposed it with the netlink attribute-based statistics in CAKE. Fix the issue by fixing up the stored pointer if it points to a padding nlattr. Tested-by: Pete Heist Tested-by: Kevin Darbyshire-Bryant Signed-off-by: Toke Høiland-Jørgensen Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/core/gen_stats.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c index 508e051304fb..18f17e1e5762 100644 --- a/net/core/gen_stats.c +++ b/net/core/gen_stats.c @@ -77,8 +77,20 @@ gnet_stats_start_copy_compat(struct sk_buff *skb, int type, int tc_stats_type, d->lock = lock; spin_lock_bh(lock); } - if (d->tail) - return gnet_stats_copy(d, type, NULL, 0, padattr); + if (d->tail) { + int ret = gnet_stats_copy(d, type, NULL, 0, padattr); + + /* The initial attribute added in gnet_stats_copy() may be + * preceded by a padding attribute, in which case d->tail will + * end up pointing at the padding instead of the real attribute. + * Fix this so gnet_stats_finish_copy() adjusts the length of + * the right attribute. + */ + if (ret == 0 && d->tail->nla_type == padattr) + d->tail = (struct nlattr *)((char *)d->tail + + NLA_ALIGN(d->tail->nla_len)); + return ret; + } return 0; } From 79870c6c659713772c8969fd8c21dcba8fd253be Mon Sep 17 00:00:00 2001 From: Tyler Hicks Date: Thu, 5 Jul 2018 18:49:23 +0000 Subject: [PATCH 266/970] ipv4: Return EINVAL when ping_group_range sysctl doesn't map to user ns [ Upstream commit 70ba5b6db96ff7324b8cfc87e0d0383cf59c9677 ] The low and high values of the net.ipv4.ping_group_range sysctl were being silently forced to the default disabled state when a write to the sysctl contained GIDs that didn't map to the associated user namespace. Confusingly, the sysctl's write operation would return success and then a subsequent read of the sysctl would indicate that the low and high values are the overflowgid. This patch changes the behavior by clearly returning an error when the sysctl write operation receives a GID range that doesn't map to the associated user namespace. In such a situation, the previous value of the sysctl is preserved and that range will be returned in a subsequent read of the sysctl. Signed-off-by: Tyler Hicks Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/sysctl_net_ipv4.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c index 51a0039cb318..024ab833557d 100644 --- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -140,8 +140,9 @@ static int ipv4_ping_group_range(struct ctl_table *table, int write, if (write && ret == 0) { low = make_kgid(user_ns, urange[0]); high = make_kgid(user_ns, urange[1]); - if (!gid_valid(low) || !gid_valid(high) || - (urange[1] < urange[0]) || gid_lt(high, low)) { + if (!gid_valid(low) || !gid_valid(high)) + return -EINVAL; + if (urange[1] < urange[0] || gid_lt(high, low)) { low = make_kgid(&init_user_ns, 1); high = make_kgid(&init_user_ns, 0); } From 8582bbfb8629d0dd7f29db5091e308bfcaf7af16 Mon Sep 17 00:00:00 2001 From: Colin Ian King Date: Tue, 17 Jul 2018 17:12:39 +0100 Subject: [PATCH 267/970] ipv6: fix useless rol32 call on hash [ Upstream commit 169dc027fb02492ea37a0575db6a658cf922b854 ] The rol32 call is currently rotating hash but the rol'd value is being discarded. I believe the current code is incorrect and hash should be assigned the rotated value returned from rol32. Thanks to David Lebrun for spotting this. Signed-off-by: Colin Ian King Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/net/ipv6.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/net/ipv6.h b/include/net/ipv6.h index e64210c98c2b..407087d686a7 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -794,7 +794,7 @@ static inline __be32 ip6_make_flowlabel(struct net *net, struct sk_buff *skb, * to minimize possbility that any useful information to an * attacker is leaked. Only lower 20 bits are relevant. */ - rol32(hash, 16); + hash = rol32(hash, 16); flowlabel = (__force __be32)hash & IPV6_FLOWLABEL_MASK; From 09ae0085ceb613fa94d82bc8bba5b5b9e2b9b2f6 Mon Sep 17 00:00:00 2001 From: Davidlohr Bueso Date: Mon, 16 Jul 2018 13:26:13 -0700 Subject: [PATCH 268/970] lib/rhashtable: consider param->min_size when setting initial table size [ Upstream commit 107d01f5ba10f4162c38109496607eb197059064 ] rhashtable_init() currently does not take into account the user-passed min_size parameter unless param->nelem_hint is set as well. As such, the default size (number of buckets) will always be HASH_DEFAULT_SIZE even if the smallest allowed size is larger than that. Remediate this by unconditionally calling into rounded_hashtable_size() and handling things accordingly. Signed-off-by: Davidlohr Bueso Acked-by: Herbert Xu Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- lib/rhashtable.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/lib/rhashtable.c b/lib/rhashtable.c index 895961c53385..101dac085c62 100644 --- a/lib/rhashtable.c +++ b/lib/rhashtable.c @@ -783,8 +783,16 @@ EXPORT_SYMBOL_GPL(rhashtable_walk_stop); static size_t rounded_hashtable_size(const struct rhashtable_params *params) { - return max(roundup_pow_of_two(params->nelem_hint * 4 / 3), - (unsigned long)params->min_size); + size_t retsize; + + if (params->nelem_hint) + retsize = max(roundup_pow_of_two(params->nelem_hint * 4 / 3), + (unsigned long)params->min_size); + else + retsize = max(HASH_DEFAULT_SIZE, + (unsigned long)params->min_size); + + return retsize; } static u32 rhashtable_jhash2(const void *key, u32 length, u32 seed) @@ -841,8 +849,6 @@ int rhashtable_init(struct rhashtable *ht, struct bucket_table *tbl; size_t size; - size = HASH_DEFAULT_SIZE; - if ((!params->key_len && !params->obj_hashfn) || (params->obj_hashfn && !params->obj_cmpfn)) return -EINVAL; @@ -869,8 +875,7 @@ int rhashtable_init(struct rhashtable *ht, ht->p.min_size = max(ht->p.min_size, HASH_MIN_SIZE); - if (params->nelem_hint) - size = rounded_hashtable_size(&ht->p); + size = rounded_hashtable_size(&ht->p); /* The maximum (not average) chain length grows with the * size of the hash table, at a rate of (log N)/(log log N). From 66a7cfa05740341e6cceab4694b4ace7ce88475f Mon Sep 17 00:00:00 2001 From: Lorenzo Colitti Date: Sat, 7 Jul 2018 16:31:40 +0900 Subject: [PATCH 269/970] net: diag: Don't double-free TCP_NEW_SYN_RECV sockets in tcp_abort [ Upstream commit acc2cf4e37174646a24cba42fa53c668b2338d4e ] When tcp_diag_destroy closes a TCP_NEW_SYN_RECV socket, it first frees it by calling inet_csk_reqsk_queue_drop_and_and_put in tcp_abort, and then frees it again by calling sock_gen_put. Since tcp_abort only has one caller, and all the other codepaths in tcp_abort don't free the socket, just remove the free in that function. Cc: David Ahern Tested: passes Android sock_diag_test.py, which exercises this codepath Fixes: d7226c7a4dd1 ("net: diag: Fix refcnt leak in error path destroying socket") Signed-off-by: Lorenzo Colitti Signed-off-by: Eric Dumazet Reviewed-by: David Ahern Tested-by: David Ahern Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/tcp.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 84ffebf0192d..8f15eae3325b 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3238,8 +3238,7 @@ int tcp_abort(struct sock *sk, int err) struct request_sock *req = inet_reqsk(sk); local_bh_disable(); - inet_csk_reqsk_queue_drop_and_put(req->rsk_listener, - req); + inet_csk_reqsk_queue_drop(req->rsk_listener, req); local_bh_enable(); return 0; } From f08ca4c8b423c3cbfc254b73d1c12e622bd63c10 Mon Sep 17 00:00:00 2001 From: David Ahern Date: Sat, 7 Jul 2018 16:15:26 -0700 Subject: [PATCH 270/970] net/ipv4: Set oif in fib_compute_spec_dst [ Upstream commit e7372197e15856ec4ee66b668020a662994db103 ] Xin reported that icmp replies may not use the address on the device the echo request is received if the destination address is broadcast. Instead a route lookup is done without considering VRF context. Fix by setting oif in flow struct to the master device if it is enslaved. That directs the lookup to the VRF table. If the device is not enslaved, oif is still 0 so no affect. Fixes: cd2fbe1b6b51 ("net: Use VRF device index for lookups on RX") Reported-by: Xin Long Signed-off-by: David Ahern Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/fib_frontend.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c index ffae472e250a..7bdd89354db5 100644 --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -290,6 +290,7 @@ __be32 fib_compute_spec_dst(struct sk_buff *skb) if (!ipv4_is_zeronet(ip_hdr(skb)->saddr)) { struct flowi4 fl4 = { .flowi4_iif = LOOPBACK_IFINDEX, + .flowi4_oif = l3mdev_master_ifindex_rcu(dev), .daddr = ip_hdr(skb)->saddr, .flowi4_tos = RT_TOS(ip_hdr(skb)->tos), .flowi4_scope = scope, From 77befb4bd1083bb2001851c6a4439b2467055423 Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Tue, 3 Jul 2018 22:34:54 +0200 Subject: [PATCH 271/970] net: phy: fix flag masking in __set_phy_supported [ Upstream commit df8ed346d4a806a6eef2db5924285e839604b3f9 ] Currently also the pause flags are removed from phydev->supported because they're not included in PHY_DEFAULT_FEATURES. I don't think this is intended, especially when considering that this function can be called via phy_set_max_speed() anywhere in a driver. Change the masking to mask out only the values we're going to change. In addition remove the misleading comment, job of this small function is just to adjust the supported and advertised speeds. Fixes: f3a6bd393c2c ("phylib: Add phy_set_max_speed helper") Signed-off-by: Heiner Kallweit Reviewed-by: Andrew Lunn Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/phy/phy_device.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index bf02f8e4648a..b131e555d3c2 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -1579,11 +1579,8 @@ static int gen10g_resume(struct phy_device *phydev) static int __set_phy_supported(struct phy_device *phydev, u32 max_speed) { - /* The default values for phydev->supported are provided by the PHY - * driver "features" member, we want to reset to sane defaults first - * before supporting higher speeds. - */ - phydev->supported &= PHY_DEFAULT_FEATURES; + phydev->supported &= ~(PHY_1000BT_FEATURES | PHY_100BT_FEATURES | + PHY_10BT_FEATURES); switch (max_speed) { default: From 5ac2bc675cfc0c77da9a1dcd72b6340acbcd8a29 Mon Sep 17 00:00:00 2001 From: "Gustavo A. R. Silva" Date: Tue, 17 Jul 2018 20:17:33 -0500 Subject: [PATCH 272/970] ptp: fix missing break in switch [ Upstream commit 9ba8376ce1e2cbf4ce44f7e4bee1d0648e10d594 ] It seems that a *break* is missing in order to avoid falling through to the default case. Otherwise, checking *chan* makes no sense. Fixes: 72df7a7244c0 ("ptp: Allow reassigning calibration pin function") Signed-off-by: Gustavo A. R. Silva Acked-by: Richard Cochran Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/ptp/ptp_chardev.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c index 58a97d420572..51364621f77c 100644 --- a/drivers/ptp/ptp_chardev.c +++ b/drivers/ptp/ptp_chardev.c @@ -89,6 +89,7 @@ int ptp_set_pinfunc(struct ptp_clock *ptp, unsigned int pin, case PTP_PF_PHYSYNC: if (chan != 0) return -EINVAL; + break; default: return -EINVAL; } From 323bbb17d4905992c14e5512e2e8b93d29f060cc Mon Sep 17 00:00:00 2001 From: Matevz Vucnik Date: Wed, 4 Jul 2018 18:12:48 +0200 Subject: [PATCH 273/970] qmi_wwan: add support for Quectel EG91 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 38cd58ed9c4e389799b507bcffe02a7a7a180b33 ] This adds the USB id of LTE modem Quectel EG91. It requires the same quirk as other Quectel modems to make it work. Signed-off-by: Matevz Vucnik Acked-by: Bjørn Mork Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/usb/qmi_wwan.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 6d654d65f8a0..31a6d87b61b2 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -953,6 +953,7 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x1e0e, 0x9001, 5)}, /* SIMCom 7230E */ {QMI_QUIRK_SET_DTR(0x2c7c, 0x0125, 4)}, /* Quectel EC25, EC20 R2.0 Mini PCIe */ {QMI_QUIRK_SET_DTR(0x2c7c, 0x0121, 4)}, /* Quectel EC21 Mini PCIe */ + {QMI_QUIRK_SET_DTR(0x2c7c, 0x0191, 4)}, /* Quectel EG91 */ {QMI_FIXED_INTF(0x2c7c, 0x0296, 4)}, /* Quectel BG96 */ {QMI_QUIRK_SET_DTR(0x2c7c, 0x0306, 4)}, /* Quectel EP06 Mini PCIe */ From dd08f4e69154625200efa7a239ba1147b2d04db9 Mon Sep 17 00:00:00 2001 From: Sanjeev Bansal Date: Mon, 16 Jul 2018 11:13:32 +0530 Subject: [PATCH 274/970] tg3: Add higher cpu clock for 5762. [ Upstream commit 3a498606bb04af603a46ebde8296040b2de350d1 ] This patch has fix for TX timeout while running bi-directional traffic with 100 Mbps using 5762. Signed-off-by: Sanjeev Bansal Signed-off-by: Siva Reddy Kallam Reviewed-by: Michael Chan Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/broadcom/tg3.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c index 4ffbe850d746..6250989c83d8 100644 --- a/drivers/net/ethernet/broadcom/tg3.c +++ b/drivers/net/ethernet/broadcom/tg3.c @@ -9276,6 +9276,15 @@ static int tg3_chip_reset(struct tg3 *tp) tg3_restore_clk(tp); + /* Increase the core clock speed to fix tx timeout issue for 5762 + * with 100Mbps link speed. + */ + if (tg3_asic_rev(tp) == ASIC_REV_5762) { + val = tr32(TG3_CPMU_CLCK_ORIDE_ENABLE); + tw32(TG3_CPMU_CLCK_ORIDE_ENABLE, val | + TG3_CPMU_MAC_ORIDE_ENABLE); + } + /* Reprobe ASF enable state. */ tg3_flag_clear(tp, ENABLE_ASF); tp->phy_flags &= ~(TG3_PHYFLG_1G_ON_VAUX_OK | From c439f620382a2cb0c9df3609e180f59d3ff2ee10 Mon Sep 17 00:00:00 2001 From: Alexander Couzens Date: Tue, 17 Jul 2018 13:17:09 +0200 Subject: [PATCH 275/970] net: usb: asix: replace mii_nway_restart in resume path [ Upstream commit 5c968f48021a9b3faa61ac2543cfab32461c0e05 ] mii_nway_restart is not pm aware which results in a rtnl deadlock. Implement mii_nway_restart manual by setting BMCR_ANRESTART if BMCR_ANENABLE is set. To reproduce: * plug an asix based usb network interface * wait until the device enters PM (~5 sec) * `ip link set eth1 up` will never return Fixes: d9fe64e51114 ("net: asix: Add in_pm parameter") Signed-off-by: Alexander Couzens Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/usb/asix_devices.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/usb/asix_devices.c b/drivers/net/usb/asix_devices.c index 32e9ec8f1521..5be6b67492d5 100644 --- a/drivers/net/usb/asix_devices.c +++ b/drivers/net/usb/asix_devices.c @@ -640,10 +640,12 @@ static void ax88772_restore_phy(struct usbnet *dev) priv->presvd_phy_advertise); /* Restore BMCR */ + if (priv->presvd_phy_bmcr & BMCR_ANENABLE) + priv->presvd_phy_bmcr |= BMCR_ANRESTART; + asix_mdio_write_nopm(dev->net, dev->mii.phy_id, MII_BMCR, priv->presvd_phy_bmcr); - mii_nway_restart(&dev->mii); priv->presvd_phy_advertise = 0; priv->presvd_phy_bmcr = 0; } From cad99229aa2048f8b0d9dd7208577603397e65a2 Mon Sep 17 00:00:00 2001 From: Stefano Brivio Date: Wed, 11 Jul 2018 14:39:42 +0200 Subject: [PATCH 276/970] net: Don't copy pfmemalloc flag in __copy_skb_header() [ Upstream commit 8b7008620b8452728cadead460a36f64ed78c460 ] The pfmemalloc flag indicates that the skb was allocated from the PFMEMALLOC reserves, and the flag is currently copied on skb copy and clone. However, an skb copied from an skb flagged with pfmemalloc wasn't necessarily allocated from PFMEMALLOC reserves, and on the other hand an skb allocated that way might be copied from an skb that wasn't. So we should not copy the flag on skb copy, and rather decide whether to allow an skb to be associated with sockets unrelated to page reclaim depending only on how it was allocated. Move the pfmemalloc flag before headers_start[0] using an existing 1-bit hole, so that __copy_skb_header() doesn't copy it. When cloning, we'll now take care of this flag explicitly, contravening to the warning comment of __skb_clone(). While at it, restore the newline usage introduced by commit b19372273164 ("net: reorganize sk_buff for faster __copy_skb_header()") to visually separate bytes used in bitfields after headers_start[0], that was gone after commit a9e419dc7be6 ("netfilter: merge ctinfo into nfct pointer storage area"), and describe the pfmemalloc flag in the kernel-doc structure comment. This doesn't change the size of sk_buff or cacheline boundaries, but consolidates the 15 bits hole before tc_index into a 2 bytes hole before csum, that could now be filled more easily. Reported-by: Patrick Talbert Fixes: c93bdd0e03e8 ("netvm: allow skb allocation to use PFMEMALLOC reserves") Signed-off-by: Stefano Brivio Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/linux/skbuff.h | 10 +++++----- net/core/skbuff.c | 2 ++ 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 1b3a2f95503d..b048d3d3b327 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -602,6 +602,7 @@ static inline bool skb_mstamp_after(const struct skb_mstamp *t1, * @hash: the packet hash * @queue_mapping: Queue mapping for multiqueue devices * @xmit_more: More SKBs are pending for this queue + * @pfmemalloc: skbuff was allocated from PFMEMALLOC reserves * @ndisc_nodetype: router type (from link layer) * @ooo_okay: allow the mapping of a socket to a queue to be changed * @l4_hash: indicate hash is a canonical 4-tuple hash over transport @@ -692,7 +693,7 @@ struct sk_buff { peeked:1, head_frag:1, xmit_more:1, - __unused:1; /* one bit hole */ + pfmemalloc:1; kmemcheck_bitfield_end(flags1); /* fields enclosed in headers_start/headers_end are copied @@ -712,19 +713,18 @@ struct sk_buff { __u8 __pkt_type_offset[0]; __u8 pkt_type:3; - __u8 pfmemalloc:1; __u8 ignore_df:1; __u8 nfctinfo:3; - __u8 nf_trace:1; + __u8 ip_summed:2; __u8 ooo_okay:1; __u8 l4_hash:1; __u8 sw_hash:1; __u8 wifi_acked_valid:1; __u8 wifi_acked:1; - __u8 no_fcs:1; + /* Indicates the inner headers are valid in the skbuff. */ __u8 encapsulation:1; __u8 encap_hdr_csum:1; @@ -732,11 +732,11 @@ struct sk_buff { __u8 csum_complete_sw:1; __u8 csum_level:2; __u8 csum_bad:1; - #ifdef CONFIG_IPV6_NDISC_NODETYPE __u8 ndisc_nodetype:2; #endif __u8 ipvs_property:1; + __u8 inner_protocol_type:1; __u8 remcsum_offload:1; #ifdef CONFIG_NET_SWITCHDEV diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 9f697b00158d..36a70b2192db 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -904,6 +904,8 @@ static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb) n->cloned = 1; n->nohdr = 0; n->peeked = 0; + if (skb->pfmemalloc) + n->pfmemalloc = 1; n->destructor = NULL; C(tail); C(end); From ad375eae791f92deac62a062c176903b6215e6a2 Mon Sep 17 00:00:00 2001 From: Stefano Brivio Date: Fri, 13 Jul 2018 13:21:07 +0200 Subject: [PATCH 277/970] skbuff: Unconditionally copy pfmemalloc in __skb_clone() [ Upstream commit e78bfb0751d4e312699106ba7efbed2bab1a53ca ] Commit 8b7008620b84 ("net: Don't copy pfmemalloc flag in __copy_skb_header()") introduced a different handling for the pfmemalloc flag in copy and clone paths. In __skb_clone(), now, the flag is set only if it was set in the original skb, but not cleared if it wasn't. This is wrong and might lead to socket buffers being flagged with pfmemalloc even if the skb data wasn't allocated from pfmemalloc reserves. Copy the flag instead of ORing it. Reported-by: Sabrina Dubroca Fixes: 8b7008620b84 ("net: Don't copy pfmemalloc flag in __copy_skb_header()") Signed-off-by: Stefano Brivio Tested-by: Sabrina Dubroca Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/core/skbuff.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 36a70b2192db..8cae7aa4a4ec 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -904,8 +904,7 @@ static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb) n->cloned = 1; n->nohdr = 0; n->peeked = 0; - if (skb->pfmemalloc) - n->pfmemalloc = 1; + C(pfmemalloc); n->destructor = NULL; C(tail); C(end); From 33b2110bd92aa830300f9a7209b720fc93f0f713 Mon Sep 17 00:00:00 2001 From: Mathias Nyman Date: Thu, 21 Jun 2018 16:19:41 +0300 Subject: [PATCH 278/970] xhci: Fix perceived dead host due to runtime suspend race with event handler commit 229bc19fd7aca4f37964af06e3583c1c8f36b5d6 upstream. Don't rely on event interrupt (EINT) bit alone to detect pending port change in resume. If no change event is detected the host may be suspended again, oterwise roothubs are resumed. There is a lag in xHC setting EINT. If we don't notice the pending change in resume, and the controller is runtime suspeded again, it causes the event handler to assume host is dead as it will fail to read xHC registers once PCI puts the controller to D3 state. [ 268.520969] xhci_hcd: xhci_resume: starting port polling. [ 268.520985] xhci_hcd: xhci_hub_status_data: stopping port polling. [ 268.521030] xhci_hcd: xhci_suspend: stopping port polling. [ 268.521040] xhci_hcd: // Setting command ring address to 0x349bd001 [ 268.521139] xhci_hcd: Port Status Change Event for port 3 [ 268.521149] xhci_hcd: resume root hub [ 268.521163] xhci_hcd: port resume event for port 3 [ 268.521168] xhci_hcd: xHC is not running. [ 268.521174] xhci_hcd: handle_port_status: starting port polling. [ 268.596322] xhci_hcd: xhci_hc_died: xHCI host controller not responding, assume dead The EINT lag is described in a additional note in xhci specs 4.19.2: "Due to internal xHC scheduling and system delays, there will be a lag between a change bit being set and the Port Status Change Event that it generated being written to the Event Ring. If SW reads the PORTSC and sees a change bit set, there is no guarantee that the corresponding Port Status Change Event has already been written into the Event Ring." Cc: Signed-off-by: Mathias Nyman Signed-off-by: Kai-Heng Feng Signed-off-by: Greg Kroah-Hartman --- drivers/usb/host/xhci.c | 40 +++++++++++++++++++++++++++++++++++++--- drivers/usb/host/xhci.h | 4 ++++ 2 files changed, 41 insertions(+), 3 deletions(-) diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c index a7d239f5fc5f..81f25213cb41 100644 --- a/drivers/usb/host/xhci.c +++ b/drivers/usb/host/xhci.c @@ -891,6 +891,41 @@ static void xhci_disable_port_wake_on_bits(struct xhci_hcd *xhci) spin_unlock_irqrestore(&xhci->lock, flags); } +static bool xhci_pending_portevent(struct xhci_hcd *xhci) +{ + __le32 __iomem **port_array; + int port_index; + u32 status; + u32 portsc; + + status = readl(&xhci->op_regs->status); + if (status & STS_EINT) + return true; + /* + * Checking STS_EINT is not enough as there is a lag between a change + * bit being set and the Port Status Change Event that it generated + * being written to the Event Ring. See note in xhci 1.1 section 4.19.2. + */ + + port_index = xhci->num_usb2_ports; + port_array = xhci->usb2_ports; + while (port_index--) { + portsc = readl(port_array[port_index]); + if (portsc & PORT_CHANGE_MASK || + (portsc & PORT_PLS_MASK) == XDEV_RESUME) + return true; + } + port_index = xhci->num_usb3_ports; + port_array = xhci->usb3_ports; + while (port_index--) { + portsc = readl(port_array[port_index]); + if (portsc & PORT_CHANGE_MASK || + (portsc & PORT_PLS_MASK) == XDEV_RESUME) + return true; + } + return false; +} + /* * Stop HC (not bus-specific) * @@ -987,7 +1022,7 @@ EXPORT_SYMBOL_GPL(xhci_suspend); */ int xhci_resume(struct xhci_hcd *xhci, bool hibernated) { - u32 command, temp = 0, status; + u32 command, temp = 0; struct usb_hcd *hcd = xhci_to_hcd(xhci); struct usb_hcd *secondary_hcd; int retval = 0; @@ -1109,8 +1144,7 @@ int xhci_resume(struct xhci_hcd *xhci, bool hibernated) done: if (retval == 0) { /* Resume root hubs only when have pending events. */ - status = readl(&xhci->op_regs->status); - if (status & STS_EINT) { + if (xhci_pending_portevent(xhci)) { usb_hcd_resume_root_hub(xhci->shared_hcd); usb_hcd_resume_root_hub(hcd); } diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h index 836398ade58d..b9181281aa9e 100644 --- a/drivers/usb/host/xhci.h +++ b/drivers/usb/host/xhci.h @@ -385,6 +385,10 @@ struct xhci_op_regs { #define PORT_PLC (1 << 22) /* port configure error change - port failed to configure its link partner */ #define PORT_CEC (1 << 23) +#define PORT_CHANGE_MASK (PORT_CSC | PORT_PEC | PORT_WRC | PORT_OCC | \ + PORT_RC | PORT_PLC | PORT_CEC) + + /* Cold Attach Status - xHC can set this bit to report device attached during * Sx state. Warm port reset should be perfomed to clear this bit and move port * to connected state. From 2ea8b93c0310e8092daaa422376f08bf3b321de1 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 4 May 2018 15:35:46 -0400 Subject: [PATCH 279/970] xprtrdma: Return -ENOBUFS when no pages are available commit a8f688ec437dc2045cc8f0c89fe877d5803850da upstream. The use of -EAGAIN in rpcrdma_convert_iovs() is a latent bug: the transport never calls xprt_write_space() when more pages become available. -ENOBUFS will trigger the correct "delay briefly and call again" logic. Fixes: 7a89f9c626e3 ("xprtrdma: Honor ->send_request API contract") Signed-off-by: Chuck Lever Cc: stable@vger.kernel.org # 4.8+ Signed-off-by: Anna Schumaker Signed-off-by: Sudip Mukherjee Signed-off-by: Greg Kroah-Hartman --- net/sunrpc/xprtrdma/rpc_rdma.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c index f57c9f0ab8f9..0287734f126f 100644 --- a/net/sunrpc/xprtrdma/rpc_rdma.c +++ b/net/sunrpc/xprtrdma/rpc_rdma.c @@ -229,7 +229,7 @@ rpcrdma_convert_iovs(struct rpcrdma_xprt *r_xprt, struct xdr_buf *xdrbuf, /* alloc the pagelist for receiving buffer */ ppages[p] = alloc_page(GFP_ATOMIC); if (!ppages[p]) - return -EAGAIN; + return -ENOBUFS; } seg[n].mr_page = ppages[p]; seg[n].mr_offset = (void *)(unsigned long) page_base; From 3118ceb456200d73179ddf29f2b95f59e35465f9 Mon Sep 17 00:00:00 2001 From: Alan Jenkins Date: Thu, 12 Apr 2018 19:11:58 +0100 Subject: [PATCH 280/970] block: do not use interruptible wait anywhere commit 1dc3039bc87ae7d19a990c3ee71cfd8a9068f428 upstream. When blk_queue_enter() waits for a queue to unfreeze, or unset the PREEMPT_ONLY flag, do not allow it to be interrupted by a signal. The PREEMPT_ONLY flag was introduced later in commit 3a0a529971ec ("block, scsi: Make SCSI quiesce and resume work reliably"). Note the SCSI device is resumed asynchronously, i.e. after un-freezing userspace tasks. So that commit exposed the bug as a regression in v4.15. A mysterious SIGBUS (or -EIO) sometimes happened during the time the device was being resumed. Most frequently, there was no kernel log message, and we saw Xorg or Xwayland killed by SIGBUS.[1] [1] E.g. https://bugzilla.redhat.com/show_bug.cgi?id=1553979 Without this fix, I get an IO error in this test: # dd if=/dev/sda of=/dev/null iflag=direct & \ while killall -SIGUSR1 dd; do sleep 0.1; done & \ echo mem > /sys/power/state ; \ sleep 5; killall dd # stop after 5 seconds The interruptible wait was added to blk_queue_enter in commit 3ef28e83ab15 ("block: generic request_queue reference counting"). Before then, the interruptible wait was only in blk-mq, but I don't think it could ever have been correct. Reviewed-by: Bart Van Assche Cc: stable@vger.kernel.org Signed-off-by: Alan Jenkins Signed-off-by: Jens Axboe Signed-off-by: Sudip Mukherjee Signed-off-by: Greg Kroah-Hartman --- block/blk-core.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 23daf40be371..77b99bf16c83 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -652,7 +652,6 @@ EXPORT_SYMBOL(blk_alloc_queue); int blk_queue_enter(struct request_queue *q, bool nowait) { while (true) { - int ret; if (percpu_ref_tryget_live(&q->q_usage_counter)) return 0; @@ -660,13 +659,11 @@ int blk_queue_enter(struct request_queue *q, bool nowait) if (nowait) return -EBUSY; - ret = wait_event_interruptible(q->mq_freeze_wq, - !atomic_read(&q->mq_freeze_depth) || - blk_queue_dying(q)); + wait_event(q->mq_freeze_wq, + !atomic_read(&q->mq_freeze_depth) || + blk_queue_dying(q)); if (blk_queue_dying(q)) return -ENODEV; - if (ret) - return ret; } } From dbcdf42bab53d219caa51bcfe54c8b9066010290 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Wed, 25 Jul 2018 11:24:03 +0200 Subject: [PATCH 281/970] Linux 4.9.115 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index f4cd42c9b940..889c58e39928 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 4 PATCHLEVEL = 9 -SUBLEVEL = 114 +SUBLEVEL = 115 EXTRAVERSION = NAME = Roaring Lionus From 92f724130fac801b97aef580ae5e185a699661c4 Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Fri, 20 Jul 2018 13:58:21 +0200 Subject: [PATCH 282/970] MIPS: ath79: fix register address in ath79_ddr_wb_flush() commit bc88ad2efd11f29e00a4fd60fcd1887abfe76833 upstream. ath79_ddr_wb_flush_base has the type void __iomem *, so register offsets need to be a multiple of 4 in order to access the intended register. Signed-off-by: Felix Fietkau Signed-off-by: John Crispin Signed-off-by: Paul Burton Fixes: 24b0e3e84fbf ("MIPS: ath79: Improve the DDR controller interface") Patchwork: https://patchwork.linux-mips.org/patch/19912/ Cc: Alban Bedel Cc: James Hogan Cc: Ralf Baechle Cc: linux-mips@linux-mips.org Cc: stable@vger.kernel.org # 4.2+ Signed-off-by: Greg Kroah-Hartman --- arch/mips/ath79/common.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/mips/ath79/common.c b/arch/mips/ath79/common.c index d071a3a0f876..fc97a11c41c5 100644 --- a/arch/mips/ath79/common.c +++ b/arch/mips/ath79/common.c @@ -58,7 +58,7 @@ EXPORT_SYMBOL_GPL(ath79_ddr_ctrl_init); void ath79_ddr_wb_flush(u32 reg) { - void __iomem *flush_reg = ath79_ddr_wb_flush_base + reg; + void __iomem *flush_reg = ath79_ddr_wb_flush_base + (reg * 4); /* Flush the DDR write buffer. */ __raw_writel(0x1, flush_reg); From 650321fe96150898c69c58c6684b33edbe689de1 Mon Sep 17 00:00:00 2001 From: Paul Burton Date: Thu, 12 Jul 2018 09:33:04 -0700 Subject: [PATCH 283/970] MIPS: Fix off-by-one in pci_resource_to_user() commit 38c0a74fe06da3be133cae3fb7bde6a9438e698b upstream. The MIPS implementation of pci_resource_to_user() introduced in v3.12 by commit 4c2924b725fb ("MIPS: PCI: Use pci_resource_to_user to map pci memory space properly") incorrectly sets *end to the address of the byte after the resource, rather than the last byte of the resource. This results in userland seeing resources as a byte larger than they actually are, for example a 32 byte BAR will be reported by a tool such as lspci as being 33 bytes in size: Region 2: I/O ports at 1000 [disabled] [size=33] Correct this by subtracting one from the calculated end address, reporting the correct address to userland. Signed-off-by: Paul Burton Reported-by: Rui Wang Fixes: 4c2924b725fb ("MIPS: PCI: Use pci_resource_to_user to map pci memory space properly") Cc: James Hogan Cc: Ralf Baechle Cc: Wolfgang Grandegger Cc: linux-mips@linux-mips.org Cc: stable@vger.kernel.org # v3.12+ Patchwork: https://patchwork.linux-mips.org/patch/19829/ Signed-off-by: Greg Kroah-Hartman --- arch/mips/pci/pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/mips/pci/pci.c b/arch/mips/pci/pci.c index f6325fa657fb..64ae8c094a54 100644 --- a/arch/mips/pci/pci.c +++ b/arch/mips/pci/pci.c @@ -55,7 +55,7 @@ void pci_resource_to_user(const struct pci_dev *dev, int bar, phys_addr_t size = resource_size(rsrc); *start = fixup_bigphys_addr(rsrc->start, size); - *end = rsrc->start + size; + *end = rsrc->start + size - 1; } int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma, From 93d94fec94c7df51c7f36f45fdc18994073f47ee Mon Sep 17 00:00:00 2001 From: Paolo Abeni Date: Mon, 23 Jul 2018 16:50:48 +0200 Subject: [PATCH 284/970] ip: hash fragments consistently [ Upstream commit 3dd1c9a1270736029ffca670e9bd0265f4120600 ] The skb hash for locally generated ip[v6] fragments belonging to the same datagram can vary in several circumstances: * for connected UDP[v6] sockets, the first fragment get its hash via set_owner_w()/skb_set_hash_from_sk() * for unconnected IPv6 UDPv6 sockets, the first fragment can get its hash via ip6_make_flowlabel()/skb_get_hash_flowi6(), if auto_flowlabel is enabled For the following frags the hash is usually computed via skb_get_hash(). The above can cause OoO for unconnected IPv6 UDPv6 socket: in that scenario the egress tx queue can be selected on a per packet basis via the skb hash. It may also fool flow-oriented schedulers to place fragments belonging to the same datagram in different flows. Fix the issue by copying the skb hash from the head frag into the others at fragmentation time. Before this commit: perf probe -a "dev_queue_xmit skb skb->hash skb->l4_hash:b1@0/8 skb->sw_hash:b1@1/8" netperf -H $IPV4 -t UDP_STREAM -l 5 -- -m 2000 -n & perf record -e probe:dev_queue_xmit -e probe:skb_set_owner_w -a sleep 0.1 perf script probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=3713014309 l4_hash=1 sw_hash=0 probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=0 l4_hash=0 sw_hash=0 After this commit: probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=2171763177 l4_hash=1 sw_hash=0 probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=2171763177 l4_hash=1 sw_hash=0 Fixes: b73c3d0e4f0e ("net: Save TX flow hash in sock and set in skbuf on xmit") Fixes: 67800f9b1f4e ("ipv6: Call skb_get_hash_flowi6 to get skb->hash in ip6_make_flowlabel") Signed-off-by: Paolo Abeni Reviewed-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/ip_output.c | 2 ++ net/ipv6/ip6_output.c | 2 ++ 2 files changed, 4 insertions(+) diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 3b1f3bc8becb..100c86f1f547 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -497,6 +497,8 @@ static void ip_copy_metadata(struct sk_buff *to, struct sk_buff *from) to->dev = from->dev; to->mark = from->mark; + skb_copy_hash(to, from); + /* Copy the flags to each fragment. */ IPCB(to)->flags = IPCB(from)->flags; diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index eb9046eae581..ea14466cdca8 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -576,6 +576,8 @@ static void ip6_copy_metadata(struct sk_buff *to, struct sk_buff *from) to->dev = from->dev; to->mark = from->mark; + skb_copy_hash(to, from); + #ifdef CONFIG_NET_SCHED to->tc_index = from->tc_index; #endif From 03fbf2b8237a3f1689848cf9f2d9bc7c1869e834 Mon Sep 17 00:00:00 2001 From: Willem de Bruijn Date: Mon, 23 Jul 2018 19:36:48 -0400 Subject: [PATCH 285/970] ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull [ Upstream commit 2efd4fca703a6707cad16ab486eaab8fc7f0fd49 ] Syzbot reported a read beyond the end of the skb head when returning IPV6_ORIGDSTADDR: BUG: KMSAN: kernel-infoleak in put_cmsg+0x5ef/0x860 net/core/scm.c:242 CPU: 0 PID: 4501 Comm: syz-executor128 Not tainted 4.17.0+ #9 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:113 kmsan_report+0x188/0x2a0 mm/kmsan/kmsan.c:1125 kmsan_internal_check_memory+0x138/0x1f0 mm/kmsan/kmsan.c:1219 kmsan_copy_to_user+0x7a/0x160 mm/kmsan/kmsan.c:1261 copy_to_user include/linux/uaccess.h:184 [inline] put_cmsg+0x5ef/0x860 net/core/scm.c:242 ip6_datagram_recv_specific_ctl+0x1cf3/0x1eb0 net/ipv6/datagram.c:719 ip6_datagram_recv_ctl+0x41c/0x450 net/ipv6/datagram.c:733 rawv6_recvmsg+0x10fb/0x1460 net/ipv6/raw.c:521 [..] This logic and its ipv4 counterpart read the destination port from the packet at skb_transport_offset(skb) + 4. With MSG_MORE and a local SOCK_RAW sender, syzbot was able to cook a packet that stores headers exactly up to skb_transport_offset(skb) in the head and the remainder in a frag. Call pskb_may_pull before accessing the pointer to ensure that it lies in skb head. Link: http://lkml.kernel.org/r/CAF=yD-LEJwZj5a1-bAAj2Oy_hKmGygV6rsJ_WOrAYnv-fnayiQ@mail.gmail.com Reported-by: syzbot+9adb4b567003cac781f0@syzkaller.appspotmail.com Signed-off-by: Willem de Bruijn Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/ip_sockglue.c | 7 +++++-- net/ipv6/datagram.c | 7 +++++-- 2 files changed, 10 insertions(+), 4 deletions(-) diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c index dd80276a8205..b21e435f428c 100644 --- a/net/ipv4/ip_sockglue.c +++ b/net/ipv4/ip_sockglue.c @@ -135,15 +135,18 @@ static void ip_cmsg_recv_dstaddr(struct msghdr *msg, struct sk_buff *skb) { struct sockaddr_in sin; const struct iphdr *iph = ip_hdr(skb); - __be16 *ports = (__be16 *)skb_transport_header(skb); + __be16 *ports; + int end; - if (skb_transport_offset(skb) + 4 > (int)skb->len) + end = skb_transport_offset(skb) + 4; + if (end > 0 && !pskb_may_pull(skb, end)) return; /* All current transport protocols have the port numbers in the * first four bytes of the transport header and this function is * written with this assumption in mind. */ + ports = (__be16 *)skb_transport_header(skb); sin.sin_family = AF_INET; sin.sin_addr.s_addr = iph->daddr; diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c index 38062f403ceb..2d3c8fe27583 100644 --- a/net/ipv6/datagram.c +++ b/net/ipv6/datagram.c @@ -694,13 +694,16 @@ void ip6_datagram_recv_specific_ctl(struct sock *sk, struct msghdr *msg, } if (np->rxopt.bits.rxorigdstaddr) { struct sockaddr_in6 sin6; - __be16 *ports = (__be16 *) skb_transport_header(skb); + __be16 *ports; + int end; - if (skb_transport_offset(skb) + 4 <= (int)skb->len) { + end = skb_transport_offset(skb) + 4; + if (end <= 0 || pskb_may_pull(skb, end)) { /* All current transport protocols have the port numbers in the * first four bytes of the transport header and this function is * written with this assumption in mind. */ + ports = (__be16 *)skb_transport_header(skb); sin6.sin6_family = AF_INET6; sin6.sin6_addr = ipv6_hdr(skb)->daddr; From 444987d535bf65b109b60f45406abe1d7f11f1f7 Mon Sep 17 00:00:00 2001 From: Jack Morgenstein Date: Tue, 24 Jul 2018 14:27:55 +0300 Subject: [PATCH 286/970] net/mlx4_core: Save the qpn from the input modifier in RST2INIT wrapper [ Upstream commit 958c696f5a7274d9447a458ad7aa70719b29a50a ] Function mlx4_RST2INIT_QP_wrapper saved the qp number passed in the qp context, rather than the one passed in the input modifier. However, the qp number in the qp context is not defined as a required parameter by the FW. Therefore, drivers may choose to not specify the qp number in the qp context for the reset-to-init transition. Thus, we must save the qp number passed in the command input modifier -- which is always present. (This saved qp number is used as the input modifier for command 2RST_QP when a slave's qp's are destroyed). Fixes: c82e9aa0a8bc ("mlx4_core: resource tracking for HCA resources used by guests") Signed-off-by: Jack Morgenstein Signed-off-by: Tariq Toukan Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/mellanox/mlx4/resource_tracker.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c index d6b06bef1b69..9d1a7d5ae835 100644 --- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c +++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c @@ -2916,7 +2916,7 @@ int mlx4_RST2INIT_QP_wrapper(struct mlx4_dev *dev, int slave, u32 srqn = qp_get_srqn(qpc) & 0xffffff; int use_srq = (qp_get_srqn(qpc) >> 24) & 1; struct res_srq *srq; - int local_qpn = be32_to_cpu(qpc->local_qpn) & 0xffffff; + int local_qpn = vhcr->in_modifier & 0xffffff; err = adjust_qp_sched_queue(dev, slave, qpc, inbox); if (err) From e2ffdd646cf61ae1444d376d1ea1e756a495eff8 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 19 Jul 2018 16:04:38 -0700 Subject: [PATCH 287/970] net: skb_segment() should not return NULL [ Upstream commit ff907a11a0d68a749ce1a321f4505c03bf72190c ] syzbot caught a NULL deref [1], caused by skb_segment() skb_segment() has many "goto err;" that assume the @err variable contains -ENOMEM. A successful call to __skb_linearize() should not clear @err, otherwise a subsequent memory allocation error could return NULL. While we are at it, we might use -EINVAL instead of -ENOMEM when MAX_SKB_FRAGS limit is reached. [1] kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN CPU: 0 PID: 13285 Comm: syz-executor3 Not tainted 4.18.0-rc4+ #146 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:tcp_gso_segment+0x3dc/0x1780 net/ipv4/tcp_offload.c:106 Code: f0 ff ff 0f 87 1c fd ff ff e8 00 88 0b fb 48 8b 75 d0 48 b9 00 00 00 00 00 fc ff df 48 8d be 90 00 00 00 48 89 f8 48 c1 e8 03 <0f> b6 14 08 48 8d 86 94 00 00 00 48 89 c6 83 e0 07 48 c1 ee 03 0f RSP: 0018:ffff88019b7fd060 EFLAGS: 00010206 RAX: 0000000000000012 RBX: 0000000000000020 RCX: dffffc0000000000 RDX: 0000000000040000 RSI: 0000000000000000 RDI: 0000000000000090 RBP: ffff88019b7fd0f0 R08: ffff88019510e0c0 R09: ffffed003b5c46d6 R10: ffffed003b5c46d6 R11: ffff8801dae236b3 R12: 0000000000000001 R13: ffff8801d6c581f4 R14: 0000000000000000 R15: ffff8801d6c58128 FS: 00007fcae64d6700(0000) GS:ffff8801dae00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004e8664 CR3: 00000001b669b000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: tcp4_gso_segment+0x1c3/0x440 net/ipv4/tcp_offload.c:54 inet_gso_segment+0x64e/0x12d0 net/ipv4/af_inet.c:1342 inet_gso_segment+0x64e/0x12d0 net/ipv4/af_inet.c:1342 skb_mac_gso_segment+0x3b5/0x740 net/core/dev.c:2792 __skb_gso_segment+0x3c3/0x880 net/core/dev.c:2865 skb_gso_segment include/linux/netdevice.h:4099 [inline] validate_xmit_skb+0x640/0xf30 net/core/dev.c:3104 __dev_queue_xmit+0xc14/0x3910 net/core/dev.c:3561 dev_queue_xmit+0x17/0x20 net/core/dev.c:3602 neigh_hh_output include/net/neighbour.h:473 [inline] neigh_output include/net/neighbour.h:481 [inline] ip_finish_output2+0x1063/0x1860 net/ipv4/ip_output.c:229 ip_finish_output+0x841/0xfa0 net/ipv4/ip_output.c:317 NF_HOOK_COND include/linux/netfilter.h:276 [inline] ip_output+0x223/0x880 net/ipv4/ip_output.c:405 dst_output include/net/dst.h:444 [inline] ip_local_out+0xc5/0x1b0 net/ipv4/ip_output.c:124 iptunnel_xmit+0x567/0x850 net/ipv4/ip_tunnel_core.c:91 ip_tunnel_xmit+0x1598/0x3af1 net/ipv4/ip_tunnel.c:778 ipip_tunnel_xmit+0x264/0x2c0 net/ipv4/ipip.c:308 __netdev_start_xmit include/linux/netdevice.h:4148 [inline] netdev_start_xmit include/linux/netdevice.h:4157 [inline] xmit_one net/core/dev.c:3034 [inline] dev_hard_start_xmit+0x26c/0xc30 net/core/dev.c:3050 __dev_queue_xmit+0x29ef/0x3910 net/core/dev.c:3569 dev_queue_xmit+0x17/0x20 net/core/dev.c:3602 neigh_direct_output+0x15/0x20 net/core/neighbour.c:1403 neigh_output include/net/neighbour.h:483 [inline] ip_finish_output2+0xa67/0x1860 net/ipv4/ip_output.c:229 ip_finish_output+0x841/0xfa0 net/ipv4/ip_output.c:317 NF_HOOK_COND include/linux/netfilter.h:276 [inline] ip_output+0x223/0x880 net/ipv4/ip_output.c:405 dst_output include/net/dst.h:444 [inline] ip_local_out+0xc5/0x1b0 net/ipv4/ip_output.c:124 ip_queue_xmit+0x9df/0x1f80 net/ipv4/ip_output.c:504 tcp_transmit_skb+0x1bf9/0x3f10 net/ipv4/tcp_output.c:1168 tcp_write_xmit+0x1641/0x5c20 net/ipv4/tcp_output.c:2363 __tcp_push_pending_frames+0xb2/0x290 net/ipv4/tcp_output.c:2536 tcp_push+0x638/0x8c0 net/ipv4/tcp.c:735 tcp_sendmsg_locked+0x2ec5/0x3f00 net/ipv4/tcp.c:1410 tcp_sendmsg+0x2f/0x50 net/ipv4/tcp.c:1447 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798 sock_sendmsg_nosec net/socket.c:641 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:651 __sys_sendto+0x3d7/0x670 net/socket.c:1797 __do_sys_sendto net/socket.c:1809 [inline] __se_sys_sendto net/socket.c:1805 [inline] __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1805 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x455ab9 Code: 1d ba fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb b9 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fcae64d5c68 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00007fcae64d66d4 RCX: 0000000000455ab9 RDX: 0000000000000001 RSI: 0000000020000200 RDI: 0000000000000013 RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000014 R13: 00000000004c1145 R14: 00000000004d1818 R15: 0000000000000006 Modules linked in: Dumping ftrace buffer: (ftrace buffer empty) Fixes: ddff00d42043 ("net: Move skb_has_shared_frag check out of GRE code and into segmentation") Signed-off-by: Eric Dumazet Cc: Alexander Duyck Reported-by: syzbot Acked-by: Alexander Duyck Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/core/skbuff.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 8cae7aa4a4ec..84c731aef0d8 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -3253,6 +3253,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb, net_warn_ratelimited( "skb_segment: too many frags: %u %u\n", pos, mss); + err = -EINVAL; goto err; } @@ -3289,11 +3290,10 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb, perform_csum_check: if (!csum) { - if (skb_has_shared_frag(nskb)) { - err = __skb_linearize(nskb); - if (err) - goto err; - } + if (skb_has_shared_frag(nskb) && + __skb_linearize(nskb)) + goto err; + if (!nskb->remcsum_offload) nskb->ip_summed = CHECKSUM_NONE; SKB_GSO_CB(nskb)->csum = From adcecd4ab150f4e2dfdb2fcbf3f1394de6f9dd7a Mon Sep 17 00:00:00 2001 From: Ariel Levkovich Date: Mon, 25 Jun 2018 19:12:02 +0300 Subject: [PATCH 288/970] net/mlx5: Adjust clock overflow work period [ Upstream commit 33180bee86a8940a84950edca46315cd9dd6deb5 ] When driver converts HW timestamp to wall clock time it subtracts the last saved cycle counter from the HW timestamp and converts the difference to nanoseconds. The conversion is done by multiplying the cycles difference with the clock multiplier value as a first step and therefore the cycles difference should be small enough so that the multiplication product doesn't exceed 64bit. The overflow handling routine is in charge of updating the last saved cycle counter in driver and it is called periodically using kernel delayed workqueue. The delay period for this work is calculated using the max HW cycle counter value (a 41 bit mask) as a base which doesn't take the 64bit limit into account so the delay period may be incorrect and too long to prevent a large difference between the HW counter and the last saved counter in SW. This change adjusts the work period for the HW clock overflow work by taking the minimum between the previous value and the quotient of max u64 value and the clock multiplier value. Fixes: ef9814deafd0 ("net/mlx5e: Add HW timestamping (TS) support") Signed-off-by: Ariel Levkovich Reviewed-by: Eran Ben Elisha Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/mellanox/mlx5/core/en_clock.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c b/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c index 1612ec0d9103..f8b99d0b54d5 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c @@ -233,6 +233,7 @@ static void mlx5e_timestamp_init_config(struct mlx5e_tstamp *tstamp) void mlx5e_timestamp_init(struct mlx5e_priv *priv) { struct mlx5e_tstamp *tstamp = &priv->tstamp; + u64 overflow_cycles; u64 ns; u64 frac = 0; u32 dev_freq; @@ -257,10 +258,17 @@ void mlx5e_timestamp_init(struct mlx5e_priv *priv) /* Calculate period in seconds to call the overflow watchdog - to make * sure counter is checked at least once every wrap around. + * The period is calculated as the minimum between max HW cycles count + * (The clock source mask) and max amount of cycles that can be + * multiplied by clock multiplier where the result doesn't exceed + * 64bits. */ - ns = cyclecounter_cyc2ns(&tstamp->cycles, tstamp->cycles.mask, + overflow_cycles = div64_u64(~0ULL >> 1, tstamp->cycles.mult); + overflow_cycles = min(overflow_cycles, tstamp->cycles.mask >> 1); + + ns = cyclecounter_cyc2ns(&tstamp->cycles, overflow_cycles, frac, &frac); - do_div(ns, NSEC_PER_SEC / 2 / HZ); + do_div(ns, NSEC_PER_SEC / HZ); tstamp->overflow_period = ns; INIT_DELAYED_WORK(&tstamp->overflow_work, mlx5e_timestamp_overflow); From d9d580121617d95aade9aa839d4e405521eafcb8 Mon Sep 17 00:00:00 2001 From: Eran Ben Elisha Date: Sun, 8 Jul 2018 14:52:12 +0300 Subject: [PATCH 289/970] net/mlx5e: Don't allow aRFS for encapsulated packets [ Upstream commit d2e1c57bcf9a07cbb67f30ecf238f298799bce1c ] Driver is yet to support aRFS for encapsulated packets, return early error in such case. Fixes: 18c908e477dc ("net/mlx5e: Add accelerated RFS support") Signed-off-by: Eran Ben Elisha Reviewed-by: Tariq Toukan Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c index a8cb38789774..885f47e9ae93 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c @@ -715,6 +715,9 @@ int mlx5e_rx_flow_steer(struct net_device *dev, const struct sk_buff *skb, skb->protocol != htons(ETH_P_IPV6)) return -EPROTONOSUPPORT; + if (skb->encapsulation) + return -EPROTONOSUPPORT; + arfs_t = arfs_get_table(arfs, arfs_get_ip_proto(skb), skb->protocol); if (!arfs_t) return -EPROTONOSUPPORT; From b7e37add79bf1a2c9f4a7574b0d7f53f0f6df663 Mon Sep 17 00:00:00 2001 From: Eran Ben Elisha Date: Sun, 8 Jul 2018 13:08:55 +0300 Subject: [PATCH 290/970] net/mlx5e: Fix quota counting in aRFS expire flow [ Upstream commit 2630bae8018823c3b88788b69fb9f16ea3b4a11e ] Quota should follow the amount of rules which do expire, and not the number of rules that were examined, fixed that. Fixes: 18c908e477dc ("net/mlx5e: Add accelerated RFS support") Signed-off-by: Eran Ben Elisha Reviewed-by: Maor Gottlieb Reviewed-by: Tariq Toukan Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c index 885f47e9ae93..4a51fc6908ad 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c @@ -383,14 +383,14 @@ static void arfs_may_expire_flow(struct mlx5e_priv *priv) HLIST_HEAD(del_list); spin_lock_bh(&priv->fs.arfs.arfs_lock); mlx5e_for_each_arfs_rule(arfs_rule, htmp, priv->fs.arfs.arfs_tables, i, j) { - if (quota++ > MLX5E_ARFS_EXPIRY_QUOTA) - break; if (!work_pending(&arfs_rule->arfs_work) && rps_may_expire_flow(priv->netdev, arfs_rule->rxq, arfs_rule->flow_id, arfs_rule->filter_id)) { hlist_del_init(&arfs_rule->hlist); hlist_add_head(&arfs_rule->hlist, &del_list); + if (quota++ > MLX5E_ARFS_EXPIRY_QUOTA) + break; } } spin_unlock_bh(&priv->fs.arfs.arfs_lock); From cc403d5dc140ba74400335b84b91f97d07dbd569 Mon Sep 17 00:00:00 2001 From: Hangbin Liu Date: Fri, 20 Jul 2018 14:04:27 +0800 Subject: [PATCH 291/970] multicast: do not restore deleted record source filter mode to new one There are two scenarios that we will restore deleted records. The first is when device down and up(or unmap/remap). In this scenario the new filter mode is same with previous one. Because we get it from in_dev->mc_list and we do not touch it during device down and up. The other scenario is when a new socket join a group which was just delete and not finish sending status reports. In this scenario, we should use the current filter mode instead of restore old one. Here are 4 cases in total. old_socket new_socket before_fix after_fix IN(A) IN(A) ALLOW(A) ALLOW(A) IN(A) EX( ) TO_IN( ) TO_EX( ) EX( ) IN(A) TO_EX( ) ALLOW(A) EX( ) EX( ) TO_EX( ) TO_EX( ) Fixes: 24803f38a5c0b (igmp: do not remove igmp souce list info when set link down) Fixes: 1666d49e1d416 (mld: do not remove mld souce list info when set link down) Signed-off-by: Hangbin Liu Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/igmp.c | 3 +-- net/ipv6/mcast.c | 3 +-- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c index 7f5fe07d0b13..f2e6e874e4ec 100644 --- a/net/ipv4/igmp.c +++ b/net/ipv4/igmp.c @@ -1193,8 +1193,7 @@ static void igmpv3_del_delrec(struct in_device *in_dev, struct ip_mc_list *im) if (pmc) { im->interface = pmc->interface; im->crcount = in_dev->mr_qrv ?: net->ipv4.sysctl_igmp_qrv; - im->sfmode = pmc->sfmode; - if (pmc->sfmode == MCAST_INCLUDE) { + if (im->sfmode == MCAST_INCLUDE) { im->tomb = pmc->tomb; im->sources = pmc->sources; for (psf = im->sources; psf; psf = psf->sf_next) diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c index ca8fac6e5a09..918c161e5b55 100644 --- a/net/ipv6/mcast.c +++ b/net/ipv6/mcast.c @@ -771,8 +771,7 @@ static void mld_del_delrec(struct inet6_dev *idev, struct ifmcaddr6 *im) if (pmc) { im->idev = pmc->idev; im->mca_crcount = idev->mc_qrv; - im->mca_sfmode = pmc->mca_sfmode; - if (pmc->mca_sfmode == MCAST_INCLUDE) { + if (im->mca_sfmode == MCAST_INCLUDE) { im->mca_tomb = pmc->mca_tomb; im->mca_sources = pmc->mca_sources; for (psf = im->mca_sources; psf; psf = psf->sf_next) From c6ac36be72e46082bedd4433ccba9925bbf6a322 Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Thu, 19 Jul 2018 08:15:16 +0200 Subject: [PATCH 292/970] net: phy: consider PHY_IGNORE_INTERRUPT in phy_start_aneg_priv [ Upstream commit 215d08a85b9acf5e1fe9dbf50f1774cde333efef ] The situation described in the comment can occur also with PHY_IGNORE_INTERRUPT, therefore change the condition to include it. Fixes: f555f34fdc58 ("net: phy: fix auto-negotiation stall due to unavailable interrupt") Signed-off-by: Heiner Kallweit Reviewed-by: Andrew Lunn Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/phy/phy.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c index 4d217649c8b1..5fde8e335f13 100644 --- a/drivers/net/phy/phy.c +++ b/drivers/net/phy/phy.c @@ -598,7 +598,7 @@ static int phy_start_aneg_priv(struct phy_device *phydev, bool sync) * negotiation may already be done and aneg interrupt may not be * generated. */ - if (phy_interrupt_is_valid(phydev) && (phydev->state == PHY_AN)) { + if (phydev->irq != PHY_POLL && phydev->state == PHY_AN) { err = phy_aneg_done(phydev); if (err > 0) { trigger = true; From 19b74799159ab6f780fedd568adf00cdee9edd5a Mon Sep 17 00:00:00 2001 From: Roopa Prabhu Date: Fri, 20 Jul 2018 13:21:01 -0700 Subject: [PATCH 293/970] rtnetlink: add rtnl_link_state check in rtnl_configure_link [ Upstream commit 5025f7f7d506fba9b39e7fe8ca10f6f34cb9bc2d ] rtnl_configure_link sets dev->rtnl_link_state to RTNL_LINK_INITIALIZED and unconditionally calls __dev_notify_flags to notify user-space of dev flags. current call sequence for rtnl_configure_link rtnetlink_newlink rtnl_link_ops->newlink rtnl_configure_link (unconditionally notifies userspace of default and new dev flags) If a newlink handler wants to call rtnl_configure_link early, we will end up with duplicate notifications to user-space. This patch fixes rtnl_configure_link to check rtnl_link_state and call __dev_notify_flags with gchanges = 0 if already RTNL_LINK_INITIALIZED. Later in the series, this patch will help the following sequence where a driver implementing newlink can call rtnl_configure_link to initialize the link early. makes the following call sequence work: rtnetlink_newlink rtnl_link_ops->newlink (vxlan) -> rtnl_configure_link (initializes link and notifies user-space of default dev flags) rtnl_configure_link (updates dev flags if requested by user ifm and notifies user-space of new dev flags) Signed-off-by: Roopa Prabhu Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/core/rtnetlink.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index f3a0ad14b454..194e844e1021 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -2339,9 +2339,12 @@ int rtnl_configure_link(struct net_device *dev, const struct ifinfomsg *ifm) return err; } - dev->rtnl_link_state = RTNL_LINK_INITIALIZED; - - __dev_notify_flags(dev, old_flags, ~0U); + if (dev->rtnl_link_state == RTNL_LINK_INITIALIZED) { + __dev_notify_flags(dev, old_flags, 0U); + } else { + dev->rtnl_link_state = RTNL_LINK_INITIALIZED; + __dev_notify_flags(dev, old_flags, ~0U); + } return 0; } EXPORT_SYMBOL(rtnl_configure_link); From 841778018235f7328a8be29cbdaa36c7703d064c Mon Sep 17 00:00:00 2001 From: Yuchung Cheng Date: Thu, 12 Jul 2018 06:04:52 -0700 Subject: [PATCH 294/970] tcp: fix dctcp delayed ACK schedule [ Upstream commit b0c05d0e99d98d7f0cd41efc1eeec94efdc3325d ] Previously, when a data segment was sent an ACK was piggybacked on the data segment without generating a CA_EVENT_NON_DELAYED_ACK event to notify congestion control modules. So the DCTCP ca->delayed_ack_reserved flag could incorrectly stay set when in fact there were no delayed ACKs being reserved. This could result in sending a special ECN notification ACK that carries an older ACK sequence, when in fact there was no need for such an ACK. DCTCP keeps track of the delayed ACK status with its own separate state ca->delayed_ack_reserved. Previously it may accidentally cancel the delayed ACK without updating this field upon sending a special ACK that carries a older ACK sequence. This inconsistency would lead to DCTCP receiver never acknowledging the latest data until the sender times out and retry in some cases. Packetdrill script (provided by Larry Brakmo) 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 0.000 setsockopt(3, SOL_TCP, TCP_CONGESTION, "dctcp", 5) = 0 0.000 bind(3, ..., ...) = 0 0.000 listen(3, 1) = 0 0.100 < [ect0] SEW 0:0(0) win 32792 0.100 > SE. 0:0(0) ack 1 0.110 < [ect0] . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 0.200 < [ect0] . 1:1001(1000) ack 1 win 257 0.200 > [ect01] . 1:1(0) ack 1001 0.200 write(4, ..., 1) = 1 0.200 > [ect01] P. 1:2(1) ack 1001 0.200 < [ect0] . 1001:2001(1000) ack 2 win 257 0.200 write(4, ..., 1) = 1 0.200 > [ect01] P. 2:3(1) ack 2001 0.200 < [ect0] . 2001:3001(1000) ack 3 win 257 0.200 < [ect0] . 3001:4001(1000) ack 3 win 257 0.200 > [ect01] . 3:3(0) ack 4001 0.210 < [ce] P. 4001:4501(500) ack 3 win 257 +0.001 read(4, ..., 4500) = 4500 +0 write(4, ..., 1) = 1 +0 > [ect01] PE. 3:4(1) ack 4501 +0.010 < [ect0] W. 4501:5501(1000) ack 4 win 257 // Previously the ACK sequence below would be 4501, causing a long RTO +0.040~+0.045 > [ect01] . 4:4(0) ack 5501 // delayed ack +0.311 < [ect0] . 5501:6501(1000) ack 4 win 257 // More data +0 > [ect01] . 4:4(0) ack 6501 // now acks everything +0.500 < F. 9501:9501(0) ack 4 win 257 Reported-by: Larry Brakmo Signed-off-by: Yuchung Cheng Signed-off-by: Eric Dumazet Acked-by: Neal Cardwell Acked-by: Lawrence Brakmo Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/tcp_dctcp.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/ipv4/tcp_dctcp.c b/net/ipv4/tcp_dctcp.c index ab37c6775630..e42f800b5dda 100644 --- a/net/ipv4/tcp_dctcp.c +++ b/net/ipv4/tcp_dctcp.c @@ -134,7 +134,8 @@ static void dctcp_ce_state_0_to_1(struct sock *sk) /* State has changed from CE=0 to CE=1 and delayed * ACK has not sent yet. */ - if (!ca->ce_state && ca->delayed_ack_reserved) { + if (!ca->ce_state && + inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) { u32 tmp_rcv_nxt; /* Save current rcv_nxt. */ @@ -164,7 +165,8 @@ static void dctcp_ce_state_1_to_0(struct sock *sk) /* State has changed from CE=1 to CE=0 and delayed * ACK has not sent yet. */ - if (ca->ce_state && ca->delayed_ack_reserved) { + if (ca->ce_state && + inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) { u32 tmp_rcv_nxt; /* Save current rcv_nxt. */ From 1fcccc57866bd5c31ec09df7accdfecb0d222a85 Mon Sep 17 00:00:00 2001 From: Yuchung Cheng Date: Wed, 18 Jul 2018 13:56:34 -0700 Subject: [PATCH 295/970] tcp: helpers to send special DCTCP ack [ Upstream commit 2987babb6982306509380fc11b450227a844493b ] Refactor and create helpers to send the special ACK in DCTCP. Signed-off-by: Yuchung Cheng Acked-by: Neal Cardwell Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/tcp_output.c | 22 +++++++++++++++++----- 1 file changed, 17 insertions(+), 5 deletions(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index f07a0a1c98ff..ea78ef3195a4 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -905,8 +905,8 @@ void tcp_wfree(struct sk_buff *skb) * We are working here with either a clone of the original * SKB, or a fresh unique copy made by the retransmit engine. */ -static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it, - gfp_t gfp_mask) +static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, + int clone_it, gfp_t gfp_mask, u32 rcv_nxt) { const struct inet_connection_sock *icsk = inet_csk(sk); struct inet_sock *inet; @@ -969,7 +969,7 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it, th->source = inet->inet_sport; th->dest = inet->inet_dport; th->seq = htonl(tcb->seq); - th->ack_seq = htonl(tp->rcv_nxt); + th->ack_seq = htonl(rcv_nxt); *(((__be16 *)th) + 6) = htons(((tcp_header_size >> 2) << 12) | tcb->tcp_flags); @@ -1046,6 +1046,13 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it, return err; } +static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it, + gfp_t gfp_mask) +{ + return __tcp_transmit_skb(sk, skb, clone_it, gfp_mask, + tcp_sk(sk)->rcv_nxt); +} + /* This routine just queues the buffer for sending. * * NOTE: probe0 timer is not checked, do not forget tcp_push_pending_frames, @@ -3482,7 +3489,7 @@ void tcp_send_delayed_ack(struct sock *sk) } /* This routine sends an ack and also updates the window. */ -void tcp_send_ack(struct sock *sk) +void __tcp_send_ack(struct sock *sk, u32 rcv_nxt) { struct sk_buff *buff; @@ -3520,7 +3527,12 @@ void tcp_send_ack(struct sock *sk) /* Send it off, this clears delayed acks for us. */ skb_mstamp_get(&buff->skb_mstamp); - tcp_transmit_skb(sk, buff, 0, (__force gfp_t)0); + __tcp_transmit_skb(sk, buff, 0, (__force gfp_t)0, rcv_nxt); +} + +void tcp_send_ack(struct sock *sk) +{ + __tcp_send_ack(sk, tcp_sk(sk)->rcv_nxt); } EXPORT_SYMBOL_GPL(tcp_send_ack); From 57ec8824b14d76089d1fd0d963ccc2d084070e57 Mon Sep 17 00:00:00 2001 From: Yuchung Cheng Date: Wed, 18 Jul 2018 13:56:35 -0700 Subject: [PATCH 296/970] tcp: do not cancel delay-AcK on DCTCP special ACK [ Upstream commit 27cde44a259c380a3c09066fc4b42de7dde9b1ad ] Currently when a DCTCP receiver delays an ACK and receive a data packet with a different CE mark from the previous one's, it sends two immediate ACKs acking previous and latest sequences respectly (for ECN accounting). Previously sending the first ACK may mark off the delayed ACK timer (tcp_event_ack_sent). This may subsequently prevent sending the second ACK to acknowledge the latest sequence (tcp_ack_snd_check). The culprit is that tcp_send_ack() assumes it always acknowleges the latest sequence, which is not true for the first special ACK. The fix is to not make the assumption in tcp_send_ack and check the actual ack sequence before cancelling the delayed ACK. Further it's safer to pass the ack sequence number as a local variable into tcp_send_ack routine, instead of intercepting tp->rcv_nxt to avoid future bugs like this. Reported-by: Neal Cardwell Signed-off-by: Yuchung Cheng Acked-by: Neal Cardwell Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/net/tcp.h | 1 + net/ipv4/tcp_dctcp.c | 34 ++++------------------------------ net/ipv4/tcp_output.c | 11 ++++++++--- 3 files changed, 13 insertions(+), 33 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index 18f029bcb8c7..fd575bc80745 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -553,6 +553,7 @@ void tcp_send_fin(struct sock *sk); void tcp_send_active_reset(struct sock *sk, gfp_t priority); int tcp_send_synack(struct sock *); void tcp_push_one(struct sock *, unsigned int mss_now); +void __tcp_send_ack(struct sock *sk, u32 rcv_nxt); void tcp_send_ack(struct sock *sk); void tcp_send_delayed_ack(struct sock *sk); void tcp_send_loss_probe(struct sock *sk); diff --git a/net/ipv4/tcp_dctcp.c b/net/ipv4/tcp_dctcp.c index e42f800b5dda..f87d2a61543b 100644 --- a/net/ipv4/tcp_dctcp.c +++ b/net/ipv4/tcp_dctcp.c @@ -135,21 +135,8 @@ static void dctcp_ce_state_0_to_1(struct sock *sk) * ACK has not sent yet. */ if (!ca->ce_state && - inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) { - u32 tmp_rcv_nxt; - - /* Save current rcv_nxt. */ - tmp_rcv_nxt = tp->rcv_nxt; - - /* Generate previous ack with CE=0. */ - tp->ecn_flags &= ~TCP_ECN_DEMAND_CWR; - tp->rcv_nxt = ca->prior_rcv_nxt; - - tcp_send_ack(sk); - - /* Recover current rcv_nxt. */ - tp->rcv_nxt = tmp_rcv_nxt; - } + inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) + __tcp_send_ack(sk, ca->prior_rcv_nxt); ca->prior_rcv_nxt = tp->rcv_nxt; ca->ce_state = 1; @@ -166,21 +153,8 @@ static void dctcp_ce_state_1_to_0(struct sock *sk) * ACK has not sent yet. */ if (ca->ce_state && - inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) { - u32 tmp_rcv_nxt; - - /* Save current rcv_nxt. */ - tmp_rcv_nxt = tp->rcv_nxt; - - /* Generate previous ack with CE=1. */ - tp->ecn_flags |= TCP_ECN_DEMAND_CWR; - tp->rcv_nxt = ca->prior_rcv_nxt; - - tcp_send_ack(sk); - - /* Recover current rcv_nxt. */ - tp->rcv_nxt = tmp_rcv_nxt; - } + inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) + __tcp_send_ack(sk, ca->prior_rcv_nxt); ca->prior_rcv_nxt = tp->rcv_nxt; ca->ce_state = 0; diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index ea78ef3195a4..5f916953b28e 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -174,8 +174,13 @@ static void tcp_event_data_sent(struct tcp_sock *tp, } /* Account for an ACK we sent. */ -static inline void tcp_event_ack_sent(struct sock *sk, unsigned int pkts) +static inline void tcp_event_ack_sent(struct sock *sk, unsigned int pkts, + u32 rcv_nxt) { + struct tcp_sock *tp = tcp_sk(sk); + + if (unlikely(rcv_nxt != tp->rcv_nxt)) + return; /* Special ACK sent by DCTCP to reflect ECN */ tcp_dec_quickack_mode(sk, pkts); inet_csk_clear_xmit_timer(sk, ICSK_TIME_DACK); } @@ -1010,7 +1015,7 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, icsk->icsk_af_ops->send_check(sk, skb); if (likely(tcb->tcp_flags & TCPHDR_ACK)) - tcp_event_ack_sent(sk, tcp_skb_pcount(skb)); + tcp_event_ack_sent(sk, tcp_skb_pcount(skb), rcv_nxt); if (skb->len != tcp_header_size) { tcp_event_data_sent(tp, sk); @@ -3529,12 +3534,12 @@ void __tcp_send_ack(struct sock *sk, u32 rcv_nxt) skb_mstamp_get(&buff->skb_mstamp); __tcp_transmit_skb(sk, buff, 0, (__force gfp_t)0, rcv_nxt); } +EXPORT_SYMBOL_GPL(__tcp_send_ack); void tcp_send_ack(struct sock *sk) { __tcp_send_ack(sk, tcp_sk(sk)->rcv_nxt); } -EXPORT_SYMBOL_GPL(tcp_send_ack); /* This routine sends a packet with an out of date sequence * number. It assumes the other end will try to ack it. From 8736711f4e55a50e0ec89edd366256f8edb5f9c3 Mon Sep 17 00:00:00 2001 From: Yuchung Cheng Date: Wed, 18 Jul 2018 13:56:36 -0700 Subject: [PATCH 297/970] tcp: do not delay ACK in DCTCP upon CE status change [ Upstream commit a0496ef2c23b3b180902dd185d0d63ccbc624cf8 ] Per DCTCP RFC8257 (Section 3.2) the ACK reflecting the CE status change has to be sent immediately so the sender can respond quickly: """ When receiving packets, the CE codepoint MUST be processed as follows: 1. If the CE codepoint is set and DCTCP.CE is false, set DCTCP.CE to true and send an immediate ACK. 2. If the CE codepoint is not set and DCTCP.CE is true, set DCTCP.CE to false and send an immediate ACK. """ Previously DCTCP implementation may continue to delay the ACK. This patch fixes that to implement the RFC by forcing an immediate ACK. Tested with this packetdrill script provided by Larry Brakmo 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 0.000 setsockopt(3, SOL_TCP, TCP_CONGESTION, "dctcp", 5) = 0 0.000 bind(3, ..., ...) = 0 0.000 listen(3, 1) = 0 0.100 < [ect0] SEW 0:0(0) win 32792 0.100 > SE. 0:0(0) ack 1 0.110 < [ect0] . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_SOCKET, SO_DEBUG, [1], 4) = 0 0.200 < [ect0] . 1:1001(1000) ack 1 win 257 0.200 > [ect01] . 1:1(0) ack 1001 0.200 write(4, ..., 1) = 1 0.200 > [ect01] P. 1:2(1) ack 1001 0.200 < [ect0] . 1001:2001(1000) ack 2 win 257 +0.005 < [ce] . 2001:3001(1000) ack 2 win 257 +0.000 > [ect01] . 2:2(0) ack 2001 // Previously the ACK below would be delayed by 40ms +0.000 > [ect01] E. 2:2(0) ack 3001 +0.500 < F. 9501:9501(0) ack 4 win 257 Signed-off-by: Yuchung Cheng Acked-by: Neal Cardwell Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/net/tcp.h | 1 + net/ipv4/tcp_dctcp.c | 30 ++++++++++++++++++------------ net/ipv4/tcp_input.c | 3 ++- 3 files changed, 21 insertions(+), 13 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index fd575bc80745..5d440bb0e409 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -363,6 +363,7 @@ ssize_t tcp_splice_read(struct socket *sk, loff_t *ppos, struct pipe_inode_info *pipe, size_t len, unsigned int flags); +void tcp_enter_quickack_mode(struct sock *sk); static inline void tcp_dec_quickack_mode(struct sock *sk, const unsigned int pkts) { diff --git a/net/ipv4/tcp_dctcp.c b/net/ipv4/tcp_dctcp.c index f87d2a61543b..dd52ccb812ea 100644 --- a/net/ipv4/tcp_dctcp.c +++ b/net/ipv4/tcp_dctcp.c @@ -131,12 +131,15 @@ static void dctcp_ce_state_0_to_1(struct sock *sk) struct dctcp *ca = inet_csk_ca(sk); struct tcp_sock *tp = tcp_sk(sk); - /* State has changed from CE=0 to CE=1 and delayed - * ACK has not sent yet. - */ - if (!ca->ce_state && - inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) - __tcp_send_ack(sk, ca->prior_rcv_nxt); + if (!ca->ce_state) { + /* State has changed from CE=0 to CE=1, force an immediate + * ACK to reflect the new CE state. If an ACK was delayed, + * send that first to reflect the prior CE state. + */ + if (inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) + __tcp_send_ack(sk, ca->prior_rcv_nxt); + tcp_enter_quickack_mode(sk); + } ca->prior_rcv_nxt = tp->rcv_nxt; ca->ce_state = 1; @@ -149,12 +152,15 @@ static void dctcp_ce_state_1_to_0(struct sock *sk) struct dctcp *ca = inet_csk_ca(sk); struct tcp_sock *tp = tcp_sk(sk); - /* State has changed from CE=1 to CE=0 and delayed - * ACK has not sent yet. - */ - if (ca->ce_state && - inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) - __tcp_send_ack(sk, ca->prior_rcv_nxt); + if (ca->ce_state) { + /* State has changed from CE=1 to CE=0, force an immediate + * ACK to reflect the new CE state. If an ACK was delayed, + * send that first to reflect the prior CE state. + */ + if (inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) + __tcp_send_ack(sk, ca->prior_rcv_nxt); + tcp_enter_quickack_mode(sk); + } ca->prior_rcv_nxt = tp->rcv_nxt; ca->ce_state = 0; diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index be453aa8fce8..71f2b0958c4f 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -209,13 +209,14 @@ static void tcp_incr_quickack(struct sock *sk) icsk->icsk_ack.quick = min(quickacks, TCP_MAX_QUICKACKS); } -static void tcp_enter_quickack_mode(struct sock *sk) +void tcp_enter_quickack_mode(struct sock *sk) { struct inet_connection_sock *icsk = inet_csk(sk); tcp_incr_quickack(sk); icsk->icsk_ack.pingpong = 0; icsk->icsk_ack.ato = TCP_ATO_MIN; } +EXPORT_SYMBOL(tcp_enter_quickack_mode); /* Send ACKs quickly, if "quick" count is not exhausted * and the session is not interactive. From 2d08921c8da26bdce3d8848ef6f32068f594d7d4 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 23 Jul 2018 09:28:17 -0700 Subject: [PATCH 298/970] tcp: free batches of packets in tcp_prune_ofo_queue() [ Upstream commit 72cd43ba64fc172a443410ce01645895850844c8 ] Juha-Matti Tilli reported that malicious peers could inject tiny packets in out_of_order_queue, forcing very expensive calls to tcp_collapse_ofo_queue() and tcp_prune_ofo_queue() for every incoming packet. out_of_order_queue rb-tree can contain thousands of nodes, iterating over all of them is not nice. Before linux-4.9, we would have pruned all packets in ofo_queue in one go, every XXXX packets. XXXX depends on sk_rcvbuf and skbs truesize, but is about 7000 packets with tcp_rmem[2] default of 6 MB. Since we plan to increase tcp_rmem[2] in the future to cope with modern BDP, can not revert to the old behavior, without great pain. Strategy taken in this patch is to purge ~12.5 % of the queue capacity. Fixes: 36a6503fedda ("tcp: refine tcp_prune_ofo_queue() to not drop all packets") Signed-off-by: Eric Dumazet Reported-by: Juha-Matti Tilli Acked-by: Yuchung Cheng Acked-by: Soheil Hassas Yeganeh Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/linux/skbuff.h | 2 ++ net/ipv4/tcp_input.c | 15 +++++++++++---- 2 files changed, 13 insertions(+), 4 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index b048d3d3b327..1f207dd22757 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -2982,6 +2982,8 @@ static inline int __skb_grow_rcsum(struct sk_buff *skb, unsigned int len) return __skb_grow(skb, len); } +#define rb_to_skb(rb) rb_entry_safe(rb, struct sk_buff, rbnode) + #define skb_queue_walk(queue, skb) \ for (skb = (queue)->next; \ skb != (struct sk_buff *)(queue); \ diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 71f2b0958c4f..2eabf219b71d 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4965,6 +4965,7 @@ static void tcp_collapse_ofo_queue(struct sock *sk) * 2) not add too big latencies if thousands of packets sit there. * (But if application shrinks SO_RCVBUF, we could still end up * freeing whole queue here) + * 3) Drop at least 12.5 % of sk_rcvbuf to avoid malicious attacks. * * Return true if queue has shrunk. */ @@ -4972,20 +4973,26 @@ static bool tcp_prune_ofo_queue(struct sock *sk) { struct tcp_sock *tp = tcp_sk(sk); struct rb_node *node, *prev; + int goal; if (RB_EMPTY_ROOT(&tp->out_of_order_queue)) return false; NET_INC_STATS(sock_net(sk), LINUX_MIB_OFOPRUNED); + goal = sk->sk_rcvbuf >> 3; node = &tp->ooo_last_skb->rbnode; do { prev = rb_prev(node); rb_erase(node, &tp->out_of_order_queue); + goal -= rb_to_skb(node)->truesize; tcp_drop(sk, rb_entry(node, struct sk_buff, rbnode)); - sk_mem_reclaim(sk); - if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf && - !tcp_under_memory_pressure(sk)) - break; + if (!prev || goal <= 0) { + sk_mem_reclaim(sk); + if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf && + !tcp_under_memory_pressure(sk)) + break; + goal = sk->sk_rcvbuf >> 3; + } node = prev; } while (node); tp->ooo_last_skb = rb_entry(prev, struct sk_buff, rbnode); From fdf258ed5dd85b57cf0e0e66500be98d38d42d02 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 23 Jul 2018 09:28:18 -0700 Subject: [PATCH 299/970] tcp: avoid collapses in tcp_prune_queue() if possible [ Upstream commit f4a3313d8e2ca9fd8d8f45e40a2903ba782607e7 ] Right after a TCP flow is created, receiving tiny out of order packets allways hit the condition : if (atomic_read(&sk->sk_rmem_alloc) >= sk->sk_rcvbuf) tcp_clamp_window(sk); tcp_clamp_window() increases sk_rcvbuf to match sk_rmem_alloc (guarded by tcp_rmem[2]) Calling tcp_collapse_ofo_queue() in this case is not useful, and offers a O(N^2) surface attack to malicious peers. Better not attempt anything before full queue capacity is reached, forcing attacker to spend lots of resource and allow us to more easily detect the abuse. Signed-off-by: Eric Dumazet Acked-by: Soheil Hassas Yeganeh Acked-by: Yuchung Cheng Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/tcp_input.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 2eabf219b71d..0a1ab0522511 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -5027,6 +5027,9 @@ static int tcp_prune_queue(struct sock *sk) else if (tcp_under_memory_pressure(sk)) tp->rcv_ssthresh = min(tp->rcv_ssthresh, 4U * tp->advmss); + if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf) + return 0; + tcp_collapse_ofo_queue(sk); if (!skb_queue_empty(&sk->sk_receive_queue)) tcp_collapse(sk, &sk->sk_receive_queue, NULL, From a878681484a0992ee3dfbd7826439951f9f82a69 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 23 Jul 2018 09:28:19 -0700 Subject: [PATCH 300/970] tcp: detect malicious patterns in tcp_collapse_ofo_queue() [ Upstream commit 3d4bf93ac12003f9b8e1e2de37fe27983deebdcf ] In case an attacker feeds tiny packets completely out of order, tcp_collapse_ofo_queue() might scan the whole rb-tree, performing expensive copies, but not changing socket memory usage at all. 1) Do not attempt to collapse tiny skbs. 2) Add logic to exit early when too many tiny skbs are detected. We prefer not doing aggressive collapsing (which copies packets) for pathological flows, and revert to tcp_prune_ofo_queue() which will be less expensive. In the future, we might add the possibility of terminating flows that are proven to be malicious. Signed-off-by: Eric Dumazet Acked-by: Soheil Hassas Yeganeh Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/tcp_input.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 0a1ab0522511..d0c47c2fefd2 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4918,6 +4918,7 @@ tcp_collapse(struct sock *sk, struct sk_buff_head *list, struct rb_root *root, static void tcp_collapse_ofo_queue(struct sock *sk) { struct tcp_sock *tp = tcp_sk(sk); + u32 range_truesize, sum_tiny = 0; struct sk_buff *skb, *head; struct rb_node *p; u32 start, end; @@ -4936,6 +4937,7 @@ static void tcp_collapse_ofo_queue(struct sock *sk) } start = TCP_SKB_CB(skb)->seq; end = TCP_SKB_CB(skb)->end_seq; + range_truesize = skb->truesize; for (head = skb;;) { skb = tcp_skb_next(skb, NULL); @@ -4946,11 +4948,20 @@ static void tcp_collapse_ofo_queue(struct sock *sk) if (!skb || after(TCP_SKB_CB(skb)->seq, end) || before(TCP_SKB_CB(skb)->end_seq, start)) { - tcp_collapse(sk, NULL, &tp->out_of_order_queue, - head, skb, start, end); + /* Do not attempt collapsing tiny skbs */ + if (range_truesize != head->truesize || + end - start >= SKB_WITH_OVERHEAD(SK_MEM_QUANTUM)) { + tcp_collapse(sk, NULL, &tp->out_of_order_queue, + head, skb, start, end); + } else { + sum_tiny += range_truesize; + if (sum_tiny > sk->sk_rcvbuf >> 3) + return; + } goto new_range; } + range_truesize += skb->truesize; if (unlikely(before(TCP_SKB_CB(skb)->seq, start))) start = TCP_SKB_CB(skb)->seq; if (after(TCP_SKB_CB(skb)->end_seq, end)) From 94623c7463f3424776408df2733012c42b52395a Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 23 Jul 2018 09:28:20 -0700 Subject: [PATCH 301/970] tcp: call tcp_drop() from tcp_data_queue_ofo() [ Upstream commit 8541b21e781a22dce52a74fef0b9bed00404a1cd ] In order to be able to give better diagnostics and detect malicious traffic, we need to have better sk->sk_drops tracking. Fixes: 9f5afeae5152 ("tcp: use an RB tree for ooo receive queue") Signed-off-by: Eric Dumazet Acked-by: Soheil Hassas Yeganeh Acked-by: Yuchung Cheng Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/tcp_input.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index d0c47c2fefd2..44d136fd2af5 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4517,7 +4517,7 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb) /* All the bits are present. Drop. */ NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPOFOMERGE); - __kfree_skb(skb); + tcp_drop(sk, skb); skb = NULL; tcp_dsack_set(sk, seq, end_seq); goto add_sack; @@ -4536,7 +4536,7 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb) TCP_SKB_CB(skb1)->end_seq); NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPOFOMERGE); - __kfree_skb(skb1); + tcp_drop(sk, skb1); goto merge_right; } } else if (tcp_try_coalesce(sk, skb1, skb, &fragstolen)) { From b0bd06a4757e4356569c2a73bc535f090dec5fa8 Mon Sep 17 00:00:00 2001 From: Lubomir Rintel Date: Tue, 10 Jul 2018 08:28:49 +0200 Subject: [PATCH 302/970] usb: cdc_acm: Add quirk for Castles VEGA3000 commit 1445cbe476fc3dd09c0b380b206526a49403c071 upstream. The device (a POS terminal) implements CDC ACM, but has not union descriptor. Signed-off-by: Lubomir Rintel Acked-by: Oliver Neukum Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/usb/class/cdc-acm.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c index 08bef18372ea..a9c950d29ce2 100644 --- a/drivers/usb/class/cdc-acm.c +++ b/drivers/usb/class/cdc-acm.c @@ -1785,6 +1785,9 @@ static const struct usb_device_id acm_ids[] = { { USB_DEVICE(0x09d8, 0x0320), /* Elatec GmbH TWN3 */ .driver_info = NO_UNION_NORMAL, /* has misplaced union descriptor */ }, + { USB_DEVICE(0x0ca6, 0xa050), /* Castles VEGA3000 */ + .driver_info = NO_UNION_NORMAL, /* reports zero length descriptor */ + }, { USB_DEVICE(0x2912, 0x0001), /* ATOL FPrint */ .driver_info = CLEAR_HALT_CONDITIONS, From e2996cf59ebf6564002f6bdf9ebb16336cae62e1 Mon Sep 17 00:00:00 2001 From: Bin Liu Date: Thu, 19 Jul 2018 14:39:37 -0500 Subject: [PATCH 303/970] usb: core: handle hub C_PORT_OVER_CURRENT condition commit 249a32b7eeb3edb6897dd38f89651a62163ac4ed upstream. Based on USB2.0 Spec Section 11.12.5, "If a hub has per-port power switching and per-port current limiting, an over-current on one port may still cause the power on another port to fall below specific minimums. In this case, the affected port is placed in the Power-Off state and C_PORT_OVER_CURRENT is set for the port, but PORT_OVER_CURRENT is not set." so let's check C_PORT_OVER_CURRENT too for over current condition. Fixes: 08d1dec6f405 ("usb:hub set hub->change_bits when over-current happens") Cc: Tested-by: Alessandro Antenucci Signed-off-by: Bin Liu Acked-by: Alan Stern Signed-off-by: Greg Kroah-Hartman --- drivers/usb/core/hub.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index 8bf0090218dd..bdb19db542a4 100644 --- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -1139,10 +1139,14 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type) if (!udev || udev->state == USB_STATE_NOTATTACHED) { /* Tell hub_wq to disconnect the device or - * check for a new connection + * check for a new connection or over current condition. + * Based on USB2.0 Spec Section 11.12.5, + * C_PORT_OVER_CURRENT could be set while + * PORT_OVER_CURRENT is not. So check for any of them. */ if (udev || (portstatus & USB_PORT_STAT_CONNECTION) || - (portstatus & USB_PORT_STAT_OVERCURRENT)) + (portstatus & USB_PORT_STAT_OVERCURRENT) || + (portchange & USB_PORT_STAT_C_OVERCURRENT)) set_bit(port1, hub->change_bits); } else if (portstatus & USB_PORT_STAT_ENABLE) { From 9e10043b6bdcc1a991a029de8a0bc745950345c6 Mon Sep 17 00:00:00 2001 From: Jerry Zhang Date: Mon, 2 Jul 2018 12:48:08 -0700 Subject: [PATCH 304/970] usb: gadget: f_fs: Only return delayed status when len is 0 commit 4d644abf25698362bd33d17c9ddc8f7122c30f17 upstream. Commit 1b9ba000 ("Allow function drivers to pause control transfers") states that USB_GADGET_DELAYED_STATUS is only supported if data phase is 0 bytes. It seems that when the length is not 0 bytes, there is no need to explicitly delay the data stage since the transfer is not completed until the user responds. However, when the length is 0, there is no data stage and the transfer is finished once setup() returns, hence there is a need to explicitly delay completion. This manifests as the following bugs: Prior to 946ef68ad4e4 ('Let setup() return USB_GADGET_DELAYED_STATUS'), when setup is 0 bytes, ffs would require user to queue a 0 byte request in order to clear setup state. However, that 0 byte request was actually not needed and would hang and cause errors in other setup requests. After the above commit, 0 byte setups work since the gadget now accepts empty queues to ep0 to clear the delay, but all other setups hang. Fixes: 946ef68ad4e4 ("Let setup() return USB_GADGET_DELAYED_STATUS") Signed-off-by: Jerry Zhang Cc: stable Acked-by: Felipe Balbi Signed-off-by: Greg Kroah-Hartman --- drivers/usb/gadget/function/f_fs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/usb/gadget/function/f_fs.c b/drivers/usb/gadget/function/f_fs.c index af72224f8ba2..04eb64381d92 100644 --- a/drivers/usb/gadget/function/f_fs.c +++ b/drivers/usb/gadget/function/f_fs.c @@ -3243,7 +3243,7 @@ static int ffs_func_setup(struct usb_function *f, __ffs_event_add(ffs, FUNCTIONFS_SETUP); spin_unlock_irqrestore(&ffs->ev.waitq.lock, flags); - return USB_GADGET_DELAYED_STATUS; + return creq->wLength == 0 ? USB_GADGET_DELAYED_STATUS : 0; } static bool ffs_func_req_match(struct usb_function *f, From bf0070e2f56eb3da708978da5dab5d36bbed1bf6 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Tue, 10 Jul 2018 14:51:33 +0200 Subject: [PATCH 305/970] driver core: Partially revert "driver core: correct device's shutdown order" commit 722e5f2b1eec7de61117b7c0a7914761e3da2eda upstream. Commit 52cdbdd49853 (driver core: correct device's shutdown order) introduced a regression by breaking device shutdown on some systems. Namely, the devices_kset_move_last() call in really_probe() added by that commit is a mistake as it may cause parents to follow children in the devices_kset list which then causes shutdown to fail. For example, if a device has children before really_probe() is called for it (which is not uncommon), that call will cause it to be reordered after the children in the devices_kset list and the ordering of that list will not reflect the correct device shutdown order any more. Also it causes the devices_kset list to be constantly reordered until all drivers have been probed which is totally pointless overhead in the majority of cases and it only covered an issue with system shutdown, while system-wide suspend/resume potentially had the same issue on the affected platforms (which was not covered). Moreover, the shutdown issue originally addressed by the change in really_probe() made by commit 52cdbdd49853 is not present in 4.18-rc any more, since dra7 started to use the sdhci-omap driver which doesn't disable any regulators during shutdown, so the really_probe() part of commit 52cdbdd49853 can be safely reverted. [The original issue was related to the omap_hsmmc driver used by dra7 previously.] For the above reasons, revert the really_probe() modifications made by commit 52cdbdd49853. The other code changes made by commit 52cdbdd49853 are useful and they need not be reverted. Fixes: 52cdbdd49853 (driver core: correct device's shutdown order) Link: https://lore.kernel.org/lkml/CAFgQCTt7VfqM=UyCnvNFxrSw8Z6cUtAi3HUwR4_xPAc03SgHjQ@mail.gmail.com/ Reported-by: Pingfan Liu Tested-by: Pingfan Liu Reviewed-by: Kishon Vijay Abraham I Signed-off-by: Rafael J. Wysocki Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/base/dd.c | 8 -------- 1 file changed, 8 deletions(-) diff --git a/drivers/base/dd.c b/drivers/base/dd.c index d76cd97a98b6..ee25a69630c3 100644 --- a/drivers/base/dd.c +++ b/drivers/base/dd.c @@ -363,14 +363,6 @@ static int really_probe(struct device *dev, struct device_driver *drv) goto probe_failed; } - /* - * Ensure devices are listed in devices_kset in correct order - * It's important to move Dev to the end of devices_kset before - * calling .probe, because it could be recursive and parent Dev - * should always go first - */ - devices_kset_move_last(dev); - if (dev->bus->probe) { ret = dev->bus->probe(dev); if (ret) From de2219a86c0ba1d07e16e81011b0749b867f76c5 Mon Sep 17 00:00:00 2001 From: Anssi Hannula Date: Tue, 7 Feb 2017 17:01:14 +0200 Subject: [PATCH 306/970] can: xilinx_can: fix RX loop if RXNEMP is asserted without RXOK commit 32852c561bffd613d4ed7ec464b1e03e1b7b6c5c upstream. If the device gets into a state where RXNEMP (RX FIFO not empty) interrupt is asserted without RXOK (new frame received successfully) interrupt being asserted, xcan_rx_poll() will continue to try to clear RXNEMP without actually reading frames from RX FIFO. If the RX FIFO is not empty, the interrupt will not be cleared and napi_schedule() will just be called again. This situation can occur when: (a) xcan_rx() returns without reading RX FIFO due to an error condition. The code tries to clear both RXOK and RXNEMP but RXNEMP will not clear due to a frame still being in the FIFO. The frame will never be read from the FIFO as RXOK is no longer set. (b) A frame is received between xcan_rx_poll() reading interrupt status and clearing RXOK. RXOK will be cleared, but RXNEMP will again remain set as the new message is still in the FIFO. I'm able to trigger case (b) by flooding the bus with frames under load. There does not seem to be any benefit in using both RXNEMP and RXOK in the way the driver does, and the polling example in the reference manual (UG585 v1.10 18.3.7 Read Messages from RxFIFO) also says that either RXOK or RXNEMP can be used for detecting incoming messages. Fix the issue and simplify the RX processing by only using RXNEMP without RXOK. Tested with the integrated CAN on Zynq-7000 SoC. Fixes: b1201e44f50b ("can: xilinx CAN controller support") Signed-off-by: Anssi Hannula Cc: Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman --- drivers/net/can/xilinx_can.c | 18 +++++------------- 1 file changed, 5 insertions(+), 13 deletions(-) diff --git a/drivers/net/can/xilinx_can.c b/drivers/net/can/xilinx_can.c index c71a03593595..556ef6481117 100644 --- a/drivers/net/can/xilinx_can.c +++ b/drivers/net/can/xilinx_can.c @@ -101,7 +101,7 @@ enum xcan_reg { #define XCAN_INTR_ALL (XCAN_IXR_TXOK_MASK | XCAN_IXR_BSOFF_MASK |\ XCAN_IXR_WKUP_MASK | XCAN_IXR_SLP_MASK | \ XCAN_IXR_RXNEMP_MASK | XCAN_IXR_ERROR_MASK | \ - XCAN_IXR_ARBLST_MASK | XCAN_IXR_RXOK_MASK) + XCAN_IXR_ARBLST_MASK) /* CAN register bit shift - XCAN___SHIFT */ #define XCAN_BTR_SJW_SHIFT 7 /* Synchronous jump width */ @@ -709,15 +709,7 @@ static int xcan_rx_poll(struct napi_struct *napi, int quota) isr = priv->read_reg(priv, XCAN_ISR_OFFSET); while ((isr & XCAN_IXR_RXNEMP_MASK) && (work_done < quota)) { - if (isr & XCAN_IXR_RXOK_MASK) { - priv->write_reg(priv, XCAN_ICR_OFFSET, - XCAN_IXR_RXOK_MASK); - work_done += xcan_rx(ndev); - } else { - priv->write_reg(priv, XCAN_ICR_OFFSET, - XCAN_IXR_RXNEMP_MASK); - break; - } + work_done += xcan_rx(ndev); priv->write_reg(priv, XCAN_ICR_OFFSET, XCAN_IXR_RXNEMP_MASK); isr = priv->read_reg(priv, XCAN_ISR_OFFSET); } @@ -728,7 +720,7 @@ static int xcan_rx_poll(struct napi_struct *napi, int quota) if (work_done < quota) { napi_complete(napi); ier = priv->read_reg(priv, XCAN_IER_OFFSET); - ier |= (XCAN_IXR_RXOK_MASK | XCAN_IXR_RXNEMP_MASK); + ier |= XCAN_IXR_RXNEMP_MASK; priv->write_reg(priv, XCAN_IER_OFFSET, ier); } return work_done; @@ -800,9 +792,9 @@ static irqreturn_t xcan_interrupt(int irq, void *dev_id) } /* Check for the type of receive interrupt and Processing it */ - if (isr & (XCAN_IXR_RXNEMP_MASK | XCAN_IXR_RXOK_MASK)) { + if (isr & XCAN_IXR_RXNEMP_MASK) { ier = priv->read_reg(priv, XCAN_IER_OFFSET); - ier &= ~(XCAN_IXR_RXNEMP_MASK | XCAN_IXR_RXOK_MASK); + ier &= ~XCAN_IXR_RXNEMP_MASK; priv->write_reg(priv, XCAN_IER_OFFSET, ier); napi_schedule(&priv->napi); } From 1fadfbd9f593903e84f66d4e6902193886c5de1c Mon Sep 17 00:00:00 2001 From: Anssi Hannula Date: Thu, 17 May 2018 15:41:19 +0300 Subject: [PATCH 307/970] can: xilinx_can: fix power management handling commit 8ebd83bdb027f29870d96649dba18b91581ea829 upstream. There are several issues with the suspend/resume handling code of the driver: - The device is attached and detached in the runtime_suspend() and runtime_resume() callbacks if the interface is running. However, during xcan_chip_start() the interface is considered running, causing the resume handler to incorrectly call netif_start_queue() at the beginning of xcan_chip_start(), and on xcan_chip_start() error return the suspend handler detaches the device leaving the user unable to bring-up the device anymore. - The device is not brought properly up on system resume. A reset is done and the code tries to determine the bus state after that. However, after reset the device is always in Configuration mode (down), so the state checking code does not make sense and communication will also not work. - The suspend callback tries to set the device to sleep mode (low-power mode which monitors the bus and brings the device back to normal mode on activity), but then immediately disables the clocks (possibly before the device reaches the sleep mode), which does not make sense to me. If a clean shutdown is wanted before disabling clocks, we can just bring it down completely instead of only sleep mode. Reorganize the PM code so that only the clock logic remains in the runtime PM callbacks and the system PM callbacks contain the device bring-up/down logic. This makes calling the runtime PM callbacks during e.g. xcan_chip_start() safe. The system PM callbacks now simply call common code to start/stop the HW if the interface was running, replacing the broken code from before. xcan_chip_stop() is updated to use the common reset code so that it will wait for the reset to complete. Reset also disables all interrupts so do not do that separately. Also, the device_may_wakeup() checks are removed as the driver does not have wakeup support. Tested on Zynq-7000 integrated CAN. Signed-off-by: Anssi Hannula Cc: Michal Simek Cc: Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman --- drivers/net/can/xilinx_can.c | 69 +++++++++++++++--------------------- 1 file changed, 28 insertions(+), 41 deletions(-) diff --git a/drivers/net/can/xilinx_can.c b/drivers/net/can/xilinx_can.c index 556ef6481117..363bc8b1bc03 100644 --- a/drivers/net/can/xilinx_can.c +++ b/drivers/net/can/xilinx_can.c @@ -811,13 +811,9 @@ static irqreturn_t xcan_interrupt(int irq, void *dev_id) static void xcan_chip_stop(struct net_device *ndev) { struct xcan_priv *priv = netdev_priv(ndev); - u32 ier; /* Disable interrupts and leave the can in configuration mode */ - ier = priv->read_reg(priv, XCAN_IER_OFFSET); - ier &= ~XCAN_INTR_ALL; - priv->write_reg(priv, XCAN_IER_OFFSET, ier); - priv->write_reg(priv, XCAN_SRR_OFFSET, XCAN_SRR_RESET_MASK); + set_reset_mode(ndev); priv->can.state = CAN_STATE_STOPPED; } @@ -950,10 +946,15 @@ static const struct net_device_ops xcan_netdev_ops = { */ static int __maybe_unused xcan_suspend(struct device *dev) { - if (!device_may_wakeup(dev)) - return pm_runtime_force_suspend(dev); + struct net_device *ndev = dev_get_drvdata(dev); - return 0; + if (netif_running(ndev)) { + netif_stop_queue(ndev); + netif_device_detach(ndev); + xcan_chip_stop(ndev); + } + + return pm_runtime_force_suspend(dev); } /** @@ -965,11 +966,27 @@ static int __maybe_unused xcan_suspend(struct device *dev) */ static int __maybe_unused xcan_resume(struct device *dev) { - if (!device_may_wakeup(dev)) - return pm_runtime_force_resume(dev); + struct net_device *ndev = dev_get_drvdata(dev); + int ret; + + ret = pm_runtime_force_resume(dev); + if (ret) { + dev_err(dev, "pm_runtime_force_resume failed on resume\n"); + return ret; + } + + if (netif_running(ndev)) { + ret = xcan_chip_start(ndev); + if (ret) { + dev_err(dev, "xcan_chip_start failed on resume\n"); + return ret; + } + + netif_device_attach(ndev); + netif_start_queue(ndev); + } return 0; - } /** @@ -984,14 +1001,6 @@ static int __maybe_unused xcan_runtime_suspend(struct device *dev) struct net_device *ndev = dev_get_drvdata(dev); struct xcan_priv *priv = netdev_priv(ndev); - if (netif_running(ndev)) { - netif_stop_queue(ndev); - netif_device_detach(ndev); - } - - priv->write_reg(priv, XCAN_MSR_OFFSET, XCAN_MSR_SLEEP_MASK); - priv->can.state = CAN_STATE_SLEEPING; - clk_disable_unprepare(priv->bus_clk); clk_disable_unprepare(priv->can_clk); @@ -1010,7 +1019,6 @@ static int __maybe_unused xcan_runtime_resume(struct device *dev) struct net_device *ndev = dev_get_drvdata(dev); struct xcan_priv *priv = netdev_priv(ndev); int ret; - u32 isr, status; ret = clk_prepare_enable(priv->bus_clk); if (ret) { @@ -1024,27 +1032,6 @@ static int __maybe_unused xcan_runtime_resume(struct device *dev) return ret; } - priv->write_reg(priv, XCAN_SRR_OFFSET, XCAN_SRR_RESET_MASK); - isr = priv->read_reg(priv, XCAN_ISR_OFFSET); - status = priv->read_reg(priv, XCAN_SR_OFFSET); - - if (netif_running(ndev)) { - if (isr & XCAN_IXR_BSOFF_MASK) { - priv->can.state = CAN_STATE_BUS_OFF; - priv->write_reg(priv, XCAN_SRR_OFFSET, - XCAN_SRR_RESET_MASK); - } else if ((status & XCAN_SR_ESTAT_MASK) == - XCAN_SR_ESTAT_MASK) { - priv->can.state = CAN_STATE_ERROR_PASSIVE; - } else if (status & XCAN_SR_ERRWRN_MASK) { - priv->can.state = CAN_STATE_ERROR_WARNING; - } else { - priv->can.state = CAN_STATE_ERROR_ACTIVE; - } - netif_device_attach(ndev); - netif_start_queue(ndev); - } - return 0; } From c98f577204b4bdc46453c2f64fa760fa17721046 Mon Sep 17 00:00:00 2001 From: Anssi Hannula Date: Wed, 8 Feb 2017 13:13:40 +0200 Subject: [PATCH 308/970] can: xilinx_can: fix recovery from error states not being propagated commit 877e0b75947e2c7acf5624331bb17ceb093c98ae upstream. The xilinx_can driver contains no mechanism for propagating recovery from CAN_STATE_ERROR_WARNING and CAN_STATE_ERROR_PASSIVE. Add such a mechanism by factoring the handling of XCAN_STATE_ERROR_PASSIVE and XCAN_STATE_ERROR_WARNING out of xcan_err_interrupt and checking for recovery after RX and TX if the interface is in one of those states. Tested with the integrated CAN on Zynq-7000 SoC. Fixes: b1201e44f50b ("can: xilinx CAN controller support") Signed-off-by: Anssi Hannula Cc: Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman --- drivers/net/can/xilinx_can.c | 155 ++++++++++++++++++++++++++++------- 1 file changed, 127 insertions(+), 28 deletions(-) diff --git a/drivers/net/can/xilinx_can.c b/drivers/net/can/xilinx_can.c index 363bc8b1bc03..9e64929d1e9f 100644 --- a/drivers/net/can/xilinx_can.c +++ b/drivers/net/can/xilinx_can.c @@ -2,6 +2,7 @@ * * Copyright (C) 2012 - 2014 Xilinx, Inc. * Copyright (C) 2009 PetaLogix. All rights reserved. + * Copyright (C) 2017 Sandvik Mining and Construction Oy * * Description: * This driver is developed for Axi CAN IP and for Zynq CANPS Controller. @@ -529,6 +530,123 @@ static int xcan_rx(struct net_device *ndev) return 1; } +/** + * xcan_current_error_state - Get current error state from HW + * @ndev: Pointer to net_device structure + * + * Checks the current CAN error state from the HW. Note that this + * only checks for ERROR_PASSIVE and ERROR_WARNING. + * + * Return: + * ERROR_PASSIVE or ERROR_WARNING if either is active, ERROR_ACTIVE + * otherwise. + */ +static enum can_state xcan_current_error_state(struct net_device *ndev) +{ + struct xcan_priv *priv = netdev_priv(ndev); + u32 status = priv->read_reg(priv, XCAN_SR_OFFSET); + + if ((status & XCAN_SR_ESTAT_MASK) == XCAN_SR_ESTAT_MASK) + return CAN_STATE_ERROR_PASSIVE; + else if (status & XCAN_SR_ERRWRN_MASK) + return CAN_STATE_ERROR_WARNING; + else + return CAN_STATE_ERROR_ACTIVE; +} + +/** + * xcan_set_error_state - Set new CAN error state + * @ndev: Pointer to net_device structure + * @new_state: The new CAN state to be set + * @cf: Error frame to be populated or NULL + * + * Set new CAN error state for the device, updating statistics and + * populating the error frame if given. + */ +static void xcan_set_error_state(struct net_device *ndev, + enum can_state new_state, + struct can_frame *cf) +{ + struct xcan_priv *priv = netdev_priv(ndev); + u32 ecr = priv->read_reg(priv, XCAN_ECR_OFFSET); + u32 txerr = ecr & XCAN_ECR_TEC_MASK; + u32 rxerr = (ecr & XCAN_ECR_REC_MASK) >> XCAN_ESR_REC_SHIFT; + + priv->can.state = new_state; + + if (cf) { + cf->can_id |= CAN_ERR_CRTL; + cf->data[6] = txerr; + cf->data[7] = rxerr; + } + + switch (new_state) { + case CAN_STATE_ERROR_PASSIVE: + priv->can.can_stats.error_passive++; + if (cf) + cf->data[1] = (rxerr > 127) ? + CAN_ERR_CRTL_RX_PASSIVE : + CAN_ERR_CRTL_TX_PASSIVE; + break; + case CAN_STATE_ERROR_WARNING: + priv->can.can_stats.error_warning++; + if (cf) + cf->data[1] |= (txerr > rxerr) ? + CAN_ERR_CRTL_TX_WARNING : + CAN_ERR_CRTL_RX_WARNING; + break; + case CAN_STATE_ERROR_ACTIVE: + if (cf) + cf->data[1] |= CAN_ERR_CRTL_ACTIVE; + break; + default: + /* non-ERROR states are handled elsewhere */ + WARN_ON(1); + break; + } +} + +/** + * xcan_update_error_state_after_rxtx - Update CAN error state after RX/TX + * @ndev: Pointer to net_device structure + * + * If the device is in a ERROR-WARNING or ERROR-PASSIVE state, check if + * the performed RX/TX has caused it to drop to a lesser state and set + * the interface state accordingly. + */ +static void xcan_update_error_state_after_rxtx(struct net_device *ndev) +{ + struct xcan_priv *priv = netdev_priv(ndev); + enum can_state old_state = priv->can.state; + enum can_state new_state; + + /* changing error state due to successful frame RX/TX can only + * occur from these states + */ + if (old_state != CAN_STATE_ERROR_WARNING && + old_state != CAN_STATE_ERROR_PASSIVE) + return; + + new_state = xcan_current_error_state(ndev); + + if (new_state != old_state) { + struct sk_buff *skb; + struct can_frame *cf; + + skb = alloc_can_err_skb(ndev, &cf); + + xcan_set_error_state(ndev, new_state, skb ? cf : NULL); + + if (skb) { + struct net_device_stats *stats = &ndev->stats; + + stats->rx_packets++; + stats->rx_bytes += cf->can_dlc; + netif_rx(skb); + } + } +} + /** * xcan_err_interrupt - error frame Isr * @ndev: net_device pointer @@ -544,16 +662,12 @@ static void xcan_err_interrupt(struct net_device *ndev, u32 isr) struct net_device_stats *stats = &ndev->stats; struct can_frame *cf; struct sk_buff *skb; - u32 err_status, status, txerr = 0, rxerr = 0; + u32 err_status; skb = alloc_can_err_skb(ndev, &cf); err_status = priv->read_reg(priv, XCAN_ESR_OFFSET); priv->write_reg(priv, XCAN_ESR_OFFSET, err_status); - txerr = priv->read_reg(priv, XCAN_ECR_OFFSET) & XCAN_ECR_TEC_MASK; - rxerr = ((priv->read_reg(priv, XCAN_ECR_OFFSET) & - XCAN_ECR_REC_MASK) >> XCAN_ESR_REC_SHIFT); - status = priv->read_reg(priv, XCAN_SR_OFFSET); if (isr & XCAN_IXR_BSOFF_MASK) { priv->can.state = CAN_STATE_BUS_OFF; @@ -563,28 +677,10 @@ static void xcan_err_interrupt(struct net_device *ndev, u32 isr) can_bus_off(ndev); if (skb) cf->can_id |= CAN_ERR_BUSOFF; - } else if ((status & XCAN_SR_ESTAT_MASK) == XCAN_SR_ESTAT_MASK) { - priv->can.state = CAN_STATE_ERROR_PASSIVE; - priv->can.can_stats.error_passive++; - if (skb) { - cf->can_id |= CAN_ERR_CRTL; - cf->data[1] = (rxerr > 127) ? - CAN_ERR_CRTL_RX_PASSIVE : - CAN_ERR_CRTL_TX_PASSIVE; - cf->data[6] = txerr; - cf->data[7] = rxerr; - } - } else if (status & XCAN_SR_ERRWRN_MASK) { - priv->can.state = CAN_STATE_ERROR_WARNING; - priv->can.can_stats.error_warning++; - if (skb) { - cf->can_id |= CAN_ERR_CRTL; - cf->data[1] |= (txerr > rxerr) ? - CAN_ERR_CRTL_TX_WARNING : - CAN_ERR_CRTL_RX_WARNING; - cf->data[6] = txerr; - cf->data[7] = rxerr; - } + } else { + enum can_state new_state = xcan_current_error_state(ndev); + + xcan_set_error_state(ndev, new_state, skb ? cf : NULL); } /* Check for Arbitration lost interrupt */ @@ -714,8 +810,10 @@ static int xcan_rx_poll(struct napi_struct *napi, int quota) isr = priv->read_reg(priv, XCAN_ISR_OFFSET); } - if (work_done) + if (work_done) { can_led_event(ndev, CAN_LED_EVENT_RX); + xcan_update_error_state_after_rxtx(ndev); + } if (work_done < quota) { napi_complete(napi); @@ -746,6 +844,7 @@ static void xcan_tx_interrupt(struct net_device *ndev, u32 isr) isr = priv->read_reg(priv, XCAN_ISR_OFFSET); } can_led_event(ndev, CAN_LED_EVENT_TX); + xcan_update_error_state_after_rxtx(ndev); netif_wake_queue(ndev); } From 1fd9fa57c1d97938b877018324809bafd59e15f5 Mon Sep 17 00:00:00 2001 From: Anssi Hannula Date: Tue, 7 Feb 2017 13:23:04 +0200 Subject: [PATCH 309/970] can: xilinx_can: fix device dropping off bus on RX overrun commit 2574fe54515ed3487405de329e4e9f13d7098c10 upstream. The xilinx_can driver performs a software reset when an RX overrun is detected. This causes the device to enter Configuration mode where no messages are received or transmitted. The documentation does not mention any need to perform a reset on an RX overrun, and testing by inducing an RX overflow also indicated that the device continues to work just fine without a reset. Remove the software reset. Tested with the integrated CAN on Zynq-7000 SoC. Fixes: b1201e44f50b ("can: xilinx CAN controller support") Signed-off-by: Anssi Hannula Cc: Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman --- drivers/net/can/xilinx_can.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/net/can/xilinx_can.c b/drivers/net/can/xilinx_can.c index 9e64929d1e9f..88c7c2f5901e 100644 --- a/drivers/net/can/xilinx_can.c +++ b/drivers/net/can/xilinx_can.c @@ -696,7 +696,6 @@ static void xcan_err_interrupt(struct net_device *ndev, u32 isr) if (isr & XCAN_IXR_RXOFLW_MASK) { stats->rx_over_errors++; stats->rx_errors++; - priv->write_reg(priv, XCAN_SRR_OFFSET, XCAN_SRR_RESET_MASK); if (skb) { cf->can_id |= CAN_ERR_CRTL; cf->data[1] |= CAN_ERR_CRTL_RX_OVERFLOW; From bee7ff7eaade2f51d0a4b485642e06b6d2b2d125 Mon Sep 17 00:00:00 2001 From: Anssi Hannula Date: Thu, 23 Feb 2017 14:50:03 +0200 Subject: [PATCH 310/970] can: xilinx_can: keep only 1-2 frames in TX FIFO to fix TX accounting commit 620050d9c2be15c47017ba95efe59e0832e99a56 upstream. The xilinx_can driver assumes that the TXOK interrupt only clears after it has been acknowledged as many times as there have been successfully sent frames. However, the documentation does not mention such behavior, instead saying just that the interrupt is cleared when the clear bit is set. Similarly, testing seems to also suggest that it is immediately cleared regardless of the amount of frames having been sent. Performing some heavy TX load and then going back to idle has the tx_head drifting further away from tx_tail over time, steadily reducing the amount of frames the driver keeps in the TX FIFO (but not to zero, as the TXOK interrupt always frees up space for 1 frame from the driver's perspective, so frames continue to be sent) and delaying the local echo frames. The TX FIFO tracking is also otherwise buggy as it does not account for TX FIFO being cleared after software resets, causing BUG!, TX FIFO full when queue awake! messages to be output. There does not seem to be any way to accurately track the state of the TX FIFO for local echo support while using the full TX FIFO. The Zynq version of the HW (but not the soft-AXI version) has watermark programming support and with it an additional TX-FIFO-empty interrupt bit. Modify the driver to only put 1 frame into TX FIFO at a time on soft-AXI and 2 frames at a time on Zynq. On Zynq the TXFEMP interrupt bit is used to detect whether 1 or 2 frames have been sent at interrupt processing time. Tested with the integrated CAN on Zynq-7000 SoC. The 1-frame-FIFO mode was also tested. An alternative way to solve this would be to drop local echo support but keep using the full TX FIFO. v2: Add FIFO space check before TX queue wake with locking to synchronize with queue stop. This avoids waking the queue when xmit() had just filled it. v3: Keep local echo support and reduce the amount of frames in FIFO instead as suggested by Marc Kleine-Budde. Fixes: b1201e44f50b ("can: xilinx CAN controller support") Signed-off-by: Anssi Hannula Cc: Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman --- drivers/net/can/xilinx_can.c | 139 +++++++++++++++++++++++++++++++---- 1 file changed, 123 insertions(+), 16 deletions(-) diff --git a/drivers/net/can/xilinx_can.c b/drivers/net/can/xilinx_can.c index 88c7c2f5901e..ad36c68e55e0 100644 --- a/drivers/net/can/xilinx_can.c +++ b/drivers/net/can/xilinx_can.c @@ -26,8 +26,10 @@ #include #include #include +#include #include #include +#include #include #include #include @@ -119,6 +121,7 @@ enum xcan_reg { /** * struct xcan_priv - This definition define CAN driver instance * @can: CAN private data structure. + * @tx_lock: Lock for synchronizing TX interrupt handling * @tx_head: Tx CAN packets ready to send on the queue * @tx_tail: Tx CAN packets successfully sended on the queue * @tx_max: Maximum number packets the driver can send @@ -133,6 +136,7 @@ enum xcan_reg { */ struct xcan_priv { struct can_priv can; + spinlock_t tx_lock; unsigned int tx_head; unsigned int tx_tail; unsigned int tx_max; @@ -160,6 +164,11 @@ static const struct can_bittiming_const xcan_bittiming_const = { .brp_inc = 1, }; +#define XCAN_CAP_WATERMARK 0x0001 +struct xcan_devtype_data { + unsigned int caps; +}; + /** * xcan_write_reg_le - Write a value to the device register little endian * @priv: Driver private data structure @@ -239,6 +248,10 @@ static int set_reset_mode(struct net_device *ndev) usleep_range(500, 10000); } + /* reset clears FIFOs */ + priv->tx_head = 0; + priv->tx_tail = 0; + return 0; } @@ -393,6 +406,7 @@ static int xcan_start_xmit(struct sk_buff *skb, struct net_device *ndev) struct net_device_stats *stats = &ndev->stats; struct can_frame *cf = (struct can_frame *)skb->data; u32 id, dlc, data[2] = {0, 0}; + unsigned long flags; if (can_dropped_invalid_skb(ndev, skb)) return NETDEV_TX_OK; @@ -440,6 +454,9 @@ static int xcan_start_xmit(struct sk_buff *skb, struct net_device *ndev) data[1] = be32_to_cpup((__be32 *)(cf->data + 4)); can_put_echo_skb(skb, ndev, priv->tx_head % priv->tx_max); + + spin_lock_irqsave(&priv->tx_lock, flags); + priv->tx_head++; /* Write the Frame to Xilinx CAN TX FIFO */ @@ -455,10 +472,16 @@ static int xcan_start_xmit(struct sk_buff *skb, struct net_device *ndev) stats->tx_bytes += cf->can_dlc; } + /* Clear TX-FIFO-empty interrupt for xcan_tx_interrupt() */ + if (priv->tx_max > 1) + priv->write_reg(priv, XCAN_ICR_OFFSET, XCAN_IXR_TXFEMP_MASK); + /* Check if the TX buffer is full */ if ((priv->tx_head - priv->tx_tail) == priv->tx_max) netif_stop_queue(ndev); + spin_unlock_irqrestore(&priv->tx_lock, flags); + return NETDEV_TX_OK; } @@ -832,19 +855,71 @@ static void xcan_tx_interrupt(struct net_device *ndev, u32 isr) { struct xcan_priv *priv = netdev_priv(ndev); struct net_device_stats *stats = &ndev->stats; + unsigned int frames_in_fifo; + int frames_sent = 1; /* TXOK => at least 1 frame was sent */ + unsigned long flags; + int retries = 0; - while ((priv->tx_head - priv->tx_tail > 0) && - (isr & XCAN_IXR_TXOK_MASK)) { + /* Synchronize with xmit as we need to know the exact number + * of frames in the FIFO to stay in sync due to the TXFEMP + * handling. + * This also prevents a race between netif_wake_queue() and + * netif_stop_queue(). + */ + spin_lock_irqsave(&priv->tx_lock, flags); + + frames_in_fifo = priv->tx_head - priv->tx_tail; + + if (WARN_ON_ONCE(frames_in_fifo == 0)) { + /* clear TXOK anyway to avoid getting back here */ priv->write_reg(priv, XCAN_ICR_OFFSET, XCAN_IXR_TXOK_MASK); + spin_unlock_irqrestore(&priv->tx_lock, flags); + return; + } + + /* Check if 2 frames were sent (TXOK only means that at least 1 + * frame was sent). + */ + if (frames_in_fifo > 1) { + WARN_ON(frames_in_fifo > priv->tx_max); + + /* Synchronize TXOK and isr so that after the loop: + * (1) isr variable is up-to-date at least up to TXOK clear + * time. This avoids us clearing a TXOK of a second frame + * but not noticing that the FIFO is now empty and thus + * marking only a single frame as sent. + * (2) No TXOK is left. Having one could mean leaving a + * stray TXOK as we might process the associated frame + * via TXFEMP handling as we read TXFEMP *after* TXOK + * clear to satisfy (1). + */ + while ((isr & XCAN_IXR_TXOK_MASK) && !WARN_ON(++retries == 100)) { + priv->write_reg(priv, XCAN_ICR_OFFSET, XCAN_IXR_TXOK_MASK); + isr = priv->read_reg(priv, XCAN_ISR_OFFSET); + } + + if (isr & XCAN_IXR_TXFEMP_MASK) { + /* nothing in FIFO anymore */ + frames_sent = frames_in_fifo; + } + } else { + /* single frame in fifo, just clear TXOK */ + priv->write_reg(priv, XCAN_ICR_OFFSET, XCAN_IXR_TXOK_MASK); + } + + while (frames_sent--) { can_get_echo_skb(ndev, priv->tx_tail % priv->tx_max); priv->tx_tail++; stats->tx_packets++; - isr = priv->read_reg(priv, XCAN_ISR_OFFSET); } + + netif_wake_queue(ndev); + + spin_unlock_irqrestore(&priv->tx_lock, flags); + can_led_event(ndev, CAN_LED_EVENT_TX); xcan_update_error_state_after_rxtx(ndev); - netif_wake_queue(ndev); } /** @@ -1138,6 +1213,18 @@ static const struct dev_pm_ops xcan_dev_pm_ops = { SET_RUNTIME_PM_OPS(xcan_runtime_suspend, xcan_runtime_resume, NULL) }; +static const struct xcan_devtype_data xcan_zynq_data = { + .caps = XCAN_CAP_WATERMARK, +}; + +/* Match table for OF platform binding */ +static const struct of_device_id xcan_of_match[] = { + { .compatible = "xlnx,zynq-can-1.0", .data = &xcan_zynq_data }, + { .compatible = "xlnx,axi-can-1.00.a", }, + { /* end of list */ }, +}; +MODULE_DEVICE_TABLE(of, xcan_of_match); + /** * xcan_probe - Platform registration call * @pdev: Handle to the platform device structure @@ -1152,8 +1239,10 @@ static int xcan_probe(struct platform_device *pdev) struct resource *res; /* IO mem resources */ struct net_device *ndev; struct xcan_priv *priv; + const struct of_device_id *of_id; + int caps = 0; void __iomem *addr; - int ret, rx_max, tx_max; + int ret, rx_max, tx_max, tx_fifo_depth; /* Get the virtual base address for the device */ res = platform_get_resource(pdev, IORESOURCE_MEM, 0); @@ -1163,7 +1252,8 @@ static int xcan_probe(struct platform_device *pdev) goto err; } - ret = of_property_read_u32(pdev->dev.of_node, "tx-fifo-depth", &tx_max); + ret = of_property_read_u32(pdev->dev.of_node, "tx-fifo-depth", + &tx_fifo_depth); if (ret < 0) goto err; @@ -1171,6 +1261,30 @@ static int xcan_probe(struct platform_device *pdev) if (ret < 0) goto err; + of_id = of_match_device(xcan_of_match, &pdev->dev); + if (of_id) { + const struct xcan_devtype_data *devtype_data = of_id->data; + + if (devtype_data) + caps = devtype_data->caps; + } + + /* There is no way to directly figure out how many frames have been + * sent when the TXOK interrupt is processed. If watermark programming + * is supported, we can have 2 frames in the FIFO and use TXFEMP + * to determine if 1 or 2 frames have been sent. + * Theoretically we should be able to use TXFWMEMP to determine up + * to 3 frames, but it seems that after putting a second frame in the + * FIFO, with watermark at 2 frames, it can happen that TXFWMEMP (less + * than 2 frames in FIFO) is set anyway with no TXOK (a frame was + * sent), which is not a sensible state - possibly TXFWMEMP is not + * completely synchronized with the rest of the bits? + */ + if (caps & XCAN_CAP_WATERMARK) + tx_max = min(tx_fifo_depth, 2); + else + tx_max = 1; + /* Create a CAN device instance */ ndev = alloc_candev(sizeof(struct xcan_priv), tx_max); if (!ndev) @@ -1185,6 +1299,7 @@ static int xcan_probe(struct platform_device *pdev) CAN_CTRLMODE_BERR_REPORTING; priv->reg_base = addr; priv->tx_max = tx_max; + spin_lock_init(&priv->tx_lock); /* Get IRQ for the device */ ndev->irq = platform_get_irq(pdev, 0); @@ -1249,9 +1364,9 @@ static int xcan_probe(struct platform_device *pdev) pm_runtime_put(&pdev->dev); - netdev_dbg(ndev, "reg_base=0x%p irq=%d clock=%d, tx fifo depth:%d\n", + netdev_dbg(ndev, "reg_base=0x%p irq=%d clock=%d, tx fifo depth: actual %d, using %d\n", priv->reg_base, ndev->irq, priv->can.clock.freq, - priv->tx_max); + tx_fifo_depth, priv->tx_max); return 0; @@ -1285,14 +1400,6 @@ static int xcan_remove(struct platform_device *pdev) return 0; } -/* Match table for OF platform binding */ -static const struct of_device_id xcan_of_match[] = { - { .compatible = "xlnx,zynq-can-1.0", }, - { .compatible = "xlnx,axi-can-1.00.a", }, - { /* end of list */ }, -}; -MODULE_DEVICE_TABLE(of, xcan_of_match); - static struct platform_driver xcan_driver = { .probe = xcan_probe, .remove = xcan_remove, From 9f7308434ed6922545e00562a9a6ccd0aa3759da Mon Sep 17 00:00:00 2001 From: Anssi Hannula Date: Mon, 26 Feb 2018 14:39:59 +0200 Subject: [PATCH 311/970] can: xilinx_can: fix incorrect clear of non-processed interrupts commit 2f4f0f338cf453bfcdbcf089e177c16f35f023c8 upstream. xcan_interrupt() clears ERROR|RXOFLV|BSOFF|ARBLST interrupts if any of them is asserted. This does not take into account that some of them could have been asserted between interrupt status read and interrupt clear, therefore clearing them without handling them. Fix the code to only clear those interrupts that it knows are asserted and therefore going to be processed in xcan_err_interrupt(). Fixes: b1201e44f50b ("can: xilinx CAN controller support") Signed-off-by: Anssi Hannula Cc: Michal Simek Cc: Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman --- drivers/net/can/xilinx_can.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/net/can/xilinx_can.c b/drivers/net/can/xilinx_can.c index ad36c68e55e0..8a6264c10777 100644 --- a/drivers/net/can/xilinx_can.c +++ b/drivers/net/can/xilinx_can.c @@ -938,6 +938,7 @@ static irqreturn_t xcan_interrupt(int irq, void *dev_id) struct net_device *ndev = (struct net_device *)dev_id; struct xcan_priv *priv = netdev_priv(ndev); u32 isr, ier; + u32 isr_errors; /* Get the interrupt status from Xilinx CAN */ isr = priv->read_reg(priv, XCAN_ISR_OFFSET); @@ -956,11 +957,10 @@ static irqreturn_t xcan_interrupt(int irq, void *dev_id) xcan_tx_interrupt(ndev, isr); /* Check for the type of error interrupt and Processing it */ - if (isr & (XCAN_IXR_ERROR_MASK | XCAN_IXR_RXOFLW_MASK | - XCAN_IXR_BSOFF_MASK | XCAN_IXR_ARBLST_MASK)) { - priv->write_reg(priv, XCAN_ICR_OFFSET, (XCAN_IXR_ERROR_MASK | - XCAN_IXR_RXOFLW_MASK | XCAN_IXR_BSOFF_MASK | - XCAN_IXR_ARBLST_MASK)); + isr_errors = isr & (XCAN_IXR_ERROR_MASK | XCAN_IXR_RXOFLW_MASK | + XCAN_IXR_BSOFF_MASK | XCAN_IXR_ARBLST_MASK); + if (isr_errors) { + priv->write_reg(priv, XCAN_ICR_OFFSET, isr_errors); xcan_err_interrupt(ndev, isr); } From b2019f0f7021b5ca9da3173ace7fcf97aa1d4923 Mon Sep 17 00:00:00 2001 From: Anssi Hannula Date: Mon, 26 Feb 2018 14:27:13 +0200 Subject: [PATCH 312/970] can: xilinx_can: fix RX overflow interrupt not being enabled commit 83997997252f5d3fc7f04abc24a89600c2b504ab upstream. RX overflow interrupt (RXOFLW) is disabled even though xcan_interrupt() processes it. This means that an RX overflow interrupt will only be processed when another interrupt gets asserted (e.g. for RX/TX). Fix that by enabling the RXOFLW interrupt. Fixes: b1201e44f50b ("can: xilinx CAN controller support") Signed-off-by: Anssi Hannula Cc: Michal Simek Cc: Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman --- drivers/net/can/xilinx_can.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/can/xilinx_can.c b/drivers/net/can/xilinx_can.c index 8a6264c10777..e680bab27dd7 100644 --- a/drivers/net/can/xilinx_can.c +++ b/drivers/net/can/xilinx_can.c @@ -104,7 +104,7 @@ enum xcan_reg { #define XCAN_INTR_ALL (XCAN_IXR_TXOK_MASK | XCAN_IXR_BSOFF_MASK |\ XCAN_IXR_WKUP_MASK | XCAN_IXR_SLP_MASK | \ XCAN_IXR_RXNEMP_MASK | XCAN_IXR_ERROR_MASK | \ - XCAN_IXR_ARBLST_MASK) + XCAN_IXR_RXOFLW_MASK | XCAN_IXR_ARBLST_MASK) /* CAN register bit shift - XCAN___SHIFT */ #define XCAN_BTR_SJW_SHIFT 7 /* Synchronous jump width */ From b1a1d9bdb1b5123924956f069920bf4b9c4b11c4 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Thu, 26 Jul 2018 10:13:22 +0200 Subject: [PATCH 313/970] turn off -Wattribute-alias Starting with gcc-8.1, we get a warning about all system call definitions, which use an alias between functions with incompatible prototypes, e.g.: In file included from ../mm/process_vm_access.c:19: ../include/linux/syscalls.h:211:18: warning: 'sys_process_vm_readv' alias between functions of incompatible types 'long int(pid_t, const struct iovec *, long unsigned int, const struct iovec *, long unsigned int, long unsigned int)' {aka 'long int(int, const struct iovec *, long unsigned int, const struct iovec *, long unsigned int, long unsigned int)'} and 'long int(long int, long int, long int, long int, long int, long int)' [-Wattribute-alias] asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \ ^~~ ../include/linux/syscalls.h:207:2: note: in expansion of macro '__SYSCALL_DEFINEx' __SYSCALL_DEFINEx(x, sname, __VA_ARGS__) ^~~~~~~~~~~~~~~~~ ../include/linux/syscalls.h:201:36: note: in expansion of macro 'SYSCALL_DEFINEx' #define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, __VA_ARGS__) ^~~~~~~~~~~~~~~ ../mm/process_vm_access.c:300:1: note: in expansion of macro 'SYSCALL_DEFINE6' SYSCALL_DEFINE6(process_vm_readv, pid_t, pid, const struct iovec __user *, lvec, ^~~~~~~~~~~~~~~ ../include/linux/syscalls.h:215:18: note: aliased declaration here asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \ ^~~ ../include/linux/syscalls.h:207:2: note: in expansion of macro '__SYSCALL_DEFINEx' __SYSCALL_DEFINEx(x, sname, __VA_ARGS__) ^~~~~~~~~~~~~~~~~ ../include/linux/syscalls.h:201:36: note: in expansion of macro 'SYSCALL_DEFINEx' #define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, __VA_ARGS__) ^~~~~~~~~~~~~~~ ../mm/process_vm_access.c:300:1: note: in expansion of macro 'SYSCALL_DEFINE6' SYSCALL_DEFINE6(process_vm_readv, pid_t, pid, const struct iovec __user *, lvec, This is really noisy and does not indicate a real problem. In the latest mainline kernel, this was addressed by commit bee20031772a ("disable -Wattribute-alias warning for SYSCALL_DEFINEx()"), which seems too invasive to backport. This takes a much simpler approach and just disables the warning across the kernel. Signed-off-by: Arnd Bergmann Signed-off-by: Greg Kroah-Hartman --- Makefile | 1 + 1 file changed, 1 insertion(+) diff --git a/Makefile b/Makefile index 889c58e39928..db7f6eeeaa1a 100644 --- a/Makefile +++ b/Makefile @@ -635,6 +635,7 @@ KBUILD_CFLAGS += $(call cc-disable-warning,frame-address,) KBUILD_CFLAGS += $(call cc-disable-warning, format-truncation) KBUILD_CFLAGS += $(call cc-disable-warning, format-overflow) KBUILD_CFLAGS += $(call cc-disable-warning, int-in-bool-context) +KBUILD_CFLAGS += $(call cc-disable-warning, attribute-alias) ifdef CONFIG_LD_DEAD_CODE_DATA_ELIMINATION KBUILD_CFLAGS += $(call cc-option,-ffunction-sections,) From b9dd13488acb680c5f141f5c077880317c9749cf Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Thu, 14 Dec 2017 15:32:41 -0800 Subject: [PATCH 314/970] exec: avoid gcc-8 warning for get_task_comm commit 3756f6401c302617c5e091081ca4d26ab604bec5 upstream. gcc-8 warns about using strncpy() with the source size as the limit: fs/exec.c:1223:32: error: argument to 'sizeof' in 'strncpy' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess] This is indeed slightly suspicious, as it protects us from source arguments without NUL-termination, but does not guarantee that the destination is terminated. This keeps the strncpy() to ensure we have properly padded target buffer, but ensures that we use the correct length, by passing the actual length of the destination buffer as well as adding a build-time check to ensure it is exactly TASK_COMM_LEN. There are only 23 callsites which I all reviewed to ensure this is currently the case. We could get away with doing only the check or passing the right length, but it doesn't hurt to do both. Link: http://lkml.kernel.org/r/20171205151724.1764896-1-arnd@arndb.de Signed-off-by: Arnd Bergmann Suggested-by: Kees Cook Acked-by: Kees Cook Acked-by: Ingo Molnar Cc: Alexander Viro Cc: Peter Zijlstra Cc: Serge Hallyn Cc: James Morris Cc: Aleksa Sarai Cc: "Eric W. Biederman" Cc: Frederic Weisbecker Cc: Thomas Gleixner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/exec.c | 7 +++---- include/linux/sched.h | 6 +++++- 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index b8c43be24751..fcd8642ef2d2 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1228,15 +1228,14 @@ static int de_thread(struct task_struct *tsk) return -EAGAIN; } -char *get_task_comm(char *buf, struct task_struct *tsk) +char *__get_task_comm(char *buf, size_t buf_size, struct task_struct *tsk) { - /* buf must be at least sizeof(tsk->comm) in size */ task_lock(tsk); - strncpy(buf, tsk->comm, sizeof(tsk->comm)); + strncpy(buf, tsk->comm, buf_size); task_unlock(tsk); return buf; } -EXPORT_SYMBOL_GPL(get_task_comm); +EXPORT_SYMBOL_GPL(__get_task_comm); /* * These functions flushes out all traces of the currently running executable diff --git a/include/linux/sched.h b/include/linux/sched.h index 5ebef8c86c26..1cc5723a7821 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -2999,7 +2999,11 @@ static inline void set_task_comm(struct task_struct *tsk, const char *from) { __set_task_comm(tsk, from, false); } -extern char *get_task_comm(char *to, struct task_struct *tsk); +extern char *__get_task_comm(char *to, size_t len, struct task_struct *tsk); +#define get_task_comm(buf, tsk) ({ \ + BUILD_BUG_ON(sizeof(buf) != TASK_COMM_LEN); \ + __get_task_comm(buf, sizeof(buf), tsk); \ +}) #ifdef CONFIG_SMP void scheduler_ipi(void); From 94c67449c7550597e86f15cce923055f7b2c8e09 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Sat, 28 Jul 2018 07:49:14 +0200 Subject: [PATCH 315/970] Linux 4.9.116 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index db7f6eeeaa1a..a6b011778960 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 4 PATCHLEVEL = 9 -SUBLEVEL = 115 +SUBLEVEL = 116 EXTRAVERSION = NAME = Roaring Lionus From 19e28842d02a66ffbe9574215ffb283d9c6cb062 Mon Sep 17 00:00:00 2001 From: Donald Shanty III Date: Wed, 4 Jul 2018 15:50:47 +0000 Subject: [PATCH 316/970] Input: elan_i2c - add ACPI ID for lenovo ideapad 330 commit 938f45008d8bc391593c97508bc798cc95a52b9b upstream. This allows Elan driver to bind to the touchpad found in Lenovo Ideapad 330 series laptops. Signed-off-by: Donald Shanty III Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov Signed-off-by: Greg Kroah-Hartman --- drivers/input/mouse/elan_i2c_core.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/input/mouse/elan_i2c_core.c b/drivers/input/mouse/elan_i2c_core.c index 97f6e05cffce..a7515f5dd76c 100644 --- a/drivers/input/mouse/elan_i2c_core.c +++ b/drivers/input/mouse/elan_i2c_core.c @@ -1251,6 +1251,7 @@ static const struct acpi_device_id elan_acpi_id[] = { { "ELAN0611", 0 }, { "ELAN0612", 0 }, { "ELAN0618", 0 }, + { "ELAN061D", 0 }, { "ELAN1000", 0 }, { } }; From 79f4095a167f414918668a6b79a1c26b0b4b7ae6 Mon Sep 17 00:00:00 2001 From: Chen-Yu Tsai Date: Wed, 18 Jul 2018 17:24:35 +0000 Subject: [PATCH 317/970] Input: i8042 - add Lenovo LaVie Z to the i8042 reset list commit 384cf4285b34e08917e3e66603382f2b0c4f6e1b upstream. The Lenovo LaVie Z laptop requires i8042 to be reset in order to consistently detect its Elantech touchpad. The nomux and kbdreset quirks are not sufficient. It's possible the other LaVie Z models from NEC require this as well. Cc: stable@vger.kernel.org Signed-off-by: Chen-Yu Tsai Signed-off-by: Dmitry Torokhov Signed-off-by: Greg Kroah-Hartman --- drivers/input/serio/i8042-x86ia64io.h | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/input/serio/i8042-x86ia64io.h b/drivers/input/serio/i8042-x86ia64io.h index e484ea2dc787..34be09651ee8 100644 --- a/drivers/input/serio/i8042-x86ia64io.h +++ b/drivers/input/serio/i8042-x86ia64io.h @@ -527,6 +527,13 @@ static const struct dmi_system_id __initconst i8042_dmi_nomux_table[] = { DMI_MATCH(DMI_PRODUCT_NAME, "N24_25BU"), }, }, + { + /* Lenovo LaVie Z */ + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), + DMI_MATCH(DMI_PRODUCT_VERSION, "Lenovo LaVie Z"), + }, + }, { } }; From 6ed569edd49030e11a3e8e3ba44ebe3f0da02491 Mon Sep 17 00:00:00 2001 From: KT Liao Date: Mon, 16 Jul 2018 12:10:03 +0000 Subject: [PATCH 318/970] Input: elan_i2c - add another ACPI ID for Lenovo Ideapad 330-15AST commit 6f88a6439da5d94de334a341503bc2c7f4a7ea7f upstream. Add ELAN0622 to ACPI mapping table to support Elan touchpad found in Ideapad 330-15AST. Signed-off-by: KT Liao Reported-by: Anant Shende Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov Signed-off-by: Greg Kroah-Hartman --- drivers/input/mouse/elan_i2c_core.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/input/mouse/elan_i2c_core.c b/drivers/input/mouse/elan_i2c_core.c index a7515f5dd76c..a716482774db 100644 --- a/drivers/input/mouse/elan_i2c_core.c +++ b/drivers/input/mouse/elan_i2c_core.c @@ -1252,6 +1252,7 @@ static const struct acpi_device_id elan_acpi_id[] = { { "ELAN0612", 0 }, { "ELAN0618", 0 }, { "ELAN061D", 0 }, + { "ELAN0622", 0 }, { "ELAN1000", 0 }, { } }; From eb025250ae5f92ccb8adcc60843ddd442401d869 Mon Sep 17 00:00:00 2001 From: Shakeel Butt Date: Thu, 26 Jul 2018 16:37:45 -0700 Subject: [PATCH 319/970] kvm, mm: account shadow page tables to kmemcg MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit d97e5e6160c0e0a23963ec198c7cb1c69e6bf9e8 upstream. The size of kvm's shadow page tables corresponds to the size of the guest virtual machines on the system. Large VMs can spend a significant amount of memory as shadow page tables which can not be left as system memory overhead. So, account shadow page tables to the kmemcg. [shakeelb@google.com: replace (GFP_KERNEL|__GFP_ACCOUNT) with GFP_KERNEL_ACCOUNT] Link: http://lkml.kernel.org/r/20180629140224.205849-1-shakeelb@google.com Link: http://lkml.kernel.org/r/20180627181349.149778-1-shakeelb@google.com Signed-off-by: Shakeel Butt Cc: Michal Hocko Cc: Johannes Weiner Cc: Vladimir Davydov Cc: Paolo Bonzini Cc: Greg Thelen Cc: Radim Krčmář Cc: Peter Feiner Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/mmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index a16c06604a56..8a4d6bc8fed0 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -698,7 +698,7 @@ static int mmu_topup_memory_cache_page(struct kvm_mmu_memory_cache *cache, if (cache->nobjs >= min) return 0; while (cache->nobjs < ARRAY_SIZE(cache->objects)) { - page = (void *)__get_free_page(GFP_KERNEL); + page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT); if (!page) return -ENOMEM; cache->objects[cache->nobjs++] = page; From 2a0ce1ff087c7a49b979ee2cea9cc1dd833c0822 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (VMware)" Date: Tue, 24 Jul 2018 19:13:31 -0400 Subject: [PATCH 320/970] tracing: Fix double free of event_trigger_data commit 1863c387259b629e4ebfb255495f67cd06aa229b upstream. Running the following: # cd /sys/kernel/debug/tracing # echo 500000 > buffer_size_kb [ Or some other number that takes up most of memory ] # echo snapshot > events/sched/sched_switch/trigger Triggers the following bug: ------------[ cut here ]------------ kernel BUG at mm/slub.c:296! invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI CPU: 6 PID: 6878 Comm: bash Not tainted 4.18.0-rc6-test+ #1066 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016 RIP: 0010:kfree+0x16c/0x180 Code: 05 41 0f b6 72 51 5b 5d 41 5c 4c 89 d7 e9 ac b3 f8 ff 48 89 d9 48 89 da 41 b8 01 00 00 00 5b 5d 41 5c 4c 89 d6 e9 f4 f3 ff ff <0f> 0b 0f 0b 48 8b 3d d9 d8 f9 00 e9 c1 fe ff ff 0f 1f 40 00 0f 1f RSP: 0018:ffffb654436d3d88 EFLAGS: 00010246 RAX: ffff91a9d50f3d80 RBX: ffff91a9d50f3d80 RCX: ffff91a9d50f3d80 RDX: 00000000000006a4 RSI: ffff91a9de5a60e0 RDI: ffff91a9d9803500 RBP: ffffffff8d267c80 R08: 00000000000260e0 R09: ffffffff8c1a56be R10: fffff0d404543cc0 R11: 0000000000000389 R12: ffffffff8c1a56be R13: ffff91a9d9930e18 R14: ffff91a98c0c2890 R15: ffffffff8d267d00 FS: 00007f363ea64700(0000) GS:ffff91a9de580000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055c1cacc8e10 CR3: 00000000d9b46003 CR4: 00000000001606e0 Call Trace: event_trigger_callback+0xee/0x1d0 event_trigger_write+0xfc/0x1a0 __vfs_write+0x33/0x190 ? handle_mm_fault+0x115/0x230 ? _cond_resched+0x16/0x40 vfs_write+0xb0/0x190 ksys_write+0x52/0xc0 do_syscall_64+0x5a/0x160 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7f363e16ab50 Code: 73 01 c3 48 8b 0d 38 83 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 79 db 2c 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 1e e3 01 00 48 89 04 24 RSP: 002b:00007fff9a4c6378 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 00007f363e16ab50 RDX: 0000000000000009 RSI: 000055c1cacc8e10 RDI: 0000000000000001 RBP: 000055c1cacc8e10 R08: 00007f363e435740 R09: 00007f363ea64700 R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000000009 R13: 0000000000000001 R14: 00007f363e4345e0 R15: 00007f363e4303c0 Modules linked in: ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device i915 snd_pcm snd_timer i2c_i801 snd soundcore i2c_algo_bit drm_kms_helper 86_pkg_temp_thermal video kvm_intel kvm irqbypass wmi e1000e ---[ end trace d301afa879ddfa25 ]--- The cause is because the register_snapshot_trigger() call failed to allocate the snapshot buffer, and then called unregister_trigger() which freed the data that was passed to it. Then on return to the function that called register_snapshot_trigger(), as it sees it failed to register, it frees the trigger_data again and causes a double free. By calling event_trigger_init() on the trigger_data (which only ups the reference counter for it), and then event_trigger_free() afterward, the trigger_data would not get freed by the registering trigger function as it would only up and lower the ref count for it. If the register trigger function fails, then the event_trigger_free() called after it will free the trigger data normally. Link: http://lkml.kernel.org/r/20180724191331.738eb819@gandalf.local.home Cc: stable@vger.kerne.org Fixes: 93e31ffbf417 ("tracing: Add 'snapshot' event trigger command") Reported-by: Masami Hiramatsu Reviewed-by: Masami Hiramatsu Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Greg Kroah-Hartman --- kernel/trace/trace_events_trigger.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c index 88f398af57fa..9fa8b76c58e8 100644 --- a/kernel/trace/trace_events_trigger.c +++ b/kernel/trace/trace_events_trigger.c @@ -678,6 +678,8 @@ event_trigger_callback(struct event_command *cmd_ops, goto out_free; out_reg: + /* Up the trigger_data count to make sure reg doesn't free it on failure */ + event_trigger_init(trigger_ops, trigger_data); ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file); /* * The above returns on success the # of functions enabled, @@ -685,11 +687,13 @@ event_trigger_callback(struct event_command *cmd_ops, * Consider no functions a failure too. */ if (!ret) { + cmd_ops->unreg(glob, trigger_ops, trigger_data, file); ret = -ENOENT; - goto out_free; - } else if (ret < 0) - goto out_free; - ret = 0; + } else if (ret > 0) + ret = 0; + + /* Down the counter of trigger_data or free it if not used anymore */ + event_trigger_free(trigger_ops, trigger_data); out: return ret; From a9737bb91c70d81654357963fe59df7b9fa8f03c Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (VMware)" Date: Wed, 25 Jul 2018 16:02:06 -0400 Subject: [PATCH 321/970] tracing: Fix possible double free in event_enable_trigger_func() commit 15cc78644d0075e76d59476a4467e7143860f660 upstream. There was a case that triggered a double free in event_trigger_callback() due to the called reg() function freeing the trigger_data and then it getting freed again by the error return by the caller. The solution there was to up the trigger_data ref count. Code inspection found that event_enable_trigger_func() has the same issue, but is not as easy to trigger (requires harder to trigger failures). It needs to be solved slightly different as it needs more to clean up when the reg() function fails. Link: http://lkml.kernel.org/r/20180725124008.7008e586@gandalf.local.home Cc: stable@vger.kernel.org Fixes: 7862ad1846e99 ("tracing: Add 'enable_event' and 'disable_event' event trigger commands") Reivewed-by: Masami Hiramatsu Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Greg Kroah-Hartman --- kernel/trace/trace_events_trigger.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c index 9fa8b76c58e8..8819944bbcbf 100644 --- a/kernel/trace/trace_events_trigger.c +++ b/kernel/trace/trace_events_trigger.c @@ -1389,6 +1389,9 @@ int event_enable_trigger_func(struct event_command *cmd_ops, goto out; } + /* Up the trigger_data count to make sure nothing frees it on failure */ + event_trigger_init(trigger_ops, trigger_data); + if (trigger) { number = strsep(&trigger, ":"); @@ -1439,6 +1442,7 @@ int event_enable_trigger_func(struct event_command *cmd_ops, goto out_disable; /* Just return zero, not the number of enabled functions */ ret = 0; + event_trigger_free(trigger_ops, trigger_data); out: return ret; @@ -1449,7 +1453,7 @@ int event_enable_trigger_func(struct event_command *cmd_ops, out_free: if (cmd_ops->set_filter) cmd_ops->set_filter(NULL, trigger_data, NULL); - kfree(trigger_data); + event_trigger_free(trigger_ops, trigger_data); kfree(enable_data); goto out; } From b38f8292f08eccf4fe4c598e2b7c4e970ded383e Mon Sep 17 00:00:00 2001 From: Snild Dolkow Date: Thu, 26 Jul 2018 09:15:39 +0200 Subject: [PATCH 322/970] kthread, tracing: Don't expose half-written comm when creating kthreads commit 3e536e222f2930534c252c1cc7ae799c725c5ff9 upstream. There is a window for racing when printing directly to task->comm, allowing other threads to see a non-terminated string. The vsnprintf function fills the buffer, counts the truncated chars, then finally writes the \0 at the end. creator other vsnprintf: fill (not terminated) count the rest trace_sched_waking(p): ... memcpy(comm, p->comm, TASK_COMM_LEN) write \0 The consequences depend on how 'other' uses the string. In our case, it was copied into the tracing system's saved cmdlines, a buffer of adjacent TASK_COMM_LEN-byte buffers (note the 'n' where 0 should be): crash-arm64> x/1024s savedcmd->saved_cmdlines | grep 'evenk' 0xffffffd5b3818640: "irq/497-pwr_evenkworker/u16:12" ...and a strcpy out of there would cause stack corruption: [224761.522292] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffff9bf9783c78 crash-arm64> kbt | grep 'comm\|trace_print_context' #6 0xffffff9bf9783c78 in trace_print_context+0x18c(+396) comm (char [16]) = "irq/497-pwr_even" crash-arm64> rd 0xffffffd4d0e17d14 8 ffffffd4d0e17d14: 2f71726900000000 5f7277702d373934 ....irq/497-pwr_ ffffffd4d0e17d24: 726f776b6e657665 3a3631752f72656b evenkworker/u16: ffffffd4d0e17d34: f9780248ff003231 cede60e0ffffff9b 12..H.x......`.. ffffffd4d0e17d44: cede60c8ffffffd4 00000fffffffffd4 .....`.......... The workaround in e09e28671 (use strlcpy in __trace_find_cmdline) was likely needed because of this same bug. Solved by vsnprintf:ing to a local buffer, then using set_task_comm(). This way, there won't be a window where comm is not terminated. Link: http://lkml.kernel.org/r/20180726071539.188015-1-snild@sony.com Cc: stable@vger.kernel.org Fixes: bc0c38d139ec7 ("ftrace: latency tracer infrastructure") Reviewed-by: Steven Rostedt (VMware) Signed-off-by: Snild Dolkow Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Greg Kroah-Hartman --- kernel/kthread.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/kernel/kthread.c b/kernel/kthread.c index c2c911a106cf..fbc230e41969 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -290,8 +290,14 @@ static struct task_struct *__kthread_create_on_node(int (*threadfn)(void *data), task = create->result; if (!IS_ERR(task)) { static const struct sched_param param = { .sched_priority = 0 }; + char name[TASK_COMM_LEN]; - vsnprintf(task->comm, sizeof(task->comm), namefmt, args); + /* + * task is already visible to other tasks, so updating + * COMM must be protected. + */ + vsnprintf(name, sizeof(name), namefmt, args); + set_task_comm(task, name); /* * root may have changed our (kthreadd's) priority or CPU mask. * The kernel thread should not inherit these properties. From 987e425ad386543e8d8c3fc9a2530424fc3eb575 Mon Sep 17 00:00:00 2001 From: Artem Savkov Date: Wed, 25 Jul 2018 16:20:38 +0200 Subject: [PATCH 323/970] tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure commit 57ea2a34adf40f3a6e88409aafcf803b8945619a upstream. If enable_trace_kprobe fails to enable the probe in enable_k(ret)probe it returns an error, but does not unset the tp flags it set previously. This results in a probe being considered enabled and failures like being unable to remove the probe through kprobe_events file since probes_open() expects every probe to be disabled. Link: http://lkml.kernel.org/r/20180725102826.8300-1-asavkov@redhat.com Link: http://lkml.kernel.org/r/20180725142038.4765-1-asavkov@redhat.com Cc: Ingo Molnar Cc: stable@vger.kernel.org Fixes: 41a7dd420c57 ("tracing/kprobes: Support ftrace_event_file base multibuffer") Acked-by: Masami Hiramatsu Reviewed-by: Josh Poimboeuf Signed-off-by: Artem Savkov Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Greg Kroah-Hartman --- kernel/trace/trace_kprobe.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c index ea3ed03fed7e..f76c0e1433c0 100644 --- a/kernel/trace/trace_kprobe.c +++ b/kernel/trace/trace_kprobe.c @@ -359,11 +359,10 @@ static struct trace_kprobe *find_trace_kprobe(const char *event, static int enable_trace_kprobe(struct trace_kprobe *tk, struct trace_event_file *file) { + struct event_file_link *link; int ret = 0; if (file) { - struct event_file_link *link; - link = kmalloc(sizeof(*link), GFP_KERNEL); if (!link) { ret = -ENOMEM; @@ -383,6 +382,16 @@ enable_trace_kprobe(struct trace_kprobe *tk, struct trace_event_file *file) else ret = enable_kprobe(&tk->rp.kp); } + + if (ret) { + if (file) { + list_del_rcu(&link->list); + kfree(link); + tk->tp.flags &= ~TP_FLAG_TRACE; + } else { + tk->tp.flags &= ~TP_FLAG_PROFILE; + } + } out: return ret; } From b985a7303de1d25e6040f7c7693072acfc6139ae Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (VMware)" Date: Wed, 25 Jul 2018 22:28:56 -0400 Subject: [PATCH 324/970] tracing: Quiet gcc warning about maybe unused link variable commit 2519c1bbe38d7acacc9aacba303ca6f97482ed53 upstream. Commit 57ea2a34adf4 ("tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure") added an if statement that depends on another if statement that gcc doesn't see will initialize the "link" variable and gives the warning: "warning: 'link' may be used uninitialized in this function" It is really a false positive, but to quiet the warning, and also to make sure that it never actually is used uninitialized, initialize the "link" variable to NULL and add an if (!WARN_ON_ONCE(!link)) where the compiler thinks it could be used uninitialized. Cc: stable@vger.kernel.org Fixes: 57ea2a34adf4 ("tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure") Reported-by: kbuild test robot Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Greg Kroah-Hartman --- kernel/trace/trace_kprobe.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c index f76c0e1433c0..3b4cd44ad323 100644 --- a/kernel/trace/trace_kprobe.c +++ b/kernel/trace/trace_kprobe.c @@ -359,7 +359,7 @@ static struct trace_kprobe *find_trace_kprobe(const char *event, static int enable_trace_kprobe(struct trace_kprobe *tk, struct trace_event_file *file) { - struct event_file_link *link; + struct event_file_link *link = NULL; int ret = 0; if (file) { @@ -385,7 +385,9 @@ enable_trace_kprobe(struct trace_kprobe *tk, struct trace_event_file *file) if (ret) { if (file) { - list_del_rcu(&link->list); + /* Notice the if is true on not WARN() */ + if (!WARN_ON_ONCE(!link)) + list_del_rcu(&link->list); kfree(link); tk->tp.flags &= ~TP_FLAG_TRACE; } else { From e8d77bd71e80897abcf30441ab26f5808383ddaf Mon Sep 17 00:00:00 2001 From: Johannes Weiner Date: Mon, 23 Jul 2018 10:18:23 -0400 Subject: [PATCH 325/970] arm64: fix vmemmap BUILD_BUG_ON() triggering on !vmemmap setups commit 7b0eb6b41a08fa1fa0d04b1c53becd62b5fbfaee upstream. Arnd reports the following arm64 randconfig build error with the PSI patches that add another page flag: /git/arm-soc/arch/arm64/mm/init.c: In function 'mem_init': /git/arm-soc/include/linux/compiler.h:357:38: error: call to '__compiletime_assert_618' declared with attribute error: BUILD_BUG_ON failed: sizeof(struct page) > (1 << STRUCT_PAGE_MAX_SHIFT) The additional page flag causes other information stored in page->flags to get bumped into their own struct page member: #if SECTIONS_WIDTH+ZONES_WIDTH+NODES_SHIFT+LAST_CPUPID_SHIFT <= BITS_PER_LONG - NR_PAGEFLAGS #define LAST_CPUPID_WIDTH LAST_CPUPID_SHIFT #else #define LAST_CPUPID_WIDTH 0 #endif #if defined(CONFIG_NUMA_BALANCING) && LAST_CPUPID_WIDTH == 0 #define LAST_CPUPID_NOT_IN_PAGE_FLAGS #endif which in turn causes the struct page size to exceed the size set in STRUCT_PAGE_MAX_SHIFT. This value is an an estimate used to size the VMEMMAP page array according to address space and struct page size. However, the check is performed - and triggers here - on a !VMEMMAP config, which consumes an additional 22 page bits for the sparse section id. When VMEMMAP is enabled, those bits are returned, cpupid doesn't need its own member, and the page passes the VMEMMAP check. Restrict that check to the situation it was meant to check: that we are sizing the VMEMMAP page array correctly. Says Arnd: Further experiments show that the build error already existed before, but was only triggered with larger values of CONFIG_NR_CPU and/or CONFIG_NODES_SHIFT that might be used in actual configurations but not in randconfig builds. With longer CPU and node masks, I could recreate the problem with kernels as old as linux-4.7 when arm64 NUMA support got added. Reported-by: Arnd Bergmann Tested-by: Arnd Bergmann Cc: stable@vger.kernel.org Fixes: 1a2db300348b ("arm64, numa: Add NUMA support for arm64 platforms.") Fixes: 3e1907d5bf5a ("arm64: mm: move vmemmap region right below the linear region") Signed-off-by: Johannes Weiner Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman --- arch/arm64/mm/init.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index 9b8b477c363d..9d07b421f090 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -468,11 +468,13 @@ void __init mem_init(void) BUILD_BUG_ON(TASK_SIZE_32 > TASK_SIZE_64); #endif +#ifdef CONFIG_SPARSEMEM_VMEMMAP /* * Make sure we chose the upper bound of sizeof(struct page) - * correctly. + * correctly when sizing the VMEMMAP array. */ BUILD_BUG_ON(sizeof(struct page) > (1 << STRUCT_PAGE_MAX_SHIFT)); +#endif if (PAGE_SIZE >= 16384 && get_num_physpages() <= 128) { extern int sysctl_overcommit_memory; From 7ff1861f49e64da20a9b9451ebbdc9bbf68b8a4b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Antti=20Sepp=C3=A4l=C3=A4?= Date: Thu, 5 Jul 2018 17:31:53 +0300 Subject: [PATCH 326/970] usb: dwc2: Fix DMA alignment to start at allocated boundary MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 56406e017a883b54b339207b230f85599f4d70ae upstream. The commit 3bc04e28a030 ("usb: dwc2: host: Get aligned DMA in a more supported way") introduced a common way to align DMA allocations. The code in the commit aligns the struct dma_aligned_buffer but the actual DMA address pointed by data[0] gets aligned to an offset from the allocated boundary by the kmalloc_ptr and the old_xfer_buffer pointers. This is against the recommendation in Documentation/DMA-API.txt which states: Therefore, it is recommended that driver writers who don't take special care to determine the cache line size at run time only map virtual regions that begin and end on page boundaries (which are guaranteed also to be cache line boundaries). The effect of this is that architectures with non-coherent DMA caches may run into memory corruption or kernel crashes with Unhandled kernel unaligned accesses exceptions. Fix the alignment by positioning the DMA area in front of the allocation and use memory at the end of the area for storing the orginal transfer_buffer pointer. This may have the added benefit of increased performance as the DMA area is now fully aligned on all architectures. Tested with Lantiq xRX200 (MIPS) and RPi Model B Rev 2 (ARM). Fixes: 3bc04e28a030 ("usb: dwc2: host: Get aligned DMA in a more supported way") Cc: Reviewed-by: Douglas Anderson [ Antti: backported to 4.9: edited difference in whitespace ] Signed-off-by: Antti Seppälä Signed-off-by: Felipe Balbi Signed-off-by: Greg Kroah-Hartman --- drivers/usb/dwc2/hcd.c | 44 ++++++++++++++++++++++-------------------- 1 file changed, 23 insertions(+), 21 deletions(-) diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c index 0a0cf154814b..984d6aae7529 100644 --- a/drivers/usb/dwc2/hcd.c +++ b/drivers/usb/dwc2/hcd.c @@ -2544,34 +2544,29 @@ static void dwc2_hc_init_xfer(struct dwc2_hsotg *hsotg, #define DWC2_USB_DMA_ALIGN 4 -struct dma_aligned_buffer { - void *kmalloc_ptr; - void *old_xfer_buffer; - u8 data[0]; -}; - static void dwc2_free_dma_aligned_buffer(struct urb *urb) { - struct dma_aligned_buffer *temp; + void *stored_xfer_buffer; if (!(urb->transfer_flags & URB_ALIGNED_TEMP_BUFFER)) return; - temp = container_of(urb->transfer_buffer, - struct dma_aligned_buffer, data); + /* Restore urb->transfer_buffer from the end of the allocated area */ + memcpy(&stored_xfer_buffer, urb->transfer_buffer + + urb->transfer_buffer_length, sizeof(urb->transfer_buffer)); if (usb_urb_dir_in(urb)) - memcpy(temp->old_xfer_buffer, temp->data, + memcpy(stored_xfer_buffer, urb->transfer_buffer, urb->transfer_buffer_length); - urb->transfer_buffer = temp->old_xfer_buffer; - kfree(temp->kmalloc_ptr); + kfree(urb->transfer_buffer); + urb->transfer_buffer = stored_xfer_buffer; urb->transfer_flags &= ~URB_ALIGNED_TEMP_BUFFER; } static int dwc2_alloc_dma_aligned_buffer(struct urb *urb, gfp_t mem_flags) { - struct dma_aligned_buffer *temp, *kmalloc_ptr; + void *kmalloc_ptr; size_t kmalloc_size; if (urb->num_sgs || urb->sg || @@ -2579,22 +2574,29 @@ static int dwc2_alloc_dma_aligned_buffer(struct urb *urb, gfp_t mem_flags) !((uintptr_t)urb->transfer_buffer & (DWC2_USB_DMA_ALIGN - 1))) return 0; - /* Allocate a buffer with enough padding for alignment */ + /* + * Allocate a buffer with enough padding for original transfer_buffer + * pointer. This allocation is guaranteed to be aligned properly for + * DMA + */ kmalloc_size = urb->transfer_buffer_length + - sizeof(struct dma_aligned_buffer) + DWC2_USB_DMA_ALIGN - 1; + sizeof(urb->transfer_buffer); kmalloc_ptr = kmalloc(kmalloc_size, mem_flags); if (!kmalloc_ptr) return -ENOMEM; - /* Position our struct dma_aligned_buffer such that data is aligned */ - temp = PTR_ALIGN(kmalloc_ptr + 1, DWC2_USB_DMA_ALIGN) - 1; - temp->kmalloc_ptr = kmalloc_ptr; - temp->old_xfer_buffer = urb->transfer_buffer; + /* + * Position value of original urb->transfer_buffer pointer to the end + * of allocation for later referencing + */ + memcpy(kmalloc_ptr + urb->transfer_buffer_length, + &urb->transfer_buffer, sizeof(urb->transfer_buffer)); + if (usb_urb_dir_out(urb)) - memcpy(temp->data, urb->transfer_buffer, + memcpy(kmalloc_ptr, urb->transfer_buffer, urb->transfer_buffer_length); - urb->transfer_buffer = temp->data; + urb->transfer_buffer = kmalloc_ptr; urb->transfer_flags |= URB_ALIGNED_TEMP_BUFFER; From 31ad104de6feb222500abe23a2cdbf08b6794c77 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Thu, 14 Jun 2018 15:27:34 -0700 Subject: [PATCH 327/970] kcov: ensure irq code sees a valid area [ Upstream commit c9484b986ef03492357fddd50afbdd02929cfa72 ] Patch series "kcov: fix unexpected faults". These patches fix a few issues where KCOV code could trigger recursive faults, discovered while debugging a patch enabling KCOV for arch/arm: * On CONFIG_PREEMPT kernels, there's a small race window where __sanitizer_cov_trace_pc() can see a bogus kcov_area. * Lazy faulting of the vmalloc area can cause mutual recursion between fault handling code and __sanitizer_cov_trace_pc(). * During the context switch, switching the mm can cause the kcov_area to be transiently unmapped. These are prerequisites for enabling KCOV on arm, but the issues themsevles are generic -- we just happen to avoid them by chance rather than design on x86-64 and arm64. This patch (of 3): For kernels built with CONFIG_PREEMPT, some C code may execute before or after the interrupt handler, while the hardirq count is zero. In these cases, in_task() can return true. A task can be interrupted in the middle of a KCOV_DISABLE ioctl while it resets the task's kcov data via kcov_task_init(). Instrumented code executed during this period will call __sanitizer_cov_trace_pc(), and as in_task() returns true, will inspect t->kcov_mode before trying to write to t->kcov_area. In kcov_init_task() we update t->kcov_{mode,area,size} with plain stores, which may be re-ordered, torn, etc. Thus __sanitizer_cov_trace_pc() may see bogus values for any of these fields, and may attempt to write to memory which is not mapped. Let's avoid this by using WRITE_ONCE() to set t->kcov_mode, with a barrier() to ensure this is ordered before we clear t->kov_{area,size}. This ensures that any code execute while kcov_init_task() is preempted will either see valid values for t->kcov_{area,size}, or will see that t->kcov_mode is KCOV_MODE_DISABLED, and bail out without touching t->kcov_area. Link: http://lkml.kernel.org/r/20180504135535.53744-2-mark.rutland@arm.com Signed-off-by: Mark Rutland Acked-by: Andrey Ryabinin Cc: Dmitry Vyukov Cc: Ingo Molnar Cc: Peter Zijlstra Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- kernel/kcov.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kernel/kcov.c b/kernel/kcov.c index 3883df58aa12..b0ec31493fdc 100644 --- a/kernel/kcov.c +++ b/kernel/kcov.c @@ -103,7 +103,8 @@ static void kcov_put(struct kcov *kcov) void kcov_task_init(struct task_struct *t) { - t->kcov_mode = KCOV_MODE_DISABLED; + WRITE_ONCE(t->kcov_mode, KCOV_MODE_DISABLED); + barrier(); t->kcov_size = 0; t->kcov_area = NULL; t->kcov = NULL; From acd9aba8e481eaf88799bfba33590c1f5c322eb2 Mon Sep 17 00:00:00 2001 From: Juergen Gross Date: Tue, 12 Jun 2018 08:57:53 +0200 Subject: [PATCH 328/970] xen/netfront: raise max number of slots in xennet_get_responses() [ Upstream commit 57f230ab04d2910a06d17d988f1c4d7586a59113 ] The max number of slots used in xennet_get_responses() is set to MAX_SKB_FRAGS + (rx->status <= RX_COPY_THRESHOLD). In old kernel-xen MAX_SKB_FRAGS was 18, while nowadays it is 17. This difference is resulting in frequent messages "too many slots" and a reduced network throughput for some workloads (factor 10 below that of a kernel-xen based guest). Replacing MAX_SKB_FRAGS by XEN_NETIF_NR_SLOTS_MIN for calculation of the max number of slots to use solves that problem (tests showed no more messages "too many slots" and throughput was as high as with the kernel-xen based guest system). Replace MAX_SKB_FRAGS-2 by XEN_NETIF_NR_SLOTS_MIN-1 in netfront_tx_slot_available() for making it clearer what is really being tested without actually modifying the tested value. Signed-off-by: Juergen Gross Reviewed-by: Boris Ostrovsky Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/xen-netfront.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c index 520050eae836..a5908e4c06cb 100644 --- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -238,7 +238,7 @@ static void rx_refill_timeout(unsigned long data) static int netfront_tx_slot_available(struct netfront_queue *queue) { return (queue->tx.req_prod_pvt - queue->tx.rsp_cons) < - (NET_TX_RING_SIZE - MAX_SKB_FRAGS - 2); + (NET_TX_RING_SIZE - XEN_NETIF_NR_SLOTS_MIN - 1); } static void xennet_maybe_wake_tx(struct netfront_queue *queue) @@ -789,7 +789,7 @@ static int xennet_get_responses(struct netfront_queue *queue, RING_IDX cons = queue->rx.rsp_cons; struct sk_buff *skb = xennet_get_rx_skb(queue, cons); grant_ref_t ref = xennet_get_rx_ref(queue, cons); - int max = MAX_SKB_FRAGS + (rx->status <= RX_COPY_THRESHOLD); + int max = XEN_NETIF_NR_SLOTS_MIN + (rx->status <= RX_COPY_THRESHOLD); int slots = 1; int err = 0; unsigned long ret; From 9f9e506d8e6912c5b648288e68e6e442d6d9e2d7 Mon Sep 17 00:00:00 2001 From: Zhouyang Jia Date: Mon, 11 Jun 2018 16:18:40 +0800 Subject: [PATCH 329/970] ALSA: emu10k1: add error handling for snd_ctl_add [ Upstream commit 6d531e7b972cb62ded011c2dfcc2d9f72ea6c421 ] When snd_ctl_add fails, the lack of error-handling code may cause unexpected results. This patch adds error-handling code after calling snd_ctl_add. Signed-off-by: Zhouyang Jia Signed-off-by: Takashi Iwai Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- sound/pci/emu10k1/emupcm.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/sound/pci/emu10k1/emupcm.c b/sound/pci/emu10k1/emupcm.c index 37be1e14d756..0d2bb30d9f7e 100644 --- a/sound/pci/emu10k1/emupcm.c +++ b/sound/pci/emu10k1/emupcm.c @@ -1850,7 +1850,9 @@ int snd_emu10k1_pcm_efx(struct snd_emu10k1 *emu, int device) if (!kctl) return -ENOMEM; kctl->id.device = device; - snd_ctl_add(emu->card, kctl); + err = snd_ctl_add(emu->card, kctl); + if (err < 0) + return err; snd_pcm_lib_preallocate_pages_for_all(pcm, SNDRV_DMA_TYPE_DEV, snd_dma_pci_data(emu->pci), 64*1024, 64*1024); From ca08131ee77bbdd2e1a72b02a06188b3bdd3e192 Mon Sep 17 00:00:00 2001 From: Zhouyang Jia Date: Mon, 11 Jun 2018 16:04:06 +0800 Subject: [PATCH 330/970] ALSA: fm801: add error handling for snd_ctl_add [ Upstream commit ef1ffbe7889e99f5b5cccb41c89e5c94f50f3218 ] When snd_ctl_add fails, the lack of error-handling code may cause unexpected results. This patch adds error-handling code after calling snd_ctl_add. Signed-off-by: Zhouyang Jia Acked-by: Andy Shevchenko Signed-off-by: Takashi Iwai Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- sound/pci/fm801.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/sound/pci/fm801.c b/sound/pci/fm801.c index a178e0d03088..8561f60b4284 100644 --- a/sound/pci/fm801.c +++ b/sound/pci/fm801.c @@ -1068,11 +1068,19 @@ static int snd_fm801_mixer(struct fm801 *chip) if ((err = snd_ac97_mixer(chip->ac97_bus, &ac97, &chip->ac97_sec)) < 0) return err; } - for (i = 0; i < FM801_CONTROLS; i++) - snd_ctl_add(chip->card, snd_ctl_new1(&snd_fm801_controls[i], chip)); + for (i = 0; i < FM801_CONTROLS; i++) { + err = snd_ctl_add(chip->card, + snd_ctl_new1(&snd_fm801_controls[i], chip)); + if (err < 0) + return err; + } if (chip->multichannel) { - for (i = 0; i < FM801_CONTROLS_MULTI; i++) - snd_ctl_add(chip->card, snd_ctl_new1(&snd_fm801_controls_multi[i], chip)); + for (i = 0; i < FM801_CONTROLS_MULTI; i++) { + err = snd_ctl_add(chip->card, + snd_ctl_new1(&snd_fm801_controls_multi[i], chip)); + if (err < 0) + return err; + } } return 0; } From 8bccc6c9025f1ecf495dc0889bc1aea28d1e7fec Mon Sep 17 00:00:00 2001 From: Scott Mayhew Date: Fri, 8 Jun 2018 16:31:46 -0400 Subject: [PATCH 331/970] nfsd: fix potential use-after-free in nfsd4_decode_getdeviceinfo [ Upstream commit 3171822fdcdd6e6d536047c425af6dc7a92dc585 ] When running a fuzz tester against a KASAN-enabled kernel, the following splat periodically occurs. The problem occurs when the test sends a GETDEVICEINFO request with a malformed xdr array (size but no data) for gdia_notify_types and the array size is > 0x3fffffff, which results in an overflow in the value of nbytes which is passed to read_buf(). If the array size is 0x40000000, 0x80000000, or 0xc0000000, then after the overflow occurs, the value of nbytes 0, and when that happens the pointer returned by read_buf() points to the end of the xdr data (i.e. argp->end) when really it should be returning NULL. Fix this by returning NFS4ERR_BAD_XDR if the array size is > 1000 (this value is arbitrary, but it's the same threshold used by nfsd4_decode_bitmap()... in could really be any value >= 1 since it's expected to get at most a single bitmap in gdia_notify_types). [ 119.256854] ================================================================== [ 119.257611] BUG: KASAN: use-after-free in nfsd4_decode_getdeviceinfo+0x5a4/0x5b0 [nfsd] [ 119.258422] Read of size 4 at addr ffff880113ada000 by task nfsd/538 [ 119.259146] CPU: 0 PID: 538 Comm: nfsd Not tainted 4.17.0+ #1 [ 119.259662] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014 [ 119.261202] Call Trace: [ 119.262265] dump_stack+0x71/0xab [ 119.263371] print_address_description+0x6a/0x270 [ 119.264609] kasan_report+0x258/0x380 [ 119.265854] ? nfsd4_decode_getdeviceinfo+0x5a4/0x5b0 [nfsd] [ 119.267291] nfsd4_decode_getdeviceinfo+0x5a4/0x5b0 [nfsd] [ 119.268549] ? nfs4svc_decode_compoundargs+0xa5b/0x13c0 [nfsd] [ 119.269873] ? nfsd4_decode_sequence+0x490/0x490 [nfsd] [ 119.271095] nfs4svc_decode_compoundargs+0xa5b/0x13c0 [nfsd] [ 119.272393] ? nfsd4_release_compoundargs+0x1b0/0x1b0 [nfsd] [ 119.273658] nfsd_dispatch+0x183/0x850 [nfsd] [ 119.274918] svc_process+0x161c/0x31a0 [sunrpc] [ 119.276172] ? svc_printk+0x190/0x190 [sunrpc] [ 119.277386] ? svc_xprt_release+0x451/0x680 [sunrpc] [ 119.278622] nfsd+0x2b9/0x430 [nfsd] [ 119.279771] ? nfsd_destroy+0x1c0/0x1c0 [nfsd] [ 119.281157] kthread+0x2db/0x390 [ 119.282347] ? kthread_create_worker_on_cpu+0xc0/0xc0 [ 119.283756] ret_from_fork+0x35/0x40 [ 119.286041] Allocated by task 436: [ 119.287525] kasan_kmalloc+0xa0/0xd0 [ 119.288685] kmem_cache_alloc+0xe9/0x1f0 [ 119.289900] get_empty_filp+0x7b/0x410 [ 119.291037] path_openat+0xca/0x4220 [ 119.292242] do_filp_open+0x182/0x280 [ 119.293411] do_sys_open+0x216/0x360 [ 119.294555] do_syscall_64+0xa0/0x2f0 [ 119.295721] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 119.298068] Freed by task 436: [ 119.299271] __kasan_slab_free+0x130/0x180 [ 119.300557] kmem_cache_free+0x78/0x210 [ 119.301823] rcu_process_callbacks+0x35b/0xbd0 [ 119.303162] __do_softirq+0x192/0x5ea [ 119.305443] The buggy address belongs to the object at ffff880113ada000 which belongs to the cache filp of size 256 [ 119.308556] The buggy address is located 0 bytes inside of 256-byte region [ffff880113ada000, ffff880113ada100) [ 119.311376] The buggy address belongs to the page: [ 119.312728] page:ffffea00044eb680 count:1 mapcount:0 mapping:0000000000000000 index:0xffff880113ada780 [ 119.314428] flags: 0x17ffe000000100(slab) [ 119.315740] raw: 0017ffe000000100 0000000000000000 ffff880113ada780 00000001000c0001 [ 119.317379] raw: ffffea0004553c60 ffffea00045c11e0 ffff88011b167e00 0000000000000000 [ 119.319050] page dumped because: kasan: bad access detected [ 119.321652] Memory state around the buggy address: [ 119.322993] ffff880113ad9f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 119.324515] ffff880113ad9f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 119.326087] >ffff880113ada000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 119.327547] ^ [ 119.328730] ffff880113ada080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 119.330218] ffff880113ada100: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb [ 119.331740] ================================================================== Signed-off-by: Scott Mayhew Signed-off-by: J. Bruce Fields Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/nfsd/nfs4xdr.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index bdbd9e6d1ace..b16a6c036352 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1536,6 +1536,8 @@ nfsd4_decode_getdeviceinfo(struct nfsd4_compoundargs *argp, gdev->gd_maxcount = be32_to_cpup(p++); num = be32_to_cpup(p++); if (num) { + if (num > 1000) + goto xdr_error; READ_BUF(4 * num); gdev->gd_notify_types = be32_to_cpup(p++); for (i = 1; i < num; i++) { From c6e8116307f54c8639c1ef02516cc97c31bcd796 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Wed, 11 Apr 2018 11:15:48 +0200 Subject: [PATCH 332/970] vfio: platform: Fix reset module leak in error path [ Upstream commit 28a68387888997e8a7fa57940ea5d55f2e16b594 ] If the IOMMU group setup fails, the reset module is not released. Fixes: b5add544d677d363 ("vfio, platform: make reset driver a requirement by default") Signed-off-by: Geert Uytterhoeven Reviewed-by: Eric Auger Reviewed-by: Simon Horman Acked-by: Eric Auger Signed-off-by: Alex Williamson Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/vfio/platform/vfio_platform_common.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c index d78142830754..d143d08c4f0f 100644 --- a/drivers/vfio/platform/vfio_platform_common.c +++ b/drivers/vfio/platform/vfio_platform_common.c @@ -696,18 +696,23 @@ int vfio_platform_probe_common(struct vfio_platform_device *vdev, group = vfio_iommu_group_get(dev); if (!group) { pr_err("VFIO: No IOMMU group for device %s\n", vdev->name); - return -EINVAL; + ret = -EINVAL; + goto put_reset; } ret = vfio_add_group_dev(dev, &vfio_platform_ops, vdev); - if (ret) { - vfio_iommu_group_put(group, dev); - return ret; - } + if (ret) + goto put_iommu; mutex_init(&vdev->igate); return 0; + +put_iommu: + vfio_iommu_group_put(group, dev); +put_reset: + vfio_platform_put_reset(vdev); + return ret; } EXPORT_SYMBOL_GPL(vfio_platform_probe_common); From e18d3280da8bdf690c320bb6c9cb8f98e4fc5e94 Mon Sep 17 00:00:00 2001 From: Chintan Pandya Date: Thu, 7 Jun 2018 17:06:50 -0700 Subject: [PATCH 333/970] mm: vmalloc: avoid racy handling of debugobjects in vunmap [ Upstream commit f3c01d2f3ade6790db67f80fef60df84424f8964 ] Currently, __vunmap flow is, 1) Release the VM area 2) Free the debug objects corresponding to that vm area. This leave some race window open. 1) Release the VM area 1.5) Some other client gets the same vm area 1.6) This client allocates new debug objects on the same vm area 2) Free the debug objects corresponding to this vm area. Here, we actually free 'other' client's debug objects. Fix this by freeing the debug objects first and then releasing the VM area. Link: http://lkml.kernel.org/r/1523961828-9485-2-git-send-email-cpandya@codeaurora.org Signed-off-by: Chintan Pandya Reviewed-by: Andrew Morton Cc: Ard Biesheuvel Cc: Byungchul Park Cc: Catalin Marinas Cc: Florian Fainelli Cc: Johannes Weiner Cc: Laura Abbott Cc: Vlastimil Babka Cc: Wei Yang Cc: Yisheng Xie Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- mm/vmalloc.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 195de42bea1f..fa598162dbf0 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -1494,7 +1494,7 @@ static void __vunmap(const void *addr, int deallocate_pages) addr)) return; - area = remove_vm_area(addr); + area = find_vmap_area((unsigned long)addr)->vm; if (unlikely(!area)) { WARN(1, KERN_ERR "Trying to vfree() nonexistent vm area (%p)\n", addr); @@ -1504,6 +1504,7 @@ static void __vunmap(const void *addr, int deallocate_pages) debug_check_no_locks_freed(addr, get_vm_area_size(area)); debug_check_no_obj_freed(addr, get_vm_area_size(area)); + remove_vm_area(addr); if (deallocate_pages) { int i; From c99dbd95723e7bd5616e0dbe4e829e26d4db7177 Mon Sep 17 00:00:00 2001 From: Mathieu Malaterre Date: Thu, 7 Jun 2018 17:05:17 -0700 Subject: [PATCH 334/970] mm/slub.c: add __printf verification to slab_err() [ Upstream commit a38965bf941b7c2af50de09c96bc5f03e136caef ] __printf is useful to verify format and arguments. Remove the following warning (with W=1): mm/slub.c:721:2: warning: function might be possible candidate for `gnu_printf' format attribute [-Wsuggest-attribute=format] Link: http://lkml.kernel.org/r/20180505200706.19986-1-malat@debian.org Signed-off-by: Mathieu Malaterre Reviewed-by: Andrew Morton Cc: Christoph Lameter Cc: Pekka Enberg Cc: David Rientjes Cc: Joonsoo Kim Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- mm/slub.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/slub.c b/mm/slub.c index edc79ca3c6d5..e0ce5dec84ba 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -673,7 +673,7 @@ void object_err(struct kmem_cache *s, struct page *page, print_trailer(s, page, object); } -static void slab_err(struct kmem_cache *s, struct page *page, +static __printf(3, 4) void slab_err(struct kmem_cache *s, struct page *page, const char *fmt, ...) { va_list args; From fda8caa9cb0c80deab362d73a8fd2cd322ce20fc Mon Sep 17 00:00:00 2001 From: Alexandre Belloni Date: Tue, 5 Jun 2018 23:09:14 +0200 Subject: [PATCH 335/970] rtc: ensure rtc_set_alarm fails when alarms are not supported [ Upstream commit abfdff44bc38e9e2ef7929f633fb8462632299d4 ] When using RTC_ALM_SET or RTC_WKALM_SET with rtc_wkalrm.enabled not set, rtc_timer_enqueue() is not called and rtc_set_alarm() may succeed but the subsequent RTC_AIE_ON ioctl will fail. RTC_ALM_READ would also fail in that case. Ensure rtc_set_alarm() fails when alarms are not supported to avoid letting programs think the alarms are working for a particular RTC when they are not. Signed-off-by: Alexandre Belloni Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/rtc/interface.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/rtc/interface.c b/drivers/rtc/interface.c index 25cf3069e2e7..4131bfb2cc61 100644 --- a/drivers/rtc/interface.c +++ b/drivers/rtc/interface.c @@ -359,6 +359,11 @@ int rtc_set_alarm(struct rtc_device *rtc, struct rtc_wkalrm *alarm) { int err; + if (!rtc->ops) + return -ENODEV; + else if (!rtc->ops->set_alarm) + return -EINVAL; + err = rtc_valid_tm(&alarm->time); if (err != 0) return err; From 56295051214ef9616d90ef34a3fe43628985433f Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Tue, 5 Jun 2018 14:14:16 +0200 Subject: [PATCH 336/970] perf tools: Fix pmu events parsing rule [ Upstream commit ceac7b79df7bd67ef9aaf464b0179a2686aff4ee ] Currently all the event parsing fails end up in the event_pmu rule, and display misleading help like: $ perf stat -e inst kill event syntax error: 'inst' \___ Cannot find PMU `inst'. Missing kernel support? ... The reason is that the event_pmu is too strong and match also single string. Changing it to force the '/' separators to be part of the rule, and getting the proper error now: $ perf stat -e inst kill event syntax error: 'inst' \___ parser error Run 'perf list' for a list of valid events ... Suggested-by: Adrian Hunter Signed-off-by: Jiri Olsa Tested-by: Arnaldo Carvalho de Melo Cc: Adrian Hunter Cc: Alexander Shishkin Cc: David Ahern Cc: Namhyung Kim Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/20180605121416.31645-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- tools/perf/util/parse-events.y | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y index 879115f93edc..98a4205a5f8a 100644 --- a/tools/perf/util/parse-events.y +++ b/tools/perf/util/parse-events.y @@ -68,6 +68,7 @@ static void inc_group_count(struct list_head *list, %type value_sym %type event_config %type opt_event_config +%type opt_pmu_config %type event_term %type event_pmu %type event_legacy_symbol @@ -219,7 +220,7 @@ event_def: event_pmu | event_bpf_file event_pmu: -PE_NAME opt_event_config +PE_NAME opt_pmu_config { struct parse_events_evlist *data = _data; struct list_head *list; @@ -482,6 +483,17 @@ opt_event_config: $$ = NULL; } +opt_pmu_config: +'/' event_config '/' +{ + $$ = $2; +} +| +'/' '/' +{ + $$ = NULL; +} + start_terms: event_config { struct parse_events_terms *data = _data; From 6e02c062e94a235b50dd2ec15068026e3a841c1e Mon Sep 17 00:00:00 2001 From: Jozsef Kadlecsik Date: Thu, 31 May 2018 18:45:21 +0200 Subject: [PATCH 337/970] netfilter: ipset: List timing out entries with "timeout 1" instead of zero [ Upstream commit bd975e691486ba52790ba23cc9b4fecab7bc0d31 ] When listing sets with timeout support, there's a probability that just timing out entries with "0" timeout value is listed/saved. However when restoring the saved list, the zero timeout value means permanent elelements. The new behaviour is that timing out entries are listed with "timeout 1" instead of zero. Fixes netfilter bugzilla #1258. Signed-off-by: Jozsef Kadlecsik Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- include/linux/netfilter/ipset/ip_set_timeout.h | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/include/linux/netfilter/ipset/ip_set_timeout.h b/include/linux/netfilter/ipset/ip_set_timeout.h index 1d6a935c1ac5..8793f5a7b820 100644 --- a/include/linux/netfilter/ipset/ip_set_timeout.h +++ b/include/linux/netfilter/ipset/ip_set_timeout.h @@ -65,8 +65,14 @@ ip_set_timeout_set(unsigned long *timeout, u32 value) static inline u32 ip_set_timeout_get(unsigned long *timeout) { - return *timeout == IPSET_ELEM_PERMANENT ? 0 : - jiffies_to_msecs(*timeout - jiffies)/MSEC_PER_SEC; + u32 t; + + if (*timeout == IPSET_ELEM_PERMANENT) + return 0; + + t = jiffies_to_msecs(*timeout - jiffies)/MSEC_PER_SEC; + /* Zero value in userspace means no timeout */ + return t == 0 ? 1 : t; } #endif /* __KERNEL__ */ From 73298a828c90398d582ec0e204b637e9bbee2dd5 Mon Sep 17 00:00:00 2001 From: Cong Wang Date: Fri, 1 Jun 2018 11:31:44 -0700 Subject: [PATCH 338/970] infiniband: fix a possible use-after-free bug [ Upstream commit cb2595c1393b4a5211534e6f0a0fbad369e21ad8 ] ucma_process_join() will free the new allocated "mc" struct, if there is any error after that, especially the copy_to_user(). But in parallel, ucma_leave_multicast() could find this "mc" through idr_find() before ucma_process_join() frees it, since it is already published. So "mc" could be used in ucma_leave_multicast() after it is been allocated and freed in ucma_process_join(), since we don't refcnt it. Fix this by separating "publish" from ID allocation, so that we can get an ID first and publish it later after copy_to_user(). Fixes: c8f6a362bf3e ("RDMA/cma: Add multicast communication support") Reported-by: Noam Rathaus Signed-off-by: Cong Wang Signed-off-by: Jason Gunthorpe Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/core/ucma.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c index a036d7087ddf..3bef6d4ffe6f 100644 --- a/drivers/infiniband/core/ucma.c +++ b/drivers/infiniband/core/ucma.c @@ -218,7 +218,7 @@ static struct ucma_multicast* ucma_alloc_multicast(struct ucma_context *ctx) return NULL; mutex_lock(&mut); - mc->id = idr_alloc(&multicast_idr, mc, 0, 0, GFP_KERNEL); + mc->id = idr_alloc(&multicast_idr, NULL, 0, 0, GFP_KERNEL); mutex_unlock(&mut); if (mc->id < 0) goto error; @@ -1385,6 +1385,10 @@ static ssize_t ucma_process_join(struct ucma_file *file, goto err3; } + mutex_lock(&mut); + idr_replace(&multicast_idr, mc, mc->id); + mutex_unlock(&mut); + mutex_unlock(&file->mut); ucma_put_ctx(ctx); return 0; From ee245de4b32ba8836b43798a9aeca99f6a186592 Mon Sep 17 00:00:00 2001 From: Sam Bobroff Date: Fri, 25 May 2018 13:11:30 +1000 Subject: [PATCH 339/970] powerpc/eeh: Fix use-after-release of EEH driver [ Upstream commit 46d4be41b987a6b2d25a2ebdd94cafb44e21d6c5 ] Correct two cases where eeh_pcid_get() is used to reference the driver's module but the reference is dropped before the driver pointer is used. In eeh_rmv_device() also refactor a little so that only two calls to eeh_pcid_put() are needed, rather than three and the reference isn't taken at all if it wasn't needed. Signed-off-by: Sam Bobroff Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/kernel/eeh_driver.c | 28 ++++++++++++++++------------ 1 file changed, 16 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index 27843665da9e..620e08d4eb6e 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -450,9 +450,11 @@ static void *eeh_add_virt_device(void *data, void *userdata) driver = eeh_pcid_get(dev); if (driver) { - eeh_pcid_put(dev); - if (driver->err_handler) + if (driver->err_handler) { + eeh_pcid_put(dev); return NULL; + } + eeh_pcid_put(dev); } #ifdef CONFIG_PPC_POWERNV @@ -489,17 +491,19 @@ static void *eeh_rmv_device(void *data, void *userdata) if (eeh_dev_removed(edev)) return NULL; - driver = eeh_pcid_get(dev); - if (driver) { - eeh_pcid_put(dev); - if (removed && - eeh_pe_passed(edev->pe)) - return NULL; - if (removed && - driver->err_handler && - driver->err_handler->error_detected && - driver->err_handler->slot_reset) + if (removed) { + if (eeh_pe_passed(edev->pe)) return NULL; + driver = eeh_pcid_get(dev); + if (driver) { + if (driver->err_handler && + driver->err_handler->error_detected && + driver->err_handler->slot_reset) { + eeh_pcid_put(dev); + return NULL; + } + eeh_pcid_put(dev); + } } /* Remove it from PCI subsystem */ From c3e347251cfdd18591c571b1b0bafbbd314ab72b Mon Sep 17 00:00:00 2001 From: Stewart Smith Date: Thu, 29 Mar 2018 17:02:46 +1100 Subject: [PATCH 340/970] hvc_opal: don't set tb_ticks_per_usec in udbg_init_opal_common() [ Upstream commit 447808bf500a7cc92173266a59f8a494e132b122 ] time_init() will set up tb_ticks_per_usec based on reality. time_init() is called *after* udbg_init_opal_common() during boot. from arch/powerpc/kernel/time.c: unsigned long tb_ticks_per_usec = 100; /* sane default */ Currently, all powernv systems have a timebase frequency of 512mhz (512000000/1000000 == 0x200) - although there's nothing written down anywhere that I can find saying that we couldn't make that different based on the requirements in the ISA. So, we've been (accidentally) thwacking the (currently) correct (for powernv at least) value for tb_ticks_per_usec earlier than we otherwise would have. The "sane default" seems to be adequate for our purposes between udbg_init_opal_common() and time_init() being called, and if it isn't, then we should probably be setting it somewhere that isn't hvc_opal.c! Signed-off-by: Stewart Smith Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/tty/hvc/hvc_opal.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/tty/hvc/hvc_opal.c b/drivers/tty/hvc/hvc_opal.c index 510799311099..1fc5d5b82778 100644 --- a/drivers/tty/hvc/hvc_opal.c +++ b/drivers/tty/hvc/hvc_opal.c @@ -332,7 +332,6 @@ static void udbg_init_opal_common(void) udbg_putc = udbg_opal_putc; udbg_getc = udbg_opal_getc; udbg_getc_poll = udbg_opal_getc_poll; - tb_ticks_per_usec = 0x200; /* Make udelay not suck */ } void __init hvc_opal_init_early(void) From ea8e4ff38ffae80011aa83ce17e790f44770a800 Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Wed, 30 May 2018 20:31:22 +1000 Subject: [PATCH 341/970] powerpc/64s: Fix compiler store ordering to SLB shadow area [ Upstream commit 926bc2f100c24d4842b3064b5af44ae964c1d81c ] The stores to update the SLB shadow area must be made as they appear in the C code, so that the hypervisor does not see an entry with mismatched vsid and esid. Use WRITE_ONCE for this. GCC has been observed to elide the first store to esid in the update, which means that if the hypervisor interrupts the guest after storing to vsid, it could see an entry with old esid and new vsid, which may possibly result in memory corruption. Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/mm/slb.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c index 48fc28bab544..64c9a91773af 100644 --- a/arch/powerpc/mm/slb.c +++ b/arch/powerpc/mm/slb.c @@ -68,14 +68,14 @@ static inline void slb_shadow_update(unsigned long ea, int ssize, * updating it. No write barriers are needed here, provided * we only update the current CPU's SLB shadow buffer. */ - p->save_area[index].esid = 0; - p->save_area[index].vsid = cpu_to_be64(mk_vsid_data(ea, ssize, flags)); - p->save_area[index].esid = cpu_to_be64(mk_esid_data(ea, ssize, index)); + WRITE_ONCE(p->save_area[index].esid, 0); + WRITE_ONCE(p->save_area[index].vsid, cpu_to_be64(mk_vsid_data(ea, ssize, flags))); + WRITE_ONCE(p->save_area[index].esid, cpu_to_be64(mk_esid_data(ea, ssize, index))); } static inline void slb_shadow_clear(enum slb_index index) { - get_slb_shadow()->save_area[index].esid = 0; + WRITE_ONCE(get_slb_shadow()->save_area[index].esid, 0); } static inline void create_shadowed_slbe(unsigned long ea, int ssize, From efb4dd6ab9d65ce0be2b5edc9e596627c212bd8a Mon Sep 17 00:00:00 2001 From: Leon Romanovsky Date: Tue, 29 May 2018 14:56:19 +0300 Subject: [PATCH 342/970] RDMA/mad: Convert BUG_ONs to error flows [ Upstream commit 2468b82d69e3a53d024f28d79ba0fdb8bf43dfbf ] Let's perform checks in-place instead of BUG_ONs. Signed-off-by: Leon Romanovsky Signed-off-by: Doug Ledford Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/core/mad.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c index 2395fe2021c9..3e2ab04201e2 100644 --- a/drivers/infiniband/core/mad.c +++ b/drivers/infiniband/core/mad.c @@ -1549,7 +1549,8 @@ static int add_oui_reg_req(struct ib_mad_reg_req *mad_reg_req, mad_reg_req->oui, 3)) { method = &(*vendor_table)->vendor_class[ vclass]->method_table[i]; - BUG_ON(!*method); + if (!*method) + goto error3; goto check_in_use; } } @@ -1559,10 +1560,12 @@ static int add_oui_reg_req(struct ib_mad_reg_req *mad_reg_req, vclass]->oui[i])) { method = &(*vendor_table)->vendor_class[ vclass]->method_table[i]; - BUG_ON(*method); /* Allocate method table for this OUI */ - if ((ret = allocate_method_table(method))) - goto error3; + if (!*method) { + ret = allocate_method_table(method); + if (ret) + goto error3; + } memcpy((*vendor_table)->vendor_class[vclass]->oui[i], mad_reg_req->oui, 3); goto check_in_use; From 759fb7f94fab59cf3ca9e67c6646b87cff919918 Mon Sep 17 00:00:00 2001 From: Alexey Kodanev Date: Thu, 31 May 2018 19:53:33 +0300 Subject: [PATCH 343/970] netfilter: nf_tables: check msg_type before nft_trans_set(trans) [ Upstream commit 9c7f96fd77b0dbe1fe7ed1f9c462c45dc48a1076 ] The patch moves the "trans->msg_type == NFT_MSG_NEWSET" check before using nft_trans_set(trans). Otherwise we can get out of bounds read. For example, KASAN reported the one when running 0001_cache_handling_0 nft test. In this case "trans->msg_type" was NFT_MSG_NEWTABLE: [75517.177808] BUG: KASAN: slab-out-of-bounds in nft_set_lookup_global+0x22f/0x270 [nf_tables] [75517.279094] Read of size 8 at addr ffff881bdb643fc8 by task nft/7356 ... [75517.375605] CPU: 26 PID: 7356 Comm: nft Tainted: G E 4.17.0-rc7.1.x86_64 #1 [75517.489587] Hardware name: Oracle Corporation SUN SERVER X4-2 [75517.618129] Call Trace: [75517.648821] dump_stack+0xd1/0x13b [75517.691040] ? show_regs_print_info+0x5/0x5 [75517.742519] ? kmsg_dump_rewind_nolock+0xf5/0xf5 [75517.799300] ? lock_acquire+0x143/0x310 [75517.846738] print_address_description+0x85/0x3a0 [75517.904547] kasan_report+0x18d/0x4b0 [75517.949892] ? nft_set_lookup_global+0x22f/0x270 [nf_tables] [75518.019153] ? nft_set_lookup_global+0x22f/0x270 [nf_tables] [75518.088420] ? nft_set_lookup_global+0x22f/0x270 [nf_tables] [75518.157689] nft_set_lookup_global+0x22f/0x270 [nf_tables] [75518.224869] nf_tables_newsetelem+0x1a5/0x5d0 [nf_tables] [75518.291024] ? nft_add_set_elem+0x2280/0x2280 [nf_tables] [75518.357154] ? nla_parse+0x1a5/0x300 [75518.401455] ? kasan_kmalloc+0xa6/0xd0 [75518.447842] nfnetlink_rcv+0xc43/0x1bdf [nfnetlink] [75518.507743] ? nfnetlink_rcv+0x7a5/0x1bdf [nfnetlink] [75518.569745] ? nfnl_err_reset+0x3c0/0x3c0 [nfnetlink] [75518.631711] ? lock_acquire+0x143/0x310 [75518.679133] ? netlink_deliver_tap+0x9b/0x1070 [75518.733840] ? kasan_unpoison_shadow+0x31/0x40 [75518.788542] netlink_unicast+0x45d/0x680 [75518.837111] ? __isolate_free_page+0x890/0x890 [75518.891913] ? netlink_attachskb+0x6b0/0x6b0 [75518.944542] netlink_sendmsg+0x6fa/0xd30 [75518.993107] ? netlink_unicast+0x680/0x680 [75519.043758] ? netlink_unicast+0x680/0x680 [75519.094402] sock_sendmsg+0xd9/0x160 [75519.138810] ___sys_sendmsg+0x64d/0x980 [75519.186234] ? copy_msghdr_from_user+0x350/0x350 [75519.243118] ? lock_downgrade+0x650/0x650 [75519.292738] ? do_raw_spin_unlock+0x5d/0x250 [75519.345456] ? _raw_spin_unlock+0x24/0x30 [75519.395065] ? __handle_mm_fault+0xbde/0x3410 [75519.448830] ? sock_setsockopt+0x3d2/0x1940 [75519.500516] ? __lock_acquire.isra.25+0xdc/0x19d0 [75519.558448] ? lock_downgrade+0x650/0x650 [75519.608057] ? __audit_syscall_entry+0x317/0x720 [75519.664960] ? __fget_light+0x58/0x250 [75519.711325] ? __sys_sendmsg+0xde/0x170 [75519.758850] __sys_sendmsg+0xde/0x170 [75519.804193] ? __ia32_sys_shutdown+0x90/0x90 [75519.856725] ? syscall_trace_enter+0x897/0x10e0 [75519.912354] ? trace_event_raw_event_sys_enter+0x920/0x920 [75519.979432] ? __audit_syscall_entry+0x720/0x720 [75520.036118] do_syscall_64+0xa3/0x3d0 [75520.081248] ? prepare_exit_to_usermode+0x47/0x1d0 [75520.139904] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [75520.201680] RIP: 0033:0x7fc153320ba0 [75520.245772] RSP: 002b:00007ffe294c3638 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [75520.337708] RAX: ffffffffffffffda RBX: 00007ffe294c4820 RCX: 00007fc153320ba0 [75520.424547] RDX: 0000000000000000 RSI: 00007ffe294c46b0 RDI: 0000000000000003 [75520.511386] RBP: 00007ffe294c47b0 R08: 0000000000000004 R09: 0000000002114090 [75520.598225] R10: 00007ffe294c30a0 R11: 0000000000000246 R12: 00007ffe294c3660 [75520.684961] R13: 0000000000000001 R14: 00007ffe294c3650 R15: 0000000000000001 [75520.790946] Allocated by task 7356: [75520.833994] kasan_kmalloc+0xa6/0xd0 [75520.878088] __kmalloc+0x189/0x450 [75520.920107] nft_trans_alloc_gfp+0x20/0x190 [nf_tables] [75520.983961] nf_tables_newtable+0xcd0/0x1bd0 [nf_tables] [75521.048857] nfnetlink_rcv+0xc43/0x1bdf [nfnetlink] [75521.108655] netlink_unicast+0x45d/0x680 [75521.157013] netlink_sendmsg+0x6fa/0xd30 [75521.205271] sock_sendmsg+0xd9/0x160 [75521.249365] ___sys_sendmsg+0x64d/0x980 [75521.296686] __sys_sendmsg+0xde/0x170 [75521.341822] do_syscall_64+0xa3/0x3d0 [75521.386957] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [75521.467867] Freed by task 23454: [75521.507804] __kasan_slab_free+0x132/0x180 [75521.558137] kfree+0x14d/0x4d0 [75521.596005] free_rt_sched_group+0x153/0x280 [75521.648410] sched_autogroup_create_attach+0x19a/0x520 [75521.711330] ksys_setsid+0x2ba/0x400 [75521.755529] __ia32_sys_setsid+0xa/0x10 [75521.802850] do_syscall_64+0xa3/0x3d0 [75521.848090] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [75521.929000] The buggy address belongs to the object at ffff881bdb643f80 which belongs to the cache kmalloc-96 of size 96 [75522.079797] The buggy address is located 72 bytes inside of 96-byte region [ffff881bdb643f80, ffff881bdb643fe0) [75522.221234] The buggy address belongs to the page: [75522.280100] page:ffffea006f6d90c0 count:1 mapcount:0 mapping:0000000000000000 index:0x0 [75522.377443] flags: 0x2fffff80000100(slab) [75522.426956] raw: 002fffff80000100 0000000000000000 0000000000000000 0000000180200020 [75522.521275] raw: ffffea006e6fafc0 0000000c0000000c ffff881bf180f400 0000000000000000 [75522.615601] page dumped because: kasan: bad access detected Fixes: 37a9cc525525 ("netfilter: nf_tables: add generation mask to sets") Signed-off-by: Alexey Kodanev Acked-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/netfilter/nf_tables_api.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 762f31fb5b67..a3fb30f5a1a9 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -2476,12 +2476,13 @@ struct nft_set *nf_tables_set_lookup_byid(const struct net *net, u32 id = ntohl(nla_get_be32(nla)); list_for_each_entry(trans, &net->nft.commit_list, list) { - struct nft_set *set = nft_trans_set(trans); + if (trans->msg_type == NFT_MSG_NEWSET) { + struct nft_set *set = nft_trans_set(trans); - if (trans->msg_type == NFT_MSG_NEWSET && - id == nft_trans_set_id(trans) && - nft_active_genmask(set, genmask)) - return set; + if (id == nft_trans_set_id(trans) && + nft_active_genmask(set, genmask)) + return set; + } } return ERR_PTR(-ENOENT); } From b05c460a0ce3bef757f40475d028d8fcfea2bb5d Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Tue, 22 May 2018 11:17:16 -0400 Subject: [PATCH 344/970] pnfs: Don't release the sequence slot until we've processed layoutget on open [ Upstream commit ae55e59da0e401893b3c52b575fc18a00623d0a1 ] If the server recalls the layout that was just handed out, we risk hitting a race as described in RFC5661 Section 2.10.6.3 unless we ensure that we release the sequence slot after processing the LAYOUTGET operation that was sent as part of the OPEN compound. Signed-off-by: Trond Myklebust Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/nfs/nfs4proc.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 91e017ca7072..cf5fdc25289a 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -2701,7 +2701,7 @@ static int _nfs4_open_and_get_state(struct nfs4_opendata *opendata, if (ret != 0) goto out; - state = nfs4_opendata_to_nfs4_state(opendata); + state = _nfs4_opendata_to_nfs4_state(opendata); ret = PTR_ERR(state); if (IS_ERR(state)) goto out; @@ -2737,6 +2737,7 @@ static int _nfs4_open_and_get_state(struct nfs4_opendata *opendata, nfs4_schedule_stateid_recovery(server, state); } out: + nfs4_sequence_free_slot(&opendata->o_res.seq_res); return ret; } From c9ab0cefc59e0deddf85edd8ab3d310570a0367c Mon Sep 17 00:00:00 2001 From: Anatoly Pugachev Date: Mon, 28 May 2018 02:06:37 +0300 Subject: [PATCH 345/970] disable loading f2fs module on PAGE_SIZE > 4KB [ Upstream commit 4071e67cffcc5c2a007116a02437471351f550eb ] The following patch disables loading of f2fs module on architectures which have PAGE_SIZE > 4096 , since it is impossible to mount f2fs on such architectures , log messages are: mount: /mnt: wrong fs type, bad option, bad superblock on /dev/vdiskb1, missing codepage or helper program, or other error. /dev/vdiskb1: F2FS filesystem, UUID=1d8b9ca4-2389-4910-af3b-10998969f09c, volume name "" May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Invalid page_cache_size (8192), supports only 4KB May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Can't find valid F2FS filesystem in 1th superblock May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Invalid page_cache_size (8192), supports only 4KB May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Can't find valid F2FS filesystem in 2th superblock May 15 18:03:13 ttip kernel: F2FS-fs (vdiskb1): Invalid page_cache_size (8192), supports only 4KB which was introduced by git commit 5c9b469295fb6b10d98923eab5e79c4edb80ed20 tested on git kernel 4.17.0-rc6-00309-gec30dcf7f425 with patch applied: modprobe: ERROR: could not insert 'f2fs': Invalid argument May 28 01:40:28 v215 kernel: F2FS not supported on PAGE_SIZE(8192) != 4096 Signed-off-by: Anatoly Pugachev Reviewed-by: Chao Yu Signed-off-by: Jaegeuk Kim Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/f2fs/super.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index eb20b8767f3c..e627671f0183 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -1980,6 +1980,12 @@ static int __init init_f2fs_fs(void) { int err; + if (PAGE_SIZE != F2FS_BLKSIZE) { + printk("F2FS not supported on PAGE_SIZE(%lu) != %d\n", + PAGE_SIZE, F2FS_BLKSIZE); + return -EINVAL; + } + f2fs_build_trace_ios(); err = init_inodecache(); From 4e6b7aad50edd0521ac69f5de9c8a4d03e40453a Mon Sep 17 00:00:00 2001 From: Chao Yu Date: Mon, 28 May 2018 16:59:27 +0800 Subject: [PATCH 346/970] f2fs: fix error path of move_data_page [ Upstream commit 14a28559f43ac7c0b98dd1b0e73ec9ec8ab4fc45 ] This patch fixes error path of move_data_page: - clear cold data flag if it fails to write page. - redirty page for non-ENOMEM case. Signed-off-by: Chao Yu Signed-off-by: Jaegeuk Kim Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/f2fs/gc.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index 17ab23f64bba..ad4dfd29d923 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -688,9 +688,14 @@ static void move_data_page(struct inode *inode, block_t bidx, int gc_type) set_cold_data(page); err = do_write_data_page(&fio); - if (err == -ENOMEM && is_dirty) { - congestion_wait(BLK_RW_ASYNC, HZ/50); - goto retry; + if (err) { + clear_cold_data(page); + if (err == -ENOMEM) { + congestion_wait(BLK_RW_ASYNC, HZ/50); + goto retry; + } + if (is_dirty) + set_page_dirty(page); } clear_cold_data(page); From b7ea2b8616d950bbbebb1d28239d7ab9eee4fe1b Mon Sep 17 00:00:00 2001 From: Chao Yu Date: Sat, 26 May 2018 18:03:34 +0800 Subject: [PATCH 347/970] f2fs: fix to don't trigger writeback during recovery [ Upstream commit 64c74a7ab505ea40d1b3e5d02735ecab08ae1b14 ] - f2fs_fill_super - recover_fsync_data - recover_data - del_fsync_inode - iput - iput_final - write_inode_now - f2fs_write_inode - f2fs_balance_fs - f2fs_balance_fs_bg - sync_dirty_inodes With data_flush mount option, during recovery, in order to avoid entering above writeback flow, let's detect recovery status and do skip in f2fs_balance_fs_bg. Signed-off-by: Chao Yu Signed-off-by: Yunlei He Signed-off-by: Jaegeuk Kim Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/f2fs/segment.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index e10f61684ea4..393039fdb26d 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -369,6 +369,9 @@ void f2fs_balance_fs(struct f2fs_sb_info *sbi, bool need) void f2fs_balance_fs_bg(struct f2fs_sb_info *sbi) { + if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING))) + return; + /* try to shrink extent cache when there is no enough memory */ if (!available_free_memory(sbi, EXTENT_CACHE)) f2fs_shrink_extent_tree(sbi, EXTENT_CACHE_SHRINK_NUMBER); From 570f12a8b651e45a0af48cced6c54c571e5bb328 Mon Sep 17 00:00:00 2001 From: Chao Yu Date: Mon, 23 Apr 2018 10:36:13 +0800 Subject: [PATCH 348/970] f2fs: fix to wait page writeback during revoking atomic write [ Upstream commit e5e5732d8120654159254c16834bc8663d8be124 ] After revoking atomic write, related LBA can be reused by others, so we need to wait page writeback before reusing the LBA, in order to avoid interference between old atomic written in-flight IO and new IO. Signed-off-by: Chao Yu Signed-off-by: Jaegeuk Kim Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/f2fs/segment.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 393039fdb26d..35d48ef0573c 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -207,6 +207,8 @@ static int __revoke_inmem_pages(struct inode *inode, lock_page(page); + f2fs_wait_on_page_writeback(page, DATA, true); + if (recover) { struct dnode_of_data dn; struct node_info ni; From bce7f720f4ca9d48ea4be05b49010c35e1c7f956 Mon Sep 17 00:00:00 2001 From: Sahitya Tummala Date: Fri, 18 May 2018 11:51:52 +0530 Subject: [PATCH 349/970] f2fs: Fix deadlock in shutdown ioctl [ Upstream commit 60b2b4ee2bc01dd052f99fa9d65da2232102ef8e ] f2fs_ioc_shutdown() ioctl gets stuck in the below path when issued with F2FS_GOING_DOWN_FULLSYNC option. __switch_to+0x90/0xc4 percpu_down_write+0x8c/0xc0 freeze_super+0xec/0x1e4 freeze_bdev+0xc4/0xcc f2fs_ioctl+0xc0c/0x1ce0 f2fs_compat_ioctl+0x98/0x1f0 Signed-off-by: Sahitya Tummala Reviewed-by: Chao Yu Signed-off-by: Jaegeuk Kim Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/f2fs/file.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 801111e1f8ef..249f917a494b 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -1670,9 +1670,11 @@ static int f2fs_ioc_shutdown(struct file *filp, unsigned long arg) if (get_user(in, (__u32 __user *)arg)) return -EFAULT; - ret = mnt_want_write_file(filp); - if (ret) - return ret; + if (in != F2FS_GOING_DOWN_FULLSYNC) { + ret = mnt_want_write_file(filp); + if (ret) + return ret; + } switch (in) { case F2FS_GOING_DOWN_FULLSYNC: @@ -1700,7 +1702,8 @@ static int f2fs_ioc_shutdown(struct file *filp, unsigned long arg) } f2fs_update_time(sbi, REQ_TIME); out: - mnt_drop_write_file(filp); + if (in != F2FS_GOING_DOWN_FULLSYNC) + mnt_drop_write_file(filp); return ret; } From 9e222d7ca5d535d524e92ff303b6f1e61f5ec4d5 Mon Sep 17 00:00:00 2001 From: Chao Yu Date: Tue, 17 Apr 2018 17:51:28 +0800 Subject: [PATCH 350/970] f2fs: fix race in between GC and atomic open [ Upstream commit 27319ba4044c0c67d62ae39e53c0118c89f0a029 ] Thread GC thread - f2fs_ioc_start_atomic_write - get_dirty_pages - filemap_write_and_wait_range - f2fs_gc - do_garbage_collect - gc_data_segment - move_data_page - f2fs_is_atomic_file - set_page_dirty - set_inode_flag(, FI_ATOMIC_FILE) Dirty data page can still be generated by GC in race condition as above call stack. This patch adds fi->dio_rwsem[WRITE] in f2fs_ioc_start_atomic_write to avoid such race. Signed-off-by: Chao Yu Signed-off-by: Jaegeuk Kim Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/f2fs/file.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 249f917a494b..7d0e8d6bf009 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -1512,6 +1512,8 @@ static int f2fs_ioc_start_atomic_write(struct file *filp) inode_lock(inode); + down_write(&F2FS_I(inode)->dio_rwsem[WRITE]); + if (f2fs_is_atomic_file(inode)) goto out; @@ -1532,6 +1534,7 @@ static int f2fs_ioc_start_atomic_write(struct file *filp) if (ret) clear_inode_flag(inode, FI_ATOMIC_FILE); out: + up_write(&F2FS_I(inode)->dio_rwsem[WRITE]); inode_unlock(inode); mnt_drop_write_file(filp); return ret; From ce28cf5fb47f149e30176b7d6de161ebf1c6c6a1 Mon Sep 17 00:00:00 2001 From: "Shuah Khan (Samsung OSG)" Date: Tue, 29 May 2018 16:13:03 -0600 Subject: [PATCH 351/970] usbip: usbip_detach: Fix memory, udev context and udev leak [ Upstream commit d179f99a651685b19333360e6558110da2fe9bd7 ] detach_port() fails to call usbip_vhci_driver_close() from its error path after usbip_vhci_detach_device() returns failure, leaking memory allocated in usbip_vhci_driver_open() and holding udev_context and udev references. Fix it to call usbip_vhci_driver_close(). Signed-off-by: Shuah Khan (Samsung OSG) Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- tools/usb/usbip/src/usbip_detach.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/tools/usb/usbip/src/usbip_detach.c b/tools/usb/usbip/src/usbip_detach.c index 9db9d21bb2ec..6a8db858caa5 100644 --- a/tools/usb/usbip/src/usbip_detach.c +++ b/tools/usb/usbip/src/usbip_detach.c @@ -43,7 +43,7 @@ void usbip_detach_usage(void) static int detach_port(char *port) { - int ret; + int ret = 0; uint8_t portnum; char path[PATH_MAX+1]; @@ -73,9 +73,12 @@ static int detach_port(char *port) } ret = usbip_vhci_detach_device(portnum); - if (ret < 0) - return -1; + if (ret < 0) { + ret = -1; + goto call_driver_close; + } +call_driver_close: usbip_vhci_driver_close(); return ret; From 47fc151cbdbe7671e9124d18245e890374509da2 Mon Sep 17 00:00:00 2001 From: Kan Liang Date: Thu, 3 May 2018 11:25:08 -0700 Subject: [PATCH 352/970] perf/x86/intel/uncore: Correct fixed counter index check in generic code [ Upstream commit 4749f8196452eeb73cf2086a6a9705bae479d33d ] There is no index which is bigger than UNCORE_PMC_IDX_FIXED. The only exception is client IMC uncore, which has been specially handled. For generic code, it is not correct to use >= to check fixed counter. The code quality issue will bring problem when a new counter index is introduced. Signed-off-by: Kan Liang Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Thomas Gleixner Cc: Linus Torvalds Cc: Peter Zijlstra Cc: acme@kernel.org Cc: eranian@google.com Link: http://lkml.kernel.org/r/1525371913-10597-3-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/x86/events/intel/uncore.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c index aec6cc925af8..4f365267b12f 100644 --- a/arch/x86/events/intel/uncore.c +++ b/arch/x86/events/intel/uncore.c @@ -212,7 +212,7 @@ void uncore_perf_event_update(struct intel_uncore_box *box, struct perf_event *e u64 prev_count, new_count, delta; int shift; - if (event->hw.idx >= UNCORE_PMC_IDX_FIXED) + if (event->hw.idx == UNCORE_PMC_IDX_FIXED) shift = 64 - uncore_fixed_ctr_bits(box); else shift = 64 - uncore_perf_ctr_bits(box); From 9f4dd60356e76ca98fab1711bf5dc8d0da500c23 Mon Sep 17 00:00:00 2001 From: Kan Liang Date: Thu, 3 May 2018 11:25:07 -0700 Subject: [PATCH 353/970] perf/x86/intel/uncore: Correct fixed counter index check for NHM [ Upstream commit d71f11c076c420c4e2fceb4faefa144e055e0935 ] For Nehalem and Westmere, there is only one fixed counter for W-Box. There is no index which is bigger than UNCORE_PMC_IDX_FIXED. It is not correct to use >= to check fixed counter. The code quality issue will bring problem when new counter index is introduced. Signed-off-by: Kan Liang Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Thomas Gleixner Cc: Linus Torvalds Cc: Peter Zijlstra Cc: acme@kernel.org Cc: eranian@google.com Link: http://lkml.kernel.org/r/1525371913-10597-2-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/x86/events/intel/uncore_nhmex.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/events/intel/uncore_nhmex.c b/arch/x86/events/intel/uncore_nhmex.c index cda569332005..83e2188adac4 100644 --- a/arch/x86/events/intel/uncore_nhmex.c +++ b/arch/x86/events/intel/uncore_nhmex.c @@ -245,7 +245,7 @@ static void nhmex_uncore_msr_enable_event(struct intel_uncore_box *box, struct p { struct hw_perf_event *hwc = &event->hw; - if (hwc->idx >= UNCORE_PMC_IDX_FIXED) + if (hwc->idx == UNCORE_PMC_IDX_FIXED) wrmsrl(hwc->config_base, NHMEX_PMON_CTL_EN_BIT0); else if (box->pmu->type->event_mask & NHMEX_PMON_CTL_EN_BIT0) wrmsrl(hwc->config_base, hwc->config | NHMEX_PMON_CTL_EN_BIT22); From d4fd1bf83f447063fa4bfaf37c4b0142e57a7e77 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Daniel=20D=C3=ADaz?= Date: Tue, 10 Apr 2018 17:11:15 -0500 Subject: [PATCH 354/970] selftests/intel_pstate: Improve test, minor fixes MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit e9d33f149f52981fd856a0b16aa8ebda89b02e34 ] A few changes improve the overall usability of the test: * fix a hard-coded maximum frequency (3300), * don't adjust the CPU frequency if only evaluating results, * fix a comparison for multiple frequencies. A symptom of that last issue looked like this: ./run.sh: line 107: [: too many arguments ./run.sh: line 110: 3099 3099 3100-3100: syntax error in expression (error token is \"3099 3100-3100\") Because a check will count how many differente frequencies there are among the CPUs of the system, and after they are tallied another read is performed, which might produce different results. Signed-off-by: Daniel Díaz Signed-off-by: Shuah Khan (Samsung OSG) Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- tools/testing/selftests/intel_pstate/run.sh | 24 +++++++++------------ 1 file changed, 10 insertions(+), 14 deletions(-) diff --git a/tools/testing/selftests/intel_pstate/run.sh b/tools/testing/selftests/intel_pstate/run.sh index 7868c106b8b1..b62876f41eca 100755 --- a/tools/testing/selftests/intel_pstate/run.sh +++ b/tools/testing/selftests/intel_pstate/run.sh @@ -48,11 +48,12 @@ function run_test () { echo "sleeping for 5 seconds" sleep 5 - num_freqs=$(cat /proc/cpuinfo | grep MHz | sort -u | wc -l) - if [ $num_freqs -le 2 ]; then - cat /proc/cpuinfo | grep MHz | sort -u | tail -1 > /tmp/result.$1 + grep MHz /proc/cpuinfo | sort -u > /tmp/result.freqs + num_freqs=$(wc -l /tmp/result.freqs | awk ' { print $1 } ') + if [ $num_freqs -ge 2 ]; then + tail -n 1 /tmp/result.freqs > /tmp/result.$1 else - cat /proc/cpuinfo | grep MHz | sort -u > /tmp/result.$1 + cp /tmp/result.freqs /tmp/result.$1 fi ./msr 0 >> /tmp/result.$1 @@ -82,21 +83,20 @@ _max_freq=$(cpupower frequency-info -l | tail -1 | awk ' { print $2 } ') max_freq=$(($_max_freq / 1000)) -for freq in `seq $max_freq -100 $min_freq` +[ $EVALUATE_ONLY -eq 0 ] && for freq in `seq $max_freq -100 $min_freq` do echo "Setting maximum frequency to $freq" cpupower frequency-set -g powersave --max=${freq}MHz >& /dev/null - [ $EVALUATE_ONLY -eq 0 ] && run_test $freq + run_test $freq done -echo "==============================================================================" +[ $EVALUATE_ONLY -eq 0 ] && cpupower frequency-set -g powersave --max=${max_freq}MHz >& /dev/null +echo "==============================================================================" echo "The marketing frequency of the cpu is $mkt_freq MHz" echo "The maximum frequency of the cpu is $max_freq MHz" echo "The minimum frequency of the cpu is $min_freq MHz" -cpupower frequency-set -g powersave --max=${max_freq}MHz >& /dev/null - # make a pretty table echo "Target Actual Difference MSR(0x199) max_perf_pct" for freq in `seq $max_freq -100 $min_freq` @@ -104,10 +104,6 @@ do result_freq=$(cat /tmp/result.${freq} | grep "cpu MHz" | awk ' { print $4 } ' | awk -F "." ' { print $1 } ') msr=$(cat /tmp/result.${freq} | grep "msr" | awk ' { print $3 } ') max_perf_pct=$(cat /tmp/result.${freq} | grep "max_perf_pct" | awk ' { print $2 } ' ) - if [ $result_freq -eq $freq ]; then - echo " $freq $result_freq 0 $msr $(($max_perf_pct*3300))" - else - echo " $freq $result_freq $(($result_freq-$freq)) $msr $(($max_perf_pct*$max_freq))" - fi + echo " $freq $result_freq $(($result_freq-$freq)) $msr $(($max_perf_pct*$max_freq))" done exit 0 From 2e1bfab64c379d40dc6960aa0b9776fe01d8f952 Mon Sep 17 00:00:00 2001 From: Shaul Triebitz Date: Thu, 22 Mar 2018 14:14:45 +0200 Subject: [PATCH 355/970] iwlwifi: pcie: fix race in Rx buffer allocator [ Upstream commit 0f22e40053bd5378ad1e3250e65c574fd61c0cd6 ] Make sure the rx_allocator worker is canceled before running the rx_init routine. rx_init frees and re-allocates all rxb's pages. The rx_allocator worker also allocates pages for the used rxb's. Running rx_init and rx_allocator simultaniously causes a kernel panic. Fix that by canceling the work in rx_init. Signed-off-by: Shaul Triebitz Signed-off-by: Luca Coelho Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/intel/iwlwifi/pcie/rx.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/rx.c b/drivers/net/wireless/intel/iwlwifi/pcie/rx.c index 6fe5546dc773..996a928142ad 100644 --- a/drivers/net/wireless/intel/iwlwifi/pcie/rx.c +++ b/drivers/net/wireless/intel/iwlwifi/pcie/rx.c @@ -898,6 +898,8 @@ int iwl_pcie_rx_init(struct iwl_trans *trans) WQ_HIGHPRI | WQ_UNBOUND, 1); INIT_WORK(&rba->rx_alloc, iwl_pcie_rx_allocator_work); + cancel_work_sync(&rba->rx_alloc); + spin_lock(&rba->lock); atomic_set(&rba->req_pending, 0); atomic_set(&rba->req_ready, 0); From 922c66852976fc1bc273fee29fcc0be98cc0fa24 Mon Sep 17 00:00:00 2001 From: Thierry Escande Date: Tue, 29 May 2018 18:37:16 +0200 Subject: [PATCH 356/970] Bluetooth: hci_qca: Fix "Sleep inside atomic section" warning [ Upstream commit 9960521c44a5d828f29636ceac0600603ecbddbf ] This patch fixes the following warning during boot: do not call blocking ops when !TASK_RUNNING; state=1 set at [<(ptrval)>] qca_setup+0x194/0x750 [hci_uart] WARNING: CPU: 2 PID: 1878 at kernel/sched/core.c:6135 __might_sleep+0x7c/0x88 In qca_set_baudrate(), the current task state is set to TASK_UNINTERRUPTIBLE before going to sleep for 300ms. It was then restored to TASK_INTERRUPTIBLE. This patch sets the current task state back to TASK_RUNNING instead. Signed-off-by: Thierry Escande Signed-off-by: Marcel Holtmann Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/bluetooth/hci_qca.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c index 3a8b9aef96a6..0986c324459f 100644 --- a/drivers/bluetooth/hci_qca.c +++ b/drivers/bluetooth/hci_qca.c @@ -884,7 +884,7 @@ static int qca_set_baudrate(struct hci_dev *hdev, uint8_t baudrate) */ set_current_state(TASK_UNINTERRUPTIBLE); schedule_timeout(msecs_to_jiffies(BAUDRATE_SETTLE_TIMEOUT_MS)); - set_current_state(TASK_INTERRUPTIBLE); + set_current_state(TASK_RUNNING); return 0; } From c70cc9407571ee54795cba7abf1b924cf117659d Mon Sep 17 00:00:00 2001 From: Jian-Hong Pan Date: Mon, 21 May 2018 18:09:20 +0800 Subject: [PATCH 357/970] Bluetooth: btusb: Add a new Realtek 8723DE ID 2ff8:b011 [ Upstream commit 66d9975c5a7c40aa7e4bb0ec0b0c37ba1f190923 ] Without this patch we cannot turn on the Bluethooth adapter on ASUS E406MA. T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=12 MxCh= 0 D: Ver= 1.10 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=2ff8 ProdID=b011 Rev= 2.00 S: Manufacturer=Realtek S: Product=802.11n WLAN Adapter S: SerialNumber=00e04c000001 C:* #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=500mA I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=81(I) Atr=03(Int.) MxPS= 16 Ivl=1ms E: Ad=02(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=82(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 0 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 0 Ivl=1ms I: If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 9 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 9 Ivl=1ms I: If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 17 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 17 Ivl=1ms I: If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 25 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 25 Ivl=1ms I: If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 33 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 33 Ivl=1ms I: If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 49 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 49 Ivl=1ms Signed-off-by: Jian-Hong Pan Signed-off-by: Marcel Holtmann Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/bluetooth/btusb.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c index bff67c5a5fe7..44bccb1afa06 100644 --- a/drivers/bluetooth/btusb.c +++ b/drivers/bluetooth/btusb.c @@ -348,6 +348,9 @@ static const struct usb_device_id blacklist_table[] = { /* Additional Realtek 8723BU Bluetooth devices */ { USB_DEVICE(0x7392, 0xa611), .driver_info = BTUSB_REALTEK }, + /* Additional Realtek 8723DE Bluetooth devices */ + { USB_DEVICE(0x2ff8, 0xb011), .driver_info = BTUSB_REALTEK }, + /* Additional Realtek 8821AE Bluetooth devices */ { USB_DEVICE(0x0b05, 0x17dc), .driver_info = BTUSB_REALTEK }, { USB_DEVICE(0x13d3, 0x3414), .driver_info = BTUSB_REALTEK }, From 32b7d638a05e721b526658c7a33292b2615a03b4 Mon Sep 17 00:00:00 2001 From: Kai Chieh Chuang Date: Mon, 28 May 2018 10:18:18 +0800 Subject: [PATCH 358/970] ASoC: dpcm: fix BE dai not hw_free and shutdown [ Upstream commit 9c0ac70ad24d76b873c1551e27790c7f6a815d5c ] In case, one BE is used by two FE1/FE2 FE1--->BE--> | FE2----] when FE1/FE2 call dpcm_be_dai_hw_free() together the BE users will be 2 (> 1), hence cannot be hw_free the be state will leave at, ex. SND_SOC_DPCM_STATE_STOP later FE1/FE2 call dpcm_be_dai_shutdown(), will be skip due to wrong state. leaving the BE not being hw_free and shutdown. The BE dai will be hw_free later when calling dpcm_be_dai_shutdown() if still in invalid state. Signed-off-by: KaiChieh Chuang Signed-off-by: Mark Brown Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- sound/soc/soc-pcm.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/sound/soc/soc-pcm.c b/sound/soc/soc-pcm.c index 80088c98ce27..20680a490897 100644 --- a/sound/soc/soc-pcm.c +++ b/sound/soc/soc-pcm.c @@ -1793,8 +1793,10 @@ int dpcm_be_dai_shutdown(struct snd_soc_pcm_runtime *fe, int stream) continue; if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_HW_FREE) && - (be->dpcm[stream].state != SND_SOC_DPCM_STATE_OPEN)) - continue; + (be->dpcm[stream].state != SND_SOC_DPCM_STATE_OPEN)) { + soc_pcm_hw_free(be_substream); + be->dpcm[stream].state = SND_SOC_DPCM_STATE_HW_FREE; + } dev_dbg(be->dev, "ASoC: close BE %s\n", be->dai_link->name); From 5e0b8c1732653a520a19a2d9e20a525bb4ebcc09 Mon Sep 17 00:00:00 2001 From: Vincent Palatin Date: Wed, 18 Apr 2018 12:23:58 +0200 Subject: [PATCH 359/970] mfd: cros_ec: Fail early if we cannot identify the EC [ Upstream commit 0dbbf25561b29ffab5ba6277429760abdf49ceff ] If we cannot communicate with the EC chip to detect the protocol version and its features, it's very likely useless to continue. Else we will commit all kind of uninformed mistakes (using the wrong protocol, the wrong buffer size, mixing the EC with other chips). Signed-off-by: Vincent Palatin Acked-by: Benson Leung Signed-off-by: Enric Balletbo i Serra Reviewed-by: Gwendal Grignou Reviewed-by: Andy Shevchenko Signed-off-by: Lee Jones Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/mfd/cros_ec.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/mfd/cros_ec.c b/drivers/mfd/cros_ec.c index abd83424b498..3e18d2595b6d 100644 --- a/drivers/mfd/cros_ec.c +++ b/drivers/mfd/cros_ec.c @@ -86,7 +86,11 @@ int cros_ec_register(struct cros_ec_device *ec_dev) mutex_init(&ec_dev->lock); - cros_ec_query_all(ec_dev); + err = cros_ec_query_all(ec_dev); + if (err) { + dev_err(dev, "Cannot identify the EC: error %d\n", err); + return err; + } if (ec_dev->irq) { err = request_threaded_irq(ec_dev->irq, NULL, ec_irq_thread, From a7a336ed3d39ecdbe6e202e87e27220afca682e5 Mon Sep 17 00:00:00 2001 From: Ganapathi Bhat Date: Thu, 24 May 2018 19:18:27 +0530 Subject: [PATCH 360/970] mwifiex: handle race during mwifiex_usb_disconnect [ Upstream commit b817047ae70c0bd67b677b65d0d69d72cd6e9728 ] Race condition is observed during rmmod of mwifiex_usb: 1. The rmmod thread will call mwifiex_usb_disconnect(), download SHUTDOWN command and do wait_event_interruptible_timeout(), waiting for response. 2. The main thread will handle the response and will do a wake_up_interruptible(), unblocking rmmod thread. 3. On getting unblocked, rmmod thread will make rx_cmd.urb = NULL in mwifiex_usb_free(). 4. The main thread will try to resubmit rx_cmd.urb in mwifiex_usb_submit_rx_urb(), which is NULL. To fix, wait for main thread to complete before calling mwifiex_usb_free(). Signed-off-by: Ganapathi Bhat Signed-off-by: Kalle Valo Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/marvell/mwifiex/usb.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/wireless/marvell/mwifiex/usb.c b/drivers/net/wireless/marvell/mwifiex/usb.c index 73eb0846db21..09185a1f7379 100644 --- a/drivers/net/wireless/marvell/mwifiex/usb.c +++ b/drivers/net/wireless/marvell/mwifiex/usb.c @@ -624,6 +624,9 @@ static void mwifiex_usb_disconnect(struct usb_interface *intf) MWIFIEX_FUNC_SHUTDOWN); } + if (adapter->workqueue) + flush_workqueue(adapter->workqueue); + mwifiex_usb_free(card); mwifiex_dbg(adapter, FATAL, From a783c6d7a9d7e07bd637dcf892ad70c1753e3185 Mon Sep 17 00:00:00 2001 From: Eyal Reizer Date: Mon, 28 May 2018 11:36:42 +0300 Subject: [PATCH 361/970] wlcore: sdio: check for valid platform device data before suspend [ Upstream commit 6e91d48371e79862ea2c05867aaebe4afe55a865 ] the wl pointer can be null In case only wlcore_sdio is probed while no WiLink module is successfully probed, as in the case of mounting a wl12xx module while using a device tree file configured with wl18xx related settings. In this case the system was crashing in wl1271_suspend() as platform device data is not set. Make sure wl the pointer is valid before using it. Signed-off-by: Eyal Reizer Signed-off-by: Kalle Valo Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/ti/wlcore/sdio.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/net/wireless/ti/wlcore/sdio.c b/drivers/net/wireless/ti/wlcore/sdio.c index 47fe7f96a242..6921cb0bf563 100644 --- a/drivers/net/wireless/ti/wlcore/sdio.c +++ b/drivers/net/wireless/ti/wlcore/sdio.c @@ -404,6 +404,11 @@ static int wl1271_suspend(struct device *dev) mmc_pm_flag_t sdio_flags; int ret = 0; + if (!wl) { + dev_err(dev, "no wilink module was probed\n"); + goto out; + } + dev_dbg(dev, "wl1271 suspend. wow_enabled: %d\n", wl->wow_enabled); From 739feeba55a34cece1fa5ac5faf3c40eb4db6c18 Mon Sep 17 00:00:00 2001 From: Ezequiel Garcia Date: Fri, 18 May 2018 17:07:48 -0400 Subject: [PATCH 362/970] media: tw686x: Fix incorrect vb2_mem_ops GFP flags [ Upstream commit 636757ab6c93e19e2f58d3b3af1312e34eaffbab ] When the driver is configured in the "memcpy" dma-mode, it uses vb2_vmalloc_memops, which is backed by a SLAB allocator and so shouldn't be using GFP_DMA32. Fix it. Signed-off-by: Ezequiel Garcia Signed-off-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/media/pci/tw686x/tw686x-video.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/media/pci/tw686x/tw686x-video.c b/drivers/media/pci/tw686x/tw686x-video.c index c3fafa97b2d0..0ea8dd44026c 100644 --- a/drivers/media/pci/tw686x/tw686x-video.c +++ b/drivers/media/pci/tw686x/tw686x-video.c @@ -1228,7 +1228,8 @@ int tw686x_video_init(struct tw686x_dev *dev) vc->vidq.timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC; vc->vidq.min_buffers_needed = 2; vc->vidq.lock = &vc->vb_mutex; - vc->vidq.gfp_flags = GFP_DMA32; + vc->vidq.gfp_flags = dev->dma_mode != TW686X_DMA_MODE_MEMCPY ? + GFP_DMA32 : 0; vc->vidq.dev = &dev->pci_dev->dev; err = vb2_queue_init(&vc->vidq); From 9ac47200b51cb09d2f15dbefa67e0412741d98aa Mon Sep 17 00:00:00 2001 From: Hans Verkuil Date: Mon, 21 May 2018 08:43:02 -0400 Subject: [PATCH 363/970] media: videobuf2-core: don't call memop 'finish' when queueing [ Upstream commit 90b2da89a083e1395cb322521a42397c49ae4500 ] When a buffer is queued or requeued in vb2_buffer_done, then don't call the finish memop. In this case the buffer is only returned to vb2, not to userspace. Calling 'finish' here will cause an unbalance when the queue is canceled, since the core will call the same memop again. Signed-off-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/media/v4l2-core/videobuf2-core.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/media/v4l2-core/videobuf2-core.c b/drivers/media/v4l2-core/videobuf2-core.c index 4299ce06c25b..b3a9fa75e8e7 100644 --- a/drivers/media/v4l2-core/videobuf2-core.c +++ b/drivers/media/v4l2-core/videobuf2-core.c @@ -914,9 +914,12 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state) dprintk(4, "done processing on buffer %d, state: %d\n", vb->index, state); - /* sync buffers */ - for (plane = 0; plane < vb->num_planes; ++plane) - call_void_memop(vb, finish, vb->planes[plane].mem_priv); + if (state != VB2_BUF_STATE_QUEUED && + state != VB2_BUF_STATE_REQUEUEING) { + /* sync buffers */ + for (plane = 0; plane < vb->num_planes; ++plane) + call_void_memop(vb, finish, vb->planes[plane].mem_priv); + } spin_lock_irqsave(&q->done_lock, flags); if (state == VB2_BUF_STATE_QUEUED || From 65cb469d02313175840b2038f159264b6b843ab2 Mon Sep 17 00:00:00 2001 From: David Sterba Date: Tue, 24 Apr 2018 14:53:56 +0200 Subject: [PATCH 364/970] btrfs: add barriers to btrfs_sync_log before log_commit_wait wakeups [ Upstream commit 3d3a2e610ea5e7c6d4f9481ecce5d8e2d8317843 ] Currently the code assumes that there's an implied barrier by the sequence of code preceding the wakeup, namely the mutex unlock. As Nikolay pointed out: I think this is wrong (not your code) but the original assumption that the RELEASE semantics provided by mutex_unlock is sufficient. According to memory-barriers.txt: Section 'LOCK ACQUISITION FUNCTIONS' states: (2) RELEASE operation implication: Memory operations issued before the RELEASE will be completed before the RELEASE operation has completed. Memory operations issued after the RELEASE *may* be completed before the RELEASE operation has completed. (I've bolded the may portion) The example given there: As an example, consider the following: *A = a; *B = b; ACQUIRE *C = c; *D = d; RELEASE *E = e; *F = f; The following sequence of events is acceptable: ACQUIRE, {*F,*A}, *E, {*C,*D}, *B, RELEASE So if we assume that *C is modifying the flag which the waitqueue is checking, and *E is the actual wakeup, then those accesses can be re-ordered... IMHO this code should be considered broken... Signed-off-by: Greg Kroah-Hartman --- fs/btrfs/tree-log.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 44d34923de9c..44966fd00790 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -2979,8 +2979,11 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans, mutex_unlock(&log_root_tree->log_mutex); /* - * The barrier before waitqueue_active is implied by mutex_unlock + * The barrier before waitqueue_active is needed so all the updates + * above are seen by the woken threads. It might not be necessary, but + * proving that seems to be hard. */ + smp_mb(); if (waitqueue_active(&log_root_tree->log_commit_wait[index2])) wake_up(&log_root_tree->log_commit_wait[index2]); out: @@ -2991,8 +2994,11 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans, mutex_unlock(&root->log_mutex); /* - * The barrier before waitqueue_active is implied by mutex_unlock + * The barrier before waitqueue_active is needed so all the updates + * above are seen by the woken threads. It might not be necessary, but + * proving that seems to be hard. */ + smp_mb(); if (waitqueue_active(&root->log_commit_wait[index1])) wake_up(&root->log_commit_wait[index1]); return ret; From 7e51effb7a5bc011fe93c7f0c0c67a43f290c5e0 Mon Sep 17 00:00:00 2001 From: Qu Wenruo Date: Mon, 14 May 2018 09:38:13 +0800 Subject: [PATCH 365/970] btrfs: qgroup: Finish rescan when hit the last leaf of extent tree [ Upstream commit ff3d27a048d926b3920ccdb75d98788c567cae0d ] Under the following case, qgroup rescan can double account cowed tree blocks: In this case, extent tree only has one tree block. - | transid=5 last committed=4 | btrfs_qgroup_rescan_worker() | |- btrfs_start_transaction() | | transid = 5 | |- qgroup_rescan_leaf() | |- btrfs_search_slot_for_read() on extent tree | Get the only extent tree block from commit root (transid = 4). | Scan it, set qgroup_rescan_progress to the last | EXTENT/META_ITEM + 1 | now qgroup_rescan_progress = A + 1. | | fs tree get CoWed, new tree block is at A + 16K | transid 5 get committed - | transid=6 last committed=5 | btrfs_qgroup_rescan_worker() | btrfs_qgroup_rescan_worker() | |- btrfs_start_transaction() | | transid = 5 | |- qgroup_rescan_leaf() | |- btrfs_search_slot_for_read() on extent tree | Get the only extent tree block from commit root (transid = 5). | scan it using qgroup_rescan_progress (A + 1). | found new tree block beyong A, and it's fs tree block, | account it to increase qgroup numbers. - In above case, tree block A, and tree block A + 16K get accounted twice, while qgroup rescan should stop when it already reach the last leaf, other than continue using its qgroup_rescan_progress. Such case could happen by just looping btrfs/017 and with some possibility it can hit such double qgroup accounting problem. Fix it by checking the path to determine if we should finish qgroup rescan, other than relying on next loop to exit. Reported-by: Nikolay Borisov Signed-off-by: Qu Wenruo Signed-off-by: David Sterba Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/btrfs/qgroup.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index dfd99867ff4d..9afad8c14220 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -2236,6 +2236,21 @@ void assert_qgroups_uptodate(struct btrfs_trans_handle *trans) BUG(); } +/* + * Check if the leaf is the last leaf. Which means all node pointers + * are at their last position. + */ +static bool is_last_leaf(struct btrfs_path *path) +{ + int i; + + for (i = 1; i < BTRFS_MAX_LEVEL && path->nodes[i]; i++) { + if (path->slots[i] != btrfs_header_nritems(path->nodes[i]) - 1) + return false; + } + return true; +} + /* * returns < 0 on error, 0 when more leafs are to be scanned. * returns 1 when done. @@ -2249,6 +2264,7 @@ qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct btrfs_path *path, struct ulist *roots = NULL; struct seq_list tree_mod_seq_elem = SEQ_LIST_INIT(tree_mod_seq_elem); u64 num_bytes; + bool done; int slot; int ret; @@ -2277,6 +2293,7 @@ qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct btrfs_path *path, mutex_unlock(&fs_info->qgroup_rescan_lock); return ret; } + done = is_last_leaf(path); btrfs_item_key_to_cpu(path->nodes[0], &found, btrfs_header_nritems(path->nodes[0]) - 1); @@ -2323,6 +2340,8 @@ qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct btrfs_path *path, } btrfs_put_tree_mod_seq(fs_info, &tree_mod_seq_elem); + if (done && !ret) + ret = 1; return ret; } From db16571fb75ba384a21f9e4c370893216224e74c Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 18 May 2018 18:56:24 +0200 Subject: [PATCH 366/970] PCI: Prevent sysfs disable of device while driver is attached [ Upstream commit 6f5cdfa802733dcb561bf664cc89d203f2fd958f ] Manipulating the enable_cnt behind the back of the driver will wreak complete havoc with the kernel state, so disallow it. Signed-off-by: Christoph Hellwig Signed-off-by: Bjorn Helgaas Reviewed-by: Johannes Thumshirn Acked-by: Keith Busch Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/pci/pci-sysfs.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c index f9f4d1c18eb2..e5d8e2e2bd30 100644 --- a/drivers/pci/pci-sysfs.c +++ b/drivers/pci/pci-sysfs.c @@ -180,13 +180,16 @@ static ssize_t enable_store(struct device *dev, struct device_attribute *attr, if (!capable(CAP_SYS_ADMIN)) return -EPERM; - if (!val) { - if (pci_is_enabled(pdev)) - pci_disable_device(pdev); - else - result = -EIO; - } else + device_lock(dev); + if (dev->driver) + result = -EBUSY; + else if (val) result = pci_enable_device(pdev); + else if (pci_is_enabled(pdev)) + pci_disable_device(pdev); + else + result = -EIO; + device_unlock(dev); return result < 0 ? result : count; } From 1d4de3ff8731ec267b7a6401a3bc190c3f7bb82d Mon Sep 17 00:00:00 2001 From: Sven Eckelmann Date: Wed, 23 May 2018 11:11:30 +0300 Subject: [PATCH 367/970] ath: Add regulatory mapping for FCC3_ETSIC [ Upstream commit 01fb2994a98dc72c8818c274f7b5983d5dd885c7 ] The regdomain code is used to select the correct the correct conformance test limits (CTL) for a country. If the regdomain code isn't available and it is still programmed in the EEPROM then it will cause an error and stop the initialization with: Invalid EEPROM contents The current CTL mappings for this regdomain code are: * 2.4GHz: ETSI * 5GHz: FCC Signed-off-by: Sven Eckelmann Signed-off-by: Kalle Valo Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/ath/regd_common.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/wireless/ath/regd_common.h b/drivers/net/wireless/ath/regd_common.h index bdd2b4d61f2f..7d955fa8c24c 100644 --- a/drivers/net/wireless/ath/regd_common.h +++ b/drivers/net/wireless/ath/regd_common.h @@ -35,6 +35,7 @@ enum EnumRd { FRANCE_RES = 0x31, FCC3_FCCA = 0x3A, FCC3_WORLD = 0x3B, + FCC3_ETSIC = 0x3F, ETSI1_WORLD = 0x37, ETSI3_ETSIA = 0x32, @@ -168,6 +169,7 @@ static struct reg_dmn_pair_mapping regDomainPairs[] = { {FCC2_ETSIC, CTL_FCC, CTL_ETSI}, {FCC3_FCCA, CTL_FCC, CTL_FCC}, {FCC3_WORLD, CTL_FCC, CTL_ETSI}, + {FCC3_ETSIC, CTL_FCC, CTL_ETSI}, {FCC4_FCCA, CTL_FCC, CTL_FCC}, {FCC5_FCCA, CTL_FCC, CTL_FCC}, {FCC6_FCCA, CTL_FCC, CTL_FCC}, From e6cd75968d52b19c67d1410ce301f53ed258ae38 Mon Sep 17 00:00:00 2001 From: Sven Eckelmann Date: Wed, 23 May 2018 11:11:18 +0300 Subject: [PATCH 368/970] ath: Add regulatory mapping for ETSI8_WORLD [ Upstream commit 45faf6e096da8bb80e1ddf8c08a26a9601d9469e ] The regdomain code is used to select the correct the correct conformance test limits (CTL) for a country. If the regdomain code isn't available and it is still programmed in the EEPROM then it will cause an error and stop the initialization with: Invalid EEPROM contents The current CTL mappings for this regdomain code are: * 2.4GHz: ETSI * 5GHz: ETSI Signed-off-by: Sven Eckelmann Signed-off-by: Kalle Valo Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/ath/regd_common.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/wireless/ath/regd_common.h b/drivers/net/wireless/ath/regd_common.h index 7d955fa8c24c..7c0fcbbf1900 100644 --- a/drivers/net/wireless/ath/regd_common.h +++ b/drivers/net/wireless/ath/regd_common.h @@ -45,6 +45,7 @@ enum EnumRd { ETSI4_ETSIC = 0x38, ETSI5_WORLD = 0x39, ETSI6_WORLD = 0x34, + ETSI8_WORLD = 0x3D, ETSI_RESERVED = 0x33, MKK1_MKKA = 0x40, @@ -181,6 +182,7 @@ static struct reg_dmn_pair_mapping regDomainPairs[] = { {ETSI4_WORLD, CTL_ETSI, CTL_ETSI}, {ETSI5_WORLD, CTL_ETSI, CTL_ETSI}, {ETSI6_WORLD, CTL_ETSI, CTL_ETSI}, + {ETSI8_WORLD, CTL_ETSI, CTL_ETSI}, /* XXX: For ETSI3_ETSIA, Was NO_CTL meant for the 2 GHz band ? */ {ETSI3_ETSIA, CTL_ETSI, CTL_ETSI}, From 31e1b250c0d828a2b189b48f63425778116676b7 Mon Sep 17 00:00:00 2001 From: Sven Eckelmann Date: Wed, 23 May 2018 11:11:14 +0300 Subject: [PATCH 369/970] ath: Add regulatory mapping for APL13_WORLD [ Upstream commit 9ba8df0c52b3e6baa436374b429d3d73bd09a320 ] The regdomain code is used to select the correct the correct conformance test limits (CTL) for a country. If the regdomain code isn't available and it is still programmed in the EEPROM then it will cause an error and stop the initialization with: Invalid EEPROM contents The current CTL mappings for this regdomain code are: * 2.4GHz: ETSI * 5GHz: ETSI Signed-off-by: Sven Eckelmann Signed-off-by: Kalle Valo Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/ath/regd_common.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/wireless/ath/regd_common.h b/drivers/net/wireless/ath/regd_common.h index 7c0fcbbf1900..2c873840a46a 100644 --- a/drivers/net/wireless/ath/regd_common.h +++ b/drivers/net/wireless/ath/regd_common.h @@ -69,6 +69,7 @@ enum EnumRd { APL1_ETSIC = 0x55, APL2_ETSIC = 0x56, APL5_WORLD = 0x58, + APL13_WORLD = 0x5A, APL6_WORLD = 0x5B, APL7_FCCA = 0x5C, APL8_WORLD = 0x5D, @@ -195,6 +196,7 @@ static struct reg_dmn_pair_mapping regDomainPairs[] = { {APL3_WORLD, CTL_FCC, CTL_ETSI}, {APL4_WORLD, CTL_FCC, CTL_ETSI}, {APL5_WORLD, CTL_FCC, CTL_ETSI}, + {APL13_WORLD, CTL_ETSI, CTL_ETSI}, {APL6_WORLD, CTL_ETSI, CTL_ETSI}, {APL8_WORLD, CTL_ETSI, CTL_ETSI}, {APL9_WORLD, CTL_ETSI, CTL_ETSI}, From 3cfd18697dc47c50d0ea2ac0ae4b5a1dcb615b43 Mon Sep 17 00:00:00 2001 From: Sven Eckelmann Date: Wed, 23 May 2018 11:11:05 +0300 Subject: [PATCH 370/970] ath: Add regulatory mapping for APL2_FCCA [ Upstream commit 4f183687e3fad3ce0e06e38976cad81bc4541990 ] The regdomain code is used to select the correct the correct conformance test limits (CTL) for a country. If the regdomain code isn't available and it is still programmed in the EEPROM then it will cause an error and stop the initialization with: Invalid EEPROM contents The current CTL mappings for this regdomain code are: * 2.4GHz: FCC * 5GHz: FCC Signed-off-by: Sven Eckelmann Signed-off-by: Kalle Valo Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/ath/regd_common.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/wireless/ath/regd_common.h b/drivers/net/wireless/ath/regd_common.h index 2c873840a46a..d8a7db4976f0 100644 --- a/drivers/net/wireless/ath/regd_common.h +++ b/drivers/net/wireless/ath/regd_common.h @@ -61,6 +61,7 @@ enum EnumRd { MKK1_MKKA1 = 0x4A, MKK1_MKKA2 = 0x4B, MKK1_MKKC = 0x4C, + APL2_FCCA = 0x4D, APL3_FCCA = 0x50, APL1_WORLD = 0x52, @@ -193,6 +194,7 @@ static struct reg_dmn_pair_mapping regDomainPairs[] = { {FCC1_FCCA, CTL_FCC, CTL_FCC}, {APL1_WORLD, CTL_FCC, CTL_ETSI}, {APL2_WORLD, CTL_FCC, CTL_ETSI}, + {APL2_FCCA, CTL_FCC, CTL_FCC}, {APL3_WORLD, CTL_FCC, CTL_ETSI}, {APL4_WORLD, CTL_FCC, CTL_ETSI}, {APL5_WORLD, CTL_FCC, CTL_ETSI}, From 410639a85914c0c6f66d38edf851c556f185ebbb Mon Sep 17 00:00:00 2001 From: Sven Eckelmann Date: Wed, 23 May 2018 11:10:54 +0300 Subject: [PATCH 371/970] ath: Add regulatory mapping for Uganda [ Upstream commit 1ea3986ad2bc72081c69f3fbc1e5e0eeb3c44f17 ] The country code is used by the ath to detect the ISO 3166-1 alpha-2 name and to select the correct conformance test limits (CTL) for a country. If the country isn't available and it is still programmed in the EEPROM then it will cause an error and stop the initialization with: Invalid EEPROM contents The current CTL mappings for this country are: * 2.4GHz: ETSI * 5GHz: FCC Signed-off-by: Sven Eckelmann Signed-off-by: Kalle Valo Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/ath/regd.h | 1 + drivers/net/wireless/ath/regd_common.h | 1 + 2 files changed, 2 insertions(+) diff --git a/drivers/net/wireless/ath/regd.h b/drivers/net/wireless/ath/regd.h index 565d3075f06e..218802dff209 100644 --- a/drivers/net/wireless/ath/regd.h +++ b/drivers/net/wireless/ath/regd.h @@ -175,6 +175,7 @@ enum CountryCode { CTRY_TUNISIA = 788, CTRY_TURKEY = 792, CTRY_UAE = 784, + CTRY_UGANDA = 800, CTRY_UKRAINE = 804, CTRY_UNITED_KINGDOM = 826, CTRY_UNITED_STATES = 840, diff --git a/drivers/net/wireless/ath/regd_common.h b/drivers/net/wireless/ath/regd_common.h index d8a7db4976f0..cba1020bc854 100644 --- a/drivers/net/wireless/ath/regd_common.h +++ b/drivers/net/wireless/ath/regd_common.h @@ -467,6 +467,7 @@ static struct country_code_to_enum_rd allCountries[] = { {CTRY_TRINIDAD_Y_TOBAGO, FCC3_WORLD, "TT"}, {CTRY_TUNISIA, ETSI3_WORLD, "TN"}, {CTRY_TURKEY, ETSI3_WORLD, "TR"}, + {CTRY_UGANDA, FCC3_WORLD, "UG"}, {CTRY_UKRAINE, NULL1_WORLD, "UA"}, {CTRY_UAE, NULL1_WORLD, "AE"}, {CTRY_UNITED_KINGDOM, ETSI1_WORLD, "GB"}, From 9d04d93f4b85a13fd61fd520c706d4a4162866e3 Mon Sep 17 00:00:00 2001 From: Sven Eckelmann Date: Wed, 23 May 2018 11:10:48 +0300 Subject: [PATCH 372/970] ath: Add regulatory mapping for Tanzania [ Upstream commit 667ddac5745fb9fddfe8f7fd2523070f50bd4442 ] The country code is used by the ath to detect the ISO 3166-1 alpha-2 name and to select the correct conformance test limits (CTL) for a country. If the country isn't available and it is still programmed in the EEPROM then it will cause an error and stop the initialization with: Invalid EEPROM contents The current CTL mappings for this country are: * 2.4GHz: ETSI * 5GHz: FCC Signed-off-by: Sven Eckelmann Signed-off-by: Kalle Valo Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/ath/regd.h | 1 + drivers/net/wireless/ath/regd_common.h | 1 + 2 files changed, 2 insertions(+) diff --git a/drivers/net/wireless/ath/regd.h b/drivers/net/wireless/ath/regd.h index 218802dff209..9044d047f151 100644 --- a/drivers/net/wireless/ath/regd.h +++ b/drivers/net/wireless/ath/regd.h @@ -170,6 +170,7 @@ enum CountryCode { CTRY_SWITZERLAND = 756, CTRY_SYRIA = 760, CTRY_TAIWAN = 158, + CTRY_TANZANIA = 834, CTRY_THAILAND = 764, CTRY_TRINIDAD_Y_TOBAGO = 780, CTRY_TUNISIA = 788, diff --git a/drivers/net/wireless/ath/regd_common.h b/drivers/net/wireless/ath/regd_common.h index cba1020bc854..b85dc86cc188 100644 --- a/drivers/net/wireless/ath/regd_common.h +++ b/drivers/net/wireless/ath/regd_common.h @@ -463,6 +463,7 @@ static struct country_code_to_enum_rd allCountries[] = { {CTRY_SWITZERLAND, ETSI1_WORLD, "CH"}, {CTRY_SYRIA, NULL1_WORLD, "SY"}, {CTRY_TAIWAN, APL3_FCCA, "TW"}, + {CTRY_TANZANIA, APL1_WORLD, "TZ"}, {CTRY_THAILAND, FCC3_WORLD, "TH"}, {CTRY_TRINIDAD_Y_TOBAGO, FCC3_WORLD, "TT"}, {CTRY_TUNISIA, ETSI3_WORLD, "TN"}, From 0d50a24c54ba32b54e75f60ceefddee58d671057 Mon Sep 17 00:00:00 2001 From: Sven Eckelmann Date: Wed, 23 May 2018 11:10:43 +0300 Subject: [PATCH 373/970] ath: Add regulatory mapping for Serbia [ Upstream commit 2a3169a54bb53717928392a04fb84deb765b51f1 ] The country code is used by the ath to detect the ISO 3166-1 alpha-2 name and to select the correct conformance test limits (CTL) for a country. If the country isn't available and it is still programmed in the EEPROM then it will cause an error and stop the initialization with: Invalid EEPROM contents The current CTL mappings for this country are: * 2.4GHz: ETSI * 5GHz: ETSI Signed-off-by: Sven Eckelmann Signed-off-by: Kalle Valo Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/ath/regd.h | 1 + drivers/net/wireless/ath/regd_common.h | 1 + 2 files changed, 2 insertions(+) diff --git a/drivers/net/wireless/ath/regd.h b/drivers/net/wireless/ath/regd.h index 9044d047f151..e63e864d6a93 100644 --- a/drivers/net/wireless/ath/regd.h +++ b/drivers/net/wireless/ath/regd.h @@ -159,6 +159,7 @@ enum CountryCode { CTRY_ROMANIA = 642, CTRY_RUSSIA = 643, CTRY_SAUDI_ARABIA = 682, + CTRY_SERBIA = 688, CTRY_SERBIA_MONTENEGRO = 891, CTRY_SINGAPORE = 702, CTRY_SLOVAKIA = 703, diff --git a/drivers/net/wireless/ath/regd_common.h b/drivers/net/wireless/ath/regd_common.h index b85dc86cc188..1ced5a323cf8 100644 --- a/drivers/net/wireless/ath/regd_common.h +++ b/drivers/net/wireless/ath/regd_common.h @@ -452,6 +452,7 @@ static struct country_code_to_enum_rd allCountries[] = { {CTRY_ROMANIA, NULL1_WORLD, "RO"}, {CTRY_RUSSIA, NULL1_WORLD, "RU"}, {CTRY_SAUDI_ARABIA, NULL1_WORLD, "SA"}, + {CTRY_SERBIA, ETSI1_WORLD, "RS"}, {CTRY_SERBIA_MONTENEGRO, ETSI1_WORLD, "CS"}, {CTRY_SINGAPORE, APL6_WORLD, "SG"}, {CTRY_SLOVAKIA, ETSI1_WORLD, "SK"}, From c7cc26414a3e8e22226ba33b82be904f802260a2 Mon Sep 17 00:00:00 2001 From: Sven Eckelmann Date: Wed, 23 May 2018 11:09:59 +0300 Subject: [PATCH 374/970] ath: Add regulatory mapping for Bermuda [ Upstream commit 9c790f2d234f65697e3b0948adbfdf36dbe63dd7 ] The country code is used by the ath to detect the ISO 3166-1 alpha-2 name and to select the correct conformance test limits (CTL) for a country. If the country isn't available and it is still programmed in the EEPROM then it will cause an error and stop the initialization with: Invalid EEPROM contents The current CTL mappings for this country are: * 2.4GHz: FCC * 5GHz: FCC Signed-off-by: Sven Eckelmann Signed-off-by: Kalle Valo Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/ath/regd.h | 1 + drivers/net/wireless/ath/regd_common.h | 1 + 2 files changed, 2 insertions(+) diff --git a/drivers/net/wireless/ath/regd.h b/drivers/net/wireless/ath/regd.h index e63e864d6a93..716da0e65d63 100644 --- a/drivers/net/wireless/ath/regd.h +++ b/drivers/net/wireless/ath/regd.h @@ -74,6 +74,7 @@ enum CountryCode { CTRY_BELARUS = 112, CTRY_BELGIUM = 56, CTRY_BELIZE = 84, + CTRY_BERMUDA = 60, CTRY_BOLIVIA = 68, CTRY_BOSNIA_HERZ = 70, CTRY_BRAZIL = 76, diff --git a/drivers/net/wireless/ath/regd_common.h b/drivers/net/wireless/ath/regd_common.h index 1ced5a323cf8..e13b96e45d53 100644 --- a/drivers/net/wireless/ath/regd_common.h +++ b/drivers/net/wireless/ath/regd_common.h @@ -313,6 +313,7 @@ static struct country_code_to_enum_rd allCountries[] = { {CTRY_BELGIUM, ETSI1_WORLD, "BE"}, {CTRY_BELGIUM2, ETSI4_WORLD, "BL"}, {CTRY_BELIZE, APL1_ETSIC, "BZ"}, + {CTRY_BERMUDA, FCC3_FCCA, "BM"}, {CTRY_BOLIVIA, APL1_ETSIC, "BO"}, {CTRY_BOSNIA_HERZ, ETSI1_WORLD, "BA"}, {CTRY_BRAZIL, FCC3_WORLD, "BR"}, From cf619559ec8232f94e40062231b27f9cb8d876f2 Mon Sep 17 00:00:00 2001 From: Sven Eckelmann Date: Wed, 23 May 2018 11:09:53 +0300 Subject: [PATCH 375/970] ath: Add regulatory mapping for Bahamas [ Upstream commit 699e2302c286a14afe7b7394151ce6c4e1790cc1 ] The country code is used by the ath to detect the ISO 3166-1 alpha-2 name and to select the correct conformance test limits (CTL) for a country. If the country isn't available and it is still programmed in the EEPROM then it will cause an error and stop the initialization with: Invalid EEPROM contents The current CTL mappings for this country are: * 2.4GHz: ETSI * 5GHz: FCC Signed-off-by: Sven Eckelmann Signed-off-by: Kalle Valo Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/ath/regd.h | 1 + drivers/net/wireless/ath/regd_common.h | 1 + 2 files changed, 2 insertions(+) diff --git a/drivers/net/wireless/ath/regd.h b/drivers/net/wireless/ath/regd.h index 716da0e65d63..8553ab44d930 100644 --- a/drivers/net/wireless/ath/regd.h +++ b/drivers/net/wireless/ath/regd.h @@ -68,6 +68,7 @@ enum CountryCode { CTRY_AUSTRALIA = 36, CTRY_AUSTRIA = 40, CTRY_AZERBAIJAN = 31, + CTRY_BAHAMAS = 44, CTRY_BAHRAIN = 48, CTRY_BANGLADESH = 50, CTRY_BARBADOS = 52, diff --git a/drivers/net/wireless/ath/regd_common.h b/drivers/net/wireless/ath/regd_common.h index e13b96e45d53..15bbd1e0d912 100644 --- a/drivers/net/wireless/ath/regd_common.h +++ b/drivers/net/wireless/ath/regd_common.h @@ -306,6 +306,7 @@ static struct country_code_to_enum_rd allCountries[] = { {CTRY_AUSTRALIA2, FCC6_WORLD, "AU"}, {CTRY_AUSTRIA, ETSI1_WORLD, "AT"}, {CTRY_AZERBAIJAN, ETSI4_WORLD, "AZ"}, + {CTRY_BAHAMAS, FCC3_WORLD, "BS"}, {CTRY_BAHRAIN, APL6_WORLD, "BH"}, {CTRY_BANGLADESH, NULL1_WORLD, "BD"}, {CTRY_BARBADOS, FCC2_WORLD, "BB"}, From ecd04c80fa32f7f111f676ab062bb621367cbf26 Mon Sep 17 00:00:00 2001 From: Mathieu Malaterre Date: Thu, 22 Mar 2018 21:20:03 +0100 Subject: [PATCH 376/970] powerpc/32: Add a missing include header MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit c89ca593220931c150cffda24b4d4ccf82f13fc8 ] The header file was missing from the includes. Fix the following warning, treated as error with W=1: arch/powerpc/kernel/pci_32.c:286:6: error: no previous prototype for ‘sys_pciconfig_iobase’ [-Werror=missing-prototypes] Signed-off-by: Mathieu Malaterre Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/kernel/pci_32.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/kernel/pci_32.c b/arch/powerpc/kernel/pci_32.c index 678f87a63645..97b02b8c4f10 100644 --- a/arch/powerpc/kernel/pci_32.c +++ b/arch/powerpc/kernel/pci_32.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include #include From f851d8ac65cc3378983cc65dfab4b0384ec70ebb Mon Sep 17 00:00:00 2001 From: Mathieu Malaterre Date: Thu, 22 Mar 2018 21:19:56 +0100 Subject: [PATCH 377/970] powerpc/chrp/time: Make some functions static, add missing header include MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit b87a358b4a1421abd544c0b554b1b7159b2b36c0 ] Add a missing include . These functions can all be static, make it so. Fix warnings treated as errors with W=1: arch/powerpc/platforms/chrp/time.c:41:13: error: no previous prototype for ‘chrp_time_init’ [-Werror=missing-prototypes] arch/powerpc/platforms/chrp/time.c:66:5: error: no previous prototype for ‘chrp_cmos_clock_read’ [-Werror=missing-prototypes] arch/powerpc/platforms/chrp/time.c:74:6: error: no previous prototype for ‘chrp_cmos_clock_write’ [-Werror=missing-prototypes] arch/powerpc/platforms/chrp/time.c:86:5: error: no previous prototype for ‘chrp_set_rtc_time’ [-Werror=missing-prototypes] arch/powerpc/platforms/chrp/time.c:130:6: error: no previous prototype for ‘chrp_get_rtc_time’ [-Werror=missing-prototypes] Signed-off-by: Mathieu Malaterre Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/platforms/chrp/time.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/platforms/chrp/time.c b/arch/powerpc/platforms/chrp/time.c index f803f4b8ab6f..8608e358217f 100644 --- a/arch/powerpc/platforms/chrp/time.c +++ b/arch/powerpc/platforms/chrp/time.c @@ -27,6 +27,8 @@ #include #include +#include + extern spinlock_t rtc_lock; #define NVRAM_AS0 0x74 @@ -62,7 +64,7 @@ long __init chrp_time_init(void) return 0; } -int chrp_cmos_clock_read(int addr) +static int chrp_cmos_clock_read(int addr) { if (nvram_as1 != 0) outb(addr>>8, nvram_as1); @@ -70,7 +72,7 @@ int chrp_cmos_clock_read(int addr) return (inb(nvram_data)); } -void chrp_cmos_clock_write(unsigned long val, int addr) +static void chrp_cmos_clock_write(unsigned long val, int addr) { if (nvram_as1 != 0) outb(addr>>8, nvram_as1); From 0cd9fd8406a6c8f8dc8f2883caefb5dad36d28fc Mon Sep 17 00:00:00 2001 From: Mathieu Malaterre Date: Wed, 4 Apr 2018 22:13:05 +0200 Subject: [PATCH 378/970] powerpc/powermac: Add missing prototype for note_bootable_part() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit f72cf3f1d49f2c35d6cb682af2e8c93550f264e4 ] Add a missing prototype for function `note_bootable_part` to silence a warning treated as error with W=1: arch/powerpc/platforms/powermac/setup.c:361:12: error: no previous prototype for ‘note_bootable_part’ [-Werror=missing-prototypes] Suggested-by: Christophe Leroy Signed-off-by: Mathieu Malaterre Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/platforms/powermac/setup.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/platforms/powermac/setup.c b/arch/powerpc/platforms/powermac/setup.c index 6b4e9d181126..4929dd4b165e 100644 --- a/arch/powerpc/platforms/powermac/setup.c +++ b/arch/powerpc/platforms/powermac/setup.c @@ -352,6 +352,7 @@ static int pmac_late_init(void) } machine_late_initcall(powermac, pmac_late_init); +void note_bootable_part(dev_t dev, int part, int goodness); /* * This is __ref because we check for "initializing" before * touching any of the __init sensitive things and "initializing" From e0da21e7e7f18b5a723b7771f709a3e7796b9ca0 Mon Sep 17 00:00:00 2001 From: Mathieu Malaterre Date: Wed, 4 Apr 2018 22:07:46 +0200 Subject: [PATCH 379/970] powerpc/powermac: Mark variable x as unused MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 5a4b475cf8511da721f20ba432c244061db7139f ] Since the value of x is never intended to be read, declare it with gcc attribute as unused. Fix warning treated as error with W=1: arch/powerpc/platforms/powermac/bootx_init.c:471:21: error: variable ‘x’ set but not used [-Werror=unused-but-set-variable] Suggested-by: Christophe Leroy Signed-off-by: Mathieu Malaterre Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/platforms/powermac/bootx_init.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/powermac/bootx_init.c b/arch/powerpc/platforms/powermac/bootx_init.c index c3c9bbb3573a..ba0964c17620 100644 --- a/arch/powerpc/platforms/powermac/bootx_init.c +++ b/arch/powerpc/platforms/powermac/bootx_init.c @@ -468,7 +468,7 @@ void __init bootx_init(unsigned long r3, unsigned long r4) boot_infos_t *bi = (boot_infos_t *) r4; unsigned long hdr; unsigned long space; - unsigned long ptr, x; + unsigned long ptr; char *model; unsigned long offset = reloc_offset(); @@ -562,6 +562,8 @@ void __init bootx_init(unsigned long r3, unsigned long r4) * MMU switched OFF, so this should not be useful anymore. */ if (bi->version < 4) { + unsigned long x __maybe_unused; + bootx_printf("Touching pages...\n"); /* From 38d96f7888f57f781e5ae80dde22ab97c1802e36 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Thu, 24 May 2018 11:02:06 +0000 Subject: [PATCH 380/970] powerpc/8xx: fix invalid register expression in head_8xx.S [ Upstream commit e4ccb1dae6bdef228d729c076c38161ef6e7ca34 ] New binutils generate the following warning AS arch/powerpc/kernel/head_8xx.o arch/powerpc/kernel/head_8xx.S: Assembler messages: arch/powerpc/kernel/head_8xx.S:916: Warning: invalid register expression This patch fixes it. Signed-off-by: Christophe Leroy Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/kernel/head_8xx.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S index fb133a163263..2274be535dda 100644 --- a/arch/powerpc/kernel/head_8xx.S +++ b/arch/powerpc/kernel/head_8xx.S @@ -769,7 +769,7 @@ start_here: tovirt(r6,r6) lis r5, abatron_pteptrs@h ori r5, r5, abatron_pteptrs@l - stw r5, 0xf0(r0) /* Must match your Abatron config file */ + stw r5, 0xf0(0) /* Must match your Abatron config file */ tophys(r5,r5) stw r6, 0(r5) From 23d25f9bdaefa276226f1a62fef8dd47f64c09e4 Mon Sep 17 00:00:00 2001 From: Julia Lawall Date: Wed, 23 May 2018 21:07:12 +0200 Subject: [PATCH 381/970] pinctrl: at91-pio4: add missing of_node_put [ Upstream commit 21816364715f508c10da1e087e352bc1e326614f ] The device node iterators perform an of_node_get on each iteration, so a jump out of the loop requires an of_node_put. The semantic patch that fixes this problem is as follows (http://coccinelle.lip6.fr): // @@ expression root,e; local idexpression child; iterator name for_each_child_of_node; @@ for_each_child_of_node(root, child) { ... when != of_node_put(child) when != e = child + of_node_put(child); ? break; ... } ... when != child // Signed-off-by: Julia Lawall Acked-by: Ludovic Desroches Signed-off-by: Linus Walleij Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/pinctrl/pinctrl-at91-pio4.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/pinctrl/pinctrl-at91-pio4.c b/drivers/pinctrl/pinctrl-at91-pio4.c index 28bbc1bb9e6c..88ba9c50cc8e 100644 --- a/drivers/pinctrl/pinctrl-at91-pio4.c +++ b/drivers/pinctrl/pinctrl-at91-pio4.c @@ -573,8 +573,10 @@ static int atmel_pctl_dt_node_to_map(struct pinctrl_dev *pctldev, for_each_child_of_node(np_config, np) { ret = atmel_pctl_dt_subnode_to_map(pctldev, np, map, &reserved_maps, num_maps); - if (ret < 0) + if (ret < 0) { + of_node_put(np); break; + } } } From 0416be409e50dae1a58cf5b5d9a0d799e1e9b790 Mon Sep 17 00:00:00 2001 From: Sandipan Das Date: Thu, 24 May 2018 12:26:46 +0530 Subject: [PATCH 382/970] bpf: powerpc64: pad function address loads with NOPs [ Upstream commit 4ea69b2fd623dee2bbc77d3b6b7d8c0924e2026a ] For multi-function programs, loading the address of a callee function to a register requires emitting instructions whose count varies from one to five depending on the nature of the address. Since we come to know of the callee's address only before the extra pass, the number of instructions required to load this address may vary from what was previously generated. This can make the JITed image grow or shrink. To avoid this, we should generate a constant five-instruction when loading function addresses by padding the optimized load sequence with NOPs. Signed-off-by: Sandipan Das Signed-off-by: Daniel Borkmann Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/net/bpf_jit_comp64.c | 34 +++++++++++++++++++++---------- 1 file changed, 23 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c index be9d968244ad..c0e817f35e69 100644 --- a/arch/powerpc/net/bpf_jit_comp64.c +++ b/arch/powerpc/net/bpf_jit_comp64.c @@ -207,25 +207,37 @@ static void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx) static void bpf_jit_emit_func_call(u32 *image, struct codegen_context *ctx, u64 func) { + unsigned int i, ctx_idx = ctx->idx; + + /* Load function address into r12 */ + PPC_LI64(12, func); + + /* For bpf-to-bpf function calls, the callee's address is unknown + * until the last extra pass. As seen above, we use PPC_LI64() to + * load the callee's address, but this may optimize the number of + * instructions required based on the nature of the address. + * + * Since we don't want the number of instructions emitted to change, + * we pad the optimized PPC_LI64() call with NOPs to guarantee that + * we always have a five-instruction sequence, which is the maximum + * that PPC_LI64() can emit. + */ + for (i = ctx->idx - ctx_idx; i < 5; i++) + PPC_NOP(); + #ifdef PPC64_ELF_ABI_v1 - /* func points to the function descriptor */ - PPC_LI64(b2p[TMP_REG_2], func); - /* Load actual entry point from function descriptor */ - PPC_BPF_LL(b2p[TMP_REG_1], b2p[TMP_REG_2], 0); - /* ... and move it to LR */ - PPC_MTLR(b2p[TMP_REG_1]); /* * Load TOC from function descriptor at offset 8. * We can clobber r2 since we get called through a * function pointer (so caller will save/restore r2) * and since we don't use a TOC ourself. */ - PPC_BPF_LL(2, b2p[TMP_REG_2], 8); -#else - /* We can clobber r12 */ - PPC_FUNC_ADDR(12, func); - PPC_MTLR(12); + PPC_BPF_LL(2, 12, 8); + /* Load actual entry point from function descriptor */ + PPC_BPF_LL(12, 12, 0); #endif + + PPC_MTLR(12); PPC_BLRL(); } From 15da89437656dc3b8ee3601fc01a567bb4bd0d33 Mon Sep 17 00:00:00 2001 From: Mika Westerberg Date: Wed, 23 May 2018 17:19:22 -0500 Subject: [PATCH 383/970] PCI: pciehp: Request control of native hotplug only if supported [ Upstream commit 408fec36a1ab3d14273c2116b449ef1e9be3cb8b ] Currently we request control of native PCIe hotplug unconditionally. Native PCIe hotplug events are handled by the pciehp driver, and if it is not enabled those events will be lost. Request control of native PCIe hotplug only if the pciehp driver is enabled, so we will actually handle native PCIe hotplug events. Suggested-by: Bjorn Helgaas Signed-off-by: Mika Westerberg Signed-off-by: Bjorn Helgaas Reviewed-by: Rafael J. Wysocki Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/acpi/pci_root.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c index bf601d4df8cf..b66815f35be6 100644 --- a/drivers/acpi/pci_root.c +++ b/drivers/acpi/pci_root.c @@ -472,9 +472,11 @@ static void negotiate_os_control(struct acpi_pci_root *root, int *no_aspm) } control = OSC_PCI_EXPRESS_CAPABILITY_CONTROL - | OSC_PCI_EXPRESS_NATIVE_HP_CONTROL | OSC_PCI_EXPRESS_PME_CONTROL; + if (IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE)) + control |= OSC_PCI_EXPRESS_NATIVE_HP_CONTROL; + if (pci_aer_available()) { if (aer_acpi_firmware_first()) dev_info(&device->dev, From f14629f34746cf4442f2598fc108f4b8b87a8cec Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20Vok=C3=A1=C4=8D?= Date: Wed, 23 May 2018 08:20:19 +0200 Subject: [PATCH 384/970] net: dsa: qca8k: Add support for QCA8334 switch MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 64cf81675a1f64c1b311e4611dd3b6a961607612 ] Add support for the four-port variant of the Qualcomm QCA833x switch. Signed-off-by: Michal Vokáč Reviewed-by: Andrew Lunn Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/dsa/qca8k.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/dsa/qca8k.c b/drivers/net/dsa/qca8k.c index b3df70d07ff6..33ed3997f9b9 100644 --- a/drivers/net/dsa/qca8k.c +++ b/drivers/net/dsa/qca8k.c @@ -1018,6 +1018,7 @@ static SIMPLE_DEV_PM_OPS(qca8k_pm_ops, qca8k_suspend, qca8k_resume); static const struct of_device_id qca8k_of_match[] = { + { .compatible = "qca,qca8334" }, { .compatible = "qca,qca8337" }, { /* sentinel */ }, }; From db6872750dfd10bc952e96e984b5654d37de9657 Mon Sep 17 00:00:00 2001 From: Xinming Hu Date: Fri, 18 May 2018 15:38:54 +0800 Subject: [PATCH 385/970] mwifiex: correct histogram data with appropriate index [ Upstream commit 30bfce0b63fa68c14ae1613eb9d259fa18644074 ] Correct snr/nr/rssi data index to avoid possible buffer underflow. Signed-off-by: Xinming Hu Signed-off-by: Kalle Valo Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/marvell/mwifiex/util.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/net/wireless/marvell/mwifiex/util.c b/drivers/net/wireless/marvell/mwifiex/util.c index 18fbb96a46e9..d75756c68e16 100644 --- a/drivers/net/wireless/marvell/mwifiex/util.c +++ b/drivers/net/wireless/marvell/mwifiex/util.c @@ -723,12 +723,14 @@ void mwifiex_hist_data_set(struct mwifiex_private *priv, u8 rx_rate, s8 snr, s8 nflr) { struct mwifiex_histogram_data *phist_data = priv->hist_data; + s8 nf = -nflr; + s8 rssi = snr - nflr; atomic_inc(&phist_data->num_samples); atomic_inc(&phist_data->rx_rate[rx_rate]); - atomic_inc(&phist_data->snr[snr]); - atomic_inc(&phist_data->noise_flr[128 + nflr]); - atomic_inc(&phist_data->sig_str[nflr - snr]); + atomic_inc(&phist_data->snr[snr + 128]); + atomic_inc(&phist_data->noise_flr[nf + 128]); + atomic_inc(&phist_data->sig_str[rssi + 128]); } /* function to reset histogram data during init/reset */ From 81be5529c8e62b5f251e2a458dc8212b98640b8d Mon Sep 17 00:00:00 2001 From: Mimi Zohar Date: Fri, 27 Apr 2018 14:31:40 -0400 Subject: [PATCH 386/970] ima: based on policy verify firmware signatures (pre-allocated buffer) [ Upstream commit fd90bc559bfba743ae8de87ff23b92a5e4668062 ] Don't differentiate, for now, between kernel_read_file_id READING_FIRMWARE and READING_FIRMWARE_PREALLOC_BUFFER enumerations. Fixes: a098ecd firmware: support loading into a pre-allocated buffer (since 4.8) Signed-off-by: Mimi Zohar Cc: Luis R. Rodriguez Cc: David Howells Cc: Kees Cook Cc: Serge E. Hallyn Cc: Stephen Boyd Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- security/integrity/ima/ima_main.c | 1 + 1 file changed, 1 insertion(+) diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c index a71f906b4f7a..9652541c4d43 100644 --- a/security/integrity/ima/ima_main.c +++ b/security/integrity/ima/ima_main.c @@ -379,6 +379,7 @@ int ima_read_file(struct file *file, enum kernel_read_file_id read_id) static int read_idmap[READING_MAX_ID] = { [READING_FIRMWARE] = FIRMWARE_CHECK, + [READING_FIRMWARE_PREALLOC_BUFFER] = FIRMWARE_CHECK, [READING_MODULE] = MODULE_CHECK, [READING_KEXEC_IMAGE] = KEXEC_KERNEL_CHECK, [READING_KEXEC_INITRAMFS] = KEXEC_INITRAMFS_CHECK, From e6d90b8c608a02d63623c945e3127893e3884b14 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Mon, 21 May 2018 18:19:49 +0100 Subject: [PATCH 387/970] drivers/perf: arm-ccn: don't log to dmesg in event_init [ Upstream commit 1898eb61fbc9703efee886d3abec27a388cf28c3 ] The ARM CCN PMU driver uses dev_warn() to complain about parameters in the user-provided perf_event_attr. This means that under normal operation (e.g. a single invocation of the perf tool), a number of messages warnings may be logged to dmesg. Tools may issue multiple syscalls to probe for feature support, and multiple applications (from multiple users) can attempt to open events simultaneously, so this is not very helpful, even if a user happens to have access to dmesg. Worse, this can push important information out of the dmesg ring buffer, and can significantly slow down syscall fuzzers, vastly increasing the time it takes to find critical bugs. Demote the dev_warn() instances to dev_dbg(), as is the case for all other PMU drivers under drivers/perf/. Users who wish to debug PMU event initialisation can enable dynamic debug to receive these messages. Signed-off-by: Mark Rutland Cc: Pawel Moll Cc: Will Deacon Signed-off-by: Will Deacon Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/bus/arm-ccn.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/drivers/bus/arm-ccn.c b/drivers/bus/arm-ccn.c index 45d7ecc66b22..4e9e9e618c9f 100644 --- a/drivers/bus/arm-ccn.c +++ b/drivers/bus/arm-ccn.c @@ -736,7 +736,7 @@ static int arm_ccn_pmu_event_init(struct perf_event *event) ccn = pmu_to_arm_ccn(event->pmu); if (hw->sample_period) { - dev_warn(ccn->dev, "Sampling not supported!\n"); + dev_dbg(ccn->dev, "Sampling not supported!\n"); return -EOPNOTSUPP; } @@ -744,12 +744,12 @@ static int arm_ccn_pmu_event_init(struct perf_event *event) event->attr.exclude_kernel || event->attr.exclude_hv || event->attr.exclude_idle || event->attr.exclude_host || event->attr.exclude_guest) { - dev_warn(ccn->dev, "Can't exclude execution levels!\n"); + dev_dbg(ccn->dev, "Can't exclude execution levels!\n"); return -EINVAL; } if (event->cpu < 0) { - dev_warn(ccn->dev, "Can't provide per-task data!\n"); + dev_dbg(ccn->dev, "Can't provide per-task data!\n"); return -EOPNOTSUPP; } /* @@ -771,13 +771,13 @@ static int arm_ccn_pmu_event_init(struct perf_event *event) switch (type) { case CCN_TYPE_MN: if (node_xp != ccn->mn_id) { - dev_warn(ccn->dev, "Invalid MN ID %d!\n", node_xp); + dev_dbg(ccn->dev, "Invalid MN ID %d!\n", node_xp); return -EINVAL; } break; case CCN_TYPE_XP: if (node_xp >= ccn->num_xps) { - dev_warn(ccn->dev, "Invalid XP ID %d!\n", node_xp); + dev_dbg(ccn->dev, "Invalid XP ID %d!\n", node_xp); return -EINVAL; } break; @@ -785,11 +785,11 @@ static int arm_ccn_pmu_event_init(struct perf_event *event) break; default: if (node_xp >= ccn->num_nodes) { - dev_warn(ccn->dev, "Invalid node ID %d!\n", node_xp); + dev_dbg(ccn->dev, "Invalid node ID %d!\n", node_xp); return -EINVAL; } if (!arm_ccn_pmu_type_eq(type, ccn->node[node_xp].type)) { - dev_warn(ccn->dev, "Invalid type 0x%x for node %d!\n", + dev_dbg(ccn->dev, "Invalid type 0x%x for node %d!\n", type, node_xp); return -EINVAL; } @@ -808,19 +808,19 @@ static int arm_ccn_pmu_event_init(struct perf_event *event) if (event_id != e->event) continue; if (e->num_ports && port >= e->num_ports) { - dev_warn(ccn->dev, "Invalid port %d for node/XP %d!\n", + dev_dbg(ccn->dev, "Invalid port %d for node/XP %d!\n", port, node_xp); return -EINVAL; } if (e->num_vcs && vc >= e->num_vcs) { - dev_warn(ccn->dev, "Invalid vc %d for node/XP %d!\n", + dev_dbg(ccn->dev, "Invalid vc %d for node/XP %d!\n", vc, node_xp); return -EINVAL; } valid = 1; } if (!valid) { - dev_warn(ccn->dev, "Invalid event 0x%x for node/XP %d!\n", + dev_dbg(ccn->dev, "Invalid event 0x%x for node/XP %d!\n", event_id, node_xp); return -EINVAL; } From 3ce14632e78a117e47c13f678a597a93bd7f2b78 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Fri, 20 Apr 2018 16:30:02 -0700 Subject: [PATCH 388/970] fscrypt: use unbound workqueue for decryption [ Upstream commit 36dd26e0c8d42699eeba87431246c07c28075bae ] Improve fscrypt read performance by switching the decryption workqueue from bound to unbound. With the bound workqueue, when multiple bios completed on the same CPU, they were decrypted on that same CPU. But with the unbound queue, they are now decrypted in parallel on any CPU. Although fscrypt read performance can be tough to measure due to the many sources of variation, this change is most beneficial when decryption is slow, e.g. on CPUs without AES instructions. For example, I timed tarring up encrypted directories on f2fs. On x86 with AES-NI instructions disabled, the unbound workqueue improved performance by about 25-35%, using 1 to NUM_CPUs jobs with 4 or 8 CPUs available. But with AES-NI enabled, performance was unchanged to within ~2%. I also did the same test on a quad-core ARM CPU using xts-speck128-neon encryption. There performance was usually about 10% better with the unbound workqueue, bringing it closer to the unencrypted speed. The unbound workqueue may be worse in some cases due to worse locality, but I think it's still the better default. dm-crypt uses an unbound workqueue by default too, so this change makes fscrypt match. Signed-off-by: Eric Biggers Signed-off-by: Theodore Ts'o Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/crypto/crypto.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/fs/crypto/crypto.c b/fs/crypto/crypto.c index 73de1446c8d4..1a8962569b5c 100644 --- a/fs/crypto/crypto.c +++ b/fs/crypto/crypto.c @@ -517,8 +517,17 @@ EXPORT_SYMBOL(fscrypt_initialize); */ static int __init fscrypt_init(void) { + /* + * Use an unbound workqueue to allow bios to be decrypted in parallel + * even when they happen to complete on the same CPU. This sacrifices + * locality, but it's worthwhile since decryption is CPU-intensive. + * + * Also use a high-priority workqueue to prioritize decryption work, + * which blocks reads from completing, over regular application tasks. + */ fscrypt_read_workqueue = alloc_workqueue("fscrypt_read_queue", - WQ_HIGHPRI, 0); + WQ_UNBOUND | WQ_HIGHPRI, + num_online_cpus()); if (!fscrypt_read_workqueue) goto fail; From 62413bacafa319f0901185c6e11f69d39709a541 Mon Sep 17 00:00:00 2001 From: Maya Erez Date: Thu, 3 May 2018 16:37:16 +0530 Subject: [PATCH 389/970] scsi: ufs: fix exception event handling [ Upstream commit 2e3611e9546c2ed4def152a51dfd34e8dddae7a5 ] The device can set the exception event bit in one of the response UPIU, for example to notify the need for urgent BKOPs operation. In such a case, the host driver calls ufshcd_exception_event_handler to handle this notification. When trying to check the exception event status (for finding the cause for the exception event), the device may be busy with additional SCSI commands handling and may not respond within the 100ms timeout. To prevent that, we need to block SCSI commands during handling of exception events and allow retransmissions of the query requests, in case of timeout. Signed-off-by: Subhash Jadavani Signed-off-by: Maya Erez Signed-off-by: Can Guo Signed-off-by: Asutosh Das Reviewed-by: Subhash Jadavani Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/ufs/ufshcd.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c index 86a3110c6d75..f857086ce2fa 100644 --- a/drivers/scsi/ufs/ufshcd.c +++ b/drivers/scsi/ufs/ufshcd.c @@ -4012,6 +4012,7 @@ static void ufshcd_exception_event_handler(struct work_struct *work) hba = container_of(work, struct ufs_hba, eeh_work); pm_runtime_get_sync(hba->dev); + scsi_block_requests(hba->host); err = ufshcd_get_ee_status(hba, &status); if (err) { dev_err(hba->dev, "%s: failed to get exception status %d\n", @@ -4025,6 +4026,7 @@ static void ufshcd_exception_event_handler(struct work_struct *work) ufshcd_bkops_exception_event_handler(hba); out: + scsi_unblock_requests(hba->host); pm_runtime_put_sync(hba->dev); return; } From 995cbcab6d3e1740169a251c699a9e55c9dc5e8e Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Thu, 17 May 2018 20:02:23 +0200 Subject: [PATCH 390/970] ALSA: emu10k1: Rate-limit error messages about page errors [ Upstream commit 11d42c81036324697d367600bfc16f6dd37636fd ] The error messages at sanity checks of memory pages tend to repeat too many times once when it hits, and without the rate limit, it may flood and become unreadable. Replace such messages with the *_ratelimited() variant. Bugzilla: http://bugzilla.opensuse.org/show_bug.cgi?id=1093027 Signed-off-by: Takashi Iwai Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- sound/pci/emu10k1/memory.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/sound/pci/emu10k1/memory.c b/sound/pci/emu10k1/memory.c index 4f1f69be1865..8c778fa33031 100644 --- a/sound/pci/emu10k1/memory.c +++ b/sound/pci/emu10k1/memory.c @@ -237,13 +237,13 @@ search_empty(struct snd_emu10k1 *emu, int size) static int is_valid_page(struct snd_emu10k1 *emu, dma_addr_t addr) { if (addr & ~emu->dma_mask) { - dev_err(emu->card->dev, + dev_err_ratelimited(emu->card->dev, "max memory size is 0x%lx (addr = 0x%lx)!!\n", emu->dma_mask, (unsigned long)addr); return 0; } if (addr & (EMUPAGESIZE-1)) { - dev_err(emu->card->dev, "page is not aligned\n"); + dev_err_ratelimited(emu->card->dev, "page is not aligned\n"); return 0; } return 1; @@ -334,7 +334,7 @@ snd_emu10k1_alloc_pages(struct snd_emu10k1 *emu, struct snd_pcm_substream *subst else addr = snd_pcm_sgbuf_get_addr(substream, ofs); if (! is_valid_page(emu, addr)) { - dev_err(emu->card->dev, + dev_err_ratelimited(emu->card->dev, "emu: failure page = %d\n", idx); mutex_unlock(&hdr->block_mutex); return NULL; From 211c2bc42a1cd29d6e01a785dec6dc8dc85e72c5 Mon Sep 17 00:00:00 2001 From: Anson Huang Date: Thu, 17 May 2018 15:27:22 +0800 Subject: [PATCH 391/970] regulator: pfuze100: add .is_enable() for pfuze100_swb_regulator_ops [ Upstream commit 0b01fd3d40fe6402e5fa3b491ef23109feb1aaa5 ] If is_enabled() is not defined, regulator core will assume this regulator is already enabled, then it can NOT be really enabled after disabled. Based on Li Jun's patch from the NXP kernel tree. Signed-off-by: Anson Huang Signed-off-by: Mark Brown Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/regulator/pfuze100-regulator.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/regulator/pfuze100-regulator.c b/drivers/regulator/pfuze100-regulator.c index cb18b5c4f2db..86b348740fcd 100644 --- a/drivers/regulator/pfuze100-regulator.c +++ b/drivers/regulator/pfuze100-regulator.c @@ -153,6 +153,7 @@ static struct regulator_ops pfuze100_sw_regulator_ops = { static struct regulator_ops pfuze100_swb_regulator_ops = { .enable = regulator_enable_regmap, .disable = regulator_disable_regmap, + .is_enabled = regulator_is_enabled_regmap, .list_voltage = regulator_list_voltage_table, .map_voltage = regulator_map_voltage_ascend, .set_voltage_sel = regulator_set_voltage_sel_regmap, From e51f4fcfad77f932d104304a2e1f0473752f9a85 Mon Sep 17 00:00:00 2001 From: Yufen Yu Date: Fri, 4 May 2018 18:08:10 +0800 Subject: [PATCH 392/970] md: fix NULL dereference of mddev->pers in remove_and_add_spares() [ Upstream commit c42a0e2675721e1444f56e6132a07b7b1ec169ac ] We met NULL pointer BUG as follow: [ 151.760358] BUG: unable to handle kernel NULL pointer dereference at 0000000000000060 [ 151.761340] PGD 80000001011eb067 P4D 80000001011eb067 PUD 1011ea067 PMD 0 [ 151.762039] Oops: 0000 [#1] SMP PTI [ 151.762406] Modules linked in: [ 151.762723] CPU: 2 PID: 3561 Comm: mdadm-test Kdump: loaded Not tainted 4.17.0-rc1+ #238 [ 151.763542] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014 [ 151.764432] RIP: 0010:remove_and_add_spares.part.56+0x13c/0x3a0 [ 151.765061] RSP: 0018:ffffc90001d7fcd8 EFLAGS: 00010246 [ 151.765590] RAX: 0000000000000000 RBX: ffff88013601d600 RCX: 0000000000000000 [ 151.766306] RDX: 0000000000000000 RSI: ffff88013601d600 RDI: ffff880136187000 [ 151.767014] RBP: ffff880136187018 R08: 0000000000000003 R09: 0000000000000051 [ 151.767728] R10: ffffc90001d7fed8 R11: 0000000000000000 R12: ffff88013601d600 [ 151.768447] R13: ffff8801298b1300 R14: ffff880136187000 R15: 0000000000000000 [ 151.769160] FS: 00007f2624276700(0000) GS:ffff88013ae80000(0000) knlGS:0000000000000000 [ 151.769971] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 151.770554] CR2: 0000000000000060 CR3: 0000000111aac000 CR4: 00000000000006e0 [ 151.771272] Call Trace: [ 151.771542] md_ioctl+0x1df2/0x1e10 [ 151.771906] ? __switch_to+0x129/0x440 [ 151.772295] ? __schedule+0x244/0x850 [ 151.772672] blkdev_ioctl+0x4bd/0x970 [ 151.773048] block_ioctl+0x39/0x40 [ 151.773402] do_vfs_ioctl+0xa4/0x610 [ 151.773770] ? dput.part.23+0x87/0x100 [ 151.774151] ksys_ioctl+0x70/0x80 [ 151.774493] __x64_sys_ioctl+0x16/0x20 [ 151.774877] do_syscall_64+0x5b/0x180 [ 151.775258] entry_SYSCALL_64_after_hwframe+0x44/0xa9 For raid6, when two disk of the array are offline, two spare disks can be added into the array. Before spare disks recovery completing, system reboot and mdadm thinks it is ok to restart the degraded array by md_ioctl(). Since disks in raid6 is not only_parity(), raid5_run() will abort, when there is no PPL feature or not setting 'start_dirty_degraded' parameter. Therefore, mddev->pers is NULL. But, mddev->raid_disks has been set and it will not be cleared when raid5_run abort. md_ioctl() can execute cmd 'HOT_REMOVE_DISK' to remove a disk by mdadm, which will cause NULL pointer dereference in remove_and_add_spares() finally. Signed-off-by: Yufen Yu Signed-off-by: Shaohua Li Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/md/md.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/md/md.c b/drivers/md/md.c index 3bb985679f34..a7a0e3acdb2f 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -6192,6 +6192,9 @@ static int hot_remove_disk(struct mddev *mddev, dev_t dev) char b[BDEVNAME_SIZE]; struct md_rdev *rdev; + if (!mddev->pers) + return -ENODEV; + rdev = find_rdev(mddev, dev); if (!rdev) return -ENXIO; From 8d02fc16faaa1f29500fdda143ef99855d6b9518 Mon Sep 17 00:00:00 2001 From: Emil Tantilov Date: Mon, 14 May 2018 11:16:16 -0700 Subject: [PATCH 393/970] ixgbevf: fix MAC address changes through ixgbevf_set_mac() [ Upstream commit 6e7d0ba1e59b1a306761a731e67634c0f2efea2a ] Set hw->mac.perm_addr in ixgbevf_set_mac() in order to avoid losing the custom MAC on reset. This can happen in the following case: >ip link set $vf address $mac >ethtool -r $vf Signed-off-by: Emil Tantilov Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c index 1499ce2bf9f6..029513294984 100644 --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c @@ -3729,6 +3729,7 @@ static int ixgbevf_set_mac(struct net_device *netdev, void *p) return -EPERM; ether_addr_copy(hw->mac.addr, addr->sa_data); + ether_addr_copy(hw->mac.perm_addr, addr->sa_data); ether_addr_copy(netdev->dev_addr, addr->sa_data); return 0; From 77b6f72cefc82861216cd8e85c6c86d39107da32 Mon Sep 17 00:00:00 2001 From: Colin Ian King Date: Wed, 25 Apr 2018 11:04:21 -0400 Subject: [PATCH 394/970] media: smiapp: fix timeout checking in smiapp_read_nvm [ Upstream commit 7a2148dfda8001c983f0effd9afd8a7fa58e99c4 ] The current code decrements the timeout counter i and the end of each loop i is incremented, so the check for timeout will always be false and hence the timeout mechanism is just a dead code path. Potentially, if the RD_READY bit is not set, we could end up in an infinite loop. Fix this so the timeout starts from 1000 and decrements to zero, if at the end of the loop i is zero we have a timeout condition. Detected by CoverityScan, CID#1324008 ("Logically dead code") Fixes: ccfc97bdb5ae ("[media] smiapp: Add driver") Signed-off-by: Colin Ian King Signed-off-by: Sakari Ailus Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/media/i2c/smiapp/smiapp-core.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/drivers/media/i2c/smiapp/smiapp-core.c b/drivers/media/i2c/smiapp/smiapp-core.c index 44f8c7e10a35..8ffa13f39a86 100644 --- a/drivers/media/i2c/smiapp/smiapp-core.c +++ b/drivers/media/i2c/smiapp/smiapp-core.c @@ -991,7 +991,7 @@ static int smiapp_read_nvm(struct smiapp_sensor *sensor, if (rval) goto out; - for (i = 0; i < 1000; i++) { + for (i = 1000; i > 0; i--) { rval = smiapp_read( sensor, SMIAPP_REG_U8_DATA_TRANSFER_IF_1_STATUS, &s); @@ -1002,11 +1002,10 @@ static int smiapp_read_nvm(struct smiapp_sensor *sensor, if (s & SMIAPP_DATA_TRANSFER_IF_1_STATUS_RD_READY) break; - if (--i == 0) { - rval = -ETIMEDOUT; - goto out; - } - + } + if (!i) { + rval = -ETIMEDOUT; + goto out; } for (i = 0; i < SMIAPP_NVM_PAGE_SIZE; i++) { From 1fa620150c9b1b09f56def71aab060718c1cad64 Mon Sep 17 00:00:00 2001 From: Grygorii Strashko Date: Tue, 15 May 2018 18:37:25 -0500 Subject: [PATCH 395/970] net: ethernet: ti: cpsw-phy-sel: check bus_find_device() ret value [ Upstream commit c6213eb1aee308e67377fd1890d84f7284caf531 ] This fixes klockworks warnings: Pointer 'dev' returned from call to function 'bus_find_device' at line 179 may be NULL and will be dereferenced at line 181. cpsw-phy-sel.c:179: 'dev' is assigned the return value from function 'bus_find_device'. bus.c:342: 'bus_find_device' explicitly returns a NULL value. cpsw-phy-sel.c:181: 'dev' is dereferenced by passing argument 1 to function 'dev_get_drvdata'. device.h:1024: 'dev' is passed to function 'dev_get_drvdata'. device.h:1026: 'dev' is explicitly dereferenced. Signed-off-by: Grygorii Strashko [nsekhar@ti.com: add an error message, fix return path] Signed-off-by: Sekhar Nori Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/ti/cpsw-phy-sel.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/ti/cpsw-phy-sel.c b/drivers/net/ethernet/ti/cpsw-phy-sel.c index 18013645e76c..0c1adad7415d 100644 --- a/drivers/net/ethernet/ti/cpsw-phy-sel.c +++ b/drivers/net/ethernet/ti/cpsw-phy-sel.c @@ -177,12 +177,18 @@ void cpsw_phy_sel(struct device *dev, phy_interface_t phy_mode, int slave) } dev = bus_find_device(&platform_bus_type, NULL, node, match); - of_node_put(node); + if (!dev) { + dev_err(dev, "unable to find platform device for %pOF\n", node); + goto out; + } + priv = dev_get_drvdata(dev); priv->cpsw_phy_sel(priv, phy_mode, slave); put_device(dev); +out: + of_node_put(node); } EXPORT_SYMBOL_GPL(cpsw_phy_sel); From 03df65a0bc5e15e7c4cc7463cccfd86bc3c01a66 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Wed, 16 May 2018 20:07:18 +0200 Subject: [PATCH 396/970] ALSA: usb-audio: Apply rate limit to warning messages in URB complete callback [ Upstream commit 377a879d9832f4ba69bd6a1fc996bb4181b1e504 ] retire_capture_urb() may print warning messages when the given URB doesn't align, and this may flood the system log easily. Put the rate limit to the message for avoiding it. Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1093485 Signed-off-by: Takashi Iwai Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- sound/usb/pcm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sound/usb/pcm.c b/sound/usb/pcm.c index c5dfe82beb24..e6ac7b9b4648 100644 --- a/sound/usb/pcm.c +++ b/sound/usb/pcm.c @@ -1310,7 +1310,7 @@ static void retire_capture_urb(struct snd_usb_substream *subs, if (bytes % (runtime->sample_bits >> 3) != 0) { int oldbytes = bytes; bytes = frames * stride; - dev_warn(&subs->dev->dev, + dev_warn_ratelimited(&subs->dev->dev, "Corrected urb data len. %d->%d\n", oldbytes, bytes); } From fba1048559d3fb234fee3568c4ab6f2261620307 Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Mon, 30 Apr 2018 13:56:32 +0100 Subject: [PATCH 397/970] arm64: cmpwait: Clear event register before arming exclusive monitor [ Upstream commit 1cfc63b5ae60fe7e01773f38132f98d8b13a99a0 ] When waiting for a cacheline to change state in cmpwait, we may immediately wake-up the first time around the outer loop if the event register was already set (for example, because of the event stream). Avoid these spurious wakeups by explicitly clearing the event register before loading the cacheline and setting the exclusive monitor. Signed-off-by: Will Deacon Signed-off-by: Catalin Marinas Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/arm64/include/asm/cmpxchg.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/arm64/include/asm/cmpxchg.h b/arch/arm64/include/asm/cmpxchg.h index ae852add053d..0f2e1ab5e166 100644 --- a/arch/arm64/include/asm/cmpxchg.h +++ b/arch/arm64/include/asm/cmpxchg.h @@ -229,7 +229,9 @@ static inline void __cmpwait_case_##name(volatile void *ptr, \ unsigned long tmp; \ \ asm volatile( \ - " ldxr" #sz "\t%" #w "[tmp], %[v]\n" \ + " sevl\n" \ + " wfe\n" \ + " ldxr" #sz "\t%" #w "[tmp], %[v]\n" \ " eor %" #w "[tmp], %" #w "[tmp], %" #w "[val]\n" \ " cbnz %" #w "[tmp], 1f\n" \ " wfe\n" \ From c57798822f3b5164a69fee89894c5ddd2c371235 Mon Sep 17 00:00:00 2001 From: Terry Junge Date: Mon, 30 Apr 2018 13:32:46 -0700 Subject: [PATCH 398/970] HID: hid-plantronics: Re-resend Update to map button for PTT products [ Upstream commit 37e376df5f4993677c33968a0c19b0c5acbf1108 ] Add a mapping for Push-To-Talk joystick trigger button. Tested on ChromeBox/ChromeBook with various Plantronics devices. Signed-off-by: Terry Junge Signed-off-by: Jiri Kosina Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/hid/hid-plantronics.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/hid/hid-plantronics.c b/drivers/hid/hid-plantronics.c index febb21ee190e..584b10d3fc3d 100644 --- a/drivers/hid/hid-plantronics.c +++ b/drivers/hid/hid-plantronics.c @@ -2,7 +2,7 @@ * Plantronics USB HID Driver * * Copyright (c) 2014 JD Cole - * Copyright (c) 2015 Terry Junge + * Copyright (c) 2015-2018 Terry Junge */ /* @@ -48,6 +48,10 @@ static int plantronics_input_mapping(struct hid_device *hdev, unsigned short mapped_key; unsigned long plt_type = (unsigned long)hid_get_drvdata(hdev); + /* special case for PTT products */ + if (field->application == HID_GD_JOYSTICK) + goto defaulted; + /* handle volume up/down mapping */ /* non-standard types or multi-HID interfaces - plt_type is PID */ if (!(plt_type & HID_USAGE_PAGE)) { From cab5ec8da3fbd8c4f3528e014e1f9455ece7a052 Mon Sep 17 00:00:00 2001 From: Luc Van Oostenryck Date: Tue, 24 Apr 2018 15:15:13 +0200 Subject: [PATCH 399/970] drm/radeon: fix mode_valid's return type [ Upstream commit 7a47f20eb1fb8fa8d7a8fe3a4fd8c721f04c2174 ] The method struct drm_connector_helper_funcs::mode_valid is defined as returning an 'enum drm_mode_status' but the driver implementation for this method uses an 'int' for it. Fix this by using 'enum drm_mode_status' in the driver too. Signed-off-by: Luc Van Oostenryck Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/radeon/radeon_connectors.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_connectors.c b/drivers/gpu/drm/radeon/radeon_connectors.c index f416f5c2e8e9..c5e1aa5f1d8e 100644 --- a/drivers/gpu/drm/radeon/radeon_connectors.c +++ b/drivers/gpu/drm/radeon/radeon_connectors.c @@ -850,7 +850,7 @@ static int radeon_lvds_get_modes(struct drm_connector *connector) return ret; } -static int radeon_lvds_mode_valid(struct drm_connector *connector, +static enum drm_mode_status radeon_lvds_mode_valid(struct drm_connector *connector, struct drm_display_mode *mode) { struct drm_encoder *encoder = radeon_best_single_encoder(connector); @@ -1010,7 +1010,7 @@ static int radeon_vga_get_modes(struct drm_connector *connector) return ret; } -static int radeon_vga_mode_valid(struct drm_connector *connector, +static enum drm_mode_status radeon_vga_mode_valid(struct drm_connector *connector, struct drm_display_mode *mode) { struct drm_device *dev = connector->dev; @@ -1154,7 +1154,7 @@ static int radeon_tv_get_modes(struct drm_connector *connector) return 1; } -static int radeon_tv_mode_valid(struct drm_connector *connector, +static enum drm_mode_status radeon_tv_mode_valid(struct drm_connector *connector, struct drm_display_mode *mode) { if ((mode->hdisplay > 1024) || (mode->vdisplay > 768)) @@ -1496,7 +1496,7 @@ static void radeon_dvi_force(struct drm_connector *connector) radeon_connector->use_digital = true; } -static int radeon_dvi_mode_valid(struct drm_connector *connector, +static enum drm_mode_status radeon_dvi_mode_valid(struct drm_connector *connector, struct drm_display_mode *mode) { struct drm_device *dev = connector->dev; @@ -1798,7 +1798,7 @@ radeon_dp_detect(struct drm_connector *connector, bool force) return ret; } -static int radeon_dp_mode_valid(struct drm_connector *connector, +static enum drm_mode_status radeon_dp_mode_valid(struct drm_connector *connector, struct drm_display_mode *mode) { struct drm_device *dev = connector->dev; From e7de1c6bbe51898186d478176aa8f69586a9e562 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonathan=20Neusch=C3=A4fer?= Date: Thu, 10 May 2018 23:59:19 +0200 Subject: [PATCH 400/970] powerpc/embedded6xx/hlwd-pic: Prevent interrupts from being handled by Starlet MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 9dcb3df4281876731e4e8bff7940514d72375154 ] The interrupt controller inside the Wii's Hollywood chip is connected to two masters, the "Broadway" PowerPC and the "Starlet" ARM926, each with their own interrupt status and mask registers. When booting the Wii with mini[1], interrupts from the SD card controller (IRQ 7) are handled by the ARM, because mini provides SD access over IPC. Linux however can't currently use or disable this IPC service, so both sides try to handle IRQ 7 without coordination. Let's instead make sure that all interrupts that are unmasked on the PPC side are masked on the ARM side; this will also make sure that Linux can properly talk to the SD card controller (and potentially other devices). If access to a device through IPC is desired in the future, interrupts from that device should not be handled by Linux directly. [1]: https://github.com/lewurm/mini Signed-off-by: Jonathan Neuschäfer Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/platforms/embedded6xx/hlwd-pic.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/arch/powerpc/platforms/embedded6xx/hlwd-pic.c b/arch/powerpc/platforms/embedded6xx/hlwd-pic.c index 89c54de88b7a..bf4a125faec6 100644 --- a/arch/powerpc/platforms/embedded6xx/hlwd-pic.c +++ b/arch/powerpc/platforms/embedded6xx/hlwd-pic.c @@ -35,6 +35,8 @@ */ #define HW_BROADWAY_ICR 0x00 #define HW_BROADWAY_IMR 0x04 +#define HW_STARLET_ICR 0x08 +#define HW_STARLET_IMR 0x0c /* @@ -74,6 +76,9 @@ static void hlwd_pic_unmask(struct irq_data *d) void __iomem *io_base = irq_data_get_irq_chip_data(d); setbits32(io_base + HW_BROADWAY_IMR, 1 << irq); + + /* Make sure the ARM (aka. Starlet) doesn't handle this interrupt. */ + clrbits32(io_base + HW_STARLET_IMR, 1 << irq); } From 3d06d3ca402c169e03811833a31bfe056f267c99 Mon Sep 17 00:00:00 2001 From: Dmitry Torokhov Date: Wed, 9 May 2018 12:12:15 -0700 Subject: [PATCH 401/970] HID: i2c-hid: check if device is there before really probing [ Upstream commit b3a81b6c4fc6730ac49e20d789a93c0faabafc98 ] On many Chromebooks touch devices are multi-sourced; the components are electrically compatible and one can be freely swapped for another without changing the OS image or firmware. To avoid bunch of scary messages when device is not actually present in the system let's try testing basic communication with it and if there is no response terminate probe early with -ENXIO. Signed-off-by: Dmitry Torokhov Reviewed-by: Benjamin Tissoires Signed-off-by: Jiri Kosina Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/hid/i2c-hid/i2c-hid.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/hid/i2c-hid/i2c-hid.c b/drivers/hid/i2c-hid/i2c-hid.c index 00bce002b357..ce2b80009c19 100644 --- a/drivers/hid/i2c-hid/i2c-hid.c +++ b/drivers/hid/i2c-hid/i2c-hid.c @@ -1101,6 +1101,14 @@ static int i2c_hid_probe(struct i2c_client *client, pm_runtime_enable(&client->dev); device_enable_async_suspend(&client->dev); + /* Make sure there is something at this address */ + ret = i2c_smbus_read_byte(client); + if (ret < 0) { + dev_dbg(&client->dev, "nothing at this address: %d\n", ret); + ret = -ENXIO; + goto err_pm; + } + ret = i2c_hid_fetch_hid_descriptor(ihid); if (ret < 0) goto err_pm; From b0d0e7162cb9dcbc6291e1627d68f27768005924 Mon Sep 17 00:00:00 2001 From: Thor Thayer Date: Mon, 14 May 2018 12:04:01 -0500 Subject: [PATCH 402/970] EDAC, altera: Fix ARM64 build warning [ Upstream commit 9ef20753e044f7468c4113e5aecd785419b0b3cc ] The kbuild test robot reported the following warning: drivers/edac/altera_edac.c: In function 'ocram_free_mem': drivers/edac/altera_edac.c:1410:42: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] gen_pool_free((struct gen_pool *)other, (u32)p, size); ^ After adding support for ARM64 architectures, the unsigned long parameter is 64 bits and causes a build warning on 64-bit configs. Fix by casting to the correct size (unsigned long) instead of u32. Reported-by: kbuild test robot Signed-off-by: Thor Thayer Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac Fixes: c3eea1942a16 ("EDAC, altera: Add Altera L2 cache and OCRAM support") Link: http://lkml.kernel.org/r/1526317441-4996-1-git-send-email-thor.thayer@linux.intel.com Signed-off-by: Borislav Petkov Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/edac/altera_edac.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/edac/altera_edac.c b/drivers/edac/altera_edac.c index 61262a7a5c3a..b0bd0f64d8f2 100644 --- a/drivers/edac/altera_edac.c +++ b/drivers/edac/altera_edac.c @@ -1111,7 +1111,7 @@ static void *ocram_alloc_mem(size_t size, void **other) static void ocram_free_mem(void *p, size_t size, void *other) { - gen_pool_free((struct gen_pool *)other, (u32)p, size); + gen_pool_free((struct gen_pool *)other, (unsigned long)p, size); } static const struct edac_device_prv_data ocramecc_data = { From 1af8796a8bcc014fab938a20a889846afc6b7501 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Mon, 7 May 2018 15:40:05 +0200 Subject: [PATCH 403/970] ARM: dts: emev2: Add missing interrupt-affinity to PMU node [ Upstream commit 7207b94754b6f503b278b5b200faaf662ffa1da8 ] The PMU node references two interrupts, but lacks the interrupt-affinity property, which is required in that case: hw perfevents: no interrupt-affinity property for /pmu, guessing. Add the missing property to fix this. Signed-off-by: Geert Uytterhoeven Signed-off-by: Simon Horman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/arm/boot/dts/emev2.dtsi | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/arm/boot/dts/emev2.dtsi b/arch/arm/boot/dts/emev2.dtsi index cd119400f440..fd6f9ce9206a 100644 --- a/arch/arm/boot/dts/emev2.dtsi +++ b/arch/arm/boot/dts/emev2.dtsi @@ -30,13 +30,13 @@ #address-cells = <1>; #size-cells = <0>; - cpu@0 { + cpu0: cpu@0 { device_type = "cpu"; compatible = "arm,cortex-a9"; reg = <0>; clock-frequency = <533000000>; }; - cpu@1 { + cpu1: cpu@1 { device_type = "cpu"; compatible = "arm,cortex-a9"; reg = <1>; @@ -56,6 +56,7 @@ compatible = "arm,cortex-a9-pmu"; interrupts = , ; + interrupt-affinity = <&cpu0>, <&cpu1>; }; clocks@e0110000 { From 202a0cf0c0e70deeafec09c5a1bfd404c2acf6ba Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Mon, 7 May 2018 15:40:04 +0200 Subject: [PATCH 404/970] ARM: dts: sh73a0: Add missing interrupt-affinity to PMU node [ Upstream commit 57a66497e1b7486609250a482f05935eae5035e9 ] The PMU node references two interrupts, but lacks the interrupt-affinity property, which is required in that case: hw perfevents: no interrupt-affinity property for /pmu, guessing. Add the missing property to fix this. Signed-off-by: Geert Uytterhoeven Signed-off-by: Simon Horman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/arm/boot/dts/sh73a0.dtsi | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/arm/boot/dts/sh73a0.dtsi b/arch/arm/boot/dts/sh73a0.dtsi index 032fe2f14b16..6b0cc225149c 100644 --- a/arch/arm/boot/dts/sh73a0.dtsi +++ b/arch/arm/boot/dts/sh73a0.dtsi @@ -22,7 +22,7 @@ #address-cells = <1>; #size-cells = <0>; - cpu@0 { + cpu0: cpu@0 { device_type = "cpu"; compatible = "arm,cortex-a9"; reg = <0>; @@ -30,7 +30,7 @@ power-domains = <&pd_a2sl>; next-level-cache = <&L2>; }; - cpu@1 { + cpu1: cpu@1 { device_type = "cpu"; compatible = "arm,cortex-a9"; reg = <1>; @@ -89,6 +89,7 @@ compatible = "arm,cortex-a9-pmu"; interrupts = , ; + interrupt-affinity = <&cpu0>, <&cpu1>; }; cmt1: timer@e6138000 { From 30ac755c76c33b43b17f7cff8f015b6f4176d522 Mon Sep 17 00:00:00 2001 From: Mathieu Malaterre Date: Fri, 11 May 2018 12:07:03 +0100 Subject: [PATCH 405/970] nvmem: properly handle returned value nvmem_reg_read [ Upstream commit 50808bfcc14b854775a9f1d0abe3dac2babcf5c3 ] Function nvmem_reg_read can return a non zero value indicating an error. This returned value must be read and error propagated to nvmem_cell_prepare_write_buffer. Silence the following gcc warning (W=1): drivers/nvmem/core.c:1093:9: warning: variable 'rc' set but not used [-Wunused-but-set-variable] Signed-off-by: Mathieu Malaterre Signed-off-by: Srinivas Kandagatla Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/nvmem/core.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/nvmem/core.c b/drivers/nvmem/core.c index 1b4d93e9157e..824e282cd80e 100644 --- a/drivers/nvmem/core.c +++ b/drivers/nvmem/core.c @@ -1031,6 +1031,8 @@ static inline void *nvmem_cell_prepare_write_buffer(struct nvmem_cell *cell, /* setup the first byte with lsb bits from nvmem */ rc = nvmem_reg_read(nvmem, cell->offset, &v, 1); + if (rc) + goto err; *b++ |= GENMASK(bit_offset - 1, 0) & v; /* setup rest of the byte if any */ @@ -1049,11 +1051,16 @@ static inline void *nvmem_cell_prepare_write_buffer(struct nvmem_cell *cell, /* setup the last byte with msb bits from nvmem */ rc = nvmem_reg_read(nvmem, cell->offset + cell->bytes - 1, &v, 1); + if (rc) + goto err; *p |= GENMASK(7, (nbits + bit_offset) % BITS_PER_BYTE) & v; } return buf; +err: + kfree(buf); + return ERR_PTR(rc); } /** From d83904cb2eb2c4d937eaf15032214b0578f25099 Mon Sep 17 00:00:00 2001 From: DaeRyong Jeong Date: Tue, 1 May 2018 00:27:04 +0900 Subject: [PATCH 406/970] tty: Fix data race in tty_insert_flip_string_fixed_flag [ Upstream commit b6da31b2c07c46f2dcad1d86caa835227a16d9ff ] Unlike normal serials, in pty layer, there is no guarantee that multiple threads don't insert input characters at the same time. If it is happened, tty_insert_flip_string_fixed_flag can be executed concurrently. This can lead slab out-of-bounds write in tty_insert_flip_string_fixed_flag. Call sequences are as follows. CPU0 CPU1 n_tty_ioctl_helper n_tty_ioctl_helper __start_tty tty_send_xchar tty_wakeup pty_write n_hdlc_tty_wakeup tty_insert_flip_string n_hdlc_send_frames tty_insert_flip_string_fixed_flag pty_write tty_insert_flip_string tty_insert_flip_string_fixed_flag To fix the race, acquire port->lock in pty_write() before it inserts input characters to tty buffer. It prevents multiple threads from inserting input characters concurrently. The crash log is as follows: BUG: KASAN: slab-out-of-bounds in tty_insert_flip_string_fixed_flag+0xb5/ 0x130 drivers/tty/tty_buffer.c:316 at addr ffff880114fcc121 Write of size 1792 by task syz-executor0/30017 CPU: 1 PID: 30017 Comm: syz-executor0 Not tainted 4.8.0 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014 0000000000000000 ffff88011638f888 ffffffff81694cc3 ffff88007d802140 ffff880114fcb300 ffff880114fcc300 ffff880114fcb300 ffff88011638f8b0 ffffffff8130075c ffff88011638f940 ffff88007d802140 ffff880194fcc121 Call Trace: __dump_stack lib/dump_stack.c:15 [inline] dump_stack+0xb3/0x110 lib/dump_stack.c:51 kasan_object_err+0x1c/0x70 mm/kasan/report.c:156 print_address_description mm/kasan/report.c:194 [inline] kasan_report_error+0x1f7/0x4e0 mm/kasan/report.c:283 kasan_report+0x36/0x40 mm/kasan/report.c:303 check_memory_region_inline mm/kasan/kasan.c:292 [inline] check_memory_region+0x13e/0x1a0 mm/kasan/kasan.c:299 memcpy+0x37/0x50 mm/kasan/kasan.c:335 tty_insert_flip_string_fixed_flag+0xb5/0x130 drivers/tty/tty_buffer.c:316 tty_insert_flip_string include/linux/tty_flip.h:35 [inline] pty_write+0x7f/0xc0 drivers/tty/pty.c:115 n_hdlc_send_frames+0x1d4/0x3b0 drivers/tty/n_hdlc.c:419 n_hdlc_tty_wakeup+0x73/0xa0 drivers/tty/n_hdlc.c:496 tty_wakeup+0x92/0xb0 drivers/tty/tty_io.c:601 __start_tty.part.26+0x66/0x70 drivers/tty/tty_io.c:1018 __start_tty+0x34/0x40 drivers/tty/tty_io.c:1013 n_tty_ioctl_helper+0x146/0x1e0 drivers/tty/tty_ioctl.c:1138 n_hdlc_tty_ioctl+0xb3/0x2b0 drivers/tty/n_hdlc.c:794 tty_ioctl+0xa85/0x16d0 drivers/tty/tty_io.c:2992 vfs_ioctl fs/ioctl.c:43 [inline] do_vfs_ioctl+0x13e/0xba0 fs/ioctl.c:679 SYSC_ioctl fs/ioctl.c:694 [inline] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685 entry_SYSCALL_64_fastpath+0x1f/0xbd Signed-off-by: DaeRyong Jeong Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/tty/pty.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/tty/pty.c b/drivers/tty/pty.c index 2b907385b4a8..171130a9ecc8 100644 --- a/drivers/tty/pty.c +++ b/drivers/tty/pty.c @@ -106,16 +106,19 @@ static void pty_unthrottle(struct tty_struct *tty) static int pty_write(struct tty_struct *tty, const unsigned char *buf, int c) { struct tty_struct *to = tty->link; + unsigned long flags; if (tty->stopped) return 0; if (c > 0) { + spin_lock_irqsave(&to->port->lock, flags); /* Stuff the data into the input queue of the other end */ c = tty_insert_flip_string(to->port, buf, c); /* And shovel */ if (c) tty_flip_buffer_push(to->port); + spin_unlock_irqrestore(&to->port->lock, flags); } return c; } From 4fccb92b53a6474fce276882b928dea71c7658e5 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Tue, 8 May 2018 13:14:33 +0100 Subject: [PATCH 407/970] dma-iommu: Fix compilation when !CONFIG_IOMMU_DMA MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 8a22a3e1e768c309b718f99bd86f9f25a453e0dc ] Inclusion of include/dma-iommu.h when CONFIG_IOMMU_DMA is not selected results in the following splat: In file included from drivers/irqchip/irq-gic-v3-mbi.c:20:0: ./include/linux/dma-iommu.h:95:69: error: unknown type name ‘dma_addr_t’ static inline int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base) ^~~~~~~~~~ ./include/linux/dma-iommu.h:108:74: warning: ‘struct list_head’ declared inside parameter list will not be visible outside of this definition or declaration static inline void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list) ^~~~~~~~~ scripts/Makefile.build:312: recipe for target 'drivers/irqchip/irq-gic-v3-mbi.o' failed Fix it by including linux/types.h. Signed-off-by: Marc Zyngier Signed-off-by: Thomas Gleixner Cc: Rob Herring Cc: Jason Cooper Cc: Ard Biesheuvel Cc: Srinivas Kandagatla Cc: Thomas Petazzoni Cc: Miquel Raynal Link: https://lkml.kernel.org/r/20180508121438.11301-5-marc.zyngier@arm.com Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- include/linux/dma-iommu.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h index 32c589062bd9..f30d23011060 100644 --- a/include/linux/dma-iommu.h +++ b/include/linux/dma-iommu.h @@ -17,6 +17,7 @@ #define __DMA_IOMMU_H #ifdef __KERNEL__ +#include #include #ifdef CONFIG_IOMMU_DMA From f3be42dc93677e7d6807a6521569b4cea772b140 Mon Sep 17 00:00:00 2001 From: Wei Yongjun Date: Tue, 12 Jul 2016 07:21:46 -0400 Subject: [PATCH 408/970] media: rcar_jpu: Add missing clk_disable_unprepare() on error in jpu_open() [ Upstream commit 43d0d3c52787df0221d1c52494daabd824fe84f1 ] Add the missing clk_disable_unprepare() before return from jpu_open() in the software reset error handling case. Signed-off-by: Wei Yongjun Acked-by: Mikhail Ulyanov Reviewed-by: Kieran Bingham Signed-off-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/media/platform/rcar_jpu.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/media/platform/rcar_jpu.c b/drivers/media/platform/rcar_jpu.c index d1746ecc645d..db1110a492e0 100644 --- a/drivers/media/platform/rcar_jpu.c +++ b/drivers/media/platform/rcar_jpu.c @@ -1280,7 +1280,7 @@ static int jpu_open(struct file *file) /* ...issue software reset */ ret = jpu_reset(jpu); if (ret) - goto device_prepare_rollback; + goto jpu_reset_rollback; } jpu->ref_count++; @@ -1288,6 +1288,8 @@ static int jpu_open(struct file *file) mutex_unlock(&jpu->mutex); return 0; +jpu_reset_rollback: + clk_disable_unprepare(jpu->clk); device_prepare_rollback: mutex_unlock(&jpu->mutex); v4l_prepare_rollback: From cbc0c24c9c9f51c5b4d429bbd6230a8660628462 Mon Sep 17 00:00:00 2001 From: Damien Le Moal Date: Wed, 9 May 2018 09:28:12 +0900 Subject: [PATCH 409/970] libata: Fix command retry decision [ Upstream commit 804689ad2d9b66d0d3920b48cf05881049d44589 ] For failed commands with valid sense data (e.g. NCQ commands), scsi_check_sense() is used in ata_analyze_tf() to determine if the command can be retried. In such case, rely on this decision and ignore the command error mask based decision done in ata_worth_retry(). This fixes useless retries of commands such as unaligned writes on zoned disks (TYPE_ZAC). Signed-off-by: Damien Le Moal Reviewed-by: Hannes Reinecke Signed-off-by: Tejun Heo Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/ata/libata-eh.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c index 6475a1343483..90c38778bc1f 100644 --- a/drivers/ata/libata-eh.c +++ b/drivers/ata/libata-eh.c @@ -2282,12 +2282,16 @@ static void ata_eh_link_autopsy(struct ata_link *link) if (qc->err_mask & ~AC_ERR_OTHER) qc->err_mask &= ~AC_ERR_OTHER; - /* SENSE_VALID trumps dev/unknown error and revalidation */ + /* + * SENSE_VALID trumps dev/unknown error and revalidation. Upper + * layers will determine whether the command is worth retrying + * based on the sense data and device class/type. Otherwise, + * determine directly if the command is worth retrying using its + * error mask and flags. + */ if (qc->flags & ATA_QCFLAG_SENSE_VALID) qc->err_mask &= ~(AC_ERR_DEV | AC_ERR_OTHER); - - /* determine whether the command is worth retrying */ - if (ata_eh_worth_retry(qc)) + else if (ata_eh_worth_retry(qc)) qc->flags |= ATA_QCFLAG_RETRY; /* accumulate error info */ From f638764e9baa63a652db22e68a426f76ed37c1d1 Mon Sep 17 00:00:00 2001 From: Sami Tolvanen Date: Mon, 7 May 2018 14:09:46 -0400 Subject: [PATCH 410/970] media: media-device: fix ioctl function types [ Upstream commit daa36370b62428cca6d48d1b2530a8419f631c8c ] This change fixes function types for media device ioctls to avoid indirect call mismatches with Control-Flow Integrity checking. Signed-off-by: Sami Tolvanen Acked-by: Sakari Ailus Signed-off-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/media/media-device.c | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/drivers/media/media-device.c b/drivers/media/media-device.c index 4462d8c69d57..6f46c59415fe 100644 --- a/drivers/media/media-device.c +++ b/drivers/media/media-device.c @@ -58,9 +58,10 @@ static int media_device_close(struct file *filp) return 0; } -static int media_device_get_info(struct media_device *dev, - struct media_device_info *info) +static long media_device_get_info(struct media_device *dev, void *arg) { + struct media_device_info *info = arg; + memset(info, 0, sizeof(*info)); if (dev->driver_name[0]) @@ -97,9 +98,9 @@ static struct media_entity *find_entity(struct media_device *mdev, u32 id) return NULL; } -static long media_device_enum_entities(struct media_device *mdev, - struct media_entity_desc *entd) +static long media_device_enum_entities(struct media_device *mdev, void *arg) { + struct media_entity_desc *entd = arg; struct media_entity *ent; ent = find_entity(mdev, entd->id); @@ -150,9 +151,9 @@ static void media_device_kpad_to_upad(const struct media_pad *kpad, upad->flags = kpad->flags; } -static long media_device_enum_links(struct media_device *mdev, - struct media_links_enum *links) +static long media_device_enum_links(struct media_device *mdev, void *arg) { + struct media_links_enum *links = arg; struct media_entity *entity; entity = find_entity(mdev, links->entity); @@ -198,9 +199,9 @@ static long media_device_enum_links(struct media_device *mdev, return 0; } -static long media_device_setup_link(struct media_device *mdev, - struct media_link_desc *linkd) +static long media_device_setup_link(struct media_device *mdev, void *arg) { + struct media_link_desc *linkd = arg; struct media_link *link = NULL; struct media_entity *source; struct media_entity *sink; @@ -226,9 +227,9 @@ static long media_device_setup_link(struct media_device *mdev, return __media_entity_setup_link(link, linkd->flags); } -static long media_device_get_topology(struct media_device *mdev, - struct media_v2_topology *topo) +static long media_device_get_topology(struct media_device *mdev, void *arg) { + struct media_v2_topology *topo = arg; struct media_entity *entity; struct media_interface *intf; struct media_pad *pad; From 523a9ce7d2c807066e4c99fc19b30e36b02c17f2 Mon Sep 17 00:00:00 2001 From: Brad Love Date: Fri, 4 May 2018 17:53:35 -0400 Subject: [PATCH 411/970] media: saa7164: Fix driver name in debug output [ Upstream commit 0cc4655cb57af0b7e105d075c4f83f8046efafe7 ] This issue was reported by a user who downloaded a corrupt saa7164 firmware, then went looking for a valid xc5000 firmware to fix the error displayed...but the device in question has no xc5000, thus after much effort, the wild goose chase eventually led to a support call. The xc5000 has nothing to do with saa7164 (as far as I can tell), so replace the string with saa7164 as well as give a meaningful hint on the firmware mismatch. Signed-off-by: Brad Love Signed-off-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/media/pci/saa7164/saa7164-fw.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/media/pci/saa7164/saa7164-fw.c b/drivers/media/pci/saa7164/saa7164-fw.c index 269e0782c7b6..93d53195e8ca 100644 --- a/drivers/media/pci/saa7164/saa7164-fw.c +++ b/drivers/media/pci/saa7164/saa7164-fw.c @@ -430,7 +430,8 @@ int saa7164_downloadfirmware(struct saa7164_dev *dev) __func__, fw->size); if (fw->size != fwlength) { - printk(KERN_ERR "xc5000: firmware incorrect size\n"); + printk(KERN_ERR "saa7164: firmware incorrect size %zu != %u\n", + fw->size, fwlength); ret = -ENOMEM; goto out; } From e70e69a8dcda82aba0eba04323555404724691fe Mon Sep 17 00:00:00 2001 From: Jane Wan Date: Tue, 8 May 2018 14:19:53 -0700 Subject: [PATCH 412/970] mtd: rawnand: fsl_ifc: fix FSL NAND driver to read all ONFI parameter pages [ Upstream commit a75bbe71a27875fdc61cde1af6d799037cef6bed ] Per ONFI specification (Rev. 4.0), if the CRC of the first parameter page read is not valid, the host should read redundant parameter page copies. Fix FSL NAND driver to read the two redundant copies which are mandatory in the specification. Signed-off-by: Jane Wan Signed-off-by: Boris Brezillon Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/mtd/nand/fsl_ifc_nand.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/drivers/mtd/nand/fsl_ifc_nand.c b/drivers/mtd/nand/fsl_ifc_nand.c index 2f6b55229d5b..4c3b986dd74d 100644 --- a/drivers/mtd/nand/fsl_ifc_nand.c +++ b/drivers/mtd/nand/fsl_ifc_nand.c @@ -372,9 +372,16 @@ static void fsl_ifc_cmdfunc(struct mtd_info *mtd, unsigned int command, case NAND_CMD_READID: case NAND_CMD_PARAM: { + /* + * For READID, read 8 bytes that are currently used. + * For PARAM, read all 3 copies of 256-bytes pages. + */ + int len = 8; int timing = IFC_FIR_OP_RB; - if (command == NAND_CMD_PARAM) + if (command == NAND_CMD_PARAM) { timing = IFC_FIR_OP_RBCD; + len = 256 * 3; + } ifc_out32((IFC_FIR_OP_CW0 << IFC_NAND_FIR0_OP0_SHIFT) | (IFC_FIR_OP_UA << IFC_NAND_FIR0_OP1_SHIFT) | @@ -384,12 +391,8 @@ static void fsl_ifc_cmdfunc(struct mtd_info *mtd, unsigned int command, &ifc->ifc_nand.nand_fcr0); ifc_out32(column, &ifc->ifc_nand.row3); - /* - * although currently it's 8 bytes for READID, we always read - * the maximum 256 bytes(for PARAM) - */ - ifc_out32(256, &ifc->ifc_nand.nand_fbcr); - ifc_nand_ctrl->read_bytes = 256; + ifc_out32(len, &ifc->ifc_nand.nand_fbcr); + ifc_nand_ctrl->read_bytes = len; set_addr(mtd, 0, 0, 0); fsl_ifc_run_command(mtd); From 4139a621020b96e805115f1d0c32d1ba3d142e67 Mon Sep 17 00:00:00 2001 From: Sean Lanigan Date: Fri, 4 May 2018 16:48:23 +1000 Subject: [PATCH 413/970] brcmfmac: Add support for bcm43364 wireless chipset [ Upstream commit 9c4a121e82634aa000a702c98cd6f05b27d6e186 ] Add support for the BCM43364 chipset via an SDIO interface, as used in e.g. the Murata 1FX module. The BCM43364 uses the same firmware as the BCM43430 (which is already included), the only difference is the omission of Bluetooth. However, the SDIO_ID for the BCM43364 is 02D0:A9A4, giving it a MODALIAS of sdio:c00v02D0dA9A4, which doesn't get recognised and hence doesn't load the brcmfmac module. Adding the 'A9A4' ID in the appropriate place triggers the brcmfmac driver to load, and then correctly use the firmware file 'brcmfmac43430-sdio.bin'. Signed-off-by: Sean Lanigan Acked-by: Ulf Hansson Signed-off-by: Kalle Valo Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c | 1 + include/linux/mmc/sdio_ids.h | 1 + 2 files changed, 2 insertions(+) diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c index 746f8c9a891d..e69cf0ef9574 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c @@ -1099,6 +1099,7 @@ static const struct sdio_device_id brcmf_sdmmc_ids[] = { BRCMF_SDIO_DEVICE(SDIO_DEVICE_ID_BROADCOM_43340), BRCMF_SDIO_DEVICE(SDIO_DEVICE_ID_BROADCOM_43341), BRCMF_SDIO_DEVICE(SDIO_DEVICE_ID_BROADCOM_43362), + BRCMF_SDIO_DEVICE(SDIO_DEVICE_ID_BROADCOM_43364), BRCMF_SDIO_DEVICE(SDIO_DEVICE_ID_BROADCOM_4335_4339), BRCMF_SDIO_DEVICE(SDIO_DEVICE_ID_BROADCOM_4339), BRCMF_SDIO_DEVICE(SDIO_DEVICE_ID_BROADCOM_43430), diff --git a/include/linux/mmc/sdio_ids.h b/include/linux/mmc/sdio_ids.h index d43ef96bf075..3e4d4f4bccd3 100644 --- a/include/linux/mmc/sdio_ids.h +++ b/include/linux/mmc/sdio_ids.h @@ -34,6 +34,7 @@ #define SDIO_DEVICE_ID_BROADCOM_4335_4339 0x4335 #define SDIO_DEVICE_ID_BROADCOM_4339 0x4339 #define SDIO_DEVICE_ID_BROADCOM_43362 0xa962 +#define SDIO_DEVICE_ID_BROADCOM_43364 0xa9a4 #define SDIO_DEVICE_ID_BROADCOM_43430 0xa9a6 #define SDIO_DEVICE_ID_BROADCOM_4345 0x4345 #define SDIO_DEVICE_ID_BROADCOM_4354 0x4354 From 157674ac443eff2f4d93d025c7d82e52170b5501 Mon Sep 17 00:00:00 2001 From: Thomas Richter Date: Tue, 8 May 2018 10:18:39 +0200 Subject: [PATCH 414/970] s390/cpum_sf: Add data entry sizes to sampling trailer entry [ Upstream commit 77715b7ddb446bd39a06f3376e85f4bb95b29bb8 ] The CPU Measurement sampling facility creates a trailer entry for each Sample-Data-Block of stored samples. The trailer entry contains the sizes (in bytes) of the stored sampling types: - basic-sampling data entry size - diagnostic-sampling data entry size Both sizes are 2 bytes long. This patch changes the trailer entry definition to reflect this. Fixes: fcc77f507333 ("s390/cpum_sf: Atomically reset trailer entry fields of sample-data-blocks") Signed-off-by: Thomas Richter Reviewed-by: Hendrik Brueckner Signed-off-by: Martin Schwidefsky Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/s390/include/asm/cpu_mf.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/s390/include/asm/cpu_mf.h b/arch/s390/include/asm/cpu_mf.h index 03516476127b..e698d98ed64c 100644 --- a/arch/s390/include/asm/cpu_mf.h +++ b/arch/s390/include/asm/cpu_mf.h @@ -129,7 +129,9 @@ struct hws_trailer_entry { unsigned int f:1; /* 0 - Block Full Indicator */ unsigned int a:1; /* 1 - Alert request control */ unsigned int t:1; /* 2 - Timestamp format */ - unsigned long long:61; /* 3 - 63: Reserved */ + unsigned int :29; /* 3 - 31: Reserved */ + unsigned int bsdes:16; /* 32-47: size of basic SDE */ + unsigned int dsdes:16; /* 48-63: size of diagnostic SDE */ }; unsigned long long flags; /* 0 - 63: All indicators */ }; From 67d64e1cb1d278e9b8689830f8339704088d7874 Mon Sep 17 00:00:00 2001 From: Thomas Richter Date: Tue, 8 May 2018 07:53:39 +0200 Subject: [PATCH 415/970] perf: fix invalid bit in diagnostic entry [ Upstream commit 3c0a83b14ea71fef5ccc93a3bd2de5f892be3194 ] The s390 CPU measurement facility sampling mode supports basic entries and diagnostic entries. Each entry has a valid bit to indicate the status of the entry as valid or invalid. This bit is bit 31 in the diagnostic entry, but the bit mask definition refers to bit 30. Fix this by making the reserved field one bit larger. Fixes: 7e75fc3ff4cf ("s390/cpum_sf: Add raw data sampling to support the diagnostic-sampling function") Signed-off-by: Thomas Richter Reviewed-by: Hendrik Brueckner Signed-off-by: Martin Schwidefsky Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/s390/include/asm/cpu_mf.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/s390/include/asm/cpu_mf.h b/arch/s390/include/asm/cpu_mf.h index e698d98ed64c..ee64e624c511 100644 --- a/arch/s390/include/asm/cpu_mf.h +++ b/arch/s390/include/asm/cpu_mf.h @@ -113,7 +113,7 @@ struct hws_basic_entry { struct hws_diag_entry { unsigned int def:16; /* 0-15 Data Entry Format */ - unsigned int R:14; /* 16-19 and 20-30 reserved */ + unsigned int R:15; /* 16-19 and 20-30 reserved */ unsigned int I:1; /* 31 entry valid or invalid */ u8 data[]; /* Machine-dependent sample data */ } __packed; From a85b32ebaac08970d41a7b0105ce51fed90d153e Mon Sep 17 00:00:00 2001 From: Michael Chan Date: Tue, 8 May 2018 03:18:39 -0400 Subject: [PATCH 416/970] bnxt_en: Check unsupported speeds in bnxt_update_link() on PF only. [ Upstream commit dac0490718bd17df5e3995ffca14255e5f9ed22d ] Only non-NPAR PFs need to actively check and manage unsupported link speeds. NPAR functions and VFs do not control the link speed and should skip the unsupported speed detection logic, to avoid warning messages from firmware rejecting the unsupported firmware calls. Signed-off-by: Michael Chan Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index ca57eb56c717..8777c3a4c095 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -5257,6 +5257,9 @@ static int bnxt_update_link(struct bnxt *bp, bool chng_link_state) } mutex_unlock(&bp->hwrm_cmd_lock); + if (!BNXT_SINGLE_PF(bp)) + return 0; + diff = link_info->support_auto_speeds ^ link_info->advertising; if ((link_info->support_auto_speeds | diff) != link_info->support_auto_speeds) { From 80e75bdc0e1be707edfd3730a2b46595648cf5b8 Mon Sep 17 00:00:00 2001 From: Wenwen Wang Date: Mon, 7 May 2018 19:46:43 -0500 Subject: [PATCH 417/970] scsi: 3w-9xxx: fix a missing-check bug [ Upstream commit c9318a3e0218bc9dacc25be46b9eec363259536f ] In twa_chrdev_ioctl(), the ioctl driver command is firstly copied from the userspace pointer 'argp' and saved to the kernel object 'driver_command'. Then a security check is performed on the data buffer size indicated by 'driver_command', which is 'driver_command.buffer_length'. If the security check is passed, the entire ioctl command is copied again from the 'argp' pointer and saved to the kernel object 'tw_ioctl'. Then, various operations are performed on 'tw_ioctl' according to the 'cmd'. Given that the 'argp' pointer resides in userspace, a malicious userspace process can race to change the buffer size between the two copies. This way, the user can bypass the security check and inject invalid data buffer size. This can cause potential security issues in the following execution. This patch checks for capable(CAP_SYS_ADMIN) in twa_chrdev_open()t o avoid the above issues. Signed-off-by: Wenwen Wang Acked-by: Adam Radford Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/3w-9xxx.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/scsi/3w-9xxx.c b/drivers/scsi/3w-9xxx.c index a56a7b243e91..5466246c69b4 100644 --- a/drivers/scsi/3w-9xxx.c +++ b/drivers/scsi/3w-9xxx.c @@ -889,6 +889,11 @@ static int twa_chrdev_open(struct inode *inode, struct file *file) unsigned int minor_number; int retval = TW_IOCTL_ERROR_OS_ENODEV; + if (!capable(CAP_SYS_ADMIN)) { + retval = -EACCES; + goto out; + } + minor_number = iminor(inode); if (minor_number >= twa_device_extension_count) goto out; From 5a644f682267dff1614e5aaf23cdcffeefe91213 Mon Sep 17 00:00:00 2001 From: Wenwen Wang Date: Mon, 7 May 2018 19:54:01 -0500 Subject: [PATCH 418/970] scsi: 3w-xxxx: fix a missing-check bug [ Upstream commit 9899e4d3523faaef17c67141aa80ff2088f17871 ] In tw_chrdev_ioctl(), the length of the data buffer is firstly copied from the userspace pointer 'argp' and saved to the kernel object 'data_buffer_length'. Then a security check is performed on it to make sure that the length is not more than 'TW_MAX_IOCTL_SECTORS * 512'. Otherwise, an error code -EINVAL is returned. If the security check is passed, the entire ioctl command is copied again from the 'argp' pointer and saved to the kernel object 'tw_ioctl'. Then, various operations are performed on 'tw_ioctl' according to the 'cmd'. Given that the 'argp' pointer resides in userspace, a malicious userspace process can race to change the buffer length between the two copies. This way, the user can bypass the security check and inject invalid data buffer length. This can cause potential security issues in the following execution. This patch checks for capable(CAP_SYS_ADMIN) in tw_chrdev_open() to avoid the above issues. Signed-off-by: Wenwen Wang Acked-by: Adam Radford Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/3w-xxxx.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/scsi/3w-xxxx.c b/drivers/scsi/3w-xxxx.c index 25aba1613e21..24ac19e31003 100644 --- a/drivers/scsi/3w-xxxx.c +++ b/drivers/scsi/3w-xxxx.c @@ -1034,6 +1034,9 @@ static int tw_chrdev_open(struct inode *inode, struct file *file) dprintk(KERN_WARNING "3w-xxxx: tw_ioctl_open()\n"); + if (!capable(CAP_SYS_ADMIN)) + return -EACCES; + minor_number = iminor(inode); if (minor_number >= tw_device_extension_count) return -ENODEV; From 749c6f0e3b5d22a50c026c75b06c030650a46af2 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Thu, 3 May 2018 13:54:32 +0300 Subject: [PATCH 419/970] scsi: megaraid: silence a static checker bug [ Upstream commit 27e833dabab74ee665e487e291c9afc6d71effba ] If we had more than 32 megaraid cards then it would cause memory corruption. That's not likely, of course, but it's handy to enforce it and make the static checker happy. Signed-off-by: Dan Carpenter Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/megaraid.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/scsi/megaraid.c b/drivers/scsi/megaraid.c index 9d05302a3bcd..19bffe0b2cc0 100644 --- a/drivers/scsi/megaraid.c +++ b/drivers/scsi/megaraid.c @@ -4197,6 +4197,9 @@ megaraid_probe_one(struct pci_dev *pdev, const struct pci_device_id *id) int irq, i, j; int error = -ENODEV; + if (hba_count >= MAX_CONTROLLERS) + goto out; + if (pci_enable_device(pdev)) goto out; pci_set_master(pdev); From 30f32e09af72473965dfd1f1d86550df3a45112f Mon Sep 17 00:00:00 2001 From: Doug Oucahrek Date: Tue, 1 May 2018 22:22:19 -0700 Subject: [PATCH 420/970] staging: lustre: o2iblnd: fix race at kiblnd_connect_peer [ Upstream commit cf04968efe341b9b1c30a527e5dd61b2af9c43d2 ] cmid will be destroyed at OFED if kiblnd_cm_callback return error. if error happen before the end of kiblnd_connect_peer, it will touch destroyed cmid and fail as (o2iblnd_cb.c:1315:kiblnd_connect_peer()) ASSERTION( cmid->device != ((void *)0) ) failed: Signed-off-by: Alexander Boyko Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-10015 Reviewed-by: Alexey Lyashkov Reviewed-by: Doug Oucharek Reviewed-by: John L. Hammond Signed-off-by: Doug Oucharek Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- .../lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c index ea9a0c21d29d..4ff293129675 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -1299,11 +1299,6 @@ kiblnd_connect_peer(struct kib_peer *peer) goto failed2; } - LASSERT(cmid->device); - CDEBUG(D_NET, "%s: connection bound to %s:%pI4h:%s\n", - libcfs_nid2str(peer->ibp_nid), dev->ibd_ifname, - &dev->ibd_ifip, cmid->device->name); - return; failed2: @@ -3005,8 +3000,19 @@ kiblnd_cm_callback(struct rdma_cm_id *cmid, struct rdma_cm_event *event) } else { rc = rdma_resolve_route( cmid, *kiblnd_tunables.kib_timeout * 1000); - if (!rc) + if (!rc) { + struct kib_net *net = peer->ibp_ni->ni_data; + struct kib_dev *dev = net->ibn_dev; + + CDEBUG(D_NET, "%s: connection bound to "\ + "%s:%pI4h:%s\n", + libcfs_nid2str(peer->ibp_nid), + dev->ibd_ifname, + &dev->ibd_ifip, cmid->device->name); + return 0; + } + /* Can't initiate route resolution */ CERROR("Can't resolve route for %s: %d\n", libcfs_nid2str(peer->ibp_nid), rc); From 3221a270e2c28cad0144421bd8a9a32c23d31e6c Mon Sep 17 00:00:00 2001 From: Bartlomiej Zolnierkiewicz Date: Thu, 26 Apr 2018 13:51:16 +0200 Subject: [PATCH 421/970] thermal: exynos: fix setting rising_threshold for Exynos5433 [ Upstream commit 8bfc218d0ebbabcba8ed2b8ec1831e0cf1f71629 ] Add missing clearing of the previous value when setting rising temperature threshold. Signed-off-by: Bartlomiej Zolnierkiewicz Signed-off-by: Eduardo Valentin Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/thermal/samsung/exynos_tmu.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/thermal/samsung/exynos_tmu.c b/drivers/thermal/samsung/exynos_tmu.c index a45810b43f70..c974cb5fb958 100644 --- a/drivers/thermal/samsung/exynos_tmu.c +++ b/drivers/thermal/samsung/exynos_tmu.c @@ -598,6 +598,7 @@ static int exynos5433_tmu_initialize(struct platform_device *pdev) threshold_code = temp_to_code(data, temp); rising_threshold = readl(data->base + rising_reg_offset); + rising_threshold &= ~(0xff << j * 8); rising_threshold |= (threshold_code << j * 8); writel(rising_threshold, data->base + rising_reg_offset); From e31a06ec828ff8184cc2c1fbae49be783c3b4f11 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Thu, 3 May 2018 18:37:17 -0700 Subject: [PATCH 422/970] bpf: fix references to free_bpf_prog_info() in comments [ Upstream commit ab7f5bf0928be2f148d000a6eaa6c0a36e74750e ] Comments in the verifier refer to free_bpf_prog_info() which seems to have never existed in tree. Replace it with free_used_maps(). Signed-off-by: Jakub Kicinski Reviewed-by: Quentin Monnet Signed-off-by: Daniel Borkmann Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- kernel/bpf/verifier.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 076e4a0ff95e..dafa2708ce9e 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -3225,7 +3225,7 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env) /* hold the map. If the program is rejected by verifier, * the map will be released by release_maps() or it * will be used by the valid program until it's unloaded - * and all maps are released in free_bpf_prog_info() + * and all maps are released in free_used_maps() */ map = bpf_map_inc(map, false); if (IS_ERR(map)) { @@ -3629,7 +3629,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr) vfree(log_buf); if (!env->prog->aux->used_maps) /* if we didn't copy map pointers into bpf_prog_info, release - * them now. Otherwise free_bpf_prog_info() will release them. + * them now. Otherwise free_used_maps() will release them. */ release_maps(env); *prog = env->prog; From f3382cb5572f5e670e533e3a300138f03c7ad926 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Fri, 20 Apr 2018 08:32:16 -0400 Subject: [PATCH 423/970] media: siano: get rid of __le32/__le16 cast warnings [ Upstream commit e1b7f11b37def5f3021c06e8c2b4953e099357aa ] Those are all false-positives that appear with smatch when building for arm: drivers/media/common/siano/smsendian.c:38:36: warning: cast to restricted __le32 drivers/media/common/siano/smsendian.c:38:36: warning: cast to restricted __le32 drivers/media/common/siano/smsendian.c:38:36: warning: cast to restricted __le32 drivers/media/common/siano/smsendian.c:38:36: warning: cast to restricted __le32 drivers/media/common/siano/smsendian.c:38:36: warning: cast to restricted __le32 drivers/media/common/siano/smsendian.c:38:36: warning: cast to restricted __le32 drivers/media/common/siano/smsendian.c:47:44: warning: cast to restricted __le32 drivers/media/common/siano/smsendian.c:47:44: warning: cast to restricted __le32 drivers/media/common/siano/smsendian.c:47:44: warning: cast to restricted __le32 drivers/media/common/siano/smsendian.c:47:44: warning: cast to restricted __le32 drivers/media/common/siano/smsendian.c:47:44: warning: cast to restricted __le32 drivers/media/common/siano/smsendian.c:47:44: warning: cast to restricted __le32 drivers/media/common/siano/smsendian.c:67:35: warning: cast to restricted __le16 drivers/media/common/siano/smsendian.c:67:35: warning: cast to restricted __le16 drivers/media/common/siano/smsendian.c:67:35: warning: cast to restricted __le16 drivers/media/common/siano/smsendian.c:67:35: warning: cast to restricted __le16 drivers/media/common/siano/smsendian.c:84:44: warning: cast to restricted __le32 drivers/media/common/siano/smsendian.c:84:44: warning: cast to restricted __le32 drivers/media/common/siano/smsendian.c:84:44: warning: cast to restricted __le32 drivers/media/common/siano/smsendian.c:84:44: warning: cast to restricted __le32 drivers/media/common/siano/smsendian.c:84:44: warning: cast to restricted __le32 drivers/media/common/siano/smsendian.c:84:44: warning: cast to restricted __le32 drivers/media/common/siano/smsendian.c:98:26: warning: cast to restricted __le16 drivers/media/common/siano/smsendian.c:98:26: warning: cast to restricted __le16 drivers/media/common/siano/smsendian.c:98:26: warning: cast to restricted __le16 drivers/media/common/siano/smsendian.c:98:26: warning: cast to restricted __le16 drivers/media/common/siano/smsendian.c:99:28: warning: cast to restricted __le16 drivers/media/common/siano/smsendian.c:99:28: warning: cast to restricted __le16 drivers/media/common/siano/smsendian.c:99:28: warning: cast to restricted __le16 drivers/media/common/siano/smsendian.c:99:28: warning: cast to restricted __le16 drivers/media/common/siano/smsendian.c:100:27: warning: cast to restricted __le16 drivers/media/common/siano/smsendian.c:100:27: warning: cast to restricted __le16 drivers/media/common/siano/smsendian.c:100:27: warning: cast to restricted __le16 drivers/media/common/siano/smsendian.c:100:27: warning: cast to restricted __le16 Get rid of them by adding explicit forced casts. Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/media/common/siano/smsendian.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/media/common/siano/smsendian.c b/drivers/media/common/siano/smsendian.c index bfe831c10b1c..b95a631f23f9 100644 --- a/drivers/media/common/siano/smsendian.c +++ b/drivers/media/common/siano/smsendian.c @@ -35,7 +35,7 @@ void smsendian_handle_tx_message(void *buffer) switch (msg->x_msg_header.msg_type) { case MSG_SMS_DATA_DOWNLOAD_REQ: { - msg->msg_data[0] = le32_to_cpu(msg->msg_data[0]); + msg->msg_data[0] = le32_to_cpu((__force __le32)(msg->msg_data[0])); break; } @@ -44,7 +44,7 @@ void smsendian_handle_tx_message(void *buffer) sizeof(struct sms_msg_hdr))/4; for (i = 0; i < msg_words; i++) - msg->msg_data[i] = le32_to_cpu(msg->msg_data[i]); + msg->msg_data[i] = le32_to_cpu((__force __le32)msg->msg_data[i]); break; } @@ -64,7 +64,7 @@ void smsendian_handle_rx_message(void *buffer) { struct sms_version_res *ver = (struct sms_version_res *) msg; - ver->chip_model = le16_to_cpu(ver->chip_model); + ver->chip_model = le16_to_cpu((__force __le16)ver->chip_model); break; } @@ -81,7 +81,7 @@ void smsendian_handle_rx_message(void *buffer) sizeof(struct sms_msg_hdr))/4; for (i = 0; i < msg_words; i++) - msg->msg_data[i] = le32_to_cpu(msg->msg_data[i]); + msg->msg_data[i] = le32_to_cpu((__force __le32)msg->msg_data[i]); break; } @@ -95,9 +95,9 @@ void smsendian_handle_message_header(void *msg) #ifdef __BIG_ENDIAN struct sms_msg_hdr *phdr = (struct sms_msg_hdr *)msg; - phdr->msg_type = le16_to_cpu(phdr->msg_type); - phdr->msg_length = le16_to_cpu(phdr->msg_length); - phdr->msg_flags = le16_to_cpu(phdr->msg_flags); + phdr->msg_type = le16_to_cpu((__force __le16)phdr->msg_type); + phdr->msg_length = le16_to_cpu((__force __le16)phdr->msg_length); + phdr->msg_flags = le16_to_cpu((__force __le16)phdr->msg_flags); #endif /* __BIG_ENDIAN */ } EXPORT_SYMBOL_GPL(smsendian_handle_message_header); From 004256bb888290cd26eadcd07e72bd9e9f5e497a Mon Sep 17 00:00:00 2001 From: Satendra Singh Thakur Date: Thu, 3 May 2018 11:19:32 +0530 Subject: [PATCH 424/970] drm/atomic: Handling the case when setting old crtc for plane [ Upstream commit fc2a69f3903dfd97cd47f593e642b47918c949df ] In the func drm_atomic_set_crtc_for_plane, with the current code, if crtc of the plane_state and crtc passed as argument to the func are same, entire func will executed in vein. It will get state of crtc and clear and set the bits in plane_mask. All these steps are not required for same old crtc. Ideally, we should do nothing in this case, this patch handles the same, and causes the program to return without doing anything in such scenario. Signed-off-by: Satendra Singh Thakur Cc: Madhur Verma Cc: Hemanshu Srivastava Signed-off-by: Daniel Vetter Link: https://patchwork.freedesktop.org/patch/msgid/1525326572-25854-1-git-send-email-satendra.t@samsung.com Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/drm_atomic.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c index 34adde169a78..dd6fff1c98d6 100644 --- a/drivers/gpu/drm/drm_atomic.c +++ b/drivers/gpu/drm/drm_atomic.c @@ -1091,7 +1091,9 @@ drm_atomic_set_crtc_for_plane(struct drm_plane_state *plane_state, { struct drm_plane *plane = plane_state->plane; struct drm_crtc_state *crtc_state; - + /* Nothing to do for same crtc*/ + if (plane_state->crtc == crtc) + return 0; if (plane_state->crtc) { crtc_state = drm_atomic_get_crtc_state(plane_state->state, plane_state->crtc); From 575aa79d55a65bd2b0485836196d05a7041cb68d Mon Sep 17 00:00:00 2001 From: Takashi Sakamoto Date: Wed, 2 May 2018 22:48:16 +0900 Subject: [PATCH 425/970] ALSA: hda/ca0132: fix build failure when a local macro is defined [ Upstream commit 8e142e9e628975b0dddd05cf1b095331dff6e2de ] DECLARE_TLV_DB_SCALE (alias of SNDRV_CTL_TLVD_DECLARE_DB_SCALE) is used but tlv.h is not included. This causes build failure when local macro is defined by comment-out. This commit fixes the bug. At the same time, the alias macro is replaced with a destination macro added at a commit 46e860f76804 ("ALSA: rename TLV-related macros so that they're friendly to user applications") Reported-by: Connor McAdams Fixes: 44f0c9782cc6 ('ALSA: hda/ca0132: Add tuning controls') Signed-off-by: Takashi Sakamoto Signed-off-by: Takashi Iwai Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- sound/pci/hda/patch_ca0132.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/sound/pci/hda/patch_ca0132.c b/sound/pci/hda/patch_ca0132.c index 9ec4dba8a793..280999961226 100644 --- a/sound/pci/hda/patch_ca0132.c +++ b/sound/pci/hda/patch_ca0132.c @@ -38,6 +38,10 @@ /* Enable this to see controls for tuning purpose. */ /*#define ENABLE_TUNING_CONTROLS*/ +#ifdef ENABLE_TUNING_CONTROLS +#include +#endif + #define FLOAT_ZERO 0x00000000 #define FLOAT_ONE 0x3f800000 #define FLOAT_TWO 0x40000000 @@ -3067,8 +3071,8 @@ static int equalizer_ctl_put(struct snd_kcontrol *kcontrol, return 1; } -static const DECLARE_TLV_DB_SCALE(voice_focus_db_scale, 2000, 100, 0); -static const DECLARE_TLV_DB_SCALE(eq_db_scale, -2400, 100, 0); +static const SNDRV_CTL_TLVD_DECLARE_DB_SCALE(voice_focus_db_scale, 2000, 100, 0); +static const SNDRV_CTL_TLVD_DECLARE_DB_SCALE(eq_db_scale, -2400, 100, 0); static int add_tuning_control(struct hda_codec *codec, hda_nid_t pnid, hda_nid_t nid, From de3466cc154e398f3e8de6b2139aadf8651eb03a Mon Sep 17 00:00:00 2001 From: Shawn Lin Date: Mon, 26 Mar 2018 17:26:25 +0800 Subject: [PATCH 426/970] mmc: dw_mmc: update actual clock for mmc debugfs [ Upstream commit ff178981bd5fd1667f373098740cb1c6d6efa1ba ] Respect the actual clock for mmc debugfs to help better debug the hardware. mmc_host mmc0: Bus speed (slot 0) = 135475200Hz (slot req 150000000Hz, actual 135475200HZ div = 0) cat /sys/kernel/debug/mmc0/ios clock: 150000000 Hz actual clock: 135475200 Hz vdd: 21 (3.3 ~ 3.4 V) bus mode: 2 (push-pull) chip select: 0 (don't care) power mode: 2 (on) bus width: 3 (8 bits) timing spec: 9 (mmc HS200) signal voltage: 0 (1.80 V) driver type: 0 (driver type B) Cc: Xiao Yao Cc: Ziyuan Signed-off-by: Shawn Lin Signed-off-by: Ulf Hansson Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/mmc/host/dw_mmc.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c index 1a1501fde010..e10a00d0d44d 100644 --- a/drivers/mmc/host/dw_mmc.c +++ b/drivers/mmc/host/dw_mmc.c @@ -1164,6 +1164,8 @@ static void dw_mci_setup_bus(struct dw_mci_slot *slot, bool force_clkinit) if (host->state == STATE_WAITING_CMD11_DONE) sdmmc_cmd_bits |= SDMMC_CMD_VOLT_SWITCH; + slot->mmc->actual_clock = 0; + if (!clock) { mci_writel(host, CLKENA, 0); mci_send_cmd(slot, sdmmc_cmd_bits, 0); @@ -1209,6 +1211,8 @@ static void dw_mci_setup_bus(struct dw_mci_slot *slot, bool force_clkinit) /* keep the last clock value that was requested from core */ slot->__clk_old = clock; + slot->mmc->actual_clock = div ? ((host->bus_hz / div) >> 1) : + host->bus_hz; } host->current_speed = clock; From df157f60b9e7f12f30550254111c13c043d10529 Mon Sep 17 00:00:00 2001 From: "Tobin C. Harding" Date: Mon, 26 Mar 2018 17:33:14 +1100 Subject: [PATCH 427/970] mmc: pwrseq: Use kmalloc_array instead of stack VLA [ Upstream commit 486e6661367b40f927aadbed73237693396cbf94 ] The use of stack Variable Length Arrays needs to be avoided, as they can be a vector for stack exhaustion, which can be both a runtime bug (kernel Oops) or a security flaw (overwriting memory beyond the stack). Also, in general, as code evolves it is easy to lose track of how big a VLA can get. Thus, we can end up having runtime failures that are hard to debug. As part of the directive[1] to remove all VLAs from the kernel, and build with -Wvla. Currently driver is using a VLA declared using the number of descriptors. This array is used to store integer values and is later used as an argument to `gpiod_set_array_value_cansleep()` This can be avoided by using `kmalloc_array()` to allocate memory for the array of integer values. Memory is free'd before return from function. >From the code it appears that it is safe to sleep so we can use GFP_KERNEL (based _cansleep() suffix of function `gpiod_set_array_value_cansleep()`. It can be expected that this patch will result in a small increase in overhead due to the use of `kmalloc_array()` [1] https://lkml.org/lkml/2018/3/7/621 Signed-off-by: Tobin C. Harding Signed-off-by: Ulf Hansson Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/mmc/core/pwrseq_simple.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/drivers/mmc/core/pwrseq_simple.c b/drivers/mmc/core/pwrseq_simple.c index 1304160de168..8cd9ddf1fab9 100644 --- a/drivers/mmc/core/pwrseq_simple.c +++ b/drivers/mmc/core/pwrseq_simple.c @@ -39,14 +39,18 @@ static void mmc_pwrseq_simple_set_gpios_value(struct mmc_pwrseq_simple *pwrseq, struct gpio_descs *reset_gpios = pwrseq->reset_gpios; if (!IS_ERR(reset_gpios)) { - int i; - int values[reset_gpios->ndescs]; + int i, *values; + int nvalues = reset_gpios->ndescs; - for (i = 0; i < reset_gpios->ndescs; i++) + values = kmalloc_array(nvalues, sizeof(int), GFP_KERNEL); + if (!values) + return; + + for (i = 0; i < nvalues; i++) values[i] = value; - gpiod_set_array_value_cansleep( - reset_gpios->ndescs, reset_gpios->desc, values); + gpiod_set_array_value_cansleep(nvalues, reset_gpios->desc, values); + kfree(values); } } From 77620f39904147da44737a43304baaf80f44d671 Mon Sep 17 00:00:00 2001 From: Martin Blumenstingl Date: Sun, 22 Apr 2018 12:53:28 +0200 Subject: [PATCH 428/970] dt-bindings: pinctrl: meson: add support for the Meson8m2 SoC [ Upstream commit 03d9fbc39730b3e6b2e7047dc85f0f70de8fb97d ] The Meson8m2 SoC is a variant of Meson8 with some updates from Meson8b (such as the Gigabit capable DesignWare MAC). It is mostly pin compatible with Meson8, only 10 (existing) CBUS pins get an additional function (four of these are Ethernet RXD2, RXD3, TXD2 and TXD3 which are required when the board uses an RGMII PHY). The AOBUS pins seem to be identical on Meson8 and Meson8m2. Signed-off-by: Martin Blumenstingl Reviewed-by: Rob Herring Reviewed-by: Kevin Hilman Signed-off-by: Linus Walleij Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- Documentation/devicetree/bindings/pinctrl/meson,pinctrl.txt | 2 ++ 1 file changed, 2 insertions(+) diff --git a/Documentation/devicetree/bindings/pinctrl/meson,pinctrl.txt b/Documentation/devicetree/bindings/pinctrl/meson,pinctrl.txt index fe7fe0b03cfb..1b9881786ce9 100644 --- a/Documentation/devicetree/bindings/pinctrl/meson,pinctrl.txt +++ b/Documentation/devicetree/bindings/pinctrl/meson,pinctrl.txt @@ -3,8 +3,10 @@ Required properties for the root node: - compatible: one of "amlogic,meson8-cbus-pinctrl" "amlogic,meson8b-cbus-pinctrl" + "amlogic,meson8m2-cbus-pinctrl" "amlogic,meson8-aobus-pinctrl" "amlogic,meson8b-aobus-pinctrl" + "amlogic,meson8m2-aobus-pinctrl" "amlogic,meson-gxbb-periphs-pinctrl" "amlogic,meson-gxbb-aobus-pinctrl" - reg: address and size of registers controlling irq functionality From 68f96e54102997ae9e956e2e434ce3befccc181e Mon Sep 17 00:00:00 2001 From: Yixun Lan Date: Sat, 28 Apr 2018 10:21:10 +0000 Subject: [PATCH 429/970] dt-bindings: net: meson-dwmac: new compatible name for AXG SoC [ Upstream commit 7e5d05e18ba1ed491c6f836edee7f0b90f3167bc ] We need to introduce a new compatible name for the Meson-AXG SoC in order to support the RMII 100M ethernet PHY, since the PRG_ETH0 register of the dwmac glue layer is changed from previous old SoC. Signed-off-by: Yixun Lan Reviewed-by: Rob Herring Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- Documentation/devicetree/bindings/net/meson-dwmac.txt | 1 + 1 file changed, 1 insertion(+) diff --git a/Documentation/devicetree/bindings/net/meson-dwmac.txt b/Documentation/devicetree/bindings/net/meson-dwmac.txt index 89e62ddc69ca..da37da0fdd3f 100644 --- a/Documentation/devicetree/bindings/net/meson-dwmac.txt +++ b/Documentation/devicetree/bindings/net/meson-dwmac.txt @@ -10,6 +10,7 @@ Required properties on all platforms: - "amlogic,meson6-dwmac" - "amlogic,meson8b-dwmac" - "amlogic,meson-gxbb-dwmac" + - "amlogic,meson-axg-dwmac" Additionally "snps,dwmac" and any applicable more detailed version number described in net/stmmac.txt should be used. From 7d044d940faeb6a7e65a1d215415e425822f365c Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Mon, 23 Apr 2018 21:16:35 +0200 Subject: [PATCH 430/970] stop_machine: Use raw spinlocks [ Upstream commit de5b55c1d4e30740009864eb35ce4ed856aac01d ] Use raw-locks in stop_machine() to allow locking in irq-off and preempt-disabled regions on -RT. This also documents the possible locking context in general. [bigeasy: update patch description.] Signed-off-by: Thomas Gleixner Signed-off-by: Sebastian Andrzej Siewior Link: https://lkml.kernel.org/r/20180423191635.6014-1-bigeasy@linutronix.de Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- kernel/stop_machine.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c index ec9ab2f01489..9b8cd7ebf27b 100644 --- a/kernel/stop_machine.c +++ b/kernel/stop_machine.c @@ -36,7 +36,7 @@ struct cpu_stop_done { struct cpu_stopper { struct task_struct *thread; - spinlock_t lock; + raw_spinlock_t lock; bool enabled; /* is this stopper enabled? */ struct list_head works; /* list of pending works */ @@ -78,13 +78,13 @@ static bool cpu_stop_queue_work(unsigned int cpu, struct cpu_stop_work *work) unsigned long flags; bool enabled; - spin_lock_irqsave(&stopper->lock, flags); + raw_spin_lock_irqsave(&stopper->lock, flags); enabled = stopper->enabled; if (enabled) __cpu_stop_queue_work(stopper, work); else if (work->done) cpu_stop_signal_done(work->done); - spin_unlock_irqrestore(&stopper->lock, flags); + raw_spin_unlock_irqrestore(&stopper->lock, flags); return enabled; } @@ -231,8 +231,8 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1, struct cpu_stopper *stopper2 = per_cpu_ptr(&cpu_stopper, cpu2); int err; retry: - spin_lock_irq(&stopper1->lock); - spin_lock_nested(&stopper2->lock, SINGLE_DEPTH_NESTING); + raw_spin_lock_irq(&stopper1->lock); + raw_spin_lock_nested(&stopper2->lock, SINGLE_DEPTH_NESTING); err = -ENOENT; if (!stopper1->enabled || !stopper2->enabled) @@ -255,8 +255,8 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1, __cpu_stop_queue_work(stopper1, work1); __cpu_stop_queue_work(stopper2, work2); unlock: - spin_unlock(&stopper2->lock); - spin_unlock_irq(&stopper1->lock); + raw_spin_unlock(&stopper2->lock); + raw_spin_unlock_irq(&stopper1->lock); if (unlikely(err == -EDEADLK)) { while (stop_cpus_in_progress) @@ -448,9 +448,9 @@ static int cpu_stop_should_run(unsigned int cpu) unsigned long flags; int run; - spin_lock_irqsave(&stopper->lock, flags); + raw_spin_lock_irqsave(&stopper->lock, flags); run = !list_empty(&stopper->works); - spin_unlock_irqrestore(&stopper->lock, flags); + raw_spin_unlock_irqrestore(&stopper->lock, flags); return run; } @@ -461,13 +461,13 @@ static void cpu_stopper_thread(unsigned int cpu) repeat: work = NULL; - spin_lock_irq(&stopper->lock); + raw_spin_lock_irq(&stopper->lock); if (!list_empty(&stopper->works)) { work = list_first_entry(&stopper->works, struct cpu_stop_work, list); list_del_init(&work->list); } - spin_unlock_irq(&stopper->lock); + raw_spin_unlock_irq(&stopper->lock); if (work) { cpu_stop_fn_t fn = work->fn; @@ -541,7 +541,7 @@ static int __init cpu_stop_init(void) for_each_possible_cpu(cpu) { struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu); - spin_lock_init(&stopper->lock); + raw_spin_lock_init(&stopper->lock); INIT_LIST_HEAD(&stopper->works); } From 1516a6019485297c1c5b8d10a869fe6769d9227a Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Mon, 9 Apr 2018 22:28:27 +0300 Subject: [PATCH 431/970] memory: tegra: Do not handle spurious interrupts [ Upstream commit bf3fbdfbec947cdd04b2f2c4bce11534c8786eee ] The ISR reads interrupts-enable mask, but doesn't utilize it. Apply the mask to the interrupt status and don't handle interrupts that MC driver haven't asked for. Kernel would disable spurious MC IRQ and report the error. This would happen only in a case of a very severe bug. Signed-off-by: Dmitry Osipenko Signed-off-by: Thierry Reding Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/memory/tegra/mc.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/memory/tegra/mc.c b/drivers/memory/tegra/mc.c index a4803ac192bb..d2005b995821 100644 --- a/drivers/memory/tegra/mc.c +++ b/drivers/memory/tegra/mc.c @@ -252,8 +252,11 @@ static irqreturn_t tegra_mc_irq(int irq, void *data) unsigned int bit; /* mask all interrupts to avoid flooding */ - status = mc_readl(mc, MC_INTSTATUS); mask = mc_readl(mc, MC_INTMASK); + status = mc_readl(mc, MC_INTSTATUS) & mask; + + if (!status) + return IRQ_NONE; for_each_set_bit(bit, &status, 32) { const char *error = status_names[bit] ?: "unknown"; From dc6afdde4b788ae91fc2e79e7e9e3e83351fa646 Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Mon, 9 Apr 2018 22:28:29 +0300 Subject: [PATCH 432/970] memory: tegra: Apply interrupts mask per SoC [ Upstream commit 1c74d5c0de0c2cc29fef97a19251da2ad6f579bd ] Currently we are enabling handling of interrupts specific to Tegra124+ which happen to overlap with previous generations. Let's specify interrupts mask per SoC generation for consistency and in a preparation of squashing of Tegra20 driver into the common one that will enable handling of GART faults which may be undesirable by newer generations. Signed-off-by: Dmitry Osipenko Signed-off-by: Thierry Reding Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/memory/tegra/mc.c | 21 +++------------------ drivers/memory/tegra/mc.h | 9 +++++++++ drivers/memory/tegra/tegra114.c | 2 ++ drivers/memory/tegra/tegra124.c | 6 ++++++ drivers/memory/tegra/tegra210.c | 3 +++ drivers/memory/tegra/tegra30.c | 2 ++ include/soc/tegra/mc.h | 2 ++ 7 files changed, 27 insertions(+), 18 deletions(-) diff --git a/drivers/memory/tegra/mc.c b/drivers/memory/tegra/mc.c index d2005b995821..1d49a8dd4a37 100644 --- a/drivers/memory/tegra/mc.c +++ b/drivers/memory/tegra/mc.c @@ -20,14 +20,6 @@ #include "mc.h" #define MC_INTSTATUS 0x000 -#define MC_INT_DECERR_MTS (1 << 16) -#define MC_INT_SECERR_SEC (1 << 13) -#define MC_INT_DECERR_VPR (1 << 12) -#define MC_INT_INVALID_APB_ASID_UPDATE (1 << 11) -#define MC_INT_INVALID_SMMU_PAGE (1 << 10) -#define MC_INT_ARBITRATION_EMEM (1 << 9) -#define MC_INT_SECURITY_VIOLATION (1 << 8) -#define MC_INT_DECERR_EMEM (1 << 6) #define MC_INTMASK 0x004 @@ -248,13 +240,11 @@ static const char *const error_names[8] = { static irqreturn_t tegra_mc_irq(int irq, void *data) { struct tegra_mc *mc = data; - unsigned long status, mask; + unsigned long status; unsigned int bit; /* mask all interrupts to avoid flooding */ - mask = mc_readl(mc, MC_INTMASK); - status = mc_readl(mc, MC_INTSTATUS) & mask; - + status = mc_readl(mc, MC_INTSTATUS) & mc->soc->intmask; if (!status) return IRQ_NONE; @@ -349,7 +339,6 @@ static int tegra_mc_probe(struct platform_device *pdev) const struct of_device_id *match; struct resource *res; struct tegra_mc *mc; - u32 value; int err; match = of_match_node(tegra_mc_of_match, pdev->dev.of_node); @@ -417,11 +406,7 @@ static int tegra_mc_probe(struct platform_device *pdev) WARN(!mc->soc->client_id_mask, "Missing client ID mask for this SoC\n"); - value = MC_INT_DECERR_MTS | MC_INT_SECERR_SEC | MC_INT_DECERR_VPR | - MC_INT_INVALID_APB_ASID_UPDATE | MC_INT_INVALID_SMMU_PAGE | - MC_INT_SECURITY_VIOLATION | MC_INT_DECERR_EMEM; - - mc_writel(mc, value, MC_INTMASK); + mc_writel(mc, mc->soc->intmask, MC_INTMASK); return 0; } diff --git a/drivers/memory/tegra/mc.h b/drivers/memory/tegra/mc.h index ddb16676c3af..24e020b4609b 100644 --- a/drivers/memory/tegra/mc.h +++ b/drivers/memory/tegra/mc.h @@ -14,6 +14,15 @@ #include +#define MC_INT_DECERR_MTS (1 << 16) +#define MC_INT_SECERR_SEC (1 << 13) +#define MC_INT_DECERR_VPR (1 << 12) +#define MC_INT_INVALID_APB_ASID_UPDATE (1 << 11) +#define MC_INT_INVALID_SMMU_PAGE (1 << 10) +#define MC_INT_ARBITRATION_EMEM (1 << 9) +#define MC_INT_SECURITY_VIOLATION (1 << 8) +#define MC_INT_DECERR_EMEM (1 << 6) + static inline u32 mc_readl(struct tegra_mc *mc, unsigned long offset) { return readl(mc->regs + offset); diff --git a/drivers/memory/tegra/tegra114.c b/drivers/memory/tegra/tegra114.c index ba8fff3d66a6..6d2a5a849d92 100644 --- a/drivers/memory/tegra/tegra114.c +++ b/drivers/memory/tegra/tegra114.c @@ -930,4 +930,6 @@ const struct tegra_mc_soc tegra114_mc_soc = { .atom_size = 32, .client_id_mask = 0x7f, .smmu = &tegra114_smmu_soc, + .intmask = MC_INT_INVALID_SMMU_PAGE | MC_INT_SECURITY_VIOLATION | + MC_INT_DECERR_EMEM, }; diff --git a/drivers/memory/tegra/tegra124.c b/drivers/memory/tegra/tegra124.c index 5a58e440f4a7..9f68a56f2727 100644 --- a/drivers/memory/tegra/tegra124.c +++ b/drivers/memory/tegra/tegra124.c @@ -1020,6 +1020,9 @@ const struct tegra_mc_soc tegra124_mc_soc = { .smmu = &tegra124_smmu_soc, .emem_regs = tegra124_mc_emem_regs, .num_emem_regs = ARRAY_SIZE(tegra124_mc_emem_regs), + .intmask = MC_INT_DECERR_MTS | MC_INT_SECERR_SEC | MC_INT_DECERR_VPR | + MC_INT_INVALID_APB_ASID_UPDATE | MC_INT_INVALID_SMMU_PAGE | + MC_INT_SECURITY_VIOLATION | MC_INT_DECERR_EMEM, }; #endif /* CONFIG_ARCH_TEGRA_124_SOC */ @@ -1042,5 +1045,8 @@ const struct tegra_mc_soc tegra132_mc_soc = { .atom_size = 32, .client_id_mask = 0x7f, .smmu = &tegra132_smmu_soc, + .intmask = MC_INT_DECERR_MTS | MC_INT_SECERR_SEC | MC_INT_DECERR_VPR | + MC_INT_INVALID_APB_ASID_UPDATE | MC_INT_INVALID_SMMU_PAGE | + MC_INT_SECURITY_VIOLATION | MC_INT_DECERR_EMEM, }; #endif /* CONFIG_ARCH_TEGRA_132_SOC */ diff --git a/drivers/memory/tegra/tegra210.c b/drivers/memory/tegra/tegra210.c index 5e144abe4c18..47c78a6d8f00 100644 --- a/drivers/memory/tegra/tegra210.c +++ b/drivers/memory/tegra/tegra210.c @@ -1077,4 +1077,7 @@ const struct tegra_mc_soc tegra210_mc_soc = { .atom_size = 64, .client_id_mask = 0xff, .smmu = &tegra210_smmu_soc, + .intmask = MC_INT_DECERR_MTS | MC_INT_SECERR_SEC | MC_INT_DECERR_VPR | + MC_INT_INVALID_APB_ASID_UPDATE | MC_INT_INVALID_SMMU_PAGE | + MC_INT_SECURITY_VIOLATION | MC_INT_DECERR_EMEM, }; diff --git a/drivers/memory/tegra/tegra30.c b/drivers/memory/tegra/tegra30.c index b44737840e70..d0689428ea1a 100644 --- a/drivers/memory/tegra/tegra30.c +++ b/drivers/memory/tegra/tegra30.c @@ -952,4 +952,6 @@ const struct tegra_mc_soc tegra30_mc_soc = { .atom_size = 16, .client_id_mask = 0x7f, .smmu = &tegra30_smmu_soc, + .intmask = MC_INT_INVALID_SMMU_PAGE | MC_INT_SECURITY_VIOLATION | + MC_INT_DECERR_EMEM, }; diff --git a/include/soc/tegra/mc.h b/include/soc/tegra/mc.h index 44202ff897fd..f759e0918037 100644 --- a/include/soc/tegra/mc.h +++ b/include/soc/tegra/mc.h @@ -99,6 +99,8 @@ struct tegra_mc_soc { u8 client_id_mask; const struct tegra_smmu_soc *smmu; + + u32 intmask; }; struct tegra_mc { From b7131631290e27ea6e91b7645bb22bae6b59d684 Mon Sep 17 00:00:00 2001 From: Enric Balletbo i Serra Date: Mon, 16 Apr 2018 11:39:57 -0300 Subject: [PATCH 433/970] arm64: defconfig: Enable Rockchip io-domain driver MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 7c8b77f81552c2b0e5d9c560da70bc4149ce66a5 ] Heiko Stübner justified pretty well the change in commit e330eb86ba0b ("ARM: multi_v7_defconfig: enable Rockchip io-domain driver"). This change is also needed for arm64 rockchip boards, so, do the same for arm64. The io-domain driver is necessary to notify the soc about voltages changes happening on supplying regulators. Probably the most important user right now is the mmc tuning code, where the soc needs to get notified when the voltage is dropped to the 1.8V point. As this option is necessary to successfully tune UHS cards etc, it should get built in. Otherwise, tuning will fail with, dwmmc_rockchip fe320000.dwmmc: All phases bad! mmc0: tuning execution failed: -5 Signed-off-by: Enric Balletbo i Serra Acked-by: Robin Murphy Signed-off-by: Heiko Stuebner Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/arm64/configs/defconfig | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig index dab2cb0c1f1c..b4c4d823569a 100644 --- a/arch/arm64/configs/defconfig +++ b/arch/arm64/configs/defconfig @@ -260,6 +260,8 @@ CONFIG_GPIO_XGENE=y CONFIG_GPIO_PCA953X=y CONFIG_GPIO_PCA953X_IRQ=y CONFIG_GPIO_MAX77620=y +CONFIG_POWER_AVS=y +CONFIG_ROCKCHIP_IODOMAIN=y CONFIG_POWER_RESET_MSM=y CONFIG_BATTERY_BQ27XXX=y CONFIG_POWER_RESET_XGENE=y From 917f481feb8d1492f9567ac6a7038866a39c9540 Mon Sep 17 00:00:00 2001 From: Luc Van Oostenryck Date: Tue, 24 Apr 2018 15:14:57 +0200 Subject: [PATCH 434/970] drm/gma500: fix psb_intel_lvds_mode_valid()'s return type [ Upstream commit 2ea009095c6e7396915a1d0dd480c41f02985f79 ] The method struct drm_connector_helper_funcs::mode_valid is defined as returning an 'enum drm_mode_status' but the driver implementation for this method, psb_intel_lvds_mode_valid(), uses an 'int' for it. Fix this by using 'enum drm_mode_status' for psb_intel_lvds_mode_valid(). Signed-off-by: Luc Van Oostenryck Signed-off-by: Daniel Vetter Link: https://patchwork.freedesktop.org/patch/msgid/20180424131458.2060-1-luc.vanoostenryck@gmail.com Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/gma500/psb_intel_drv.h | 2 +- drivers/gpu/drm/gma500/psb_intel_lvds.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/gma500/psb_intel_drv.h b/drivers/gpu/drm/gma500/psb_intel_drv.h index 2a3b7c684db2..fbd3fa340c4f 100644 --- a/drivers/gpu/drm/gma500/psb_intel_drv.h +++ b/drivers/gpu/drm/gma500/psb_intel_drv.h @@ -255,7 +255,7 @@ extern int intelfb_remove(struct drm_device *dev, extern bool psb_intel_lvds_mode_fixup(struct drm_encoder *encoder, const struct drm_display_mode *mode, struct drm_display_mode *adjusted_mode); -extern int psb_intel_lvds_mode_valid(struct drm_connector *connector, +extern enum drm_mode_status psb_intel_lvds_mode_valid(struct drm_connector *connector, struct drm_display_mode *mode); extern int psb_intel_lvds_set_property(struct drm_connector *connector, struct drm_property *property, diff --git a/drivers/gpu/drm/gma500/psb_intel_lvds.c b/drivers/gpu/drm/gma500/psb_intel_lvds.c index 79e9d3690667..e2c6ba3eded4 100644 --- a/drivers/gpu/drm/gma500/psb_intel_lvds.c +++ b/drivers/gpu/drm/gma500/psb_intel_lvds.c @@ -343,7 +343,7 @@ static void psb_intel_lvds_restore(struct drm_connector *connector) } } -int psb_intel_lvds_mode_valid(struct drm_connector *connector, +enum drm_mode_status psb_intel_lvds_mode_valid(struct drm_connector *connector, struct drm_display_mode *mode) { struct drm_psb_private *dev_priv = connector->dev->dev_private; From 34447a69c912dbfc5bd580e1860ce7d633bd47d3 Mon Sep 17 00:00:00 2001 From: Chris Novakovic Date: Tue, 24 Apr 2018 03:56:37 +0100 Subject: [PATCH 435/970] ipconfig: Correctly initialise ic_nameservers [ Upstream commit 300eec7c0a2495f771709c7642aa15f7cc148b83 ] ic_nameservers, which stores the list of name servers discovered by ipconfig, is initialised (i.e. has all of its elements set to NONE, or 0xffffffff) by ic_nameservers_predef() in the following scenarios: - before the "ip=" and "nfsaddrs=" kernel command line parameters are parsed (in ip_auto_config_setup()); - before autoconfiguring via DHCP or BOOTP (in ic_bootp_init()), in order to clear any values that may have been set after parsing "ip=" or "nfsaddrs=" and are no longer needed. This means that ic_nameservers_predef() is not called when neither "ip=" nor "nfsaddrs=" is specified on the kernel command line. In this scenario, every element in ic_nameservers remains set to 0x00000000, which is indistinguishable from ANY and causes pnp_seq_show() to write the following (bogus) information to /proc/net/pnp: #MANUAL nameserver 0.0.0.0 nameserver 0.0.0.0 nameserver 0.0.0.0 This is potentially problematic for systems that blindly link /etc/resolv.conf to /proc/net/pnp. Ensure that ic_nameservers is also initialised when neither "ip=" nor "nfsaddrs=" are specified by calling ic_nameservers_predef() in ip_auto_config(), but only when ip_auto_config_setup() was not called earlier. This causes the following to be written to /proc/net/pnp, and is consistent with what gets written when ipconfig is configured manually but no name servers are specified on the kernel command line: #MANUAL Signed-off-by: Chris Novakovic Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/ipv4/ipconfig.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c index b23464d9c538..d278b06459ac 100644 --- a/net/ipv4/ipconfig.c +++ b/net/ipv4/ipconfig.c @@ -779,6 +779,11 @@ static void __init ic_bootp_init_ext(u8 *e) */ static inline void __init ic_bootp_init(void) { + /* Re-initialise all name servers to NONE, in case any were set via the + * "ip=" or "nfsaddrs=" kernel command line parameters: any IP addresses + * specified there will already have been decoded but are no longer + * needed + */ ic_nameservers_predef(); dev_add_pack(&bootp_packet_type); @@ -1401,6 +1406,13 @@ static int __init ip_auto_config(void) int err; unsigned int i; + /* Initialise all name servers to NONE (but only if the "ip=" or + * "nfsaddrs=" kernel command line parameters weren't decoded, otherwise + * we'll overwrite the IP addresses specified there) + */ + if (ic_set_manually == 0) + ic_nameservers_predef(); + #ifdef CONFIG_PROC_FS proc_create("pnp", S_IRUGO, init_net.proc_net, &pnp_seq_fops); #endif /* CONFIG_PROC_FS */ @@ -1621,6 +1633,7 @@ static int __init ip_auto_config_setup(char *addrs) return 1; } + /* Initialise all name servers to NONE */ ic_nameservers_predef(); /* Parse string for static IP assignment. */ From 3c90e828db81b22f37fa7d37b9759cd3f1d305ba Mon Sep 17 00:00:00 2001 From: Siva Rebbagondla Date: Wed, 11 Apr 2018 12:13:32 +0530 Subject: [PATCH 436/970] rsi: Fix 'invalid vdd' warning in mmc [ Upstream commit 78e450719c702784e42af6da912d3692fd3da0cb ] While performing cleanup, driver is messing with card->ocr value by not masking rocr against ocr_avail. Below panic is observed with some of the SDIO host controllers due to this. Issue is resolved by reverting incorrect modifications to vdd. [ 927.423821] mmc1: Invalid vdd 0x1f [ 927.423925] Modules linked in: rsi_sdio(+) cmac bnep arc4 rsi_91x mac80211 cfg80211 btrsi rfcomm bluetooth ecdh_generic [ 927.424073] CPU: 0 PID: 1624 Comm: insmod Tainted: G W 4.15.0-1000-caracalla #1 [ 927.424075] Hardware name: Dell Inc. Edge Gateway 3003/ , BIOS 01.00.06 01/22/2018 [ 927.424082] RIP: 0010:sdhci_set_power_noreg+0xdd/0x190[sdhci] [ 927.424085] RSP: 0018:ffffac3fc064b930 EFLAGS: 00010282 [ 927.424107] Call Trace: [ 927.424118] sdhci_set_power+0x5a/0x60 [sdhci] [ 927.424125] sdhci_set_ios+0x360/0x3b0 [sdhci] [ 927.424133] mmc_set_initial_state+0x92/0x120 [ 927.424137] mmc_power_up.part.34+0x33/0x1d0 [ 927.424141] mmc_power_up+0x17/0x20 [ 927.424147] mmc_sdio_runtime_resume+0x2d/0x50 [ 927.424151] mmc_runtime_resume+0x17/0x20 [ 927.424156] __rpm_callback+0xc4/0x200 [ 927.424161] ? idr_alloc_cyclic+0x57/0xd0 [ 927.424165] ? mmc_runtime_suspend+0x20/0x20 [ 927.424169] rpm_callback+0x24/0x80 [ 927.424172] ? mmc_runtime_suspend+0x20/0x20 [ 927.424176] rpm_resume+0x4b3/0x6c0 [ 927.424181] __pm_runtime_resume+0x4e/0x80 [ 927.424188] driver_probe_device+0x41/0x490 [ 927.424192] __driver_attach+0xdf/0xf0 [ 927.424196] ? driver_probe_device+0x490/0x490 [ 927.424201] bus_for_each_dev+0x6c/0xc0 [ 927.424205] driver_attach+0x1e/0x20 [ 927.424209] bus_add_driver+0x1f4/0x270 [ 927.424217] ? rsi_sdio_ack_intr+0x50/0x50 [rsi_sdio] [ 927.424221] driver_register+0x60/0xe0 [ 927.424227] ? rsi_sdio_ack_intr+0x50/0x50 [rsi_sdio] [ 927.424231] sdio_register_driver+0x20/0x30 [ 927.424237] rsi_module_init+0x16/0x40 [rsi_sdio] Signed-off-by: Siva Rebbagondla Signed-off-by: Amitkumar Karwar Signed-off-by: Kalle Valo Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/rsi/rsi_91x_sdio.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/net/wireless/rsi/rsi_91x_sdio.c b/drivers/net/wireless/rsi/rsi_91x_sdio.c index 8428858204a6..fc895b466ebb 100644 --- a/drivers/net/wireless/rsi/rsi_91x_sdio.c +++ b/drivers/net/wireless/rsi/rsi_91x_sdio.c @@ -155,7 +155,6 @@ static void rsi_reset_card(struct sdio_func *pfunction) int err; struct mmc_card *card = pfunction->card; struct mmc_host *host = card->host; - s32 bit = (fls(host->ocr_avail) - 1); u8 cmd52_resp; u32 clock, resp, i; u16 rca; @@ -175,7 +174,6 @@ static void rsi_reset_card(struct sdio_func *pfunction) msleep(20); /* Initialize the SDIO card */ - host->ios.vdd = bit; host->ios.chip_select = MMC_CS_DONTCARE; host->ios.bus_mode = MMC_BUSMODE_OPENDRAIN; host->ios.power_mode = MMC_POWER_UP; From 5f5e70d7ec14d2b194f06e7ebe7f9396b4cdbb5f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ondrej=20Mosn=C3=A1=C4=8Dek?= Date: Mon, 9 Apr 2018 10:00:06 +0200 Subject: [PATCH 437/970] audit: allow not equal op for audit by executable [ Upstream commit 23bcc480dac204c7dbdf49d96b2c918ed98223c2 ] Current implementation of auditing by executable name only implements the 'equal' operator. This patch extends it to also support the 'not equal' operator. See: https://github.com/linux-audit/audit-kernel/issues/53 Signed-off-by: Ondrej Mosnacek Reviewed-by: Richard Guy Briggs Signed-off-by: Paul Moore Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- kernel/auditfilter.c | 2 +- kernel/auditsc.c | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c index 85d9cac497e4..cd4f41397c7e 100644 --- a/kernel/auditfilter.c +++ b/kernel/auditfilter.c @@ -406,7 +406,7 @@ static int audit_field_valid(struct audit_entry *entry, struct audit_field *f) return -EINVAL; break; case AUDIT_EXE: - if (f->op != Audit_equal) + if (f->op != Audit_not_equal && f->op != Audit_equal) return -EINVAL; if (entry->rule.listnr != AUDIT_FILTER_EXIT) return -EINVAL; diff --git a/kernel/auditsc.c b/kernel/auditsc.c index 2cd5256dbff7..c2aaf539728f 100644 --- a/kernel/auditsc.c +++ b/kernel/auditsc.c @@ -469,6 +469,8 @@ static int audit_filter_rules(struct task_struct *tsk, break; case AUDIT_EXE: result = audit_exe_compare(tsk, rule->exe); + if (f->op == Audit_not_equal) + result = !result; break; case AUDIT_UID: result = audit_uid_comparator(cred->uid, f->op, f->uid); From 1c802923321df0685e10124ebba10a527cb43ff8 Mon Sep 17 00:00:00 2001 From: James Simmons Date: Mon, 16 Apr 2018 00:15:10 -0400 Subject: [PATCH 438/970] staging: lustre: llite: correct removexattr detection [ Upstream commit 1b60f6dfa38403ff7c4d0b4b7ecdb810f9789a2a ] In ll_xattr_set_common() detect the removexattr() case correctly by testing for a NULL value as well as XATTR_REPLACE. Signed-off-by: John L. Hammond Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-10787 Reviewed-on: https://review.whamcloud.com/ Reviewed-by: Dmitry Eremin Reviewed-by: James Simmons Signed-off-by: James Simmons Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/staging/lustre/lustre/llite/xattr.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/staging/lustre/lustre/llite/xattr.c b/drivers/staging/lustre/lustre/llite/xattr.c index e070adb7a3cc..57121fd5f050 100644 --- a/drivers/staging/lustre/lustre/llite/xattr.c +++ b/drivers/staging/lustre/lustre/llite/xattr.c @@ -103,7 +103,11 @@ ll_xattr_set_common(const struct xattr_handler *handler, __u64 valid; int rc; - if (flags == XATTR_REPLACE) { + /* When setxattr() is called with a size of 0 the value is + * unconditionally replaced by "". When removexattr() is + * called we get a NULL value and XATTR_REPLACE for flags. + */ + if (!value && flags == XATTR_REPLACE) { ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_REMOVEXATTR, 1); valid = OBD_MD_FLXATTRRM; } else { From c18d68c7c2d03f78b5b2ab1fc592a38e582811e4 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Thu, 29 Mar 2018 15:26:48 +1100 Subject: [PATCH 439/970] staging: lustre: ldlm: free resource when ldlm_lock_create() fails. [ Upstream commit d8caf662b4aeeb2ac83ac0b22e40db88e9360c77 ] ldlm_lock_create() gets a resource, but don't put it on all failure paths. It should. Signed-off-by: NeilBrown Reviewed-by: James Simmons Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/staging/lustre/lustre/ldlm/ldlm_lock.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_lock.c b/drivers/staging/lustre/lustre/ldlm/ldlm_lock.c index d18ab3f28c70..9addcdbe9374 100644 --- a/drivers/staging/lustre/lustre/ldlm/ldlm_lock.c +++ b/drivers/staging/lustre/lustre/ldlm/ldlm_lock.c @@ -1489,8 +1489,10 @@ struct ldlm_lock *ldlm_lock_create(struct ldlm_namespace *ns, return ERR_CAST(res); lock = ldlm_lock_new(res); - if (!lock) + if (!lock) { + ldlm_resource_putref(res); return ERR_PTR(-ENOMEM); + } lock->l_req_mode = mode; lock->l_ast_data = data; @@ -1533,6 +1535,8 @@ struct ldlm_lock *ldlm_lock_create(struct ldlm_namespace *ns, return ERR_PTR(rc); } + + /** * Enqueue (request) a lock. * On the client this is called from ldlm_cli_enqueue_fini From 1d1a409502ae0ba1ac5353c89b04d42b74b26131 Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Fri, 23 Mar 2018 10:58:31 -0700 Subject: [PATCH 440/970] serial: core: Make sure compiler barfs for 16-byte earlycon names [ Upstream commit c1c734cb1f54b062f7e67ffc9656d82f5b412b9c ] As part of bringup I ended up wanting to call an earlycon driver by a name that was exactly 16-bytes big, specifically "qcom_geni_serial". Unfortunately, when I tried this I found that things compiled just fine. They just didn't work. Specifically the compiler felt perfectly justified in initting the ".name" field of "struct earlycon_id" with the full 16-bytes and just skipping the '\0'. Needless to say, that behavior didn't seem ideal, but I guess someone must have allowed it for a reason. One way to fix this is to shorten the name field to 15 bytes and then add an extra byte after that nobody touches. This should always be initted to 0 and we're golden. There are, of course, other ways to fix this too. We could audit all the users of the "name" field and make them stop at both null termination or at 16 bytes. We could also just make the name field much bigger so that we're not likely to run into this. ...but both seem like we'll just hit the bug again. Signed-off-by: Douglas Anderson Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- include/linux/serial_core.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h index 7b16c5322673..eb4f6456521e 100644 --- a/include/linux/serial_core.h +++ b/include/linux/serial_core.h @@ -344,7 +344,8 @@ struct earlycon_device { }; struct earlycon_id { - char name[16]; + char name[15]; + char name_term; /* In case compiler didn't '\0' term name */ char compatible[128]; int (*setup)(struct earlycon_device *, const char *options); }; From eac904dd39f4c69b7c9119444ab34c21d88cc1e5 Mon Sep 17 00:00:00 2001 From: Michal Simek Date: Tue, 10 Apr 2018 15:05:42 +0200 Subject: [PATCH 441/970] microblaze: Fix simpleImage format generation [ Upstream commit ece97f3a5fb50cf5f98886fbc63c9665f2bb199d ] simpleImage generation was broken for some time. This patch is fixing steps how simpleImage.*.ub file is generated. Steps are objdump of vmlinux and create .ub. Also make sure that there is striped elf version with .strip suffix. Signed-off-by: Michal Simek Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/microblaze/boot/Makefile | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/arch/microblaze/boot/Makefile b/arch/microblaze/boot/Makefile index 91d2068da1b9..0f3fe6a151dc 100644 --- a/arch/microblaze/boot/Makefile +++ b/arch/microblaze/boot/Makefile @@ -21,17 +21,19 @@ $(obj)/linux.bin.gz: $(obj)/linux.bin FORCE quiet_cmd_cp = CP $< $@$2 cmd_cp = cat $< >$@$2 || (rm -f $@ && echo false) -quiet_cmd_strip = STRIP $@ +quiet_cmd_strip = STRIP $< $@$2 cmd_strip = $(STRIP) -K microblaze_start -K _end -K __log_buf \ - -K _fdt_start vmlinux -o $@ + -K _fdt_start $< -o $@$2 UIMAGE_LOADADDR = $(CONFIG_KERNEL_BASE_ADDR) +UIMAGE_IN = $@ +UIMAGE_OUT = $@.ub $(obj)/simpleImage.%: vmlinux FORCE $(call if_changed,cp,.unstrip) $(call if_changed,objcopy) $(call if_changed,uimage) - $(call if_changed,strip) - @echo 'Kernel: $@ is ready' ' (#'`cat .version`')' + $(call if_changed,strip,.strip) + @echo 'Kernel: $(UIMAGE_OUT) is ready' ' (#'`cat .version`')' clean-files += simpleImage.*.unstrip linux.bin.ub dts/*.dtb From 399e549fe55d4591883eb7b3dc254832fb09773e Mon Sep 17 00:00:00 2001 From: Dominik Bozek Date: Fri, 13 Apr 2018 10:42:31 -0700 Subject: [PATCH 442/970] usb: hub: Don't wait for connect state at resume for powered-off ports [ Upstream commit 5d111f5190848d6fb1c414dc57797efea3526a2f ] wait_for_connected() wait till a port change status to USB_PORT_STAT_CONNECTION, but this is not possible if the port is unpowered. The loop will only exit at timeout. Such case take place if an over-current incident happen while system is in S3. Then during resume wait_for_connected() will wait 2s, which may be noticeable by the user. Signed-off-by: Dominik Bozek Signed-off-by: Kuppuswamy Sathyanarayanan Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/usb/core/hub.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index bdb19db542a4..7aee55244b4a 100644 --- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -3363,6 +3363,10 @@ static int wait_for_connected(struct usb_device *udev, while (delay_ms < 2000) { if (status || *portstatus & USB_PORT_STAT_CONNECTION) break; + if (!port_is_power_on(hub, *portstatus)) { + status = -ENODEV; + break; + } msleep(20); delay_ms += 20; status = hub_port_status(hub, *port1, portstatus, portchange); From 6b4cdfa0ab4316436ff01b0ebec6d886acdf931a Mon Sep 17 00:00:00 2001 From: Tudor-Dan Ambarus Date: Tue, 3 Apr 2018 09:39:01 +0300 Subject: [PATCH 443/970] crypto: authencesn - don't leak pointers to authenc keys [ Upstream commit 31545df391d58a3bb60e29b1192644a6f2b5a8dd ] In crypto_authenc_esn_setkey we save pointers to the authenc keys in a local variable of type struct crypto_authenc_keys and we don't zeroize it after use. Fix this and don't leak pointers to the authenc keys. Signed-off-by: Tudor Ambarus Signed-off-by: Herbert Xu Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- crypto/authencesn.c | 1 + 1 file changed, 1 insertion(+) diff --git a/crypto/authencesn.c b/crypto/authencesn.c index 18c94e1c31d1..49e7e85a23d5 100644 --- a/crypto/authencesn.c +++ b/crypto/authencesn.c @@ -90,6 +90,7 @@ static int crypto_authenc_esn_setkey(struct crypto_aead *authenc_esn, const u8 * CRYPTO_TFM_RES_MASK); out: + memzero_explicit(&keys, sizeof(keys)); return err; badkey: From 15aa793dadf70c7c5f4e2da7b6a60f4ef74f599b Mon Sep 17 00:00:00 2001 From: Tudor-Dan Ambarus Date: Tue, 3 Apr 2018 09:39:00 +0300 Subject: [PATCH 444/970] crypto: authenc - don't leak pointers to authenc keys [ Upstream commit ad2fdcdf75d169e7a5aec6c7cb421c0bec8ec711 ] In crypto_authenc_setkey we save pointers to the authenc keys in a local variable of type struct crypto_authenc_keys and we don't zeroize it after use. Fix this and don't leak pointers to the authenc keys. Signed-off-by: Tudor Ambarus Signed-off-by: Herbert Xu Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- crypto/authenc.c | 1 + 1 file changed, 1 insertion(+) diff --git a/crypto/authenc.c b/crypto/authenc.c index a7e1ac786c5d..c3180eb6d1ee 100644 --- a/crypto/authenc.c +++ b/crypto/authenc.c @@ -108,6 +108,7 @@ static int crypto_authenc_setkey(struct crypto_aead *authenc, const u8 *key, CRYPTO_TFM_RES_MASK); out: + memzero_explicit(&keys, sizeof(keys)); return err; badkey: From 8fcb8b5ea088e90702b7d1e89b6863bb02fbe170 Mon Sep 17 00:00:00 2001 From: Suman Anna Date: Wed, 14 Mar 2018 11:41:36 -0400 Subject: [PATCH 445/970] media: omap3isp: fix unbalanced dma_iommu_mapping [ Upstream commit b7e1e6859fbf60519fd82d7120cee106a6019512 ] The OMAP3 ISP driver manages its MMU mappings through the IOMMU-aware ARM DMA backend. The current code creates a dma_iommu_mapping and attaches this to the ISP device, but never detaches the mapping in either the probe failure paths or the driver remove path resulting in an unbalanced mapping refcount and a memory leak. Fix this properly. Reported-by: Pavel Machek Signed-off-by: Suman Anna Tested-by: Pavel Machek Reviewed-by: Laurent Pinchart Signed-off-by: Sakari Ailus Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/media/platform/omap3isp/isp.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/media/platform/omap3isp/isp.c b/drivers/media/platform/omap3isp/isp.c index 0321d84addc7..15a86bb4e61c 100644 --- a/drivers/media/platform/omap3isp/isp.c +++ b/drivers/media/platform/omap3isp/isp.c @@ -1941,6 +1941,7 @@ static int isp_initialize_modules(struct isp_device *isp) static void isp_detach_iommu(struct isp_device *isp) { + arm_iommu_detach_device(isp->dev); arm_iommu_release_mapping(isp->mapping); isp->mapping = NULL; iommu_group_remove_device(isp->dev); @@ -1974,8 +1975,7 @@ static int isp_attach_iommu(struct isp_device *isp) mapping = arm_iommu_create_mapping(&platform_bus_type, SZ_1G, SZ_2G); if (IS_ERR(mapping)) { dev_err(isp->dev, "failed to create ARM IOMMU mapping\n"); - ret = PTR_ERR(mapping); - goto error; + return PTR_ERR(mapping); } isp->mapping = mapping; @@ -1990,7 +1990,8 @@ static int isp_attach_iommu(struct isp_device *isp) return 0; error: - isp_detach_iommu(isp); + arm_iommu_release_mapping(isp->mapping); + isp->mapping = NULL; return ret; } From 6337861a0f0304faa7806719ce73b994db246b7c Mon Sep 17 00:00:00 2001 From: Xose Vazquez Perez Date: Sat, 7 Apr 2018 00:47:23 +0200 Subject: [PATCH 446/970] scsi: scsi_dh: replace too broad "TP9" string with the exact models [ Upstream commit 37b37d2609cb0ac267280ef27350b962d16d272e ] SGI/TP9100 is not an RDAC array: ^^^ https://git.opensvc.com/gitweb.cgi?p=multipath-tools/.git;a=blob;f=libmultipath/hwtable.c;h=88b4700beb1d8940008020fbe4c3cd97d62f4a56;hb=HEAD#l235 This partially reverts commit 35204772ea03 ("[SCSI] scsi_dh_rdac : Consolidate rdac strings together") [mkp: fixed up the new entries to align with rest of struct] Cc: NetApp RDAC team Cc: Hannes Reinecke Cc: James E.J. Bottomley Cc: Martin K. Petersen Cc: SCSI ML Cc: DM ML Signed-off-by: Xose Vazquez Perez Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/scsi_dh.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/scsi_dh.c b/drivers/scsi/scsi_dh.c index a5e30e9449ef..375cede0c534 100644 --- a/drivers/scsi/scsi_dh.c +++ b/drivers/scsi/scsi_dh.c @@ -58,7 +58,10 @@ static const struct scsi_dh_blist scsi_dh_blist[] = { {"IBM", "3526", "rdac", }, {"IBM", "3542", "rdac", }, {"IBM", "3552", "rdac", }, - {"SGI", "TP9", "rdac", }, + {"SGI", "TP9300", "rdac", }, + {"SGI", "TP9400", "rdac", }, + {"SGI", "TP9500", "rdac", }, + {"SGI", "TP9700", "rdac", }, {"SGI", "IS", "rdac", }, {"STK", "OPENstorage", "rdac", }, {"STK", "FLEXLINE 380", "rdac", }, From 6e8738c1c1037d36c317a65fd41ce1c812f5d9f4 Mon Sep 17 00:00:00 2001 From: Shivasharan S Date: Fri, 6 Apr 2018 02:02:11 -0700 Subject: [PATCH 447/970] scsi: megaraid_sas: Increase timeout by 1 sec for non-RAID fastpath IOs [ Upstream commit 3239b8cd28fd849a2023483257d35d68c5876c74 ] Hardware could time out Fastpath IOs one second earlier than the timeout provided by the host. For non-RAID devices, driver provides timeout value based on OS provided timeout value. Under certain scenarios, if the OS provides a timeout value of 1 second, due to above behavior hardware will timeout immediately. Increase timeout value for non-RAID fastpath IOs by 1 second. Signed-off-by: Shivasharan S Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/megaraid/megaraid_sas_fusion.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c index a156451553a7..f722a0e6caa4 100644 --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c @@ -2031,6 +2031,9 @@ megasas_build_syspd_fusion(struct megasas_instance *instance, pRAID_Context->timeoutValue = cpu_to_le16(os_timeout_value); pRAID_Context->VirtualDiskTgtId = cpu_to_le16(device_id); } else { + if (os_timeout_value) + os_timeout_value++; + /* system pd Fast Path */ io_request->Function = MPI2_FUNCTION_SCSI_IO_REQUEST; timeout_limit = (scmd->device->type == TYPE_DISK) ? From 401103613da6455ead0e9c99fbe5f6948615282e Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Fri, 6 Apr 2018 07:54:51 -0400 Subject: [PATCH 448/970] media: si470x: fix __be16 annotations [ Upstream commit 90db5c829692a0a7845e977e45719b4699216bd4 ] The annotations there are wrong as warned: drivers/media/radio/si470x/radio-si470x-i2c.c:107:35: warning: cast to restricted __be16 drivers/media/radio/si470x/radio-si470x-i2c.c:107:35: warning: cast to restricted __be16 drivers/media/radio/si470x/radio-si470x-i2c.c:107:35: warning: cast to restricted __be16 drivers/media/radio/si470x/radio-si470x-i2c.c:107:35: warning: cast to restricted __be16 drivers/media/radio/si470x/radio-si470x-i2c.c:129:24: warning: incorrect type in assignment (different base types) drivers/media/radio/si470x/radio-si470x-i2c.c:129:24: expected unsigned short [unsigned] [short] drivers/media/radio/si470x/radio-si470x-i2c.c:129:24: got restricted __be16 [usertype] drivers/media/radio/si470x/radio-si470x-i2c.c:163:39: warning: cast to restricted __be16 drivers/media/radio/si470x/radio-si470x-i2c.c:163:39: warning: cast to restricted __be16 drivers/media/radio/si470x/radio-si470x-i2c.c:163:39: warning: cast to restricted __be16 drivers/media/radio/si470x/radio-si470x-i2c.c:163:39: warning: cast to restricted __be16 Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/media/radio/si470x/radio-si470x-i2c.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/media/radio/si470x/radio-si470x-i2c.c b/drivers/media/radio/si470x/radio-si470x-i2c.c index ee0470a3196b..f218886c504d 100644 --- a/drivers/media/radio/si470x/radio-si470x-i2c.c +++ b/drivers/media/radio/si470x/radio-si470x-i2c.c @@ -96,7 +96,7 @@ MODULE_PARM_DESC(max_rds_errors, "RDS maximum block errors: *1*"); */ int si470x_get_register(struct si470x_device *radio, int regnr) { - u16 buf[READ_REG_NUM]; + __be16 buf[READ_REG_NUM]; struct i2c_msg msgs[1] = { { .addr = radio->client->addr, @@ -121,7 +121,7 @@ int si470x_get_register(struct si470x_device *radio, int regnr) int si470x_set_register(struct si470x_device *radio, int regnr) { int i; - u16 buf[WRITE_REG_NUM]; + __be16 buf[WRITE_REG_NUM]; struct i2c_msg msgs[1] = { { .addr = radio->client->addr, @@ -151,7 +151,7 @@ int si470x_set_register(struct si470x_device *radio, int regnr) static int si470x_get_all_registers(struct si470x_device *radio) { int i; - u16 buf[READ_REG_NUM]; + __be16 buf[READ_REG_NUM]; struct i2c_msg msgs[1] = { { .addr = radio->client->addr, From f685597b133510047dbbea61c942e5c708654118 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jos=C3=A9=20Roberto=20de=20Souza?= Date: Wed, 28 Mar 2018 15:30:37 -0700 Subject: [PATCH 449/970] drm: Add DP PSR2 sink enable bit MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 4f212e40468650e220c1770876c7f25b8e0c1ff5 ] To comply with eDP1.4a this bit should be set when enabling PSR2. Signed-off-by: José Roberto de Souza Reviewed-by: Rodrigo Vivi Signed-off-by: Rodrigo Vivi Link: https://patchwork.freedesktop.org/patch/msgid/20180328223046.16125-1-jose.souza@intel.com Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- include/drm/drm_dp_helper.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/drm/drm_dp_helper.h b/include/drm/drm_dp_helper.h index 2a79882cb68e..2fff10de317d 100644 --- a/include/drm/drm_dp_helper.h +++ b/include/drm/drm_dp_helper.h @@ -345,6 +345,7 @@ # define DP_PSR_FRAME_CAPTURE (1 << 3) # define DP_PSR_SELECTIVE_UPDATE (1 << 4) # define DP_PSR_IRQ_HPD_WITH_CRC_ERRORS (1 << 5) +# define DP_PSR_ENABLE_PSR2 (1 << 6) /* eDP 1.4a */ #define DP_ADAPTER_CTRL 0x1a0 # define DP_ADAPTER_CTRL_FORCE_LOAD_SENSE (1 << 0) From 820f2bcacbdb8e537abce8cdf660ff3b2c67871c Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Sat, 14 Jul 2018 23:55:57 -0400 Subject: [PATCH 450/970] random: mix rdrand with entropy sent in from userspace commit 81e69df38e2911b642ec121dec319fad2a4782f3 upstream. Fedora has integrated the jitter entropy daemon to work around slow boot problems, especially on VM's that don't support virtio-rng: https://bugzilla.redhat.com/show_bug.cgi?id=1572944 It's understandable why they did this, but the Jitter entropy daemon works fundamentally on the principle: "the CPU microarchitecture is **so** complicated and we can't figure it out, so it *must* be random". Yes, it uses statistical tests to "prove" it is secure, but AES_ENCRYPT(NSA_KEY, COUNTER++) will also pass statistical tests with flying colors. So if RDRAND is available, mix it into entropy submitted from userspace. It can't hurt, and if you believe the NSA has backdoored RDRAND, then they probably have enough details about the Intel microarchitecture that they can reverse engineer how the Jitter entropy daemon affects the microarchitecture, and attack its output stream. And if RDRAND is in fact an honest DRNG, it will immeasurably improve on what the Jitter entropy daemon might produce. This also provides some protection against someone who is able to read or set the entropy seed file. Signed-off-by: Theodore Ts'o Cc: stable@vger.kernel.org Cc: Arnd Bergmann Signed-off-by: Greg Kroah-Hartman --- drivers/char/random.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/char/random.c b/drivers/char/random.c index ddeac4eefd0a..81b65d0e7563 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -1826,14 +1826,22 @@ static int write_pool(struct entropy_store *r, const char __user *buffer, size_t count) { size_t bytes; - __u32 buf[16]; + __u32 t, buf[16]; const char __user *p = buffer; while (count > 0) { + int b, i = 0; + bytes = min(count, sizeof(buf)); if (copy_from_user(&buf, p, bytes)) return -EFAULT; + for (b = bytes ; b > 0 ; b -= sizeof(__u32), i++) { + if (!arch_get_random_int(&t)) + break; + buf[i] ^= t; + } + count -= bytes; p += bytes; From 1aecbe4326b1da7e4afb075c92e24ba2c52ff3e8 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sun, 29 Jul 2018 12:44:46 -0700 Subject: [PATCH 451/970] squashfs: be more careful about metadata corruption commit 01cfb7937a9af2abb1136c7e89fbf3fd92952956 upstream. Anatoly Trosinenko reports that a corrupted squashfs image can cause a kernel oops. It turns out that squashfs can end up being confused about negative fragment lengths. The regular squashfs_read_data() does check for negative lengths, but squashfs_read_metadata() did not, and the fragment size code just blindly trusted the on-disk value. Fix both the fragment parsing and the metadata reading code. Reported-by: Anatoly Trosinenko Cc: Al Viro Cc: Phillip Lougher Cc: stable@kernel.org Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/squashfs/cache.c | 3 +++ fs/squashfs/file.c | 8 ++++++-- fs/squashfs/fragment.c | 4 +--- fs/squashfs/squashfs_fs.h | 6 ++++++ 4 files changed, 16 insertions(+), 5 deletions(-) diff --git a/fs/squashfs/cache.c b/fs/squashfs/cache.c index 23813c078cc9..0839efa720b3 100644 --- a/fs/squashfs/cache.c +++ b/fs/squashfs/cache.c @@ -350,6 +350,9 @@ int squashfs_read_metadata(struct super_block *sb, void *buffer, TRACE("Entered squashfs_read_metadata [%llx:%x]\n", *block, *offset); + if (unlikely(length < 0)) + return -EIO; + while (length) { entry = squashfs_cache_get(sb, msblk->block_cache, *block, 0); if (entry->error) { diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c index 13d80947bf9e..fcff2e0487fe 100644 --- a/fs/squashfs/file.c +++ b/fs/squashfs/file.c @@ -194,7 +194,11 @@ static long long read_indexes(struct super_block *sb, int n, } for (i = 0; i < blocks; i++) { - int size = le32_to_cpu(blist[i]); + int size = squashfs_block_size(blist[i]); + if (size < 0) { + err = size; + goto failure; + } block += SQUASHFS_COMPRESSED_SIZE_BLOCK(size); } n -= blocks; @@ -367,7 +371,7 @@ static int read_blocklist(struct inode *inode, int index, u64 *block) sizeof(size)); if (res < 0) return res; - return le32_to_cpu(size); + return squashfs_block_size(size); } /* Copy data into page cache */ diff --git a/fs/squashfs/fragment.c b/fs/squashfs/fragment.c index 0ed6edbc5c71..86ad9a4b8c36 100644 --- a/fs/squashfs/fragment.c +++ b/fs/squashfs/fragment.c @@ -61,9 +61,7 @@ int squashfs_frag_lookup(struct super_block *sb, unsigned int fragment, return size; *fragment_block = le64_to_cpu(fragment_entry.start_block); - size = le32_to_cpu(fragment_entry.size); - - return size; + return squashfs_block_size(fragment_entry.size); } diff --git a/fs/squashfs/squashfs_fs.h b/fs/squashfs/squashfs_fs.h index 506f4ba5b983..e66486366f02 100644 --- a/fs/squashfs/squashfs_fs.h +++ b/fs/squashfs/squashfs_fs.h @@ -129,6 +129,12 @@ #define SQUASHFS_COMPRESSED_BLOCK(B) (!((B) & SQUASHFS_COMPRESSED_BIT_BLOCK)) +static inline int squashfs_block_size(__le32 raw) +{ + u32 size = le32_to_cpu(raw); + return (size >> 25) ? -EIO : size; +} + /* * Inode number ops. Inodes consist of a compressed block number, and an * uncompressed offset within that block From 5eed597ca6c8d34312ce1298446988431bbbcd5f Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Tue, 10 Jul 2018 01:07:43 -0400 Subject: [PATCH 452/970] ext4: fix inline data updates with checksums enabled commit 362eca70b53389bddf3143fe20f53dcce2cfdf61 upstream. The inline data code was updating the raw inode directly; this is problematic since if metadata checksums are enabled, ext4_mark_inode_dirty() must be called to update the inode's checksum. In addition, the jbd2 layer requires that get_write_access() be called before the metadata buffer is modified. Fix both of these problems. https://bugzilla.kernel.org/show_bug.cgi?id=200443 Signed-off-by: Theodore Ts'o Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- fs/ext4/inline.c | 19 +++++++++++-------- fs/ext4/inode.c | 16 +++++++--------- 2 files changed, 18 insertions(+), 17 deletions(-) diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c index e6ac24de119d..436baf7cdca3 100644 --- a/fs/ext4/inline.c +++ b/fs/ext4/inline.c @@ -679,6 +679,10 @@ int ext4_try_to_write_inline_data(struct address_space *mapping, goto convert; } + ret = ext4_journal_get_write_access(handle, iloc.bh); + if (ret) + goto out; + flags |= AOP_FLAG_NOFS; page = grab_cache_page_write_begin(mapping, 0, flags); @@ -707,7 +711,7 @@ int ext4_try_to_write_inline_data(struct address_space *mapping, out_up_read: up_read(&EXT4_I(inode)->xattr_sem); out: - if (handle) + if (handle && (ret != 1)) ext4_journal_stop(handle); brelse(iloc.bh); return ret; @@ -749,6 +753,7 @@ int ext4_write_inline_data_end(struct inode *inode, loff_t pos, unsigned len, ext4_write_unlock_xattr(inode, &no_expand); brelse(iloc.bh); + mark_inode_dirty(inode); out: return copied; } @@ -895,7 +900,6 @@ int ext4_da_write_inline_data_begin(struct address_space *mapping, goto out; } - page = grab_cache_page_write_begin(mapping, 0, flags); if (!page) { ret = -ENOMEM; @@ -913,6 +917,9 @@ int ext4_da_write_inline_data_begin(struct address_space *mapping, if (ret < 0) goto out_release_page; } + ret = ext4_journal_get_write_access(handle, iloc.bh); + if (ret) + goto out_release_page; up_read(&EXT4_I(inode)->xattr_sem); *pagep = page; @@ -933,7 +940,6 @@ int ext4_da_write_inline_data_end(struct inode *inode, loff_t pos, unsigned len, unsigned copied, struct page *page) { - int i_size_changed = 0; int ret; ret = ext4_write_inline_data_end(inode, pos, len, copied, page); @@ -951,10 +957,8 @@ int ext4_da_write_inline_data_end(struct inode *inode, loff_t pos, * But it's important to update i_size while still holding page lock: * page writeout could otherwise come in and zero beyond i_size. */ - if (pos+copied > inode->i_size) { + if (pos+copied > inode->i_size) i_size_write(inode, pos+copied); - i_size_changed = 1; - } unlock_page(page); put_page(page); @@ -964,8 +968,7 @@ int ext4_da_write_inline_data_end(struct inode *inode, loff_t pos, * ordering of page lock and transaction start for journaling * filesystems. */ - if (i_size_changed) - mark_inode_dirty(inode); + mark_inode_dirty(inode); return copied; } diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 5c4c9af4aaf4..f62eca8cbde0 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -1318,9 +1318,10 @@ static int ext4_write_end(struct file *file, loff_t old_size = inode->i_size; int ret = 0, ret2; int i_size_changed = 0; + int inline_data = ext4_has_inline_data(inode); trace_ext4_write_end(inode, pos, len, copied); - if (ext4_has_inline_data(inode)) { + if (inline_data) { ret = ext4_write_inline_data_end(inode, pos, len, copied, page); if (ret < 0) { @@ -1348,7 +1349,7 @@ static int ext4_write_end(struct file *file, * ordering of page lock and transaction start for journaling * filesystems. */ - if (i_size_changed) + if (i_size_changed || inline_data) ext4_mark_inode_dirty(handle, inode); if (pos + len > inode->i_size && ext4_can_truncate(inode)) @@ -1422,6 +1423,7 @@ static int ext4_journalled_write_end(struct file *file, int partial = 0; unsigned from, to; int size_changed = 0; + int inline_data = ext4_has_inline_data(inode); trace_ext4_journalled_write_end(inode, pos, len, copied); from = pos & (PAGE_SIZE - 1); @@ -1429,7 +1431,7 @@ static int ext4_journalled_write_end(struct file *file, BUG_ON(!ext4_handle_valid(handle)); - if (ext4_has_inline_data(inode)) { + if (inline_data) { ret = ext4_write_inline_data_end(inode, pos, len, copied, page); if (ret < 0) { @@ -1460,7 +1462,7 @@ static int ext4_journalled_write_end(struct file *file, if (old_size < pos) pagecache_isize_extended(inode, old_size, pos); - if (size_changed) { + if (size_changed || inline_data) { ret2 = ext4_mark_inode_dirty(handle, inode); if (!ret) ret = ret2; @@ -1958,11 +1960,7 @@ static int __ext4_journalled_writepage(struct page *page, } if (inline_data) { - BUFFER_TRACE(inode_bh, "get write access"); - ret = ext4_journal_get_write_access(handle, inode_bh); - - err = ext4_handle_dirty_metadata(handle, inode, inode_bh); - + ret = ext4_mark_inode_dirty(handle, inode); } else { ret = ext4_walk_page_buffers(handle, page_bufs, 0, len, NULL, do_journal_get_write_access); From 262a62cc50697246eeb431b970db6b3ee35c33f3 Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Thu, 12 Jul 2018 19:08:05 -0400 Subject: [PATCH 453/970] ext4: check for allocation block validity with block group locked commit 8d5a803c6a6ce4ec258e31f76059ea5153ba46ef upstream. With commit 044e6e3d74a3: "ext4: don't update checksum of new initialized bitmaps" the buffer valid bit will get set without actually setting up the checksum for the allocation bitmap, since the checksum will get calculated once we actually allocate an inode or block. If we are doing this, then we need to (re-)check the verified bit after we take the block group lock. Otherwise, we could race with another process reading and verifying the bitmap, which would then complain about the checksum being invalid. https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1780137 Signed-off-by: Theodore Ts'o Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman --- fs/ext4/balloc.c | 3 +++ fs/ext4/ialloc.c | 3 +++ 2 files changed, 6 insertions(+) diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c index ad13f07cf0d3..2455fe1446d6 100644 --- a/fs/ext4/balloc.c +++ b/fs/ext4/balloc.c @@ -378,6 +378,8 @@ static int ext4_validate_block_bitmap(struct super_block *sb, return -EFSCORRUPTED; ext4_lock_group(sb, block_group); + if (buffer_verified(bh)) + goto verified; if (unlikely(!ext4_block_bitmap_csum_verify(sb, block_group, desc, bh))) { ext4_unlock_group(sb, block_group); @@ -400,6 +402,7 @@ static int ext4_validate_block_bitmap(struct super_block *sb, return -EFSCORRUPTED; } set_buffer_verified(bh); +verified: ext4_unlock_group(sb, block_group); return 0; } diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index 460866b2166d..ffaf66a51de3 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -88,6 +88,8 @@ static int ext4_validate_inode_bitmap(struct super_block *sb, return -EFSCORRUPTED; ext4_lock_group(sb, block_group); + if (buffer_verified(bh)) + goto verified; blk = ext4_inode_bitmap(sb, desc); if (!ext4_inode_bitmap_csum_verify(sb, block_group, desc, bh, EXT4_INODES_PER_GROUP(sb) / 8)) { @@ -105,6 +107,7 @@ static int ext4_validate_inode_bitmap(struct super_block *sb, return -EFSBADCRC; } set_buffer_verified(bh); +verified: ext4_unlock_group(sb, block_group); return 0; } From 40af3250e9f283bdf228f845e3aab23c30ede7f1 Mon Sep 17 00:00:00 2001 From: Leon Romanovsky Date: Sun, 24 Jun 2018 11:23:42 +0300 Subject: [PATCH 454/970] RDMA/uverbs: Protect from attempts to create flows on unsupported QP commit 940efcc8889f0d15567eb07fc9fd69b06e366aa5 upstream. Flows can be created on UD and RAW_PACKET QP types. Attempts to provide other QP types as an input causes to various unpredictable failures. The reason is that in order to support all various types (e.g. XRC), we are supposed to use real_qp handle and not qp handle and expect to driver/FW to fail such (XRC) flows. The simpler and safer variant is to ban all QP types except UD and RAW_PACKET, instead of relying on driver/FW. Cc: # 3.11 Fixes: 436f2ad05a0b ("IB/core: Export ib_create/destroy_flow through uverbs") Cc: syzkaller Reported-by: Noa Osherovich Signed-off-by: Leon Romanovsky Signed-off-by: Jason Gunthorpe Signed-off-by: Sudip Mukherjee Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/core/uverbs_cmd.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c index 4b717cf50d27..6f875bf9cc9d 100644 --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -3725,6 +3725,11 @@ int ib_uverbs_ex_create_flow(struct ib_uverbs_file *file, goto err_uobj; } + if (qp->qp_type != IB_QPT_UD && qp->qp_type != IB_QPT_RAW_PACKET) { + err = -EINVAL; + goto err_put; + } + flow_attr = kzalloc(sizeof(*flow_attr) + cmd.flow_attr.num_of_specs * sizeof(union ib_flow_spec), GFP_KERNEL); if (!flow_attr) { From e59af2831d9b5d452f6f9281c4bb6b9ecc4e591f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20Vok=C3=A1=C4=8D?= Date: Wed, 23 May 2018 08:20:21 +0200 Subject: [PATCH 455/970] net: dsa: qca8k: Force CPU port to its highest bandwidth MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 79a4ed4f0f93fc65e48a0fc5247ffa5645f7b0cc upstream. By default autonegotiation is enabled to configure MAC on all ports. For the CPU port autonegotiation can not be used so we need to set some sensible defaults manually. This patch forces the default setting of the CPU port to 1000Mbps/full duplex which is the chip maximum capability. Also correct size of the bit field used to configure link speed. Fixes: 6b93fb46480a ("net-next: dsa: add new driver for qca8xxx family") Signed-off-by: Michal Vokáč Reviewed-by: Andrew Lunn Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/dsa/qca8k.c | 6 +++++- drivers/net/dsa/qca8k.h | 6 ++++-- 2 files changed, 9 insertions(+), 3 deletions(-) diff --git a/drivers/net/dsa/qca8k.c b/drivers/net/dsa/qca8k.c index 33ed3997f9b9..b3f6f4f48952 100644 --- a/drivers/net/dsa/qca8k.c +++ b/drivers/net/dsa/qca8k.c @@ -491,6 +491,7 @@ qca8k_setup(struct dsa_switch *ds) { struct qca8k_priv *priv = (struct qca8k_priv *)ds->priv; int ret, i, phy_mode = -1; + u32 mask; /* Make sure that port 0 is the cpu port */ if (!dsa_is_cpu_port(ds, 0)) { @@ -516,7 +517,10 @@ qca8k_setup(struct dsa_switch *ds) if (ret < 0) return ret; - /* Enable CPU Port */ + /* Enable CPU Port, force it to maximum bandwidth and full-duplex */ + mask = QCA8K_PORT_STATUS_SPEED_1000 | QCA8K_PORT_STATUS_TXFLOW | + QCA8K_PORT_STATUS_RXFLOW | QCA8K_PORT_STATUS_DUPLEX; + qca8k_write(priv, QCA8K_REG_PORT_STATUS(QCA8K_CPU_PORT), mask); qca8k_reg_set(priv, QCA8K_REG_GLOBAL_FW_CTRL0, QCA8K_GLOBAL_FW_CTRL0_CPU_PORT_EN); qca8k_port_set_status(priv, QCA8K_CPU_PORT, 1); diff --git a/drivers/net/dsa/qca8k.h b/drivers/net/dsa/qca8k.h index 201464719531..2f85f7a0fe83 100644 --- a/drivers/net/dsa/qca8k.h +++ b/drivers/net/dsa/qca8k.h @@ -51,8 +51,10 @@ #define QCA8K_GOL_MAC_ADDR0 0x60 #define QCA8K_GOL_MAC_ADDR1 0x64 #define QCA8K_REG_PORT_STATUS(_i) (0x07c + (_i) * 4) -#define QCA8K_PORT_STATUS_SPEED GENMASK(2, 0) -#define QCA8K_PORT_STATUS_SPEED_S 0 +#define QCA8K_PORT_STATUS_SPEED GENMASK(1, 0) +#define QCA8K_PORT_STATUS_SPEED_10 0 +#define QCA8K_PORT_STATUS_SPEED_100 0x1 +#define QCA8K_PORT_STATUS_SPEED_1000 0x2 #define QCA8K_PORT_STATUS_TXMAC BIT(2) #define QCA8K_PORT_STATUS_RXMAC BIT(3) #define QCA8K_PORT_STATUS_TXFLOW BIT(4) From b429bf7de4944f73a489f97bb33bcf78a6935ac2 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20Vok=C3=A1=C4=8D?= Date: Wed, 23 May 2018 08:20:20 +0200 Subject: [PATCH 456/970] net: dsa: qca8k: Enable RXMAC when bringing up a port MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit eee1fe64765c562d8bcaf95e5631a8ea2f760f34 upstream. When a port is brought up/down do not enable/disable only the TXMAC but the RXMAC as well. This is essential for the CPU port to work. Fixes: 6b93fb46480a ("net-next: dsa: add new driver for qca8xxx family") Signed-off-by: Michal Vokáč Reviewed-by: Andrew Lunn Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/dsa/qca8k.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/dsa/qca8k.c b/drivers/net/dsa/qca8k.c index b3f6f4f48952..dbd702235190 100644 --- a/drivers/net/dsa/qca8k.c +++ b/drivers/net/dsa/qca8k.c @@ -474,7 +474,7 @@ qca8k_set_pad_ctrl(struct qca8k_priv *priv, int port, int mode) static void qca8k_port_set_status(struct qca8k_priv *priv, int port, int enable) { - u32 mask = QCA8K_PORT_STATUS_TXMAC; + u32 mask = QCA8K_PORT_STATUS_TXMAC | QCA8K_PORT_STATUS_RXMAC; /* Port 0 and 6 have no internal PHY */ if ((port > 0) && (port < 6)) From 53a1a29a9236978511fb8c8eaad494ec01ba0732 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20Vok=C3=A1=C4=8D?= Date: Wed, 23 May 2018 08:20:18 +0200 Subject: [PATCH 457/970] net: dsa: qca8k: Add QCA8334 binding documentation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 218bbea11a777c156eb7bcbdc72867b32ae10985 upstream. Add support for the four-port variant of the Qualcomm QCA833x switch. The CPU port default link settings can be reconfigured using a fixed-link sub-node. Signed-off-by: Michal Vokáč Reviewed-by: Rob Herring Reviewed-by: Andrew Lunn Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- .../devicetree/bindings/net/dsa/qca8k.txt | 23 ++++++++++++++++++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/net/dsa/qca8k.txt b/Documentation/devicetree/bindings/net/dsa/qca8k.txt index 9c67ee4890d7..bbcb255c3150 100644 --- a/Documentation/devicetree/bindings/net/dsa/qca8k.txt +++ b/Documentation/devicetree/bindings/net/dsa/qca8k.txt @@ -2,7 +2,10 @@ Required properties: -- compatible: should be "qca,qca8337" +- compatible: should be one of: + "qca,qca8334" + "qca,qca8337" + - #size-cells: must be 0 - #address-cells: must be 1 @@ -14,6 +17,20 @@ port and PHY id, each subnode describing a port needs to have a valid phandle referencing the internal PHY connected to it. The CPU port of this switch is always port 0. +A CPU port node has the following optional node: + +- fixed-link : Fixed-link subnode describing a link to a non-MDIO + managed entity. See + Documentation/devicetree/bindings/net/fixed-link.txt + for details. + +For QCA8K the 'fixed-link' sub-node supports only the following properties: + +- 'speed' (integer, mandatory), to indicate the link speed. Accepted + values are 10, 100 and 1000 +- 'full-duplex' (boolean, optional), to indicate that full duplex is + used. When absent, half duplex is assumed. + Example: @@ -53,6 +70,10 @@ Example: label = "cpu"; ethernet = <&gmac1>; phy-mode = "rgmii"; + fixed-link { + speed = 1000; + full-duplex; + }; }; port@1 { From db890d30b9750f929c0e976972a4d424ac90f8cb Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20Vok=C3=A1=C4=8D?= Date: Wed, 23 May 2018 08:20:22 +0200 Subject: [PATCH 458/970] net: dsa: qca8k: Allow overwriting CPU port setting MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 9bb2289f90e671bdb78e306974187424ac19ff8e upstream. Implement adjust_link function that allows to overwrite default CPU port setting using fixed-link device tree subnode. Signed-off-by: Michal Vokáč Reviewed-by: Andrew Lunn Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/dsa/qca8k.c | 43 +++++++++++++++++++++++++++++++++++++++++ drivers/net/dsa/qca8k.h | 1 + 2 files changed, 44 insertions(+) diff --git a/drivers/net/dsa/qca8k.c b/drivers/net/dsa/qca8k.c index dbd702235190..7f64a76acd37 100644 --- a/drivers/net/dsa/qca8k.c +++ b/drivers/net/dsa/qca8k.c @@ -589,6 +589,47 @@ qca8k_setup(struct dsa_switch *ds) return 0; } +static void +qca8k_adjust_link(struct dsa_switch *ds, int port, struct phy_device *phy) +{ + struct qca8k_priv *priv = ds->priv; + u32 reg; + + /* Force fixed-link setting for CPU port, skip others. */ + if (!phy_is_pseudo_fixed_link(phy)) + return; + + /* Set port speed */ + switch (phy->speed) { + case 10: + reg = QCA8K_PORT_STATUS_SPEED_10; + break; + case 100: + reg = QCA8K_PORT_STATUS_SPEED_100; + break; + case 1000: + reg = QCA8K_PORT_STATUS_SPEED_1000; + break; + default: + dev_dbg(priv->dev, "port%d link speed %dMbps not supported.\n", + port, phy->speed); + return; + } + + /* Set duplex mode */ + if (phy->duplex == DUPLEX_FULL) + reg |= QCA8K_PORT_STATUS_DUPLEX; + + /* Force flow control */ + if (dsa_is_cpu_port(ds, port)) + reg |= QCA8K_PORT_STATUS_RXFLOW | QCA8K_PORT_STATUS_TXFLOW; + + /* Force link down before changing MAC options */ + qca8k_port_set_status(priv, port, 0); + qca8k_write(priv, QCA8K_REG_PORT_STATUS(port), reg); + qca8k_port_set_status(priv, port, 1); +} + static int qca8k_phy_read(struct dsa_switch *ds, int phy, int regnum) { @@ -918,6 +959,7 @@ qca8k_get_tag_protocol(struct dsa_switch *ds) static struct dsa_switch_ops qca8k_switch_ops = { .get_tag_protocol = qca8k_get_tag_protocol, .setup = qca8k_setup, + .adjust_link = qca8k_adjust_link, .get_strings = qca8k_get_strings, .phy_read = qca8k_phy_read, .phy_write = qca8k_phy_write, @@ -950,6 +992,7 @@ qca8k_sw_probe(struct mdio_device *mdiodev) return -ENOMEM; priv->bus = mdiodev->bus; + priv->dev = &mdiodev->dev; /* read the switches ID register */ id = qca8k_read(priv, QCA8K_REG_MASK_CTRL); diff --git a/drivers/net/dsa/qca8k.h b/drivers/net/dsa/qca8k.h index 2f85f7a0fe83..9c22bc3210cd 100644 --- a/drivers/net/dsa/qca8k.h +++ b/drivers/net/dsa/qca8k.h @@ -169,6 +169,7 @@ struct qca8k_priv { struct ar8xxx_port_status port_sts[QCA8K_NUM_PORTS]; struct dsa_switch *ds; struct mutex reg_mutex; + struct device *dev; }; struct qca8k_mib_desc { From ddd28fff50ddc10604e420ca30fe11affbc17567 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Fri, 3 Aug 2018 07:55:27 +0200 Subject: [PATCH 459/970] Linux 4.9.117 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index a6b011778960..773c26c95d98 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 4 PATCHLEVEL = 9 -SUBLEVEL = 116 +SUBLEVEL = 117 EXTRAVERSION = NAME = Roaring Lionus From e364f1a2ccc1c8d5dd0f43063c81507076c854a5 Mon Sep 17 00:00:00 2001 From: Lorenzo Bianconi Date: Fri, 27 Jul 2018 18:15:46 +0200 Subject: [PATCH 460/970] ipv4: remove BUG_ON() from fib_compute_spec_dst [ Upstream commit 9fc12023d6f51551d6ca9ed7e02ecc19d79caf17 ] Remove BUG_ON() from fib_compute_spec_dst routine and check in_dev pointer during flowi4 data structure initialization. fib_compute_spec_dst routine can be run concurrently with device removal where ip_ptr net_device pointer is set to NULL. This can happen if userspace enables pkt info on UDP rx socket and the device is removed while traffic is flowing Fixes: 35ebf65e851c ("ipv4: Create and use fib_compute_spec_dst() helper") Signed-off-by: Lorenzo Bianconi Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/fib_frontend.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c index 7bdd89354db5..6a2ef162088d 100644 --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -282,19 +282,19 @@ __be32 fib_compute_spec_dst(struct sk_buff *skb) return ip_hdr(skb)->daddr; in_dev = __in_dev_get_rcu(dev); - BUG_ON(!in_dev); net = dev_net(dev); scope = RT_SCOPE_UNIVERSE; if (!ipv4_is_zeronet(ip_hdr(skb)->saddr)) { + bool vmark = in_dev && IN_DEV_SRC_VMARK(in_dev); struct flowi4 fl4 = { .flowi4_iif = LOOPBACK_IFINDEX, .flowi4_oif = l3mdev_master_ifindex_rcu(dev), .daddr = ip_hdr(skb)->saddr, .flowi4_tos = RT_TOS(ip_hdr(skb)->tos), .flowi4_scope = scope, - .flowi4_mark = IN_DEV_SRC_VMARK(in_dev) ? skb->mark : 0, + .flowi4_mark = vmark ? skb->mark : 0, }; if (!fib_lookup(net, &fl4, &res, 0)) return FIB_RES_PREFSRC(net, res); From 6fff429df7c51b82d0b143c44a59ac38702a06a8 Mon Sep 17 00:00:00 2001 From: Gal Pressman Date: Thu, 26 Jul 2018 23:40:33 +0300 Subject: [PATCH 461/970] net: ena: Fix use of uninitialized DMA address bits field [ Upstream commit 101f0cd4f2216d32f1b8a75a2154cf3997484ee2 ] UBSAN triggers the following undefined behaviour warnings: [...] [ 13.236124] UBSAN: Undefined behaviour in drivers/net/ethernet/amazon/ena/ena_eth_com.c:468:22 [ 13.240043] shift exponent 64 is too large for 64-bit type 'long long unsigned int' [...] [ 13.744769] UBSAN: Undefined behaviour in drivers/net/ethernet/amazon/ena/ena_eth_com.c:373:4 [ 13.748694] shift exponent 64 is too large for 64-bit type 'long long unsigned int' [...] When splitting the address to high and low, GENMASK_ULL is used to generate a bitmask with dma_addr_bits field from io_sq (in ena_com_prepare_tx and ena_com_add_single_rx_desc). The problem is that dma_addr_bits is not initialized with a proper value (besides being cleared in ena_com_create_io_queue). Assign dma_addr_bits the correct value that is stored in ena_dev when initializing the SQ. Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Gal Pressman Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/amazon/ena/ena_com.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c index e13c9cd45dc0..bcd993140f84 100644 --- a/drivers/net/ethernet/amazon/ena/ena_com.c +++ b/drivers/net/ethernet/amazon/ena/ena_com.c @@ -331,6 +331,7 @@ static int ena_com_init_io_sq(struct ena_com_dev *ena_dev, memset(&io_sq->desc_addr, 0x0, sizeof(struct ena_com_io_desc_addr)); + io_sq->dma_addr_bits = ena_dev->dma_addr_bits; io_sq->desc_entry_size = (io_sq->direction == ENA_COM_IO_QUEUE_DIRECTION_TX) ? sizeof(struct ena_eth_io_tx_desc) : From 31a9d4dd852843af8b815b2dc621c0142fa40f56 Mon Sep 17 00:00:00 2001 From: tangpengpeng Date: Thu, 26 Jul 2018 14:45:16 +0800 Subject: [PATCH 462/970] net: fix amd-xgbe flow-control issue [ Upstream commit 7f3fc7ddf719cd6faaf787722c511f6918ac6aab ] If we enable or disable xgbe flow-control by ethtool , it does't work.Because the parameter is not properly assigned,so we need to adjust the assignment order of the parameters. Fixes: c1ce2f77366b ("amd-xgbe: Fix flow control setting logic") Signed-off-by: tangpengpeng Acked-by: Tom Lendacky Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/amd/xgbe/xgbe-mdio.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c b/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c index 84c5d296d13e..684835833fe3 100644 --- a/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c +++ b/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c @@ -877,14 +877,14 @@ static void xgbe_phy_adjust_link(struct xgbe_prv_data *pdata) if (pdata->tx_pause != pdata->phy.tx_pause) { new_state = 1; - pdata->hw_if.config_tx_flow_control(pdata); pdata->tx_pause = pdata->phy.tx_pause; + pdata->hw_if.config_tx_flow_control(pdata); } if (pdata->rx_pause != pdata->phy.rx_pause) { new_state = 1; - pdata->hw_if.config_rx_flow_control(pdata); pdata->rx_pause = pdata->phy.rx_pause; + pdata->hw_if.config_rx_flow_control(pdata); } /* Speed support */ From a9deaa19715d5c2b355fff98c4d54049d337311d Mon Sep 17 00:00:00 2001 From: Stefan Wahren Date: Sat, 28 Jul 2018 09:52:10 +0200 Subject: [PATCH 463/970] net: lan78xx: fix rx handling before first packet is send [ Upstream commit 136f55f660192ce04af091642efc75d85e017364 ] As long the bh tasklet isn't scheduled once, no packet from the rx path will be handled. Since the tx path also schedule the same tasklet this situation only persits until the first packet transmission. So fix this issue by scheduling the tasklet after link reset. Link: https://github.com/raspberrypi/linux/issues/2617 Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet") Suggested-by: Floris Bos Signed-off-by: Stefan Wahren Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/usb/lan78xx.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c index 5e0626c80b81..c5e04d1ad73a 100644 --- a/drivers/net/usb/lan78xx.c +++ b/drivers/net/usb/lan78xx.c @@ -1170,6 +1170,8 @@ static int lan78xx_link_reset(struct lan78xx_net *dev) mod_timer(&dev->stat_monitor, jiffies + STAT_UPDATE_TIMER); } + + tasklet_schedule(&dev->bh); } return ret; From 32363930dfd2eb8449d062cb590918021f8d7502 Mon Sep 17 00:00:00 2001 From: Anton Vasilyev Date: Fri, 27 Jul 2018 18:57:47 +0300 Subject: [PATCH 464/970] net: mdio-mux: bcm-iproc: fix wrong getter and setter pair [ Upstream commit b0753408aadf32c7ece9e6b765017881e54af833 ] mdio_mux_iproc_probe() uses platform_set_drvdata() to store md pointer in device, whereas mdio_mux_iproc_remove() restores md pointer by dev_get_platdata(&pdev->dev). This leads to wrong resources release. The patch replaces getter to platform_get_drvdata. Fixes: 98bc865a1ec8 ("net: mdio-mux: Add MDIO mux driver for iProc SoCs") Signed-off-by: Anton Vasilyev Reviewed-by: Andrew Lunn Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/phy/mdio-mux-bcm-iproc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/phy/mdio-mux-bcm-iproc.c b/drivers/net/phy/mdio-mux-bcm-iproc.c index 0a5f62e0efcc..487bf5b8f545 100644 --- a/drivers/net/phy/mdio-mux-bcm-iproc.c +++ b/drivers/net/phy/mdio-mux-bcm-iproc.c @@ -218,7 +218,7 @@ static int mdio_mux_iproc_probe(struct platform_device *pdev) static int mdio_mux_iproc_remove(struct platform_device *pdev) { - struct iproc_mdiomux_desc *md = dev_get_platdata(&pdev->dev); + struct iproc_mdiomux_desc *md = platform_get_drvdata(pdev); mdio_mux_uninit(md->mux_handle); mdiobus_unregister(md->mii_bus); From f6488f40a8e40fa464a9433d5990c8ebfccdd72b Mon Sep 17 00:00:00 2001 From: Eugeniy Paltsev Date: Thu, 26 Jul 2018 15:05:37 +0300 Subject: [PATCH 465/970] NET: stmmac: align DMA stuff to largest cache line length [ Upstream commit 9939a46d90c6c76f4533d534dbadfa7b39dc6acc ] As for today STMMAC_ALIGN macro (which is used to align DMA stuff) relies on L1 line length (L1_CACHE_BYTES). This isn't correct in case of system with several cache levels which might have L1 cache line length smaller than L2 line. This can lead to sharing one cache line between DMA buffer and other data, so we can lose this data while invalidate DMA buffer before DMA transaction. Fix that by using SMP_CACHE_BYTES instead of L1_CACHE_BYTES for aligning. Signed-off-by: Eugeniy Paltsev Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index b3bc1287b2a7..0df71865fab1 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -55,7 +55,7 @@ #include #include "dwmac1000.h" -#define STMMAC_ALIGN(x) L1_CACHE_ALIGN(x) +#define STMMAC_ALIGN(x) __ALIGN_KERNEL(x, SMP_CACHE_BYTES) #define TSO_MAX_BUFF_SIZE (SZ_16K - 1) /* Module parameters */ From b3e349fd557f1fd9652d535e47b96da049cdbf2a Mon Sep 17 00:00:00 2001 From: Neal Cardwell Date: Fri, 27 Jul 2018 17:19:12 -0400 Subject: [PATCH 466/970] tcp_bbr: fix bw probing to raise in-flight data for very small BDPs [ Upstream commit 383d470936c05554219094a4d364d964cb324827 ] For some very small BDPs (with just a few packets) there was a quantization effect where the target number of packets in flight during the super-unity-gain (1.25x) phase of gain cycling was implicitly truncated to a number of packets no larger than the normal unity-gain (1.0x) phase of gain cycling. This meant that in multi-flow scenarios some flows could get stuck with a lower bandwidth, because they did not push enough packets inflight to discover that there was more bandwidth available. This was really only an issue in multi-flow LAN scenarios, where RTTs and BDPs are low enough for this to be an issue. This fix ensures that gain cycling can raise inflight for small BDPs by ensuring that in PROBE_BW mode target inflight values with a super-unity gain are always greater than inflight values with a gain <= 1. Importantly, this applies whether the inflight value is calculated for use as a cwnd value, or as a target inflight value for the end of the super-unity phase in bbr_is_next_cycle_phase() (both need to be bigger to ensure we can probe with more packets in flight reliably). This is a candidate fix for stable releases. Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control") Signed-off-by: Neal Cardwell Acked-by: Yuchung Cheng Acked-by: Soheil Hassas Yeganeh Acked-by: Priyaranjan Jha Reviewed-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/tcp_bbr.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/net/ipv4/tcp_bbr.c b/net/ipv4/tcp_bbr.c index 9169859506b7..7e44d23b0328 100644 --- a/net/ipv4/tcp_bbr.c +++ b/net/ipv4/tcp_bbr.c @@ -324,6 +324,10 @@ static u32 bbr_target_cwnd(struct sock *sk, u32 bw, int gain) /* Reduce delayed ACKs by rounding up cwnd to the next even number. */ cwnd = (cwnd + 1) & ~1U; + /* Ensure gain cycling gets inflight above BDP even for small BDPs. */ + if (bbr->mode == BBR_PROBE_BW && gain > BBR_UNIT) + cwnd += 2; + return cwnd; } From b03ca669d5c152c26781ec821429564d98d5bd79 Mon Sep 17 00:00:00 2001 From: Xiao Liang Date: Fri, 27 Jul 2018 17:56:08 +0800 Subject: [PATCH 467/970] xen-netfront: wait xenbus state change when load module manually [ Upstream commit 822fb18a82abaf4ee7058793d95d340f5dab7bfc ] When loading module manually, after call xenbus_switch_state to initializes the state of the netfront device, the driver state did not change so fast that may lead no dev created in latest kernel. This patch adds wait to make sure xenbus knows the driver is not in closed/unknown state. Current state: [vm]# ethtool eth0 Settings for eth0: Link detected: yes [vm]# modprobe -r xen_netfront [vm]# modprobe xen_netfront [vm]# ethtool eth0 Settings for eth0: Cannot get device settings: No such device Cannot get wake-on-lan settings: No such device Cannot get message level: No such device Cannot get link status: No such device No data available With the patch installed. [vm]# ethtool eth0 Settings for eth0: Link detected: yes [vm]# modprobe -r xen_netfront [vm]# modprobe xen_netfront [vm]# ethtool eth0 Settings for eth0: Link detected: yes Signed-off-by: Xiao Liang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/xen-netfront.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c index a5908e4c06cb..681256f97cb3 100644 --- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -86,6 +86,7 @@ struct netfront_cb { /* IRQ name is queue name with "-tx" or "-rx" appended */ #define IRQ_NAME_SIZE (QUEUE_NAME_SIZE + 3) +static DECLARE_WAIT_QUEUE_HEAD(module_load_q); static DECLARE_WAIT_QUEUE_HEAD(module_unload_q); struct netfront_stats { @@ -1349,6 +1350,11 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev) netif_carrier_off(netdev); xenbus_switch_state(dev, XenbusStateInitialising); + wait_event(module_load_q, + xenbus_read_driver_state(dev->otherend) != + XenbusStateClosed && + xenbus_read_driver_state(dev->otherend) != + XenbusStateUnknown); return netdev; exit: From 8ca41e4efcfe241cad0fffe22e558399c0eabbd2 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 17 May 2018 14:47:25 -0700 Subject: [PATCH 468/970] tcp: do not force quickack when receiving out-of-order packets [ Upstream commit a3893637e1eb0ef5eb1bbc52b3a8d2dfa317a35d ] As explained in commit 9f9843a751d0 ("tcp: properly handle stretch acks in slow start"), TCP stacks have to consider how many packets are acknowledged in one single ACK, because of GRO, but also because of ACK compression or losses. We plan to add SACK compression in the following patch, we must therefore not call tcp_enter_quickack_mode() Signed-off-by: Eric Dumazet Acked-by: Neal Cardwell Acked-by: Soheil Hassas Yeganeh Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/tcp_input.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 44d136fd2af5..9ea2167d5bdc 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4745,8 +4745,6 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb) if (!before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt + tcp_receive_window(tp))) goto out_of_window; - tcp_enter_quickack_mode(sk); - if (before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt)) { /* Partial packet, seq < rcv_next < end_seq */ SOCK_DEBUG(sk, "partial packet: rcv_next %X seq %X - %X\n", From 90cf17d66500d9d30a144449b78ada0d2ff2b9d1 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 21 May 2018 15:08:56 -0700 Subject: [PATCH 469/970] tcp: add max_quickacks param to tcp_incr_quickack and tcp_enter_quickack_mode [ Upstream commit 9a9c9b51e54618861420093ae6e9b50a961914c5 ] We want to add finer control of the number of ACK packets sent after ECN events. This patch is not changing current behavior, it only enables following change. Signed-off-by: Eric Dumazet Acked-by: Soheil Hassas Yeganeh Acked-by: Neal Cardwell Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/net/tcp.h | 2 +- net/ipv4/tcp_dctcp.c | 4 ++-- net/ipv4/tcp_input.c | 24 +++++++++++++----------- 3 files changed, 16 insertions(+), 14 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index 5d440bb0e409..97d210535cdd 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -363,7 +363,7 @@ ssize_t tcp_splice_read(struct socket *sk, loff_t *ppos, struct pipe_inode_info *pipe, size_t len, unsigned int flags); -void tcp_enter_quickack_mode(struct sock *sk); +void tcp_enter_quickack_mode(struct sock *sk, unsigned int max_quickacks); static inline void tcp_dec_quickack_mode(struct sock *sk, const unsigned int pkts) { diff --git a/net/ipv4/tcp_dctcp.c b/net/ipv4/tcp_dctcp.c index dd52ccb812ea..8905a0aec8ee 100644 --- a/net/ipv4/tcp_dctcp.c +++ b/net/ipv4/tcp_dctcp.c @@ -138,7 +138,7 @@ static void dctcp_ce_state_0_to_1(struct sock *sk) */ if (inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) __tcp_send_ack(sk, ca->prior_rcv_nxt); - tcp_enter_quickack_mode(sk); + tcp_enter_quickack_mode(sk, 1); } ca->prior_rcv_nxt = tp->rcv_nxt; @@ -159,7 +159,7 @@ static void dctcp_ce_state_1_to_0(struct sock *sk) */ if (inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) __tcp_send_ack(sk, ca->prior_rcv_nxt); - tcp_enter_quickack_mode(sk); + tcp_enter_quickack_mode(sk, 1); } ca->prior_rcv_nxt = tp->rcv_nxt; diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 9ea2167d5bdc..6cad57eda33b 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -198,21 +198,23 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb) } } -static void tcp_incr_quickack(struct sock *sk) +static void tcp_incr_quickack(struct sock *sk, unsigned int max_quickacks) { struct inet_connection_sock *icsk = inet_csk(sk); unsigned int quickacks = tcp_sk(sk)->rcv_wnd / (2 * icsk->icsk_ack.rcv_mss); if (quickacks == 0) quickacks = 2; + quickacks = min(quickacks, max_quickacks); if (quickacks > icsk->icsk_ack.quick) - icsk->icsk_ack.quick = min(quickacks, TCP_MAX_QUICKACKS); + icsk->icsk_ack.quick = quickacks; } -void tcp_enter_quickack_mode(struct sock *sk) +void tcp_enter_quickack_mode(struct sock *sk, unsigned int max_quickacks) { struct inet_connection_sock *icsk = inet_csk(sk); - tcp_incr_quickack(sk); + + tcp_incr_quickack(sk, max_quickacks); icsk->icsk_ack.pingpong = 0; icsk->icsk_ack.ato = TCP_ATO_MIN; } @@ -257,7 +259,7 @@ static void __tcp_ecn_check_ce(struct tcp_sock *tp, const struct sk_buff *skb) * it is probably a retransmit. */ if (tp->ecn_flags & TCP_ECN_SEEN) - tcp_enter_quickack_mode((struct sock *)tp); + tcp_enter_quickack_mode((struct sock *)tp, TCP_MAX_QUICKACKS); break; case INET_ECN_CE: if (tcp_ca_needs_ecn((struct sock *)tp)) @@ -265,7 +267,7 @@ static void __tcp_ecn_check_ce(struct tcp_sock *tp, const struct sk_buff *skb) if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR)) { /* Better not delay acks, sender can have a very low cwnd */ - tcp_enter_quickack_mode((struct sock *)tp); + tcp_enter_quickack_mode((struct sock *)tp, TCP_MAX_QUICKACKS); tp->ecn_flags |= TCP_ECN_DEMAND_CWR; } tp->ecn_flags |= TCP_ECN_SEEN; @@ -675,7 +677,7 @@ static void tcp_event_data_recv(struct sock *sk, struct sk_buff *skb) /* The _first_ data packet received, initialize * delayed ACK engine. */ - tcp_incr_quickack(sk); + tcp_incr_quickack(sk, TCP_MAX_QUICKACKS); icsk->icsk_ack.ato = TCP_ATO_MIN; } else { int m = now - icsk->icsk_ack.lrcvtime; @@ -691,7 +693,7 @@ static void tcp_event_data_recv(struct sock *sk, struct sk_buff *skb) /* Too long gap. Apparently sender failed to * restart window, so that we send ACKs quickly. */ - tcp_incr_quickack(sk); + tcp_incr_quickack(sk, TCP_MAX_QUICKACKS); sk_mem_reclaim(sk); } } @@ -4210,7 +4212,7 @@ static void tcp_send_dupack(struct sock *sk, const struct sk_buff *skb) if (TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(skb)->seq && before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt)) { NET_INC_STATS(sock_net(sk), LINUX_MIB_DELAYEDACKLOST); - tcp_enter_quickack_mode(sk); + tcp_enter_quickack_mode(sk, TCP_MAX_QUICKACKS); if (tcp_is_sack(tp) && sysctl_tcp_dsack) { u32 end_seq = TCP_SKB_CB(skb)->end_seq; @@ -4734,7 +4736,7 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb) tcp_dsack_set(sk, TCP_SKB_CB(skb)->seq, TCP_SKB_CB(skb)->end_seq); out_of_window: - tcp_enter_quickack_mode(sk); + tcp_enter_quickack_mode(sk, TCP_MAX_QUICKACKS); inet_csk_schedule_ack(sk); drop: tcp_drop(sk, skb); @@ -5828,7 +5830,7 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb, * to stand against the temptation 8) --ANK */ inet_csk_schedule_ack(sk); - tcp_enter_quickack_mode(sk); + tcp_enter_quickack_mode(sk, TCP_MAX_QUICKACKS); inet_csk_reset_xmit_timer(sk, ICSK_TIME_DACK, TCP_DELACK_MAX, TCP_RTO_MAX); From 65d986cb5e14b686988258f8aa3784df7a3c969b Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 21 May 2018 15:08:57 -0700 Subject: [PATCH 470/970] tcp: do not aggressively quick ack after ECN events [ Upstream commit 522040ea5fdd1c33bbf75e1d7c7c0422b96a94ef ] ECN signals currently forces TCP to enter quickack mode for up to 16 (TCP_MAX_QUICKACKS) following incoming packets. We believe this is not needed, and only sending one immediate ack for the current packet should be enough. This should reduce the extra load noticed in DCTCP environments, after congestion events. This is part 2 of our effort to reduce pure ACK packets. Signed-off-by: Eric Dumazet Acked-by: Soheil Hassas Yeganeh Acked-by: Yuchung Cheng Acked-by: Neal Cardwell Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/tcp_input.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 6cad57eda33b..4f39acf62e9d 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -259,7 +259,7 @@ static void __tcp_ecn_check_ce(struct tcp_sock *tp, const struct sk_buff *skb) * it is probably a retransmit. */ if (tp->ecn_flags & TCP_ECN_SEEN) - tcp_enter_quickack_mode((struct sock *)tp, TCP_MAX_QUICKACKS); + tcp_enter_quickack_mode((struct sock *)tp, 1); break; case INET_ECN_CE: if (tcp_ca_needs_ecn((struct sock *)tp)) @@ -267,7 +267,7 @@ static void __tcp_ecn_check_ce(struct tcp_sock *tp, const struct sk_buff *skb) if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR)) { /* Better not delay acks, sender can have a very low cwnd */ - tcp_enter_quickack_mode((struct sock *)tp, TCP_MAX_QUICKACKS); + tcp_enter_quickack_mode((struct sock *)tp, 1); tp->ecn_flags |= TCP_ECN_DEMAND_CWR; } tp->ecn_flags |= TCP_ECN_SEEN; From 095ab5f46c0e084804c86406e45de3b01ac46058 Mon Sep 17 00:00:00 2001 From: Yousuk Seung Date: Mon, 4 Jun 2018 15:29:51 -0700 Subject: [PATCH 471/970] tcp: refactor tcp_ecn_check_ce to remove sk type cast [ Upstream commit f4c9f85f3b2cb7669830cd04d0be61192a4d2436 ] Refactor tcp_ecn_check_ce and __tcp_ecn_check_ce to accept struct sock* instead of tcp_sock* to clean up type casts. This is a pure refactor patch. Signed-off-by: Yousuk Seung Signed-off-by: Neal Cardwell Signed-off-by: Yuchung Cheng Signed-off-by: Eric Dumazet Acked-by: Soheil Hassas Yeganeh Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/tcp_input.c | 26 ++++++++++++++------------ 1 file changed, 14 insertions(+), 12 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 4f39acf62e9d..ee6602dfaea3 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -250,8 +250,10 @@ static void tcp_ecn_withdraw_cwr(struct tcp_sock *tp) tp->ecn_flags &= ~TCP_ECN_DEMAND_CWR; } -static void __tcp_ecn_check_ce(struct tcp_sock *tp, const struct sk_buff *skb) +static void __tcp_ecn_check_ce(struct sock *sk, const struct sk_buff *skb) { + struct tcp_sock *tp = tcp_sk(sk); + switch (TCP_SKB_CB(skb)->ip_dsfield & INET_ECN_MASK) { case INET_ECN_NOT_ECT: /* Funny extension: if ECT is not set on a segment, @@ -259,31 +261,31 @@ static void __tcp_ecn_check_ce(struct tcp_sock *tp, const struct sk_buff *skb) * it is probably a retransmit. */ if (tp->ecn_flags & TCP_ECN_SEEN) - tcp_enter_quickack_mode((struct sock *)tp, 1); + tcp_enter_quickack_mode(sk, 1); break; case INET_ECN_CE: - if (tcp_ca_needs_ecn((struct sock *)tp)) - tcp_ca_event((struct sock *)tp, CA_EVENT_ECN_IS_CE); + if (tcp_ca_needs_ecn(sk)) + tcp_ca_event(sk, CA_EVENT_ECN_IS_CE); if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR)) { /* Better not delay acks, sender can have a very low cwnd */ - tcp_enter_quickack_mode((struct sock *)tp, 1); + tcp_enter_quickack_mode(sk, 1); tp->ecn_flags |= TCP_ECN_DEMAND_CWR; } tp->ecn_flags |= TCP_ECN_SEEN; break; default: - if (tcp_ca_needs_ecn((struct sock *)tp)) - tcp_ca_event((struct sock *)tp, CA_EVENT_ECN_NO_CE); + if (tcp_ca_needs_ecn(sk)) + tcp_ca_event(sk, CA_EVENT_ECN_NO_CE); tp->ecn_flags |= TCP_ECN_SEEN; break; } } -static void tcp_ecn_check_ce(struct tcp_sock *tp, const struct sk_buff *skb) +static void tcp_ecn_check_ce(struct sock *sk, const struct sk_buff *skb) { - if (tp->ecn_flags & TCP_ECN_OK) - __tcp_ecn_check_ce(tp, skb); + if (tcp_sk(sk)->ecn_flags & TCP_ECN_OK) + __tcp_ecn_check_ce(sk, skb); } static void tcp_ecn_rcv_synack(struct tcp_sock *tp, const struct tcphdr *th) @@ -699,7 +701,7 @@ static void tcp_event_data_recv(struct sock *sk, struct sk_buff *skb) } icsk->icsk_ack.lrcvtime = now; - tcp_ecn_check_ce(tp, skb); + tcp_ecn_check_ce(sk, skb); if (skb->len >= 128) tcp_grow_window(sk, skb); @@ -4456,7 +4458,7 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb) u32 seq, end_seq; bool fragstolen; - tcp_ecn_check_ce(tp, skb); + tcp_ecn_check_ce(sk, skb); if (unlikely(tcp_try_rmem_schedule(sk, skb, skb->truesize))) { NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPOFODROP); From 019ea5193fe4f9668671aa400a42b9305ad748c9 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 27 Jun 2018 08:47:21 -0700 Subject: [PATCH 472/970] tcp: add one more quick ack after after ECN events [ Upstream commit 15ecbe94a45ef88491ca459b26efdd02f91edb6d ] Larry Brakmo proposal ( https://patchwork.ozlabs.org/patch/935233/ tcp: force cwnd at least 2 in tcp_cwnd_reduction) made us rethink about our recent patch removing ~16 quick acks after ECN events. tcp_enter_quickack_mode(sk, 1) makes sure one immediate ack is sent, but in the case the sender cwnd was lowered to 1, we do not want to have a delayed ack for the next packet we will receive. Fixes: 522040ea5fdd ("tcp: do not aggressively quick ack after ECN events") Signed-off-by: Eric Dumazet Reported-by: Neal Cardwell Cc: Lawrence Brakmo Acked-by: Neal Cardwell Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/tcp_input.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index ee6602dfaea3..a9be8df108b4 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -261,7 +261,7 @@ static void __tcp_ecn_check_ce(struct sock *sk, const struct sk_buff *skb) * it is probably a retransmit. */ if (tp->ecn_flags & TCP_ECN_SEEN) - tcp_enter_quickack_mode(sk, 1); + tcp_enter_quickack_mode(sk, 2); break; case INET_ECN_CE: if (tcp_ca_needs_ecn(sk)) @@ -269,7 +269,7 @@ static void __tcp_ecn_check_ce(struct sock *sk, const struct sk_buff *skb) if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR)) { /* Better not delay acks, sender can have a very low cwnd */ - tcp_enter_quickack_mode(sk, 1); + tcp_enter_quickack_mode(sk, 2); tp->ecn_flags |= TCP_ECN_DEMAND_CWR; } tp->ecn_flags |= TCP_ECN_SEEN; From d4c9c7c1eef0074bc9d34203da6e5089e7c647e0 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Thu, 24 Aug 2017 11:19:33 +0300 Subject: [PATCH 473/970] pinctrl: intel: Read back TX buffer state commit d68b42e30bbacd24354d644f430d088435b15e83 upstream. In the same way as it's done in pinctrl-cherryview.c we would provide a readback TX buffer state. Fixes: 17fab473693 ("pinctrl: intel: Set pin direction properly") Reported-by: "Bourque, Francis" Signed-off-by: Andy Shevchenko Acked-by: Mika Westerberg Tested-by: "Bourque, Francis" Signed-off-by: Linus Walleij Cc: Anthony de Boer Signed-off-by: Greg Kroah-Hartman --- drivers/pinctrl/intel/pinctrl-intel.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/pinctrl/intel/pinctrl-intel.c b/drivers/pinctrl/intel/pinctrl-intel.c index b40a074822cf..15aeeb2159cc 100644 --- a/drivers/pinctrl/intel/pinctrl-intel.c +++ b/drivers/pinctrl/intel/pinctrl-intel.c @@ -604,12 +604,17 @@ static int intel_gpio_get(struct gpio_chip *chip, unsigned offset) { struct intel_pinctrl *pctrl = gpiochip_get_data(chip); void __iomem *reg; + u32 padcfg0; reg = intel_get_padcfg(pctrl, offset, PADCFG0); if (!reg) return -EINVAL; - return !!(readl(reg) & PADCFG0_GPIORXSTATE); + padcfg0 = readl(reg); + if (!(padcfg0 & PADCFG0_GPIOTXDIS)) + return !!(padcfg0 & PADCFG0_GPIOTXSTATE); + + return !!(padcfg0 & PADCFG0_GPIORXSTATE); } static void intel_gpio_set(struct gpio_chip *chip, unsigned offset, int value) From 047f9d6a5672be91225422169f85f54df1d10c2c Mon Sep 17 00:00:00 2001 From: Boqun Feng Date: Thu, 15 Jun 2017 12:18:28 +0800 Subject: [PATCH 474/970] sched/wait: Remove the lockless swait_active() check in swake_up*() commit 35a2897c2a306cca344ca5c0b43416707018f434 upstream. Steven Rostedt reported a potential race in RCU core because of swake_up(): CPU0 CPU1 ---- ---- __call_rcu_core() { spin_lock(rnp_root) need_wake = __rcu_start_gp() { rcu_start_gp_advanced() { gp_flags = FLAG_INIT } } rcu_gp_kthread() { swait_event_interruptible(wq, gp_flags & FLAG_INIT) { spin_lock(q->lock) *fetch wq->task_list here! * list_add(wq->task_list, q->task_list) spin_unlock(q->lock); *fetch old value of gp_flags here * spin_unlock(rnp_root) rcu_gp_kthread_wake() { swake_up(wq) { swait_active(wq) { list_empty(wq->task_list) } * return false * if (condition) * false * schedule(); In this case, a wakeup is missed, which could cause the rcu_gp_kthread waits for a long time. The reason of this is that we do a lockless swait_active() check in swake_up(). To fix this, we can either 1) add a smp_mb() in swake_up() before swait_active() to provide the proper order or 2) simply remove the swait_active() in swake_up(). The solution 2 not only fixes this problem but also keeps the swait and wait API as close as possible, as wake_up() doesn't provide a full barrier and doesn't do a lockless check of the wait queue either. Moreover, there are users already using swait_active() to do their quick checks for the wait queues, so it make less sense that swake_up() and swake_up_all() do this on their own. This patch then removes the lockless swait_active() check in swake_up() and swake_up_all(). Reported-by: Steven Rostedt Signed-off-by: Boqun Feng Signed-off-by: Peter Zijlstra (Intel) Cc: Krister Johansen Cc: Linus Torvalds Cc: Paul E. McKenney Cc: Paul Gortmaker Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/20170615041828.zk3a3sfyudm5p6nl@tardis Signed-off-by: Ingo Molnar Cc: David Chen Signed-off-by: Greg Kroah-Hartman --- kernel/sched/swait.c | 6 ------ 1 file changed, 6 deletions(-) diff --git a/kernel/sched/swait.c b/kernel/sched/swait.c index 82f0dff90030..9c2da06a8869 100644 --- a/kernel/sched/swait.c +++ b/kernel/sched/swait.c @@ -33,9 +33,6 @@ void swake_up(struct swait_queue_head *q) { unsigned long flags; - if (!swait_active(q)) - return; - raw_spin_lock_irqsave(&q->lock, flags); swake_up_locked(q); raw_spin_unlock_irqrestore(&q->lock, flags); @@ -51,9 +48,6 @@ void swake_up_all(struct swait_queue_head *q) struct swait_queue *curr; LIST_HEAD(tmp); - if (!swait_active(q)) - return; - raw_spin_lock_irq(&q->lock); list_splice_init(&q->task_list, &tmp); while (!list_empty(&tmp)) { From 7142fdb6a924bc4fb1f1661b585d01ff095ef41b Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 31 Jul 2018 06:30:54 -0700 Subject: [PATCH 475/970] bonding: avoid lockdep confusion in bond_get_stats() [ Upstream commit 7e2556e40026a1b0c16f37446ab398d5a5a892e4 ] syzbot found that the following sequence produces a LOCKDEP splat [1] ip link add bond10 type bond ip link add bond11 type bond ip link set bond11 master bond10 To fix this, we can use the already provided nest_level. This patch also provides correct nesting for dev->addr_list_lock [1] WARNING: possible recursive locking detected 4.18.0-rc6+ #167 Not tainted -------------------------------------------- syz-executor751/4439 is trying to acquire lock: (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: spin_lock include/linux/spinlock.h:310 [inline] (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: bond_get_stats+0xb4/0x560 drivers/net/bonding/bond_main.c:3426 but task is already holding lock: (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: spin_lock include/linux/spinlock.h:310 [inline] (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: bond_get_stats+0xb4/0x560 drivers/net/bonding/bond_main.c:3426 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&bond->stats_lock)->rlock); lock(&(&bond->stats_lock)->rlock); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by syz-executor751/4439: #0: (____ptrval____) (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20 net/core/rtnetlink.c:77 #1: (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: spin_lock include/linux/spinlock.h:310 [inline] #1: (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: bond_get_stats+0xb4/0x560 drivers/net/bonding/bond_main.c:3426 #2: (____ptrval____) (rcu_read_lock){....}, at: bond_get_stats+0x0/0x560 include/linux/compiler.h:215 stack backtrace: CPU: 0 PID: 4439 Comm: syz-executor751 Not tainted 4.18.0-rc6+ #167 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113 print_deadlock_bug kernel/locking/lockdep.c:1765 [inline] check_deadlock kernel/locking/lockdep.c:1809 [inline] validate_chain kernel/locking/lockdep.c:2405 [inline] __lock_acquire.cold.64+0x1fb/0x486 kernel/locking/lockdep.c:3435 lock_acquire+0x1e4/0x540 kernel/locking/lockdep.c:3924 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline] _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:144 spin_lock include/linux/spinlock.h:310 [inline] bond_get_stats+0xb4/0x560 drivers/net/bonding/bond_main.c:3426 dev_get_stats+0x10f/0x470 net/core/dev.c:8316 bond_get_stats+0x232/0x560 drivers/net/bonding/bond_main.c:3432 dev_get_stats+0x10f/0x470 net/core/dev.c:8316 rtnl_fill_stats+0x4d/0xac0 net/core/rtnetlink.c:1169 rtnl_fill_ifinfo+0x1aa6/0x3fb0 net/core/rtnetlink.c:1611 rtmsg_ifinfo_build_skb+0xc8/0x190 net/core/rtnetlink.c:3268 rtmsg_ifinfo_event.part.30+0x45/0xe0 net/core/rtnetlink.c:3300 rtmsg_ifinfo_event net/core/rtnetlink.c:3297 [inline] rtnetlink_event+0x144/0x170 net/core/rtnetlink.c:4716 notifier_call_chain+0x180/0x390 kernel/notifier.c:93 __raw_notifier_call_chain kernel/notifier.c:394 [inline] raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401 call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1735 call_netdevice_notifiers net/core/dev.c:1753 [inline] netdev_features_change net/core/dev.c:1321 [inline] netdev_change_features+0xb3/0x110 net/core/dev.c:7759 bond_compute_features.isra.47+0x585/0xa50 drivers/net/bonding/bond_main.c:1120 bond_enslave+0x1b25/0x5da0 drivers/net/bonding/bond_main.c:1755 bond_do_ioctl+0x7cb/0xae0 drivers/net/bonding/bond_main.c:3528 dev_ifsioc+0x43c/0xb30 net/core/dev_ioctl.c:327 dev_ioctl+0x1b5/0xcc0 net/core/dev_ioctl.c:493 sock_do_ioctl+0x1d3/0x3e0 net/socket.c:992 sock_ioctl+0x30d/0x680 net/socket.c:1093 vfs_ioctl fs/ioctl.c:46 [inline] file_ioctl fs/ioctl.c:500 [inline] do_vfs_ioctl+0x1de/0x1720 fs/ioctl.c:684 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701 __do_sys_ioctl fs/ioctl.c:708 [inline] __se_sys_ioctl fs/ioctl.c:706 [inline] __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:706 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x440859 Code: e8 2c af 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b 10 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffc51a92878 EFLAGS: 00000213 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000440859 RDX: 0000000020000040 RSI: 0000000000008990 RDI: 0000000000000003 RBP: 0000000000000000 R08: 00000000004002c8 R09: 00000000004002c8 R10: 00000000022d5880 R11: 0000000000000213 R12: 0000000000007390 R13: 0000000000401db0 R14: 0000000000000000 R15: 0000000000000000 Signed-off-by: Eric Dumazet Cc: Jay Vosburgh Cc: Veaceslav Falico Cc: Andy Gospodarek Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/bonding/bond_main.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index f5fcc0850dac..8a5e0ae4e4c0 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1682,6 +1682,8 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev) goto err_upper_unlink; } + bond->nest_level = dev_get_nest_level(bond_dev) + 1; + /* If the mode uses primary, then the following is handled by * bond_change_active_slave(). */ @@ -1729,7 +1731,6 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev) if (bond_mode_uses_xmit_hash(bond)) bond_update_slave_arr(bond, NULL); - bond->nest_level = dev_get_nest_level(bond_dev); netdev_info(bond_dev, "Enslaving %s as %s interface with %s link\n", slave_dev->name, @@ -3359,6 +3360,13 @@ static void bond_fold_stats(struct rtnl_link_stats64 *_res, } } +static int bond_get_nest_level(struct net_device *bond_dev) +{ + struct bonding *bond = netdev_priv(bond_dev); + + return bond->nest_level; +} + static struct rtnl_link_stats64 *bond_get_stats(struct net_device *bond_dev, struct rtnl_link_stats64 *stats) { @@ -3367,7 +3375,7 @@ static struct rtnl_link_stats64 *bond_get_stats(struct net_device *bond_dev, struct list_head *iter; struct slave *slave; - spin_lock(&bond->stats_lock); + spin_lock_nested(&bond->stats_lock, bond_get_nest_level(bond_dev)); memcpy(stats, &bond->bond_stats, sizeof(*stats)); rcu_read_lock(); @@ -4163,6 +4171,7 @@ static const struct net_device_ops bond_netdev_ops = { .ndo_neigh_setup = bond_neigh_setup, .ndo_vlan_rx_add_vid = bond_vlan_rx_add_vid, .ndo_vlan_rx_kill_vid = bond_vlan_rx_kill_vid, + .ndo_get_lock_subclass = bond_get_nest_level, #ifdef CONFIG_NET_POLL_CONTROLLER .ndo_netpoll_setup = bond_netpoll_setup, .ndo_netpoll_cleanup = bond_netpoll_cleanup, @@ -4655,6 +4664,7 @@ static int bond_init(struct net_device *bond_dev) if (!bond->wq) return -ENOMEM; + bond->nest_level = SINGLE_DEPTH_NESTING; netdev_lockdep_set_classes(bond_dev); list_add_tail(&bond->bond_list, &bn->dev_list); From c5282a032fa2823b588b16f6f4bfa5fa2d350671 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 30 Jul 2018 20:09:11 -0700 Subject: [PATCH 476/970] inet: frag: enforce memory limits earlier [ Upstream commit 56e2c94f055d328f5f6b0a5c1721cca2f2d4e0a1 ] We currently check current frags memory usage only when a new frag queue is created. This allows attackers to first consume the memory budget (default : 4 MB) creating thousands of frag queues, then sending tiny skbs to exceed high_thresh limit by 2 to 3 order of magnitude. Note that before commit 648700f76b03 ("inet: frags: use rhashtables for reassembly units"), work queue could be starved under DOS, getting no cpu cycles. After commit 648700f76b03, only the per frag queue timer can eventually remove an incomplete frag queue and its skbs. Fixes: b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue") Signed-off-by: Eric Dumazet Reported-by: Jann Horn Cc: Florian Westphal Cc: Peter Oskolkov Cc: Paolo Abeni Acked-by: Florian Westphal Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/inet_fragment.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c index 8effac0f2219..f8b41aaac76f 100644 --- a/net/ipv4/inet_fragment.c +++ b/net/ipv4/inet_fragment.c @@ -356,11 +356,6 @@ static struct inet_frag_queue *inet_frag_alloc(struct netns_frags *nf, { struct inet_frag_queue *q; - if (!nf->high_thresh || frag_mem_limit(nf) > nf->high_thresh) { - inet_frag_schedule_worker(f); - return NULL; - } - q = kmem_cache_zalloc(f->frags_cachep, GFP_ATOMIC); if (!q) return NULL; @@ -397,6 +392,11 @@ struct inet_frag_queue *inet_frag_find(struct netns_frags *nf, struct inet_frag_queue *q; int depth = 0; + if (!nf->high_thresh || frag_mem_limit(nf) > nf->high_thresh) { + inet_frag_schedule_worker(f); + return NULL; + } + if (frag_mem_limit(nf) > nf->low_thresh) inet_frag_schedule_worker(f); From d59dcdf13ee5b27a54940a05474230bda3a50edb Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 30 Jul 2018 21:50:29 -0700 Subject: [PATCH 477/970] ipv4: frags: handle possible skb truesize change [ Upstream commit 4672694bd4f1aebdab0ad763ae4716e89cb15221 ] ip_frag_queue() might call pskb_pull() on one skb that is already in the fragment queue. We need to take care of possible truesize change, or we might have an imbalance of the netns frags memory usage. IPv6 is immune to this bug, because RFC5722, Section 4, amended by Errata ID 3089 states : When reassembling an IPv6 datagram, if one or more its constituent fragments is determined to be an overlapping fragment, the entire datagram (and any constituent fragments) MUST be silently discarded. Fixes: 158f323b9868 ("net: adjust skb->truesize in pskb_expand_head()") Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/ip_fragment.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c index 4bf3b8af0257..752711cd4834 100644 --- a/net/ipv4/ip_fragment.c +++ b/net/ipv4/ip_fragment.c @@ -446,11 +446,16 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb) int i = end - FRAG_CB(next)->offset; /* overlap is 'i' bytes */ if (i < next->len) { + int delta = -next->truesize; + /* Eat head of the next overlapped fragment * and leave the loop. The next ones cannot overlap. */ if (!pskb_pull(next, i)) goto err; + delta += next->truesize; + if (delta) + add_frag_mem_limit(qp->q.net, delta); FRAG_CB(next)->offset += i; qp->q.meat -= i; if (next->ip_summed != CHECKSUM_UNNECESSARY) From ab9a0f80bce707fa4f8b4d63ad1f3b2a89079caa Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Tue, 31 Jul 2018 17:12:52 -0700 Subject: [PATCH 478/970] net: dsa: Do not suspend/resume closed slave_dev [ Upstream commit a94c689e6c9e72e722f28339e12dff191ee5a265 ] If a DSA slave network device was previously disabled, there is no need to suspend or resume it. Fixes: 2446254915a7 ("net: dsa: allow switch drivers to implement suspend/resume hooks") Signed-off-by: Florian Fainelli Reviewed-by: Andrew Lunn Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/dsa/slave.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/net/dsa/slave.c b/net/dsa/slave.c index 5000e6f20f4a..339d9c678d3e 100644 --- a/net/dsa/slave.c +++ b/net/dsa/slave.c @@ -1199,6 +1199,9 @@ int dsa_slave_suspend(struct net_device *slave_dev) { struct dsa_slave_priv *p = netdev_priv(slave_dev); + if (!netif_running(slave_dev)) + return 0; + netif_device_detach(slave_dev); if (p->phy) { @@ -1216,6 +1219,9 @@ int dsa_slave_resume(struct net_device *slave_dev) { struct dsa_slave_priv *p = netdev_priv(slave_dev); + if (!netif_running(slave_dev)) + return 0; + netif_device_attach(slave_dev); if (p->phy) { From 67f0a2887bcb587d4116d5896aaea8432ad04696 Mon Sep 17 00:00:00 2001 From: Jeremy Cline Date: Tue, 31 Jul 2018 21:13:16 +0000 Subject: [PATCH 479/970] netlink: Fix spectre v1 gadget in netlink_create() [ Upstream commit bc5b6c0b62b932626a135f516a41838c510c6eba ] 'protocol' is a user-controlled value, so sanitize it after the bounds check to avoid using it for speculative out-of-bounds access to arrays indexed by it. This addresses the following accesses detected with the help of smatch: * net/netlink/af_netlink.c:654 __netlink_create() warn: potential spectre issue 'nlk_cb_mutex_keys' [w] * net/netlink/af_netlink.c:654 __netlink_create() warn: potential spectre issue 'nlk_cb_mutex_key_strings' [w] * net/netlink/af_netlink.c:685 netlink_create() warn: potential spectre issue 'nl_table' [w] (local cap) Cc: Josh Poimboeuf Signed-off-by: Jeremy Cline Reviewed-by: Josh Poimboeuf Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/netlink/af_netlink.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 15e6e7b9fd2b..8d0aafbdbbc3 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -62,6 +62,7 @@ #include #include #include +#include #include #include @@ -654,6 +655,7 @@ static int netlink_create(struct net *net, struct socket *sock, int protocol, if (protocol < 0 || protocol >= MAX_LINKS) return -EPROTONOSUPPORT; + protocol = array_index_nospec(protocol, MAX_LINKS); netlink_lock_table(); #ifdef CONFIG_MODULES From c9bd4fd4b744089f368f3d85655e904a76c93c25 Mon Sep 17 00:00:00 2001 From: Jose Abreu Date: Tue, 31 Jul 2018 15:08:20 +0100 Subject: [PATCH 480/970] net: stmmac: Fix WoL for PCI-based setups [ Upstream commit b7d0f08e9129c45ed41bc0cfa8e77067881e45fd ] WoL won't work in PCI-based setups because we are not saving the PCI EP state before entering suspend state and not allowing D3 wake. Fix this by using a wrapper around stmmac_{suspend/resume} which correctly sets the PCI EP state. Signed-off-by: Jose Abreu Cc: David S. Miller Cc: Joao Pinto Cc: Giuseppe Cavallaro Cc: Alexandre Torgue Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- .../net/ethernet/stmicro/stmmac/stmmac_pci.c | 40 ++++++++++++++++++- 1 file changed, 38 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c index 56c8a2342c14..eafc28142cd2 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c @@ -183,7 +183,7 @@ static int stmmac_pci_probe(struct pci_dev *pdev, return -ENOMEM; /* Enable pci device */ - ret = pcim_enable_device(pdev); + ret = pci_enable_device(pdev); if (ret) { dev_err(&pdev->dev, "%s: ERROR: failed to enable device\n", __func__); @@ -232,9 +232,45 @@ static int stmmac_pci_probe(struct pci_dev *pdev, static void stmmac_pci_remove(struct pci_dev *pdev) { stmmac_dvr_remove(&pdev->dev); + pci_disable_device(pdev); } -static SIMPLE_DEV_PM_OPS(stmmac_pm_ops, stmmac_suspend, stmmac_resume); +static int stmmac_pci_suspend(struct device *dev) +{ + struct pci_dev *pdev = to_pci_dev(dev); + int ret; + + ret = stmmac_suspend(dev); + if (ret) + return ret; + + ret = pci_save_state(pdev); + if (ret) + return ret; + + pci_disable_device(pdev); + pci_wake_from_d3(pdev, true); + return 0; +} + +static int stmmac_pci_resume(struct device *dev) +{ + struct pci_dev *pdev = to_pci_dev(dev); + int ret; + + pci_restore_state(pdev); + pci_set_power_state(pdev, PCI_D0); + + ret = pci_enable_device(pdev); + if (ret) + return ret; + + pci_set_master(pdev); + + return stmmac_resume(dev); +} + +static SIMPLE_DEV_PM_OPS(stmmac_pm_ops, stmmac_pci_suspend, stmmac_pci_resume); #define STMMAC_VENDOR_ID 0x700 #define STMMAC_QUARK_ID 0x0937 From 3abef06039cb43e0fe44f3714969af0b9a744dc5 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Mon, 30 Jul 2018 14:27:15 -0700 Subject: [PATCH 481/970] squashfs: more metadata hardening commit d512584780d3e6a7cacb2f482834849453d444a1 upstream. Anatoly reports another squashfs fuzzing issue, where the decompression parameters themselves are in a compressed block. This causes squashfs_read_data() to be called in order to read the decompression options before the decompression stream having been set up, making squashfs go sideways. Reported-by: Anatoly Trosinenko Acked-by: Phillip Lougher Cc: stable@kernel.org Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/squashfs/block.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/squashfs/block.c b/fs/squashfs/block.c index ce62a380314f..cec0fa208078 100644 --- a/fs/squashfs/block.c +++ b/fs/squashfs/block.c @@ -166,6 +166,8 @@ int squashfs_read_data(struct super_block *sb, u64 index, int length, } if (compressed) { + if (!msblk->stream) + goto read_failure; length = squashfs_decompress(msblk, bh, b, offset, length, output); if (length < 0) From 52cd8f3790cf1e71b6b38b63735042a014a3ff8a Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Thu, 2 Aug 2018 08:43:35 -0700 Subject: [PATCH 482/970] squashfs: more metadata hardenings MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 71755ee5350b63fb1f283de8561cdb61b47f4d1d upstream. The squashfs fragment reading code doesn't actually verify that the fragment is inside the fragment table. The end result _is_ verified to be inside the image when actually reading the fragment data, but before that is done, we may end up taking a page fault because the fragment table itself might not even exist. Another report from Anatoly and his endless squashfs image fuzzing. Reported-by: Анатолий Тросиненко Acked-by:: Phillip Lougher , Cc: Willy Tarreau Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/squashfs/fragment.c | 13 +++++++++---- fs/squashfs/squashfs_fs_sb.h | 1 + fs/squashfs/super.c | 5 +++-- 3 files changed, 13 insertions(+), 6 deletions(-) diff --git a/fs/squashfs/fragment.c b/fs/squashfs/fragment.c index 86ad9a4b8c36..0681feab4a84 100644 --- a/fs/squashfs/fragment.c +++ b/fs/squashfs/fragment.c @@ -49,11 +49,16 @@ int squashfs_frag_lookup(struct super_block *sb, unsigned int fragment, u64 *fragment_block) { struct squashfs_sb_info *msblk = sb->s_fs_info; - int block = SQUASHFS_FRAGMENT_INDEX(fragment); - int offset = SQUASHFS_FRAGMENT_INDEX_OFFSET(fragment); - u64 start_block = le64_to_cpu(msblk->fragment_index[block]); + int block, offset, size; struct squashfs_fragment_entry fragment_entry; - int size; + u64 start_block; + + if (fragment >= msblk->fragments) + return -EIO; + block = SQUASHFS_FRAGMENT_INDEX(fragment); + offset = SQUASHFS_FRAGMENT_INDEX_OFFSET(fragment); + + start_block = le64_to_cpu(msblk->fragment_index[block]); size = squashfs_read_metadata(sb, &fragment_entry, &start_block, &offset, sizeof(fragment_entry)); diff --git a/fs/squashfs/squashfs_fs_sb.h b/fs/squashfs/squashfs_fs_sb.h index 1da565cb50c3..ef69c31947bf 100644 --- a/fs/squashfs/squashfs_fs_sb.h +++ b/fs/squashfs/squashfs_fs_sb.h @@ -75,6 +75,7 @@ struct squashfs_sb_info { unsigned short block_log; long long bytes_used; unsigned int inodes; + unsigned int fragments; int xattr_ids; }; #endif diff --git a/fs/squashfs/super.c b/fs/squashfs/super.c index cf01e15a7b16..1516bb779b8d 100644 --- a/fs/squashfs/super.c +++ b/fs/squashfs/super.c @@ -175,6 +175,7 @@ static int squashfs_fill_super(struct super_block *sb, void *data, int silent) msblk->inode_table = le64_to_cpu(sblk->inode_table_start); msblk->directory_table = le64_to_cpu(sblk->directory_table_start); msblk->inodes = le32_to_cpu(sblk->inodes); + msblk->fragments = le32_to_cpu(sblk->fragments); flags = le16_to_cpu(sblk->flags); TRACE("Found valid superblock on %pg\n", sb->s_bdev); @@ -185,7 +186,7 @@ static int squashfs_fill_super(struct super_block *sb, void *data, int silent) TRACE("Filesystem size %lld bytes\n", msblk->bytes_used); TRACE("Block size %d\n", msblk->block_size); TRACE("Number of inodes %d\n", msblk->inodes); - TRACE("Number of fragments %d\n", le32_to_cpu(sblk->fragments)); + TRACE("Number of fragments %d\n", msblk->fragments); TRACE("Number of ids %d\n", le16_to_cpu(sblk->no_ids)); TRACE("sblk->inode_table_start %llx\n", msblk->inode_table); TRACE("sblk->directory_table_start %llx\n", msblk->directory_table); @@ -272,7 +273,7 @@ static int squashfs_fill_super(struct super_block *sb, void *data, int silent) sb->s_export_op = &squashfs_export_ops; handle_fragments: - fragments = le32_to_cpu(sblk->fragments); + fragments = msblk->fragments; if (fragments == 0) goto check_directory_table; From 18d971807db59b4f31a4cdde30c2353f9d4048d1 Mon Sep 17 00:00:00 2001 From: Anton Vasilyev Date: Fri, 27 Jul 2018 18:50:42 +0300 Subject: [PATCH 483/970] can: ems_usb: Fix memory leak on ems_usb_disconnect() commit 72c05f32f4a5055c9c8fe889bb6903ec959c0aad upstream. ems_usb_probe() allocates memory for dev->tx_msg_buffer, but there is no its deallocation in ems_usb_disconnect(). Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Anton Vasilyev Cc: Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman --- drivers/net/can/usb/ems_usb.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/can/usb/ems_usb.c b/drivers/net/can/usb/ems_usb.c index b00358297424..d0846ae9e0e4 100644 --- a/drivers/net/can/usb/ems_usb.c +++ b/drivers/net/can/usb/ems_usb.c @@ -1071,6 +1071,7 @@ static void ems_usb_disconnect(struct usb_interface *intf) usb_free_urb(dev->intr_urb); kfree(dev->intr_in_buffer); + kfree(dev->tx_msg_buffer); } } From 9a492f8c716465a8ed40590f7cbcd2ad6ff1b6c3 Mon Sep 17 00:00:00 2001 From: Jeremy Cline Date: Fri, 27 Jul 2018 22:43:01 +0000 Subject: [PATCH 484/970] net: socket: fix potential spectre v1 gadget in socketcall commit c8e8cd579bb4265651df8223730105341e61a2d1 upstream. 'call' is a user-controlled value, so sanitize the array index after the bounds check to avoid speculating past the bounds of the 'nargs' array. Found with the help of Smatch: net/socket.c:2508 __do_sys_socketcall() warn: potential spectre issue 'nargs' [r] (local cap) Cc: Josh Poimboeuf Cc: stable@vger.kernel.org Signed-off-by: Jeremy Cline Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/socket.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/socket.c b/net/socket.c index bd3b33988ee0..35fa349ba274 100644 --- a/net/socket.c +++ b/net/socket.c @@ -89,6 +89,7 @@ #include #include #include +#include #include #include @@ -2338,6 +2339,7 @@ SYSCALL_DEFINE2(socketcall, int, call, unsigned long __user *, args) if (call < 1 || call > SYS_SENDMMSG) return -EINVAL; + call = array_index_nospec(call, SYS_SENDMMSG + 1); len = nargs[call]; if (len > sizeof(a)) From 1d433144592dffefbb9cbc9b0d853473c1b3157c Mon Sep 17 00:00:00 2001 From: Jiang Biao Date: Wed, 18 Jul 2018 10:29:28 +0800 Subject: [PATCH 485/970] virtio_balloon: fix another race between migration and ballooning commit 89da619bc18d79bca5304724c11d4ba3b67ce2c6 upstream. Kernel panic when with high memory pressure, calltrace looks like, PID: 21439 TASK: ffff881be3afedd0 CPU: 16 COMMAND: "java" #0 [ffff881ec7ed7630] machine_kexec at ffffffff81059beb #1 [ffff881ec7ed7690] __crash_kexec at ffffffff81105942 #2 [ffff881ec7ed7760] crash_kexec at ffffffff81105a30 #3 [ffff881ec7ed7778] oops_end at ffffffff816902c8 #4 [ffff881ec7ed77a0] no_context at ffffffff8167ff46 #5 [ffff881ec7ed77f0] __bad_area_nosemaphore at ffffffff8167ffdc #6 [ffff881ec7ed7838] __node_set at ffffffff81680300 #7 [ffff881ec7ed7860] __do_page_fault at ffffffff8169320f #8 [ffff881ec7ed78c0] do_page_fault at ffffffff816932b5 #9 [ffff881ec7ed78f0] page_fault at ffffffff8168f4c8 [exception RIP: _raw_spin_lock_irqsave+47] RIP: ffffffff8168edef RSP: ffff881ec7ed79a8 RFLAGS: 00010046 RAX: 0000000000000246 RBX: ffffea0019740d00 RCX: ffff881ec7ed7fd8 RDX: 0000000000020000 RSI: 0000000000000016 RDI: 0000000000000008 RBP: ffff881ec7ed79a8 R8: 0000000000000246 R9: 000000000001a098 R10: ffff88107ffda000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000008 R14: ffff881ec7ed7a80 R15: ffff881be3afedd0 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 It happens in the pagefault and results in double pagefault during compacting pages when memory allocation fails. Analysed the vmcore, the page leads to second pagefault is corrupted with _mapcount=-256, but private=0. It's caused by the race between migration and ballooning, and lock missing in virtballoon_migratepage() of virtio_balloon driver. This patch fix the bug. Fixes: e22504296d4f64f ("virtio_balloon: introduce migration primitives to balloon pages") Cc: stable@vger.kernel.org Signed-off-by: Jiang Biao Signed-off-by: Huang Chong Signed-off-by: Michael S. Tsirkin Signed-off-by: Greg Kroah-Hartman --- drivers/virtio/virtio_balloon.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index a7c08cc4c1b7..30076956a096 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -493,7 +493,9 @@ static int virtballoon_migratepage(struct balloon_dev_info *vb_dev_info, tell_host(vb, vb->inflate_vq); /* balloon's page migration 2nd step -- deflate "page" */ + spin_lock_irqsave(&vb_dev_info->pages_lock, flags); balloon_page_delete(page); + spin_unlock_irqrestore(&vb_dev_info->pages_lock, flags); vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE; set_page_pfns(vb, vb->pfns, page); tell_host(vb, vb->deflate_vq); From 020a90f653dd02dbbae389da91f510d5f33984dc Mon Sep 17 00:00:00 2001 From: Roman Kagan Date: Thu, 19 Jul 2018 21:59:07 +0300 Subject: [PATCH 486/970] kvm: x86: vmx: fix vpid leak commit 63aff65573d73eb8dda4732ad4ef222dd35e4862 upstream. VPID for the nested vcpu is allocated at vmx_create_vcpu whenever nested vmx is turned on with the module parameter. However, it's only freed if the L1 guest has executed VMXON which is not a given. As a result, on a system with nested==on every creation+deletion of an L1 vcpu without running an L2 guest results in leaking one vpid. Since the total number of vpids is limited to 64k, they can eventually get exhausted, preventing L2 from starting. Delay allocation of the L2 vpid until VMXON emulation, thus matching its freeing. Fixes: 5c614b3583e7b6dab0c86356fa36c2bcbb8322a0 Cc: stable@vger.kernel.org Signed-off-by: Roman Kagan Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 4e0292e0aafb..30b74b491909 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -7085,6 +7085,8 @@ static int handle_vmon(struct kvm_vcpu *vcpu) HRTIMER_MODE_REL_PINNED); vmx->nested.preemption_timer.function = vmx_preemption_timer_fn; + vmx->nested.vpid02 = allocate_vpid(); + vmx->nested.vmxon = true; skip_emulated_instruction(vcpu); @@ -9264,10 +9266,8 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id) goto free_vmcs; } - if (nested) { + if (nested) nested_vmx_setup_ctls_msrs(vmx); - vmx->nested.vpid02 = allocate_vpid(); - } vmx->nested.posted_intr_nv = -1; vmx->nested.current_vmptr = -1ull; @@ -9285,7 +9285,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id) return &vmx->vcpu; free_vmcs: - free_vpid(vmx->nested.vpid02); free_loaded_vmcs(vmx->loaded_vmcs); free_msrs: kfree(vmx->guest_msrs); From 804f510bf2182da8bdbe0135538c8f3f14670a67 Mon Sep 17 00:00:00 2001 From: Herbert Xu Date: Fri, 13 Jul 2018 16:12:32 +0800 Subject: [PATCH 487/970] crypto: padlock-aes - Fix Nano workaround data corruption commit 46d8c4b28652d35dc6cfb5adf7f54e102fc04384 upstream. This was detected by the self-test thanks to Ard's chunking patch. I finally got around to testing this out on my ancient Via box. It turns out that the workaround got the assembly wrong and we end up doing count + initial cycles of the loop instead of just count. This obviously causes corruption, either by overwriting the source that is yet to be processed, or writing over the end of the buffer. On CPUs that don't require the workaround only ECB is affected. On Nano CPUs both ECB and CBC are affected. This patch fixes it by doing the subtraction prior to the assembly. Fixes: a76c1c23d0c3 ("crypto: padlock-aes - work around Nano CPU...") Cc: Reported-by: Jamie Heilman Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- drivers/crypto/padlock-aes.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/crypto/padlock-aes.c b/drivers/crypto/padlock-aes.c index 9126627cbf4d..75f2bef79718 100644 --- a/drivers/crypto/padlock-aes.c +++ b/drivers/crypto/padlock-aes.c @@ -266,6 +266,8 @@ static inline void padlock_xcrypt_ecb(const u8 *input, u8 *output, void *key, return; } + count -= initial; + if (initial) asm volatile (".byte 0xf3,0x0f,0xa7,0xc8" /* rep xcryptecb */ : "+S"(input), "+D"(output) @@ -273,7 +275,7 @@ static inline void padlock_xcrypt_ecb(const u8 *input, u8 *output, void *key, asm volatile (".byte 0xf3,0x0f,0xa7,0xc8" /* rep xcryptecb */ : "+S"(input), "+D"(output) - : "d"(control_word), "b"(key), "c"(count - initial)); + : "d"(control_word), "b"(key), "c"(count)); } static inline u8 *padlock_xcrypt_cbc(const u8 *input, u8 *output, void *key, @@ -284,6 +286,8 @@ static inline u8 *padlock_xcrypt_cbc(const u8 *input, u8 *output, void *key, if (count < cbc_fetch_blocks) return cbc_crypt(input, output, key, iv, control_word, count); + count -= initial; + if (initial) asm volatile (".byte 0xf3,0x0f,0xa7,0xd0" /* rep xcryptcbc */ : "+S" (input), "+D" (output), "+a" (iv) @@ -291,7 +295,7 @@ static inline u8 *padlock_xcrypt_cbc(const u8 *input, u8 *output, void *key, asm volatile (".byte 0xf3,0x0f,0xa7,0xd0" /* rep xcryptcbc */ : "+S" (input), "+D" (output), "+a" (iv) - : "d" (control_word), "b" (key), "c" (count-initial)); + : "d" (control_word), "b" (key), "c" (count)); return iv; } From e79a2db21eec75851baa05e62139f466e9d2c166 Mon Sep 17 00:00:00 2001 From: Boris Brezillon Date: Tue, 24 Jul 2018 15:36:01 +0200 Subject: [PATCH 488/970] drm/vc4: Reset ->{x, y}_scaling[1] when dealing with uniplanar formats commit a6a00918d4ad8718c3ccde38c02cec17f116b2fd upstream. This is needed to ensure ->is_unity is correct when the plane was previously configured to output a multi-planar format with scaling enabled, and is then being reconfigured to output a uniplanar format. Fixes: fc04023fafec ("drm/vc4: Add support for YUV planes.") Cc: Signed-off-by: Boris Brezillon Reviewed-by: Eric Anholt Link: https://patchwork.freedesktop.org/patch/msgid/20180724133601.32114-1-boris.brezillon@bootlin.com Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/vc4/vc4_plane.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/vc4/vc4_plane.c b/drivers/gpu/drm/vc4/vc4_plane.c index 75056553b06c..f8c9f6f4f822 100644 --- a/drivers/gpu/drm/vc4/vc4_plane.c +++ b/drivers/gpu/drm/vc4/vc4_plane.c @@ -350,6 +350,9 @@ static int vc4_plane_setup_clipping_and_scaling(struct drm_plane_state *state) vc4_state->x_scaling[0] = VC4_SCALING_TPZ; if (vc4_state->y_scaling[0] == VC4_SCALING_NONE) vc4_state->y_scaling[0] = VC4_SCALING_TPZ; + } else { + vc4_state->x_scaling[1] = VC4_SCALING_NONE; + vc4_state->y_scaling[1] = VC4_SCALING_NONE; } vc4_state->is_unity = (vc4_state->x_scaling[0] == VC4_SCALING_NONE && From 0ff94fb99e0ba3f60bb0e63f51db266a6b025d65 Mon Sep 17 00:00:00 2001 From: Tony Battersby Date: Thu, 12 Jul 2018 16:30:45 -0400 Subject: [PATCH 489/970] scsi: sg: fix minor memory leak in error path commit c170e5a8d222537e98aa8d4fddb667ff7a2ee114 upstream. Fix a minor memory leak when there is an error opening a /dev/sg device. Fixes: cc833acbee9d ("sg: O_EXCL and other lock handling") Cc: Reviewed-by: Ewan D. Milne Signed-off-by: Tony Battersby Reviewed-by: Bart Van Assche Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/sg.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index 2065a0f9dca6..8d9b416399f9 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -2185,6 +2185,7 @@ sg_add_sfp(Sg_device * sdp) write_lock_irqsave(&sdp->sfd_lock, iflags); if (atomic_read(&sdp->detaching)) { write_unlock_irqrestore(&sdp->sfd_lock, iflags); + kfree(sfp); return ERR_PTR(-ENODEV); } list_add_tail(&sfp->sfd_siblings, &sdp->sfds); From e01202b36f03f9284ffb51e911f6710d0a62c815 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Mon, 6 Aug 2018 16:23:04 +0200 Subject: [PATCH 490/970] Linux 4.9.118 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 773c26c95d98..0940f11fa071 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 4 PATCHLEVEL = 9 -SUBLEVEL = 117 +SUBLEVEL = 118 EXTRAVERSION = NAME = Roaring Lionus From f71d13c397b2133d5e5183e5db3547af271d04d6 Mon Sep 17 00:00:00 2001 From: Quinn Tran Date: Wed, 18 Jul 2018 14:29:54 -0700 Subject: [PATCH 491/970] scsi: qla2xxx: Fix ISP recovery on unload commit b08abbd9f5996309f021684f9ca74da30dcca36a upstream. During unload process, the chip can encounter problem where a FW dump would be captured. For this case, the full reset sequence will be skip to bring the chip back to full operational state. Fixes: e315cd28b9ef ("[SCSI] qla2xxx: Code changes for qla data structure refactoring") Cc: Signed-off-by: Quinn Tran Signed-off-by: Himanshu Madhani Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/qla2xxx/qla_os.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c index baccd116f864..c813c9b75a10 100644 --- a/drivers/scsi/qla2xxx/qla_os.c +++ b/drivers/scsi/qla2xxx/qla_os.c @@ -5218,8 +5218,9 @@ qla2x00_do_dpc(void *data) } } - if (test_and_clear_bit(ISP_ABORT_NEEDED, - &base_vha->dpc_flags)) { + if (test_and_clear_bit + (ISP_ABORT_NEEDED, &base_vha->dpc_flags) && + !test_bit(UNLOADING, &base_vha->dpc_flags)) { ql_dbg(ql_dbg_dpc, base_vha, 0x4007, "ISP abort scheduled.\n"); From 24b79a95b2cb2d8cc5e65e76aecc334bb71e3523 Mon Sep 17 00:00:00 2001 From: Anil Gurumurthy Date: Wed, 18 Jul 2018 14:29:55 -0700 Subject: [PATCH 492/970] scsi: qla2xxx: Return error when TMF returns commit b4146c4929ef61d5afca011474d59d0918a0cd82 upstream. Propagate the task management completion status properly to avoid unnecessary waits for commands to complete. Fixes: faef62d13463 ("[SCSI] qla2xxx: Fix Task Management command asynchronous handling") Cc: Signed-off-by: Anil Gurumurthy Signed-off-by: Himanshu Madhani Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/qla2xxx/qla_init.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c index 34bbcfcae67c..5f66b6da65f2 100644 --- a/drivers/scsi/qla2xxx/qla_init.c +++ b/drivers/scsi/qla2xxx/qla_init.c @@ -329,11 +329,10 @@ qla2x00_async_tm_cmd(fc_port_t *fcport, uint32_t flags, uint32_t lun, wait_for_completion(&tm_iocb->u.tmf.comp); - rval = tm_iocb->u.tmf.comp_status == CS_COMPLETE ? - QLA_SUCCESS : QLA_FUNCTION_FAILED; + rval = tm_iocb->u.tmf.data; - if ((rval != QLA_SUCCESS) || tm_iocb->u.tmf.data) { - ql_dbg(ql_dbg_taskm, vha, 0x8030, + if (rval != QLA_SUCCESS) { + ql_log(ql_log_warn, vha, 0x8030, "TM IOCB failed (%x).\n", rval); } From eecd08afb0d49eaf5efb54a5deffdf72ada40feb Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Fri, 3 Aug 2018 14:44:59 +0200 Subject: [PATCH 493/970] genirq: Make force irq threading setup more robust commit d1f0301b3333eef5efbfa1fe0f0edbea01863d5d upstream. The support of force threading interrupts which are set up with both a primary and a threaded handler wreckaged the setup of regular requested threaded interrupts (primary handler == NULL). The reason is that it does not check whether the primary handler is set to the default handler which wakes the handler thread. Instead it replaces the thread handler with the primary handler as it would do with force threaded interrupts which have been requested via request_irq(). So both the primary and the thread handler become the same which then triggers the warnon that the thread handler tries to wakeup a not configured secondary thread. Fortunately this only happens when the driver omits the IRQF_ONESHOT flag when requesting the threaded interrupt, which is normaly caught by the sanity checks when force irq threading is disabled. Fix it by skipping the force threading setup when a regular threaded interrupt is requested. As a consequence the interrupt request which lacks the IRQ_ONESHOT flag is rejected correctly instead of silently wreckaging it. Fixes: 2a1d3ab8986d ("genirq: Handle force threading of irqs with primary and thread handler") Reported-by: Kurt Kanzenbach Signed-off-by: Thomas Gleixner Tested-by: Kurt Kanzenbach Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- kernel/irq/manage.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c index 5927da596d42..e121645bb8a1 100644 --- a/kernel/irq/manage.c +++ b/kernel/irq/manage.c @@ -1026,6 +1026,13 @@ static int irq_setup_forced_threading(struct irqaction *new) if (new->flags & (IRQF_NO_THREAD | IRQF_PERCPU | IRQF_ONESHOT)) return 0; + /* + * No further action required for interrupts which are requested as + * threaded interrupts already + */ + if (new->handler == irq_default_primary_handler) + return 0; + new->flags |= IRQF_ONESHOT; /* @@ -1033,7 +1040,7 @@ static int irq_setup_forced_threading(struct irqaction *new) * thread handler. We force thread them as well by creating a * secondary action. */ - if (new->handler != irq_default_primary_handler && new->thread_fn) { + if (new->handler && new->thread_fn) { /* Allocate the secondary action */ new->secondary = kzalloc(sizeof(struct irqaction), GFP_KERNEL); if (!new->secondary) From f4a9db57e7dbf929f5262a01c4f5fa240ff79688 Mon Sep 17 00:00:00 2001 From: Anna-Maria Gleixner Date: Tue, 31 Jul 2018 18:13:58 +0200 Subject: [PATCH 494/970] nohz: Fix local_timer_softirq_pending() commit 80d20d35af1edd632a5e7a3b9c0ab7ceff92769e upstream. local_timer_softirq_pending() checks whether the timer softirq is pending with: local_softirq_pending() & TIMER_SOFTIRQ. This is wrong because TIMER_SOFTIRQ is the softirq number and not a bitmask. So the test checks for the wrong bit. Use BIT(TIMER_SOFTIRQ) instead. Fixes: 5d62c183f9e9 ("nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()") Signed-off-by: Anna-Maria Gleixner Signed-off-by: Thomas Gleixner Reviewed-by: Paul E. McKenney Reviewed-by: Daniel Bristot de Oliveira Acked-by: Frederic Weisbecker Cc: bigeasy@linutronix.de Cc: peterz@infradead.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180731161358.29472-1-anna-maria@linutronix.de Signed-off-by: Greg Kroah-Hartman --- kernel/time/tick-sched.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index dae1a45be504..b6bebe28a3e0 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -665,7 +665,7 @@ static void tick_nohz_restart(struct tick_sched *ts, ktime_t now) static inline bool local_timer_softirq_pending(void) { - return local_softirq_pending() & TIMER_SOFTIRQ; + return local_softirq_pending() & BIT(TIMER_SOFTIRQ); } static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts, From 4f08437d6cc3c243c868207a474496fb253c10ff Mon Sep 17 00:00:00 2001 From: Dmitry Safonov Date: Fri, 27 Jul 2018 16:54:44 +0100 Subject: [PATCH 495/970] netlink: Do not subscribe to non-existent groups [ Upstream commit 7acf9d4237c46894e0fa0492dd96314a41742e84 ] Make ABI more strict about subscribing to group > ngroups. Code doesn't check for that and it looks bogus. (one can subscribe to non-existing group) Still, it's possible to bind() to all possible groups with (-1) Cc: "David S. Miller" Cc: Herbert Xu Cc: Steffen Klassert Cc: netdev@vger.kernel.org Signed-off-by: Dmitry Safonov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/netlink/af_netlink.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 8d0aafbdbbc3..cbcc92ebe97a 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -985,6 +985,7 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr, if (err) return err; } + groups &= (1UL << nlk->ngroups) - 1; bound = nlk->bound; if (bound) { From 4d502572ea7da97550891c2b0bcd36fa4eb33401 Mon Sep 17 00:00:00 2001 From: Dmitry Safonov Date: Mon, 30 Jul 2018 18:32:36 +0100 Subject: [PATCH 496/970] netlink: Don't shift with UB on nlk->ngroups [ Upstream commit 61f4b23769f0cc72ae62c9a81cf08f0397d40da8 ] On i386 nlk->ngroups might be 32 or 0. Which leads to UB, resulting in hang during boot. Check for 0 ngroups and use (unsigned long long) as a type to shift. Fixes: 7acf9d4237c4 ("netlink: Do not subscribe to non-existent groups"). Reported-by: kernel test robot Signed-off-by: Dmitry Safonov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/netlink/af_netlink.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index cbcc92ebe97a..26ff1c7bbcfa 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -985,7 +985,11 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr, if (err) return err; } - groups &= (1UL << nlk->ngroups) - 1; + + if (nlk->ngroups == 0) + groups = 0; + else + groups &= (1ULL << nlk->ngroups) - 1; bound = nlk->bound; if (bound) { From c68c772262d9b460ddd1b1d68e06ebe3254d0ead Mon Sep 17 00:00:00 2001 From: Dmitry Safonov Date: Sun, 5 Aug 2018 01:35:53 +0100 Subject: [PATCH 497/970] netlink: Don't shift on 64 for ngroups commit 91874ecf32e41b5d86a4cb9d60e0bee50d828058 upstream. It's legal to have 64 groups for netlink_sock. As user-supplied nladdr->nl_groups is __u32, it's possible to subscribe only to first 32 groups. The check for correctness of .bind() userspace supplied parameter is done by applying mask made from ngroups shift. Which broke Android as they have 64 groups and the shift for mask resulted in an overflow. Fixes: 61f4b23769f0 ("netlink: Don't shift with UB on nlk->ngroups") Cc: "David S. Miller" Cc: Herbert Xu Cc: Steffen Klassert Cc: netdev@vger.kernel.org Cc: stable@vger.kernel.org Reported-and-Tested-by: Nathan Chancellor Signed-off-by: Dmitry Safonov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/netlink/af_netlink.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 26ff1c7bbcfa..025487436438 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -988,8 +988,8 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr, if (nlk->ngroups == 0) groups = 0; - else - groups &= (1ULL << nlk->ngroups) - 1; + else if (nlk->ngroups < 8*sizeof(groups)) + groups &= (1UL << nlk->ngroups) - 1; bound = nlk->bound; if (bound) { From 9bf8d5bf50510a6898943dbb397b7cb529301614 Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Sun, 8 Jul 2018 19:35:02 -0400 Subject: [PATCH 498/970] ext4: fix false negatives *and* false positives in ext4_check_descriptors() commit 44de022c4382541cebdd6de4465d1f4f465ff1dd upstream. Ext4_check_descriptors() was getting called before s_gdb_count was initialized. So for file systems w/o the meta_bg feature, allocation bitmaps could overlap the block group descriptors and ext4 wouldn't notice. For file systems with the meta_bg feature enabled, there was a fencepost error which would cause the ext4_check_descriptors() to incorrectly believe that the block allocation bitmap overlaps with the block group descriptor blocks, and it would reject the mount. Fix both of these problems. Signed-off-by: Theodore Ts'o Cc: stable@vger.kernel.org Signed-off-by: Benjamin Gilbert Signed-off-by: Greg Kroah-Hartman --- fs/ext4/super.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 41ef83471ea5..6cbb0f7ead2f 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -2231,7 +2231,7 @@ static int ext4_check_descriptors(struct super_block *sb, struct ext4_sb_info *sbi = EXT4_SB(sb); ext4_fsblk_t first_block = le32_to_cpu(sbi->s_es->s_first_data_block); ext4_fsblk_t last_block; - ext4_fsblk_t last_bg_block = sb_block + ext4_bg_num_gdb(sb, 0) + 1; + ext4_fsblk_t last_bg_block = sb_block + ext4_bg_num_gdb(sb, 0); ext4_fsblk_t block_bitmap; ext4_fsblk_t inode_bitmap; ext4_fsblk_t inode_table; @@ -3941,13 +3941,13 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) goto failed_mount2; } } + sbi->s_gdb_count = db_count; if (!ext4_check_descriptors(sb, logical_sb_block, &first_not_zeroed)) { ext4_msg(sb, KERN_ERR, "group descriptors corrupted!"); ret = -EFSCORRUPTED; goto failed_mount2; } - sbi->s_gdb_count = db_count; get_random_bytes(&sbi->s_next_generation, sizeof(u32)); spin_lock_init(&sbi->s_next_gen_lock); From b209a097ca39ea586588d77ce680826103020af7 Mon Sep 17 00:00:00 2001 From: Vitaly Kuznetsov Date: Thu, 14 Sep 2017 16:50:14 +0200 Subject: [PATCH 499/970] ACPI / PCI: Bail early in acpi_pci_add_bus() if there is no ACPI handle commit a0040c0145945d3bd203df8fa97f6dfa819f3f7d upstream. Hyper-V instances support PCI pass-through which is implemented through PV pci-hyperv driver. When a device is passed through, a new root PCI bus is created in the guest. The bus sits on top of VMBus and has no associated information in ACPI. acpi_pci_add_bus() in this case proceeds all the way to acpi_evaluate_dsm(), which reports ACPI: \: failed to evaluate _DSM (0x1001) While acpi_pci_slot_enumerate() and acpiphp_enumerate_slots() are protected against ACPI_HANDLE() being NULL and do nothing, acpi_evaluate_dsm() is not and gives us the error. It seems the correct fix is to not do anything in acpi_pci_add_bus() in such cases. Signed-off-by: Vitaly Kuznetsov Signed-off-by: Bjorn Helgaas Cc: Sinan Kaya Signed-off-by: Greg Kroah-Hartman --- drivers/pci/pci-acpi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c index d966d47c9e80..d38d379bb5c8 100644 --- a/drivers/pci/pci-acpi.c +++ b/drivers/pci/pci-acpi.c @@ -567,7 +567,7 @@ void acpi_pci_add_bus(struct pci_bus *bus) union acpi_object *obj; struct pci_host_bridge *bridge; - if (acpi_pci_disabled || !bus->bridge) + if (acpi_pci_disabled || !bus->bridge || !ACPI_HANDLE(bus->bridge)) return; acpi_pci_slot_enumerate(bus); From a26030a63e041c5d81df356d0f8a5d8e7bad25e3 Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Sat, 14 Jul 2018 01:28:15 +0900 Subject: [PATCH 500/970] ring_buffer: tracing: Inherit the tracing setting to next ring buffer commit 73c8d8945505acdcbae137c2e00a1232e0be709f upstream. Maintain the tracing on/off setting of the ring_buffer when switching to the trace buffer snapshot. Taking a snapshot is done by swapping the backup ring buffer (max_tr_buffer). But since the tracing on/off setting is defined by the ring buffer, when swapping it, the tracing on/off setting can also be changed. This causes a strange result like below: /sys/kernel/debug/tracing # cat tracing_on 1 /sys/kernel/debug/tracing # echo 0 > tracing_on /sys/kernel/debug/tracing # cat tracing_on 0 /sys/kernel/debug/tracing # echo 1 > snapshot /sys/kernel/debug/tracing # cat tracing_on 1 /sys/kernel/debug/tracing # echo 1 > snapshot /sys/kernel/debug/tracing # cat tracing_on 0 We don't touch tracing_on, but snapshot changes tracing_on setting each time. This is an anomaly, because user doesn't know that each "ring_buffer" stores its own tracing-enable state and the snapshot is done by swapping ring buffers. Link: http://lkml.kernel.org/r/153149929558.11274.11730609978254724394.stgit@devbox Cc: Ingo Molnar Cc: Shuah Khan Cc: Tom Zanussi Cc: Hiraku Toyooka Cc: stable@vger.kernel.org Fixes: debdd57f5145 ("tracing: Make a snapshot feature available from userspace") Signed-off-by: Masami Hiramatsu [ Updated commit log and comment in the code ] Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Sudip Mukherjee Signed-off-by: Greg Kroah-Hartman --- include/linux/ring_buffer.h | 1 + kernel/trace/ring_buffer.c | 16 ++++++++++++++++ kernel/trace/trace.c | 6 ++++++ 3 files changed, 23 insertions(+) diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h index 4acc552e9279..19d0778ec382 100644 --- a/include/linux/ring_buffer.h +++ b/include/linux/ring_buffer.h @@ -162,6 +162,7 @@ void ring_buffer_record_enable(struct ring_buffer *buffer); void ring_buffer_record_off(struct ring_buffer *buffer); void ring_buffer_record_on(struct ring_buffer *buffer); int ring_buffer_record_is_on(struct ring_buffer *buffer); +int ring_buffer_record_is_set_on(struct ring_buffer *buffer); void ring_buffer_record_disable_cpu(struct ring_buffer *buffer, int cpu); void ring_buffer_record_enable_cpu(struct ring_buffer *buffer, int cpu); diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 3e1d11f4fe44..dc29b600d2cb 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -3136,6 +3136,22 @@ int ring_buffer_record_is_on(struct ring_buffer *buffer) return !atomic_read(&buffer->record_disabled); } +/** + * ring_buffer_record_is_set_on - return true if the ring buffer is set writable + * @buffer: The ring buffer to see if write is set enabled + * + * Returns true if the ring buffer is set writable by ring_buffer_record_on(). + * Note that this does NOT mean it is in a writable state. + * + * It may return true when the ring buffer has been disabled by + * ring_buffer_record_disable(), as that is a temporary disabling of + * the ring buffer. + */ +int ring_buffer_record_is_set_on(struct ring_buffer *buffer) +{ + return !(atomic_read(&buffer->record_disabled) & RB_BUFFER_OFF); +} + /** * ring_buffer_record_disable_cpu - stop all writes into the cpu_buffer * @buffer: The ring buffer to stop writes to. diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 15b02645ce8b..901c7f15f6e2 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -1323,6 +1323,12 @@ update_max_tr(struct trace_array *tr, struct task_struct *tsk, int cpu) arch_spin_lock(&tr->max_lock); + /* Inherit the recordable setting from trace_buffer */ + if (ring_buffer_record_is_set_on(tr->trace_buffer.buffer)) + ring_buffer_record_on(tr->max_buffer.buffer); + else + ring_buffer_record_off(tr->max_buffer.buffer); + buf = tr->trace_buffer.buffer; tr->trace_buffer.buffer = tr->max_buffer.buffer; tr->max_buffer.buffer = buf; From 7f8d5ff5eadc9cb90c8af1a3b4177ee6abe5eed3 Mon Sep 17 00:00:00 2001 From: Esben Haabendal Date: Mon, 9 Jul 2018 11:43:01 +0200 Subject: [PATCH 501/970] i2c: imx: Fix reinit_completion() use MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 9f9e3e0d4dd3338b3f3dde080789f71901e1e4ff upstream. Make sure to call reinit_completion() before dma is started to avoid race condition where reinit_completion() is called after complete() and before wait_for_completion_timeout(). Signed-off-by: Esben Haabendal Fixes: ce1a78840ff7 ("i2c: imx: add DMA support for freescale i2c driver") Reviewed-by: Uwe Kleine-König Signed-off-by: Wolfram Sang Cc: stable@kernel.org Signed-off-by: Sudip Mukherjee Signed-off-by: Greg Kroah-Hartman --- drivers/i2c/busses/i2c-imx.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/i2c/busses/i2c-imx.c b/drivers/i2c/busses/i2c-imx.c index 47fc1f1acff7..4d297d554e52 100644 --- a/drivers/i2c/busses/i2c-imx.c +++ b/drivers/i2c/busses/i2c-imx.c @@ -376,6 +376,7 @@ static int i2c_imx_dma_xfer(struct imx_i2c_struct *i2c_imx, goto err_desc; } + reinit_completion(&dma->cmd_complete); txdesc->callback = i2c_imx_dma_callback; txdesc->callback_param = i2c_imx; if (dma_submit_error(dmaengine_submit(txdesc))) { @@ -619,7 +620,6 @@ static int i2c_imx_dma_write(struct imx_i2c_struct *i2c_imx, * The first byte must be transmitted by the CPU. */ imx_i2c_write_reg(msgs->addr << 1, i2c_imx, IMX_I2C_I2DR); - reinit_completion(&i2c_imx->dma->cmd_complete); time_left = wait_for_completion_timeout( &i2c_imx->dma->cmd_complete, msecs_to_jiffies(DMA_TIMEOUT)); @@ -678,7 +678,6 @@ static int i2c_imx_dma_read(struct imx_i2c_struct *i2c_imx, if (result) return result; - reinit_completion(&i2c_imx->dma->cmd_complete); time_left = wait_for_completion_timeout( &i2c_imx->dma->cmd_complete, msecs_to_jiffies(DMA_TIMEOUT)); From b2486a81f6c1d8c79960574c06c1da31a034a95b Mon Sep 17 00:00:00 2001 From: Filipe Manana Date: Thu, 12 Jul 2018 01:36:43 +0100 Subject: [PATCH 502/970] Btrfs: fix file data corruption after cloning a range and fsync commit bd3599a0e142cd73edd3b6801068ac3f48ac771a upstream. When we clone a range into a file we can end up dropping existing extent maps (or trimming them) and replacing them with new ones if the range to be cloned overlaps with a range in the destination inode. When that happens we add the new extent maps to the list of modified extents in the inode's extent map tree, so that a "fast" fsync (the flag BTRFS_INODE_NEEDS_FULL_SYNC not set in the inode) will see the extent maps and log corresponding extent items. However, at the end of range cloning operation we do truncate all the pages in the affected range (in order to ensure future reads will not get stale data). Sometimes this truncation will release the corresponding extent maps besides the pages from the page cache. If this happens, then a "fast" fsync operation will miss logging some extent items, because it relies exclusively on the extent maps being present in the inode's extent tree, leading to data loss/corruption if the fsync ends up using the same transaction used by the clone operation (that transaction was not committed in the meanwhile). An extent map is released through the callback btrfs_invalidatepage(), which gets called by truncate_inode_pages_range(), and it calls __btrfs_releasepage(). The later ends up calling try_release_extent_mapping() which will release the extent map if some conditions are met, like the file size being greater than 16Mb, gfp flags allow blocking and the range not being locked (which is the case during the clone operation) nor being the extent map flagged as pinned (also the case for cloning). The following example, turned into a test for fstests, reproduces the issue: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ xfs_io -f -c "pwrite -S 0x18 9000K 6908K" /mnt/foo $ xfs_io -f -c "pwrite -S 0x20 2572K 156K" /mnt/bar $ xfs_io -c "fsync" /mnt/bar # reflink destination offset corresponds to the size of file bar, # 2728Kb minus 4Kb. $ xfs_io -c ""reflink ${SCRATCH_MNT}/foo 0 2724K 15908K" /mnt/bar $ xfs_io -c "fsync" /mnt/bar $ md5sum /mnt/bar 95a95813a8c2abc9aa75a6c2914a077e /mnt/bar $ mount /dev/sdb /mnt $ md5sum /mnt/bar 207fd8d0b161be8a84b945f0df8d5f8d /mnt/bar # digest should be 95a95813a8c2abc9aa75a6c2914a077e like before the # power failure In the above example, the destination offset of the clone operation corresponds to the size of the "bar" file minus 4Kb. So during the clone operation, the extent map covering the range from 2572Kb to 2728Kb gets trimmed so that it ends at offset 2724Kb, and a new extent map covering the range from 2724Kb to 11724Kb is created. So at the end of the clone operation when we ask to truncate the pages in the range from 2724Kb to 2724Kb + 15908Kb, the page invalidation callback ends up removing the new extent map (through try_release_extent_mapping()) when the page at offset 2724Kb is passed to that callback. Fix this by setting the bit BTRFS_INODE_NEEDS_FULL_SYNC whenever an extent map is removed at try_release_extent_mapping(), forcing the next fsync to search for modified extents in the fs/subvolume tree instead of relying on the presence of extent maps in memory. This way we can continue doing a "fast" fsync if the destination range of a clone operation does not overlap with an existing range or if any of the criteria necessary to remove an extent map at try_release_extent_mapping() is not met (file size not bigger then 16Mb or gfp flags do not allow blocking). CC: stable@vger.kernel.org # 3.16+ Signed-off-by: Filipe Manana Signed-off-by: David Sterba Signed-off-by: Sudip Mukherjee Signed-off-by: Greg Kroah-Hartman --- fs/btrfs/extent_io.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 03ac3ab4b3b4..2b96ca68dc10 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -4298,6 +4298,7 @@ int try_release_extent_mapping(struct extent_map_tree *map, struct extent_map *em; u64 start = page_offset(page); u64 end = start + PAGE_SIZE - 1; + struct btrfs_inode *btrfs_inode = BTRFS_I(page->mapping->host); if (gfpflags_allow_blocking(mask) && page->mapping->host->i_size > SZ_16M) { @@ -4320,6 +4321,8 @@ int try_release_extent_mapping(struct extent_map_tree *map, extent_map_end(em) - 1, EXTENT_LOCKED | EXTENT_WRITEBACK, 0, NULL)) { + set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, + &btrfs_inode->runtime_flags); remove_extent_mapping(map, em); /* once for the rb tree */ free_extent_map(em); From 36ee106e844187e3fc612c9b87f12e5e23e9d8a5 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 23 Jul 2018 09:28:21 -0700 Subject: [PATCH 503/970] tcp: add tcp_ooo_try_coalesce() helper commit 58152ecbbcc6a0ce7fddd5bf5f6ee535834ece0c upstream. In case skb in out_or_order_queue is the result of multiple skbs coalescing, we would like to get a proper gso_segs counter tracking, so that future tcp_drop() can report an accurate number. I chose to not implement this tracking for skbs in receive queue, since they are not dropped, unless socket is disconnected. Signed-off-by: Eric Dumazet Acked-by: Soheil Hassas Yeganeh Acked-by: Yuchung Cheng Signed-off-by: David S. Miller Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- net/ipv4/tcp_input.c | 23 +++++++++++++++++++++-- 1 file changed, 21 insertions(+), 2 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index a9be8df108b4..9d0b73aa649f 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4370,6 +4370,23 @@ static bool tcp_try_coalesce(struct sock *sk, return true; } +static bool tcp_ooo_try_coalesce(struct sock *sk, + struct sk_buff *to, + struct sk_buff *from, + bool *fragstolen) +{ + bool res = tcp_try_coalesce(sk, to, from, fragstolen); + + /* In case tcp_drop() is called later, update to->gso_segs */ + if (res) { + u32 gso_segs = max_t(u16, 1, skb_shinfo(to)->gso_segs) + + max_t(u16, 1, skb_shinfo(from)->gso_segs); + + skb_shinfo(to)->gso_segs = min_t(u32, gso_segs, 0xFFFF); + } + return res; +} + static void tcp_drop(struct sock *sk, struct sk_buff *skb) { sk_drops_add(sk, skb); @@ -4493,7 +4510,8 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb) /* In the typical case, we are adding an skb to the end of the list. * Use of ooo_last_skb avoids the O(Log(N)) rbtree lookup. */ - if (tcp_try_coalesce(sk, tp->ooo_last_skb, skb, &fragstolen)) { + if (tcp_ooo_try_coalesce(sk, tp->ooo_last_skb, + skb, &fragstolen)) { coalesce_done: tcp_grow_window(sk, skb); kfree_skb_partial(skb, fragstolen); @@ -4543,7 +4561,8 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb) tcp_drop(sk, skb1); goto merge_right; } - } else if (tcp_try_coalesce(sk, skb1, skb, &fragstolen)) { + } else if (tcp_ooo_try_coalesce(sk, skb1, + skb, &fragstolen)) { goto coalesce_done; } p = &parent->rb_right; From 885b49b4f31fdec212e6c5e9ad0845fab266d3cf Mon Sep 17 00:00:00 2001 From: Konstantin Khlebnikov Date: Fri, 13 Oct 2017 15:58:22 -0700 Subject: [PATCH 504/970] kmemleak: clear stale pointers from task stacks commit ca182551857cc2c1e6a2b7f1e72090a137a15008 upstream. Kmemleak considers any pointers on task stacks as references. This patch clears newly allocated and reused vmap stacks. Link: http://lkml.kernel.org/r/150728990124.744199.8403409836394318684.stgit@buzz Signed-off-by: Konstantin Khlebnikov Acked-by: Catalin Marinas Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [ Srivatsa: Backported to 4.9.y ] Signed-off-by: Srivatsa S. Bhat Signed-off-by: Greg Kroah-Hartman --- include/linux/thread_info.h | 2 +- kernel/fork.c | 4 ++++ 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h index 2873baf5372a..cf87c162f4eb 100644 --- a/include/linux/thread_info.h +++ b/include/linux/thread_info.h @@ -59,7 +59,7 @@ extern long do_no_restart_syscall(struct restart_block *parm); #ifdef __KERNEL__ -#ifdef CONFIG_DEBUG_STACK_USAGE +#if IS_ENABLED(CONFIG_DEBUG_STACK_USAGE) || IS_ENABLED(CONFIG_DEBUG_KMEMLEAK) # define THREADINFO_GFP (GFP_KERNEL_ACCOUNT | __GFP_NOTRACK | \ __GFP_ZERO) #else diff --git a/kernel/fork.c b/kernel/fork.c index 70e10cb49be0..c19e6d48d57d 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -184,6 +184,10 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node) continue; this_cpu_write(cached_stacks[i], NULL); +#ifdef CONFIG_DEBUG_KMEMLEAK + /* Clear stale pointers from reused stack. */ + memset(s->addr, 0, THREAD_SIZE); +#endif tsk->stack_vm_area = s; local_irq_enable(); return s->addr; From 6a19e26f11fb4192ec92d1f22ae8a1c252ebc272 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Fri, 20 Apr 2018 14:55:31 -0700 Subject: [PATCH 505/970] fork: unconditionally clear stack on fork commit e01e80634ecdde1dd113ac43b3adad21b47f3957 upstream. One of the classes of kernel stack content leaks[1] is exposing the contents of prior heap or stack contents when a new process stack is allocated. Normally, those stacks are not zeroed, and the old contents remain in place. In the face of stack content exposure flaws, those contents can leak to userspace. Fixing this will make the kernel no longer vulnerable to these flaws, as the stack will be wiped each time a stack is assigned to a new process. There's not a meaningful change in runtime performance; it almost looks like it provides a benefit. Performing back-to-back kernel builds before: Run times: 157.86 157.09 158.90 160.94 160.80 Mean: 159.12 Std Dev: 1.54 and after: Run times: 159.31 157.34 156.71 158.15 160.81 Mean: 158.46 Std Dev: 1.46 Instead of making this a build or runtime config, Andy Lutomirski recommended this just be enabled by default. [1] A noisy search for many kinds of stack content leaks can be seen here: https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=linux+kernel+stack+leak I did some more with perf and cycle counts on running 100,000 execs of /bin/true. before: Cycles: 218858861551 218853036130 214727610969 227656844122 224980542841 Mean: 221015379122.60 Std Dev: 4662486552.47 after: Cycles: 213868945060 213119275204 211820169456 224426673259 225489986348 Mean: 217745009865.40 Std Dev: 5935559279.99 It continues to look like it's faster, though the deviation is rather wide, but I'm not sure what I could do that would be less noisy. I'm open to ideas! Link: http://lkml.kernel.org/r/20180221021659.GA37073@beast Signed-off-by: Kees Cook Acked-by: Michal Hocko Reviewed-by: Andrew Morton Cc: Andy Lutomirski Cc: Laura Abbott Cc: Rasmus Villemoes Cc: Mel Gorman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [ Srivatsa: Backported to 4.9.y ] Signed-off-by: Srivatsa S. Bhat Reviewed-by: Srinidhi Rao Signed-off-by: Greg Kroah-Hartman --- include/linux/thread_info.h | 7 +------ kernel/fork.c | 3 +-- 2 files changed, 2 insertions(+), 8 deletions(-) diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h index cf87c162f4eb..5e6436741f96 100644 --- a/include/linux/thread_info.h +++ b/include/linux/thread_info.h @@ -59,12 +59,7 @@ extern long do_no_restart_syscall(struct restart_block *parm); #ifdef __KERNEL__ -#if IS_ENABLED(CONFIG_DEBUG_STACK_USAGE) || IS_ENABLED(CONFIG_DEBUG_KMEMLEAK) -# define THREADINFO_GFP (GFP_KERNEL_ACCOUNT | __GFP_NOTRACK | \ - __GFP_ZERO) -#else -# define THREADINFO_GFP (GFP_KERNEL_ACCOUNT | __GFP_NOTRACK) -#endif +#define THREADINFO_GFP (GFP_KERNEL_ACCOUNT | __GFP_NOTRACK | __GFP_ZERO) /* * flag set/clear/test wrappers diff --git a/kernel/fork.c b/kernel/fork.c index c19e6d48d57d..2c98b987808d 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -184,10 +184,9 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node) continue; this_cpu_write(cached_stacks[i], NULL); -#ifdef CONFIG_DEBUG_KMEMLEAK /* Clear stale pointers from reused stack. */ memset(s->addr, 0, THREAD_SIZE); -#endif + tsk->stack_vm_area = s; local_irq_enable(); return s->addr; From 34a5bbbb6ded0f1195cefc8d041bb51c99d3c242 Mon Sep 17 00:00:00 2001 From: "Michael J. Ruhl" Date: Wed, 20 Jun 2018 09:29:08 -0700 Subject: [PATCH 506/970] IB/hfi1: Fix incorrect mixing of ERR_PTR and NULL return values commit b697d7d8c741f27b728a878fc55852b06d0f6f5e upstream. The __get_txreq() function can return a pointer, ERR_PTR(-EBUSY), or NULL. All of the relevant call sites look for IS_ERR, so the NULL return would lead to a NULL pointer exception. Do not use the ERR_PTR mechanism for this function. Update all call sites to handle the return value correctly. Clean up error paths to reflect return value. Fixes: 45842abbb292 ("staging/rdma/hfi1: move txreq header code") Cc: # 4.9.x+ Reported-by: Dan Carpenter Reviewed-by: Mike Marciniszyn Reviewed-by: Kamenee Arumugam Signed-off-by: Michael J. Ruhl Signed-off-by: Dennis Dalessandro Signed-off-by: Jason Gunthorpe Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/hw/hfi1/rc.c | 2 +- drivers/infiniband/hw/hfi1/uc.c | 4 ++-- drivers/infiniband/hw/hfi1/ud.c | 4 ++-- drivers/infiniband/hw/hfi1/verbs_txreq.c | 4 ++-- drivers/infiniband/hw/hfi1/verbs_txreq.h | 4 ++-- 5 files changed, 9 insertions(+), 9 deletions(-) diff --git a/drivers/infiniband/hw/hfi1/rc.c b/drivers/infiniband/hw/hfi1/rc.c index 613074e963bb..e8e0fa58cb71 100644 --- a/drivers/infiniband/hw/hfi1/rc.c +++ b/drivers/infiniband/hw/hfi1/rc.c @@ -397,7 +397,7 @@ int hfi1_make_rc_req(struct rvt_qp *qp, struct hfi1_pkt_state *ps) lockdep_assert_held(&qp->s_lock); ps->s_txreq = get_txreq(ps->dev, qp); - if (IS_ERR(ps->s_txreq)) + if (!ps->s_txreq) goto bail_no_tx; ohdr = &ps->s_txreq->phdr.hdr.u.oth; diff --git a/drivers/infiniband/hw/hfi1/uc.c b/drivers/infiniband/hw/hfi1/uc.c index 5e6d1bac4914..de21128a0181 100644 --- a/drivers/infiniband/hw/hfi1/uc.c +++ b/drivers/infiniband/hw/hfi1/uc.c @@ -1,5 +1,5 @@ /* - * Copyright(c) 2015, 2016 Intel Corporation. + * Copyright(c) 2015 - 2018 Intel Corporation. * * This file is provided under a dual BSD/GPLv2 license. When using or * redistributing this file, you may do so under either license. @@ -72,7 +72,7 @@ int hfi1_make_uc_req(struct rvt_qp *qp, struct hfi1_pkt_state *ps) int middle = 0; ps->s_txreq = get_txreq(ps->dev, qp); - if (IS_ERR(ps->s_txreq)) + if (!ps->s_txreq) goto bail_no_tx; if (!(ib_rvt_state_ops[qp->state] & RVT_PROCESS_SEND_OK)) { diff --git a/drivers/infiniband/hw/hfi1/ud.c b/drivers/infiniband/hw/hfi1/ud.c index 97ae24b6314c..1a7ce1d740ce 100644 --- a/drivers/infiniband/hw/hfi1/ud.c +++ b/drivers/infiniband/hw/hfi1/ud.c @@ -1,5 +1,5 @@ /* - * Copyright(c) 2015, 2016 Intel Corporation. + * Copyright(c) 2015 - 2018 Intel Corporation. * * This file is provided under a dual BSD/GPLv2 license. When using or * redistributing this file, you may do so under either license. @@ -285,7 +285,7 @@ int hfi1_make_ud_req(struct rvt_qp *qp, struct hfi1_pkt_state *ps) u8 sc5; ps->s_txreq = get_txreq(ps->dev, qp); - if (IS_ERR(ps->s_txreq)) + if (!ps->s_txreq) goto bail_no_tx; if (!(ib_rvt_state_ops[qp->state] & RVT_PROCESS_NEXT_SEND_OK)) { diff --git a/drivers/infiniband/hw/hfi1/verbs_txreq.c b/drivers/infiniband/hw/hfi1/verbs_txreq.c index 094ab829ec42..d8a5bad49680 100644 --- a/drivers/infiniband/hw/hfi1/verbs_txreq.c +++ b/drivers/infiniband/hw/hfi1/verbs_txreq.c @@ -1,5 +1,5 @@ /* - * Copyright(c) 2016 Intel Corporation. + * Copyright(c) 2016 - 2018 Intel Corporation. * * This file is provided under a dual BSD/GPLv2 license. When using or * redistributing this file, you may do so under either license. @@ -94,7 +94,7 @@ struct verbs_txreq *__get_txreq(struct hfi1_ibdev *dev, struct rvt_qp *qp) __must_hold(&qp->s_lock) { - struct verbs_txreq *tx = ERR_PTR(-EBUSY); + struct verbs_txreq *tx = NULL; write_seqlock(&dev->iowait_lock); if (ib_rvt_state_ops[qp->state] & RVT_PROCESS_RECV_OK) { diff --git a/drivers/infiniband/hw/hfi1/verbs_txreq.h b/drivers/infiniband/hw/hfi1/verbs_txreq.h index 5660897593ba..31ded57592ee 100644 --- a/drivers/infiniband/hw/hfi1/verbs_txreq.h +++ b/drivers/infiniband/hw/hfi1/verbs_txreq.h @@ -1,5 +1,5 @@ /* - * Copyright(c) 2016 Intel Corporation. + * Copyright(c) 2016 - 2018 Intel Corporation. * * This file is provided under a dual BSD/GPLv2 license. When using or * redistributing this file, you may do so under either license. @@ -82,7 +82,7 @@ static inline struct verbs_txreq *get_txreq(struct hfi1_ibdev *dev, if (unlikely(!tx)) { /* call slow path to get the lock */ tx = __get_txreq(dev, qp); - if (IS_ERR(tx)) + if (!tx) return tx; } tx->qp = qp; From 240d46556d5961c7100febbee0e058185b3c8d4f Mon Sep 17 00:00:00 2001 From: Shankara Pailoor Date: Tue, 5 Jun 2018 08:33:27 -0500 Subject: [PATCH 507/970] jfs: Fix inconsistency between memory allocation and ea_buf->max_size commit 92d34134193e5b129dc24f8d79cb9196626e8d7a upstream. The code is assuming the buffer is max_size length, but we weren't allocating enough space for it. Signed-off-by: Shankara Pailoor Signed-off-by: Dave Kleikamp Cc: Guenter Roeck Signed-off-by: Greg Kroah-Hartman --- fs/jfs/xattr.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/fs/jfs/xattr.c b/fs/jfs/xattr.c index c60f3d32ee91..a6797986b625 100644 --- a/fs/jfs/xattr.c +++ b/fs/jfs/xattr.c @@ -491,15 +491,17 @@ static int ea_get(struct inode *inode, struct ea_buffer *ea_buf, int min_size) if (size > PSIZE) { /* * To keep the rest of the code simple. Allocate a - * contiguous buffer to work with + * contiguous buffer to work with. Make the buffer large + * enough to make use of the whole extent. */ - ea_buf->xattr = kmalloc(size, GFP_KERNEL); + ea_buf->max_size = (size + sb->s_blocksize - 1) & + ~(sb->s_blocksize - 1); + + ea_buf->xattr = kmalloc(ea_buf->max_size, GFP_KERNEL); if (ea_buf->xattr == NULL) return -ENOMEM; ea_buf->flag = EA_MALLOC; - ea_buf->max_size = (size + sb->s_blocksize - 1) & - ~(sb->s_blocksize - 1); if (ea_size == 0) return 0; From 8f21ecb4249a0914aea08bef1befca9019a3b44b Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Thu, 9 Aug 2018 12:18:00 +0200 Subject: [PATCH 508/970] Linux 4.9.119 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 0940f11fa071..0723bbe1d4a7 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 4 PATCHLEVEL = 9 -SUBLEVEL = 118 +SUBLEVEL = 119 EXTRAVERSION = NAME = Roaring Lionus From 954e572ae2f26ec98a4d8c1c04ab91798ccace75 Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Sat, 28 Jul 2018 08:12:04 -0400 Subject: [PATCH 509/970] ext4: fix check to prevent initializing reserved inodes commit 5012284700775a4e6e3fbe7eac4c543c4874b559 upstream. Commit 8844618d8aa7: "ext4: only look at the bg_flags field if it is valid" will complain if block group zero does not have the EXT4_BG_INODE_ZEROED flag set. Unfortunately, this is not correct, since a freshly created file system has this flag cleared. It gets almost immediately after the file system is mounted read-write --- but the following somewhat unlikely sequence will end up triggering a false positive report of a corrupted file system: mkfs.ext4 /dev/vdc mount -o ro /dev/vdc /vdc mount -o remount,rw /dev/vdc Instead, when initializing the inode table for block group zero, test to make sure that itable_unused count is not too large, since that is the case that will result in some or all of the reserved inodes getting cleared. This fixes the failures reported by Eric Whiteney when running generic/230 and generic/231 in the the nojournal test case. Fixes: 8844618d8aa7 ("ext4: only look at the bg_flags field if it is valid") Reported-by: Eric Whitney Signed-off-by: Theodore Ts'o Signed-off-by: Greg Kroah-Hartman --- fs/ext4/ialloc.c | 5 ++++- fs/ext4/super.c | 8 +------- 2 files changed, 5 insertions(+), 8 deletions(-) diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index ffaf66a51de3..4f78e099de1d 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -1316,7 +1316,10 @@ int ext4_init_inode_table(struct super_block *sb, ext4_group_t group, ext4_itable_unused_count(sb, gdp)), sbi->s_inodes_per_block); - if ((used_blks < 0) || (used_blks > sbi->s_itb_per_group)) { + if ((used_blks < 0) || (used_blks > sbi->s_itb_per_group) || + ((group == 0) && ((EXT4_INODES_PER_GROUP(sb) - + ext4_itable_unused_count(sb, gdp)) < + EXT4_FIRST_INO(sb)))) { ext4_error(sb, "Something is wrong with group %u: " "used itable blocks: %d; " "itable unused count: %u", diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 6cbb0f7ead2f..9d44b3683b46 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -3031,14 +3031,8 @@ static ext4_group_t ext4_has_uninit_itable(struct super_block *sb) if (!gdp) continue; - if (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_ZEROED)) - continue; - if (group != 0) + if (!(gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_ZEROED))) break; - ext4_error(sb, "Inode table for bg 0 marked as " - "needing zeroing"); - if (sb->s_flags & MS_RDONLY) - return ngroups; } return group; From 1d4167a818e6f4a749b5f6ce54f98176276ed85b Mon Sep 17 00:00:00 2001 From: Tadeusz Struk Date: Tue, 22 May 2018 14:37:18 -0700 Subject: [PATCH 510/970] tpm: fix race condition in tpm_common_write() commit 3ab2011ea368ec3433ad49e1b9e1c7b70d2e65df upstream. There is a race condition in tpm_common_write function allowing two threads on the same /dev/tpm, or two different applications on the same /dev/tpmrm to overwrite each other commands/responses. Fixed this by taking the priv->buffer_mutex early in the function. Also converted the priv->data_pending from atomic to a regular size_t type. There is no need for it to be atomic since it is only touched under the protection of the priv->buffer_mutex. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Tadeusz Struk Reviewed-by: Jarkko Sakkinen Signed-off-by: Jarkko Sakkinen Signed-off-by: Greg Kroah-Hartman --- drivers/char/tpm/tpm-dev.c | 43 ++++++++++++++++++-------------------- 1 file changed, 20 insertions(+), 23 deletions(-) diff --git a/drivers/char/tpm/tpm-dev.c b/drivers/char/tpm/tpm-dev.c index 65b824954bdc..1662e4688ee2 100644 --- a/drivers/char/tpm/tpm-dev.c +++ b/drivers/char/tpm/tpm-dev.c @@ -25,7 +25,7 @@ struct file_priv { struct tpm_chip *chip; /* Data passed to and from the tpm via the read/write calls */ - atomic_t data_pending; + size_t data_pending; struct mutex buffer_mutex; struct timer_list user_read_timer; /* user needs to claim result */ @@ -46,7 +46,7 @@ static void timeout_work(struct work_struct *work) struct file_priv *priv = container_of(work, struct file_priv, work); mutex_lock(&priv->buffer_mutex); - atomic_set(&priv->data_pending, 0); + priv->data_pending = 0; memset(priv->data_buffer, 0, sizeof(priv->data_buffer)); mutex_unlock(&priv->buffer_mutex); } @@ -72,7 +72,6 @@ static int tpm_open(struct inode *inode, struct file *file) } priv->chip = chip; - atomic_set(&priv->data_pending, 0); mutex_init(&priv->buffer_mutex); setup_timer(&priv->user_read_timer, user_reader_timeout, (unsigned long)priv); @@ -86,28 +85,24 @@ static ssize_t tpm_read(struct file *file, char __user *buf, size_t size, loff_t *off) { struct file_priv *priv = file->private_data; - ssize_t ret_size; + ssize_t ret_size = 0; int rc; del_singleshot_timer_sync(&priv->user_read_timer); flush_work(&priv->work); - ret_size = atomic_read(&priv->data_pending); - if (ret_size > 0) { /* relay data */ - ssize_t orig_ret_size = ret_size; - if (size < ret_size) - ret_size = size; + mutex_lock(&priv->buffer_mutex); - mutex_lock(&priv->buffer_mutex); + if (priv->data_pending) { + ret_size = min_t(ssize_t, size, priv->data_pending); rc = copy_to_user(buf, priv->data_buffer, ret_size); - memset(priv->data_buffer, 0, orig_ret_size); + memset(priv->data_buffer, 0, priv->data_pending); if (rc) ret_size = -EFAULT; - mutex_unlock(&priv->buffer_mutex); + priv->data_pending = 0; } - atomic_set(&priv->data_pending, 0); - + mutex_unlock(&priv->buffer_mutex); return ret_size; } @@ -118,18 +113,20 @@ static ssize_t tpm_write(struct file *file, const char __user *buf, size_t in_size = size; ssize_t out_size; - /* cannot perform a write until the read has cleared - either via tpm_read or a user_read_timer timeout. - This also prevents splitted buffered writes from blocking here. - */ - if (atomic_read(&priv->data_pending) != 0) - return -EBUSY; - if (in_size > TPM_BUFSIZE) return -E2BIG; mutex_lock(&priv->buffer_mutex); + /* Cannot perform a write until the read has cleared either via + * tpm_read or a user_read_timer timeout. This also prevents split + * buffered writes from blocking here. + */ + if (priv->data_pending != 0) { + mutex_unlock(&priv->buffer_mutex); + return -EBUSY; + } + if (copy_from_user (priv->data_buffer, (void __user *) buf, in_size)) { mutex_unlock(&priv->buffer_mutex); @@ -159,7 +156,7 @@ static ssize_t tpm_write(struct file *file, const char __user *buf, return out_size; } - atomic_set(&priv->data_pending, out_size); + priv->data_pending = out_size; mutex_unlock(&priv->buffer_mutex); /* Set a timeout by which the reader must come claim the result */ @@ -178,7 +175,7 @@ static int tpm_release(struct inode *inode, struct file *file) del_singleshot_timer_sync(&priv->user_read_timer); flush_work(&priv->work); file->private_data = NULL; - atomic_set(&priv->data_pending, 0); + priv->data_pending = 0; clear_bit(0, &priv->chip->is_open); kfree(priv); return 0; From 5f394c9ef67234750fe277bb9bfdcf99ebee41d8 Mon Sep 17 00:00:00 2001 From: Helge Deller Date: Sat, 28 Jul 2018 11:47:17 +0200 Subject: [PATCH 511/970] parisc: Enable CONFIG_MLONGCALLS by default commit 66509a276c8c1d19ee3f661a41b418d101c57d29 upstream. Enable the -mlong-calls compiler option by default, because otherwise in most cases linking the vmlinux binary fails due to truncations of R_PARISC_PCREL22F relocations. This fixes building the 64-bit defconfig. Cc: stable@vger.kernel.org # 4.0+ Signed-off-by: Helge Deller Signed-off-by: Greg Kroah-Hartman --- arch/parisc/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig index a14b86587013..3c37af11dab6 100644 --- a/arch/parisc/Kconfig +++ b/arch/parisc/Kconfig @@ -184,7 +184,7 @@ config PREFETCH config MLONGCALLS bool "Enable the -mlong-calls compiler option for big kernels" - def_bool y if (!MODULES) + default y depends on PA8X00 help If you configure the kernel to include many drivers built-in instead From 2106b21a8a59acd74cf5e473f71e75fdf03e3b99 Mon Sep 17 00:00:00 2001 From: John David Anglin Date: Sun, 5 Aug 2018 13:30:31 -0400 Subject: [PATCH 512/970] parisc: Define mb() and add memory barriers to assembler unlock sequences MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit fedb8da96355f5f64353625bf96dc69423ad1826 upstream. For years I thought all parisc machines executed loads and stores in order. However, Jeff Law recently indicated on gcc-patches that this is not correct. There are various degrees of out-of-order execution all the way back to the PA7xxx processor series (hit-under-miss). The PA8xxx series has full out-of-order execution for both integer operations, and loads and stores. This is described in the following article: http://web.archive.org/web/20040214092531/http://www.cpus.hp.com/technical_references/advperf.shtml For this reason, we need to define mb() and to insert a memory barrier before the store unlocking spinlocks. This ensures that all memory accesses are complete prior to unlocking. The ldcw instruction performs the same function on entry. Signed-off-by: John David Anglin Cc: stable@vger.kernel.org # 4.0+ Signed-off-by: Helge Deller Signed-off-by: Greg Kroah-Hartman --- arch/parisc/include/asm/barrier.h | 32 +++++++++++++++++++++++++++++++ arch/parisc/kernel/entry.S | 2 ++ arch/parisc/kernel/pacache.S | 1 + arch/parisc/kernel/syscall.S | 4 ++++ 4 files changed, 39 insertions(+) create mode 100644 arch/parisc/include/asm/barrier.h diff --git a/arch/parisc/include/asm/barrier.h b/arch/parisc/include/asm/barrier.h new file mode 100644 index 000000000000..dbaaca84f27f --- /dev/null +++ b/arch/parisc/include/asm/barrier.h @@ -0,0 +1,32 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __ASM_BARRIER_H +#define __ASM_BARRIER_H + +#ifndef __ASSEMBLY__ + +/* The synchronize caches instruction executes as a nop on systems in + which all memory references are performed in order. */ +#define synchronize_caches() __asm__ __volatile__ ("sync" : : : "memory") + +#if defined(CONFIG_SMP) +#define mb() do { synchronize_caches(); } while (0) +#define rmb() mb() +#define wmb() mb() +#define dma_rmb() mb() +#define dma_wmb() mb() +#else +#define mb() barrier() +#define rmb() barrier() +#define wmb() barrier() +#define dma_rmb() barrier() +#define dma_wmb() barrier() +#endif + +#define __smp_mb() mb() +#define __smp_rmb() mb() +#define __smp_wmb() mb() + +#include + +#endif /* !__ASSEMBLY__ */ +#endif /* __ASM_BARRIER_H */ diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S index e3d3e8e1d708..015614405755 100644 --- a/arch/parisc/kernel/entry.S +++ b/arch/parisc/kernel/entry.S @@ -481,6 +481,8 @@ /* Release pa_tlb_lock lock without reloading lock address. */ .macro tlb_unlock0 spc,tmp #ifdef CONFIG_SMP + or,COND(=) %r0,\spc,%r0 + sync or,COND(=) %r0,\spc,%r0 stw \spc,0(\tmp) #endif diff --git a/arch/parisc/kernel/pacache.S b/arch/parisc/kernel/pacache.S index 67b0f7532e83..3e163df49cf3 100644 --- a/arch/parisc/kernel/pacache.S +++ b/arch/parisc/kernel/pacache.S @@ -354,6 +354,7 @@ ENDPROC_CFI(flush_data_cache_local) .macro tlb_unlock la,flags,tmp #ifdef CONFIG_SMP ldi 1,\tmp + sync stw \tmp,0(\la) mtsm \flags #endif diff --git a/arch/parisc/kernel/syscall.S b/arch/parisc/kernel/syscall.S index e775f80ae28c..4886a6db42e9 100644 --- a/arch/parisc/kernel/syscall.S +++ b/arch/parisc/kernel/syscall.S @@ -633,6 +633,7 @@ cas_action: sub,<> %r28, %r25, %r0 2: stw,ma %r24, 0(%r26) /* Free lock */ + sync stw,ma %r20, 0(%sr2,%r20) #if ENABLE_LWS_DEBUG /* Clear thread register indicator */ @@ -647,6 +648,7 @@ cas_action: 3: /* Error occurred on load or store */ /* Free lock */ + sync stw %r20, 0(%sr2,%r20) #if ENABLE_LWS_DEBUG stw %r0, 4(%sr2,%r20) @@ -848,6 +850,7 @@ cas2_action: cas2_end: /* Free lock */ + sync stw,ma %r20, 0(%sr2,%r20) /* Enable interrupts */ ssm PSW_SM_I, %r0 @@ -858,6 +861,7 @@ cas2_end: 22: /* Error occurred on load or store */ /* Free lock */ + sync stw %r20, 0(%sr2,%r20) ssm PSW_SM_I, %r0 ldo 1(%r0),%r28 From 50bed434ad9c0d1d41a4b617935ac28c3fc778b8 Mon Sep 17 00:00:00 2001 From: Andrey Konovalov Date: Fri, 20 Apr 2018 14:55:52 -0700 Subject: [PATCH 513/970] kasan: add no_sanitize attribute for clang builds commit 12c8f25a016dff69ee284aa3338bebfd2cfcba33 upstream. KASAN uses the __no_sanitize_address macro to disable instrumentation of particular functions. Right now it's defined only for GCC build, which causes false positives when clang is used. This patch adds a definition for clang. Note, that clang's revision 329612 or higher is required. [andreyknvl@google.com: remove redundant #ifdef CONFIG_KASAN check] Link: http://lkml.kernel.org/r/c79aa31a2a2790f6131ed607c58b0dd45dd62a6c.1523967959.git.andreyknvl@google.com Link: http://lkml.kernel.org/r/4ad725cc903f8534f8c8a60f0daade5e3d674f8d.1523554166.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov Acked-by: Andrey Ryabinin Cc: Alexander Potapenko Cc: Dmitry Vyukov Cc: David Rientjes Cc: Thomas Gleixner Cc: Ingo Molnar Cc: David Woodhouse Cc: Andrey Konovalov Cc: Will Deacon Cc: Greg Kroah-Hartman Cc: Paul Lawrence Cc: Sandipan Das Cc: Kees Cook Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Cc: Sodagudi Prasad Signed-off-by: Greg Kroah-Hartman --- include/linux/compiler-clang.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h index 01225b0059b1..21c88a7ac23b 100644 --- a/include/linux/compiler-clang.h +++ b/include/linux/compiler-clang.h @@ -16,6 +16,9 @@ */ #define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __COUNTER__) +#undef __no_sanitize_address +#define __no_sanitize_address __attribute__((no_sanitize("address"))) + /* Clang doesn't have a way to turn it off per-function, yet. */ #ifdef __noretpoline #undef __noretpoline From fbf12e19c9f13374ded72893f34777c2fa41f75c Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Mon, 8 Jan 2018 11:51:04 -0800 Subject: [PATCH 514/970] Mark HI and TASKLET softirq synchronous commit 3c53776e29f81719efcf8f7a6e30cdf753bee94d upstream. Way back in 4.9, we committed 4cd13c21b207 ("softirq: Let ksoftirqd do its job"), and ever since we've had small nagging issues with it. For example, we've had: 1ff688209e2e ("watchdog: core: make sure the watchdog_worker is not deferred") 8d5755b3f77b ("watchdog: softdog: fire watchdog even if softirqs do not get to run") 217f69743681 ("net: busy-poll: allow preemption in sk_busy_loop()") all of which worked around some of the effects of that commit. The DVB people have also complained that the commit causes excessive USB URB latencies, which seems to be due to the USB code using tasklets to schedule USB traffic. This seems to be an issue mainly when already living on the edge, but waiting for ksoftirqd to handle it really does seem to cause excessive latencies. Now Hanna Hawa reports that this issue isn't just limited to USB URB and DVB, but also causes timeout problems for the Marvell SoC team: "I'm facing kernel panic issue while running raid 5 on sata disks connected to Macchiatobin (Marvell community board with Armada-8040 SoC with 4 ARMv8 cores of CA72) Raid 5 built with Marvell DMA engine and async_tx mechanism (ASYNC_TX_DMA [=y]); the DMA driver (mv_xor_v2) uses a tasklet to clean the done descriptors from the queue" The latency problem causes a panic: mv_xor_v2 f0400000.xor: dma_sync_wait: timeout! Kernel panic - not syncing: async_tx_quiesce: DMA error waiting for transaction We've discussed simply just reverting the original commit entirely, and also much more involved solutions (with per-softirq threads etc). This patch is intentionally stupid and fairly limited, because the issue still remains, and the other solutions either got sidetracked or had other issues. We should probably also consider the timer softirqs to be synchronous and not be delayed to ksoftirqd (since they were the issue with the earlier watchdog problems), but that should be done as a separate patch. This does only the tasklet cases. Reported-and-tested-by: Hanna Hawa Reported-and-tested-by: Josef Griebichler Reported-by: Mauro Carvalho Chehab Cc: Alan Stern Cc: Greg Kroah-Hartman Cc: Eric Dumazet Cc: Ingo Molnar Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- kernel/softirq.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/kernel/softirq.c b/kernel/softirq.c index 744fa611cae0..d257e624be25 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -79,12 +79,16 @@ static void wakeup_softirqd(void) /* * If ksoftirqd is scheduled, we do not want to process pending softirqs - * right now. Let ksoftirqd handle this at its own rate, to get fairness. + * right now. Let ksoftirqd handle this at its own rate, to get fairness, + * unless we're doing some of the synchronous softirqs. */ -static bool ksoftirqd_running(void) +#define SOFTIRQ_NOW_MASK ((1 << HI_SOFTIRQ) | (1 << TASKLET_SOFTIRQ)) +static bool ksoftirqd_running(unsigned long pending) { struct task_struct *tsk = __this_cpu_read(ksoftirqd); + if (pending & SOFTIRQ_NOW_MASK) + return false; return tsk && (tsk->state == TASK_RUNNING); } @@ -324,7 +328,7 @@ asmlinkage __visible void do_softirq(void) pending = local_softirq_pending(); - if (pending && !ksoftirqd_running()) + if (pending && !ksoftirqd_running(pending)) do_softirq_own_stack(); local_irq_restore(flags); @@ -351,7 +355,7 @@ void irq_enter(void) static inline void invoke_softirq(void) { - if (ksoftirqd_running()) + if (ksoftirqd_running(local_softirq_pending())) return; if (!force_irqthreads) { From af3bd8d6a9efcb782d44e537dc391970e0d70fc7 Mon Sep 17 00:00:00 2001 From: Juergen Gross Date: Thu, 9 Aug 2018 16:42:16 +0200 Subject: [PATCH 515/970] xen/netfront: don't cache skb_shinfo() commit d472b3a6cf63cd31cae1ed61930f07e6cd6671b5 upstream. skb_shinfo() can change when calling __pskb_pull_tail(): Don't cache its return value. Cc: stable@vger.kernel.org Signed-off-by: Juergen Gross Reviewed-by: Wei Liu Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/xen-netfront.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c index 681256f97cb3..cd2c6ffdbdde 100644 --- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -893,7 +893,6 @@ static RING_IDX xennet_fill_frags(struct netfront_queue *queue, struct sk_buff *skb, struct sk_buff_head *list) { - struct skb_shared_info *shinfo = skb_shinfo(skb); RING_IDX cons = queue->rx.rsp_cons; struct sk_buff *nskb; @@ -902,15 +901,16 @@ static RING_IDX xennet_fill_frags(struct netfront_queue *queue, RING_GET_RESPONSE(&queue->rx, ++cons); skb_frag_t *nfrag = &skb_shinfo(nskb)->frags[0]; - if (shinfo->nr_frags == MAX_SKB_FRAGS) { + if (skb_shinfo(skb)->nr_frags == MAX_SKB_FRAGS) { unsigned int pull_to = NETFRONT_SKB_CB(skb)->pull_to; BUG_ON(pull_to <= skb_headlen(skb)); __pskb_pull_tail(skb, pull_to - skb_headlen(skb)); } - BUG_ON(shinfo->nr_frags >= MAX_SKB_FRAGS); + BUG_ON(skb_shinfo(skb)->nr_frags >= MAX_SKB_FRAGS); - skb_add_rx_frag(skb, shinfo->nr_frags, skb_frag_page(nfrag), + skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, + skb_frag_page(nfrag), rx->offset, rx->status, PAGE_SIZE); skb_shinfo(nskb)->nr_frags = 0; From 51b3938e399bdf0cef090cea7b146c1ba9604ca2 Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Thu, 26 Apr 2018 14:10:24 +0200 Subject: [PATCH 516/970] ACPI / LPSS: Add missing prv_offset setting for byt/cht PWM devices commit fdcb613d49321b5bf5d5a1bd0fba8e7c241dcc70 upstream. The LPSS PWM device on on Bay Trail and Cherry Trail devices has a set of private registers at offset 0x800, the current lpss_device_desc for them already sets the LPSS_SAVE_CTX flag to have these saved/restored over device-suspend, but the current lpss_device_desc was not setting the prv_offset field, leading to the regular device registers getting saved/restored instead. This is causing the PWM controller to no longer work, resulting in a black screen, after a suspend/resume on systems where the firmware clears the APB clock and reset bits at offset 0x804. This commit fixes this by properly setting prv_offset to 0x800 for the PWM devices. Cc: stable@vger.kernel.org Fixes: e1c748179754 ("ACPI / LPSS: Add Intel BayTrail ACPI mode PWM") Fixes: 1bfbd8eb8a7f ("ACPI / LPSS: Add ACPI IDs for Intel Braswell") Signed-off-by: Hans de Goede Acked-by: Rafael J . Wysocki Signed-off-by: Thierry Reding Signed-off-by: Sudip Mukherjee Signed-off-by: Greg Kroah-Hartman --- drivers/acpi/acpi_lpss.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/acpi/acpi_lpss.c b/drivers/acpi/acpi_lpss.c index 373657f7e35a..3cdd2c3a5bfc 100644 --- a/drivers/acpi/acpi_lpss.c +++ b/drivers/acpi/acpi_lpss.c @@ -187,10 +187,12 @@ static const struct lpss_device_desc lpt_sdio_dev_desc = { static const struct lpss_device_desc byt_pwm_dev_desc = { .flags = LPSS_SAVE_CTX, + .prv_offset = 0x800, }; static const struct lpss_device_desc bsw_pwm_dev_desc = { .flags = LPSS_SAVE_CTX | LPSS_NO_D3_DELAY, + .prv_offset = 0x800, }; static const struct lpss_device_desc byt_uart_dev_desc = { From bcf447f808b5e054d529eea01dcbabe6a576666a Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Thu, 2 Aug 2018 10:44:42 -0700 Subject: [PATCH 517/970] scsi: sr: Avoid that opening a CD-ROM hangs with runtime power management enabled commit 1214fd7b497400d200e3f4e64e2338b303a20949 upstream. Surround scsi_execute() calls with scsi_autopm_get_device() and scsi_autopm_put_device(). Note: removing sr_mutex protection from the scsi_cd_get() and scsi_cd_put() calls is safe because the purpose of sr_mutex is to serialize cdrom_*() calls. This patch avoids that complaints similar to the following appear in the kernel log if runtime power management is enabled: INFO: task systemd-udevd:650 blocked for more than 120 seconds. Not tainted 4.18.0-rc7-dbg+ #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. systemd-udevd D28176 650 513 0x00000104 Call Trace: __schedule+0x444/0xfe0 schedule+0x4e/0xe0 schedule_preempt_disabled+0x18/0x30 __mutex_lock+0x41c/0xc70 mutex_lock_nested+0x1b/0x20 __blkdev_get+0x106/0x970 blkdev_get+0x22c/0x5a0 blkdev_open+0xe9/0x100 do_dentry_open.isra.19+0x33e/0x570 vfs_open+0x7c/0xd0 path_openat+0x6e3/0x1120 do_filp_open+0x11c/0x1c0 do_sys_open+0x208/0x2d0 __x64_sys_openat+0x59/0x70 do_syscall_64+0x77/0x230 entry_SYSCALL_64_after_hwframe+0x49/0xbe Signed-off-by: Bart Van Assche Cc: Maurizio Lombardi Cc: Johannes Thumshirn Cc: Alan Stern Cc: Tested-by: Johannes Thumshirn Reviewed-by: Johannes Thumshirn Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/sr.c | 29 +++++++++++++++++++++-------- 1 file changed, 21 insertions(+), 8 deletions(-) diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c index 01699845c42c..cc484cb287d2 100644 --- a/drivers/scsi/sr.c +++ b/drivers/scsi/sr.c @@ -520,18 +520,26 @@ static int sr_init_command(struct scsi_cmnd *SCpnt) static int sr_block_open(struct block_device *bdev, fmode_t mode) { struct scsi_cd *cd; + struct scsi_device *sdev; int ret = -ENXIO; + cd = scsi_cd_get(bdev->bd_disk); + if (!cd) + goto out; + + sdev = cd->device; + scsi_autopm_get_device(sdev); check_disk_change(bdev); mutex_lock(&sr_mutex); - cd = scsi_cd_get(bdev->bd_disk); - if (cd) { - ret = cdrom_open(&cd->cdi, bdev, mode); - if (ret) - scsi_cd_put(cd); - } + ret = cdrom_open(&cd->cdi, bdev, mode); mutex_unlock(&sr_mutex); + + scsi_autopm_put_device(sdev); + if (ret) + scsi_cd_put(cd); + +out: return ret; } @@ -559,6 +567,8 @@ static int sr_block_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd, if (ret) goto out; + scsi_autopm_get_device(sdev); + /* * Send SCSI addressing ioctls directly to mid level, send other * ioctls to cdrom/block level. @@ -567,15 +577,18 @@ static int sr_block_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd, case SCSI_IOCTL_GET_IDLUN: case SCSI_IOCTL_GET_BUS_NUMBER: ret = scsi_ioctl(sdev, cmd, argp); - goto out; + goto put; } ret = cdrom_ioctl(&cd->cdi, bdev, mode, cmd, arg); if (ret != -ENOSYS) - goto out; + goto put; ret = scsi_ioctl(sdev, cmd, argp); +put: + scsi_autopm_put_device(sdev); + out: mutex_unlock(&sr_mutex); return ret; From 6bb53ee170c45f44ac80ad8318f72feff9cdee1b Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sun, 12 Aug 2018 12:19:42 -0700 Subject: [PATCH 518/970] init: rename and re-order boot_cpu_state_init() commit b5b1404d0815894de0690de8a1ab58269e56eae6 upstream. This is purely a preparatory patch for upcoming changes during the 4.19 merge window. We have a function called "boot_cpu_state_init()" that isn't really about the bootup cpu state: that is done much earlier by the similarly named "boot_cpu_init()" (note lack of "state" in name). This function initializes some hotplug CPU state, and needs to run after the percpu data has been properly initialized. It even has a comment to that effect. Except it _doesn't_ actually run after the percpu data has been properly initialized. On x86 it happens to do that, but on at least arm and arm64, the percpu base pointers are initialized by the arch-specific 'smp_prepare_boot_cpu()' hook, which ran _after_ boot_cpu_state_init(). This had some unexpected results, and in particular we have a patch pending for the merge window that did the obvious cleanup of using 'this_cpu_write()' in the cpu hotplug init code: - per_cpu_ptr(&cpuhp_state, smp_processor_id())->state = CPUHP_ONLINE; + this_cpu_write(cpuhp_state.state, CPUHP_ONLINE); which is obviously the right thing to do. Except because of the ordering issue, it actually failed miserably and unexpectedly on arm64. So this just fixes the ordering, and changes the name of the function to be 'boot_cpu_hotplug_init()' to make it obvious that it's about cpu hotplug state, because the core CPU state was supposed to have already been done earlier. Marked for stable, since the (not yet merged) patch that will show this problem is marked for stable. Reported-by: Vlastimil Babka Reported-by: Mian Yousaf Kaukab Suggested-by: Catalin Marinas Acked-by: Thomas Gleixner Cc: Will Deacon Cc: stable@kernel.org Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- include/linux/cpu.h | 2 +- init/main.c | 2 +- kernel/cpu.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/include/linux/cpu.h b/include/linux/cpu.h index 917829b27350..f1da93d0c34f 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -29,7 +29,7 @@ struct cpu { }; extern void boot_cpu_init(void); -extern void boot_cpu_state_init(void); +extern void boot_cpu_hotplug_init(void); extern int register_cpu(struct cpu *cpu, int num); extern struct device *get_cpu_device(unsigned cpu); diff --git a/init/main.c b/init/main.c index f22957afb37e..4313772d634a 100644 --- a/init/main.c +++ b/init/main.c @@ -509,8 +509,8 @@ asmlinkage __visible void __init start_kernel(void) setup_command_line(command_line); setup_nr_cpu_ids(); setup_per_cpu_areas(); - boot_cpu_state_init(); smp_prepare_boot_cpu(); /* arch-specific boot-cpu hooks */ + boot_cpu_hotplug_init(); build_all_zonelists(NULL, NULL); page_alloc_init(); diff --git a/kernel/cpu.c b/kernel/cpu.c index 967163fb90a8..e93635e2376f 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -1944,7 +1944,7 @@ void __init boot_cpu_init(void) /* * Must be called _AFTER_ setting up the per_cpu areas */ -void __init boot_cpu_state_init(void) +void __init boot_cpu_hotplug_init(void) { per_cpu_ptr(&cpuhp_state, smp_processor_id())->state = CPUHP_ONLINE; } From cfac7df7dc10a1187176c19c4ba950b365d388b7 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Mon, 6 Aug 2018 09:03:58 -0400 Subject: [PATCH 519/970] root dentries need RCU-delayed freeing commit 90bad5e05bcdb0308cfa3d3a60f5c0b9c8e2efb3 upstream. Since mountpoint crossing can happen without leaving lazy mode, root dentries do need the same protection against having their memory freed without RCU delay as everything else in the tree. It's partially hidden by RCU delay between detaching from the mount tree and dropping the vfsmount reference, but the starting point of pathwalk can be on an already detached mount, in which case umount-caused RCU delay has already passed by the time the lazy pathwalk grabs rcu_read_lock(). If the starting point happens to be at the root of that vfsmount *and* that vfsmount covers the entire filesystem, we get trouble. Fixes: 48a066e72d97 ("RCU'd vsfmounts") Cc: stable@vger.kernel.org Signed-off-by: Al Viro Signed-off-by: Greg Kroah-Hartman --- fs/dcache.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/dcache.c b/fs/dcache.c index 7a5e6f9717f5..1abe88cd0d05 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -1914,10 +1914,12 @@ struct dentry *d_make_root(struct inode *root_inode) if (root_inode) { res = __d_alloc(root_inode->i_sb, NULL); - if (res) + if (res) { + res->d_flags |= DCACHE_RCUACCESS; d_instantiate(res, root_inode); - else + } else { iput(root_inode); + } } return res; } From 59199c04b746b87db92843f28364547cb7ca1764 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Thu, 9 Aug 2018 10:15:54 -0400 Subject: [PATCH 520/970] make sure that __dentry_kill() always invalidates d_seq, unhashed or not commit 4c0d7cd5c8416b1ef41534d19163cb07ffaa03ab upstream. RCU pathwalk relies upon the assumption that anything that changes ->d_inode of a dentry will invalidate its ->d_seq. That's almost true - the one exception is that the final dput() of already unhashed dentry does *not* touch ->d_seq at all. Unhashing does, though, so for anything we'd found by RCU dcache lookup we are fine. Unfortunately, we can *start* with an unhashed dentry or jump into it. We could try and be careful in the (few) places where that could happen. Or we could just make the final dput() invalidate the damn thing, unhashed or not. The latter is much simpler and easier to backport, so let's do it that way. Reported-by: "Dae R. Jeong" Cc: stable@vger.kernel.org Signed-off-by: Al Viro Signed-off-by: Greg Kroah-Hartman --- fs/dcache.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/fs/dcache.c b/fs/dcache.c index 1abe88cd0d05..461ff8f234e3 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -352,14 +352,11 @@ static void dentry_unlink_inode(struct dentry * dentry) __releases(dentry->d_inode->i_lock) { struct inode *inode = dentry->d_inode; - bool hashed = !d_unhashed(dentry); - if (hashed) - raw_write_seqcount_begin(&dentry->d_seq); + raw_write_seqcount_begin(&dentry->d_seq); __d_clear_type_and_inode(dentry); hlist_del_init(&dentry->d_u.d_alias); - if (hashed) - raw_write_seqcount_end(&dentry->d_seq); + raw_write_seqcount_end(&dentry->d_seq); spin_unlock(&dentry->d_lock); spin_unlock(&inode->i_lock); if (!inode->i_nlink) From 87a2d84d2ff4aea2f9bc8c5801f5044024fac1c4 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Thu, 9 Aug 2018 17:21:17 -0400 Subject: [PATCH 521/970] fix mntput/mntput race commit 9ea0a46ca2c318fcc449c1e6b62a7230a17888f1 upstream. mntput_no_expire() does the calculation of total refcount under mount_lock; unfortunately, the decrement (as well as all increments) are done outside of it, leading to false positives in the "are we dropping the last reference" test. Consider the following situation: * mnt is a lazy-umounted mount, kept alive by two opened files. One of those files gets closed. Total refcount of mnt is 2. On CPU 42 mntput(mnt) (called from __fput()) drops one reference, decrementing component * After it has looked at component #0, the process on CPU 0 does mntget(), incrementing component #0, gets preempted and gets to run again - on CPU 69. There it does mntput(), which drops the reference (component #69) and proceeds to spin on mount_lock. * On CPU 42 our first mntput() finishes counting. It observes the decrement of component #69, but not the increment of component #0. As the result, the total it gets is not 1 as it should've been - it's 0. At which point we decide that vfsmount needs to be killed and proceed to free it and shut the filesystem down. However, there's still another opened file on that filesystem, with reference to (now freed) vfsmount, etc. and we are screwed. It's not a wide race, but it can be reproduced with artificial slowdown of the mnt_get_count() loop, and it should be easier to hit on SMP KVM setups. Fix consists of moving the refcount decrement under mount_lock; the tricky part is that we want (and can) keep the fast case (i.e. mount that still has non-NULL ->mnt_ns) entirely out of mount_lock. All places that zero mnt->mnt_ns are dropping some reference to mnt and they call synchronize_rcu() before that mntput(). IOW, if mntput() observes (under rcu_read_lock()) a non-NULL ->mnt_ns, it is guaranteed that there is another reference yet to be dropped. Reported-by: Jann Horn Tested-by: Jann Horn Fixes: 48a066e72d97 ("RCU'd vsfmounts") Cc: stable@vger.kernel.org Signed-off-by: Al Viro Signed-off-by: Greg Kroah-Hartman --- fs/namespace.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/fs/namespace.c b/fs/namespace.c index 6c873b330a93..e4e1c7f6b789 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -1139,12 +1139,22 @@ static DECLARE_DELAYED_WORK(delayed_mntput_work, delayed_mntput); static void mntput_no_expire(struct mount *mnt) { rcu_read_lock(); - mnt_add_count(mnt, -1); - if (likely(mnt->mnt_ns)) { /* shouldn't be the last one */ + if (likely(READ_ONCE(mnt->mnt_ns))) { + /* + * Since we don't do lock_mount_hash() here, + * ->mnt_ns can change under us. However, if it's + * non-NULL, then there's a reference that won't + * be dropped until after an RCU delay done after + * turning ->mnt_ns NULL. So if we observe it + * non-NULL under rcu_read_lock(), the reference + * we are dropping is not the final one. + */ + mnt_add_count(mnt, -1); rcu_read_unlock(); return; } lock_mount_hash(); + mnt_add_count(mnt, -1); if (mnt_get_count(mnt)) { rcu_read_unlock(); unlock_mount_hash(); From e31578c6fb0b89ceb8ef943528279571dfc0f8dc Mon Sep 17 00:00:00 2001 From: Al Viro Date: Thu, 9 Aug 2018 17:51:32 -0400 Subject: [PATCH 522/970] fix __legitimize_mnt()/mntput() race commit 119e1ef80ecfe0d1deb6378d4ab41f5b71519de1 upstream. __legitimize_mnt() has two problems - one is that in case of success the check of mount_lock is not ordered wrt preceding increment of refcount, making it possible to have successful __legitimize_mnt() on one CPU just before the otherwise final mntpu() on another, with __legitimize_mnt() not seeing mntput() taking the lock and mntput() not seeing the increment done by __legitimize_mnt(). Solved by a pair of barriers. Another is that failure of __legitimize_mnt() on the second read_seqretry() leaves us with reference that'll need to be dropped by caller; however, if that races with final mntput() we can end up with caller dropping rcu_read_lock() and doing mntput() to release that reference - with the first mntput() having freed the damn thing just as rcu_read_lock() had been dropped. Solution: in "do mntput() yourself" failure case grab mount_lock, check if MNT_DOOMED has been set by racing final mntput() that has missed our increment and if it has - undo the increment and treat that as "failure, caller doesn't need to drop anything" case. It's not easy to hit - the final mntput() has to come right after the first read_seqretry() in __legitimize_mnt() *and* manage to miss the increment done by __legitimize_mnt() before the second read_seqretry() in there. The things that are almost impossible to hit on bare hardware are not impossible on SMP KVM, though... Reported-by: Oleg Nesterov Fixes: 48a066e72d97 ("RCU'd vsfmounts") Cc: stable@vger.kernel.org Signed-off-by: Al Viro Signed-off-by: Greg Kroah-Hartman --- fs/namespace.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/fs/namespace.c b/fs/namespace.c index e4e1c7f6b789..0a9e766b4087 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -603,12 +603,21 @@ int __legitimize_mnt(struct vfsmount *bastard, unsigned seq) return 0; mnt = real_mount(bastard); mnt_add_count(mnt, 1); + smp_mb(); // see mntput_no_expire() if (likely(!read_seqretry(&mount_lock, seq))) return 0; if (bastard->mnt_flags & MNT_SYNC_UMOUNT) { mnt_add_count(mnt, -1); return 1; } + lock_mount_hash(); + if (unlikely(bastard->mnt_flags & MNT_DOOMED)) { + mnt_add_count(mnt, -1); + unlock_mount_hash(); + return 1; + } + unlock_mount_hash(); + /* caller will mntput() */ return -1; } @@ -1154,6 +1163,11 @@ static void mntput_no_expire(struct mount *mnt) return; } lock_mount_hash(); + /* + * make sure that if __legitimize_mnt() has not seen us grab + * mount_lock, we'll see their refcount increment here. + */ + smp_mb(); mnt_add_count(mnt, -1); if (mnt_get_count(mnt)) { rcu_read_unlock(); From b96e215e539509cae8bfe468689b70661cf511b4 Mon Sep 17 00:00:00 2001 From: Konstantin Khlebnikov Date: Fri, 10 Feb 2017 10:35:02 +0300 Subject: [PATCH 523/970] proc/sysctl: prune stale dentries during unregistering commit d6cffbbe9a7e51eb705182965a189457c17ba8a3 upstream. Currently unregistering sysctl table does not prune its dentries. Stale dentries could slowdown sysctl operations significantly. For example, command: # for i in {1..100000} ; do unshare -n -- sysctl -a &> /dev/null ; done creates a millions of stale denties around sysctls of loopback interface: # sysctl fs.dentry-state fs.dentry-state = 25812579 24724135 45 0 0 0 All of them have matching names thus lookup have to scan though whole hash chain and call d_compare (proc_sys_compare) which checks them under system-wide spinlock (sysctl_lock). # time sysctl -a > /dev/null real 1m12.806s user 0m0.016s sys 1m12.400s Currently only memory reclaimer could remove this garbage. But without significant memory pressure this never happens. This patch collects sysctl inodes into list on sysctl table header and prunes all their dentries once that table unregisters. Konstantin Khlebnikov writes: > On 10.02.2017 10:47, Al Viro wrote: >> how about >> the matching stats *after* that patch? > > dcache size doesn't grow endlessly, so stats are fine > > # sysctl fs.dentry-state > fs.dentry-state = 92712 58376 45 0 0 0 > > # time sysctl -a &>/dev/null > > real 0m0.013s > user 0m0.004s > sys 0m0.008s Signed-off-by: Konstantin Khlebnikov Suggested-by: Al Viro Signed-off-by: Eric W. Biederman Signed-off-by: Greg Kroah-Hartman --- fs/proc/inode.c | 3 ++- fs/proc/internal.h | 7 +++-- fs/proc/proc_sysctl.c | 59 ++++++++++++++++++++++++++++++------------ include/linux/sysctl.h | 1 + 4 files changed, 51 insertions(+), 19 deletions(-) diff --git a/fs/proc/inode.c b/fs/proc/inode.c index e69ebe648a34..c2afe39f0b9e 100644 --- a/fs/proc/inode.c +++ b/fs/proc/inode.c @@ -43,10 +43,11 @@ static void proc_evict_inode(struct inode *inode) de = PDE(inode); if (de) pde_put(de); + head = PROC_I(inode)->sysctl; if (head) { RCU_INIT_POINTER(PROC_I(inode)->sysctl, NULL); - sysctl_head_put(head); + proc_sys_evict_inode(inode, head); } } diff --git a/fs/proc/internal.h b/fs/proc/internal.h index 5378441ec1b7..7aad3f61b5b0 100644 --- a/fs/proc/internal.h +++ b/fs/proc/internal.h @@ -65,6 +65,7 @@ struct proc_inode { struct proc_dir_entry *pde; struct ctl_table_header *sysctl; struct ctl_table *sysctl_entry; + struct list_head sysctl_inodes; const struct proc_ns_operations *ns_ops; struct inode vfs_inode; }; @@ -249,10 +250,12 @@ extern void proc_thread_self_init(void); */ #ifdef CONFIG_PROC_SYSCTL extern int proc_sys_init(void); -extern void sysctl_head_put(struct ctl_table_header *); +extern void proc_sys_evict_inode(struct inode *inode, + struct ctl_table_header *head); #else static inline void proc_sys_init(void) { } -static inline void sysctl_head_put(struct ctl_table_header *head) { } +static inline void proc_sys_evict_inode(struct inode *inode, + struct ctl_table_header *head) { } #endif /* diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c index 847f23420b40..1f91765d0a28 100644 --- a/fs/proc/proc_sysctl.c +++ b/fs/proc/proc_sysctl.c @@ -190,6 +190,7 @@ static void init_header(struct ctl_table_header *head, head->set = set; head->parent = NULL; head->node = node; + INIT_LIST_HEAD(&head->inodes); if (node) { struct ctl_table *entry; for (entry = table; entry->procname; entry++, node++) @@ -259,6 +260,29 @@ static void unuse_table(struct ctl_table_header *p) complete(p->unregistering); } +/* called under sysctl_lock */ +static void proc_sys_prune_dcache(struct ctl_table_header *head) +{ + struct inode *inode, *prev = NULL; + struct proc_inode *ei; + + list_for_each_entry(ei, &head->inodes, sysctl_inodes) { + inode = igrab(&ei->vfs_inode); + if (inode) { + spin_unlock(&sysctl_lock); + iput(prev); + prev = inode; + d_prune_aliases(inode); + spin_lock(&sysctl_lock); + } + } + if (prev) { + spin_unlock(&sysctl_lock); + iput(prev); + spin_lock(&sysctl_lock); + } +} + /* called under sysctl_lock, will reacquire if has to wait */ static void start_unregistering(struct ctl_table_header *p) { @@ -277,6 +301,11 @@ static void start_unregistering(struct ctl_table_header *p) /* anything non-NULL; we'll never dereference it */ p->unregistering = ERR_PTR(-EINVAL); } + /* + * Prune dentries for unregistered sysctls: namespaced sysctls + * can have duplicate names and contaminate dcache very badly. + */ + proc_sys_prune_dcache(p); /* * do not remove from the list until nobody holds it; walking the * list in do_sysctl() relies on that. @@ -284,21 +313,6 @@ static void start_unregistering(struct ctl_table_header *p) erase_header(p); } -static void sysctl_head_get(struct ctl_table_header *head) -{ - spin_lock(&sysctl_lock); - head->count++; - spin_unlock(&sysctl_lock); -} - -void sysctl_head_put(struct ctl_table_header *head) -{ - spin_lock(&sysctl_lock); - if (!--head->count) - kfree_rcu(head, rcu); - spin_unlock(&sysctl_lock); -} - static struct ctl_table_header *sysctl_head_grab(struct ctl_table_header *head) { BUG_ON(!head); @@ -440,11 +454,15 @@ static struct inode *proc_sys_make_inode(struct super_block *sb, inode->i_ino = get_next_ino(); - sysctl_head_get(head); ei = PROC_I(inode); ei->sysctl = head; ei->sysctl_entry = table; + spin_lock(&sysctl_lock); + list_add(&ei->sysctl_inodes, &head->inodes); + head->count++; + spin_unlock(&sysctl_lock); + inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode); inode->i_mode = table->mode; if (!S_ISDIR(table->mode)) { @@ -466,6 +484,15 @@ static struct inode *proc_sys_make_inode(struct super_block *sb, return inode; } +void proc_sys_evict_inode(struct inode *inode, struct ctl_table_header *head) +{ + spin_lock(&sysctl_lock); + list_del(&PROC_I(inode)->sysctl_inodes); + if (!--head->count) + kfree_rcu(head, rcu); + spin_unlock(&sysctl_lock); +} + static struct ctl_table_header *grab_header(struct inode *inode) { struct ctl_table_header *head = PROC_I(inode)->sysctl; diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index adf4e51cf597..b7e82049fec7 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -143,6 +143,7 @@ struct ctl_table_header struct ctl_table_set *set; struct ctl_dir *parent; struct ctl_node *node; + struct list_head inodes; /* head for proc_inode->sysctl_inodes */ }; struct ctl_dir { From 631f93a6fe847d2d317010d5bbd7cb3bcc284336 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Mon, 20 Feb 2017 18:17:03 +1300 Subject: [PATCH 524/970] proc/sysctl: Don't grab i_lock under sysctl_lock. commit ace0c791e6c3cf5ef37cad2df69f0d90ccc40ffb upstream. Konstantin Khlebnikov writes: > This patch has locking problem. I've got lockdep splat under LTP. > > [ 6633.115456] ====================================================== > [ 6633.115502] [ INFO: possible circular locking dependency detected ] > [ 6633.115553] 4.9.10-debug+ #9 Tainted: G L > [ 6633.115584] ------------------------------------------------------- > [ 6633.115627] ksm02/284980 is trying to acquire lock: > [ 6633.115659] (&sb->s_type->i_lock_key#4){+.+...}, at: [] igrab+0x1e/0x80 > [ 6633.115834] but task is already holding lock: > [ 6633.115882] (sysctl_lock){+.+...}, at: [] unregister_sysctl_table+0x6b/0x110 > [ 6633.116026] which lock already depends on the new lock. > [ 6633.116026] > [ 6633.116080] > [ 6633.116080] the existing dependency chain (in reverse order) is: > [ 6633.116117] > -> #2 (sysctl_lock){+.+...}: > -> #1 (&(&dentry->d_lockref.lock)->rlock){+.+...}: > -> #0 (&sb->s_type->i_lock_key#4){+.+...}: > > d_lock nests inside i_lock > sysctl_lock nests inside d_lock in d_compare > > This patch adds i_lock nesting inside sysctl_lock. Al Viro replied: > Once ->unregistering is set, you can drop sysctl_lock just fine. So I'd > try something like this - use rcu_read_lock() in proc_sys_prune_dcache(), > drop sysctl_lock() before it and regain after. Make sure that no inodes > are added to the list ones ->unregistering has been set and use RCU list > primitives for modifying the inode list, with sysctl_lock still used to > serialize its modifications. > > Freeing struct inode is RCU-delayed (see proc_destroy_inode()), so doing > igrab() is safe there. Since we don't drop inode reference until after we'd > passed beyond it in the list, list_for_each_entry_rcu() should be fine. I agree with Al Viro's analsysis of the situtation. Fixes: d6cffbbe9a7e ("proc/sysctl: prune stale dentries during unregistering") Reported-by: Konstantin Khlebnikov Tested-by: Konstantin Khlebnikov Suggested-by: Al Viro Signed-off-by: "Eric W. Biederman" Signed-off-by: Greg Kroah-Hartman --- fs/proc/proc_sysctl.c | 31 ++++++++++++++++++------------- 1 file changed, 18 insertions(+), 13 deletions(-) diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c index 1f91765d0a28..3606e35c98b7 100644 --- a/fs/proc/proc_sysctl.c +++ b/fs/proc/proc_sysctl.c @@ -266,21 +266,19 @@ static void proc_sys_prune_dcache(struct ctl_table_header *head) struct inode *inode, *prev = NULL; struct proc_inode *ei; - list_for_each_entry(ei, &head->inodes, sysctl_inodes) { + rcu_read_lock(); + list_for_each_entry_rcu(ei, &head->inodes, sysctl_inodes) { inode = igrab(&ei->vfs_inode); if (inode) { - spin_unlock(&sysctl_lock); + rcu_read_unlock(); iput(prev); prev = inode; d_prune_aliases(inode); - spin_lock(&sysctl_lock); + rcu_read_lock(); } } - if (prev) { - spin_unlock(&sysctl_lock); - iput(prev); - spin_lock(&sysctl_lock); - } + rcu_read_unlock(); + iput(prev); } /* called under sysctl_lock, will reacquire if has to wait */ @@ -296,10 +294,10 @@ static void start_unregistering(struct ctl_table_header *p) p->unregistering = &wait; spin_unlock(&sysctl_lock); wait_for_completion(&wait); - spin_lock(&sysctl_lock); } else { /* anything non-NULL; we'll never dereference it */ p->unregistering = ERR_PTR(-EINVAL); + spin_unlock(&sysctl_lock); } /* * Prune dentries for unregistered sysctls: namespaced sysctls @@ -310,6 +308,7 @@ static void start_unregistering(struct ctl_table_header *p) * do not remove from the list until nobody holds it; walking the * list in do_sysctl() relies on that. */ + spin_lock(&sysctl_lock); erase_header(p); } @@ -455,11 +454,17 @@ static struct inode *proc_sys_make_inode(struct super_block *sb, inode->i_ino = get_next_ino(); ei = PROC_I(inode); - ei->sysctl = head; - ei->sysctl_entry = table; spin_lock(&sysctl_lock); - list_add(&ei->sysctl_inodes, &head->inodes); + if (unlikely(head->unregistering)) { + spin_unlock(&sysctl_lock); + iput(inode); + inode = NULL; + goto out; + } + ei->sysctl = head; + ei->sysctl_entry = table; + list_add_rcu(&ei->sysctl_inodes, &head->inodes); head->count++; spin_unlock(&sysctl_lock); @@ -487,7 +492,7 @@ static struct inode *proc_sys_make_inode(struct super_block *sb, void proc_sys_evict_inode(struct inode *inode, struct ctl_table_header *head) { spin_lock(&sysctl_lock); - list_del(&PROC_I(inode)->sysctl_inodes); + list_del_rcu(&PROC_I(inode)->sysctl_inodes); if (!--head->count) kfree_rcu(head, rcu); spin_unlock(&sysctl_lock); From a3a7b992b240ba621a47ff2d3465fa4f0534e297 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Thu, 6 Jul 2017 08:41:06 -0500 Subject: [PATCH 525/970] proc: Fix proc_sys_prune_dcache to hold a sb reference commit 2fd1d2c4ceb2248a727696962cf3370dc9f5a0a4 upstream. Andrei Vagin writes: FYI: This bug has been reproduced on 4.11.7 > BUG: Dentry ffff895a3dd01240{i=4e7c09a,n=lo} still in use (1) [unmount of proc proc] > ------------[ cut here ]------------ > WARNING: CPU: 1 PID: 13588 at fs/dcache.c:1445 umount_check+0x6e/0x80 > CPU: 1 PID: 13588 Comm: kworker/1:1 Not tainted 4.11.7-200.fc25.x86_64 #1 > Hardware name: CompuLab sbc-flt1/fitlet, BIOS SBCFLT_0.08.04 06/27/2015 > Workqueue: events proc_cleanup_work > Call Trace: > dump_stack+0x63/0x86 > __warn+0xcb/0xf0 > warn_slowpath_null+0x1d/0x20 > umount_check+0x6e/0x80 > d_walk+0xc6/0x270 > ? dentry_free+0x80/0x80 > do_one_tree+0x26/0x40 > shrink_dcache_for_umount+0x2d/0x90 > generic_shutdown_super+0x1f/0xf0 > kill_anon_super+0x12/0x20 > proc_kill_sb+0x40/0x50 > deactivate_locked_super+0x43/0x70 > deactivate_super+0x5a/0x60 > cleanup_mnt+0x3f/0x90 > mntput_no_expire+0x13b/0x190 > kern_unmount+0x3e/0x50 > pid_ns_release_proc+0x15/0x20 > proc_cleanup_work+0x15/0x20 > process_one_work+0x197/0x450 > worker_thread+0x4e/0x4a0 > kthread+0x109/0x140 > ? process_one_work+0x450/0x450 > ? kthread_park+0x90/0x90 > ret_from_fork+0x2c/0x40 > ---[ end trace e1c109611e5d0b41 ]--- > VFS: Busy inodes after unmount of proc. Self-destruct in 5 seconds. Have a nice day... > BUG: unable to handle kernel NULL pointer dereference at (null) > IP: _raw_spin_lock+0xc/0x30 > PGD 0 Fix this by taking a reference to the super block in proc_sys_prune_dcache. The superblock reference is the core of the fix however the sysctl_inodes list is converted to a hlist so that hlist_del_init_rcu may be used. This allows proc_sys_prune_dache to remove inodes the sysctl_inodes list, while not causing problems for proc_sys_evict_inode when if it later choses to remove the inode from the sysctl_inodes list. Removing inodes from the sysctl_inodes list allows proc_sys_prune_dcache to have a progress guarantee, while still being able to drop all locks. The fact that head->unregistering is set in start_unregistering ensures that no more inodes will be added to the the sysctl_inodes list. Previously the code did a dance where it delayed calling iput until the next entry in the list was being considered to ensure the inode remained on the sysctl_inodes list until the next entry was walked to. The structure of the loop in this patch does not need that so is much easier to understand and maintain. Cc: stable@vger.kernel.org Reported-by: Andrei Vagin Tested-by: Andrei Vagin Fixes: ace0c791e6c3 ("proc/sysctl: Don't grab i_lock under sysctl_lock.") Fixes: d6cffbbe9a7e ("proc/sysctl: prune stale dentries during unregistering") Signed-off-by: "Eric W. Biederman" Signed-off-by: Greg Kroah-Hartman --- fs/proc/internal.h | 2 +- fs/proc/proc_sysctl.c | 43 +++++++++++++++++++++++++++++------------- include/linux/sysctl.h | 2 +- 3 files changed, 32 insertions(+), 15 deletions(-) diff --git a/fs/proc/internal.h b/fs/proc/internal.h index 7aad3f61b5b0..c0bdeceaaeb6 100644 --- a/fs/proc/internal.h +++ b/fs/proc/internal.h @@ -65,7 +65,7 @@ struct proc_inode { struct proc_dir_entry *pde; struct ctl_table_header *sysctl; struct ctl_table *sysctl_entry; - struct list_head sysctl_inodes; + struct hlist_node sysctl_inodes; const struct proc_ns_operations *ns_ops; struct inode vfs_inode; }; diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c index 3606e35c98b7..46cd2e1b055b 100644 --- a/fs/proc/proc_sysctl.c +++ b/fs/proc/proc_sysctl.c @@ -190,7 +190,7 @@ static void init_header(struct ctl_table_header *head, head->set = set; head->parent = NULL; head->node = node; - INIT_LIST_HEAD(&head->inodes); + INIT_HLIST_HEAD(&head->inodes); if (node) { struct ctl_table *entry; for (entry = table; entry->procname; entry++, node++) @@ -260,25 +260,42 @@ static void unuse_table(struct ctl_table_header *p) complete(p->unregistering); } -/* called under sysctl_lock */ static void proc_sys_prune_dcache(struct ctl_table_header *head) { - struct inode *inode, *prev = NULL; + struct inode *inode; struct proc_inode *ei; + struct hlist_node *node; + struct super_block *sb; rcu_read_lock(); - list_for_each_entry_rcu(ei, &head->inodes, sysctl_inodes) { - inode = igrab(&ei->vfs_inode); - if (inode) { - rcu_read_unlock(); - iput(prev); - prev = inode; - d_prune_aliases(inode); + for (;;) { + node = hlist_first_rcu(&head->inodes); + if (!node) + break; + ei = hlist_entry(node, struct proc_inode, sysctl_inodes); + spin_lock(&sysctl_lock); + hlist_del_init_rcu(&ei->sysctl_inodes); + spin_unlock(&sysctl_lock); + + inode = &ei->vfs_inode; + sb = inode->i_sb; + if (!atomic_inc_not_zero(&sb->s_active)) + continue; + inode = igrab(inode); + rcu_read_unlock(); + if (unlikely(!inode)) { + deactivate_super(sb); rcu_read_lock(); + continue; } + + d_prune_aliases(inode); + iput(inode); + deactivate_super(sb); + + rcu_read_lock(); } rcu_read_unlock(); - iput(prev); } /* called under sysctl_lock, will reacquire if has to wait */ @@ -464,7 +481,7 @@ static struct inode *proc_sys_make_inode(struct super_block *sb, } ei->sysctl = head; ei->sysctl_entry = table; - list_add_rcu(&ei->sysctl_inodes, &head->inodes); + hlist_add_head_rcu(&ei->sysctl_inodes, &head->inodes); head->count++; spin_unlock(&sysctl_lock); @@ -492,7 +509,7 @@ static struct inode *proc_sys_make_inode(struct super_block *sb, void proc_sys_evict_inode(struct inode *inode, struct ctl_table_header *head) { spin_lock(&sysctl_lock); - list_del_rcu(&PROC_I(inode)->sysctl_inodes); + hlist_del_init_rcu(&PROC_I(inode)->sysctl_inodes); if (!--head->count) kfree_rcu(head, rcu); spin_unlock(&sysctl_lock); diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index b7e82049fec7..0e5cc33b9b25 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -143,7 +143,7 @@ struct ctl_table_header struct ctl_table_set *set; struct ctl_dir *parent; struct ctl_node *node; - struct list_head inodes; /* head for proc_inode->sysctl_inodes */ + struct hlist_head inodes; /* head for proc_inode->sysctl_inodes */ }; struct ctl_dir { From 11410f99982cbc7ee71c58437d1f0cccd7bb5e96 Mon Sep 17 00:00:00 2001 From: Jack Morgenstein Date: Wed, 23 May 2018 15:30:30 +0300 Subject: [PATCH 526/970] IB/core: Make testing MR flags for writability a static inline function commit 08bb558ac11ab944e0539e78619d7b4c356278bd upstream. Make the MR writability flags check, which is performed in umem.c, a static inline function in file ib_verbs.h This allows the function to be used by low-level infiniband drivers. Cc: Signed-off-by: Jason Gunthorpe Signed-off-by: Jack Morgenstein Signed-off-by: Leon Romanovsky Signed-off-by: Sudip Mukherjee Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/core/umem.c | 11 +---------- include/rdma/ib_verbs.h | 14 ++++++++++++++ 2 files changed, 15 insertions(+), 10 deletions(-) diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c index e74aa1d60fdb..99cebf3a9163 100644 --- a/drivers/infiniband/core/umem.c +++ b/drivers/infiniband/core/umem.c @@ -122,16 +122,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr, umem->address = addr; umem->page_size = PAGE_SIZE; umem->pid = get_task_pid(current, PIDTYPE_PID); - /* - * We ask for writable memory if any of the following - * access flags are set. "Local write" and "remote write" - * obviously require write access. "Remote atomic" can do - * things like fetch and add, which will modify memory, and - * "MW bind" can change permissions by binding a window. - */ - umem->writable = !!(access & - (IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE | - IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND)); + umem->writable = ib_access_writable(access); if (access & IB_ACCESS_ON_DEMAND) { put_pid(umem->pid); diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 5ad43a487745..a42535f252b5 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -3308,6 +3308,20 @@ static inline int ib_check_mr_access(int flags) return 0; } +static inline bool ib_access_writable(int access_flags) +{ + /* + * We have writable memory backing the MR if any of the following + * access flags are set. "Local write" and "remote write" obviously + * require write access. "Remote atomic" can do things like fetch and + * add, which will modify memory, and "MW bind" can change permissions + * by binding a window. + */ + return access_flags & + (IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE | + IB_ACCESS_REMOTE_ATOMIC | IB_ACCESS_MW_BIND); +} + /** * ib_check_mr_status: lightweight check of MR status. * This routine may provide status checks on a selected From e2ba7bf19727b67f0f3850c84829a19898401928 Mon Sep 17 00:00:00 2001 From: Jack Morgenstein Date: Wed, 23 May 2018 15:30:31 +0300 Subject: [PATCH 527/970] IB/mlx4: Mark user MR as writable if actual virtual memory is writable commit d8f9cc328c8888369880e2527e9186d745f2bbf6 upstream. To allow rereg_user_mr to modify the MR from read-only to writable without using get_user_pages again, we needed to define the initial MR as writable. However, this was originally done unconditionally, without taking into account the writability of the underlying virtual memory. As a result, any attempt to register a read-only MR over read-only virtual memory failed. To fix this, do not add the writable flag bit when the user virtual memory is not writable (e.g. const memory). However, when the underlying memory is NOT writable (and we therefore do not define the initial MR as writable), the IB core adds a "force writable" flag to its user-pages request. If this succeeds, the reg_user_mr caller gets a writable copy of the original pages. If the user-space caller then does a rereg_user_mr operation to enable writability, this will succeed. This should not be allowed, since the original virtual memory was not writable. Cc: Fixes: 9376932d0c26 ("IB/mlx4_ib: Add support for user MR re-registration") Signed-off-by: Jason Gunthorpe Signed-off-by: Jack Morgenstein Signed-off-by: Leon Romanovsky Signed-off-by: Sudip Mukherjee Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/hw/mlx4/mr.c | 50 +++++++++++++++++++++++++++------ 1 file changed, 42 insertions(+), 8 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c index ae41623e0f13..0d4878efd643 100644 --- a/drivers/infiniband/hw/mlx4/mr.c +++ b/drivers/infiniband/hw/mlx4/mr.c @@ -131,6 +131,40 @@ int mlx4_ib_umem_write_mtt(struct mlx4_ib_dev *dev, struct mlx4_mtt *mtt, return err; } +static struct ib_umem *mlx4_get_umem_mr(struct ib_ucontext *context, u64 start, + u64 length, u64 virt_addr, + int access_flags) +{ + /* + * Force registering the memory as writable if the underlying pages + * are writable. This is so rereg can change the access permissions + * from readable to writable without having to run through ib_umem_get + * again + */ + if (!ib_access_writable(access_flags)) { + struct vm_area_struct *vma; + + down_read(¤t->mm->mmap_sem); + /* + * FIXME: Ideally this would iterate over all the vmas that + * cover the memory, but for now it requires a single vma to + * entirely cover the MR to support RO mappings. + */ + vma = find_vma(current->mm, start); + if (vma && vma->vm_end >= start + length && + vma->vm_start <= start) { + if (vma->vm_flags & VM_WRITE) + access_flags |= IB_ACCESS_LOCAL_WRITE; + } else { + access_flags |= IB_ACCESS_LOCAL_WRITE; + } + + up_read(¤t->mm->mmap_sem); + } + + return ib_umem_get(context, start, length, access_flags, 0); +} + struct ib_mr *mlx4_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, u64 virt_addr, int access_flags, struct ib_udata *udata) @@ -145,10 +179,8 @@ struct ib_mr *mlx4_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, if (!mr) return ERR_PTR(-ENOMEM); - /* Force registering the memory as writable. */ - /* Used for memory re-registeration. HCA protects the access */ - mr->umem = ib_umem_get(pd->uobject->context, start, length, - access_flags | IB_ACCESS_LOCAL_WRITE, 0); + mr->umem = mlx4_get_umem_mr(pd->uobject->context, start, length, + virt_addr, access_flags); if (IS_ERR(mr->umem)) { err = PTR_ERR(mr->umem); goto err_free; @@ -215,6 +247,9 @@ int mlx4_ib_rereg_user_mr(struct ib_mr *mr, int flags, } if (flags & IB_MR_REREG_ACCESS) { + if (ib_access_writable(mr_access_flags) && !mmr->umem->writable) + return -EPERM; + err = mlx4_mr_hw_change_access(dev->dev, *pmpt_entry, convert_access(mr_access_flags)); @@ -228,10 +263,9 @@ int mlx4_ib_rereg_user_mr(struct ib_mr *mr, int flags, mlx4_mr_rereg_mem_cleanup(dev->dev, &mmr->mmr); ib_umem_release(mmr->umem); - mmr->umem = ib_umem_get(mr->uobject->context, start, length, - mr_access_flags | - IB_ACCESS_LOCAL_WRITE, - 0); + mmr->umem = + mlx4_get_umem_mr(mr->uobject->context, start, length, + virt_addr, mr_access_flags); if (IS_ERR(mmr->umem)) { err = PTR_ERR(mmr->umem); /* Prevent mlx4_ib_dereg_mr from free'ing invalid pointer */ From 5ee45fc998a3e45af45f8886fb13cc417c6f18d0 Mon Sep 17 00:00:00 2001 From: Fabio Estevam Date: Fri, 5 Jan 2018 18:02:55 -0200 Subject: [PATCH 528/970] mtd: nand: qcom: Add a NULL check for devm_kasprintf() commit 069f05346d01e7298939f16533953cdf52370be3 upstream. devm_kasprintf() may fail, so we should better add a NULL check and propagate an error on failure. Signed-off-by: Fabio Estevam Signed-off-by: Boris Brezillon Signed-off-by: Amit Pundir Signed-off-by: Greg Kroah-Hartman --- drivers/mtd/nand/qcom_nandc.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/mtd/nand/qcom_nandc.c b/drivers/mtd/nand/qcom_nandc.c index 6f0fd1512ad2..dc4943134649 100644 --- a/drivers/mtd/nand/qcom_nandc.c +++ b/drivers/mtd/nand/qcom_nandc.c @@ -2008,6 +2008,9 @@ static int qcom_nand_host_init(struct qcom_nand_controller *nandc, nand_set_flash_node(chip, dn); mtd->name = devm_kasprintf(dev, GFP_KERNEL, "qcom_nand.%d", host->cs); + if (!mtd->name) + return -ENOMEM; + mtd->owner = THIS_MODULE; mtd->dev.parent = dev; From 27250cf83def3eeff4e438571a0aa9c18deb898f Mon Sep 17 00:00:00 2001 From: Michael Mera Date: Mon, 1 May 2017 15:41:16 +0900 Subject: [PATCH 529/970] IB/ocrdma: fix out of bounds access to local buffer commit 062d0f22a30c39840ea49b72cfcfc1aa4cc538fa upstream. In write to debugfs file 'resource_stats' the local buffer 'tmp_str' is written at index 'count-1' where 'count' is the size of the write, so potentially 0. This patch filters odd values for the write size/position to avoid this type of problem. Signed-off-by: Michael Mera Reviewed-by: Leon Romanovsky Signed-off-by: Doug Ledford Signed-off-by: Amit Pundir Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/hw/ocrdma/ocrdma_stats.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_stats.c b/drivers/infiniband/hw/ocrdma/ocrdma_stats.c index 265943069b35..84349d976162 100644 --- a/drivers/infiniband/hw/ocrdma/ocrdma_stats.c +++ b/drivers/infiniband/hw/ocrdma/ocrdma_stats.c @@ -645,7 +645,7 @@ static ssize_t ocrdma_dbgfs_ops_write(struct file *filp, struct ocrdma_stats *pstats = filp->private_data; struct ocrdma_dev *dev = pstats->dev; - if (count > 32) + if (*ppos != 0 || count == 0 || count > sizeof(tmp_str)) goto err; if (copy_from_user(tmp_str, buffer, count)) From 16aeb3f175a1b4d68dd68418230a1644de95fb6b Mon Sep 17 00:00:00 2001 From: Oleksij Rempel Date: Fri, 15 Jun 2018 09:41:29 +0200 Subject: [PATCH 530/970] ARM: dts: imx6sx: fix irq for pcie bridge commit 1bcfe0564044be578841744faea1c2f46adc8178 upstream. Use the correct IRQ line for the MSI controller in the PCIe host controller. Apparently a different IRQ line is used compared to other i.MX6 variants. Without this change MSI IRQs aren't properly propagated to the upstream interrupt controller. Signed-off-by: Oleksij Rempel Reviewed-by: Lucas Stach Fixes: b1d17f68e5c5 ("ARM: dts: imx: add initial imx6sx device tree source") Signed-off-by: Shawn Guo Signed-off-by: Amit Pundir Signed-off-by: Greg Kroah-Hartman --- arch/arm/boot/dts/imx6sx.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/boot/dts/imx6sx.dtsi b/arch/arm/boot/dts/imx6sx.dtsi index 1a473e83efbf..a885052157f0 100644 --- a/arch/arm/boot/dts/imx6sx.dtsi +++ b/arch/arm/boot/dts/imx6sx.dtsi @@ -1280,7 +1280,7 @@ /* non-prefetchable memory */ 0x82000000 0 0x08000000 0x08000000 0 0x00f00000>; num-lanes = <1>; - interrupts = ; + interrupts = ; clocks = <&clks IMX6SX_CLK_PCIE_REF_125M>, <&clks IMX6SX_CLK_PCIE_AXI>, <&clks IMX6SX_CLK_LVDS1_OUT>, From 640fe070d801b91081b7c9e3575b1ed2f0018eeb Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Fri, 3 Aug 2018 16:41:39 +0200 Subject: [PATCH 531/970] x86/paravirt: Fix spectre-v2 mitigations for paravirt guests commit 5800dc5c19f34e6e03b5adab1282535cb102fafd upstream. Nadav reported that on guests we're failing to rewrite the indirect calls to CALLEE_SAVE paravirt functions. In particular the pv_queued_spin_unlock() call is left unpatched and that is all over the place. This obviously wrecks Spectre-v2 mitigation (for paravirt guests) which relies on not actually having indirect calls around. The reason is an incorrect clobber test in paravirt_patch_call(); this function rewrites an indirect call with a direct call to the _SAME_ function, there is no possible way the clobbers can be different because of this. Therefore remove this clobber check. Also put WARNs on the other patch failure case (not enough room for the instruction) which I've not seen trigger in my (limited) testing. Three live kernel image disassemblies for lock_sock_nested (as a small function that illustrates the problem nicely). PRE is the current situation for guests, POST is with this patch applied and NATIVE is with or without the patch for !guests. PRE: (gdb) disassemble lock_sock_nested Dump of assembler code for function lock_sock_nested: 0xffffffff817be970 <+0>: push %rbp 0xffffffff817be971 <+1>: mov %rdi,%rbp 0xffffffff817be974 <+4>: push %rbx 0xffffffff817be975 <+5>: lea 0x88(%rbp),%rbx 0xffffffff817be97c <+12>: callq 0xffffffff819f7160 <_cond_resched> 0xffffffff817be981 <+17>: mov %rbx,%rdi 0xffffffff817be984 <+20>: callq 0xffffffff819fbb00 <_raw_spin_lock_bh> 0xffffffff817be989 <+25>: mov 0x8c(%rbp),%eax 0xffffffff817be98f <+31>: test %eax,%eax 0xffffffff817be991 <+33>: jne 0xffffffff817be9ba 0xffffffff817be993 <+35>: movl $0x1,0x8c(%rbp) 0xffffffff817be99d <+45>: mov %rbx,%rdi 0xffffffff817be9a0 <+48>: callq *0xffffffff822299e8 0xffffffff817be9a7 <+55>: pop %rbx 0xffffffff817be9a8 <+56>: pop %rbp 0xffffffff817be9a9 <+57>: mov $0x200,%esi 0xffffffff817be9ae <+62>: mov $0xffffffff817be993,%rdi 0xffffffff817be9b5 <+69>: jmpq 0xffffffff81063ae0 <__local_bh_enable_ip> 0xffffffff817be9ba <+74>: mov %rbp,%rdi 0xffffffff817be9bd <+77>: callq 0xffffffff817be8c0 <__lock_sock> 0xffffffff817be9c2 <+82>: jmp 0xffffffff817be993 End of assembler dump. POST: (gdb) disassemble lock_sock_nested Dump of assembler code for function lock_sock_nested: 0xffffffff817be970 <+0>: push %rbp 0xffffffff817be971 <+1>: mov %rdi,%rbp 0xffffffff817be974 <+4>: push %rbx 0xffffffff817be975 <+5>: lea 0x88(%rbp),%rbx 0xffffffff817be97c <+12>: callq 0xffffffff819f7160 <_cond_resched> 0xffffffff817be981 <+17>: mov %rbx,%rdi 0xffffffff817be984 <+20>: callq 0xffffffff819fbb00 <_raw_spin_lock_bh> 0xffffffff817be989 <+25>: mov 0x8c(%rbp),%eax 0xffffffff817be98f <+31>: test %eax,%eax 0xffffffff817be991 <+33>: jne 0xffffffff817be9ba 0xffffffff817be993 <+35>: movl $0x1,0x8c(%rbp) 0xffffffff817be99d <+45>: mov %rbx,%rdi 0xffffffff817be9a0 <+48>: callq 0xffffffff810a0c20 <__raw_callee_save___pv_queued_spin_unlock> 0xffffffff817be9a5 <+53>: xchg %ax,%ax 0xffffffff817be9a7 <+55>: pop %rbx 0xffffffff817be9a8 <+56>: pop %rbp 0xffffffff817be9a9 <+57>: mov $0x200,%esi 0xffffffff817be9ae <+62>: mov $0xffffffff817be993,%rdi 0xffffffff817be9b5 <+69>: jmpq 0xffffffff81063aa0 <__local_bh_enable_ip> 0xffffffff817be9ba <+74>: mov %rbp,%rdi 0xffffffff817be9bd <+77>: callq 0xffffffff817be8c0 <__lock_sock> 0xffffffff817be9c2 <+82>: jmp 0xffffffff817be993 End of assembler dump. NATIVE: (gdb) disassemble lock_sock_nested Dump of assembler code for function lock_sock_nested: 0xffffffff817be970 <+0>: push %rbp 0xffffffff817be971 <+1>: mov %rdi,%rbp 0xffffffff817be974 <+4>: push %rbx 0xffffffff817be975 <+5>: lea 0x88(%rbp),%rbx 0xffffffff817be97c <+12>: callq 0xffffffff819f7160 <_cond_resched> 0xffffffff817be981 <+17>: mov %rbx,%rdi 0xffffffff817be984 <+20>: callq 0xffffffff819fbb00 <_raw_spin_lock_bh> 0xffffffff817be989 <+25>: mov 0x8c(%rbp),%eax 0xffffffff817be98f <+31>: test %eax,%eax 0xffffffff817be991 <+33>: jne 0xffffffff817be9ba 0xffffffff817be993 <+35>: movl $0x1,0x8c(%rbp) 0xffffffff817be99d <+45>: mov %rbx,%rdi 0xffffffff817be9a0 <+48>: movb $0x0,(%rdi) 0xffffffff817be9a3 <+51>: nopl 0x0(%rax) 0xffffffff817be9a7 <+55>: pop %rbx 0xffffffff817be9a8 <+56>: pop %rbp 0xffffffff817be9a9 <+57>: mov $0x200,%esi 0xffffffff817be9ae <+62>: mov $0xffffffff817be993,%rdi 0xffffffff817be9b5 <+69>: jmpq 0xffffffff81063ae0 <__local_bh_enable_ip> 0xffffffff817be9ba <+74>: mov %rbp,%rdi 0xffffffff817be9bd <+77>: callq 0xffffffff817be8c0 <__lock_sock> 0xffffffff817be9c2 <+82>: jmp 0xffffffff817be993 End of assembler dump. Fixes: 63f70270ccd9 ("[PATCH] i386: PARAVIRT: add common patching machinery") Fixes: 3010a0663fd9 ("x86/paravirt, objtool: Annotate indirect calls") Reported-by: Nadav Amit Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Thomas Gleixner Reviewed-by: Juergen Gross Cc: Konrad Rzeszutek Wilk Cc: Boris Ostrovsky Cc: David Woodhouse Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/paravirt.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c index bbf3d5933eaa..29d465627919 100644 --- a/arch/x86/kernel/paravirt.c +++ b/arch/x86/kernel/paravirt.c @@ -88,10 +88,12 @@ unsigned paravirt_patch_call(void *insnbuf, struct branch *b = insnbuf; unsigned long delta = (unsigned long)target - (addr+5); - if (tgt_clobbers & ~site_clobbers) - return len; /* target would clobber too much for this site */ - if (len < 5) + if (len < 5) { +#ifdef CONFIG_RETPOLINE + WARN_ONCE("Failing to patch indirect CALL in %ps\n", (void *)addr); +#endif return len; /* call too long for patch site */ + } b->opcode = 0xe8; /* call */ b->delta = delta; @@ -106,8 +108,12 @@ unsigned paravirt_patch_jmp(void *insnbuf, const void *target, struct branch *b = insnbuf; unsigned long delta = (unsigned long)target - (addr+5); - if (len < 5) + if (len < 5) { +#ifdef CONFIG_RETPOLINE + WARN_ONCE("Failing to patch indirect JMP in %ps\n", (void *)addr); +#endif return len; /* call too long for patch site */ + } b->opcode = 0xe9; /* jmp */ b->delta = delta; From 6455f41db5206cf46b623be071a0aa308c183642 Mon Sep 17 00:00:00 2001 From: Jiri Kosina Date: Thu, 26 Jul 2018 13:14:55 +0200 Subject: [PATCH 532/970] x86/speculation: Protect against userspace-userspace spectreRSB commit fdf82a7856b32d905c39afc85e34364491e46346 upstream. The article "Spectre Returns! Speculation Attacks using the Return Stack Buffer" [1] describes two new (sub-)variants of spectrev2-like attacks, making use solely of the RSB contents even on CPUs that don't fallback to BTB on RSB underflow (Skylake+). Mitigate userspace-userspace attacks by always unconditionally filling RSB on context switch when the generic spectrev2 mitigation has been enabled. [1] https://arxiv.org/pdf/1807.07940.pdf Signed-off-by: Jiri Kosina Signed-off-by: Thomas Gleixner Reviewed-by: Josh Poimboeuf Acked-by: Tim Chen Cc: Konrad Rzeszutek Wilk Cc: Borislav Petkov Cc: David Woodhouse Cc: Peter Zijlstra Cc: Linus Torvalds Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1807261308190.997@cbobk.fhfr.pm Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/bugs.c | 38 +++++++------------------------------- 1 file changed, 7 insertions(+), 31 deletions(-) diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index 86af9b1b049d..53ddf121118c 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -310,23 +310,6 @@ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void) return cmd; } -/* Check for Skylake-like CPUs (for RSB handling) */ -static bool __init is_skylake_era(void) -{ - if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL && - boot_cpu_data.x86 == 6) { - switch (boot_cpu_data.x86_model) { - case INTEL_FAM6_SKYLAKE_MOBILE: - case INTEL_FAM6_SKYLAKE_DESKTOP: - case INTEL_FAM6_SKYLAKE_X: - case INTEL_FAM6_KABYLAKE_MOBILE: - case INTEL_FAM6_KABYLAKE_DESKTOP: - return true; - } - } - return false; -} - static void __init spectre_v2_select_mitigation(void) { enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline(); @@ -387,22 +370,15 @@ static void __init spectre_v2_select_mitigation(void) pr_info("%s\n", spectre_v2_strings[mode]); /* - * If neither SMEP nor PTI are available, there is a risk of - * hitting userspace addresses in the RSB after a context switch - * from a shallow call stack to a deeper one. To prevent this fill - * the entire RSB, even when using IBRS. + * If spectre v2 protection has been enabled, unconditionally fill + * RSB during a context switch; this protects against two independent + * issues: * - * Skylake era CPUs have a separate issue with *underflow* of the - * RSB, when they will predict 'ret' targets from the generic BTB. - * The proper mitigation for this is IBRS. If IBRS is not supported - * or deactivated in favour of retpolines the RSB fill on context - * switch is required. + * - RSB underflow (and switch to BTB) on Skylake+ + * - SpectreRSB variant of spectre v2 on X86_BUG_SPECTRE_V2 CPUs */ - if ((!boot_cpu_has(X86_FEATURE_KAISER) && - !boot_cpu_has(X86_FEATURE_SMEP)) || is_skylake_era()) { - setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW); - pr_info("Spectre v2 mitigation: Filling RSB on context switch\n"); - } + setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW); + pr_info("Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch\n"); /* Initialize Indirect Branch Prediction Barrier if supported */ if (boot_cpu_has(X86_FEATURE_IBPB)) { From a92daabdfc87c320a626b2ad0318c2a0dee17a30 Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Sat, 28 Apr 2018 21:37:03 +0900 Subject: [PATCH 533/970] kprobes/x86: Fix %p uses in error messages commit 0ea063306eecf300fcf06d2f5917474b580f666f upstream. Remove all %p uses in error messages in kprobes/x86. Signed-off-by: Masami Hiramatsu Cc: Ananth N Mavinakayanahalli Cc: Anil S Keshavamurthy Cc: Arnd Bergmann Cc: David Howells Cc: David S . Miller Cc: Heiko Carstens Cc: Jon Medhurst Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Thomas Richter Cc: Tobin C . Harding Cc: Will Deacon Cc: acme@kernel.org Cc: akpm@linux-foundation.org Cc: brueckner@linux.vnet.ibm.com Cc: linux-arch@vger.kernel.org Cc: rostedt@goodmis.org Cc: schwidefsky@de.ibm.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/lkml/152491902310.9916.13355297638917767319.stgit@devbox Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/kprobes/core.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c index 516be613bd41..1fa7783fa51d 100644 --- a/arch/x86/kernel/kprobes/core.c +++ b/arch/x86/kernel/kprobes/core.c @@ -396,7 +396,6 @@ int __copy_instruction(u8 *dest, u8 *src) newdisp = (u8 *) src + (s64) insn.displacement.value - (u8 *) dest; if ((s64) (s32) newdisp != newdisp) { pr_err("Kprobes error: new displacement does not fit into s32 (%llx)\n", newdisp); - pr_err("\tSrc: %p, Dest: %p, old disp: %x\n", src, dest, insn.displacement.value); return 0; } disp = (u8 *) dest + insn_offset_displacement(&insn); @@ -612,8 +611,7 @@ static int reenter_kprobe(struct kprobe *p, struct pt_regs *regs, * Raise a BUG or we'll continue in an endless reentering loop * and eventually a stack overflow. */ - printk(KERN_WARNING "Unrecoverable kprobe detected at %p.\n", - p->addr); + pr_err("Unrecoverable kprobe detected.\n"); dump_kprobe(p); BUG(); default: From 329d815667373e858497b5947ad0484194d8c3e2 Mon Sep 17 00:00:00 2001 From: Nick Desaulniers Date: Fri, 3 Aug 2018 10:05:50 -0700 Subject: [PATCH 534/970] x86/irqflags: Provide a declaration for native_save_fl commit 208cbb32558907f68b3b2a081ca2337ac3744794 upstream. It was reported that the commit d0a8d9378d16 is causing users of gcc < 4.9 to observe -Werror=missing-prototypes errors. Indeed, it seems that: extern inline unsigned long native_save_fl(void) { return 0; } compiled with -Werror=missing-prototypes produces this warning in gcc < 4.9, but not gcc >= 4.9. Fixes: d0a8d9378d16 ("x86/paravirt: Make native_save_fl() extern inline"). Reported-by: David Laight Reported-by: Jean Delvare Signed-off-by: Nick Desaulniers Signed-off-by: Thomas Gleixner Cc: hpa@zytor.com Cc: jgross@suse.com Cc: kstewart@linuxfoundation.org Cc: gregkh@linuxfoundation.org Cc: boris.ostrovsky@oracle.com Cc: astrachan@google.com Cc: mka@chromium.org Cc: arnd@arndb.de Cc: tstellar@redhat.com Cc: sedat.dilek@gmail.com Cc: David.Laight@aculab.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180803170550.164688-1-ndesaulniers@google.com Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/irqflags.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h index 8a8a6c66be9a..5b1177f5a963 100644 --- a/arch/x86/include/asm/irqflags.h +++ b/arch/x86/include/asm/irqflags.h @@ -12,6 +12,8 @@ * Interrupt control: */ +/* Declaration required for gcc < 4.9 to prevent -Werror=missing-prototypes */ +extern inline unsigned long native_save_fl(void); extern inline unsigned long native_save_fl(void) { unsigned long flags; From bbd07cbb1076de03d896c9c3787081b1080e8c99 Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Wed, 13 Jun 2018 15:48:21 -0700 Subject: [PATCH 535/970] x86/speculation/l1tf: Increase 32bit PAE __PHYSICAL_PAGE_SHIFT commit 50896e180c6aa3a9c61a26ced99e15d602666a4c upstream L1 Terminal Fault (L1TF) is a speculation related vulnerability. The CPU speculates on PTE entries which do not have the PRESENT bit set, if the content of the resulting physical address is available in the L1D cache. The OS side mitigation makes sure that a !PRESENT PTE entry points to a physical address outside the actually existing and cachable memory space. This is achieved by inverting the upper bits of the PTE. Due to the address space limitations this only works for 64bit and 32bit PAE kernels, but not for 32bit non PAE. This mitigation applies to both host and guest kernels, but in case of a 64bit host (hypervisor) and a 32bit PAE guest, inverting the upper bits of the PAE address space (44bit) is not enough if the host has more than 43 bits of populated memory address space, because the speculation treats the PTE content as a physical host address bypassing EPT. The host (hypervisor) protects itself against the guest by flushing L1D as needed, but pages inside the guest are not protected against attacks from other processes inside the same guest. For the guest the inverted PTE mask has to match the host to provide the full protection for all pages the host could possibly map into the guest. The hosts populated address space is not known to the guest, so the mask must cover the possible maximal host address space, i.e. 52 bit. On 32bit PAE the maximum PTE mask is currently set to 44 bit because that is the limit imposed by 32bit unsigned long PFNs in the VMs. This limits the mask to be below what the host could possible use for physical pages. The L1TF PROT_NONE protection code uses the PTE masks to determine which bits to invert to make sure the higher bits are set for unmapped entries to prevent L1TF speculation attacks against EPT inside guests. In order to invert all bits that could be used by the host, increase __PHYSICAL_PAGE_SHIFT to 52 to match 64bit. The real limit for a 32bit PAE kernel is still 44 bits because all Linux PTEs are created from unsigned long PFNs, so they cannot be higher than 44 bits on a 32bit kernel. So these extra PFN bits should be never set. The only users of this macro are using it to look at PTEs, so it's safe. [ tglx: Massaged changelog ] Signed-off-by: Andi Kleen Signed-off-by: Thomas Gleixner Reviewed-by: Josh Poimboeuf Acked-by: Michal Hocko Acked-by: Dave Hansen Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/page_32_types.h | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/page_32_types.h b/arch/x86/include/asm/page_32_types.h index 3bae4969ac65..2622984b8f1c 100644 --- a/arch/x86/include/asm/page_32_types.h +++ b/arch/x86/include/asm/page_32_types.h @@ -28,8 +28,13 @@ #define N_EXCEPTION_STACKS 1 #ifdef CONFIG_X86_PAE -/* 44=32+12, the limit we can fit into an unsigned long pfn */ -#define __PHYSICAL_MASK_SHIFT 44 +/* + * This is beyond the 44 bit limit imposed by the 32bit long pfns, + * but we need the full mask to make sure inverted PROT_NONE + * entries have all the host bits set in a guest. + * The real limit is still 44 bits. + */ +#define __PHYSICAL_MASK_SHIFT 52 #define __VIRTUAL_MASK_SHIFT 32 #else /* !CONFIG_X86_PAE */ From 1a4922e0f01d08a4789b1e17b195bc30bf234a3b Mon Sep 17 00:00:00 2001 From: Naoya Horiguchi Date: Fri, 8 Sep 2017 16:10:46 -0700 Subject: [PATCH 536/970] mm: x86: move _PAGE_SWP_SOFT_DIRTY from bit 7 to bit 1 commit eee4818baac0f2b37848fdf90e4b16430dc536ac upstream _PAGE_PSE is used to distinguish between a truly non-present (_PAGE_PRESENT=0) PMD, and a PMD which is undergoing a THP split and should be treated as present. But _PAGE_SWP_SOFT_DIRTY currently uses the _PAGE_PSE bit, which would cause confusion between one of those PMDs undergoing a THP split, and a soft-dirty PMD. Dropping _PAGE_PSE check in pmd_present() does not work well, because it can hurt optimization of tlb handling in thp split. Thus, we need to move the bit. In the current kernel, bits 1-4 are not used in non-present format since commit 00839ee3b299 ("x86/mm: Move swap offset/type up in PTE to work around erratum"). So let's move _PAGE_SWP_SOFT_DIRTY to bit 1. Bit 7 is used as reserved (always clear), so please don't use it for other purpose. [dwmw2: Pulled in to 4.9 backport to support L1TF changes] Link: http://lkml.kernel.org/r/20170717193955.20207-3-zi.yan@sent.com Signed-off-by: Naoya Horiguchi Signed-off-by: Zi Yan Acked-by: Dave Hansen Cc: "H. Peter Anvin" Cc: Anshuman Khandual Cc: David Nellans Cc: Ingo Molnar Cc: Kirill A. Shutemov Cc: Mel Gorman Cc: Minchan Kim Cc: Thomas Gleixner Cc: Vlastimil Babka Cc: Andrea Arcangeli Cc: Michal Hocko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/pgtable_64.h | 12 +++++++++--- arch/x86/include/asm/pgtable_types.h | 10 +++++----- 2 files changed, 14 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h index ce97c8c6a310..008e1a58f96c 100644 --- a/arch/x86/include/asm/pgtable_64.h +++ b/arch/x86/include/asm/pgtable_64.h @@ -166,15 +166,21 @@ static inline int pgd_large(pgd_t pgd) { return 0; } /* * Encode and de-code a swap entry * - * | ... | 11| 10| 9|8|7|6|5| 4| 3|2|1|0| <- bit number - * | ... |SW3|SW2|SW1|G|L|D|A|CD|WT|U|W|P| <- bit names - * | OFFSET (14->63) | TYPE (9-13) |0|X|X|X| X| X|X|X|0| <- swp entry + * | ... | 11| 10| 9|8|7|6|5| 4| 3|2| 1|0| <- bit number + * | ... |SW3|SW2|SW1|G|L|D|A|CD|WT|U| W|P| <- bit names + * | OFFSET (14->63) | TYPE (9-13) |0|0|X|X| X| X|X|SD|0| <- swp entry * * G (8) is aliased and used as a PROT_NONE indicator for * !present ptes. We need to start storing swap entries above * there. We also need to avoid using A and D because of an * erratum where they can be incorrectly set by hardware on * non-present PTEs. + * + * SD (1) in swp entry is used to store soft dirty bit, which helps us + * remember soft dirty over page migration + * + * Bit 7 in swp entry should be 0 because pmd_present checks not only P, + * but also L and G. */ #define SWP_TYPE_FIRST_BIT (_PAGE_BIT_PROTNONE + 1) #define SWP_TYPE_BITS 5 diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index f1c8ac468292..dfdb7e21ba56 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -97,15 +97,15 @@ /* * Tracking soft dirty bit when a page goes to a swap is tricky. * We need a bit which can be stored in pte _and_ not conflict - * with swap entry format. On x86 bits 6 and 7 are *not* involved - * into swap entry computation, but bit 6 is used for nonlinear - * file mapping, so we borrow bit 7 for soft dirty tracking. + * with swap entry format. On x86 bits 1-4 are *not* involved + * into swap entry computation, but bit 7 is used for thp migration, + * so we borrow bit 1 for soft dirty tracking. * * Please note that this bit must be treated as swap dirty page - * mark if and only if the PTE has present bit clear! + * mark if and only if the PTE/PMD has present bit clear! */ #ifdef CONFIG_MEM_SOFT_DIRTY -#define _PAGE_SWP_SOFT_DIRTY _PAGE_PSE +#define _PAGE_SWP_SOFT_DIRTY _PAGE_RW #else #define _PAGE_SWP_SOFT_DIRTY (_AT(pteval_t, 0)) #endif From 2c9b57e4474d93222bcb6e7f901fd1e71ded699c Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Wed, 13 Jun 2018 15:48:22 -0700 Subject: [PATCH 537/970] x86/speculation/l1tf: Change order of offset/type in swap entry commit bcd11afa7adad8d720e7ba5ef58bdcd9775cf45f upstream If pages are swapped out, the swap entry is stored in the corresponding PTE, which has the Present bit cleared. CPUs vulnerable to L1TF speculate on PTE entries which have the present bit set and would treat the swap entry as phsyical address (PFN). To mitigate that the upper bits of the PTE must be set so the PTE points to non existent memory. The swap entry stores the type and the offset of a swapped out page in the PTE. type is stored in bit 9-13 and offset in bit 14-63. The hardware ignores the bits beyond the phsyical address space limit, so to make the mitigation effective its required to start 'offset' at the lowest possible bit so that even large swap offsets do not reach into the physical address space limit bits. Move offset to bit 9-58 and type to bit 59-63 which are the bits that hardware generally doesn't care about. That, in turn, means that if you on desktop chip with only 40 bits of physical addressing, now that the offset starts at bit 9, there needs to be 30 bits of offset actually *in use* until bit 39 ends up being set, which means when inverted it will again point into existing memory. So that's 4 terabyte of swap space (because the offset is counted in pages, so 30 bits of offset is 42 bits of actual coverage). With bigger physical addressing, that obviously grows further, until the limit of the offset is hit (at 50 bits of offset - 62 bits of actual swap file coverage). This is a preparatory change for the actual swap entry inversion to protect against L1TF. [ AK: Updated description and minor tweaks. Split into two parts ] [ tglx: Massaged changelog ] Signed-off-by: Linus Torvalds Signed-off-by: Andi Kleen Signed-off-by: Thomas Gleixner Tested-by: Andi Kleen Reviewed-by: Josh Poimboeuf Acked-by: Michal Hocko Acked-by: Vlastimil Babka Acked-by: Dave Hansen Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/pgtable_64.h | 31 ++++++++++++++++++++----------- 1 file changed, 20 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h index 008e1a58f96c..a72c2ab24006 100644 --- a/arch/x86/include/asm/pgtable_64.h +++ b/arch/x86/include/asm/pgtable_64.h @@ -168,7 +168,7 @@ static inline int pgd_large(pgd_t pgd) { return 0; } * * | ... | 11| 10| 9|8|7|6|5| 4| 3|2| 1|0| <- bit number * | ... |SW3|SW2|SW1|G|L|D|A|CD|WT|U| W|P| <- bit names - * | OFFSET (14->63) | TYPE (9-13) |0|0|X|X| X| X|X|SD|0| <- swp entry + * | TYPE (59-63) | OFFSET (9-58) |0|0|X|X| X| X|X|SD|0| <- swp entry * * G (8) is aliased and used as a PROT_NONE indicator for * !present ptes. We need to start storing swap entries above @@ -182,19 +182,28 @@ static inline int pgd_large(pgd_t pgd) { return 0; } * Bit 7 in swp entry should be 0 because pmd_present checks not only P, * but also L and G. */ -#define SWP_TYPE_FIRST_BIT (_PAGE_BIT_PROTNONE + 1) -#define SWP_TYPE_BITS 5 -/* Place the offset above the type: */ -#define SWP_OFFSET_FIRST_BIT (SWP_TYPE_FIRST_BIT + SWP_TYPE_BITS) +#define SWP_TYPE_BITS 5 + +#define SWP_OFFSET_FIRST_BIT (_PAGE_BIT_PROTNONE + 1) + +/* We always extract/encode the offset by shifting it all the way up, and then down again */ +#define SWP_OFFSET_SHIFT (SWP_OFFSET_FIRST_BIT+SWP_TYPE_BITS) #define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS) -#define __swp_type(x) (((x).val >> (SWP_TYPE_FIRST_BIT)) \ - & ((1U << SWP_TYPE_BITS) - 1)) -#define __swp_offset(x) ((x).val >> SWP_OFFSET_FIRST_BIT) -#define __swp_entry(type, offset) ((swp_entry_t) { \ - ((type) << (SWP_TYPE_FIRST_BIT)) \ - | ((offset) << SWP_OFFSET_FIRST_BIT) }) +/* Extract the high bits for type */ +#define __swp_type(x) ((x).val >> (64 - SWP_TYPE_BITS)) + +/* Shift up (to get rid of type), then down to get value */ +#define __swp_offset(x) ((x).val << SWP_TYPE_BITS >> SWP_OFFSET_SHIFT) + +/* + * Shift the offset up "too far" by TYPE bits, then down again + */ +#define __swp_entry(type, offset) ((swp_entry_t) { \ + ((unsigned long)(offset) << SWP_OFFSET_SHIFT >> SWP_TYPE_BITS) \ + | ((unsigned long)(type) << (64-SWP_TYPE_BITS)) }) + #define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val((pte)) }) #define __swp_entry_to_pte(x) ((pte_t) { .pte = (x).val }) From 60712274887fcd4ad5eb8e01796022b6b202143c Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Wed, 13 Jun 2018 15:48:23 -0700 Subject: [PATCH 538/970] x86/speculation/l1tf: Protect swap entries against L1TF commit 2f22b4cd45b67b3496f4aa4c7180a1271c6452f6 upstream With L1 terminal fault the CPU speculates into unmapped PTEs, and resulting side effects allow to read the memory the PTE is pointing too, if its values are still in the L1 cache. For swapped out pages Linux uses unmapped PTEs and stores a swap entry into them. To protect against L1TF it must be ensured that the swap entry is not pointing to valid memory, which requires setting higher bits (between bit 36 and bit 45) that are inside the CPUs physical address space, but outside any real memory. To do this invert the offset to make sure the higher bits are always set, as long as the swap file is not too big. Note there is no workaround for 32bit !PAE, or on systems which have more than MAX_PA/2 worth of memory. The later case is very unlikely to happen on real systems. [AK: updated description and minor tweaks by. Split out from the original patch ] Signed-off-by: Linus Torvalds Signed-off-by: Andi Kleen Signed-off-by: Thomas Gleixner Tested-by: Andi Kleen Reviewed-by: Josh Poimboeuf Acked-by: Michal Hocko Acked-by: Vlastimil Babka Acked-by: Dave Hansen Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/pgtable_64.h | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h index a72c2ab24006..67f2fe43a593 100644 --- a/arch/x86/include/asm/pgtable_64.h +++ b/arch/x86/include/asm/pgtable_64.h @@ -168,7 +168,7 @@ static inline int pgd_large(pgd_t pgd) { return 0; } * * | ... | 11| 10| 9|8|7|6|5| 4| 3|2| 1|0| <- bit number * | ... |SW3|SW2|SW1|G|L|D|A|CD|WT|U| W|P| <- bit names - * | TYPE (59-63) | OFFSET (9-58) |0|0|X|X| X| X|X|SD|0| <- swp entry + * | TYPE (59-63) | ~OFFSET (9-58) |0|0|X|X| X| X|X|SD|0| <- swp entry * * G (8) is aliased and used as a PROT_NONE indicator for * !present ptes. We need to start storing swap entries above @@ -181,6 +181,9 @@ static inline int pgd_large(pgd_t pgd) { return 0; } * * Bit 7 in swp entry should be 0 because pmd_present checks not only P, * but also L and G. + * + * The offset is inverted by a binary not operation to make the high + * physical bits set. */ #define SWP_TYPE_BITS 5 @@ -195,13 +198,15 @@ static inline int pgd_large(pgd_t pgd) { return 0; } #define __swp_type(x) ((x).val >> (64 - SWP_TYPE_BITS)) /* Shift up (to get rid of type), then down to get value */ -#define __swp_offset(x) ((x).val << SWP_TYPE_BITS >> SWP_OFFSET_SHIFT) +#define __swp_offset(x) (~(x).val << SWP_TYPE_BITS >> SWP_OFFSET_SHIFT) /* * Shift the offset up "too far" by TYPE bits, then down again + * The offset is inverted by a binary not operation to make the high + * physical bits set. */ #define __swp_entry(type, offset) ((swp_entry_t) { \ - ((unsigned long)(offset) << SWP_OFFSET_SHIFT >> SWP_TYPE_BITS) \ + (~(unsigned long)(offset) << SWP_OFFSET_SHIFT >> SWP_TYPE_BITS) \ | ((unsigned long)(type) << (64-SWP_TYPE_BITS)) }) #define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val((pte)) }) From 33182fe97add6e83c195e9d0f7297a6499563b52 Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Wed, 13 Jun 2018 15:48:24 -0700 Subject: [PATCH 539/970] x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation commit 6b28baca9b1f0d4a42b865da7a05b1c81424bd5c upstream When PTEs are set to PROT_NONE the kernel just clears the Present bit and preserves the PFN, which creates attack surface for L1TF speculation speculation attacks. This is important inside guests, because L1TF speculation bypasses physical page remapping. While the host has its own migitations preventing leaking data from other VMs into the guest, this would still risk leaking the wrong page inside the current guest. This uses the same technique as Linus' swap entry patch: while an entry is is in PROTNONE state invert the complete PFN part part of it. This ensures that the the highest bit will point to non existing memory. The invert is done by pte/pmd_modify and pfn/pmd/pud_pte for PROTNONE and pte/pmd/pud_pfn undo it. This assume that no code path touches the PFN part of a PTE directly without using these primitives. This doesn't handle the case that MMIO is on the top of the CPU physical memory. If such an MMIO region was exposed by an unpriviledged driver for mmap it would be possible to attack some real memory. However this situation is all rather unlikely. For 32bit non PAE the inversion is not done because there are really not enough bits to protect anything. Q: Why does the guest need to be protected when the HyperVisor already has L1TF mitigations? A: Here's an example: Physical pages 1 2 get mapped into a guest as GPA 1 -> PA 2 GPA 2 -> PA 1 through EPT. The L1TF speculation ignores the EPT remapping. Now the guest kernel maps GPA 1 to process A and GPA 2 to process B, and they belong to different users and should be isolated. A sets the GPA 1 PA 2 PTE to PROT_NONE to bypass the EPT remapping and gets read access to the underlying physical page. Which in this case points to PA 2, so it can read process B's data, if it happened to be in L1, so isolation inside the guest is broken. There's nothing the hypervisor can do about this. This mitigation has to be done in the guest itself. [ tglx: Massaged changelog ] [ dwmw2: backported to 4.9 ] Signed-off-by: Andi Kleen Signed-off-by: Thomas Gleixner Reviewed-by: Josh Poimboeuf Acked-by: Michal Hocko Acked-by: Vlastimil Babka Acked-by: Dave Hansen Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/pgtable-2level.h | 17 ++++++++++++ arch/x86/include/asm/pgtable-3level.h | 2 ++ arch/x86/include/asm/pgtable-invert.h | 32 ++++++++++++++++++++++ arch/x86/include/asm/pgtable.h | 38 +++++++++++++++++++-------- arch/x86/include/asm/pgtable_64.h | 2 ++ 5 files changed, 80 insertions(+), 11 deletions(-) create mode 100644 arch/x86/include/asm/pgtable-invert.h diff --git a/arch/x86/include/asm/pgtable-2level.h b/arch/x86/include/asm/pgtable-2level.h index fd74a11959de..89c50332a71e 100644 --- a/arch/x86/include/asm/pgtable-2level.h +++ b/arch/x86/include/asm/pgtable-2level.h @@ -77,4 +77,21 @@ static inline unsigned long pte_bitop(unsigned long value, unsigned int rightshi #define __pte_to_swp_entry(pte) ((swp_entry_t) { (pte).pte_low }) #define __swp_entry_to_pte(x) ((pte_t) { .pte = (x).val }) +/* No inverted PFNs on 2 level page tables */ + +static inline u64 protnone_mask(u64 val) +{ + return 0; +} + +static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask) +{ + return val; +} + +static inline bool __pte_needs_invert(u64 val) +{ + return false; +} + #endif /* _ASM_X86_PGTABLE_2LEVEL_H */ diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h index cdaa58c9b39e..0c89891c7b44 100644 --- a/arch/x86/include/asm/pgtable-3level.h +++ b/arch/x86/include/asm/pgtable-3level.h @@ -184,4 +184,6 @@ static inline pmd_t native_pmdp_get_and_clear(pmd_t *pmdp) #define __pte_to_swp_entry(pte) ((swp_entry_t){ (pte).pte_high }) #define __swp_entry_to_pte(x) ((pte_t){ { .pte_high = (x).val } }) +#include + #endif /* _ASM_X86_PGTABLE_3LEVEL_H */ diff --git a/arch/x86/include/asm/pgtable-invert.h b/arch/x86/include/asm/pgtable-invert.h new file mode 100644 index 000000000000..177564187fc0 --- /dev/null +++ b/arch/x86/include/asm/pgtable-invert.h @@ -0,0 +1,32 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_PGTABLE_INVERT_H +#define _ASM_PGTABLE_INVERT_H 1 + +#ifndef __ASSEMBLY__ + +static inline bool __pte_needs_invert(u64 val) +{ + return (val & (_PAGE_PRESENT|_PAGE_PROTNONE)) == _PAGE_PROTNONE; +} + +/* Get a mask to xor with the page table entry to get the correct pfn. */ +static inline u64 protnone_mask(u64 val) +{ + return __pte_needs_invert(val) ? ~0ull : 0; +} + +static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask) +{ + /* + * When a PTE transitions from NONE to !NONE or vice-versa + * invert the PFN part to stop speculation. + * pte_pfn undoes this when needed. + */ + if (__pte_needs_invert(oldval) != __pte_needs_invert(val)) + val = (val & ~mask) | (~val & mask); + return val; +} + +#endif /* __ASSEMBLY__ */ + +#endif diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 5af0401ccff2..441e19dd76a4 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -165,19 +165,29 @@ static inline int pte_special(pte_t pte) return pte_flags(pte) & _PAGE_SPECIAL; } +/* Entries that were set to PROT_NONE are inverted */ + +static inline u64 protnone_mask(u64 val); + static inline unsigned long pte_pfn(pte_t pte) { - return (pte_val(pte) & PTE_PFN_MASK) >> PAGE_SHIFT; + unsigned long pfn = pte_val(pte); + pfn ^= protnone_mask(pfn); + return (pfn & PTE_PFN_MASK) >> PAGE_SHIFT; } static inline unsigned long pmd_pfn(pmd_t pmd) { - return (pmd_val(pmd) & pmd_pfn_mask(pmd)) >> PAGE_SHIFT; + unsigned long pfn = pmd_val(pmd); + pfn ^= protnone_mask(pfn); + return (pfn & pmd_pfn_mask(pmd)) >> PAGE_SHIFT; } static inline unsigned long pud_pfn(pud_t pud) { - return (pud_val(pud) & pud_pfn_mask(pud)) >> PAGE_SHIFT; + unsigned long pfn = pud_val(pud); + pfn ^= protnone_mask(pfn); + return (pfn & pud_pfn_mask(pud)) >> PAGE_SHIFT; } #define pte_page(pte) pfn_to_page(pte_pfn(pte)) @@ -394,19 +404,25 @@ static inline pgprotval_t massage_pgprot(pgprot_t pgprot) static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot) { - return __pte(((phys_addr_t)page_nr << PAGE_SHIFT) | - massage_pgprot(pgprot)); + phys_addr_t pfn = page_nr << PAGE_SHIFT; + pfn ^= protnone_mask(pgprot_val(pgprot)); + pfn &= PTE_PFN_MASK; + return __pte(pfn | massage_pgprot(pgprot)); } static inline pmd_t pfn_pmd(unsigned long page_nr, pgprot_t pgprot) { - return __pmd(((phys_addr_t)page_nr << PAGE_SHIFT) | - massage_pgprot(pgprot)); + phys_addr_t pfn = page_nr << PAGE_SHIFT; + pfn ^= protnone_mask(pgprot_val(pgprot)); + pfn &= PHYSICAL_PMD_PAGE_MASK; + return __pmd(pfn | massage_pgprot(pgprot)); } +static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask); + static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) { - pteval_t val = pte_val(pte); + pteval_t val = pte_val(pte), oldval = val; /* * Chop off the NX bit (if present), and add the NX portion of @@ -414,17 +430,17 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) */ val &= _PAGE_CHG_MASK; val |= massage_pgprot(newprot) & ~_PAGE_CHG_MASK; - + val = flip_protnone_guard(oldval, val, PTE_PFN_MASK); return __pte(val); } static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot) { - pmdval_t val = pmd_val(pmd); + pmdval_t val = pmd_val(pmd), oldval = val; val &= _HPAGE_CHG_MASK; val |= massage_pgprot(newprot) & ~_HPAGE_CHG_MASK; - + val = flip_protnone_guard(oldval, val, PHYSICAL_PMD_PAGE_MASK); return __pmd(val); } diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h index 67f2fe43a593..221a32ed1372 100644 --- a/arch/x86/include/asm/pgtable_64.h +++ b/arch/x86/include/asm/pgtable_64.h @@ -235,6 +235,8 @@ extern void cleanup_highmap(void); extern void init_extra_mapping_uc(unsigned long phys, unsigned long size); extern void init_extra_mapping_wb(unsigned long phys, unsigned long size); +#include + #endif /* !__ASSEMBLY__ */ #endif /* _ASM_X86_PGTABLE_64_H */ From 5b2ec92f70f6d4084d23bf42391fd27fa03e8c4c Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Wed, 13 Jun 2018 15:48:25 -0700 Subject: [PATCH 540/970] x86/speculation/l1tf: Make sure the first page is always reserved commit 10a70416e1f067f6c4efda6ffd8ea96002ac4223 upstream The L1TF workaround doesn't make any attempt to mitigate speculate accesses to the first physical page for zeroed PTEs. Normally it only contains some data from the early real mode BIOS. It's not entirely clear that the first page is reserved in all configurations, so add an extra reservation call to make sure it is really reserved. In most configurations (e.g. with the standard reservations) it's likely a nop. Signed-off-by: Andi Kleen Signed-off-by: Thomas Gleixner Reviewed-by: Josh Poimboeuf Acked-by: Dave Hansen Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/setup.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 6b55012d02a3..49960ecfc322 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -854,6 +854,12 @@ void __init setup_arch(char **cmdline_p) memblock_reserve(__pa_symbol(_text), (unsigned long)__bss_stop - (unsigned long)_text); + /* + * Make sure page 0 is always reserved because on systems with + * L1TF its contents can be leaked to user processes. + */ + memblock_reserve(0, PAGE_SIZE); + early_reserve_initrd(); /* From 432e99b34066099db62f87b2704654b1b23fd6be Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Wed, 13 Jun 2018 15:48:26 -0700 Subject: [PATCH 541/970] x86/speculation/l1tf: Add sysfs reporting for l1tf commit 17dbca119312b4e8173d4e25ff64262119fcef38 upstream L1TF core kernel workarounds are cheap and normally always enabled, However they still should be reported in sysfs if the system is vulnerable or mitigated. Add the necessary CPU feature/bug bits. - Extend the existing checks for Meltdowns to determine if the system is vulnerable. All CPUs which are not vulnerable to Meltdown are also not vulnerable to L1TF - Check for 32bit non PAE and emit a warning as there is no practical way for mitigation due to the limited physical address bits - If the system has more than MAX_PA/2 physical memory the invert page workarounds don't protect the system against the L1TF attack anymore, because an inverted physical address will also point to valid memory. Print a warning in this case and report that the system is vulnerable. Add a function which returns the PFN limit for the L1TF mitigation, which will be used in follow up patches for sanity and range checks. [ tglx: Renamed the CPU feature bit to L1TF_PTEINV ] [ dwmw2: Backport to 4.9 (cpufeatures.h, E820) ] Signed-off-by: Andi Kleen Signed-off-by: Thomas Gleixner Reviewed-by: Josh Poimboeuf Acked-by: Dave Hansen Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/cpufeatures.h | 3 ++- arch/x86/include/asm/processor.h | 5 ++++ arch/x86/kernel/cpu/bugs.c | 40 ++++++++++++++++++++++++++++++ arch/x86/kernel/cpu/common.c | 20 +++++++++++++++ drivers/base/cpu.c | 8 ++++++ include/linux/cpu.h | 2 ++ 6 files changed, 77 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index aea30afeddb8..a78ddc9c694e 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -213,7 +213,7 @@ #define X86_FEATURE_IBPB ( 7*32+26) /* Indirect Branch Prediction Barrier */ #define X86_FEATURE_STIBP ( 7*32+27) /* Single Thread Indirect Branch Predictors */ #define X86_FEATURE_ZEN ( 7*32+28) /* "" CPU is AMD family 0x17 (Zen) */ - +#define X86_FEATURE_L1TF_PTEINV ( 7*32+29) /* "" L1TF workaround PTE inversion */ /* Virtualization flags: Linux defined, word 8 */ #define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */ @@ -349,5 +349,6 @@ #define X86_BUG_SPECTRE_V1 X86_BUG(15) /* CPU is affected by Spectre variant 1 attack with conditional branches */ #define X86_BUG_SPECTRE_V2 X86_BUG(16) /* CPU is affected by Spectre variant 2 attack with indirect branches */ #define X86_BUG_SPEC_STORE_BYPASS X86_BUG(17) /* CPU is affected by speculative store bypass attack */ +#define X86_BUG_L1TF X86_BUG(18) /* CPU is affected by L1 Terminal Fault */ #endif /* _ASM_X86_CPUFEATURES_H */ diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index ec15ca2b32d0..1b05055bbdc6 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -173,6 +173,11 @@ extern const struct seq_operations cpuinfo_op; extern void cpu_detect(struct cpuinfo_x86 *c); +static inline unsigned long l1tf_pfn_limit(void) +{ + return BIT(boot_cpu_data.x86_phys_bits - 1 - PAGE_SHIFT) - 1; +} + extern void early_cpu_init(void); extern void identify_boot_cpu(void); extern void identify_secondary_cpu(struct cpuinfo_x86 *); diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index 53ddf121118c..0cad5df48ae2 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -26,9 +26,11 @@ #include #include #include +#include static void __init spectre_v2_select_mitigation(void); static void __init ssb_select_mitigation(void); +static void __init l1tf_select_mitigation(void); /* * Our boot-time value of the SPEC_CTRL MSR. We read it once so that any @@ -80,6 +82,8 @@ void __init check_bugs(void) */ ssb_select_mitigation(); + l1tf_select_mitigation(); + #ifdef CONFIG_X86_32 /* * Check whether we are able to run this kernel safely on SMP. @@ -204,6 +208,32 @@ static void x86_amd_ssb_disable(void) wrmsrl(MSR_AMD64_LS_CFG, msrval); } +static void __init l1tf_select_mitigation(void) +{ + u64 half_pa; + + if (!boot_cpu_has_bug(X86_BUG_L1TF)) + return; + +#if CONFIG_PGTABLE_LEVELS == 2 + pr_warn("Kernel not compiled for PAE. No mitigation for L1TF\n"); + return; +#endif + + /* + * This is extremely unlikely to happen because almost all + * systems have far more MAX_PA/2 than RAM can be fit into + * DIMM slots. + */ + half_pa = (u64)l1tf_pfn_limit() << PAGE_SHIFT; + if (e820_any_mapped(half_pa, ULLONG_MAX - half_pa, E820_RAM)) { + pr_warn("System has more than MAX_PA/2 memory. L1TF mitigation not effective.\n"); + return; + } + + setup_force_cpu_cap(X86_FEATURE_L1TF_PTEINV); +} + #ifdef RETPOLINE static bool spectre_v2_bad_module; @@ -656,6 +686,11 @@ static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr case X86_BUG_SPEC_STORE_BYPASS: return sprintf(buf, "%s\n", ssb_strings[ssb_mode]); + case X86_BUG_L1TF: + if (boot_cpu_has(X86_FEATURE_L1TF_PTEINV)) + return sprintf(buf, "Mitigation: Page Table Inversion\n"); + break; + default: break; } @@ -682,4 +717,9 @@ ssize_t cpu_show_spec_store_bypass(struct device *dev, struct device_attribute * { return cpu_show_common(dev, attr, buf, X86_BUG_SPEC_STORE_BYPASS); } + +ssize_t cpu_show_l1tf(struct device *dev, struct device_attribute *attr, char *buf) +{ + return cpu_show_common(dev, attr, buf, X86_BUG_L1TF); +} #endif diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 7a4279d8a902..ef58507fd28e 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -925,6 +925,21 @@ static const __initconst struct x86_cpu_id cpu_no_spec_store_bypass[] = { {} }; +static const __initconst struct x86_cpu_id cpu_no_l1tf[] = { + /* in addition to cpu_no_speculation */ + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT1 }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT2 }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_AIRMONT }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_MERRIFIELD }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_MOOREFIELD }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_GOLDMONT }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_DENVERTON }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_GEMINI_LAKE }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_XEON_PHI_KNL }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_XEON_PHI_KNM }, + {} +}; + static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c) { u64 ia32_cap = 0; @@ -950,6 +965,11 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c) return; setup_force_cpu_bug(X86_BUG_CPU_MELTDOWN); + + if (x86_match_cpu(cpu_no_l1tf)) + return; + + setup_force_cpu_bug(X86_BUG_L1TF); } /* diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c index cbb1cc6bbdb4..f1f4ce7ddb47 100644 --- a/drivers/base/cpu.c +++ b/drivers/base/cpu.c @@ -525,16 +525,24 @@ ssize_t __weak cpu_show_spec_store_bypass(struct device *dev, return sprintf(buf, "Not affected\n"); } +ssize_t __weak cpu_show_l1tf(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return sprintf(buf, "Not affected\n"); +} + static DEVICE_ATTR(meltdown, 0444, cpu_show_meltdown, NULL); static DEVICE_ATTR(spectre_v1, 0444, cpu_show_spectre_v1, NULL); static DEVICE_ATTR(spectre_v2, 0444, cpu_show_spectre_v2, NULL); static DEVICE_ATTR(spec_store_bypass, 0444, cpu_show_spec_store_bypass, NULL); +static DEVICE_ATTR(l1tf, 0444, cpu_show_l1tf, NULL); static struct attribute *cpu_root_vulnerabilities_attrs[] = { &dev_attr_meltdown.attr, &dev_attr_spectre_v1.attr, &dev_attr_spectre_v2.attr, &dev_attr_spec_store_bypass.attr, + &dev_attr_l1tf.attr, NULL }; diff --git a/include/linux/cpu.h b/include/linux/cpu.h index f1da93d0c34f..0c0c614245e5 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -52,6 +52,8 @@ extern ssize_t cpu_show_spectre_v2(struct device *dev, struct device_attribute *attr, char *buf); extern ssize_t cpu_show_spec_store_bypass(struct device *dev, struct device_attribute *attr, char *buf); +extern ssize_t cpu_show_l1tf(struct device *dev, + struct device_attribute *attr, char *buf); extern __printf(4, 5) struct device *cpu_device_create(struct device *parent, void *drvdata, From 7c5b42f82c13365b8284b5945f5ffa9f88380dd7 Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Wed, 13 Jun 2018 15:48:27 -0700 Subject: [PATCH 542/970] x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings commit 42e4089c7890725fcd329999252dc489b72f2921 upstream For L1TF PROT_NONE mappings are protected by inverting the PFN in the page table entry. This sets the high bits in the CPU's address space, thus making sure to point to not point an unmapped entry to valid cached memory. Some server system BIOSes put the MMIO mappings high up in the physical address space. If such an high mapping was mapped to unprivileged users they could attack low memory by setting such a mapping to PROT_NONE. This could happen through a special device driver which is not access protected. Normal /dev/mem is of course access protected. To avoid this forbid PROT_NONE mappings or mprotect for high MMIO mappings. Valid page mappings are allowed because the system is then unsafe anyways. It's not expected that users commonly use PROT_NONE on MMIO. But to minimize any impact this is only enforced if the mapping actually refers to a high MMIO address (defined as the MAX_PA-1 bit being set), and also skip the check for root. For mmaps this is straight forward and can be handled in vm_insert_pfn and in remap_pfn_range(). For mprotect it's a bit trickier. At the point where the actual PTEs are accessed a lot of state has been changed and it would be difficult to undo on an error. Since this is a uncommon case use a separate early page talk walk pass for MMIO PROT_NONE mappings that checks for this condition early. For non MMIO and non PROT_NONE there are no changes. [dwmw2: Backport to 4.9] Signed-off-by: Andi Kleen Signed-off-by: Thomas Gleixner Reviewed-by: Josh Poimboeuf Acked-by: Dave Hansen Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/pgtable.h | 9 +++++++ arch/x86/mm/mmap.c | 21 +++++++++++++++ include/asm-generic/pgtable.h | 12 +++++++++ mm/memory.c | 29 +++++++++++++++----- mm/mprotect.c | 49 ++++++++++++++++++++++++++++++++++ 5 files changed, 113 insertions(+), 7 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 441e19dd76a4..38b679df119d 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1026,6 +1026,15 @@ static inline u16 pte_flags_pkey(unsigned long pte_flags) #endif } + +#define __HAVE_ARCH_PFN_MODIFY_ALLOWED 1 +extern bool pfn_modify_allowed(unsigned long pfn, pgprot_t prot); + +static inline bool arch_has_pfn_modify_check(void) +{ + return boot_cpu_has_bug(X86_BUG_L1TF); +} + #include #endif /* __ASSEMBLY__ */ diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c index d2dc0438d654..5aad869fa205 100644 --- a/arch/x86/mm/mmap.c +++ b/arch/x86/mm/mmap.c @@ -121,3 +121,24 @@ const char *arch_vma_name(struct vm_area_struct *vma) return "[mpx]"; return NULL; } + +/* + * Only allow root to set high MMIO mappings to PROT_NONE. + * This prevents an unpriv. user to set them to PROT_NONE and invert + * them, then pointing to valid memory for L1TF speculation. + * + * Note: for locked down kernels may want to disable the root override. + */ +bool pfn_modify_allowed(unsigned long pfn, pgprot_t prot) +{ + if (!boot_cpu_has_bug(X86_BUG_L1TF)) + return true; + if (!__pte_needs_invert(pgprot_val(prot))) + return true; + /* If it's real memory always allow */ + if (pfn_valid(pfn)) + return true; + if (pfn > l1tf_pfn_limit() && !capable(CAP_SYS_ADMIN)) + return false; + return true; +} diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h index 4e8551c8ef18..0ffe4051ae2e 100644 --- a/include/asm-generic/pgtable.h +++ b/include/asm-generic/pgtable.h @@ -842,4 +842,16 @@ int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn, #endif #endif +#ifndef __HAVE_ARCH_PFN_MODIFY_ALLOWED +static inline bool pfn_modify_allowed(unsigned long pfn, pgprot_t prot) +{ + return true; +} + +static inline bool arch_has_pfn_modify_check(void) +{ + return false; +} +#endif + #endif /* _ASM_GENERIC_PGTABLE_H */ diff --git a/mm/memory.c b/mm/memory.c index d2db2c4eb0a4..88f8d6a2af05 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1641,6 +1641,9 @@ int vm_insert_pfn_prot(struct vm_area_struct *vma, unsigned long addr, if (track_pfn_insert(vma, &pgprot, __pfn_to_pfn_t(pfn, PFN_DEV))) return -EINVAL; + if (!pfn_modify_allowed(pfn, pgprot)) + return -EACCES; + ret = insert_pfn(vma, addr, __pfn_to_pfn_t(pfn, PFN_DEV), pgprot); return ret; @@ -1659,6 +1662,9 @@ int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr, if (track_pfn_insert(vma, &pgprot, pfn)) return -EINVAL; + if (!pfn_modify_allowed(pfn_t_to_pfn(pfn), pgprot)) + return -EACCES; + /* * If we don't have pte special, then we have to use the pfn_valid() * based VM_MIXEDMAP scheme (see vm_normal_page), and thus we *must* @@ -1692,6 +1698,7 @@ static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd, { pte_t *pte; spinlock_t *ptl; + int err = 0; pte = pte_alloc_map_lock(mm, pmd, addr, &ptl); if (!pte) @@ -1699,12 +1706,16 @@ static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd, arch_enter_lazy_mmu_mode(); do { BUG_ON(!pte_none(*pte)); + if (!pfn_modify_allowed(pfn, prot)) { + err = -EACCES; + break; + } set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot))); pfn++; } while (pte++, addr += PAGE_SIZE, addr != end); arch_leave_lazy_mmu_mode(); pte_unmap_unlock(pte - 1, ptl); - return 0; + return err; } static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud, @@ -1713,6 +1724,7 @@ static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud, { pmd_t *pmd; unsigned long next; + int err; pfn -= addr >> PAGE_SHIFT; pmd = pmd_alloc(mm, pud, addr); @@ -1721,9 +1733,10 @@ static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud, VM_BUG_ON(pmd_trans_huge(*pmd)); do { next = pmd_addr_end(addr, end); - if (remap_pte_range(mm, pmd, addr, next, - pfn + (addr >> PAGE_SHIFT), prot)) - return -ENOMEM; + err = remap_pte_range(mm, pmd, addr, next, + pfn + (addr >> PAGE_SHIFT), prot); + if (err) + return err; } while (pmd++, addr = next, addr != end); return 0; } @@ -1734,6 +1747,7 @@ static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd, { pud_t *pud; unsigned long next; + int err; pfn -= addr >> PAGE_SHIFT; pud = pud_alloc(mm, pgd, addr); @@ -1741,9 +1755,10 @@ static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd, return -ENOMEM; do { next = pud_addr_end(addr, end); - if (remap_pmd_range(mm, pud, addr, next, - pfn + (addr >> PAGE_SHIFT), prot)) - return -ENOMEM; + err = remap_pmd_range(mm, pud, addr, next, + pfn + (addr >> PAGE_SHIFT), prot); + if (err) + return err; } while (pud++, addr = next, addr != end); return 0; } diff --git a/mm/mprotect.c b/mm/mprotect.c index ae740c9b1f9b..6896f77be166 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -260,6 +260,42 @@ unsigned long change_protection(struct vm_area_struct *vma, unsigned long start, return pages; } +static int prot_none_pte_entry(pte_t *pte, unsigned long addr, + unsigned long next, struct mm_walk *walk) +{ + return pfn_modify_allowed(pte_pfn(*pte), *(pgprot_t *)(walk->private)) ? + 0 : -EACCES; +} + +static int prot_none_hugetlb_entry(pte_t *pte, unsigned long hmask, + unsigned long addr, unsigned long next, + struct mm_walk *walk) +{ + return pfn_modify_allowed(pte_pfn(*pte), *(pgprot_t *)(walk->private)) ? + 0 : -EACCES; +} + +static int prot_none_test(unsigned long addr, unsigned long next, + struct mm_walk *walk) +{ + return 0; +} + +static int prot_none_walk(struct vm_area_struct *vma, unsigned long start, + unsigned long end, unsigned long newflags) +{ + pgprot_t new_pgprot = vm_get_page_prot(newflags); + struct mm_walk prot_none_walk = { + .pte_entry = prot_none_pte_entry, + .hugetlb_entry = prot_none_hugetlb_entry, + .test_walk = prot_none_test, + .mm = current->mm, + .private = &new_pgprot, + }; + + return walk_page_range(start, end, &prot_none_walk); +} + int mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev, unsigned long start, unsigned long end, unsigned long newflags) @@ -277,6 +313,19 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev, return 0; } + /* + * Do PROT_NONE PFN permission checks here when we can still + * bail out without undoing a lot of state. This is a rather + * uncommon case, so doesn't need to be very optimized. + */ + if (arch_has_pfn_modify_check() && + (vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)) && + (newflags & (VM_READ|VM_WRITE|VM_EXEC)) == 0) { + error = prot_none_walk(vma, start, end, newflags); + if (error) + return error; + } + /* * If we make a private mapping writable we increase our commit; * but (without finer accounting) cannot reduce our commit if we From e3923475ebb1b503668dfdb3ba90e2ebd46931e6 Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Wed, 13 Jun 2018 15:48:28 -0700 Subject: [PATCH 543/970] x86/speculation/l1tf: Limit swap file size to MAX_PA/2 commit 377eeaa8e11fe815b1d07c81c4a0e2843a8c15eb upstream For the L1TF workaround its necessary to limit the swap file size to below MAX_PA/2, so that the higher bits of the swap offset inverted never point to valid memory. Add a mechanism for the architecture to override the swap file size check in swapfile.c and add a x86 specific max swapfile check function that enforces that limit. The check is only enabled if the CPU is vulnerable to L1TF. In VMs with 42bit MAX_PA the typical limit is 2TB now, on a native system with 46bit PA it is 32TB. The limit is only per individual swap file, so it's always possible to exceed these limits with multiple swap files or partitions. Signed-off-by: Andi Kleen Signed-off-by: Thomas Gleixner Reviewed-by: Josh Poimboeuf Acked-by: Michal Hocko Acked-by: Dave Hansen Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/mm/init.c | 15 +++++++++++++ include/linux/swapfile.h | 2 ++ mm/swapfile.c | 46 ++++++++++++++++++++++++++-------------- 3 files changed, 47 insertions(+), 16 deletions(-) diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index ae9b84cae57c..d3d732912304 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -4,6 +4,8 @@ #include #include #include /* for max_low_pfn */ +#include +#include #include #include @@ -780,3 +782,16 @@ void update_cache_mode_entry(unsigned entry, enum page_cache_mode cache) __cachemode2pte_tbl[cache] = __cm_idx2pte(entry); __pte2cachemode_tbl[entry] = cache; } + +unsigned long max_swapfile_size(void) +{ + unsigned long pages; + + pages = generic_max_swapfile_size(); + + if (boot_cpu_has_bug(X86_BUG_L1TF)) { + /* Limit the swap file size to MAX_PA/2 for L1TF workaround */ + pages = min_t(unsigned long, l1tf_pfn_limit() + 1, pages); + } + return pages; +} diff --git a/include/linux/swapfile.h b/include/linux/swapfile.h index 388293a91e8c..e4594de79bc4 100644 --- a/include/linux/swapfile.h +++ b/include/linux/swapfile.h @@ -9,5 +9,7 @@ extern spinlock_t swap_lock; extern struct plist_head swap_active_head; extern struct swap_info_struct *swap_info[]; extern int try_to_unuse(unsigned int, bool, unsigned long); +extern unsigned long generic_max_swapfile_size(void); +extern unsigned long max_swapfile_size(void); #endif /* _LINUX_SWAPFILE_H */ diff --git a/mm/swapfile.c b/mm/swapfile.c index 79c03ecd31c8..855f62ab8c1b 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -2219,6 +2219,35 @@ static int claim_swapfile(struct swap_info_struct *p, struct inode *inode) return 0; } + +/* + * Find out how many pages are allowed for a single swap device. There + * are two limiting factors: + * 1) the number of bits for the swap offset in the swp_entry_t type, and + * 2) the number of bits in the swap pte, as defined by the different + * architectures. + * + * In order to find the largest possible bit mask, a swap entry with + * swap type 0 and swap offset ~0UL is created, encoded to a swap pte, + * decoded to a swp_entry_t again, and finally the swap offset is + * extracted. + * + * This will mask all the bits from the initial ~0UL mask that can't + * be encoded in either the swp_entry_t or the architecture definition + * of a swap pte. + */ +unsigned long generic_max_swapfile_size(void) +{ + return swp_offset(pte_to_swp_entry( + swp_entry_to_pte(swp_entry(0, ~0UL)))) + 1; +} + +/* Can be overridden by an architecture for additional checks. */ +__weak unsigned long max_swapfile_size(void) +{ + return generic_max_swapfile_size(); +} + static unsigned long read_swap_header(struct swap_info_struct *p, union swap_header *swap_header, struct inode *inode) @@ -2254,22 +2283,7 @@ static unsigned long read_swap_header(struct swap_info_struct *p, p->cluster_next = 1; p->cluster_nr = 0; - /* - * Find out how many pages are allowed for a single swap - * device. There are two limiting factors: 1) the number - * of bits for the swap offset in the swp_entry_t type, and - * 2) the number of bits in the swap pte as defined by the - * different architectures. In order to find the - * largest possible bit mask, a swap entry with swap type 0 - * and swap offset ~0UL is created, encoded to a swap pte, - * decoded to a swp_entry_t again, and finally the swap - * offset is extracted. This will mask all the bits from - * the initial ~0UL mask that can't be encoded in either - * the swp_entry_t or the architecture definition of a - * swap pte. - */ - maxpages = swp_offset(pte_to_swp_entry( - swp_entry_to_pte(swp_entry(0, ~0UL)))) + 1; + maxpages = max_swapfile_size(); last_page = swap_header->info.last_page; if (!last_page) { pr_warn("Empty swap-file\n"); From 1ac1dc14671f531134f29755f98386f8e168b810 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Wed, 20 Jun 2018 16:42:57 -0400 Subject: [PATCH 544/970] x86/bugs: Move the l1tf function and define pr_fmt properly commit 56563f53d3066afa9e63d6c997bf67e76a8b05c0 upstream The pr_warn in l1tf_select_mitigation would have used the prior pr_fmt which was defined as "Spectre V2 : ". Move the function to be past SSBD and also define the pr_fmt. Fixes: 17dbca119312 ("x86/speculation/l1tf: Add sysfs reporting for l1tf") Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/bugs.c | 55 ++++++++++++++++++++------------------ 1 file changed, 29 insertions(+), 26 deletions(-) diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index 0cad5df48ae2..51257f927240 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -208,32 +208,6 @@ static void x86_amd_ssb_disable(void) wrmsrl(MSR_AMD64_LS_CFG, msrval); } -static void __init l1tf_select_mitigation(void) -{ - u64 half_pa; - - if (!boot_cpu_has_bug(X86_BUG_L1TF)) - return; - -#if CONFIG_PGTABLE_LEVELS == 2 - pr_warn("Kernel not compiled for PAE. No mitigation for L1TF\n"); - return; -#endif - - /* - * This is extremely unlikely to happen because almost all - * systems have far more MAX_PA/2 than RAM can be fit into - * DIMM slots. - */ - half_pa = (u64)l1tf_pfn_limit() << PAGE_SHIFT; - if (e820_any_mapped(half_pa, ULLONG_MAX - half_pa, E820_RAM)) { - pr_warn("System has more than MAX_PA/2 memory. L1TF mitigation not effective.\n"); - return; - } - - setup_force_cpu_cap(X86_FEATURE_L1TF_PTEINV); -} - #ifdef RETPOLINE static bool spectre_v2_bad_module; @@ -659,6 +633,35 @@ void x86_spec_ctrl_setup_ap(void) x86_amd_ssb_disable(); } +#undef pr_fmt +#define pr_fmt(fmt) "L1TF: " fmt +static void __init l1tf_select_mitigation(void) +{ + u64 half_pa; + + if (!boot_cpu_has_bug(X86_BUG_L1TF)) + return; + +#if CONFIG_PGTABLE_LEVELS == 2 + pr_warn("Kernel not compiled for PAE. No mitigation for L1TF\n"); + return; +#endif + + /* + * This is extremely unlikely to happen because almost all + * systems have far more MAX_PA/2 than RAM can be fit into + * DIMM slots. + */ + half_pa = (u64)l1tf_pfn_limit() << PAGE_SHIFT; + if (e820_any_mapped(half_pa, ULLONG_MAX - half_pa, E820_RAM)) { + pr_warn("System has more than MAX_PA/2 memory. L1TF mitigation not effective.\n"); + return; + } + + setup_force_cpu_cap(X86_FEATURE_L1TF_PTEINV); +} +#undef pr_fmt + #ifdef CONFIG_SYSFS static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr, From 7b69a96e5a328f17fe33f3826d7e8349ab59015d Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Tue, 29 May 2018 17:50:22 +0200 Subject: [PATCH 545/970] x86/smp: Provide topology_is_primary_thread() commit 6a4d2657e048f096c7ffcad254010bd94891c8c0 upstream If the CPU is supporting SMT then the primary thread can be found by checking the lower APIC ID bits for zero. smp_num_siblings is used to build the mask for the APIC ID bits which need to be taken into account. This uses the MPTABLE or ACPI/MADT supplied APIC ID, which can be different than the initial APIC ID in CPUID. But according to AMD the lower bits have to be consistent. Intel gave a tentative confirmation as well. Preparatory patch to support disabling SMT at boot/runtime. Signed-off-by: Thomas Gleixner Reviewed-by: Konrad Rzeszutek Wilk Acked-by: Ingo Molnar Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/apic.h | 7 +++++++ arch/x86/include/asm/topology.h | 4 +++- arch/x86/kernel/apic/apic.c | 15 +++++++++++++++ arch/x86/kernel/smpboot.c | 9 +++++++++ 4 files changed, 34 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h index f5aaf6c83222..20eed46c5827 100644 --- a/arch/x86/include/asm/apic.h +++ b/arch/x86/include/asm/apic.h @@ -633,6 +633,13 @@ extern int default_check_phys_apicid_present(int phys_apicid); #endif #endif /* CONFIG_X86_LOCAL_APIC */ + +#ifdef CONFIG_SMP +bool apic_id_is_primary_thread(unsigned int id); +#else +static inline bool apic_id_is_primary_thread(unsigned int id) { return false; } +#endif + extern void irq_enter(void); extern void irq_exit(void); diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h index cf75871d2f81..e5e6c02c203f 100644 --- a/arch/x86/include/asm/topology.h +++ b/arch/x86/include/asm/topology.h @@ -129,13 +129,15 @@ static inline int topology_max_smt_threads(void) } int topology_update_package_map(unsigned int apicid, unsigned int cpu); -extern int topology_phys_to_logical_pkg(unsigned int pkg); +int topology_phys_to_logical_pkg(unsigned int pkg); +bool topology_is_primary_thread(unsigned int cpu); #else #define topology_max_packages() (1) static inline int topology_update_package_map(unsigned int apicid, unsigned int cpu) { return 0; } static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 0; } static inline int topology_max_smt_threads(void) { return 1; } +static inline bool topology_is_primary_thread(unsigned int cpu) { return true; } #endif static inline void arch_fix_phys_package_id(int num, u32 slot) diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index 76cf21f887bd..28a1531a3d74 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -2041,6 +2041,21 @@ static int cpuid_to_apicid[] = { [0 ... NR_CPUS - 1] = -1, }; +/** + * apic_id_is_primary_thread - Check whether APIC ID belongs to a primary thread + * @id: APIC ID to check + */ +bool apic_id_is_primary_thread(unsigned int apicid) +{ + u32 mask; + + if (smp_num_siblings == 1) + return true; + /* Isolate the SMT bit(s) in the APICID and check for 0 */ + mask = (1U << (fls(smp_num_siblings) - 1)) - 1; + return !(apicid & mask); +} + /* * Should use this API to allocate logical CPU IDs to keep nr_logical_cpuids * and cpuid_to_apicid[] synchronized. diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 10b22fc6ef5a..e230c47c88e3 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -295,6 +295,15 @@ int topology_update_package_map(unsigned int pkg, unsigned int cpu) return 0; } +/** + * topology_is_primary_thread - Check whether CPU is the primary SMT thread + * @cpu: CPU to check + */ +bool topology_is_primary_thread(unsigned int cpu) +{ + return apic_id_is_primary_thread(per_cpu(x86_cpu_to_apicid, cpu)); +} + /** * topology_phys_to_logical_pkg - Map a physical package id to a logical * From 16fd33cd353be2cb71f2431788e5b2ae02891a77 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Thu, 21 Jun 2018 10:37:20 +0200 Subject: [PATCH 546/970] x86/topology: Provide topology_smt_supported() commit f048c399e0f7490ab7296bc2c255d37eb14a9675 upstream Provide information whether SMT is supoorted by the CPUs. Preparatory patch for SMT control mechanism. Suggested-by: Dave Hansen Signed-off-by: Thomas Gleixner Acked-by: Ingo Molnar Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/topology.h | 2 ++ arch/x86/kernel/smpboot.c | 8 ++++++++ 2 files changed, 10 insertions(+) diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h index e5e6c02c203f..1fbb174c846b 100644 --- a/arch/x86/include/asm/topology.h +++ b/arch/x86/include/asm/topology.h @@ -131,6 +131,7 @@ static inline int topology_max_smt_threads(void) int topology_update_package_map(unsigned int apicid, unsigned int cpu); int topology_phys_to_logical_pkg(unsigned int pkg); bool topology_is_primary_thread(unsigned int cpu); +bool topology_smt_supported(void); #else #define topology_max_packages() (1) static inline int @@ -138,6 +139,7 @@ topology_update_package_map(unsigned int apicid, unsigned int cpu) { return 0; } static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 0; } static inline int topology_max_smt_threads(void) { return 1; } static inline bool topology_is_primary_thread(unsigned int cpu) { return true; } +static inline bool topology_smt_supported(void) { return false; } #endif static inline void arch_fix_phys_package_id(int num, u32 slot) diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index e230c47c88e3..168cc3baadf6 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -304,6 +304,14 @@ bool topology_is_primary_thread(unsigned int cpu) return apic_id_is_primary_thread(per_cpu(x86_cpu_to_apicid, cpu)); } +/** + * topology_smt_supported - Check whether SMT is supported by the CPUs + */ +bool topology_smt_supported(void) +{ + return smp_num_siblings > 1; +} + /** * topology_phys_to_logical_pkg - Map a physical package id to a logical * From 9333575fc4a35dbeda5f75fb9cf72e899e569f00 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Tue, 29 May 2018 19:05:25 +0200 Subject: [PATCH 547/970] cpu/hotplug: Make bringup/teardown of smp threads symmetric commit c4de65696d865c225fda3b9913b31284ea65ea96 upstream The asymmetry caused a warning to trigger if the bootup was stopped in state CPUHP_AP_ONLINE_IDLE. The warning no longer triggers as kthread_park() can now be invoked on already or still parked threads. But there is still no reason to have this be asymmetric. Signed-off-by: Thomas Gleixner Reviewed-by: Konrad Rzeszutek Wilk Acked-by: Ingo Molnar Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- kernel/cpu.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/kernel/cpu.c b/kernel/cpu.c index e93635e2376f..89d2ef23a0fa 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -817,7 +817,6 @@ static int takedown_cpu(unsigned int cpu) /* Park the smpboot threads */ kthread_park(per_cpu_ptr(&cpuhp_state, cpu)->thread); - smpboot_park_threads(cpu); /* * Prevent irq alloc/free while the dying cpu reorganizes the @@ -1389,7 +1388,7 @@ static struct cpuhp_step cpuhp_ap_states[] = { [CPUHP_AP_SMPBOOT_THREADS] = { .name = "smpboot/threads:online", .startup.single = smpboot_unpark_threads, - .teardown.single = NULL, + .teardown.single = smpboot_park_threads, }, [CPUHP_AP_PERF_ONLINE] = { .name = "perf:online", From 373b8def455ec80db7d951b20562a75ed2df2703 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Tue, 29 May 2018 17:49:05 +0200 Subject: [PATCH 548/970] cpu/hotplug: Split do_cpu_down() commit cc1fe215e1efa406b03aa4389e6269b61342dec5 upstream Split out the inner workings of do_cpu_down() to allow reuse of that function for the upcoming SMT disabling mechanism. No functional change. Signed-off-by: Thomas Gleixner Reviewed-by: Konrad Rzeszutek Wilk Acked-by: Ingo Molnar Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- kernel/cpu.c | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/kernel/cpu.c b/kernel/cpu.c index 89d2ef23a0fa..91b9d3272de0 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -955,20 +955,19 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen, return ret; } +static int cpu_down_maps_locked(unsigned int cpu, enum cpuhp_state target) +{ + if (cpu_hotplug_disabled) + return -EBUSY; + return _cpu_down(cpu, 0, target); +} + static int do_cpu_down(unsigned int cpu, enum cpuhp_state target) { int err; cpu_maps_update_begin(); - - if (cpu_hotplug_disabled) { - err = -EBUSY; - goto out; - } - - err = _cpu_down(cpu, 0, target); - -out: + err = cpu_down_maps_locked(cpu, target); cpu_maps_update_done(); return err; } From f37486c0a1d05f41e1d159a0798a19d5461c764a Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Tue, 29 May 2018 17:48:27 +0200 Subject: [PATCH 549/970] cpu/hotplug: Provide knobs to control SMT commit 05736e4ac13c08a4a9b1ef2de26dd31a32cbee57 upstream Provide a command line and a sysfs knob to control SMT. The command line options are: 'nosmt': Enumerate secondary threads, but do not online them 'nosmt=force': Ignore secondary threads completely during enumeration via MP table and ACPI/MADT. The sysfs control file has the following states (read/write): 'on': SMT is enabled. Secondary threads can be freely onlined 'off': SMT is disabled. Secondary threads, even if enumerated cannot be onlined 'forceoff': SMT is permanentely disabled. Writes to the control file are rejected. 'notsupported': SMT is not supported by the CPU The command line option 'nosmt' sets the sysfs control to 'off'. This can be changed to 'on' to reenable SMT during runtime. The command line option 'nosmt=force' sets the sysfs control to 'forceoff'. This cannot be changed during runtime. When SMT is 'on' and the control file is changed to 'off' then all online secondary threads are offlined and attempts to online a secondary thread later on are rejected. When SMT is 'off' and the control file is changed to 'on' then secondary threads can be onlined again. The 'off' -> 'on' transition does not automatically online the secondary threads. When the control file is set to 'forceoff', the behaviour is the same as setting it to 'off', but the operation is irreversible and later writes to the control file are rejected. When the control status is 'notsupported' then writes to the control file are rejected. Signed-off-by: Thomas Gleixner Reviewed-by: Konrad Rzeszutek Wilk Acked-by: Ingo Molnar Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- .../ABI/testing/sysfs-devices-system-cpu | 20 +++ Documentation/kernel-parameters.txt | 8 + arch/Kconfig | 3 + arch/x86/Kconfig | 1 + include/linux/cpu.h | 13 ++ kernel/cpu.c | 170 ++++++++++++++++++ 6 files changed, 215 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu index 6d75a9c00e8a..7d8ba8da3c04 100644 --- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -367,3 +367,23 @@ Description: Information about CPU vulnerabilities "Not affected" CPU is not affected by the vulnerability "Vulnerable" CPU is affected and no mitigation in effect "Mitigation: $M" CPU is affected and mitigation $M is in effect + +What: /sys/devices/system/cpu/smt + /sys/devices/system/cpu/smt/active + /sys/devices/system/cpu/smt/control +Date: June 2018 +Contact: Linux kernel mailing list +Description: Control Symetric Multi Threading (SMT) + + active: Tells whether SMT is active (enabled and siblings online) + + control: Read/write interface to control SMT. Possible + values: + + "on" SMT is enabled + "off" SMT is disabled + "forceoff" SMT is force disabled. Cannot be changed. + "notsupported" SMT is not supported by the CPU + + If control status is "forceoff" or "notsupported" writes + are rejected. diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index a16f87e4dd10..8fe810bc236c 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2694,6 +2694,14 @@ bytes respectively. Such letter suffixes can also be entirely omitted. nosmt [KNL,S390] Disable symmetric multithreading (SMT). Equivalent to smt=1. + [KNL,x86] Disable symmetric multithreading (SMT). + nosmt=force: Force disable SMT, similar to disabling + it in the BIOS except that some of the + resource partitioning effects which are + caused by having SMT enabled in the BIOS + cannot be undone. Depending on the CPU + type this might have a performance impact. + nospectre_v2 [X86] Disable all mitigations for the Spectre variant 2 (indirect branch prediction) vulnerability. System may allow data leaks with this option, which is equivalent diff --git a/arch/Kconfig b/arch/Kconfig index 659bdd079277..b39d0f93c67b 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -5,6 +5,9 @@ config KEXEC_CORE bool +config HOTPLUG_SMT + bool + config OPROFILE tristate "OProfile system profiling" depends on PROFILING diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index a4ac7bab15f7..e31001ec4c07 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -147,6 +147,7 @@ config X86 select HAVE_UID16 if X86_32 || IA32_EMULATION select HAVE_UNSTABLE_SCHED_CLOCK select HAVE_USER_RETURN_NOTIFIER + select HOTPLUG_SMT if SMP select IRQ_FORCED_THREADING select MODULES_USE_ELF_RELA if X86_64 select MODULES_USE_ELF_REL if X86_32 diff --git a/include/linux/cpu.h b/include/linux/cpu.h index 0c0c614245e5..bb8fa64fbea3 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -257,4 +257,17 @@ void cpuhp_report_idle_dead(void); static inline void cpuhp_report_idle_dead(void) { } #endif /* #ifdef CONFIG_HOTPLUG_CPU */ +enum cpuhp_smt_control { + CPU_SMT_ENABLED, + CPU_SMT_DISABLED, + CPU_SMT_FORCE_DISABLED, + CPU_SMT_NOT_SUPPORTED, +}; + +#if defined(CONFIG_SMP) && defined(CONFIG_HOTPLUG_SMT) +extern enum cpuhp_smt_control cpu_smt_control; +#else +# define cpu_smt_control (CPU_SMT_ENABLED) +#endif + #endif /* _LINUX_CPU_H_ */ diff --git a/kernel/cpu.c b/kernel/cpu.c index 91b9d3272de0..0fbf3e59cf14 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -978,6 +978,29 @@ int cpu_down(unsigned int cpu) EXPORT_SYMBOL(cpu_down); #endif /*CONFIG_HOTPLUG_CPU*/ +#ifdef CONFIG_HOTPLUG_SMT +enum cpuhp_smt_control cpu_smt_control __read_mostly = CPU_SMT_ENABLED; + +static int __init smt_cmdline_disable(char *str) +{ + cpu_smt_control = CPU_SMT_DISABLED; + if (str && !strcmp(str, "force")) { + pr_info("SMT: Force disabled\n"); + cpu_smt_control = CPU_SMT_FORCE_DISABLED; + } + return 0; +} +early_param("nosmt", smt_cmdline_disable); + +static inline bool cpu_smt_allowed(unsigned int cpu) +{ + return cpu_smt_control == CPU_SMT_ENABLED || + topology_is_primary_thread(cpu); +} +#else +static inline bool cpu_smt_allowed(unsigned int cpu) { return true; } +#endif + /** * notify_cpu_starting(cpu) - Invoke the callbacks on the starting CPU * @cpu: cpu that just started @@ -1096,6 +1119,10 @@ static int do_cpu_up(unsigned int cpu, enum cpuhp_state target) err = -EBUSY; goto out; } + if (!cpu_smt_allowed(cpu)) { + err = -EPERM; + goto out; + } err = _cpu_up(cpu, 0, target); out: @@ -1842,10 +1869,153 @@ static struct attribute_group cpuhp_cpu_root_attr_group = { NULL }; +#ifdef CONFIG_HOTPLUG_SMT + +static const char *smt_states[] = { + [CPU_SMT_ENABLED] = "on", + [CPU_SMT_DISABLED] = "off", + [CPU_SMT_FORCE_DISABLED] = "forceoff", + [CPU_SMT_NOT_SUPPORTED] = "notsupported", +}; + +static ssize_t +show_smt_control(struct device *dev, struct device_attribute *attr, char *buf) +{ + return snprintf(buf, PAGE_SIZE - 2, "%s\n", smt_states[cpu_smt_control]); +} + +static void cpuhp_offline_cpu_device(unsigned int cpu) +{ + struct device *dev = get_cpu_device(cpu); + + dev->offline = true; + /* Tell user space about the state change */ + kobject_uevent(&dev->kobj, KOBJ_OFFLINE); +} + +static int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) +{ + int cpu, ret = 0; + + cpu_maps_update_begin(); + for_each_online_cpu(cpu) { + if (topology_is_primary_thread(cpu)) + continue; + ret = cpu_down_maps_locked(cpu, CPUHP_OFFLINE); + if (ret) + break; + /* + * As this needs to hold the cpu maps lock it's impossible + * to call device_offline() because that ends up calling + * cpu_down() which takes cpu maps lock. cpu maps lock + * needs to be held as this might race against in kernel + * abusers of the hotplug machinery (thermal management). + * + * So nothing would update device:offline state. That would + * leave the sysfs entry stale and prevent onlining after + * smt control has been changed to 'off' again. This is + * called under the sysfs hotplug lock, so it is properly + * serialized against the regular offline usage. + */ + cpuhp_offline_cpu_device(cpu); + } + if (!ret) + cpu_smt_control = ctrlval; + cpu_maps_update_done(); + return ret; +} + +static void cpuhp_smt_enable(void) +{ + cpu_maps_update_begin(); + cpu_smt_control = CPU_SMT_ENABLED; + cpu_maps_update_done(); +} + +static ssize_t +store_smt_control(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count) +{ + int ctrlval, ret; + + if (sysfs_streq(buf, "on")) + ctrlval = CPU_SMT_ENABLED; + else if (sysfs_streq(buf, "off")) + ctrlval = CPU_SMT_DISABLED; + else if (sysfs_streq(buf, "forceoff")) + ctrlval = CPU_SMT_FORCE_DISABLED; + else + return -EINVAL; + + if (cpu_smt_control == CPU_SMT_FORCE_DISABLED) + return -EPERM; + + if (cpu_smt_control == CPU_SMT_NOT_SUPPORTED) + return -ENODEV; + + ret = lock_device_hotplug_sysfs(); + if (ret) + return ret; + + if (ctrlval != cpu_smt_control) { + switch (ctrlval) { + case CPU_SMT_ENABLED: + cpuhp_smt_enable(); + break; + case CPU_SMT_DISABLED: + case CPU_SMT_FORCE_DISABLED: + ret = cpuhp_smt_disable(ctrlval); + break; + } + } + + unlock_device_hotplug(); + return ret ? ret : count; +} +static DEVICE_ATTR(control, 0644, show_smt_control, store_smt_control); + +static ssize_t +show_smt_active(struct device *dev, struct device_attribute *attr, char *buf) +{ + bool active = topology_max_smt_threads() > 1; + + return snprintf(buf, PAGE_SIZE - 2, "%d\n", active); +} +static DEVICE_ATTR(active, 0444, show_smt_active, NULL); + +static struct attribute *cpuhp_smt_attrs[] = { + &dev_attr_control.attr, + &dev_attr_active.attr, + NULL +}; + +static const struct attribute_group cpuhp_smt_attr_group = { + .attrs = cpuhp_smt_attrs, + .name = "smt", + NULL +}; + +static int __init cpu_smt_state_init(void) +{ + if (!topology_smt_supported()) + cpu_smt_control = CPU_SMT_NOT_SUPPORTED; + + return sysfs_create_group(&cpu_subsys.dev_root->kobj, + &cpuhp_smt_attr_group); +} + +#else +static inline int cpu_smt_state_init(void) { return 0; } +#endif + static int __init cpuhp_sysfs_init(void) { int cpu, ret; + ret = cpu_smt_state_init(); + if (ret) + return ret; + ret = sysfs_create_group(&cpu_subsys.dev_root->kobj, &cpuhp_cpu_root_attr_group); if (ret) From e0439285c628dea71517a1e77cab805d9134f898 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Wed, 6 Jun 2018 00:36:15 +0200 Subject: [PATCH 550/970] x86/cpu: Remove the pointless CPU printout commit 55e6d279abd92cfd7576bba031e7589be8475edb upstream The value of this printout is dubious at best and there is no point in having it in two different places along with convoluted ways to reach it. Remove it completely. Signed-off-by: Thomas Gleixner Reviewed-by: Konrad Rzeszutek Wilk Acked-by: Ingo Molnar Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/common.c | 20 +++++--------------- arch/x86/kernel/cpu/topology.c | 11 ----------- 2 files changed, 5 insertions(+), 26 deletions(-) diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index ef58507fd28e..258650801250 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -611,13 +611,12 @@ void detect_ht(struct cpuinfo_x86 *c) #ifdef CONFIG_SMP u32 eax, ebx, ecx, edx; int index_msb, core_bits; - static bool printed; if (!cpu_has(c, X86_FEATURE_HT)) return; if (cpu_has(c, X86_FEATURE_CMP_LEGACY)) - goto out; + return; if (cpu_has(c, X86_FEATURE_XTOPOLOGY)) return; @@ -626,14 +625,14 @@ void detect_ht(struct cpuinfo_x86 *c) smp_num_siblings = (ebx & 0xff0000) >> 16; + if (!smp_num_siblings) + smp_num_siblings = 1; + if (smp_num_siblings == 1) { pr_info_once("CPU0: Hyper-Threading is disabled\n"); - goto out; + return; } - if (smp_num_siblings <= 1) - goto out; - index_msb = get_count_order(smp_num_siblings); c->phys_proc_id = apic->phys_pkg_id(c->initial_apicid, index_msb); @@ -645,15 +644,6 @@ void detect_ht(struct cpuinfo_x86 *c) c->cpu_core_id = apic->phys_pkg_id(c->initial_apicid, index_msb) & ((1 << core_bits) - 1); - -out: - if (!printed && (c->x86_max_cores * smp_num_siblings) > 1) { - pr_info("CPU: Physical Processor ID: %d\n", - c->phys_proc_id); - pr_info("CPU: Processor Core ID: %d\n", - c->cpu_core_id); - printed = 1; - } #endif } diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c index cd531355e838..8445f779a016 100644 --- a/arch/x86/kernel/cpu/topology.c +++ b/arch/x86/kernel/cpu/topology.c @@ -32,7 +32,6 @@ void detect_extended_topology(struct cpuinfo_x86 *c) unsigned int eax, ebx, ecx, edx, sub_index; unsigned int ht_mask_width, core_plus_mask_width; unsigned int core_select_mask, core_level_siblings; - static bool printed; if (c->cpuid_level < 0xb) return; @@ -85,15 +84,5 @@ void detect_extended_topology(struct cpuinfo_x86 *c) c->apicid = apic->phys_pkg_id(c->initial_apicid, 0); c->x86_max_cores = (core_level_siblings / smp_num_siblings); - - if (!printed) { - pr_info("CPU: Physical Processor ID: %d\n", - c->phys_proc_id); - if (c->x86_max_cores > 1) - pr_info("CPU: Processor Core ID: %d\n", - c->cpu_core_id); - printed = 1; - } - return; #endif } From a6d2fa5dd70ad5caf47afce59ec468e176d026d0 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Wed, 6 Jun 2018 00:47:10 +0200 Subject: [PATCH 551/970] x86/cpu/AMD: Remove the pointless detect_ht() call commit 44ca36de56d1bf196dca2eb67cd753a46961ffe6 upstream Real 32bit AMD CPUs do not have SMT and the only value of the call was to reach the magic printout which got removed. Signed-off-by: Thomas Gleixner Reviewed-by: Konrad Rzeszutek Wilk Acked-by: Ingo Molnar Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/amd.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index 4c2be99fa0fb..4770fa59ea6b 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -805,10 +805,6 @@ static void init_amd(struct cpuinfo_x86 *c) srat_detect_node(c); } -#ifdef CONFIG_X86_32 - detect_ht(c); -#endif - init_amd_cacheinfo(c); if (c->x86 >= 0xf) From 691997bff5ff7e69b63c30ef36a29afc4a861c4e Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Wed, 6 Jun 2018 00:53:57 +0200 Subject: [PATCH 552/970] x86/cpu/common: Provide detect_ht_early() commit 545401f4448a807b963ff17b575e0a393e68b523 upstream To support force disabling of SMT it's required to know the number of thread siblings early. detect_ht() cannot be called before the APIC driver is selected, so split out the part which initializes smp_num_siblings. Signed-off-by: Thomas Gleixner Reviewed-by: Konrad Rzeszutek Wilk Acked-by: Ingo Molnar Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/common.c | 26 +++++++++++++++----------- arch/x86/kernel/cpu/cpu.h | 1 + 2 files changed, 16 insertions(+), 11 deletions(-) diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 258650801250..bfd00fabe17b 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -606,32 +606,36 @@ static void cpu_detect_tlb(struct cpuinfo_x86 *c) tlb_lld_4m[ENTRIES], tlb_lld_1g[ENTRIES]); } -void detect_ht(struct cpuinfo_x86 *c) +int detect_ht_early(struct cpuinfo_x86 *c) { #ifdef CONFIG_SMP u32 eax, ebx, ecx, edx; - int index_msb, core_bits; if (!cpu_has(c, X86_FEATURE_HT)) - return; + return -1; if (cpu_has(c, X86_FEATURE_CMP_LEGACY)) - return; + return -1; if (cpu_has(c, X86_FEATURE_XTOPOLOGY)) - return; + return -1; cpuid(1, &eax, &ebx, &ecx, &edx); smp_num_siblings = (ebx & 0xff0000) >> 16; - - if (!smp_num_siblings) - smp_num_siblings = 1; - - if (smp_num_siblings == 1) { + if (smp_num_siblings == 1) pr_info_once("CPU0: Hyper-Threading is disabled\n"); +#endif + return 0; +} + +void detect_ht(struct cpuinfo_x86 *c) +{ +#ifdef CONFIG_SMP + int index_msb, core_bits; + + if (detect_ht_early(c) < 0) return; - } index_msb = get_count_order(smp_num_siblings); c->phys_proc_id = apic->phys_pkg_id(c->initial_apicid, index_msb); diff --git a/arch/x86/kernel/cpu/cpu.h b/arch/x86/kernel/cpu/cpu.h index 3b19d82f7932..18ec2b004dde 100644 --- a/arch/x86/kernel/cpu/cpu.h +++ b/arch/x86/kernel/cpu/cpu.h @@ -46,6 +46,7 @@ extern const struct cpu_dev *const __x86_cpu_dev_start[], extern void get_cpu_cap(struct cpuinfo_x86 *c); extern void cpu_detect_cache_sizes(struct cpuinfo_x86 *c); +extern int detect_ht_early(struct cpuinfo_x86 *c); extern void x86_spec_ctrl_setup_ap(void); From 3b4f20ad388755d8b049b6b7387cbb847d142af6 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Wed, 6 Jun 2018 00:55:39 +0200 Subject: [PATCH 553/970] x86/cpu/topology: Provide detect_extended_topology_early() commit 95f3d39ccf7aaea79d1ffdac1c887c2e100ec1b6 upstream To support force disabling of SMT it's required to know the number of thread siblings early. detect_extended_topology() cannot be called before the APIC driver is selected, so split out the part which initializes smp_num_siblings. Signed-off-by: Thomas Gleixner Reviewed-by: Konrad Rzeszutek Wilk Acked-by: Ingo Molnar Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/cpu.h | 1 + arch/x86/kernel/cpu/topology.c | 30 ++++++++++++++++++++++++------ 2 files changed, 25 insertions(+), 6 deletions(-) diff --git a/arch/x86/kernel/cpu/cpu.h b/arch/x86/kernel/cpu/cpu.h index 18ec2b004dde..2275900d4d1b 100644 --- a/arch/x86/kernel/cpu/cpu.h +++ b/arch/x86/kernel/cpu/cpu.h @@ -46,6 +46,7 @@ extern const struct cpu_dev *const __x86_cpu_dev_start[], extern void get_cpu_cap(struct cpuinfo_x86 *c); extern void cpu_detect_cache_sizes(struct cpuinfo_x86 *c); +extern int detect_extended_topology_early(struct cpuinfo_x86 *c); extern int detect_ht_early(struct cpuinfo_x86 *c); extern void x86_spec_ctrl_setup_ap(void); diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c index 8445f779a016..6b5a850885ac 100644 --- a/arch/x86/kernel/cpu/topology.c +++ b/arch/x86/kernel/cpu/topology.c @@ -26,15 +26,13 @@ * exists, use it for populating initial_apicid and cpu topology * detection. */ -void detect_extended_topology(struct cpuinfo_x86 *c) +int detect_extended_topology_early(struct cpuinfo_x86 *c) { #ifdef CONFIG_SMP - unsigned int eax, ebx, ecx, edx, sub_index; - unsigned int ht_mask_width, core_plus_mask_width; - unsigned int core_select_mask, core_level_siblings; + unsigned int eax, ebx, ecx, edx; if (c->cpuid_level < 0xb) - return; + return -1; cpuid_count(0xb, SMT_LEVEL, &eax, &ebx, &ecx, &edx); @@ -42,7 +40,7 @@ void detect_extended_topology(struct cpuinfo_x86 *c) * check if the cpuid leaf 0xb is actually implemented. */ if (ebx == 0 || (LEAFB_SUBTYPE(ecx) != SMT_TYPE)) - return; + return -1; set_cpu_cap(c, X86_FEATURE_XTOPOLOGY); @@ -50,10 +48,30 @@ void detect_extended_topology(struct cpuinfo_x86 *c) * initial apic id, which also represents 32-bit extended x2apic id. */ c->initial_apicid = edx; + smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx); +#endif + return 0; +} + +/* + * Check for extended topology enumeration cpuid leaf 0xb and if it + * exists, use it for populating initial_apicid and cpu topology + * detection. + */ +void detect_extended_topology(struct cpuinfo_x86 *c) +{ +#ifdef CONFIG_SMP + unsigned int eax, ebx, ecx, edx, sub_index; + unsigned int ht_mask_width, core_plus_mask_width; + unsigned int core_select_mask, core_level_siblings; + + if (detect_extended_topology_early(c) < 0) + return; /* * Populate HT related information from sub-leaf level 0. */ + cpuid_count(0xb, SMT_LEVEL, &eax, &ebx, &ecx, &edx); core_level_siblings = smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx); core_plus_mask_width = ht_mask_width = BITS_SHIFT_NEXT_LEVEL(eax); From 0ee6f3b23c04b41ea5cf415aa8a31ef56ab21da7 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Wed, 6 Jun 2018 01:00:55 +0200 Subject: [PATCH 554/970] x86/cpu/intel: Evaluate smp_num_siblings early commit 1910ad5624968f93be48e8e265513c54d66b897c upstream Make use of the new early detection function to initialize smp_num_siblings on the boot cpu before the MP-Table or ACPI/MADT scan happens. That's required for force disabling SMT. Signed-off-by: Thomas Gleixner Reviewed-by: Konrad Rzeszutek Wilk Acked-by: Ingo Molnar Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/intel.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index 93781e3f05b2..9ad86c4bf360 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -283,6 +283,13 @@ static void early_init_intel(struct cpuinfo_x86 *c) } check_mpx_erratum(c); + + /* + * Get the number of SMT siblings early from the extended topology + * leaf, if available. Otherwise try the legacy SMT detection. + */ + if (detect_extended_topology_early(c) < 0) + detect_ht_early(c); } #ifdef CONFIG_X86_32 From 112d243045c2b18f56b37334ee0f7faa01edc205 Mon Sep 17 00:00:00 2001 From: Borislav Petkov Date: Fri, 15 Jun 2018 20:48:39 +0200 Subject: [PATCH 555/970] x86/CPU/AMD: Do not check CPUID max ext level before parsing SMP info commit 119bff8a9c9bb00116a844ec68be7bc4b1c768f5 upstream Old code used to check whether CPUID ext max level is >= 0x80000008 because that last leaf contains the number of cores of the physical CPU. The three functions called there now do not depend on that leaf anymore so the check can go. Signed-off-by: Borislav Petkov Signed-off-by: Thomas Gleixner Acked-by: Ingo Molnar Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/amd.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index 4770fa59ea6b..e9b613aed492 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -799,11 +799,8 @@ static void init_amd(struct cpuinfo_x86 *c) cpu_detect_cache_sizes(c); - /* Multi core CPU? */ - if (c->extended_cpuid_level >= 0x80000008) { - amd_detect_cmp(c); - srat_detect_node(c); - } + amd_detect_cmp(c); + srat_detect_node(c); init_amd_cacheinfo(c); From ae76eb1198fb9f90217d6f36072b780a250dda98 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Wed, 6 Jun 2018 00:57:38 +0200 Subject: [PATCH 556/970] x86/cpu/AMD: Evaluate smp_num_siblings early commit 1e1d7e25fd759eddf96d8ab39d0a90a1979b2d8c upstream To support force disabling of SMT it's required to know the number of thread siblings early. amd_get_topology() cannot be called before the APIC driver is selected, so split out the part which initializes smp_num_siblings and invoke it from amd_early_init(). [dwmw2: Backport to 4.9] Signed-off-by: Thomas Gleixner Acked-by: Ingo Molnar Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/amd.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index e9b613aed492..fa2bdf539aae 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -296,13 +296,24 @@ static int nearby_node(int apicid) } #endif +#ifdef CONFIG_SMP + +static void amd_get_topology_early(struct cpuinfo_x86 *c) +{ + if (boot_cpu_has(X86_FEATURE_TOPOEXT)) { + u32 eax, ebx, ecx, edx; + + cpuid(0x8000001e, &eax, &ebx, &ecx, &edx); + smp_num_siblings = ((ebx >> 8) & 0xff) + 1; + } +} + /* * Fixup core topology information for * (1) AMD multi-node processors * Assumption: Number of cores in each internal node is the same. * (2) AMD processors supporting compute units */ -#ifdef CONFIG_SMP static void amd_get_topology(struct cpuinfo_x86 *c) { u8 node_id; @@ -633,6 +644,8 @@ static void early_init_amd(struct cpuinfo_x86 *c) */ if (cpu_has_amd_erratum(c, amd_erratum_400)) set_cpu_bug(c, X86_BUG_AMD_E400); + + amd_get_topology_early(c); } static void init_amd_k8(struct cpuinfo_x86 *c) From 4a818f2c354249439dcd9409dafbd95212c5cdb0 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Tue, 5 Jun 2018 14:00:11 +0200 Subject: [PATCH 557/970] x86/apic: Ignore secondary threads if nosmt=force MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 2207def700f902f169fc237b717252c326f9e464 upstream nosmt on the kernel command line merely prevents the onlining of the secondary SMT siblings. nosmt=force makes the APIC detection code ignore the secondary SMT siblings completely, so they even do not show up as possible CPUs. That reduces the amount of memory allocations for per cpu variables and saves other resources from being allocated too large. This is not fully equivalent to disabling SMT in the BIOS because the low level SMT enabling in the BIOS can result in partitioning of resources between the siblings, which is not undone by just ignoring them. Some CPUs can use the full resources when their sibling is not onlined, but this is depending on the CPU family and model and it's not well documented whether this applies to all partitioned resources. That means depending on the workload disabling SMT in the BIOS might result in better performance. Linus analysis of the Intel manual: The intel optimization manual is not very clear on what the partitioning rules are. I find: "In general, the buffers for staging instructions between major pipe stages are partitioned. These buffers include µop queues after the execution trace cache, the queues after the register rename stage, the reorder buffer which stages instructions for retirement, and the load and store buffers. In the case of load and store buffers, partitioning also provided an easier implementation to maintain memory ordering for each logical processor and detect memory ordering violations" but some of that partitioning may be relaxed if the HT thread is "not active": "In Intel microarchitecture code name Sandy Bridge, the micro-op queue is statically partitioned to provide 28 entries for each logical processor, irrespective of software executing in single thread or multiple threads. If one logical processor is not active in Intel microarchitecture code name Ivy Bridge, then a single thread executing on that processor core can use the 56 entries in the micro-op queue" but I do not know what "not active" means, and how dynamic it is. Some of that partitioning may be entirely static and depend on the early BIOS disabling of HT, and even if we park the cores, the resources will just be wasted. Signed-off-by: Thomas Gleixner Reviewed-by: Konrad Rzeszutek Wilk Acked-by: Ingo Molnar Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/apic.h | 2 ++ arch/x86/kernel/acpi/boot.c | 3 ++- arch/x86/kernel/apic/apic.c | 19 +++++++++++++++++++ 3 files changed, 23 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h index 20eed46c5827..050003f2735e 100644 --- a/arch/x86/include/asm/apic.h +++ b/arch/x86/include/asm/apic.h @@ -636,8 +636,10 @@ extern int default_check_phys_apicid_present(int phys_apicid); #ifdef CONFIG_SMP bool apic_id_is_primary_thread(unsigned int id); +bool apic_id_disabled(unsigned int id); #else static inline bool apic_id_is_primary_thread(unsigned int id) { return false; } +static inline bool apic_id_disabled(unsigned int id) { return false; } #endif extern void irq_enter(void); diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c index 0a1e8a67cc99..6a4c06c2e4ce 100644 --- a/arch/x86/kernel/acpi/boot.c +++ b/arch/x86/kernel/acpi/boot.c @@ -177,7 +177,8 @@ static int acpi_register_lapic(int id, u32 acpiid, u8 enabled) } if (!enabled) { - ++disabled_cpus; + if (!apic_id_disabled(id)) + ++disabled_cpus; return -EINVAL; } diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index 28a1531a3d74..2d5f67a24d00 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -2056,6 +2056,16 @@ bool apic_id_is_primary_thread(unsigned int apicid) return !(apicid & mask); } +/** + * apic_id_disabled - Check whether APIC ID is disabled via SMT control + * @id: APIC ID to check + */ +bool apic_id_disabled(unsigned int id) +{ + return (cpu_smt_control == CPU_SMT_FORCE_DISABLED && + !apic_id_is_primary_thread(id)); +} + /* * Should use this API to allocate logical CPU IDs to keep nr_logical_cpuids * and cpuid_to_apicid[] synchronized. @@ -2151,6 +2161,15 @@ int generic_processor_info(int apicid, int version) return -EINVAL; } + /* + * If SMT is force disabled and the APIC ID belongs to + * a secondary thread, ignore it. + */ + if (apic_id_disabled(apicid)) { + pr_info_once("Ignoring secondary SMT threads\n"); + return -EINVAL; + } + if (apicid == boot_cpu_physical_apicid) { /* * x86_bios_cpu_apicid is required to have processors listed From c4b998c88f86971400b556520ba55c8ca96fd8dc Mon Sep 17 00:00:00 2001 From: Vlastimil Babka Date: Thu, 21 Jun 2018 12:36:29 +0200 Subject: [PATCH 558/970] x86/speculation/l1tf: Extend 64bit swap file size limit commit 1a7ed1ba4bba6c075d5ad61bb75e3fbc870840d6 upstream The previous patch has limited swap file size so that large offsets cannot clear bits above MAX_PA/2 in the pte and interfere with L1TF mitigation. It assumed that offsets are encoded starting with bit 12, same as pfn. But on x86_64, offsets are encoded starting with bit 9. Thus the limit can be raised by 3 bits. That means 16TB with 42bit MAX_PA and 256TB with 46bit MAX_PA. Fixes: 377eeaa8e11f ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2") Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/mm/init.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index d3d732912304..273f45dd0d2f 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -791,7 +791,15 @@ unsigned long max_swapfile_size(void) if (boot_cpu_has_bug(X86_BUG_L1TF)) { /* Limit the swap file size to MAX_PA/2 for L1TF workaround */ - pages = min_t(unsigned long, l1tf_pfn_limit() + 1, pages); + unsigned long l1tf_limit = l1tf_pfn_limit() + 1; + /* + * We encode swap offsets also with 3 bits below those for pfn + * which makes the usable limit higher. + */ +#ifdef CONFIG_X86_64 + l1tf_limit <<= PAGE_SHIFT - SWP_OFFSET_FIRST_BIT; +#endif + pages = min_t(unsigned long, l1tf_limit, pages); } return pages; } From a8358624a3ca9139b461c669231e5f474df8ccf1 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Wed, 20 Jun 2018 16:42:58 -0400 Subject: [PATCH 559/970] x86/cpufeatures: Add detection of L1D cache flush support. commit 11e34e64e4103955fc4568750914c75d65ea87ee upstream 336996-Speculative-Execution-Side-Channel-Mitigations.pdf defines a new MSR (IA32_FLUSH_CMD) which is detected by CPUID.7.EDX[28]=1 bit being set. This new MSR "gives software a way to invalidate structures with finer granularity than other architectual methods like WBINVD." A copy of this document is available at https://bugzilla.kernel.org/show_bug.cgi?id=199511 Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/cpufeatures.h | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index a78ddc9c694e..fbc1474960e3 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -317,6 +317,7 @@ #define X86_FEATURE_PCONFIG (18*32+18) /* Intel PCONFIG */ #define X86_FEATURE_SPEC_CTRL (18*32+26) /* "" Speculation Control (IBRS + IBPB) */ #define X86_FEATURE_INTEL_STIBP (18*32+27) /* "" Single Thread Indirect Branch Predictors */ +#define X86_FEATURE_FLUSH_L1D (18*32+28) /* Flush L1D cache */ #define X86_FEATURE_ARCH_CAPABILITIES (18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */ #define X86_FEATURE_SPEC_CTRL_SSBD (18*32+31) /* "" Speculative Store Bypass Disable */ From 250f0aebe2763df9e7ca8c9445d75fe5bbb3f970 Mon Sep 17 00:00:00 2001 From: Borislav Petkov Date: Fri, 22 Jun 2018 11:34:11 +0200 Subject: [PATCH 560/970] x86/CPU/AMD: Move TOPOEXT reenablement before reading smp_num_siblings commit 7ce2f0393ea2396142b7faf6ee9b1f3676d08a5f upstream The TOPOEXT reenablement is a workaround for broken BIOSen which didn't enable the CPUID bit. amd_get_topology_early(), however, relies on that bit being set so that it can read out the CPUID leaf and set smp_num_siblings properly. Move the reenablement up to early_init_amd(). While at it, simplify amd_get_topology_early(). [dwmw2: Backport to 4.9] Signed-off-by: Borislav Petkov Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/amd.c | 39 +++++++++++++++++++-------------------- 1 file changed, 19 insertions(+), 20 deletions(-) diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index fa2bdf539aae..6de596449488 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -300,12 +300,8 @@ static int nearby_node(int apicid) static void amd_get_topology_early(struct cpuinfo_x86 *c) { - if (boot_cpu_has(X86_FEATURE_TOPOEXT)) { - u32 eax, ebx, ecx, edx; - - cpuid(0x8000001e, &eax, &ebx, &ecx, &edx); - smp_num_siblings = ((ebx >> 8) & 0xff) + 1; - } + if (cpu_has(c, X86_FEATURE_TOPOEXT)) + smp_num_siblings = ((cpuid_ebx(0x8000001e) >> 8) & 0xff) + 1; } /* @@ -326,7 +322,6 @@ static void amd_get_topology(struct cpuinfo_x86 *c) cpuid(0x8000001e, &eax, &ebx, &ecx, &edx); node_id = ecx & 0xff; - smp_num_siblings = ((ebx >> 8) & 0xff) + 1; if (c->x86 == 0x15) c->cu_id = ebx & 0xff; @@ -578,6 +573,8 @@ static void bsp_init_amd(struct cpuinfo_x86 *c) static void early_init_amd(struct cpuinfo_x86 *c) { + u64 value; + early_init_amd_mc(c); /* @@ -645,6 +642,21 @@ static void early_init_amd(struct cpuinfo_x86 *c) if (cpu_has_amd_erratum(c, amd_erratum_400)) set_cpu_bug(c, X86_BUG_AMD_E400); + + /* Re-enable TopologyExtensions if switched off by BIOS */ + if (c->x86 == 0x15 && + (c->x86_model >= 0x10 && c->x86_model <= 0x6f) && + !cpu_has(c, X86_FEATURE_TOPOEXT)) { + + if (msr_set_bit(0xc0011005, 54) > 0) { + rdmsrl(0xc0011005, value); + if (value & BIT_64(54)) { + set_cpu_cap(c, X86_FEATURE_TOPOEXT); + pr_info_once(FW_INFO "CPU: Re-enabling disabled Topology Extensions Support.\n"); + } + } + } + amd_get_topology_early(c); } @@ -737,19 +749,6 @@ static void init_amd_bd(struct cpuinfo_x86 *c) { u64 value; - /* re-enable TopologyExtensions if switched off by BIOS */ - if ((c->x86_model >= 0x10) && (c->x86_model <= 0x6f) && - !cpu_has(c, X86_FEATURE_TOPOEXT)) { - - if (msr_set_bit(0xc0011005, 54) > 0) { - rdmsrl(0xc0011005, value); - if (value & BIT_64(54)) { - set_cpu_cap(c, X86_FEATURE_TOPOEXT); - pr_info_once(FW_INFO "CPU: Re-enabling disabled Topology Extensions Support.\n"); - } - } - } - /* * The way access filter has a performance penalty on some workloads. * Disable it on the affected CPUs. From 53527af79dc940a225efa266f6320ae9e8dae5e3 Mon Sep 17 00:00:00 2001 From: Vlastimil Babka Date: Fri, 22 Jun 2018 17:39:33 +0200 Subject: [PATCH 561/970] x86/speculation/l1tf: Protect PAE swap entries against L1TF commit 0d0f6249058834ffe1ceaad0bb31464af66f6e7a upstream The PAE 3-level paging code currently doesn't mitigate L1TF by flipping the offset bits, and uses the high PTE word, thus bits 32-36 for type, 37-63 for offset. The lower word is zeroed, thus systems with less than 4GB memory are safe. With 4GB to 128GB the swap type selects the memory locations vulnerable to L1TF; with even more memory, also the swap offfset influences the address. This might be a problem with 32bit PAE guests running on large 64bit hosts. By continuing to keep the whole swap entry in either high or low 32bit word of PTE we would limit the swap size too much. Thus this patch uses the whole PAE PTE with the same layout as the 64bit version does. The macros just become a bit tricky since they assume the arch-dependent swp_entry_t to be 32bit. Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner Acked-by: Michal Hocko Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/pgtable-3level.h | 35 +++++++++++++++++++++++++-- arch/x86/mm/init.c | 2 +- 2 files changed, 34 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h index 0c89891c7b44..5c686382d84b 100644 --- a/arch/x86/include/asm/pgtable-3level.h +++ b/arch/x86/include/asm/pgtable-3level.h @@ -177,12 +177,43 @@ static inline pmd_t native_pmdp_get_and_clear(pmd_t *pmdp) #endif /* Encode and de-code a swap entry */ +#define SWP_TYPE_BITS 5 + +#define SWP_OFFSET_FIRST_BIT (_PAGE_BIT_PROTNONE + 1) + +/* We always extract/encode the offset by shifting it all the way up, and then down again */ +#define SWP_OFFSET_SHIFT (SWP_OFFSET_FIRST_BIT + SWP_TYPE_BITS) + #define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > 5) #define __swp_type(x) (((x).val) & 0x1f) #define __swp_offset(x) ((x).val >> 5) #define __swp_entry(type, offset) ((swp_entry_t){(type) | (offset) << 5}) -#define __pte_to_swp_entry(pte) ((swp_entry_t){ (pte).pte_high }) -#define __swp_entry_to_pte(x) ((pte_t){ { .pte_high = (x).val } }) + +/* + * Normally, __swp_entry() converts from arch-independent swp_entry_t to + * arch-dependent swp_entry_t, and __swp_entry_to_pte() just stores the result + * to pte. But here we have 32bit swp_entry_t and 64bit pte, and need to use the + * whole 64 bits. Thus, we shift the "real" arch-dependent conversion to + * __swp_entry_to_pte() through the following helper macro based on 64bit + * __swp_entry(). + */ +#define __swp_pteval_entry(type, offset) ((pteval_t) { \ + (~(pteval_t)(offset) << SWP_OFFSET_SHIFT >> SWP_TYPE_BITS) \ + | ((pteval_t)(type) << (64 - SWP_TYPE_BITS)) }) + +#define __swp_entry_to_pte(x) ((pte_t){ .pte = \ + __swp_pteval_entry(__swp_type(x), __swp_offset(x)) }) +/* + * Analogically, __pte_to_swp_entry() doesn't just extract the arch-dependent + * swp_entry_t, but also has to convert it from 64bit to the 32bit + * intermediate representation, using the following macros based on 64bit + * __swp_type() and __swp_offset(). + */ +#define __pteval_swp_type(x) ((unsigned long)((x).pte >> (64 - SWP_TYPE_BITS))) +#define __pteval_swp_offset(x) ((unsigned long)(~((x).pte) << SWP_TYPE_BITS >> SWP_OFFSET_SHIFT)) + +#define __pte_to_swp_entry(pte) (__swp_entry(__pteval_swp_type(pte), \ + __pteval_swp_offset(pte))) #include diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index 273f45dd0d2f..4393af3c2bd5 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -796,7 +796,7 @@ unsigned long max_swapfile_size(void) * We encode swap offsets also with 3 bits below those for pfn * which makes the usable limit higher. */ -#ifdef CONFIG_X86_64 +#if CONFIG_PGTABLE_LEVELS > 2 l1tf_limit <<= PAGE_SHIFT - SWP_OFFSET_FIRST_BIT; #endif pages = min_t(unsigned long, l1tf_limit, pages); From 3f0eb66f652ceb5985b9b619e33fc61519121045 Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Wed, 27 Jun 2018 17:46:50 +0200 Subject: [PATCH 562/970] x86/speculation/l1tf: Fix up pte->pfn conversion for PAE commit e14d7dfb41f5807a0c1c26a13f2b8ef16af24935 upstream Jan has noticed that pte_pfn and co. resp. pfn_pte are incorrect for CONFIG_PAE because phys_addr_t is wider than unsigned long and so the pte_val reps. shift left would get truncated. Fix this up by using proper types. [dwmw2: Backport to 4.9] Fixes: 6b28baca9b1f ("x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation") Reported-by: Jan Beulich Signed-off-by: Michal Hocko Signed-off-by: Thomas Gleixner Acked-by: Vlastimil Babka Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/pgtable.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 38b679df119d..4e063582f1f0 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -171,21 +171,21 @@ static inline u64 protnone_mask(u64 val); static inline unsigned long pte_pfn(pte_t pte) { - unsigned long pfn = pte_val(pte); + phys_addr_t pfn = pte_val(pte); pfn ^= protnone_mask(pfn); return (pfn & PTE_PFN_MASK) >> PAGE_SHIFT; } static inline unsigned long pmd_pfn(pmd_t pmd) { - unsigned long pfn = pmd_val(pmd); + phys_addr_t pfn = pmd_val(pmd); pfn ^= protnone_mask(pfn); return (pfn & pmd_pfn_mask(pmd)) >> PAGE_SHIFT; } static inline unsigned long pud_pfn(pud_t pud) { - unsigned long pfn = pud_val(pud); + phys_addr_t pfn = pud_val(pud); pfn ^= protnone_mask(pfn); return (pfn & pud_pfn_mask(pud)) >> PAGE_SHIFT; } @@ -404,7 +404,7 @@ static inline pgprotval_t massage_pgprot(pgprot_t pgprot) static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot) { - phys_addr_t pfn = page_nr << PAGE_SHIFT; + phys_addr_t pfn = (phys_addr_t)page_nr << PAGE_SHIFT; pfn ^= protnone_mask(pgprot_val(pgprot)); pfn &= PTE_PFN_MASK; return __pte(pfn | massage_pgprot(pgprot)); @@ -412,7 +412,7 @@ static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot) static inline pmd_t pfn_pmd(unsigned long page_nr, pgprot_t pgprot) { - phys_addr_t pfn = page_nr << PAGE_SHIFT; + phys_addr_t pfn = (phys_addr_t)page_nr << PAGE_SHIFT; pfn ^= protnone_mask(pgprot_val(pgprot)); pfn &= PHYSICAL_PMD_PAGE_MASK; return __pmd(pfn | massage_pgprot(pgprot)); From fe2a955476f9d9a00d09840b5642d963893abebb Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Fri, 29 Jun 2018 16:05:47 +0200 Subject: [PATCH 563/970] Revert "x86/apic: Ignore secondary threads if nosmt=force" commit 506a66f374891ff08e064a058c446b336c5ac760 upstream Dave Hansen reported, that it's outright dangerous to keep SMT siblings disabled completely so they are stuck in the BIOS and wait for SIPI. The reason is that Machine Check Exceptions are broadcasted to siblings and the soft disabled sibling has CR4.MCE = 0. If a MCE is delivered to a logical core with CR4.MCE = 0, it asserts IERR#, which shuts down or reboots the machine. The MCE chapter in the SDM contains the following blurb: Because the logical processors within a physical package are tightly coupled with respect to shared hardware resources, both logical processors are notified of machine check errors that occur within a given physical processor. If machine-check exceptions are enabled when a fatal error is reported, all the logical processors within a physical package are dispatched to the machine-check exception handler. If machine-check exceptions are disabled, the logical processors enter the shutdown state and assert the IERR# signal. When enabling machine-check exceptions, the MCE flag in control register CR4 should be set for each logical processor. Reverting the commit which ignores siblings at enumeration time solves only half of the problem. The core cpuhotplug logic needs to be adjusted as well. This thoughtful engineered mechanism also turns the boot process on all Intel HT enabled systems into a MCE lottery. MCE is enabled on the boot CPU before the secondary CPUs are brought up. Depending on the number of physical cores the window in which this situation can happen is smaller or larger. On a HSW-EX it's about 750ms: MCE is enabled on the boot CPU: [ 0.244017] mce: CPU supports 22 MCE banks The corresponding sibling #72 boots: [ 1.008005] .... node #0, CPUs: #72 That means if an MCE hits on physical core 0 (logical CPUs 0 and 72) between these two points the machine is going to shutdown. At least it's a known safe state. It's obvious that the early boot can be hit by an MCE as well and then runs into the same situation because MCEs are not yet enabled on the boot CPU. But after enabling them on the boot CPU, it does not make any sense to prevent the kernel from recovering. Adjust the nosmt kernel parameter documentation as well. Reverts: 2207def700f9 ("x86/apic: Ignore secondary threads if nosmt=force") Reported-by: Dave Hansen Signed-off-by: Thomas Gleixner Tested-by: Tony Luck Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- Documentation/kernel-parameters.txt | 8 ++------ arch/x86/include/asm/apic.h | 2 -- arch/x86/kernel/acpi/boot.c | 3 +-- arch/x86/kernel/apic/apic.c | 19 ------------------- 4 files changed, 3 insertions(+), 29 deletions(-) diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 8fe810bc236c..e34e75cbd557 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2695,12 +2695,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted. Equivalent to smt=1. [KNL,x86] Disable symmetric multithreading (SMT). - nosmt=force: Force disable SMT, similar to disabling - it in the BIOS except that some of the - resource partitioning effects which are - caused by having SMT enabled in the BIOS - cannot be undone. Depending on the CPU - type this might have a performance impact. + nosmt=force: Force disable SMT, cannot be undone + via the sysfs control file. nospectre_v2 [X86] Disable all mitigations for the Spectre variant 2 (indirect branch prediction) vulnerability. System may diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h index 050003f2735e..20eed46c5827 100644 --- a/arch/x86/include/asm/apic.h +++ b/arch/x86/include/asm/apic.h @@ -636,10 +636,8 @@ extern int default_check_phys_apicid_present(int phys_apicid); #ifdef CONFIG_SMP bool apic_id_is_primary_thread(unsigned int id); -bool apic_id_disabled(unsigned int id); #else static inline bool apic_id_is_primary_thread(unsigned int id) { return false; } -static inline bool apic_id_disabled(unsigned int id) { return false; } #endif extern void irq_enter(void); diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c index 6a4c06c2e4ce..0a1e8a67cc99 100644 --- a/arch/x86/kernel/acpi/boot.c +++ b/arch/x86/kernel/acpi/boot.c @@ -177,8 +177,7 @@ static int acpi_register_lapic(int id, u32 acpiid, u8 enabled) } if (!enabled) { - if (!apic_id_disabled(id)) - ++disabled_cpus; + ++disabled_cpus; return -EINVAL; } diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index 2d5f67a24d00..28a1531a3d74 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -2056,16 +2056,6 @@ bool apic_id_is_primary_thread(unsigned int apicid) return !(apicid & mask); } -/** - * apic_id_disabled - Check whether APIC ID is disabled via SMT control - * @id: APIC ID to check - */ -bool apic_id_disabled(unsigned int id) -{ - return (cpu_smt_control == CPU_SMT_FORCE_DISABLED && - !apic_id_is_primary_thread(id)); -} - /* * Should use this API to allocate logical CPU IDs to keep nr_logical_cpuids * and cpuid_to_apicid[] synchronized. @@ -2161,15 +2151,6 @@ int generic_processor_info(int apicid, int version) return -EINVAL; } - /* - * If SMT is force disabled and the APIC ID belongs to - * a secondary thread, ignore it. - */ - if (apic_id_disabled(apicid)) { - pr_info_once("Ignoring secondary SMT threads\n"); - return -EINVAL; - } - if (apicid == boot_cpu_physical_apicid) { /* * x86_bios_cpu_apicid is required to have processors listed From 8438e49bcac479213ada6a29595adfd2e3d99460 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Fri, 29 Jun 2018 16:05:48 +0200 Subject: [PATCH 564/970] cpu/hotplug: Boot HT siblings at least once commit 0cc3cd21657be04cb0559fe8063f2130493f92cf upstream Due to the way Machine Check Exceptions work on X86 hyperthreads it's required to boot up _all_ logical cores at least once in order to set the CR4.MCE bit. So instead of ignoring the sibling threads right away, let them boot up once so they can configure themselves. After they came out of the initial boot stage check whether its a "secondary" sibling and cancel the operation which puts the CPU back into offline state. [dwmw2: Backport to 4.9] Reported-by: Dave Hansen Signed-off-by: Thomas Gleixner Tested-by: Tony Luck Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- kernel/cpu.c | 72 ++++++++++++++++++++++++++++++++++------------------ 1 file changed, 48 insertions(+), 24 deletions(-) diff --git a/kernel/cpu.c b/kernel/cpu.c index 0fbf3e59cf14..49acbd2fa81a 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -54,6 +54,7 @@ struct cpuhp_cpu_state { bool rollback; bool single; bool bringup; + bool booted_once; struct hlist_node *node; enum cpuhp_state cb_state; int result; @@ -355,6 +356,40 @@ void cpu_hotplug_enable(void) EXPORT_SYMBOL_GPL(cpu_hotplug_enable); #endif /* CONFIG_HOTPLUG_CPU */ +#ifdef CONFIG_HOTPLUG_SMT +enum cpuhp_smt_control cpu_smt_control __read_mostly = CPU_SMT_ENABLED; + +static int __init smt_cmdline_disable(char *str) +{ + cpu_smt_control = CPU_SMT_DISABLED; + if (str && !strcmp(str, "force")) { + pr_info("SMT: Force disabled\n"); + cpu_smt_control = CPU_SMT_FORCE_DISABLED; + } + return 0; +} +early_param("nosmt", smt_cmdline_disable); + +static inline bool cpu_smt_allowed(unsigned int cpu) +{ + if (cpu_smt_control == CPU_SMT_ENABLED) + return true; + + if (topology_is_primary_thread(cpu)) + return true; + + /* + * On x86 it's required to boot all logical CPUs at least once so + * that the init code can get a chance to set CR4.MCE on each + * CPU. Otherwise, a broadacasted MCE observing CR4.MCE=0b on any + * core will shutdown the machine. + */ + return !per_cpu(cpuhp_state, cpu).booted_once; +} +#else +static inline bool cpu_smt_allowed(unsigned int cpu) { return true; } +#endif + /* Need to know about CPUs going up/down? */ int register_cpu_notifier(struct notifier_block *nb) { @@ -431,6 +466,16 @@ static int bringup_wait_for_ap(unsigned int cpu) stop_machine_unpark(cpu); kthread_unpark(st->thread); + /* + * SMT soft disabling on X86 requires to bring the CPU out of the + * BIOS 'wait for SIPI' state in order to set the CR4.MCE bit. The + * CPU marked itself as booted_once in cpu_notify_starting() so the + * cpu_smt_allowed() check will now return false if this is not the + * primary sibling. + */ + if (!cpu_smt_allowed(cpu)) + return -ECANCELED; + /* Should we go further up ? */ if (st->target > CPUHP_AP_ONLINE_IDLE) { __cpuhp_kick_ap_work(st); @@ -978,29 +1023,6 @@ int cpu_down(unsigned int cpu) EXPORT_SYMBOL(cpu_down); #endif /*CONFIG_HOTPLUG_CPU*/ -#ifdef CONFIG_HOTPLUG_SMT -enum cpuhp_smt_control cpu_smt_control __read_mostly = CPU_SMT_ENABLED; - -static int __init smt_cmdline_disable(char *str) -{ - cpu_smt_control = CPU_SMT_DISABLED; - if (str && !strcmp(str, "force")) { - pr_info("SMT: Force disabled\n"); - cpu_smt_control = CPU_SMT_FORCE_DISABLED; - } - return 0; -} -early_param("nosmt", smt_cmdline_disable); - -static inline bool cpu_smt_allowed(unsigned int cpu) -{ - return cpu_smt_control == CPU_SMT_ENABLED || - topology_is_primary_thread(cpu); -} -#else -static inline bool cpu_smt_allowed(unsigned int cpu) { return true; } -#endif - /** * notify_cpu_starting(cpu) - Invoke the callbacks on the starting CPU * @cpu: cpu that just started @@ -1014,6 +1036,7 @@ void notify_cpu_starting(unsigned int cpu) enum cpuhp_state target = min((int)st->target, CPUHP_AP_ONLINE); rcu_cpu_starting(cpu); /* Enables RCU usage on this CPU. */ + st->booted_once = true; while (st->state < target) { st->state++; cpuhp_invoke_callback(cpu, st->state, true, NULL); @@ -2114,5 +2137,6 @@ void __init boot_cpu_init(void) */ void __init boot_cpu_hotplug_init(void) { - per_cpu_ptr(&cpuhp_state, smp_processor_id())->state = CPUHP_ONLINE; + this_cpu_write(cpuhp_state.booted_once, true); + this_cpu_write(cpuhp_state.state, CPUHP_ONLINE); } From a0695af3406ae2a08184bd47a9e948fe6f9858b9 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Wed, 20 Jun 2018 11:29:53 -0400 Subject: [PATCH 565/970] x86/KVM: Warn user if KVM is loaded SMT and L1TF CPU bug being present commit 26acfb666a473d960f0fd971fe68f3e3ad16c70b upstream If the L1TF CPU bug is present we allow the KVM module to be loaded as the major of users that use Linux and KVM have trusted guests and do not want a broken setup. Cloud vendors are the ones that are uncomfortable with CVE 2018-3620 and as such they are the ones that should set nosmt to one. Setting 'nosmt' means that the system administrator also needs to disable SMT (Hyper-threading) in the BIOS, or via the 'nosmt' command line parameter, or via the /sys/devices/system/cpu/smt/control. See commit 05736e4ac13c ("cpu/hotplug: Provide knobs to control SMT"). Other mitigations are to use task affinity, cpu sets, interrupt binding, etc - anything to make sure that _only_ the same guests vCPUs are running on sibling threads. Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- Documentation/kernel-parameters.txt | 6 ++++++ arch/x86/kvm/vmx.c | 19 +++++++++++++++++++ kernel/cpu.c | 1 + 3 files changed, 26 insertions(+) diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index e34e75cbd557..be17577f57d3 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1989,6 +1989,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted. for all guests. Default is 1 (enabled) if in 64-bit or 32-bit PAE mode. + kvm-intel.nosmt=[KVM,Intel] If the L1TF CPU bug is present (CVE-2018-3620) + and the system has SMT (aka Hyper-Threading) enabled then + don't allow guests to be created. + + Default is 0 (allow guests to be created). + kvm-intel.ept= [KVM,Intel] Disable extended page tables (virtualized MMU) support on capable Intel chips. Default is 1 (enabled) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 30b74b491909..f43b6484ca2e 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -69,6 +69,9 @@ static const struct x86_cpu_id vmx_cpu_id[] = { }; MODULE_DEVICE_TABLE(x86cpu, vmx_cpu_id); +static bool __read_mostly nosmt; +module_param(nosmt, bool, S_IRUGO); + static bool __read_mostly enable_vpid = 1; module_param_named(vpid, enable_vpid, bool, 0444); @@ -9298,6 +9301,20 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id) return ERR_PTR(err); } +#define L1TF_MSG "SMT enabled with L1TF CPU bug present. Refer to CVE-2018-3620 for details.\n" + +static int vmx_vm_init(struct kvm *kvm) +{ + if (boot_cpu_has(X86_BUG_L1TF) && cpu_smt_control == CPU_SMT_ENABLED) { + if (nosmt) { + pr_err(L1TF_MSG); + return -EOPNOTSUPP; + } + pr_warn(L1TF_MSG); + } + return 0; +} + static void __init vmx_check_processor_compat(void *rtn) { struct vmcs_config vmcs_conf; @@ -11367,6 +11384,8 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = { .cpu_has_accelerated_tpr = report_flexpriority, .has_emulated_msr = vmx_has_emulated_msr, + .vm_init = vmx_vm_init, + .vcpu_create = vmx_create_vcpu, .vcpu_free = vmx_free_vcpu, .vcpu_reset = vmx_vcpu_reset, diff --git a/kernel/cpu.c b/kernel/cpu.c index 49acbd2fa81a..8be14464b43c 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -358,6 +358,7 @@ EXPORT_SYMBOL_GPL(cpu_hotplug_enable); #ifdef CONFIG_HOTPLUG_SMT enum cpuhp_smt_control cpu_smt_control __read_mostly = CPU_SMT_ENABLED; +EXPORT_SYMBOL_GPL(cpu_smt_control); static int __init smt_cmdline_disable(char *str) { From af6ce92977a25540e5d6e0cf90ca187178b0ff9f Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Mon, 2 Jul 2018 12:29:30 +0200 Subject: [PATCH 566/970] x86/KVM/VMX: Add module argument for L1TF mitigation commit a399477e52c17e148746d3ce9a483f681c2aa9a0 upstream Add a mitigation mode parameter "vmentry_l1d_flush" for CVE-2018-3620, aka L1 terminal fault. The valid arguments are: - "always" L1D cache flush on every VMENTER. - "cond" Conditional L1D cache flush, explained below - "never" Disable the L1D cache flush mitigation "cond" is trying to avoid L1D cache flushes on VMENTER if the code executed between VMEXIT and VMENTER is considered safe, i.e. is not bringing any interesting information into L1D which might exploited. [ tglx: Split out from a larger patch ] Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- Documentation/kernel-parameters.txt | 12 ++++++ arch/x86/kvm/vmx.c | 65 ++++++++++++++++++++++++++++- 2 files changed, 75 insertions(+), 2 deletions(-) diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index be17577f57d3..d76cb9c8fbb0 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2016,6 +2016,18 @@ bytes respectively. Such letter suffixes can also be entirely omitted. (virtualized real and unpaged mode) on capable Intel chips. Default is 1 (enabled) + kvm-intel.vmentry_l1d_flush=[KVM,Intel] Mitigation for L1 Terminal Fault + CVE-2018-3620. + + Valid arguments: never, cond, always + + always: L1D cache flush on every VMENTER. + cond: Flush L1D on VMENTER only when the code between + VMEXIT and VMENTER can leak host memory. + never: Disables the mitigation + + Default is cond (do L1 cache flush in specific instances) + kvm-intel.vpid= [KVM,Intel] Disable Virtual Processor Identification feature (tagged TLBs) on capable Intel chips. Default is 1 (enabled) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index f43b6484ca2e..69122cf8bd1d 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -192,6 +192,54 @@ module_param(ple_window_max, int, S_IRUGO); extern const ulong vmx_return; +static DEFINE_STATIC_KEY_FALSE(vmx_l1d_should_flush); + +/* These MUST be in sync with vmentry_l1d_param order. */ +enum vmx_l1d_flush_state { + VMENTER_L1D_FLUSH_NEVER, + VMENTER_L1D_FLUSH_COND, + VMENTER_L1D_FLUSH_ALWAYS, +}; + +static enum vmx_l1d_flush_state __read_mostly vmentry_l1d_flush = VMENTER_L1D_FLUSH_COND; + +static const struct { + const char *option; + enum vmx_l1d_flush_state cmd; +} vmentry_l1d_param[] = { + {"never", VMENTER_L1D_FLUSH_NEVER}, + {"cond", VMENTER_L1D_FLUSH_COND}, + {"always", VMENTER_L1D_FLUSH_ALWAYS}, +}; + +static int vmentry_l1d_flush_set(const char *s, const struct kernel_param *kp) +{ + unsigned int i; + + if (!s) + return -EINVAL; + + for (i = 0; i < ARRAY_SIZE(vmentry_l1d_param); i++) { + if (!strcmp(s, vmentry_l1d_param[i].option)) { + vmentry_l1d_flush = vmentry_l1d_param[i].cmd; + return 0; + } + } + + return -EINVAL; +} + +static int vmentry_l1d_flush_get(char *s, const struct kernel_param *kp) +{ + return sprintf(s, "%s\n", vmentry_l1d_param[vmentry_l1d_flush].option); +} + +static const struct kernel_param_ops vmentry_l1d_flush_ops = { + .set = vmentry_l1d_flush_set, + .get = vmentry_l1d_flush_get, +}; +module_param_cb(vmentry_l1d_flush, &vmentry_l1d_flush_ops, &vmentry_l1d_flush, S_IRUGO); + #define NR_AUTOLOAD_MSRS 8 struct vmcs { @@ -11505,10 +11553,23 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = { .setup_mce = vmx_setup_mce, }; +static void __init vmx_setup_l1d_flush(void) +{ + if (vmentry_l1d_flush == VMENTER_L1D_FLUSH_NEVER || + !boot_cpu_has_bug(X86_BUG_L1TF)) + return; + + static_branch_enable(&vmx_l1d_should_flush); +} + static int __init vmx_init(void) { - int r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx), - __alignof__(struct vcpu_vmx), THIS_MODULE); + int r; + + vmx_setup_l1d_flush(); + + r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx), + __alignof__(struct vcpu_vmx), THIS_MODULE); if (r) return r; From b3d648aefab5265a566d6616de0e3a6b0aa2334b Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Mon, 2 Jul 2018 12:47:38 +0200 Subject: [PATCH 567/970] x86/KVM/VMX: Add L1D flush algorithm commit a47dd5f06714c844b33f3b5f517b6f3e81ce57b5 upstream To mitigate the L1 Terminal Fault vulnerability it's required to flush L1D on VMENTER to prevent rogue guests from snooping host memory. CPUs will have a new control MSR via a microcode update to flush L1D with a single MSR write, but in the absence of microcode a fallback to a software based flush algorithm is required. Add a software flush loop which is based on code from Intel. [ tglx: Split out from combo patch ] [ bpetkov: Polish the asm code ] Signed-off-by: Paolo Bonzini Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx.c | 70 +++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 66 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 69122cf8bd1d..7886cc2f0c74 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -8536,6 +8536,46 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu) } } +/* + * Software based L1D cache flush which is used when microcode providing + * the cache control MSR is not loaded. + * + * The L1D cache is 32 KiB on Nehalem and later microarchitectures, but to + * flush it is required to read in 64 KiB because the replacement algorithm + * is not exactly LRU. This could be sized at runtime via topology + * information but as all relevant affected CPUs have 32KiB L1D cache size + * there is no point in doing so. + */ +#define L1D_CACHE_ORDER 4 +static void *vmx_l1d_flush_pages; + +static void __maybe_unused vmx_l1d_flush(void) +{ + int size = PAGE_SIZE << L1D_CACHE_ORDER; + + asm volatile( + /* First ensure the pages are in the TLB */ + "xorl %%eax, %%eax\n" + ".Lpopulate_tlb:\n\t" + "movzbl (%[empty_zp], %%" _ASM_AX "), %%ecx\n\t" + "addl $4096, %%eax\n\t" + "cmpl %%eax, %[size]\n\t" + "jne .Lpopulate_tlb\n\t" + "xorl %%eax, %%eax\n\t" + "cpuid\n\t" + /* Now fill the cache */ + "xorl %%eax, %%eax\n" + ".Lfill_cache:\n" + "movzbl (%[empty_zp], %%" _ASM_AX "), %%ecx\n\t" + "addl $64, %%eax\n\t" + "cmpl %%eax, %[size]\n\t" + "jne .Lfill_cache\n\t" + "lfence\n" + :: [empty_zp] "r" (vmx_l1d_flush_pages), + [size] "r" (size) + : "eax", "ebx", "ecx", "edx"); +} + static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr) { struct vmcs12 *vmcs12 = get_vmcs12(vcpu); @@ -11553,25 +11593,45 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = { .setup_mce = vmx_setup_mce, }; -static void __init vmx_setup_l1d_flush(void) +static int __init vmx_setup_l1d_flush(void) { + struct page *page; + if (vmentry_l1d_flush == VMENTER_L1D_FLUSH_NEVER || !boot_cpu_has_bug(X86_BUG_L1TF)) - return; + return 0; + page = alloc_pages(GFP_KERNEL, L1D_CACHE_ORDER); + if (!page) + return -ENOMEM; + + vmx_l1d_flush_pages = page_address(page); static_branch_enable(&vmx_l1d_should_flush); + return 0; +} + +static void vmx_free_l1d_flush_pages(void) +{ + if (vmx_l1d_flush_pages) { + free_pages((unsigned long)vmx_l1d_flush_pages, L1D_CACHE_ORDER); + vmx_l1d_flush_pages = NULL; + } } static int __init vmx_init(void) { int r; - vmx_setup_l1d_flush(); + r = vmx_setup_l1d_flush(); + if (r) + return r; r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx), __alignof__(struct vcpu_vmx), THIS_MODULE); - if (r) + if (r) { + vmx_free_l1d_flush_pages(); return r; + } #ifdef CONFIG_KEXEC_CORE rcu_assign_pointer(crash_vmclear_loaded_vmcss, @@ -11589,6 +11649,8 @@ static void __exit vmx_exit(void) #endif kvm_exit(); + + vmx_free_l1d_flush_pages(); } module_init(vmx_init) From acca8a70a5f6179007e1148a62b8bef12b212d9b Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Mon, 2 Jul 2018 13:03:48 +0200 Subject: [PATCH 568/970] x86/KVM/VMX: Add L1D MSR based flush commit 3fa045be4c720146b18a19cea7a767dc6ad5df94 upstream 336996-Speculative-Execution-Side-Channel-Mitigations.pdf defines a new MSR (IA32_FLUSH_CMD aka 0x10B) which has similar write-only semantics to other MSRs defined in the document. The semantics of this MSR is to allow "finer granularity invalidation of caching structures than existing mechanisms like WBINVD. It will writeback and invalidate the L1 data cache, including all cachelines brought in by preceding instructions, without invalidating all caches (eg. L2 or LLC). Some processors may also invalidate the first level level instruction cache on a L1D_FLUSH command. The L1 data and instruction caches may be shared across the logical processors of a core." Use it instead of the loop based L1 flush algorithm. A copy of this document is available at https://bugzilla.kernel.org/show_bug.cgi?id=199511 [ tglx: Avoid allocating pages when the MSR is available ] Signed-off-by: Paolo Bonzini Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/msr-index.h | 6 ++++++ arch/x86/kvm/vmx.c | 15 +++++++++++---- 2 files changed, 17 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index 1ec13e253174..d49d3b5d359a 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -69,6 +69,12 @@ * control required. */ +#define MSR_IA32_FLUSH_CMD 0x0000010b +#define L1D_FLUSH (1 << 0) /* + * Writeback and invalidate the + * L1 data cache. + */ + #define MSR_IA32_BBL_CR_CTL 0x00000119 #define MSR_IA32_BBL_CR_CTL3 0x0000011e diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 7886cc2f0c74..28b823362580 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -8553,6 +8553,11 @@ static void __maybe_unused vmx_l1d_flush(void) { int size = PAGE_SIZE << L1D_CACHE_ORDER; + if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) { + wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH); + return; + } + asm volatile( /* First ensure the pages are in the TLB */ "xorl %%eax, %%eax\n" @@ -11601,11 +11606,13 @@ static int __init vmx_setup_l1d_flush(void) !boot_cpu_has_bug(X86_BUG_L1TF)) return 0; - page = alloc_pages(GFP_KERNEL, L1D_CACHE_ORDER); - if (!page) - return -ENOMEM; + if (!boot_cpu_has(X86_FEATURE_FLUSH_L1D)) { + page = alloc_pages(GFP_KERNEL, L1D_CACHE_ORDER); + if (!page) + return -ENOMEM; + vmx_l1d_flush_pages = page_address(page); + } - vmx_l1d_flush_pages = page_address(page); static_branch_enable(&vmx_l1d_should_flush); return 0; } From b3dc63c4f43e57d73d769ad0d3f34eae74cb68a8 Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Mon, 2 Jul 2018 13:07:14 +0200 Subject: [PATCH 569/970] x86/KVM/VMX: Add L1D flush logic commit c595ceee45707f00f64f61c54fb64ef0cc0b4e85 upstream Add the logic for flushing L1D on VMENTER. The flush depends on the static key being enabled and the new l1tf_flush_l1d flag being set. The flags is set: - Always, if the flush module parameter is 'always' - Conditionally at: - Entry to vcpu_run(), i.e. after executing user space - From the sched_in notifier, i.e. when switching to a vCPU thread. - From vmexit handlers which are considered unsafe, i.e. where sensitive data can be brought into L1D: - The emulator, which could be a good target for other speculative execution-based threats, - The MMU, which can bring host page tables in the L1 cache. - External interrupts - Nested operations that require the MMU (see above). That is vmptrld, vmptrst, vmclear,vmwrite,vmread. - When handling invept,invvpid [ tglx: Split out from combo patch and reduced to a single flag ] [ dwmw2: Backported to 4.9, set l1tf_flush_l1d in svm/vmx code ] Signed-off-by: Paolo Bonzini Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/kvm_host.h | 4 ++++ arch/x86/kvm/svm.c | 2 ++ arch/x86/kvm/vmx.c | 23 ++++++++++++++++++++++- arch/x86/kvm/x86.c | 8 ++++++++ 4 files changed, 36 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 7598a6c26f76..05678da200e1 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -659,6 +659,9 @@ struct kvm_vcpu_arch { int pending_ioapic_eoi; int pending_external_vector; + + /* Flush the L1 Data cache for L1TF mitigation on VMENTER */ + bool l1tf_flush_l1d; }; struct kvm_lpage_info { @@ -819,6 +822,7 @@ struct kvm_vcpu_stat { u64 signal_exits; u64 irq_window_exits; u64 nmi_window_exits; + u64 l1d_flush; u64 halt_exits; u64 halt_successful_poll; u64 halt_attempted_poll; diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index c4cd1280ac3e..c8ae426ee37d 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -2124,6 +2124,8 @@ static int pf_interception(struct vcpu_svm *svm) u32 error_code; int r = 1; + svm->vcpu.arch.l1tf_flush_l1d = true; + switch (svm->apf_reason) { default: error_code = svm->vmcb->control.exit_info_1; diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 28b823362580..1918a124f576 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -5773,6 +5773,7 @@ static int handle_exception(struct kvm_vcpu *vcpu) BUG_ON(enable_ept); cr2 = vmcs_readl(EXIT_QUALIFICATION); trace_kvm_page_fault(cr2, error_code); + vcpu->arch.l1tf_flush_l1d = true; if (kvm_event_needs_reinjection(vcpu)) kvm_mmu_unprotect_page_virt(vcpu, cr2); @@ -8549,9 +8550,20 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu) #define L1D_CACHE_ORDER 4 static void *vmx_l1d_flush_pages; -static void __maybe_unused vmx_l1d_flush(void) +static void vmx_l1d_flush(struct kvm_vcpu *vcpu) { int size = PAGE_SIZE << L1D_CACHE_ORDER; + bool always; + + /* + * If the mitigation mode is 'flush always', keep the flush bit + * set, otherwise clear it. It gets set again either from + * vcpu_run() or from one of the unsafe VMEXIT handlers. + */ + always = vmentry_l1d_flush == VMENTER_L1D_FLUSH_ALWAYS; + vcpu->arch.l1tf_flush_l1d = always; + + vcpu->stat.l1d_flush++; if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) { wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH); @@ -8792,6 +8804,7 @@ static void vmx_handle_external_intr(struct kvm_vcpu *vcpu) [ss]"i"(__KERNEL_DS), [cs]"i"(__KERNEL_CS) ); + vcpu->arch.l1tf_flush_l1d = true; } } STACK_FRAME_NON_STANDARD(vmx_handle_external_intr); @@ -9037,6 +9050,11 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu) vmx->__launched = vmx->loaded_vmcs->launched; + if (static_branch_unlikely(&vmx_l1d_should_flush)) { + if (vcpu->arch.l1tf_flush_l1d) + vmx_l1d_flush(vcpu); + } + asm( /* Store host registers */ "push %%" _ASM_DX "; push %%" _ASM_BP ";" @@ -10552,6 +10570,9 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch) vmcs12->launch_state = 1; + /* Hide L1D cache contents from the nested guest. */ + vmx->vcpu.arch.l1tf_flush_l1d = true; + if (vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT) return kvm_vcpu_halt(vcpu); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 5ca23af44c81..cc314e787e85 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -180,6 +180,7 @@ struct kvm_stats_debugfs_item debugfs_entries[] = { { "insn_emulation_fail", VCPU_STAT(insn_emulation_fail) }, { "irq_injections", VCPU_STAT(irq_injections) }, { "nmi_injections", VCPU_STAT(nmi_injections) }, + { "l1d_flush", VCPU_STAT(l1d_flush) }, { "mmu_shadow_zapped", VM_STAT(mmu_shadow_zapped) }, { "mmu_pte_write", VM_STAT(mmu_pte_write) }, { "mmu_pte_updated", VM_STAT(mmu_pte_updated) }, @@ -4476,6 +4477,9 @@ static int emulator_write_std(struct x86_emulate_ctxt *ctxt, gva_t addr, void *v int kvm_write_guest_virt_system(struct kvm_vcpu *vcpu, gva_t addr, void *val, unsigned int bytes, struct x86_exception *exception) { + /* kvm_write_guest_virt_system can pull in tons of pages. */ + vcpu->arch.l1tf_flush_l1d = true; + return kvm_write_guest_virt_helper(addr, val, bytes, vcpu, PFERR_WRITE_MASK, exception); } @@ -5574,6 +5578,8 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, bool writeback = true; bool write_fault_to_spt = vcpu->arch.write_fault_to_shadow_pgtable; + vcpu->arch.l1tf_flush_l1d = true; + /* * Clear write_fault_to_shadow_pgtable here to ensure it is * never reused. @@ -6929,6 +6935,7 @@ static int vcpu_run(struct kvm_vcpu *vcpu) struct kvm *kvm = vcpu->kvm; vcpu->srcu_idx = srcu_read_lock(&kvm->srcu); + vcpu->arch.l1tf_flush_l1d = true; for (;;) { if (kvm_vcpu_running(vcpu)) { @@ -7899,6 +7906,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) { + vcpu->arch.l1tf_flush_l1d = true; kvm_x86_ops->sched_in(vcpu, cpu); } From 69c2525237979d595bf0db29b84cc79b222e25c7 Mon Sep 17 00:00:00 2001 From: Jim Mattson Date: Tue, 4 Oct 2016 10:48:38 -0700 Subject: [PATCH 570/970] kvm: nVMX: Update MSR load counts on a VMCS switch MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Commit 83bafef1a131d1b8743d63658a180948bc880a74 upstream When L0 establishes (or removes) an MSR entry in the VM-entry or VM-exit MSR load lists, the change should affect the dormant VMCS as well as the current VMCS. Moreover, the vmcs02 MSR-load addresses should be initialized. [ dwmw2: Pulled in to 4.9 backports for L1TF ] Signed-off-by: Jim Mattson Signed-off-by: Radim Krčmář Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 1918a124f576..1b24d15f953d 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -10220,6 +10220,15 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) */ vmx_set_constant_host_state(vmx); + /* + * Set the MSR load/store lists to match L0's settings. + */ + vmcs_write32(VM_EXIT_MSR_STORE_COUNT, 0); + vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, vmx->msr_autoload.nr); + vmcs_write64(VM_EXIT_MSR_LOAD_ADDR, __pa(vmx->msr_autoload.host)); + vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_autoload.nr); + vmcs_write64(VM_ENTRY_MSR_LOAD_ADDR, __pa(vmx->msr_autoload.guest)); + /* * HOST_RSP is normally set correctly in vmx_vcpu_run() just before * entry, but only if the current (host) sp changed from the value @@ -11067,6 +11076,8 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason, load_vmcs12_host_state(vcpu, vmcs12); /* Update any VMCS fields that might have changed while L2 ran */ + vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, vmx->msr_autoload.nr); + vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_autoload.nr); vmcs_write64(TSC_OFFSET, vcpu->arch.tsc_offset); if (vmx->hv_deadline_tsc == -1) vmcs_clear_bits(PIN_BASED_VM_EXEC_CONTROL, From 57e3ada3e552dcd2de7e22acfb6eac2000f98868 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Wed, 20 Jun 2018 13:58:37 -0400 Subject: [PATCH 571/970] x86/KVM/VMX: Split the VMX MSR LOAD structures to have an host/guest numbers commit 33966dd6b2d2c352fae55412db2ea8cfff5df13a upstream There is no semantic change but this change allows an unbalanced amount of MSRs to be loaded on VMEXIT and VMENTER, i.e. the number of MSRs to save or restore on VMEXIT or VMENTER may be different. Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx.c | 65 +++++++++++++++++++++++++--------------------- 1 file changed, 35 insertions(+), 30 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 1b24d15f953d..6cced1daa92c 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -592,6 +592,11 @@ static inline int pi_test_sn(struct pi_desc *pi_desc) (unsigned long *)&pi_desc->control); } +struct vmx_msrs { + unsigned int nr; + struct vmx_msr_entry val[NR_AUTOLOAD_MSRS]; +}; + struct vcpu_vmx { struct kvm_vcpu vcpu; unsigned long host_rsp; @@ -624,9 +629,8 @@ struct vcpu_vmx { struct loaded_vmcs *loaded_vmcs; bool __launched; /* temporary, used in vmx_vcpu_run */ struct msr_autoload { - unsigned nr; - struct vmx_msr_entry guest[NR_AUTOLOAD_MSRS]; - struct vmx_msr_entry host[NR_AUTOLOAD_MSRS]; + struct vmx_msrs guest; + struct vmx_msrs host; } msr_autoload; struct { int loaded; @@ -1994,18 +1998,18 @@ static void clear_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr) } break; } - - for (i = 0; i < m->nr; ++i) - if (m->guest[i].index == msr) + for (i = 0; i < m->guest.nr; ++i) + if (m->guest.val[i].index == msr) break; - if (i == m->nr) + if (i == m->guest.nr) return; - --m->nr; - m->guest[i] = m->guest[m->nr]; - m->host[i] = m->host[m->nr]; - vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, m->nr); - vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, m->nr); + --m->guest.nr; + --m->host.nr; + m->guest.val[i] = m->guest.val[m->guest.nr]; + m->host.val[i] = m->host.val[m->host.nr]; + vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, m->guest.nr); + vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, m->host.nr); } static void add_atomic_switch_msr_special(struct vcpu_vmx *vmx, @@ -2057,24 +2061,25 @@ static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr, wrmsrl(MSR_IA32_PEBS_ENABLE, 0); } - for (i = 0; i < m->nr; ++i) - if (m->guest[i].index == msr) + for (i = 0; i < m->guest.nr; ++i) + if (m->guest.val[i].index == msr) break; if (i == NR_AUTOLOAD_MSRS) { printk_once(KERN_WARNING "Not enough msr switch entries. " "Can't add msr %x\n", msr); return; - } else if (i == m->nr) { - ++m->nr; - vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, m->nr); - vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, m->nr); + } else if (i == m->guest.nr) { + ++m->guest.nr; + ++m->host.nr; + vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, m->guest.nr); + vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, m->host.nr); } - m->guest[i].index = msr; - m->guest[i].value = guest_val; - m->host[i].index = msr; - m->host[i].value = host_val; + m->guest.val[i].index = msr; + m->guest.val[i].value = guest_val; + m->host.val[i].index = msr; + m->host.val[i].value = host_val; } static void reload_tss(void) @@ -5316,9 +5321,9 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx) vmcs_write32(VM_EXIT_MSR_STORE_COUNT, 0); vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, 0); - vmcs_write64(VM_EXIT_MSR_LOAD_ADDR, __pa(vmx->msr_autoload.host)); + vmcs_write64(VM_EXIT_MSR_LOAD_ADDR, __pa(vmx->msr_autoload.host.val)); vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, 0); - vmcs_write64(VM_ENTRY_MSR_LOAD_ADDR, __pa(vmx->msr_autoload.guest)); + vmcs_write64(VM_ENTRY_MSR_LOAD_ADDR, __pa(vmx->msr_autoload.guest.val)); if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) vmcs_write64(GUEST_IA32_PAT, vmx->vcpu.arch.pat); @@ -10224,10 +10229,10 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) * Set the MSR load/store lists to match L0's settings. */ vmcs_write32(VM_EXIT_MSR_STORE_COUNT, 0); - vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, vmx->msr_autoload.nr); - vmcs_write64(VM_EXIT_MSR_LOAD_ADDR, __pa(vmx->msr_autoload.host)); - vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_autoload.nr); - vmcs_write64(VM_ENTRY_MSR_LOAD_ADDR, __pa(vmx->msr_autoload.guest)); + vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, vmx->msr_autoload.host.nr); + vmcs_write64(VM_EXIT_MSR_LOAD_ADDR, __pa(vmx->msr_autoload.host.val)); + vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_autoload.guest.nr); + vmcs_write64(VM_ENTRY_MSR_LOAD_ADDR, __pa(vmx->msr_autoload.guest.val)); /* * HOST_RSP is normally set correctly in vmx_vcpu_run() just before @@ -11076,8 +11081,8 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason, load_vmcs12_host_state(vcpu, vmcs12); /* Update any VMCS fields that might have changed while L2 ran */ - vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, vmx->msr_autoload.nr); - vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_autoload.nr); + vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, vmx->msr_autoload.host.nr); + vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_autoload.guest.nr); vmcs_write64(TSC_OFFSET, vcpu->arch.tsc_offset); if (vmx->hv_deadline_tsc == -1) vmcs_clear_bits(PIN_BASED_VM_EXEC_CONTROL, From 1555f9e8ed973df3e4a5aecc37cdb6d48469d366 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Wed, 20 Jun 2018 20:11:39 -0400 Subject: [PATCH 572/970] x86/KVM/VMX: Add find_msr() helper function commit ca83b4a7f2d068da79a029d323024aa45decb250 upstream .. to help find the MSR on either the guest or host MSR list. Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx.c | 31 ++++++++++++++++++------------- 1 file changed, 18 insertions(+), 13 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 6cced1daa92c..1efd6697337f 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -1975,9 +1975,20 @@ static void clear_atomic_switch_msr_special(struct vcpu_vmx *vmx, vm_exit_controls_clearbit(vmx, exit); } +static int find_msr(struct vmx_msrs *m, unsigned int msr) +{ + unsigned int i; + + for (i = 0; i < m->nr; ++i) { + if (m->val[i].index == msr) + return i; + } + return -ENOENT; +} + static void clear_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr) { - unsigned i; + int i; struct msr_autoload *m = &vmx->msr_autoload; switch (msr) { @@ -1998,11 +2009,8 @@ static void clear_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr) } break; } - for (i = 0; i < m->guest.nr; ++i) - if (m->guest.val[i].index == msr) - break; - - if (i == m->guest.nr) + i = find_msr(&m->guest, msr); + if (i < 0) return; --m->guest.nr; --m->host.nr; @@ -2026,7 +2034,7 @@ static void add_atomic_switch_msr_special(struct vcpu_vmx *vmx, static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr, u64 guest_val, u64 host_val) { - unsigned i; + int i; struct msr_autoload *m = &vmx->msr_autoload; switch (msr) { @@ -2061,16 +2069,13 @@ static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr, wrmsrl(MSR_IA32_PEBS_ENABLE, 0); } - for (i = 0; i < m->guest.nr; ++i) - if (m->guest.val[i].index == msr) - break; - + i = find_msr(&m->guest, msr); if (i == NR_AUTOLOAD_MSRS) { printk_once(KERN_WARNING "Not enough msr switch entries. " "Can't add msr %x\n", msr); return; - } else if (i == m->guest.nr) { - ++m->guest.nr; + } else if (i < 0) { + i = m->guest.nr++; ++m->host.nr; vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, m->guest.nr); vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, m->host.nr); From 5d3eaa2d3935e9a5be2dd7186962e72e475d5966 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Wed, 20 Jun 2018 22:00:47 -0400 Subject: [PATCH 573/970] x86/KVM/VMX: Separate the VMX AUTOLOAD guest/host number accounting commit 3190709335dd31fe1aeeebfe4ffb6c7624ef971f upstream This allows to load a different number of MSRs depending on the context: VMEXIT or VMENTER. Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx.c | 29 +++++++++++++++++++---------- 1 file changed, 19 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 1efd6697337f..e4ccb931ab40 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2011,12 +2011,18 @@ static void clear_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr) } i = find_msr(&m->guest, msr); if (i < 0) - return; + goto skip_guest; --m->guest.nr; - --m->host.nr; m->guest.val[i] = m->guest.val[m->guest.nr]; - m->host.val[i] = m->host.val[m->host.nr]; vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, m->guest.nr); + +skip_guest: + i = find_msr(&m->host, msr); + if (i < 0) + return; + + --m->host.nr; + m->host.val[i] = m->host.val[m->host.nr]; vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, m->host.nr); } @@ -2034,7 +2040,7 @@ static void add_atomic_switch_msr_special(struct vcpu_vmx *vmx, static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr, u64 guest_val, u64 host_val) { - int i; + int i, j; struct msr_autoload *m = &vmx->msr_autoload; switch (msr) { @@ -2070,21 +2076,24 @@ static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr, } i = find_msr(&m->guest, msr); - if (i == NR_AUTOLOAD_MSRS) { + j = find_msr(&m->host, msr); + if (i == NR_AUTOLOAD_MSRS || j == NR_AUTOLOAD_MSRS) { printk_once(KERN_WARNING "Not enough msr switch entries. " "Can't add msr %x\n", msr); return; - } else if (i < 0) { + } + if (i < 0) { i = m->guest.nr++; - ++m->host.nr; vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, m->guest.nr); + } + if (j < 0) { + j = m->host.nr++; vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, m->host.nr); } - m->guest.val[i].index = msr; m->guest.val[i].value = guest_val; - m->host.val[i].index = msr; - m->host.val[i].value = host_val; + m->host.val[j].index = msr; + m->host.val[j].value = host_val; } static void reload_tss(void) From c45ff817e91bef4cbb36944b0c723a42c4c920d2 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Wed, 20 Jun 2018 22:01:22 -0400 Subject: [PATCH 574/970] x86/KVM/VMX: Extend add_atomic_switch_msr() to allow VMENTER only MSRs commit 989e3992d2eca32c3f1404f2bc91acda3aa122d8 upstream The IA32_FLUSH_CMD MSR needs only to be written on VMENTER. Extend add_atomic_switch_msr() with an entry_only parameter to allow storing the MSR only in the guest (ENTRY) MSR array. Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx.c | 22 ++++++++++++++-------- 1 file changed, 14 insertions(+), 8 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index e4ccb931ab40..e868c06939cd 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2038,9 +2038,9 @@ static void add_atomic_switch_msr_special(struct vcpu_vmx *vmx, } static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr, - u64 guest_val, u64 host_val) + u64 guest_val, u64 host_val, bool entry_only) { - int i, j; + int i, j = 0; struct msr_autoload *m = &vmx->msr_autoload; switch (msr) { @@ -2076,7 +2076,9 @@ static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr, } i = find_msr(&m->guest, msr); - j = find_msr(&m->host, msr); + if (!entry_only) + j = find_msr(&m->host, msr); + if (i == NR_AUTOLOAD_MSRS || j == NR_AUTOLOAD_MSRS) { printk_once(KERN_WARNING "Not enough msr switch entries. " "Can't add msr %x\n", msr); @@ -2086,12 +2088,16 @@ static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr, i = m->guest.nr++; vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, m->guest.nr); } + m->guest.val[i].index = msr; + m->guest.val[i].value = guest_val; + + if (entry_only) + return; + if (j < 0) { j = m->host.nr++; vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, m->host.nr); } - m->guest.val[i].index = msr; - m->guest.val[i].value = guest_val; m->host.val[j].index = msr; m->host.val[j].value = host_val; } @@ -2150,7 +2156,7 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset) guest_efer &= ~EFER_LME; if (guest_efer != host_efer) add_atomic_switch_msr(vmx, MSR_EFER, - guest_efer, host_efer); + guest_efer, host_efer, false); return false; } else { guest_efer &= ~ignore_bits; @@ -3314,7 +3320,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) vcpu->arch.ia32_xss = data; if (vcpu->arch.ia32_xss != host_xss) add_atomic_switch_msr(vmx, MSR_IA32_XSS, - vcpu->arch.ia32_xss, host_xss); + vcpu->arch.ia32_xss, host_xss, false); else clear_atomic_switch_msr(vmx, MSR_IA32_XSS); break; @@ -8985,7 +8991,7 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx) clear_atomic_switch_msr(vmx, msrs[i].msr); else add_atomic_switch_msr(vmx, msrs[i].msr, msrs[i].guest, - msrs[i].host); + msrs[i].host, false); } void vmx_arm_hv_timer(struct kvm_vcpu *vcpu) From a8c14676a93da6b3ef6610b37ef84e7d89f9f3a2 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Thu, 28 Jun 2018 17:10:36 -0400 Subject: [PATCH 575/970] x86/KVM/VMX: Use MSR save list for IA32_FLUSH_CMD if required commit 390d975e0c4e60ce70d4157e0dd91ede37824603 upstream If the L1D flush module parameter is set to 'always' and the IA32_FLUSH_CMD MSR is available, optimize the VMENTER code with the MSR save list. Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx.c | 42 +++++++++++++++++++++++++++++++++++++----- 1 file changed, 37 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index e868c06939cd..b56761d994ea 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -5269,6 +5269,16 @@ static void ept_set_mmio_spte_mask(void) kvm_mmu_set_mmio_spte_mask((0x3ull << 62) | 0x6ull); } +static bool vmx_l1d_use_msr_save_list(void) +{ + if (!enable_ept || !boot_cpu_has_bug(X86_BUG_L1TF) || + static_cpu_has(X86_FEATURE_HYPERVISOR) || + !static_cpu_has(X86_FEATURE_FLUSH_L1D)) + return false; + + return vmentry_l1d_flush == VMENTER_L1D_FLUSH_ALWAYS; +} + #define VMX_XSS_EXIT_BITMAP 0 /* * Sets up the vmcs for emulated real mode. @@ -5618,6 +5628,12 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked) vmcs_clear_bits(GUEST_INTERRUPTIBILITY_INFO, GUEST_INTR_STATE_NMI); } + /* + * If flushing the L1D cache on every VMENTER is enforced and the + * MSR is available, use the MSR save list. + */ + if (vmx_l1d_use_msr_save_list()) + add_atomic_switch_msr(vmx, MSR_IA32_FLUSH_CMD, L1D_FLUSH, 0, true); } static int vmx_nmi_allowed(struct kvm_vcpu *vcpu) @@ -8581,11 +8597,26 @@ static void vmx_l1d_flush(struct kvm_vcpu *vcpu) bool always; /* - * If the mitigation mode is 'flush always', keep the flush bit - * set, otherwise clear it. It gets set again either from - * vcpu_run() or from one of the unsafe VMEXIT handlers. + * This code is only executed when: + * - the flush mode is 'cond' + * - the flush mode is 'always' and the flush MSR is not + * available + * + * If the CPU has the flush MSR then clear the flush bit because + * 'always' mode is handled via the MSR save list. + * + * If the MSR is not avaibable then act depending on the mitigation + * mode: If 'flush always', keep the flush bit set, otherwise clear + * it. + * + * The flush bit gets set again either from vcpu_run() or from one + * of the unsafe VMEXIT handlers. */ - always = vmentry_l1d_flush == VMENTER_L1D_FLUSH_ALWAYS; + if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) + always = false; + else + always = vmentry_l1d_flush == VMENTER_L1D_FLUSH_ALWAYS; + vcpu->arch.l1tf_flush_l1d = always; vcpu->stat.l1d_flush++; @@ -11660,7 +11691,8 @@ static int __init vmx_setup_l1d_flush(void) struct page *page; if (vmentry_l1d_flush == VMENTER_L1D_FLUSH_NEVER || - !boot_cpu_has_bug(X86_BUG_L1TF)) + !boot_cpu_has_bug(X86_BUG_L1TF) || + vmx_l1d_use_msr_save_list()) return 0; if (!boot_cpu_has(X86_FEATURE_FLUSH_L1D)) { From e7cda2ffe1279bcf63f1dd8bbc3c7b818a9ba457 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Sat, 7 Jul 2018 11:40:18 +0200 Subject: [PATCH 576/970] cpu/hotplug: Online siblings when SMT control is turned on commit 215af5499d9e2b55f111d2431ea20218115f29b3 upstream Writing 'off' to /sys/devices/system/cpu/smt/control offlines all SMT siblings. Writing 'on' merily enables the abilify to online them, but does not online them automatically. Make 'on' more useful by onlining all offline siblings. Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- kernel/cpu.c | 26 ++++++++++++++++++++++++-- 1 file changed, 24 insertions(+), 2 deletions(-) diff --git a/kernel/cpu.c b/kernel/cpu.c index 8be14464b43c..15a9bd108ce5 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -1917,6 +1917,15 @@ static void cpuhp_offline_cpu_device(unsigned int cpu) kobject_uevent(&dev->kobj, KOBJ_OFFLINE); } +static void cpuhp_online_cpu_device(unsigned int cpu) +{ + struct device *dev = get_cpu_device(cpu); + + dev->offline = false; + /* Tell user space about the state change */ + kobject_uevent(&dev->kobj, KOBJ_ONLINE); +} + static int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { int cpu, ret = 0; @@ -1949,11 +1958,24 @@ static int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) return ret; } -static void cpuhp_smt_enable(void) +static int cpuhp_smt_enable(void) { + int cpu, ret = 0; + cpu_maps_update_begin(); cpu_smt_control = CPU_SMT_ENABLED; + for_each_present_cpu(cpu) { + /* Skip online CPUs and CPUs on offline nodes */ + if (cpu_online(cpu) || !node_online(cpu_to_node(cpu))) + continue; + ret = _cpu_up(cpu, 0, CPUHP_ONLINE); + if (ret) + break; + /* See comment in cpuhp_smt_disable() */ + cpuhp_online_cpu_device(cpu); + } cpu_maps_update_done(); + return ret; } static ssize_t @@ -1984,7 +2006,7 @@ store_smt_control(struct device *dev, struct device_attribute *attr, if (ctrlval != cpu_smt_control) { switch (ctrlval) { case CPU_SMT_ENABLED: - cpuhp_smt_enable(); + ret = cpuhp_smt_enable(); break; case CPU_SMT_DISABLED: case CPU_SMT_FORCE_DISABLED: From 80e55b5ea4e9dbc049594bf357b1a9b0347bb584 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Fri, 13 Jul 2018 16:23:16 +0200 Subject: [PATCH 577/970] x86/litf: Introduce vmx status variable commit 72c6d2db64fa18c996ece8f06e499509e6c9a37e upstream Store the effective mitigation of VMX in a status variable and use it to report the VMX state in the l1tf sysfs file. Signed-off-by: Thomas Gleixner Tested-by: Jiri Kosina Reviewed-by: Greg Kroah-Hartman Reviewed-by: Josh Poimboeuf Link: https://lkml.kernel.org/r/20180713142322.433098358@linutronix.de Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/vmx.h | 9 +++++++++ arch/x86/kernel/cpu/bugs.c | 36 ++++++++++++++++++++++++++++++++++-- arch/x86/kvm/vmx.c | 22 +++++++++++----------- 3 files changed, 54 insertions(+), 13 deletions(-) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 9cbfbef6a115..c5639c139732 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -499,4 +499,13 @@ enum vm_instruction_error_number { VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID = 28, }; +enum vmx_l1d_flush_state { + VMENTER_L1D_FLUSH_AUTO, + VMENTER_L1D_FLUSH_NEVER, + VMENTER_L1D_FLUSH_COND, + VMENTER_L1D_FLUSH_ALWAYS, +}; + +extern enum vmx_l1d_flush_state l1tf_vmx_mitigation; + #endif diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index 51257f927240..59eb6c5ce1ea 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -21,6 +21,7 @@ #include #include #include +#include #include #include #include @@ -635,6 +636,12 @@ void x86_spec_ctrl_setup_ap(void) #undef pr_fmt #define pr_fmt(fmt) "L1TF: " fmt + +#if IS_ENABLED(CONFIG_KVM_INTEL) +enum vmx_l1d_flush_state l1tf_vmx_mitigation __ro_after_init = VMENTER_L1D_FLUSH_AUTO; +EXPORT_SYMBOL_GPL(l1tf_vmx_mitigation); +#endif + static void __init l1tf_select_mitigation(void) { u64 half_pa; @@ -664,6 +671,32 @@ static void __init l1tf_select_mitigation(void) #ifdef CONFIG_SYSFS +#define L1TF_DEFAULT_MSG "Mitigation: PTE Inversion" + +#if IS_ENABLED(CONFIG_KVM_INTEL) +static const char *l1tf_vmx_states[] = { + [VMENTER_L1D_FLUSH_AUTO] = "auto", + [VMENTER_L1D_FLUSH_NEVER] = "vulnerable", + [VMENTER_L1D_FLUSH_COND] = "conditional cache flushes", + [VMENTER_L1D_FLUSH_ALWAYS] = "cache flushes", +}; + +static ssize_t l1tf_show_state(char *buf) +{ + if (l1tf_vmx_mitigation == VMENTER_L1D_FLUSH_AUTO) + return sprintf(buf, "%s\n", L1TF_DEFAULT_MSG); + + return sprintf(buf, "%s; VMX: SMT %s, L1D %s\n", L1TF_DEFAULT_MSG, + cpu_smt_control == CPU_SMT_ENABLED ? "vulnerable" : "disabled", + l1tf_vmx_states[l1tf_vmx_mitigation]); +} +#else +static ssize_t l1tf_show_state(char *buf) +{ + return sprintf(buf, "%s\n", L1TF_DEFAULT_MSG); +} +#endif + static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr, char *buf, unsigned int bug) { @@ -691,9 +724,8 @@ static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr case X86_BUG_L1TF: if (boot_cpu_has(X86_FEATURE_L1TF_PTEINV)) - return sprintf(buf, "Mitigation: Page Table Inversion\n"); + return l1tf_show_state(buf); break; - default: break; } diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index b56761d994ea..8aa245cafdd3 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -194,19 +194,13 @@ extern const ulong vmx_return; static DEFINE_STATIC_KEY_FALSE(vmx_l1d_should_flush); -/* These MUST be in sync with vmentry_l1d_param order. */ -enum vmx_l1d_flush_state { - VMENTER_L1D_FLUSH_NEVER, - VMENTER_L1D_FLUSH_COND, - VMENTER_L1D_FLUSH_ALWAYS, -}; - static enum vmx_l1d_flush_state __read_mostly vmentry_l1d_flush = VMENTER_L1D_FLUSH_COND; static const struct { const char *option; enum vmx_l1d_flush_state cmd; } vmentry_l1d_param[] = { + {"auto", VMENTER_L1D_FLUSH_AUTO}, {"never", VMENTER_L1D_FLUSH_NEVER}, {"cond", VMENTER_L1D_FLUSH_COND}, {"always", VMENTER_L1D_FLUSH_ALWAYS}, @@ -11690,8 +11684,12 @@ static int __init vmx_setup_l1d_flush(void) { struct page *page; + if (!boot_cpu_has_bug(X86_BUG_L1TF)) + return 0; + + l1tf_vmx_mitigation = vmentry_l1d_flush; + if (vmentry_l1d_flush == VMENTER_L1D_FLUSH_NEVER || - !boot_cpu_has_bug(X86_BUG_L1TF) || vmx_l1d_use_msr_save_list()) return 0; @@ -11706,12 +11704,14 @@ static int __init vmx_setup_l1d_flush(void) return 0; } -static void vmx_free_l1d_flush_pages(void) +static void vmx_cleanup_l1d_flush(void) { if (vmx_l1d_flush_pages) { free_pages((unsigned long)vmx_l1d_flush_pages, L1D_CACHE_ORDER); vmx_l1d_flush_pages = NULL; } + /* Restore state so sysfs ignores VMX */ + l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_AUTO; } static int __init vmx_init(void) @@ -11725,7 +11725,7 @@ static int __init vmx_init(void) r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx), __alignof__(struct vcpu_vmx), THIS_MODULE); if (r) { - vmx_free_l1d_flush_pages(); + vmx_cleanup_l1d_flush(); return r; } @@ -11746,7 +11746,7 @@ static void __exit vmx_exit(void) kvm_exit(); - vmx_free_l1d_flush_pages(); + vmx_cleanup_l1d_flush(); } module_init(vmx_init) From 31282cf43b9d4fd950d8879af081771c0ff04f5f Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Fri, 13 Jul 2018 16:23:17 +0200 Subject: [PATCH 578/970] x86/kvm: Drop L1TF MSR list approach commit 2f055947ae5e2741fb2dc5bba1033c417ccf4faa upstream The VMX module parameter to control the L1D flush should become writeable. The MSR list is set up at VM init per guest VCPU, but the run time switching is based on a static key which is global. Toggling the MSR list at run time might be feasible, but for now drop this optimization and use the regular MSR write to make run-time switching possible. The default mitigation is the conditional flush anyway, so for extra paranoid setups this will add some small overhead, but the extra code executed is in the noise compared to the flush itself. Aside of that the EPT disabled case is not handled correctly at the moment and the MSR list magic is in the way for fixing that as well. If it's really providing a significant advantage, then this needs to be revisited after the code is correct and the control is writable. Signed-off-by: Thomas Gleixner Tested-by: Jiri Kosina Reviewed-by: Greg Kroah-Hartman Reviewed-by: Josh Poimboeuf Link: https://lkml.kernel.org/r/20180713142322.516940445@linutronix.de Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx.c | 43 +++++++------------------------------------ 1 file changed, 7 insertions(+), 36 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 8aa245cafdd3..8155a4d545a4 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -5263,16 +5263,6 @@ static void ept_set_mmio_spte_mask(void) kvm_mmu_set_mmio_spte_mask((0x3ull << 62) | 0x6ull); } -static bool vmx_l1d_use_msr_save_list(void) -{ - if (!enable_ept || !boot_cpu_has_bug(X86_BUG_L1TF) || - static_cpu_has(X86_FEATURE_HYPERVISOR) || - !static_cpu_has(X86_FEATURE_FLUSH_L1D)) - return false; - - return vmentry_l1d_flush == VMENTER_L1D_FLUSH_ALWAYS; -} - #define VMX_XSS_EXIT_BITMAP 0 /* * Sets up the vmcs for emulated real mode. @@ -5622,12 +5612,6 @@ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked) vmcs_clear_bits(GUEST_INTERRUPTIBILITY_INFO, GUEST_INTR_STATE_NMI); } - /* - * If flushing the L1D cache on every VMENTER is enforced and the - * MSR is available, use the MSR save list. - */ - if (vmx_l1d_use_msr_save_list()) - add_atomic_switch_msr(vmx, MSR_IA32_FLUSH_CMD, L1D_FLUSH, 0, true); } static int vmx_nmi_allowed(struct kvm_vcpu *vcpu) @@ -8591,26 +8575,14 @@ static void vmx_l1d_flush(struct kvm_vcpu *vcpu) bool always; /* - * This code is only executed when: - * - the flush mode is 'cond' - * - the flush mode is 'always' and the flush MSR is not - * available + * This code is only executed when the the flush mode is 'cond' or + * 'always' * - * If the CPU has the flush MSR then clear the flush bit because - * 'always' mode is handled via the MSR save list. - * - * If the MSR is not avaibable then act depending on the mitigation - * mode: If 'flush always', keep the flush bit set, otherwise clear - * it. - * - * The flush bit gets set again either from vcpu_run() or from one - * of the unsafe VMEXIT handlers. + * If 'flush always', keep the flush bit set, otherwise clear + * it. The flush bit gets set again either from vcpu_run() or from + * one of the unsafe VMEXIT handlers. */ - if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) - always = false; - else - always = vmentry_l1d_flush == VMENTER_L1D_FLUSH_ALWAYS; - + always = vmentry_l1d_flush == VMENTER_L1D_FLUSH_ALWAYS; vcpu->arch.l1tf_flush_l1d = always; vcpu->stat.l1d_flush++; @@ -11689,8 +11661,7 @@ static int __init vmx_setup_l1d_flush(void) l1tf_vmx_mitigation = vmentry_l1d_flush; - if (vmentry_l1d_flush == VMENTER_L1D_FLUSH_NEVER || - vmx_l1d_use_msr_save_list()) + if (vmentry_l1d_flush == VMENTER_L1D_FLUSH_NEVER) return 0; if (!boot_cpu_has(X86_FEATURE_FLUSH_L1D)) { From 4186ae815556590798de371e0d6ed85fc3682534 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Fri, 13 Jul 2018 16:23:18 +0200 Subject: [PATCH 579/970] x86/l1tf: Handle EPT disabled state proper commit a7b9020b06ec6d7c3f3b0d4ef1a9eba12654f4f7 upstream If Extended Page Tables (EPT) are disabled or not supported, no L1D flushing is required. The setup function can just avoid setting up the L1D flush for the EPT=n case. Invoke it after the hardware setup has be done and enable_ept has the correct state and expose the EPT disabled state in the mitigation status as well. Signed-off-by: Thomas Gleixner Tested-by: Jiri Kosina Reviewed-by: Greg Kroah-Hartman Reviewed-by: Josh Poimboeuf Link: https://lkml.kernel.org/r/20180713142322.612160168@linutronix.de Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/vmx.h | 1 + arch/x86/kernel/cpu/bugs.c | 9 +++--- arch/x86/kvm/vmx.c | 58 ++++++++++++++++++++++---------------- 3 files changed, 39 insertions(+), 29 deletions(-) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index c5639c139732..7862cbf8cc70 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -504,6 +504,7 @@ enum vmx_l1d_flush_state { VMENTER_L1D_FLUSH_NEVER, VMENTER_L1D_FLUSH_COND, VMENTER_L1D_FLUSH_ALWAYS, + VMENTER_L1D_FLUSH_EPT_DISABLED, }; extern enum vmx_l1d_flush_state l1tf_vmx_mitigation; diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index 59eb6c5ce1ea..f1e2c94fdebf 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -675,10 +675,11 @@ static void __init l1tf_select_mitigation(void) #if IS_ENABLED(CONFIG_KVM_INTEL) static const char *l1tf_vmx_states[] = { - [VMENTER_L1D_FLUSH_AUTO] = "auto", - [VMENTER_L1D_FLUSH_NEVER] = "vulnerable", - [VMENTER_L1D_FLUSH_COND] = "conditional cache flushes", - [VMENTER_L1D_FLUSH_ALWAYS] = "cache flushes", + [VMENTER_L1D_FLUSH_AUTO] = "auto", + [VMENTER_L1D_FLUSH_NEVER] = "vulnerable", + [VMENTER_L1D_FLUSH_COND] = "conditional cache flushes", + [VMENTER_L1D_FLUSH_ALWAYS] = "cache flushes", + [VMENTER_L1D_FLUSH_EPT_DISABLED] = "EPT disabled", }; static ssize_t l1tf_show_state(char *buf) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 8155a4d545a4..1811aeef4f41 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -11659,6 +11659,11 @@ static int __init vmx_setup_l1d_flush(void) if (!boot_cpu_has_bug(X86_BUG_L1TF)) return 0; + if (!enable_ept) { + l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_EPT_DISABLED; + return 0; + } + l1tf_vmx_mitigation = vmentry_l1d_flush; if (vmentry_l1d_flush == VMENTER_L1D_FLUSH_NEVER) @@ -11685,30 +11690,8 @@ static void vmx_cleanup_l1d_flush(void) l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_AUTO; } -static int __init vmx_init(void) -{ - int r; - r = vmx_setup_l1d_flush(); - if (r) - return r; - - r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx), - __alignof__(struct vcpu_vmx), THIS_MODULE); - if (r) { - vmx_cleanup_l1d_flush(); - return r; - } - -#ifdef CONFIG_KEXEC_CORE - rcu_assign_pointer(crash_vmclear_loaded_vmcss, - crash_vmclear_local_loaded_vmcss); -#endif - - return 0; -} - -static void __exit vmx_exit(void) +static void vmx_exit(void) { #ifdef CONFIG_KEXEC_CORE RCU_INIT_POINTER(crash_vmclear_loaded_vmcss, NULL); @@ -11719,6 +11702,31 @@ static void __exit vmx_exit(void) vmx_cleanup_l1d_flush(); } - -module_init(vmx_init) module_exit(vmx_exit) + +static int __init vmx_init(void) +{ + int r; + + r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx), + __alignof__(struct vcpu_vmx), THIS_MODULE); + if (r) + return r; + + /* + * Must be called after kvm_init() so enable_ept is properly set up + */ + r = vmx_setup_l1d_flush(); + if (r) { + vmx_exit(); + return r; + } + +#ifdef CONFIG_KEXEC_CORE + rcu_assign_pointer(crash_vmclear_loaded_vmcss, + crash_vmclear_local_loaded_vmcss); +#endif + + return 0; +} +module_init(vmx_init) From 641a211704f630a3cc0c9ad1a7d922baf9432f11 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Fri, 13 Jul 2018 16:23:19 +0200 Subject: [PATCH 580/970] x86/kvm: Move l1tf setup function commit 7db92e165ac814487264632ab2624e832f20ae38 upstream In preparation of allowing run time control for L1D flushing, move the setup code to the module parameter handler. In case of pre module init parsing, just store the value and let vmx_init() do the actual setup after running kvm_init() so that enable_ept is having the correct state. During run-time invoke it directly from the parameter setter to prepare for run-time control. Signed-off-by: Thomas Gleixner Tested-by: Jiri Kosina Reviewed-by: Greg Kroah-Hartman Reviewed-by: Josh Poimboeuf Link: https://lkml.kernel.org/r/20180713142322.694063239@linutronix.de Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx.c | 123 ++++++++++++++++++++++++++++----------------- 1 file changed, 77 insertions(+), 46 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 1811aeef4f41..399b0dcb2479 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -194,7 +194,8 @@ extern const ulong vmx_return; static DEFINE_STATIC_KEY_FALSE(vmx_l1d_should_flush); -static enum vmx_l1d_flush_state __read_mostly vmentry_l1d_flush = VMENTER_L1D_FLUSH_COND; +/* Storage for pre module init parameter parsing */ +static enum vmx_l1d_flush_state __read_mostly vmentry_l1d_flush_param = VMENTER_L1D_FLUSH_AUTO; static const struct { const char *option; @@ -206,33 +207,85 @@ static const struct { {"always", VMENTER_L1D_FLUSH_ALWAYS}, }; -static int vmentry_l1d_flush_set(const char *s, const struct kernel_param *kp) +#define L1D_CACHE_ORDER 4 +static void *vmx_l1d_flush_pages; + +static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf) +{ + struct page *page; + + /* If set to 'auto' select 'cond' */ + if (l1tf == VMENTER_L1D_FLUSH_AUTO) + l1tf = VMENTER_L1D_FLUSH_COND; + + if (!enable_ept) { + l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_EPT_DISABLED; + return 0; + } + + if (l1tf != VMENTER_L1D_FLUSH_NEVER && !vmx_l1d_flush_pages && + !boot_cpu_has(X86_FEATURE_FLUSH_L1D)) { + page = alloc_pages(GFP_KERNEL, L1D_CACHE_ORDER); + if (!page) + return -ENOMEM; + vmx_l1d_flush_pages = page_address(page); + } + + l1tf_vmx_mitigation = l1tf; + + if (l1tf != VMENTER_L1D_FLUSH_NEVER) + static_branch_enable(&vmx_l1d_should_flush); + return 0; +} + +static int vmentry_l1d_flush_parse(const char *s) { unsigned int i; - if (!s) - return -EINVAL; - - for (i = 0; i < ARRAY_SIZE(vmentry_l1d_param); i++) { - if (!strcmp(s, vmentry_l1d_param[i].option)) { - vmentry_l1d_flush = vmentry_l1d_param[i].cmd; - return 0; + if (s) { + for (i = 0; i < ARRAY_SIZE(vmentry_l1d_param); i++) { + if (!strcmp(s, vmentry_l1d_param[i].option)) + return vmentry_l1d_param[i].cmd; } } - return -EINVAL; } +static int vmentry_l1d_flush_set(const char *s, const struct kernel_param *kp) +{ + int l1tf; + + if (!boot_cpu_has(X86_BUG_L1TF)) + return 0; + + l1tf = vmentry_l1d_flush_parse(s); + if (l1tf < 0) + return l1tf; + + /* + * Has vmx_init() run already? If not then this is the pre init + * parameter parsing. In that case just store the value and let + * vmx_init() do the proper setup after enable_ept has been + * established. + */ + if (l1tf_vmx_mitigation == VMENTER_L1D_FLUSH_AUTO) { + vmentry_l1d_flush_param = l1tf; + return 0; + } + + return vmx_setup_l1d_flush(l1tf); +} + static int vmentry_l1d_flush_get(char *s, const struct kernel_param *kp) { - return sprintf(s, "%s\n", vmentry_l1d_param[vmentry_l1d_flush].option); + return sprintf(s, "%s\n", vmentry_l1d_param[l1tf_vmx_mitigation].option); } static const struct kernel_param_ops vmentry_l1d_flush_ops = { .set = vmentry_l1d_flush_set, .get = vmentry_l1d_flush_get, }; -module_param_cb(vmentry_l1d_flush, &vmentry_l1d_flush_ops, &vmentry_l1d_flush, S_IRUGO); +module_param_cb(vmentry_l1d_flush, &vmentry_l1d_flush_ops, NULL, S_IRUGO); #define NR_AUTOLOAD_MSRS 8 @@ -8582,7 +8635,7 @@ static void vmx_l1d_flush(struct kvm_vcpu *vcpu) * it. The flush bit gets set again either from vcpu_run() or from * one of the unsafe VMEXIT handlers. */ - always = vmentry_l1d_flush == VMENTER_L1D_FLUSH_ALWAYS; + always = l1tf_vmx_mitigation == VMENTER_L1D_FLUSH_ALWAYS; vcpu->arch.l1tf_flush_l1d = always; vcpu->stat.l1d_flush++; @@ -11652,34 +11705,6 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = { .setup_mce = vmx_setup_mce, }; -static int __init vmx_setup_l1d_flush(void) -{ - struct page *page; - - if (!boot_cpu_has_bug(X86_BUG_L1TF)) - return 0; - - if (!enable_ept) { - l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_EPT_DISABLED; - return 0; - } - - l1tf_vmx_mitigation = vmentry_l1d_flush; - - if (vmentry_l1d_flush == VMENTER_L1D_FLUSH_NEVER) - return 0; - - if (!boot_cpu_has(X86_FEATURE_FLUSH_L1D)) { - page = alloc_pages(GFP_KERNEL, L1D_CACHE_ORDER); - if (!page) - return -ENOMEM; - vmx_l1d_flush_pages = page_address(page); - } - - static_branch_enable(&vmx_l1d_should_flush); - return 0; -} - static void vmx_cleanup_l1d_flush(void) { if (vmx_l1d_flush_pages) { @@ -11714,12 +11739,18 @@ static int __init vmx_init(void) return r; /* - * Must be called after kvm_init() so enable_ept is properly set up + * Must be called after kvm_init() so enable_ept is properly set + * up. Hand the parameter mitigation value in which was stored in + * the pre module init parser. If no parameter was given, it will + * contain 'auto' which will be turned into the default 'cond' + * mitigation mode. */ - r = vmx_setup_l1d_flush(); - if (r) { - vmx_exit(); - return r; + if (boot_cpu_has(X86_BUG_L1TF)) { + r = vmx_setup_l1d_flush(vmentry_l1d_flush_param); + if (r) { + vmx_exit(); + return r; + } } #ifdef CONFIG_KEXEC_CORE From dff0982c5719eaedff58c026be9871ea63af992c Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Fri, 13 Jul 2018 16:23:20 +0200 Subject: [PATCH 581/970] x86/kvm: Add static key for flush always commit 4c6523ec59fe895ea352a650218a6be0653910b1 upstream Avoid the conditional in the L1D flush control path. Signed-off-by: Thomas Gleixner Tested-by: Jiri Kosina Reviewed-by: Greg Kroah-Hartman Reviewed-by: Josh Poimboeuf Link: https://lkml.kernel.org/r/20180713142322.790914912@linutronix.de Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx.c | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 399b0dcb2479..332d92d2ee75 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -193,6 +193,7 @@ module_param(ple_window_max, int, S_IRUGO); extern const ulong vmx_return; static DEFINE_STATIC_KEY_FALSE(vmx_l1d_should_flush); +static DEFINE_STATIC_KEY_FALSE(vmx_l1d_flush_always); /* Storage for pre module init parameter parsing */ static enum vmx_l1d_flush_state __read_mostly vmentry_l1d_flush_param = VMENTER_L1D_FLUSH_AUTO; @@ -233,8 +234,12 @@ static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf) l1tf_vmx_mitigation = l1tf; - if (l1tf != VMENTER_L1D_FLUSH_NEVER) - static_branch_enable(&vmx_l1d_should_flush); + if (l1tf == VMENTER_L1D_FLUSH_NEVER) + return 0; + + static_branch_enable(&vmx_l1d_should_flush); + if (l1tf == VMENTER_L1D_FLUSH_ALWAYS) + static_branch_enable(&vmx_l1d_flush_always); return 0; } @@ -8625,7 +8630,6 @@ static void *vmx_l1d_flush_pages; static void vmx_l1d_flush(struct kvm_vcpu *vcpu) { int size = PAGE_SIZE << L1D_CACHE_ORDER; - bool always; /* * This code is only executed when the the flush mode is 'cond' or @@ -8635,8 +8639,10 @@ static void vmx_l1d_flush(struct kvm_vcpu *vcpu) * it. The flush bit gets set again either from vcpu_run() or from * one of the unsafe VMEXIT handlers. */ - always = l1tf_vmx_mitigation == VMENTER_L1D_FLUSH_ALWAYS; - vcpu->arch.l1tf_flush_l1d = always; + if (static_branch_unlikely(&vmx_l1d_flush_always)) + vcpu->arch.l1tf_flush_l1d = true; + else + vcpu->arch.l1tf_flush_l1d = false; vcpu->stat.l1d_flush++; From 6ccf633238db85cbadd3fa0830eab88fe949dd67 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Fri, 13 Jul 2018 16:23:21 +0200 Subject: [PATCH 582/970] x86/kvm: Serialize L1D flush parameter setter commit dd4bfa739a72508b75760b393d129ed7b431daab upstream Writes to the parameter files are not serialized at the sysfs core level, so local serialization is required. Signed-off-by: Thomas Gleixner Tested-by: Jiri Kosina Reviewed-by: Greg Kroah-Hartman Reviewed-by: Josh Poimboeuf Link: https://lkml.kernel.org/r/20180713142322.873642605@linutronix.de Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 332d92d2ee75..ad51ebb54153 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -194,6 +194,7 @@ extern const ulong vmx_return; static DEFINE_STATIC_KEY_FALSE(vmx_l1d_should_flush); static DEFINE_STATIC_KEY_FALSE(vmx_l1d_flush_always); +static DEFINE_MUTEX(vmx_l1d_flush_mutex); /* Storage for pre module init parameter parsing */ static enum vmx_l1d_flush_state __read_mostly vmentry_l1d_flush_param = VMENTER_L1D_FLUSH_AUTO; @@ -258,7 +259,7 @@ static int vmentry_l1d_flush_parse(const char *s) static int vmentry_l1d_flush_set(const char *s, const struct kernel_param *kp) { - int l1tf; + int l1tf, ret; if (!boot_cpu_has(X86_BUG_L1TF)) return 0; @@ -278,7 +279,10 @@ static int vmentry_l1d_flush_set(const char *s, const struct kernel_param *kp) return 0; } - return vmx_setup_l1d_flush(l1tf); + mutex_lock(&vmx_l1d_flush_mutex); + ret = vmx_setup_l1d_flush(l1tf); + mutex_unlock(&vmx_l1d_flush_mutex); + return ret; } static int vmentry_l1d_flush_get(char *s, const struct kernel_param *kp) From 4797c2f3791e58d21e82bf0948483ae9b639286b Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Fri, 13 Jul 2018 16:23:22 +0200 Subject: [PATCH 583/970] x86/kvm: Allow runtime control of L1D flush commit 895ae47f9918833c3a880fbccd41e0692b37e7d9 upstream All mitigation modes can be switched at run time with a static key now: - Use sysfs_streq() instead of strcmp() to handle the trailing new line from sysfs writes correctly. - Make the static key management handle multiple invocations properly. - Set the module parameter file to RW Signed-off-by: Thomas Gleixner Tested-by: Jiri Kosina Reviewed-by: Greg Kroah-Hartman Reviewed-by: Josh Poimboeuf Link: https://lkml.kernel.org/r/20180713142322.954525119@linutronix.de Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/bugs.c | 2 +- arch/x86/kvm/vmx.c | 13 ++++++++----- 2 files changed, 9 insertions(+), 6 deletions(-) diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index f1e2c94fdebf..23aa3de93920 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -638,7 +638,7 @@ void x86_spec_ctrl_setup_ap(void) #define pr_fmt(fmt) "L1TF: " fmt #if IS_ENABLED(CONFIG_KVM_INTEL) -enum vmx_l1d_flush_state l1tf_vmx_mitigation __ro_after_init = VMENTER_L1D_FLUSH_AUTO; +enum vmx_l1d_flush_state l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_AUTO; EXPORT_SYMBOL_GPL(l1tf_vmx_mitigation); #endif diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index ad51ebb54153..49ddeff727d7 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -235,12 +235,15 @@ static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf) l1tf_vmx_mitigation = l1tf; - if (l1tf == VMENTER_L1D_FLUSH_NEVER) - return 0; + if (l1tf != VMENTER_L1D_FLUSH_NEVER) + static_branch_enable(&vmx_l1d_should_flush); + else + static_branch_disable(&vmx_l1d_should_flush); - static_branch_enable(&vmx_l1d_should_flush); if (l1tf == VMENTER_L1D_FLUSH_ALWAYS) static_branch_enable(&vmx_l1d_flush_always); + else + static_branch_disable(&vmx_l1d_flush_always); return 0; } @@ -250,7 +253,7 @@ static int vmentry_l1d_flush_parse(const char *s) if (s) { for (i = 0; i < ARRAY_SIZE(vmentry_l1d_param); i++) { - if (!strcmp(s, vmentry_l1d_param[i].option)) + if (sysfs_streq(s, vmentry_l1d_param[i].option)) return vmentry_l1d_param[i].cmd; } } @@ -294,7 +297,7 @@ static const struct kernel_param_ops vmentry_l1d_flush_ops = { .set = vmentry_l1d_flush_set, .get = vmentry_l1d_flush_get, }; -module_param_cb(vmentry_l1d_flush, &vmentry_l1d_flush_ops, NULL, S_IRUGO); +module_param_cb(vmentry_l1d_flush, &vmentry_l1d_flush_ops, NULL, 0644); #define NR_AUTOLOAD_MSRS 8 From a69c5e0706dc6783e11830bccafe34c0b7f0a979 Mon Sep 17 00:00:00 2001 From: Jiri Kosina Date: Fri, 13 Jul 2018 16:23:23 +0200 Subject: [PATCH 584/970] cpu/hotplug: Expose SMT control init function commit 8e1b706b6e819bed215c0db16345568864660393 upstream The L1TF mitigation will gain a commend line parameter which allows to set a combination of hypervisor mitigation and SMT control. Expose cpu_smt_disable() so the command line parser can tweak SMT settings. [ tglx: Split out of larger patch and made it preserve an already existing force off state ] Signed-off-by: Jiri Kosina Signed-off-by: Thomas Gleixner Tested-by: Jiri Kosina Reviewed-by: Greg Kroah-Hartman Reviewed-by: Josh Poimboeuf Link: https://lkml.kernel.org/r/20180713142323.039715135@linutronix.de Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- include/linux/cpu.h | 2 ++ kernel/cpu.c | 16 +++++++++++++--- 2 files changed, 15 insertions(+), 3 deletions(-) diff --git a/include/linux/cpu.h b/include/linux/cpu.h index bb8fa64fbea3..e2a028e4fca9 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -266,8 +266,10 @@ enum cpuhp_smt_control { #if defined(CONFIG_SMP) && defined(CONFIG_HOTPLUG_SMT) extern enum cpuhp_smt_control cpu_smt_control; +extern void cpu_smt_disable(bool force); #else # define cpu_smt_control (CPU_SMT_ENABLED) +static inline void cpu_smt_disable(bool force) { } #endif #endif /* _LINUX_CPU_H_ */ diff --git a/kernel/cpu.c b/kernel/cpu.c index 15a9bd108ce5..e1de463a90f2 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -360,13 +360,23 @@ EXPORT_SYMBOL_GPL(cpu_hotplug_enable); enum cpuhp_smt_control cpu_smt_control __read_mostly = CPU_SMT_ENABLED; EXPORT_SYMBOL_GPL(cpu_smt_control); -static int __init smt_cmdline_disable(char *str) +void __init cpu_smt_disable(bool force) { - cpu_smt_control = CPU_SMT_DISABLED; - if (str && !strcmp(str, "force")) { + if (cpu_smt_control == CPU_SMT_FORCE_DISABLED || + cpu_smt_control == CPU_SMT_NOT_SUPPORTED) + return; + + if (force) { pr_info("SMT: Force disabled\n"); cpu_smt_control = CPU_SMT_FORCE_DISABLED; + } else { + cpu_smt_control = CPU_SMT_DISABLED; } +} + +static int __init smt_cmdline_disable(char *str) +{ + cpu_smt_disable(str && !strcmp(str, "force")); return 0; } early_param("nosmt", smt_cmdline_disable); From 929d3b2e9b130f238a8eb206bdc3f063ca68438f Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Fri, 13 Jul 2018 16:23:24 +0200 Subject: [PATCH 585/970] cpu/hotplug: Set CPU_SMT_NOT_SUPPORTED early commit fee0aede6f4739c87179eca76136f83210953b86 upstream The CPU_SMT_NOT_SUPPORTED state is set (if the processor does not support SMT) when the sysfs SMT control file is initialized. That was fine so far as this was only required to make the output of the control file correct and to prevent writes in that case. With the upcoming l1tf command line parameter, this needs to be set up before the L1TF mitigation selection and command line parsing happens. Signed-off-by: Thomas Gleixner Tested-by: Jiri Kosina Reviewed-by: Greg Kroah-Hartman Reviewed-by: Josh Poimboeuf Link: https://lkml.kernel.org/r/20180713142323.121795971@linutronix.de Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/bugs.c | 6 ++++++ include/linux/cpu.h | 2 ++ kernel/cpu.c | 13 ++++++++++--- 3 files changed, 18 insertions(+), 3 deletions(-) diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index 23aa3de93920..ddc2dfeb3d59 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -57,6 +57,12 @@ void __init check_bugs(void) { identify_boot_cpu(); + /* + * identify_boot_cpu() initialized SMT support information, let the + * core code know. + */ + cpu_smt_check_topology(); + if (!IS_ENABLED(CONFIG_SMP)) { pr_info("CPU: "); print_cpu_info(&boot_cpu_data); diff --git a/include/linux/cpu.h b/include/linux/cpu.h index e2a028e4fca9..d059c67e4658 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -267,9 +267,11 @@ enum cpuhp_smt_control { #if defined(CONFIG_SMP) && defined(CONFIG_HOTPLUG_SMT) extern enum cpuhp_smt_control cpu_smt_control; extern void cpu_smt_disable(bool force); +extern void cpu_smt_check_topology(void); #else # define cpu_smt_control (CPU_SMT_ENABLED) static inline void cpu_smt_disable(bool force) { } +static inline void cpu_smt_check_topology(void) { } #endif #endif /* _LINUX_CPU_H_ */ diff --git a/kernel/cpu.c b/kernel/cpu.c index e1de463a90f2..8a2ad1587280 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -374,6 +374,16 @@ void __init cpu_smt_disable(bool force) } } +/* + * The decision whether SMT is supported can only be done after the full + * CPU identification. Called from architecture code. + */ +void __init cpu_smt_check_topology(void) +{ + if (!topology_smt_supported()) + cpu_smt_control = CPU_SMT_NOT_SUPPORTED; +} + static int __init smt_cmdline_disable(char *str) { cpu_smt_disable(str && !strcmp(str, "force")); @@ -2053,9 +2063,6 @@ static const struct attribute_group cpuhp_smt_attr_group = { static int __init cpu_smt_state_init(void) { - if (!topology_smt_supported()) - cpu_smt_control = CPU_SMT_NOT_SUPPORTED; - return sysfs_create_group(&cpu_subsys.dev_root->kobj, &cpuhp_smt_attr_group); } From 2decbf5264ea6175c6fca28ba2b5c0c683facf27 Mon Sep 17 00:00:00 2001 From: Jiri Kosina Date: Fri, 13 Jul 2018 16:23:25 +0200 Subject: [PATCH 586/970] x86/bugs, kvm: Introduce boot-time control of L1TF mitigations commit d90a7a0ec83fb86622cd7dae23255d3c50a99ec8 upstream Introduce the 'l1tf=' kernel command line option to allow for boot-time switching of mitigation that is used on processors affected by L1TF. The possible values are: full Provides all available mitigations for the L1TF vulnerability. Disables SMT and enables all mitigations in the hypervisors. SMT control via /sys/devices/system/cpu/smt/control is still possible after boot. Hypervisors will issue a warning when the first VM is started in a potentially insecure configuration, i.e. SMT enabled or L1D flush disabled. full,force Same as 'full', but disables SMT control. Implies the 'nosmt=force' command line option. sysfs control of SMT and the hypervisor flush control is disabled. flush Leaves SMT enabled and enables the conditional hypervisor mitigation. Hypervisors will issue a warning when the first VM is started in a potentially insecure configuration, i.e. SMT enabled or L1D flush disabled. flush,nosmt Disables SMT and enables the conditional hypervisor mitigation. SMT control via /sys/devices/system/cpu/smt/control is still possible after boot. If SMT is reenabled or flushing disabled at runtime hypervisors will issue a warning. flush,nowarn Same as 'flush', but hypervisors will not warn when a VM is started in a potentially insecure configuration. off Disables hypervisor mitigations and doesn't emit any warnings. Default is 'flush'. Let KVM adhere to these semantics, which means: - 'lt1f=full,force' : Performe L1D flushes. No runtime control possible. - 'l1tf=full' - 'l1tf-flush' - 'l1tf=flush,nosmt' : Perform L1D flushes and warn on VM start if SMT has been runtime enabled or L1D flushing has been run-time enabled - 'l1tf=flush,nowarn' : Perform L1D flushes and no warnings are emitted. - 'l1tf=off' : L1D flushes are not performed and no warnings are emitted. KVM can always override the L1D flushing behavior using its 'vmentry_l1d_flush' module parameter except when lt1f=full,force is set. This makes KVM's private 'nosmt' option redundant, and as it is a bit non-systematic anyway (this is something to control globally, not on hypervisor level), remove that option. Add the missing Documentation entry for the l1tf vulnerability sysfs file while at it. Signed-off-by: Jiri Kosina Signed-off-by: Thomas Gleixner Tested-by: Jiri Kosina Reviewed-by: Greg Kroah-Hartman Reviewed-by: Josh Poimboeuf Link: https://lkml.kernel.org/r/20180713142323.202758176@linutronix.de Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- .../ABI/testing/sysfs-devices-system-cpu | 4 ++ Documentation/kernel-parameters.txt | 68 +++++++++++++++++-- arch/x86/include/asm/processor.h | 12 ++++ arch/x86/kernel/cpu/bugs.c | 44 ++++++++++++ arch/x86/kvm/vmx.c | 56 +++++++++++---- 5 files changed, 165 insertions(+), 19 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu index 7d8ba8da3c04..069e8d52c991 100644 --- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -356,6 +356,7 @@ What: /sys/devices/system/cpu/vulnerabilities /sys/devices/system/cpu/vulnerabilities/spectre_v1 /sys/devices/system/cpu/vulnerabilities/spectre_v2 /sys/devices/system/cpu/vulnerabilities/spec_store_bypass + /sys/devices/system/cpu/vulnerabilities/l1tf Date: January 2018 Contact: Linux kernel mailing list Description: Information about CPU vulnerabilities @@ -368,6 +369,9 @@ Description: Information about CPU vulnerabilities "Vulnerable" CPU is affected and no mitigation in effect "Mitigation: $M" CPU is affected and mitigation $M is in effect + Details about the l1tf file can be found in + Documentation/admin-guide/l1tf.rst + What: /sys/devices/system/cpu/smt /sys/devices/system/cpu/smt/active /sys/devices/system/cpu/smt/control diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index d76cb9c8fbb0..a36a695318c6 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1989,12 +1989,6 @@ bytes respectively. Such letter suffixes can also be entirely omitted. for all guests. Default is 1 (enabled) if in 64-bit or 32-bit PAE mode. - kvm-intel.nosmt=[KVM,Intel] If the L1TF CPU bug is present (CVE-2018-3620) - and the system has SMT (aka Hyper-Threading) enabled then - don't allow guests to be created. - - Default is 0 (allow guests to be created). - kvm-intel.ept= [KVM,Intel] Disable extended page tables (virtualized MMU) support on capable Intel chips. Default is 1 (enabled) @@ -2032,6 +2026,68 @@ bytes respectively. Such letter suffixes can also be entirely omitted. feature (tagged TLBs) on capable Intel chips. Default is 1 (enabled) + l1tf= [X86] Control mitigation of the L1TF vulnerability on + affected CPUs + + The kernel PTE inversion protection is unconditionally + enabled and cannot be disabled. + + full + Provides all available mitigations for the + L1TF vulnerability. Disables SMT and + enables all mitigations in the + hypervisors, i.e. unconditional L1D flush. + + SMT control and L1D flush control via the + sysfs interface is still possible after + boot. Hypervisors will issue a warning + when the first VM is started in a + potentially insecure configuration, + i.e. SMT enabled or L1D flush disabled. + + full,force + Same as 'full', but disables SMT and L1D + flush runtime control. Implies the + 'nosmt=force' command line option. + (i.e. sysfs control of SMT is disabled.) + + flush + Leaves SMT enabled and enables the default + hypervisor mitigation, i.e. conditional + L1D flush. + + SMT control and L1D flush control via the + sysfs interface is still possible after + boot. Hypervisors will issue a warning + when the first VM is started in a + potentially insecure configuration, + i.e. SMT enabled or L1D flush disabled. + + flush,nosmt + + Disables SMT and enables the default + hypervisor mitigation. + + SMT control and L1D flush control via the + sysfs interface is still possible after + boot. Hypervisors will issue a warning + when the first VM is started in a + potentially insecure configuration, + i.e. SMT enabled or L1D flush disabled. + + flush,nowarn + Same as 'flush', but hypervisors will not + warn when a VM is started in a potentially + insecure configuration. + + off + Disables hypervisor mitigations and doesn't + emit any warnings. + + Default is 'flush'. + + For details see: Documentation/admin-guide/l1tf.rst + l2cr= [PPC] l3cr= [PPC] diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 1b05055bbdc6..d5525a7e119e 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -860,4 +860,16 @@ bool xen_set_default_idle(void); void stop_this_cpu(void *dummy); void df_debug(struct pt_regs *regs, long error_code); + +enum l1tf_mitigations { + L1TF_MITIGATION_OFF, + L1TF_MITIGATION_FLUSH_NOWARN, + L1TF_MITIGATION_FLUSH, + L1TF_MITIGATION_FLUSH_NOSMT, + L1TF_MITIGATION_FULL, + L1TF_MITIGATION_FULL_FORCE +}; + +extern enum l1tf_mitigations l1tf_mitigation; + #endif /* _ASM_X86_PROCESSOR_H */ diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index ddc2dfeb3d59..c366fdca7d4e 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -643,7 +643,11 @@ void x86_spec_ctrl_setup_ap(void) #undef pr_fmt #define pr_fmt(fmt) "L1TF: " fmt +/* Default mitigation for L1TF-affected CPUs */ +enum l1tf_mitigations l1tf_mitigation __ro_after_init = L1TF_MITIGATION_FLUSH; #if IS_ENABLED(CONFIG_KVM_INTEL) +EXPORT_SYMBOL_GPL(l1tf_mitigation); + enum vmx_l1d_flush_state l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_AUTO; EXPORT_SYMBOL_GPL(l1tf_vmx_mitigation); #endif @@ -655,6 +659,20 @@ static void __init l1tf_select_mitigation(void) if (!boot_cpu_has_bug(X86_BUG_L1TF)) return; + switch (l1tf_mitigation) { + case L1TF_MITIGATION_OFF: + case L1TF_MITIGATION_FLUSH_NOWARN: + case L1TF_MITIGATION_FLUSH: + break; + case L1TF_MITIGATION_FLUSH_NOSMT: + case L1TF_MITIGATION_FULL: + cpu_smt_disable(false); + break; + case L1TF_MITIGATION_FULL_FORCE: + cpu_smt_disable(true); + break; + } + #if CONFIG_PGTABLE_LEVELS == 2 pr_warn("Kernel not compiled for PAE. No mitigation for L1TF\n"); return; @@ -673,6 +691,32 @@ static void __init l1tf_select_mitigation(void) setup_force_cpu_cap(X86_FEATURE_L1TF_PTEINV); } + +static int __init l1tf_cmdline(char *str) +{ + if (!boot_cpu_has_bug(X86_BUG_L1TF)) + return 0; + + if (!str) + return -EINVAL; + + if (!strcmp(str, "off")) + l1tf_mitigation = L1TF_MITIGATION_OFF; + else if (!strcmp(str, "flush,nowarn")) + l1tf_mitigation = L1TF_MITIGATION_FLUSH_NOWARN; + else if (!strcmp(str, "flush")) + l1tf_mitigation = L1TF_MITIGATION_FLUSH; + else if (!strcmp(str, "flush,nosmt")) + l1tf_mitigation = L1TF_MITIGATION_FLUSH_NOSMT; + else if (!strcmp(str, "full")) + l1tf_mitigation = L1TF_MITIGATION_FULL; + else if (!strcmp(str, "full,force")) + l1tf_mitigation = L1TF_MITIGATION_FULL_FORCE; + + return 0; +} +early_param("l1tf", l1tf_cmdline); + #undef pr_fmt #ifdef CONFIG_SYSFS diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 49ddeff727d7..03aea499fd5b 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -69,9 +69,6 @@ static const struct x86_cpu_id vmx_cpu_id[] = { }; MODULE_DEVICE_TABLE(x86cpu, vmx_cpu_id); -static bool __read_mostly nosmt; -module_param(nosmt, bool, S_IRUGO); - static bool __read_mostly enable_vpid = 1; module_param_named(vpid, enable_vpid, bool, 0444); @@ -216,15 +213,31 @@ static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf) { struct page *page; - /* If set to 'auto' select 'cond' */ - if (l1tf == VMENTER_L1D_FLUSH_AUTO) - l1tf = VMENTER_L1D_FLUSH_COND; - if (!enable_ept) { l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_EPT_DISABLED; return 0; } + /* If set to auto use the default l1tf mitigation method */ + if (l1tf == VMENTER_L1D_FLUSH_AUTO) { + switch (l1tf_mitigation) { + case L1TF_MITIGATION_OFF: + l1tf = VMENTER_L1D_FLUSH_NEVER; + break; + case L1TF_MITIGATION_FLUSH_NOWARN: + case L1TF_MITIGATION_FLUSH: + case L1TF_MITIGATION_FLUSH_NOSMT: + l1tf = VMENTER_L1D_FLUSH_COND; + break; + case L1TF_MITIGATION_FULL: + case L1TF_MITIGATION_FULL_FORCE: + l1tf = VMENTER_L1D_FLUSH_ALWAYS; + break; + } + } else if (l1tf_mitigation == L1TF_MITIGATION_FULL_FORCE) { + l1tf = VMENTER_L1D_FLUSH_ALWAYS; + } + if (l1tf != VMENTER_L1D_FLUSH_NEVER && !vmx_l1d_flush_pages && !boot_cpu_has(X86_FEATURE_FLUSH_L1D)) { page = alloc_pages(GFP_KERNEL, L1D_CACHE_ORDER); @@ -9500,16 +9513,33 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id) return ERR_PTR(err); } -#define L1TF_MSG "SMT enabled with L1TF CPU bug present. Refer to CVE-2018-3620 for details.\n" +#define L1TF_MSG_SMT "L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/l1tf.html for details.\n" +#define L1TF_MSG_L1D "L1TF CPU bug present and virtualization mitigation disabled, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/l1tf.html for details.\n" static int vmx_vm_init(struct kvm *kvm) { - if (boot_cpu_has(X86_BUG_L1TF) && cpu_smt_control == CPU_SMT_ENABLED) { - if (nosmt) { - pr_err(L1TF_MSG); - return -EOPNOTSUPP; + if (boot_cpu_has(X86_BUG_L1TF) && enable_ept) { + switch (l1tf_mitigation) { + case L1TF_MITIGATION_OFF: + case L1TF_MITIGATION_FLUSH_NOWARN: + /* 'I explicitly don't care' is set */ + break; + case L1TF_MITIGATION_FLUSH: + case L1TF_MITIGATION_FLUSH_NOSMT: + case L1TF_MITIGATION_FULL: + /* + * Warn upon starting the first VM in a potentially + * insecure environment. + */ + if (cpu_smt_control == CPU_SMT_ENABLED) + pr_warn_once(L1TF_MSG_SMT); + if (l1tf_vmx_mitigation == VMENTER_L1D_FLUSH_NEVER) + pr_warn_once(L1TF_MSG_L1D); + break; + case L1TF_MITIGATION_FULL_FORCE: + /* Flush is enforced */ + break; } - pr_warn(L1TF_MSG); } return 0; } From 93aed2469df1fdef8ed97d6cbb6dd042181fe46e Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Fri, 13 Jul 2018 16:23:26 +0200 Subject: [PATCH 587/970] Documentation: Add section about CPU vulnerabilities commit 3ec8ce5d866ec6a08a9cfab82b62acf4a830b35f upstream Add documentation for the L1TF vulnerability and the mitigation mechanisms: - Explain the problem and risks - Document the mitigation mechanisms - Document the command line controls - Document the sysfs files Signed-off-by: Thomas Gleixner Reviewed-by: Greg Kroah-Hartman Reviewed-by: Josh Poimboeuf Acked-by: Linus Torvalds Link: https://lkml.kernel.org/r/20180713142323.287429944@linutronix.de Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- Documentation/index.rst | 1 + Documentation/l1tf.rst | 591 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 592 insertions(+) create mode 100644 Documentation/l1tf.rst diff --git a/Documentation/index.rst b/Documentation/index.rst index c53d089455a4..213399aac757 100644 --- a/Documentation/index.rst +++ b/Documentation/index.rst @@ -12,6 +12,7 @@ Contents: :maxdepth: 2 kernel-documentation + l1tf development-process/index dev-tools/tools driver-api/index diff --git a/Documentation/l1tf.rst b/Documentation/l1tf.rst new file mode 100644 index 000000000000..f2bb28cff439 --- /dev/null +++ b/Documentation/l1tf.rst @@ -0,0 +1,591 @@ +L1TF - L1 Terminal Fault +======================== + +L1 Terminal Fault is a hardware vulnerability which allows unprivileged +speculative access to data which is available in the Level 1 Data Cache +when the page table entry controlling the virtual address, which is used +for the access, has the Present bit cleared or other reserved bits set. + +Affected processors +------------------- + +This vulnerability affects a wide range of Intel processors. The +vulnerability is not present on: + + - Processors from AMD, Centaur and other non Intel vendors + + - Older processor models, where the CPU family is < 6 + + - A range of Intel ATOM processors (Cedarview, Cloverview, Lincroft, + Penwell, Pineview, Slivermont, Airmont, Merrifield) + + - The Intel Core Duo Yonah variants (2006 - 2008) + + - The Intel XEON PHI family + + - Intel processors which have the ARCH_CAP_RDCL_NO bit set in the + IA32_ARCH_CAPABILITIES MSR. If the bit is set the CPU is not affected + by the Meltdown vulnerability either. These CPUs should become + available by end of 2018. + +Whether a processor is affected or not can be read out from the L1TF +vulnerability file in sysfs. See :ref:`l1tf_sys_info`. + +Related CVEs +------------ + +The following CVE entries are related to the L1TF vulnerability: + + ============= ================= ============================== + CVE-2018-3615 L1 Terminal Fault SGX related aspects + CVE-2018-3620 L1 Terminal Fault OS, SMM related aspects + CVE-2018-3646 L1 Terminal Fault Virtualization related aspects + ============= ================= ============================== + +Problem +------- + +If an instruction accesses a virtual address for which the relevant page +table entry (PTE) has the Present bit cleared or other reserved bits set, +then speculative execution ignores the invalid PTE and loads the referenced +data if it is present in the Level 1 Data Cache, as if the page referenced +by the address bits in the PTE was still present and accessible. + +While this is a purely speculative mechanism and the instruction will raise +a page fault when it is retired eventually, the pure act of loading the +data and making it available to other speculative instructions opens up the +opportunity for side channel attacks to unprivileged malicious code, +similar to the Meltdown attack. + +While Meltdown breaks the user space to kernel space protection, L1TF +allows to attack any physical memory address in the system and the attack +works across all protection domains. It allows an attack of SGX and also +works from inside virtual machines because the speculation bypasses the +extended page table (EPT) protection mechanism. + + +Attack scenarios +---------------- + +1. Malicious user space +^^^^^^^^^^^^^^^^^^^^^^^ + + Operating Systems store arbitrary information in the address bits of a + PTE which is marked non present. This allows a malicious user space + application to attack the physical memory to which these PTEs resolve. + In some cases user-space can maliciously influence the information + encoded in the address bits of the PTE, thus making attacks more + deterministic and more practical. + + The Linux kernel contains a mitigation for this attack vector, PTE + inversion, which is permanently enabled and has no performance + impact. The kernel ensures that the address bits of PTEs, which are not + marked present, never point to cacheable physical memory space. + + A system with an up to date kernel is protected against attacks from + malicious user space applications. + +2. Malicious guest in a virtual machine +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + The fact that L1TF breaks all domain protections allows malicious guest + OSes, which can control the PTEs directly, and malicious guest user + space applications, which run on an unprotected guest kernel lacking the + PTE inversion mitigation for L1TF, to attack physical host memory. + + A special aspect of L1TF in the context of virtualization is symmetric + multi threading (SMT). The Intel implementation of SMT is called + HyperThreading. The fact that Hyperthreads on the affected processors + share the L1 Data Cache (L1D) is important for this. As the flaw allows + only to attack data which is present in L1D, a malicious guest running + on one Hyperthread can attack the data which is brought into the L1D by + the context which runs on the sibling Hyperthread of the same physical + core. This context can be host OS, host user space or a different guest. + + If the processor does not support Extended Page Tables, the attack is + only possible, when the hypervisor does not sanitize the content of the + effective (shadow) page tables. + + While solutions exist to mitigate these attack vectors fully, these + mitigations are not enabled by default in the Linux kernel because they + can affect performance significantly. The kernel provides several + mechanisms which can be utilized to address the problem depending on the + deployment scenario. The mitigations, their protection scope and impact + are described in the next sections. + + The default mitigations and the rationale for chosing them are explained + at the end of this document. See :ref:`default_mitigations`. + +.. _l1tf_sys_info: + +L1TF system information +----------------------- + +The Linux kernel provides a sysfs interface to enumerate the current L1TF +status of the system: whether the system is vulnerable, and which +mitigations are active. The relevant sysfs file is: + +/sys/devices/system/cpu/vulnerabilities/l1tf + +The possible values in this file are: + + =========================== =============================== + 'Not affected' The processor is not vulnerable + 'Mitigation: PTE Inversion' The host protection is active + =========================== =============================== + +If KVM/VMX is enabled and the processor is vulnerable then the following +information is appended to the 'Mitigation: PTE Inversion' part: + + - SMT status: + + ===================== ================ + 'VMX: SMT vulnerable' SMT is enabled + 'VMX: SMT disabled' SMT is disabled + ===================== ================ + + - L1D Flush mode: + + ================================ ==================================== + 'L1D vulnerable' L1D flushing is disabled + + 'L1D conditional cache flushes' L1D flush is conditionally enabled + + 'L1D cache flushes' L1D flush is unconditionally enabled + ================================ ==================================== + +The resulting grade of protection is discussed in the following sections. + + +Host mitigation mechanism +------------------------- + +The kernel is unconditionally protected against L1TF attacks from malicious +user space running on the host. + + +Guest mitigation mechanisms +--------------------------- + +.. _l1d_flush: + +1. L1D flush on VMENTER +^^^^^^^^^^^^^^^^^^^^^^^ + + To make sure that a guest cannot attack data which is present in the L1D + the hypervisor flushes the L1D before entering the guest. + + Flushing the L1D evicts not only the data which should not be accessed + by a potentially malicious guest, it also flushes the guest + data. Flushing the L1D has a performance impact as the processor has to + bring the flushed guest data back into the L1D. Depending on the + frequency of VMEXIT/VMENTER and the type of computations in the guest + performance degradation in the range of 1% to 50% has been observed. For + scenarios where guest VMEXIT/VMENTER are rare the performance impact is + minimal. Virtio and mechanisms like posted interrupts are designed to + confine the VMEXITs to a bare minimum, but specific configurations and + application scenarios might still suffer from a high VMEXIT rate. + + The kernel provides two L1D flush modes: + - conditional ('cond') + - unconditional ('always') + + The conditional mode avoids L1D flushing after VMEXITs which execute + only audited code pathes before the corresponding VMENTER. These code + pathes have beed verified that they cannot expose secrets or other + interesting data to an attacker, but they can leak information about the + address space layout of the hypervisor. + + Unconditional mode flushes L1D on all VMENTER invocations and provides + maximum protection. It has a higher overhead than the conditional + mode. The overhead cannot be quantified correctly as it depends on the + work load scenario and the resulting number of VMEXITs. + + The general recommendation is to enable L1D flush on VMENTER. The kernel + defaults to conditional mode on affected processors. + + **Note**, that L1D flush does not prevent the SMT problem because the + sibling thread will also bring back its data into the L1D which makes it + attackable again. + + L1D flush can be controlled by the administrator via the kernel command + line and sysfs control files. See :ref:`mitigation_control_command_line` + and :ref:`mitigation_control_kvm`. + +.. _guest_confinement: + +2. Guest VCPU confinement to dedicated physical cores +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + To address the SMT problem, it is possible to make a guest or a group of + guests affine to one or more physical cores. The proper mechanism for + that is to utilize exclusive cpusets to ensure that no other guest or + host tasks can run on these cores. + + If only a single guest or related guests run on sibling SMT threads on + the same physical core then they can only attack their own memory and + restricted parts of the host memory. + + Host memory is attackable, when one of the sibling SMT threads runs in + host OS (hypervisor) context and the other in guest context. The amount + of valuable information from the host OS context depends on the context + which the host OS executes, i.e. interrupts, soft interrupts and kernel + threads. The amount of valuable data from these contexts cannot be + declared as non-interesting for an attacker without deep inspection of + the code. + + **Note**, that assigning guests to a fixed set of physical cores affects + the ability of the scheduler to do load balancing and might have + negative effects on CPU utilization depending on the hosting + scenario. Disabling SMT might be a viable alternative for particular + scenarios. + + For further information about confining guests to a single or to a group + of cores consult the cpusets documentation: + + https://www.kernel.org/doc/Documentation/cgroup-v1/cpusets.txt + +.. _interrupt_isolation: + +3. Interrupt affinity +^^^^^^^^^^^^^^^^^^^^^ + + Interrupts can be made affine to logical CPUs. This is not universally + true because there are types of interrupts which are truly per CPU + interrupts, e.g. the local timer interrupt. Aside of that multi queue + devices affine their interrupts to single CPUs or groups of CPUs per + queue without allowing the administrator to control the affinities. + + Moving the interrupts, which can be affinity controlled, away from CPUs + which run untrusted guests, reduces the attack vector space. + + Whether the interrupts with are affine to CPUs, which run untrusted + guests, provide interesting data for an attacker depends on the system + configuration and the scenarios which run on the system. While for some + of the interrupts it can be assumed that they wont expose interesting + information beyond exposing hints about the host OS memory layout, there + is no way to make general assumptions. + + Interrupt affinity can be controlled by the administrator via the + /proc/irq/$NR/smp_affinity[_list] files. Limited documentation is + available at: + + https://www.kernel.org/doc/Documentation/IRQ-affinity.txt + +.. _smt_control: + +4. SMT control +^^^^^^^^^^^^^^ + + To prevent the SMT issues of L1TF it might be necessary to disable SMT + completely. Disabling SMT can have a significant performance impact, but + the impact depends on the hosting scenario and the type of workloads. + The impact of disabling SMT needs also to be weighted against the impact + of other mitigation solutions like confining guests to dedicated cores. + + The kernel provides a sysfs interface to retrieve the status of SMT and + to control it. It also provides a kernel command line interface to + control SMT. + + The kernel command line interface consists of the following options: + + =========== ========================================================== + nosmt Affects the bring up of the secondary CPUs during boot. The + kernel tries to bring all present CPUs online during the + boot process. "nosmt" makes sure that from each physical + core only one - the so called primary (hyper) thread is + activated. Due to a design flaw of Intel processors related + to Machine Check Exceptions the non primary siblings have + to be brought up at least partially and are then shut down + again. "nosmt" can be undone via the sysfs interface. + + nosmt=force Has the same effect as "nosmt' but it does not allow to + undo the SMT disable via the sysfs interface. + =========== ========================================================== + + The sysfs interface provides two files: + + - /sys/devices/system/cpu/smt/control + - /sys/devices/system/cpu/smt/active + + /sys/devices/system/cpu/smt/control: + + This file allows to read out the SMT control state and provides the + ability to disable or (re)enable SMT. The possible states are: + + ============== =================================================== + on SMT is supported by the CPU and enabled. All + logical CPUs can be onlined and offlined without + restrictions. + + off SMT is supported by the CPU and disabled. Only + the so called primary SMT threads can be onlined + and offlined without restrictions. An attempt to + online a non-primary sibling is rejected + + forceoff Same as 'off' but the state cannot be controlled. + Attempts to write to the control file are rejected. + + notsupported The processor does not support SMT. It's therefore + not affected by the SMT implications of L1TF. + Attempts to write to the control file are rejected. + ============== =================================================== + + The possible states which can be written into this file to control SMT + state are: + + - on + - off + - forceoff + + /sys/devices/system/cpu/smt/active: + + This file reports whether SMT is enabled and active, i.e. if on any + physical core two or more sibling threads are online. + + SMT control is also possible at boot time via the l1tf kernel command + line parameter in combination with L1D flush control. See + :ref:`mitigation_control_command_line`. + +5. Disabling EPT +^^^^^^^^^^^^^^^^ + + Disabling EPT for virtual machines provides full mitigation for L1TF even + with SMT enabled, because the effective page tables for guests are + managed and sanitized by the hypervisor. Though disabling EPT has a + significant performance impact especially when the Meltdown mitigation + KPTI is enabled. + + EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter. + +There is ongoing research and development for new mitigation mechanisms to +address the performance impact of disabling SMT or EPT. + +.. _mitigation_control_command_line: + +Mitigation control on the kernel command line +--------------------------------------------- + +The kernel command line allows to control the L1TF mitigations at boot +time with the option "l1tf=". The valid arguments for this option are: + + ============ ============================================================= + full Provides all available mitigations for the L1TF + vulnerability. Disables SMT and enables all mitigations in + the hypervisors, i.e. unconditional L1D flushing + + SMT control and L1D flush control via the sysfs interface + is still possible after boot. Hypervisors will issue a + warning when the first VM is started in a potentially + insecure configuration, i.e. SMT enabled or L1D flush + disabled. + + full,force Same as 'full', but disables SMT and L1D flush runtime + control. Implies the 'nosmt=force' command line option. + (i.e. sysfs control of SMT is disabled.) + + flush Leaves SMT enabled and enables the default hypervisor + mitigation, i.e. conditional L1D flushing + + SMT control and L1D flush control via the sysfs interface + is still possible after boot. Hypervisors will issue a + warning when the first VM is started in a potentially + insecure configuration, i.e. SMT enabled or L1D flush + disabled. + + flush,nosmt Disables SMT and enables the default hypervisor mitigation, + i.e. conditional L1D flushing. + + SMT control and L1D flush control via the sysfs interface + is still possible after boot. Hypervisors will issue a + warning when the first VM is started in a potentially + insecure configuration, i.e. SMT enabled or L1D flush + disabled. + + flush,nowarn Same as 'flush', but hypervisors will not warn when a VM is + started in a potentially insecure configuration. + + off Disables hypervisor mitigations and doesn't emit any + warnings. + ============ ============================================================= + +The default is 'flush'. For details about L1D flushing see :ref:`l1d_flush`. + + +.. _mitigation_control_kvm: + +Mitigation control for KVM - module parameter +------------------------------------------------------------- + +The KVM hypervisor mitigation mechanism, flushing the L1D cache when +entering a guest, can be controlled with a module parameter. + +The option/parameter is "kvm-intel.vmentry_l1d_flush=". It takes the +following arguments: + + ============ ============================================================== + always L1D cache flush on every VMENTER. + + cond Flush L1D on VMENTER only when the code between VMEXIT and + VMENTER can leak host memory which is considered + interesting for an attacker. This still can leak host memory + which allows e.g. to determine the hosts address space layout. + + never Disables the mitigation + ============ ============================================================== + +The parameter can be provided on the kernel command line, as a module +parameter when loading the modules and at runtime modified via the sysfs +file: + +/sys/module/kvm_intel/parameters/vmentry_l1d_flush + +The default is 'cond'. If 'l1tf=full,force' is given on the kernel command +line, then 'always' is enforced and the kvm-intel.vmentry_l1d_flush +module parameter is ignored and writes to the sysfs file are rejected. + + +Mitigation selection guide +-------------------------- + +1. No virtualization in use +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + The system is protected by the kernel unconditionally and no further + action is required. + +2. Virtualization with trusted guests +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + If the guest comes from a trusted source and the guest OS kernel is + guaranteed to have the L1TF mitigations in place the system is fully + protected against L1TF and no further action is required. + + To avoid the overhead of the default L1D flushing on VMENTER the + administrator can disable the flushing via the kernel command line and + sysfs control files. See :ref:`mitigation_control_command_line` and + :ref:`mitigation_control_kvm`. + + +3. Virtualization with untrusted guests +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +3.1. SMT not supported or disabled +"""""""""""""""""""""""""""""""""" + + If SMT is not supported by the processor or disabled in the BIOS or by + the kernel, it's only required to enforce L1D flushing on VMENTER. + + Conditional L1D flushing is the default behaviour and can be tuned. See + :ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`. + +3.2. EPT not supported or disabled +"""""""""""""""""""""""""""""""""" + + If EPT is not supported by the processor or disabled in the hypervisor, + the system is fully protected. SMT can stay enabled and L1D flushing on + VMENTER is not required. + + EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter. + +3.3. SMT and EPT supported and active +""""""""""""""""""""""""""""""""""""" + + If SMT and EPT are supported and active then various degrees of + mitigations can be employed: + + - L1D flushing on VMENTER: + + L1D flushing on VMENTER is the minimal protection requirement, but it + is only potent in combination with other mitigation methods. + + Conditional L1D flushing is the default behaviour and can be tuned. See + :ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`. + + - Guest confinement: + + Confinement of guests to a single or a group of physical cores which + are not running any other processes, can reduce the attack surface + significantly, but interrupts, soft interrupts and kernel threads can + still expose valuable data to a potential attacker. See + :ref:`guest_confinement`. + + - Interrupt isolation: + + Isolating the guest CPUs from interrupts can reduce the attack surface + further, but still allows a malicious guest to explore a limited amount + of host physical memory. This can at least be used to gain knowledge + about the host address space layout. The interrupts which have a fixed + affinity to the CPUs which run the untrusted guests can depending on + the scenario still trigger soft interrupts and schedule kernel threads + which might expose valuable information. See + :ref:`interrupt_isolation`. + +The above three mitigation methods combined can provide protection to a +certain degree, but the risk of the remaining attack surface has to be +carefully analyzed. For full protection the following methods are +available: + + - Disabling SMT: + + Disabling SMT and enforcing the L1D flushing provides the maximum + amount of protection. This mitigation is not depending on any of the + above mitigation methods. + + SMT control and L1D flushing can be tuned by the command line + parameters 'nosmt', 'l1tf', 'kvm-intel.vmentry_l1d_flush' and at run + time with the matching sysfs control files. See :ref:`smt_control`, + :ref:`mitigation_control_command_line` and + :ref:`mitigation_control_kvm`. + + - Disabling EPT: + + Disabling EPT provides the maximum amount of protection as well. It is + not depending on any of the above mitigation methods. SMT can stay + enabled and L1D flushing is not required, but the performance impact is + significant. + + EPT can be disabled in the hypervisor via the 'kvm-intel.ept' + parameter. + + +.. _default_mitigations: + +Default mitigations +------------------- + + The kernel default mitigations for vulnerable processors are: + + - PTE inversion to protect against malicious user space. This is done + unconditionally and cannot be controlled. + + - L1D conditional flushing on VMENTER when EPT is enabled for + a guest. + + The kernel does not by default enforce the disabling of SMT, which leaves + SMT systems vulnerable when running untrusted guests with EPT enabled. + + The rationale for this choice is: + + - Force disabling SMT can break existing setups, especially with + unattended updates. + + - If regular users run untrusted guests on their machine, then L1TF is + just an add on to other malware which might be embedded in an untrusted + guest, e.g. spam-bots or attacks on the local network. + + There is no technical way to prevent a user from running untrusted code + on their machines blindly. + + - It's technically extremely unlikely and from today's knowledge even + impossible that L1TF can be exploited via the most popular attack + mechanisms like JavaScript because these mechanisms have no way to + control PTEs. If this would be possible and not other mitigation would + be possible, then the default might be different. + + - The administrators of cloud and hosting setups have to carefully + analyze the risk for their scenarios and make the appropriate + mitigation choices, which might even vary across their deployed + machines and also result in other changes of their overall setup. + There is no way for the kernel to provide a sensible default for this + kind of scenarios. From 587d499c8bd203f6158779b5782a07fe7a5bcea8 Mon Sep 17 00:00:00 2001 From: Nicolai Stange Date: Wed, 18 Jul 2018 19:07:38 +0200 Subject: [PATCH 588/970] x86/KVM/VMX: Initialize the vmx_l1d_flush_pages' content commit 288d152c23dcf3c09da46c5c481903ca10ebfef7 upstream The slow path in vmx_l1d_flush() reads from vmx_l1d_flush_pages in order to evict the L1d cache. However, these pages are never cleared and, in theory, their data could be leaked. More importantly, KSM could merge a nested hypervisor's vmx_l1d_flush_pages to fewer than 1 << L1D_CACHE_ORDER host physical pages and this would break the L1d flushing algorithm: L1D on x86_64 is tagged by physical addresses. Fix this by initializing the individual vmx_l1d_flush_pages with a different pattern each. Rename the "empty_zp" asm constraint identifier in vmx_l1d_flush() to "flush_pages" to reflect this change. Fixes: a47dd5f06714 ("x86/KVM/VMX: Add L1D flush algorithm") Signed-off-by: Nicolai Stange Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx.c | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 03aea499fd5b..58c462325621 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -212,6 +212,7 @@ static void *vmx_l1d_flush_pages; static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf) { struct page *page; + unsigned int i; if (!enable_ept) { l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_EPT_DISABLED; @@ -244,6 +245,16 @@ static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf) if (!page) return -ENOMEM; vmx_l1d_flush_pages = page_address(page); + + /* + * Initialize each page with a different pattern in + * order to protect against KSM in the nested + * virtualization case. + */ + for (i = 0; i < 1u << L1D_CACHE_ORDER; ++i) { + memset(vmx_l1d_flush_pages + i * PAGE_SIZE, i + 1, + PAGE_SIZE); + } } l1tf_vmx_mitigation = l1tf; @@ -8675,7 +8686,7 @@ static void vmx_l1d_flush(struct kvm_vcpu *vcpu) /* First ensure the pages are in the TLB */ "xorl %%eax, %%eax\n" ".Lpopulate_tlb:\n\t" - "movzbl (%[empty_zp], %%" _ASM_AX "), %%ecx\n\t" + "movzbl (%[flush_pages], %%" _ASM_AX "), %%ecx\n\t" "addl $4096, %%eax\n\t" "cmpl %%eax, %[size]\n\t" "jne .Lpopulate_tlb\n\t" @@ -8684,12 +8695,12 @@ static void vmx_l1d_flush(struct kvm_vcpu *vcpu) /* Now fill the cache */ "xorl %%eax, %%eax\n" ".Lfill_cache:\n" - "movzbl (%[empty_zp], %%" _ASM_AX "), %%ecx\n\t" + "movzbl (%[flush_pages], %%" _ASM_AX "), %%ecx\n\t" "addl $64, %%eax\n\t" "cmpl %%eax, %[size]\n\t" "jne .Lfill_cache\n\t" "lfence\n" - :: [empty_zp] "r" (vmx_l1d_flush_pages), + :: [flush_pages] "r" (vmx_l1d_flush_pages), [size] "r" (size) : "eax", "ebx", "ecx", "edx"); } From 03b3614d4d6febe96117b9e5edc4941a8265e844 Mon Sep 17 00:00:00 2001 From: Tony Luck Date: Thu, 19 Jul 2018 13:49:58 -0700 Subject: [PATCH 589/970] Documentation/l1tf: Fix typos commit 1949f9f49792d65dba2090edddbe36a5f02e3ba3 upstream Fix spelling and other typos Signed-off-by: Tony Luck Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- Documentation/l1tf.rst | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/Documentation/l1tf.rst b/Documentation/l1tf.rst index f2bb28cff439..ccf649cabdcd 100644 --- a/Documentation/l1tf.rst +++ b/Documentation/l1tf.rst @@ -17,7 +17,7 @@ vulnerability is not present on: - Older processor models, where the CPU family is < 6 - A range of Intel ATOM processors (Cedarview, Cloverview, Lincroft, - Penwell, Pineview, Slivermont, Airmont, Merrifield) + Penwell, Pineview, Silvermont, Airmont, Merrifield) - The Intel Core Duo Yonah variants (2006 - 2008) @@ -113,7 +113,7 @@ Attack scenarios deployment scenario. The mitigations, their protection scope and impact are described in the next sections. - The default mitigations and the rationale for chosing them are explained + The default mitigations and the rationale for choosing them are explained at the end of this document. See :ref:`default_mitigations`. .. _l1tf_sys_info: @@ -191,15 +191,15 @@ Guest mitigation mechanisms - unconditional ('always') The conditional mode avoids L1D flushing after VMEXITs which execute - only audited code pathes before the corresponding VMENTER. These code - pathes have beed verified that they cannot expose secrets or other + only audited code paths before the corresponding VMENTER. These code + paths have been verified that they cannot expose secrets or other interesting data to an attacker, but they can leak information about the address space layout of the hypervisor. Unconditional mode flushes L1D on all VMENTER invocations and provides maximum protection. It has a higher overhead than the conditional mode. The overhead cannot be quantified correctly as it depends on the - work load scenario and the resulting number of VMEXITs. + workload scenario and the resulting number of VMEXITs. The general recommendation is to enable L1D flush on VMENTER. The kernel defaults to conditional mode on affected processors. @@ -262,7 +262,7 @@ Guest mitigation mechanisms Whether the interrupts with are affine to CPUs, which run untrusted guests, provide interesting data for an attacker depends on the system configuration and the scenarios which run on the system. While for some - of the interrupts it can be assumed that they wont expose interesting + of the interrupts it can be assumed that they won't expose interesting information beyond exposing hints about the host OS memory layout, there is no way to make general assumptions. @@ -299,7 +299,7 @@ Guest mitigation mechanisms to be brought up at least partially and are then shut down again. "nosmt" can be undone via the sysfs interface. - nosmt=force Has the same effect as "nosmt' but it does not allow to + nosmt=force Has the same effect as "nosmt" but it does not allow to undo the SMT disable via the sysfs interface. =========== ========================================================== From 8b1969db5567d49dd32c1f93fa9d7295a2c238a0 Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Wed, 25 Jul 2018 12:00:27 +0200 Subject: [PATCH 590/970] cpu/hotplug: detect SMT disabled by BIOS commit 73d5e2b472640b1fcdb61ae8be389912ef211bda upstream If SMT is disabled in BIOS, the CPU code doesn't properly detect it. The /sys/devices/system/cpu/smt/control file shows 'on', and the 'l1tf' vulnerabilities file shows SMT as vulnerable. Fix it by forcing 'cpu_smt_control' to CPU_SMT_NOT_SUPPORTED in such a case. Unfortunately the detection can only be done after bringing all the CPUs online, so we have to overwrite any previous writes to the variable. Reported-by: Joe Mario Tested-by: Jiri Kosina Fixes: f048c399e0f7 ("x86/topology: Provide topology_smt_supported()") Signed-off-by: Josh Poimboeuf Signed-off-by: Peter Zijlstra Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- kernel/cpu.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/kernel/cpu.c b/kernel/cpu.c index 8a2ad1587280..3b3a3d782c0f 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -2063,6 +2063,15 @@ static const struct attribute_group cpuhp_smt_attr_group = { static int __init cpu_smt_state_init(void) { + /* + * If SMT was disabled by BIOS, detect it here, after the CPUs have + * been brought online. This ensures the smt/l1tf sysfs entries are + * consistent with reality. Note this may overwrite cpu_smt_control's + * previous setting. + */ + if (topology_max_smt_threads() == 1) + cpu_smt_control = CPU_SMT_NOT_SUPPORTED; + return sysfs_create_group(&cpu_subsys.dev_root->kobj, &cpuhp_smt_attr_group); } From 698ac1bc17c413fd340c243d64fb15cbaadf7178 Mon Sep 17 00:00:00 2001 From: Nicolai Stange Date: Sat, 21 Jul 2018 22:16:56 +0200 Subject: [PATCH 591/970] x86/KVM/VMX: Don't set l1tf_flush_l1d to true from vmx_l1d_flush() commit 379fd0c7e6a391e5565336a646f19f218fb98c6c upstream vmx_l1d_flush() gets invoked only if l1tf_flush_l1d is true. There's no point in setting l1tf_flush_l1d to true from there again. Signed-off-by: Nicolai Stange Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 58c462325621..0b2dd1463f0c 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -8665,15 +8665,15 @@ static void vmx_l1d_flush(struct kvm_vcpu *vcpu) /* * This code is only executed when the the flush mode is 'cond' or * 'always' - * - * If 'flush always', keep the flush bit set, otherwise clear - * it. The flush bit gets set again either from vcpu_run() or from - * one of the unsafe VMEXIT handlers. */ - if (static_branch_unlikely(&vmx_l1d_flush_always)) - vcpu->arch.l1tf_flush_l1d = true; - else + if (!static_branch_unlikely(&vmx_l1d_flush_always)) { + /* + * Clear the flush bit, it gets set again either from + * vcpu_run() or from one of the unsafe VMEXIT + * handlers. + */ vcpu->arch.l1tf_flush_l1d = false; + } vcpu->stat.l1d_flush++; From 936f566260c2ae883e41301edafa1afb2ba11241 Mon Sep 17 00:00:00 2001 From: Nicolai Stange Date: Sat, 21 Jul 2018 22:25:00 +0200 Subject: [PATCH 592/970] x86/KVM/VMX: Replace 'vmx_l1d_flush_always' with 'vmx_l1d_flush_cond' commit 427362a142441f08051369db6fbe7f61c73b3dca upstream The vmx_l1d_flush_always static key is only ever evaluated if vmx_l1d_should_flush is enabled. In that case however, there are only two L1d flushing modes possible: "always" and "conditional". The "conditional" mode's implementation tends to require more sophisticated logic than the "always" mode. Avoid inverted logic by replacing the 'vmx_l1d_flush_always' static key with a 'vmx_l1d_flush_cond' one. There is no change in functionality. Signed-off-by: Nicolai Stange Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 0b2dd1463f0c..796c4eb2cd8f 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -190,7 +190,7 @@ module_param(ple_window_max, int, S_IRUGO); extern const ulong vmx_return; static DEFINE_STATIC_KEY_FALSE(vmx_l1d_should_flush); -static DEFINE_STATIC_KEY_FALSE(vmx_l1d_flush_always); +static DEFINE_STATIC_KEY_FALSE(vmx_l1d_flush_cond); static DEFINE_MUTEX(vmx_l1d_flush_mutex); /* Storage for pre module init parameter parsing */ @@ -264,10 +264,10 @@ static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf) else static_branch_disable(&vmx_l1d_should_flush); - if (l1tf == VMENTER_L1D_FLUSH_ALWAYS) - static_branch_enable(&vmx_l1d_flush_always); + if (l1tf == VMENTER_L1D_FLUSH_COND) + static_branch_enable(&vmx_l1d_flush_cond); else - static_branch_disable(&vmx_l1d_flush_always); + static_branch_disable(&vmx_l1d_flush_cond); return 0; } @@ -8666,7 +8666,7 @@ static void vmx_l1d_flush(struct kvm_vcpu *vcpu) * This code is only executed when the the flush mode is 'cond' or * 'always' */ - if (!static_branch_unlikely(&vmx_l1d_flush_always)) { + if (static_branch_likely(&vmx_l1d_flush_cond)) { /* * Clear the flush bit, it gets set again either from * vcpu_run() or from one of the unsafe VMEXIT From 90bc306b76b8923e365b8e59ddc6968441594970 Mon Sep 17 00:00:00 2001 From: Nicolai Stange Date: Sat, 21 Jul 2018 22:35:28 +0200 Subject: [PATCH 593/970] x86/KVM/VMX: Move the l1tf_flush_l1d test to vmx_l1d_flush() commit 5b6ccc6c3b1a477fbac9ec97a0b4c1c48e765209 upstream Currently, vmx_vcpu_run() checks if l1tf_flush_l1d is set and invokes vmx_l1d_flush() if so. This test is unncessary for the "always flush L1D" mode. Move the check to vmx_l1d_flush()'s conditional mode code path. Notes: - vmx_l1d_flush() is likely to get inlined anyway and thus, there's no extra function call. - This inverts the (static) branch prediction, but there hadn't been any explicit likely()/unlikely() annotations before and so it stays as is. Signed-off-by: Nicolai Stange Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 796c4eb2cd8f..df1a72b59ad2 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -8667,12 +8667,16 @@ static void vmx_l1d_flush(struct kvm_vcpu *vcpu) * 'always' */ if (static_branch_likely(&vmx_l1d_flush_cond)) { + bool flush_l1d = vcpu->arch.l1tf_flush_l1d; + /* * Clear the flush bit, it gets set again either from * vcpu_run() or from one of the unsafe VMEXIT * handlers. */ vcpu->arch.l1tf_flush_l1d = false; + if (!flush_l1d) + return; } vcpu->stat.l1d_flush++; @@ -9162,10 +9166,8 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu) vmx->__launched = vmx->loaded_vmcs->launched; - if (static_branch_unlikely(&vmx_l1d_should_flush)) { - if (vcpu->arch.l1tf_flush_l1d) - vmx_l1d_flush(vcpu); - } + if (static_branch_unlikely(&vmx_l1d_should_flush)) + vmx_l1d_flush(vcpu); asm( /* Store host registers */ From 5766dc12985ca8fdba999d7b5d35035f252b27cf Mon Sep 17 00:00:00 2001 From: Nicolai Stange Date: Fri, 27 Jul 2018 12:46:29 +0200 Subject: [PATCH 594/970] x86/irq: Demote irq_cpustat_t::__softirq_pending to u16 commit 9aee5f8a7e30330d0a8f4c626dc924ca5590aba5 upstream An upcoming patch will extend KVM's L1TF mitigation in conditional mode to also cover interrupts after VMEXITs. For tracking those, stores to a new per-cpu flag from interrupt handlers will become necessary. In order to improve cache locality, this new flag will be added to x86's irq_cpustat_t. Make some space available there by shrinking the ->softirq_pending bitfield from 32 to 16 bits: the number of bits actually used is only NR_SOFTIRQS, i.e. 10. Suggested-by: Paolo Bonzini Signed-off-by: Nicolai Stange Signed-off-by: Thomas Gleixner Reviewed-by: Paolo Bonzini Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/hardirq.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h index 9b76cd331990..332613c41ae1 100644 --- a/arch/x86/include/asm/hardirq.h +++ b/arch/x86/include/asm/hardirq.h @@ -5,7 +5,7 @@ #include typedef struct { - unsigned int __softirq_pending; + u16 __softirq_pending; unsigned int __nmi_count; /* arch dependent */ #ifdef CONFIG_X86_LOCAL_APIC unsigned int apic_timer_irqs; /* arch dependent */ From e371c92e168df9c0713bd4085fbb8501d88b297a Mon Sep 17 00:00:00 2001 From: Nicolai Stange Date: Fri, 27 Jul 2018 13:22:16 +0200 Subject: [PATCH 595/970] x86/KVM/VMX: Introduce per-host-cpu analogue of l1tf_flush_l1d commit 45b575c00d8e72d69d75dd8c112f044b7b01b069 upstream Part of the L1TF mitigation for vmx includes flushing the L1D cache upon VMENTRY. L1D flushes are costly and two modes of operations are provided to users: "always" and the more selective "conditional" mode. If operating in the latter, the cache would get flushed only if a host side code path considered unconfined had been traversed. "Unconfined" in this context means that it might have pulled in sensitive data like user data or kernel crypto keys. The need for L1D flushes is tracked by means of the per-vcpu flag l1tf_flush_l1d. KVM exit handlers considered unconfined set it. A vmx_l1d_flush() subsequently invoked before the next VMENTER will conduct a L1d flush based on its value and reset that flag again. Currently, interrupts delivered "normally" while in root operation between VMEXIT and VMENTER are not taken into account. Part of the reason is that these don't leave any traces and thus, the vmx code is unable to tell if any such has happened. As proposed by Paolo Bonzini, prepare for tracking all interrupts by introducing a new per-cpu flag, "kvm_cpu_l1tf_flush_l1d". It will be in strong analogy to the per-vcpu ->l1tf_flush_l1d. A later patch will make interrupt handlers set it. For the sake of cache locality, group kvm_cpu_l1tf_flush_l1d into x86' per-cpu irq_cpustat_t as suggested by Peter Zijlstra. Provide the helpers kvm_set_cpu_l1tf_flush_l1d(), kvm_clear_cpu_l1tf_flush_l1d() and kvm_get_cpu_l1tf_flush_l1d(). Make them trivial resp. non-existent for !CONFIG_KVM_INTEL as appropriate. Let vmx_l1d_flush() handle kvm_cpu_l1tf_flush_l1d in the same way as l1tf_flush_l1d. Suggested-by: Paolo Bonzini Suggested-by: Peter Zijlstra Signed-off-by: Nicolai Stange Signed-off-by: Thomas Gleixner Reviewed-by: Paolo Bonzini Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/hardirq.h | 23 +++++++++++++++++++++++ arch/x86/kvm/vmx.c | 17 +++++++++++++---- 2 files changed, 36 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h index 332613c41ae1..a61b70efe7a5 100644 --- a/arch/x86/include/asm/hardirq.h +++ b/arch/x86/include/asm/hardirq.h @@ -6,6 +6,9 @@ typedef struct { u16 __softirq_pending; +#if IS_ENABLED(CONFIG_KVM_INTEL) + u8 kvm_cpu_l1tf_flush_l1d; +#endif unsigned int __nmi_count; /* arch dependent */ #ifdef CONFIG_X86_LOCAL_APIC unsigned int apic_timer_irqs; /* arch dependent */ @@ -60,4 +63,24 @@ extern u64 arch_irq_stat_cpu(unsigned int cpu); extern u64 arch_irq_stat(void); #define arch_irq_stat arch_irq_stat + +#if IS_ENABLED(CONFIG_KVM_INTEL) +static inline void kvm_set_cpu_l1tf_flush_l1d(void) +{ + __this_cpu_write(irq_stat.kvm_cpu_l1tf_flush_l1d, 1); +} + +static inline void kvm_clear_cpu_l1tf_flush_l1d(void) +{ + __this_cpu_write(irq_stat.kvm_cpu_l1tf_flush_l1d, 0); +} + +static inline bool kvm_get_cpu_l1tf_flush_l1d(void) +{ + return __this_cpu_read(irq_stat.kvm_cpu_l1tf_flush_l1d); +} +#else /* !IS_ENABLED(CONFIG_KVM_INTEL) */ +static inline void kvm_set_cpu_l1tf_flush_l1d(void) { } +#endif /* IS_ENABLED(CONFIG_KVM_INTEL) */ + #endif /* _ASM_X86_HARDIRQ_H */ diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index df1a72b59ad2..306a11460633 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -8667,14 +8667,23 @@ static void vmx_l1d_flush(struct kvm_vcpu *vcpu) * 'always' */ if (static_branch_likely(&vmx_l1d_flush_cond)) { - bool flush_l1d = vcpu->arch.l1tf_flush_l1d; + bool flush_l1d; /* - * Clear the flush bit, it gets set again either from - * vcpu_run() or from one of the unsafe VMEXIT - * handlers. + * Clear the per-vcpu flush bit, it gets set again + * either from vcpu_run() or from one of the unsafe + * VMEXIT handlers. */ + flush_l1d = vcpu->arch.l1tf_flush_l1d; vcpu->arch.l1tf_flush_l1d = false; + + /* + * Clear the per-cpu flush bit, it gets set again from + * the interrupt handlers. + */ + flush_l1d |= kvm_get_cpu_l1tf_flush_l1d(); + kvm_clear_cpu_l1tf_flush_l1d(); + if (!flush_l1d) return; } From 8574df1a8741f6cce1f2fbdd921b07adeec8d932 Mon Sep 17 00:00:00 2001 From: Nicolai Stange Date: Sun, 29 Jul 2018 12:15:33 +0200 Subject: [PATCH 596/970] x86: Don't include linux/irq.h from asm/hardirq.h commit 447ae316670230d7d29430e2cbf1f5db4f49d14c upstream The next patch in this series will have to make the definition of irq_cpustat_t available to entering_irq(). Inclusion of asm/hardirq.h into asm/apic.h would cause circular header dependencies like asm/smp.h asm/apic.h asm/hardirq.h linux/irq.h linux/topology.h linux/smp.h asm/smp.h or linux/gfp.h linux/mmzone.h asm/mmzone.h asm/mmzone_64.h asm/smp.h asm/apic.h asm/hardirq.h linux/irq.h linux/irqdesc.h linux/kobject.h linux/sysfs.h linux/kernfs.h linux/idr.h linux/gfp.h and others. This causes compilation errors because of the header guards becoming effective in the second inclusion: symbols/macros that had been defined before wouldn't be available to intermediate headers in the #include chain anymore. A possible workaround would be to move the definition of irq_cpustat_t into its own header and include that from both, asm/hardirq.h and asm/apic.h. However, this wouldn't solve the real problem, namely asm/harirq.h unnecessarily pulling in all the linux/irq.h cruft: nothing in asm/hardirq.h itself requires it. Also, note that there are some other archs, like e.g. arm64, which don't have that #include in their asm/hardirq.h. Remove the linux/irq.h #include from x86' asm/hardirq.h. Fix resulting compilation errors by adding appropriate #includes to *.c files as needed. Note that some of these *.c files could be cleaned up a bit wrt. to their set of #includes, but that should better be done from separate patches, if at all. Signed-off-by: Nicolai Stange Signed-off-by: Thomas Gleixner [dwmw2: More fixes for EFI and Xen in 4.9] Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/dmi.h | 2 +- arch/x86/include/asm/hardirq.h | 1 - arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kernel/apic/apic.c | 2 ++ arch/x86/kernel/apic/htirq.c | 2 ++ arch/x86/kernel/apic/io_apic.c | 1 + arch/x86/kernel/apic/msi.c | 1 + arch/x86/kernel/apic/vector.c | 1 + arch/x86/kernel/fpu/core.c | 1 + arch/x86/kernel/ftrace.c | 1 + arch/x86/kernel/hpet.c | 1 + arch/x86/kernel/i8259.c | 1 + arch/x86/kernel/irq.c | 1 + arch/x86/kernel/irq_32.c | 1 + arch/x86/kernel/irq_64.c | 1 + arch/x86/kernel/irqinit.c | 1 + arch/x86/kernel/kprobes/core.c | 1 + arch/x86/kernel/kprobes/opt.c | 1 + arch/x86/kernel/smpboot.c | 1 + arch/x86/kernel/time.c | 1 + arch/x86/mm/fault.c | 1 + arch/x86/mm/kaiser.c | 1 + arch/x86/platform/efi/efi_64.c | 1 + arch/x86/platform/efi/quirks.c | 1 + arch/x86/platform/intel-mid/device_libs/platform_mrfld_wdt.c | 1 + arch/x86/xen/enlighten.c | 1 + arch/x86/xen/setup.c | 1 + drivers/pci/host/pci-hyperv.c | 2 ++ 28 files changed, 30 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/dmi.h b/arch/x86/include/asm/dmi.h index 3c69fed215c5..d8b95604a2e7 100644 --- a/arch/x86/include/asm/dmi.h +++ b/arch/x86/include/asm/dmi.h @@ -3,8 +3,8 @@ #include #include +#include -#include #include static __always_inline __init void *dmi_alloc(unsigned len) diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h index a61b70efe7a5..987165924a32 100644 --- a/arch/x86/include/asm/hardirq.h +++ b/arch/x86/include/asm/hardirq.h @@ -2,7 +2,6 @@ #define _ASM_X86_HARDIRQ_H #include -#include typedef struct { u16 __softirq_pending; diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 05678da200e1..ce6a70ac237a 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -17,6 +17,7 @@ #include #include #include +#include #include #include diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index 28a1531a3d74..e85080b27231 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -34,6 +34,7 @@ #include #include #include +#include #include #include @@ -55,6 +56,7 @@ #include #include #include +#include unsigned int num_processors; diff --git a/arch/x86/kernel/apic/htirq.c b/arch/x86/kernel/apic/htirq.c index ae50d3454d78..89d6e96d0038 100644 --- a/arch/x86/kernel/apic/htirq.c +++ b/arch/x86/kernel/apic/htirq.c @@ -16,6 +16,8 @@ #include #include #include +#include + #include #include #include diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c index cf89928dbd46..d34629d70421 100644 --- a/arch/x86/kernel/apic/io_apic.c +++ b/arch/x86/kernel/apic/io_apic.c @@ -32,6 +32,7 @@ #include #include +#include #include #include #include diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c index 015bbf30e3e3..cfd17a3518bb 100644 --- a/arch/x86/kernel/apic/msi.c +++ b/arch/x86/kernel/apic/msi.c @@ -12,6 +12,7 @@ */ #include #include +#include #include #include #include diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c index 4922ab66fd29..c6bd3f9b4383 100644 --- a/arch/x86/kernel/apic/vector.c +++ b/arch/x86/kernel/apic/vector.c @@ -11,6 +11,7 @@ * published by the Free Software Foundation. */ #include +#include #include #include #include diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c index 96d80dfac383..430c095cfa0e 100644 --- a/arch/x86/kernel/fpu/core.c +++ b/arch/x86/kernel/fpu/core.c @@ -10,6 +10,7 @@ #include #include #include +#include #include #include diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c index 6bf09f5594b2..5e06ffefc5db 100644 --- a/arch/x86/kernel/ftrace.c +++ b/arch/x86/kernel/ftrace.c @@ -26,6 +26,7 @@ #include #include +#include #include #include diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c index 9512529e8eab..756634f14df6 100644 --- a/arch/x86/kernel/hpet.c +++ b/arch/x86/kernel/hpet.c @@ -1,6 +1,7 @@ #include #include #include +#include #include #include #include diff --git a/arch/x86/kernel/i8259.c b/arch/x86/kernel/i8259.c index 4e3b8a587c88..26d5451b6b42 100644 --- a/arch/x86/kernel/i8259.c +++ b/arch/x86/kernel/i8259.c @@ -4,6 +4,7 @@ #include #include #include +#include #include #include #include diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c index 8a7ad9fb22c1..c6f0ef1d9ab7 100644 --- a/arch/x86/kernel/irq.c +++ b/arch/x86/kernel/irq.c @@ -10,6 +10,7 @@ #include #include #include +#include #include #include diff --git a/arch/x86/kernel/irq_32.c b/arch/x86/kernel/irq_32.c index 2763573ee1d2..5aaa39a10823 100644 --- a/arch/x86/kernel/irq_32.c +++ b/arch/x86/kernel/irq_32.c @@ -10,6 +10,7 @@ #include #include +#include #include #include #include diff --git a/arch/x86/kernel/irq_64.c b/arch/x86/kernel/irq_64.c index 9ebd0b0e73d9..bcd1b82c86e8 100644 --- a/arch/x86/kernel/irq_64.c +++ b/arch/x86/kernel/irq_64.c @@ -10,6 +10,7 @@ #include #include +#include #include #include #include diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c index f480b38a03c3..eeb77e5e5179 100644 --- a/arch/x86/kernel/irqinit.c +++ b/arch/x86/kernel/irqinit.c @@ -4,6 +4,7 @@ #include #include #include +#include #include #include #include diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c index 1fa7783fa51d..64a70b2e2285 100644 --- a/arch/x86/kernel/kprobes/core.c +++ b/arch/x86/kernel/kprobes/core.c @@ -61,6 +61,7 @@ #include #include #include +#include #include "common.h" diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c index 1808a9cc7701..1009d63a2b79 100644 --- a/arch/x86/kernel/kprobes/opt.c +++ b/arch/x86/kernel/kprobes/opt.c @@ -39,6 +39,7 @@ #include #include #include +#include #include "common.h" diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 168cc3baadf6..8e0fbce6dfdd 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -76,6 +76,7 @@ #include #include #include +#include /* Number of siblings per CPU package */ int smp_num_siblings = 1; diff --git a/arch/x86/kernel/time.c b/arch/x86/kernel/time.c index d39c09119db6..f8a0518d2810 100644 --- a/arch/x86/kernel/time.c +++ b/arch/x86/kernel/time.c @@ -11,6 +11,7 @@ #include #include +#include #include #include #include diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index ae23c996e3a8..acef3c6a32a2 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -23,6 +23,7 @@ #include /* emulate_vsyscall */ #include /* struct vm86 */ #include /* vma_pkey() */ +#include #define CREATE_TRACE_POINTS #include diff --git a/arch/x86/mm/kaiser.c b/arch/x86/mm/kaiser.c index ec678aafa3f8..3f729e20f0e3 100644 --- a/arch/x86/mm/kaiser.c +++ b/arch/x86/mm/kaiser.c @@ -20,6 +20,7 @@ #include #include #include +#include int kaiser_enabled __read_mostly = 1; EXPORT_SYMBOL(kaiser_enabled); /* for inlined TLB flush functions */ diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c index dcb2d9d185a2..351a55dc4a1d 100644 --- a/arch/x86/platform/efi/efi_64.c +++ b/arch/x86/platform/efi/efi_64.c @@ -45,6 +45,7 @@ #include #include #include +#include /* * We allocate runtime services regions bottom-up, starting from -4G, i.e. diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c index 393a0c0288d1..dee99391d7b2 100644 --- a/arch/x86/platform/efi/quirks.c +++ b/arch/x86/platform/efi/quirks.c @@ -13,6 +13,7 @@ #include #include #include +#include #define EFI_MIN_RESERVE 5120 diff --git a/arch/x86/platform/intel-mid/device_libs/platform_mrfld_wdt.c b/arch/x86/platform/intel-mid/device_libs/platform_mrfld_wdt.c index 10bad1e55fcc..85e112ea7aff 100644 --- a/arch/x86/platform/intel-mid/device_libs/platform_mrfld_wdt.c +++ b/arch/x86/platform/intel-mid/device_libs/platform_mrfld_wdt.c @@ -18,6 +18,7 @@ #include #include #include +#include #define TANGIER_EXT_TIMER0_MSI 12 diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c index 2986a13b9786..db7cf8727e1c 100644 --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -35,6 +35,7 @@ #include #include +#include #include #include diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c index 9f21b0c5945d..36bfafb2a853 100644 --- a/arch/x86/xen/setup.c +++ b/arch/x86/xen/setup.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c index d392a55ec0a9..b4d8ccfd9f7c 100644 --- a/drivers/pci/host/pci-hyperv.c +++ b/drivers/pci/host/pci-hyperv.c @@ -52,6 +52,8 @@ #include #include #include +#include + #include #include #include From 2c5a3a05474011cb84a1b6c45543d56c324cadca Mon Sep 17 00:00:00 2001 From: Nicolai Stange Date: Sun, 29 Jul 2018 13:06:04 +0200 Subject: [PATCH 597/970] x86/irq: Let interrupt handlers set kvm_cpu_l1tf_flush_l1d commit ffcba43ff66c7dab34ec700debd491d2a4d319b4 upstream The last missing piece to having vmx_l1d_flush() take interrupts after VMEXIT into account is to set the kvm_cpu_l1tf_flush_l1d per-cpu flag on irq entry. Issue calls to kvm_set_cpu_l1tf_flush_l1d() from entering_irq(), ipi_entering_ack_irq(), smp_reschedule_interrupt() and uv_bau_message_interrupt(). Suggested-by: Paolo Bonzini Signed-off-by: Nicolai Stange Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/apic.h | 3 +++ arch/x86/kernel/smp.c | 1 + arch/x86/platform/uv/tlb_uv.c | 1 + 3 files changed, 5 insertions(+) diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h index 20eed46c5827..2188b5af8167 100644 --- a/arch/x86/include/asm/apic.h +++ b/arch/x86/include/asm/apic.h @@ -12,6 +12,7 @@ #include #include #include +#include #define ARCH_APICTIMER_STOPS_ON_C3 1 @@ -647,6 +648,7 @@ static inline void entering_irq(void) { irq_enter(); exit_idle(); + kvm_set_cpu_l1tf_flush_l1d(); } static inline void entering_ack_irq(void) @@ -659,6 +661,7 @@ static inline void ipi_entering_ack_irq(void) { irq_enter(); ack_APIC_irq(); + kvm_set_cpu_l1tf_flush_l1d(); } static inline void exiting_irq(void) diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c index ea217caa731c..2863ad306692 100644 --- a/arch/x86/kernel/smp.c +++ b/arch/x86/kernel/smp.c @@ -271,6 +271,7 @@ __visible void __irq_entry smp_reschedule_interrupt(struct pt_regs *regs) /* * KVM uses this interrupt to force a cpu out of guest mode */ + kvm_set_cpu_l1tf_flush_l1d(); } __visible void __irq_entry smp_trace_reschedule_interrupt(struct pt_regs *regs) diff --git a/arch/x86/platform/uv/tlb_uv.c b/arch/x86/platform/uv/tlb_uv.c index 0f0175186f1b..16d4967d59ea 100644 --- a/arch/x86/platform/uv/tlb_uv.c +++ b/arch/x86/platform/uv/tlb_uv.c @@ -1283,6 +1283,7 @@ void uv_bau_message_interrupt(struct pt_regs *regs) struct msg_desc msgdesc; ack_APIC_irq(); + kvm_set_cpu_l1tf_flush_l1d(); time_start = get_cycles(); bcp = &per_cpu(bau_control, smp_processor_id()); From 77a83b3a622a0fbdcb7c0d81c853649dbc0eb7a2 Mon Sep 17 00:00:00 2001 From: Nicolai Stange Date: Sun, 22 Jul 2018 13:38:18 +0200 Subject: [PATCH 598/970] x86/KVM/VMX: Don't set l1tf_flush_l1d from vmx_handle_external_intr() commit 18b57ce2eb8c8b9a24174a89250cf5f57c76ecdc upstream For VMEXITs caused by external interrupts, vmx_handle_external_intr() indirectly calls into the interrupt handlers through the host's IDT. It follows that these interrupts get accounted for in the kvm_cpu_l1tf_flush_l1d per-cpu flag. The subsequently executed vmx_l1d_flush() will thus be aware that some interrupts have happened and conduct a L1d flush anyway. Setting l1tf_flush_l1d from vmx_handle_external_intr() isn't needed anymore. Drop it. Signed-off-by: Nicolai Stange Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx.c | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 306a11460633..0f4c94605584 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -8929,7 +8929,6 @@ static void vmx_handle_external_intr(struct kvm_vcpu *vcpu) [ss]"i"(__KERNEL_DS), [cs]"i"(__KERNEL_CS) ); - vcpu->arch.l1tf_flush_l1d = true; } } STACK_FRAME_NON_STANDARD(vmx_handle_external_intr); From d9f378f64c0ae3d76c1828742557c6c0ccc9e977 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Sun, 5 Aug 2018 17:06:12 +0200 Subject: [PATCH 599/970] Documentation/l1tf: Remove Yonah processors from not vulnerable list commit 58331136136935c631c2b5f06daf4c3006416e91 upstream Dave reported, that it's not confirmed that Yonah processors are unaffected. Remove them from the list. Reported-by: ave Hansen Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- Documentation/l1tf.rst | 2 -- 1 file changed, 2 deletions(-) diff --git a/Documentation/l1tf.rst b/Documentation/l1tf.rst index ccf649cabdcd..5dadb4503ec9 100644 --- a/Documentation/l1tf.rst +++ b/Documentation/l1tf.rst @@ -19,8 +19,6 @@ vulnerability is not present on: - A range of Intel ATOM processors (Cedarview, Cloverview, Lincroft, Penwell, Pineview, Silvermont, Airmont, Merrifield) - - The Intel Core Duo Yonah variants (2006 - 2008) - - The Intel XEON PHI family - Intel processors which have the ARCH_CAP_RDCL_NO bit set in the From 62d88fc0fb6bc888d30a5bd074afd5a0ae59a1af Mon Sep 17 00:00:00 2001 From: Tom Lendacky Date: Wed, 21 Feb 2018 13:39:51 -0600 Subject: [PATCH 600/970] KVM: x86: Add a framework for supporting MSR-based features MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 801e459a6f3a63af9d447e6249088c76ae16efc4 upstream Provide a new KVM capability that allows bits within MSRs to be recognized as features. Two new ioctls are added to the /dev/kvm ioctl routine to retrieve the list of these MSRs and then retrieve their values. A kvm_x86_ops callback is used to determine support for the listed MSR-based features. Signed-off-by: Tom Lendacky Signed-off-by: Paolo Bonzini [Tweaked documentation. - Radim] Signed-off-by: Radim Krčmář Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- Documentation/virtual/kvm/api.txt | 40 ++++++++++++----- arch/x86/include/asm/kvm_host.h | 2 + arch/x86/kvm/svm.c | 6 +++ arch/x86/kvm/vmx.c | 6 +++ arch/x86/kvm/x86.c | 75 ++++++++++++++++++++++++++++--- include/uapi/linux/kvm.h | 2 + 6 files changed, 114 insertions(+), 17 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index e46c14fac9da..3ff58a8ffabb 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -122,14 +122,15 @@ KVM_CAP_S390_UCONTROL and use the flag KVM_VM_S390_UCONTROL as privileged user (CAP_SYS_ADMIN). -4.3 KVM_GET_MSR_INDEX_LIST +4.3 KVM_GET_MSR_INDEX_LIST, KVM_GET_MSR_FEATURE_INDEX_LIST -Capability: basic +Capability: basic, KVM_CAP_GET_MSR_FEATURES for KVM_GET_MSR_FEATURE_INDEX_LIST Architectures: x86 -Type: system +Type: system ioctl Parameters: struct kvm_msr_list (in/out) Returns: 0 on success; -1 on error Errors: + EFAULT: the msr index list cannot be read from or written to E2BIG: the msr index list is to be to fit in the array specified by the user. @@ -138,16 +139,23 @@ struct kvm_msr_list { __u32 indices[0]; }; -This ioctl returns the guest msrs that are supported. The list varies -by kvm version and host processor, but does not change otherwise. The -user fills in the size of the indices array in nmsrs, and in return -kvm adjusts nmsrs to reflect the actual number of msrs and fills in -the indices array with their numbers. +The user fills in the size of the indices array in nmsrs, and in return +kvm adjusts nmsrs to reflect the actual number of msrs and fills in the +indices array with their numbers. + +KVM_GET_MSR_INDEX_LIST returns the guest msrs that are supported. The list +varies by kvm version and host processor, but does not change otherwise. Note: if kvm indicates supports MCE (KVM_CAP_MCE), then the MCE bank MSRs are not returned in the MSR list, as different vcpus can have a different number of banks, as set via the KVM_X86_SETUP_MCE ioctl. +KVM_GET_MSR_FEATURE_INDEX_LIST returns the list of MSRs that can be passed +to the KVM_GET_MSRS system ioctl. This lets userspace probe host capabilities +and processor features that are exposed via MSRs (e.g., VMX capabilities). +This list also varies by kvm version and host processor, but does not change +otherwise. + 4.4 KVM_CHECK_EXTENSION @@ -474,14 +482,22 @@ Support for this has been removed. Use KVM_SET_GUEST_DEBUG instead. 4.18 KVM_GET_MSRS -Capability: basic +Capability: basic (vcpu), KVM_CAP_GET_MSR_FEATURES (system) Architectures: x86 -Type: vcpu ioctl +Type: system ioctl, vcpu ioctl Parameters: struct kvm_msrs (in/out) -Returns: 0 on success, -1 on error +Returns: number of msrs successfully returned; + -1 on error +When used as a system ioctl: +Reads the values of MSR-based features that are available for the VM. This +is similar to KVM_GET_SUPPORTED_CPUID, but it returns MSR indices and values. +The list of msr-based features can be obtained using KVM_GET_MSR_FEATURE_INDEX_LIST +in a system ioctl. + +When used as a vcpu ioctl: Reads model-specific registers from the vcpu. Supported msr indices can -be obtained using KVM_GET_MSR_INDEX_LIST. +be obtained using KVM_GET_MSR_INDEX_LIST in a system ioctl. struct kvm_msrs { __u32 nmsrs; /* number of msrs in entries */ diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index ce6a70ac237a..16d588c3e3c1 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1025,6 +1025,8 @@ struct kvm_x86_ops { void (*cancel_hv_timer)(struct kvm_vcpu *vcpu); void (*setup_mce)(struct kvm_vcpu *vcpu); + + int (*get_msr_feature)(struct kvm_msr_entry *entry); }; struct kvm_arch_async_pf { diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index c8ae426ee37d..16f3e2686cc7 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -3485,6 +3485,11 @@ static int cr8_write_interception(struct vcpu_svm *svm) return 0; } +static int svm_get_msr_feature(struct kvm_msr_entry *msr) +{ + return 1; +} + static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) { struct vcpu_svm *svm = to_svm(vcpu); @@ -5504,6 +5509,7 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = { .vcpu_unblocking = svm_vcpu_unblocking, .update_bp_intercept = update_bp_intercept, + .get_msr_feature = svm_get_msr_feature, .get_msr = svm_get_msr, .set_msr = svm_set_msr, .get_segment_base = svm_get_segment_base, diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 0f4c94605584..3e04e19dd5b1 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -3153,6 +3153,11 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu, return !(val & ~valid_bits); } +static int vmx_get_msr_feature(struct kvm_msr_entry *msr) +{ + return 1; +} + /* * Reads an msr value (of 'msr_index') into 'pdata'. * Returns 0 on success, non-0 otherwise. @@ -11659,6 +11664,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = { .vcpu_put = vmx_vcpu_put, .update_bp_intercept = update_exception_bitmap, + .get_msr_feature = vmx_get_msr_feature, .get_msr = vmx_get_msr, .set_msr = vmx_set_msr, .get_segment_base = vmx_get_segment_base, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index cc314e787e85..6c0bb89dec7c 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1008,6 +1008,28 @@ static u32 emulated_msrs[] = { static unsigned num_emulated_msrs; +/* + * List of msr numbers which are used to expose MSR-based features that + * can be used by a hypervisor to validate requested CPU features. + */ +static u32 msr_based_features[] = { +}; + +static unsigned int num_msr_based_features; + +static int do_get_msr_feature(struct kvm_vcpu *vcpu, unsigned index, u64 *data) +{ + struct kvm_msr_entry msr; + + msr.index = index; + if (kvm_x86_ops->get_msr_feature(&msr)) + return 1; + + *data = msr.data; + + return 0; +} + bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer) { if (efer & efer_reserved_bits) @@ -2546,13 +2568,11 @@ static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs, int (*do_msr)(struct kvm_vcpu *vcpu, unsigned index, u64 *data)) { - int i, idx; + int i; - idx = srcu_read_lock(&vcpu->kvm->srcu); for (i = 0; i < msrs->nmsrs; ++i) if (do_msr(vcpu, entries[i].index, &entries[i].data)) break; - srcu_read_unlock(&vcpu->kvm->srcu, idx); return i; } @@ -2652,6 +2672,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_ASSIGN_DEV_IRQ: case KVM_CAP_PCI_2_3: #endif + case KVM_CAP_GET_MSR_FEATURES: r = 1; break; case KVM_CAP_ADJUST_CLOCK: @@ -2771,6 +2792,31 @@ long kvm_arch_dev_ioctl(struct file *filp, goto out; r = 0; break; + case KVM_GET_MSR_FEATURE_INDEX_LIST: { + struct kvm_msr_list __user *user_msr_list = argp; + struct kvm_msr_list msr_list; + unsigned int n; + + r = -EFAULT; + if (copy_from_user(&msr_list, user_msr_list, sizeof(msr_list))) + goto out; + n = msr_list.nmsrs; + msr_list.nmsrs = num_msr_based_features; + if (copy_to_user(user_msr_list, &msr_list, sizeof(msr_list))) + goto out; + r = -E2BIG; + if (n < msr_list.nmsrs) + goto out; + r = -EFAULT; + if (copy_to_user(user_msr_list->indices, &msr_based_features, + num_msr_based_features * sizeof(u32))) + goto out; + r = 0; + break; + } + case KVM_GET_MSRS: + r = msr_io(NULL, argp, do_get_msr_feature, 1); + break; } default: r = -EINVAL; @@ -3452,12 +3498,18 @@ long kvm_arch_vcpu_ioctl(struct file *filp, r = 0; break; } - case KVM_GET_MSRS: + case KVM_GET_MSRS: { + int idx = srcu_read_lock(&vcpu->kvm->srcu); r = msr_io(vcpu, argp, do_get_msr, 1); + srcu_read_unlock(&vcpu->kvm->srcu, idx); break; - case KVM_SET_MSRS: + } + case KVM_SET_MSRS: { + int idx = srcu_read_lock(&vcpu->kvm->srcu); r = msr_io(vcpu, argp, do_set_msr, 0); + srcu_read_unlock(&vcpu->kvm->srcu, idx); break; + } case KVM_TPR_ACCESS_REPORTING: { struct kvm_tpr_access_ctl tac; @@ -4237,6 +4289,19 @@ static void kvm_init_msr_list(void) j++; } num_emulated_msrs = j; + + for (i = j = 0; i < ARRAY_SIZE(msr_based_features); i++) { + struct kvm_msr_entry msr; + + msr.index = msr_based_features[i]; + if (kvm_x86_ops->get_msr_feature(&msr)) + continue; + + if (j < i) + msr_based_features[j] = msr_based_features[i]; + j++; + } + num_msr_based_features = j; } static int vcpu_mmio_write(struct kvm_vcpu *vcpu, gpa_t addr, int len, diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 05b9bb63dbec..a0a365cbf3c9 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -717,6 +717,7 @@ struct kvm_ppc_smmu_info { #define KVM_TRACE_PAUSE __KVM_DEPRECATED_MAIN_0x07 #define KVM_TRACE_DISABLE __KVM_DEPRECATED_MAIN_0x08 #define KVM_GET_EMULATED_CPUID _IOWR(KVMIO, 0x09, struct kvm_cpuid2) +#define KVM_GET_MSR_FEATURE_INDEX_LIST _IOWR(KVMIO, 0x0a, struct kvm_msr_list) /* * Extension capability list. @@ -871,6 +872,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_MSI_DEVID 131 #define KVM_CAP_PPC_HTM 132 #define KVM_CAP_S390_BPB 152 +#define KVM_CAP_GET_MSR_FEATURES 153 #ifdef KVM_CAP_IRQ_ROUTING From 1a155ef3c958b4916594eca132472c9af1c642f7 Mon Sep 17 00:00:00 2001 From: Tom Lendacky Date: Sat, 24 Feb 2018 00:18:20 +0100 Subject: [PATCH 601/970] KVM: SVM: Add MSR-based feature support for serializing LFENCE MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit d1d93fa90f1afa926cb060b7f78ab01a65705b4d upstream In order to determine if LFENCE is a serializing instruction on AMD processors, MSR 0xc0011029 (MSR_F10H_DECFG) must be read and the state of bit 1 checked. This patch will add support to allow a guest to properly make this determination. Add the MSR feature callback operation to svm.c and add MSR 0xc0011029 to the list of MSR-based features. If LFENCE is serializing, then the feature is supported, allowing the hypervisor to set the value of the MSR that guest will see. Support is also added to write (hypervisor only) and read the MSR value for the guest. A write by the guest will result in a #GP. A read by the guest will return the value as set by the host. In this way, the support to expose the feature to the guest is controlled by the hypervisor. Signed-off-by: Tom Lendacky Signed-off-by: Paolo Bonzini Signed-off-by: Radim Krčmář Signed-off-by: Thomas Gleixner Reviewed-by: Paolo Bonzini Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/svm.c | 36 +++++++++++++++++++++++++++++++++++- arch/x86/kvm/x86.c | 1 + 2 files changed, 36 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 16f3e2686cc7..44a6f7fa2279 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -175,6 +175,8 @@ struct vcpu_svm { uint64_t sysenter_eip; uint64_t tsc_aux; + u64 msr_decfg; + u64 next_rip; u64 host_user_msrs[NR_HOST_SAVE_USER_MSRS]; @@ -3487,7 +3489,18 @@ static int cr8_write_interception(struct vcpu_svm *svm) static int svm_get_msr_feature(struct kvm_msr_entry *msr) { - return 1; + msr->data = 0; + + switch (msr->index) { + case MSR_F10H_DECFG: + if (boot_cpu_has(X86_FEATURE_LFENCE_RDTSC)) + msr->data |= MSR_F10H_DECFG_LFENCE_SERIALIZE; + break; + default: + return 1; + } + + return 0; } static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) @@ -3592,6 +3605,9 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) msr_info->data = 0x1E; } break; + case MSR_F10H_DECFG: + msr_info->data = svm->msr_decfg; + break; default: return kvm_get_msr_common(vcpu, msr_info); } @@ -3780,6 +3796,24 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) case MSR_VM_IGNNE: vcpu_unimpl(vcpu, "unimplemented wrmsr: 0x%x data 0x%llx\n", ecx, data); break; + case MSR_F10H_DECFG: { + struct kvm_msr_entry msr_entry; + + msr_entry.index = msr->index; + if (svm_get_msr_feature(&msr_entry)) + return 1; + + /* Check the supported bits */ + if (data & ~msr_entry.data) + return 1; + + /* Don't allow the guest to change a bit, #GP */ + if (!msr->host_initiated && (data ^ msr_entry.data)) + return 1; + + svm->msr_decfg = data; + break; + } case MSR_IA32_APICBASE: if (kvm_vcpu_apicv_active(vcpu)) avic_update_vapic_bar(to_svm(vcpu), data); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 6c0bb89dec7c..48114ef64f05 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1013,6 +1013,7 @@ static unsigned num_emulated_msrs; * can be used by a hypervisor to validate requested CPU features. */ static u32 msr_based_features[] = { + MSR_F10H_DECFG, }; static unsigned int num_msr_based_features; From 8a01dd38e5e1b06f9be73eb5eb80b267a236f29d Mon Sep 17 00:00:00 2001 From: Wanpeng Li Date: Wed, 28 Feb 2018 14:03:30 +0800 Subject: [PATCH 602/970] KVM: X86: Introduce kvm_get_msr_feature() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 66421c1ec340096b291af763ed5721314cdd9c5c upstream Introduce kvm_get_msr_feature() to handle the msrs which are supported by different vendors and sharing the same emulation logic. Signed-off-by: Wanpeng Li Signed-off-by: Radim Krčmář Signed-off-by: Thomas Gleixner Reviewed-by: Paolo Bonzini Cc: Liran Alon Cc: Nadav Amit Cc: Borislav Petkov Cc: Tom Lendacky Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/x86.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 48114ef64f05..bd0e68163b5f 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1018,13 +1018,25 @@ static u32 msr_based_features[] = { static unsigned int num_msr_based_features; +static int kvm_get_msr_feature(struct kvm_msr_entry *msr) +{ + switch (msr->index) { + default: + if (kvm_x86_ops->get_msr_feature(msr)) + return 1; + } + return 0; +} + static int do_get_msr_feature(struct kvm_vcpu *vcpu, unsigned index, u64 *data) { struct kvm_msr_entry msr; + int r; msr.index = index; - if (kvm_x86_ops->get_msr_feature(&msr)) - return 1; + r = kvm_get_msr_feature(&msr); + if (r) + return r; *data = msr.data; @@ -4295,7 +4307,7 @@ static void kvm_init_msr_list(void) struct kvm_msr_entry msr; msr.index = msr_based_features[i]; - if (kvm_x86_ops->get_msr_feature(&msr)) + if (kvm_get_msr_feature(&msr)) continue; if (j < i) From 7a1eac80b5127b20abfcaaf92062c236078f812a Mon Sep 17 00:00:00 2001 From: Wanpeng Li Date: Wed, 28 Feb 2018 14:03:31 +0800 Subject: [PATCH 603/970] KVM: X86: Allow userspace to define the microcode version MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 518e7b94817abed94becfe6a44f1ece0d4745afe upstream Linux (among the others) has checks to make sure that certain features aren't enabled on a certain family/model/stepping if the microcode version isn't greater than or equal to a known good version. By exposing the real microcode version, we're preventing buggy guests that don't check that they are running virtualized (i.e., they should trust the hypervisor) from disabling features that are effectively not buggy. Suggested-by: Filippo Sironi Signed-off-by: Wanpeng Li Signed-off-by: Radim Krčmář Signed-off-by: Thomas Gleixner Reviewed-by: Paolo Bonzini Cc: Liran Alon Cc: Nadav Amit Cc: Borislav Petkov Cc: Tom Lendacky Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/svm.c | 4 +--- arch/x86/kvm/vmx.c | 1 + arch/x86/kvm/x86.c | 11 +++++++++-- 4 files changed, 12 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 16d588c3e3c1..631c3f59109a 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -486,6 +486,7 @@ struct kvm_vcpu_arch { u64 smbase; bool tpr_access_reporting; u64 ia32_xss; + u64 microcode_version; /* * Paging state of the vcpu diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 44a6f7fa2279..c855080c7a71 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -1569,6 +1569,7 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) u32 dummy; u32 eax = 1; + vcpu->arch.microcode_version = 0x01000065; svm->spec_ctrl = 0; svm->virt_spec_ctrl = 0; @@ -3585,9 +3586,6 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) msr_info->data = svm->virt_spec_ctrl; break; - case MSR_IA32_UCODE_REV: - msr_info->data = 0x01000065; - break; case MSR_F15H_IC_CFG: { int family, model; diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 3e04e19dd5b1..4596791dbb81 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -5481,6 +5481,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) u64 cr0; vmx->rmode.vm86_active = 0; + vcpu->arch.microcode_version = 0x100000000ULL; vmx->spec_ctrl = 0; vmx->soft_vnmi_blocked = 0; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index bd0e68163b5f..8749967a1d27 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1014,6 +1014,7 @@ static unsigned num_emulated_msrs; */ static u32 msr_based_features[] = { MSR_F10H_DECFG, + MSR_IA32_UCODE_REV, }; static unsigned int num_msr_based_features; @@ -1021,6 +1022,9 @@ static unsigned int num_msr_based_features; static int kvm_get_msr_feature(struct kvm_msr_entry *msr) { switch (msr->index) { + case MSR_IA32_UCODE_REV: + rdmsrl(msr->index, msr->data); + break; default: if (kvm_x86_ops->get_msr_feature(msr)) return 1; @@ -2157,13 +2161,16 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) switch (msr) { case MSR_AMD64_NB_CFG: - case MSR_IA32_UCODE_REV: case MSR_IA32_UCODE_WRITE: case MSR_VM_HSAVE_PA: case MSR_AMD64_PATCH_LOADER: case MSR_AMD64_BU_CFG2: break; + case MSR_IA32_UCODE_REV: + if (msr_info->host_initiated) + vcpu->arch.microcode_version = data; + break; case MSR_EFER: return set_efer(vcpu, data); case MSR_K7_HWCR: @@ -2438,7 +2445,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) msr_info->data = 0; break; case MSR_IA32_UCODE_REV: - msr_info->data = 0x100000000ULL; + msr_info->data = vcpu->arch.microcode_version; break; case MSR_MTRRcap: case 0x200 ... 0x2ff: From ce2c755166f9503b1671bd2822d04939afce5b34 Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Mon, 25 Jun 2018 14:04:37 +0200 Subject: [PATCH 604/970] KVM: VMX: support MSR_IA32_ARCH_CAPABILITIES as a feature MSR commit cd28325249a1ca0d771557ce823e0308ad629f98 upstream This lets userspace read the MSR_IA32_ARCH_CAPABILITIES and check that all requested features are available on the host. Signed-off-by: Paolo Bonzini Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/x86.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 8749967a1d27..62593b740c82 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1015,6 +1015,7 @@ static unsigned num_emulated_msrs; static u32 msr_based_features[] = { MSR_F10H_DECFG, MSR_IA32_UCODE_REV, + MSR_IA32_ARCH_CAPABILITIES, }; static unsigned int num_msr_based_features; @@ -1023,7 +1024,8 @@ static int kvm_get_msr_feature(struct kvm_msr_entry *msr) { switch (msr->index) { case MSR_IA32_UCODE_REV: - rdmsrl(msr->index, msr->data); + case MSR_IA32_ARCH_CAPABILITIES: + rdmsrl_safe(msr->index, &msr->data); break; default: if (kvm_x86_ops->get_msr_feature(msr)) From ee782edd87b482e66cc283cc23d1e984792874e8 Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Sun, 5 Aug 2018 16:07:45 +0200 Subject: [PATCH 605/970] x86/speculation: Simplify sysfs report of VMX L1TF vulnerability commit ea156d192f5257a5bf393d33910d3b481bf8a401 upstream Three changes to the content of the sysfs file: - If EPT is disabled, L1TF cannot be exploited even across threads on the same core, and SMT is irrelevant. - If mitigation is completely disabled, and SMT is enabled, print "vulnerable" instead of "vulnerable, SMT vulnerable" - Reorder the two parts so that the main vulnerability state comes first and the detail on SMT is second. Signed-off-by: Paolo Bonzini Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/bugs.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index c366fdca7d4e..9da5517994a5 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -737,9 +737,15 @@ static ssize_t l1tf_show_state(char *buf) if (l1tf_vmx_mitigation == VMENTER_L1D_FLUSH_AUTO) return sprintf(buf, "%s\n", L1TF_DEFAULT_MSG); - return sprintf(buf, "%s; VMX: SMT %s, L1D %s\n", L1TF_DEFAULT_MSG, - cpu_smt_control == CPU_SMT_ENABLED ? "vulnerable" : "disabled", - l1tf_vmx_states[l1tf_vmx_mitigation]); + if (l1tf_vmx_mitigation == VMENTER_L1D_FLUSH_EPT_DISABLED || + (l1tf_vmx_mitigation == VMENTER_L1D_FLUSH_NEVER && + cpu_smt_control == CPU_SMT_ENABLED)) + return sprintf(buf, "%s; VMX: %s\n", L1TF_DEFAULT_MSG, + l1tf_vmx_states[l1tf_vmx_mitigation]); + + return sprintf(buf, "%s; VMX: %s, SMT %s\n", L1TF_DEFAULT_MSG, + l1tf_vmx_states[l1tf_vmx_mitigation], + cpu_smt_control == CPU_SMT_ENABLED ? "vulnerable" : "disabled"); } #else static ssize_t l1tf_show_state(char *buf) From 383f160027af7f3e3c32c2988980652e708a2119 Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Sun, 5 Aug 2018 16:07:46 +0200 Subject: [PATCH 606/970] x86/speculation: Use ARCH_CAPABILITIES to skip L1D flush on vmentry commit 8e0b2b916662e09dd4d09e5271cdf214c6b80e62 upstream Bit 3 of ARCH_CAPABILITIES tells a hypervisor that L1D flush on vmentry is not needed. Add a new value to enum vmx_l1d_flush_state, which is used either if there is no L1TF bug at all, or if bit 3 is set in ARCH_CAPABILITIES. Signed-off-by: Paolo Bonzini Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/msr-index.h | 1 + arch/x86/include/asm/vmx.h | 1 + arch/x86/kernel/cpu/bugs.c | 1 + arch/x86/kvm/vmx.c | 10 ++++++++++ 4 files changed, 13 insertions(+) diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index d49d3b5d359a..bbbb9b14ade1 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -63,6 +63,7 @@ #define MSR_IA32_ARCH_CAPABILITIES 0x0000010a #define ARCH_CAP_RDCL_NO (1 << 0) /* Not susceptible to Meltdown */ #define ARCH_CAP_IBRS_ALL (1 << 1) /* Enhanced IBRS support */ +#define ARCH_CAP_SKIP_VMENTRY_L1DFLUSH (1 << 3) /* Skip L1D flush on vmentry */ #define ARCH_CAP_SSB_NO (1 << 4) /* * Not susceptible to Speculative Store Bypass * attack, so no Speculative Store Bypass diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 7862cbf8cc70..72cacb027b98 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -505,6 +505,7 @@ enum vmx_l1d_flush_state { VMENTER_L1D_FLUSH_COND, VMENTER_L1D_FLUSH_ALWAYS, VMENTER_L1D_FLUSH_EPT_DISABLED, + VMENTER_L1D_FLUSH_NOT_REQUIRED, }; extern enum vmx_l1d_flush_state l1tf_vmx_mitigation; diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index 9da5517994a5..aa1bc4f88f82 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -730,6 +730,7 @@ static const char *l1tf_vmx_states[] = { [VMENTER_L1D_FLUSH_COND] = "conditional cache flushes", [VMENTER_L1D_FLUSH_ALWAYS] = "cache flushes", [VMENTER_L1D_FLUSH_EPT_DISABLED] = "EPT disabled", + [VMENTER_L1D_FLUSH_NOT_REQUIRED] = "flush not necessary" }; static ssize_t l1tf_show_state(char *buf) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 4596791dbb81..46f271803e6f 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -219,6 +219,16 @@ static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf) return 0; } + if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES)) { + u64 msr; + + rdmsrl(MSR_IA32_ARCH_CAPABILITIES, msr); + if (msr & ARCH_CAP_SKIP_VMENTRY_L1DFLUSH) { + l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_NOT_REQUIRED; + return 0; + } + } + /* If set to auto use the default l1tf mitigation method */ if (l1tf == VMENTER_L1D_FLUSH_AUTO) { switch (l1tf_mitigation) { From f56c8ee659c926bdba42c0d45405433e1a00eb2e Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Sun, 5 Aug 2018 16:07:47 +0200 Subject: [PATCH 607/970] KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry commit 5b76a3cff011df2dcb6186c965a2e4d809a05ad4 upstream When nested virtualization is in use, VMENTER operations from the nested hypervisor into the nested guest will always be processed by the bare metal hypervisor, and KVM's "conditional cache flushes" mode in particular does a flush on nested vmentry. Therefore, include the "skip L1D flush on vmentry" bit in KVM's suggested ARCH_CAPABILITIES setting. Add the relevant Documentation. Signed-off-by: Paolo Bonzini Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- Documentation/l1tf.rst | 21 +++++++++++++++++++++ arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/vmx.c | 3 +-- arch/x86/kvm/x86.c | 26 +++++++++++++++++++++++++- 4 files changed, 48 insertions(+), 3 deletions(-) diff --git a/Documentation/l1tf.rst b/Documentation/l1tf.rst index 5dadb4503ec9..bae52b845de0 100644 --- a/Documentation/l1tf.rst +++ b/Documentation/l1tf.rst @@ -546,6 +546,27 @@ available: EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter. +3.4. Nested virtual machines +"""""""""""""""""""""""""""" + +When nested virtualization is in use, three operating systems are involved: +the bare metal hypervisor, the nested hypervisor and the nested virtual +machine. VMENTER operations from the nested hypervisor into the nested +guest will always be processed by the bare metal hypervisor. If KVM is the +bare metal hypervisor it wiil: + + - Flush the L1D cache on every switch from the nested hypervisor to the + nested virtual machine, so that the nested hypervisor's secrets are not + exposed to the nested virtual machine; + + - Flush the L1D cache on every switch from the nested virtual machine to + the nested hypervisor; this is a complex operation, and flushing the L1D + cache avoids that the bare metal hypervisor's secrets are exposed to the + nested virtual machine; + + - Instruct the nested hypervisor to not perform any L1D cache flush. This + is an optimization to avoid double L1D flushing. + .. _default_mitigations: diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 631c3f59109a..22a0ccb17ad0 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1346,6 +1346,7 @@ void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu); void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm, unsigned long address); +u64 kvm_get_arch_capabilities(void); void kvm_define_shared_msr(unsigned index, u32 msr); int kvm_set_shared_msr(unsigned index, u64 val, u64 mask); diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 46f271803e6f..12826607a995 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -5461,8 +5461,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx) ++vmx->nmsrs; } - if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES)) - rdmsrl(MSR_IA32_ARCH_CAPABILITIES, vmx->arch_capabilities); + vmx->arch_capabilities = kvm_get_arch_capabilities(); vm_exit_controls_init(vmx, vmcs_config.vmexit_ctrl); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 62593b740c82..203d42340fc1 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1020,11 +1020,35 @@ static u32 msr_based_features[] = { static unsigned int num_msr_based_features; +u64 kvm_get_arch_capabilities(void) +{ + u64 data; + + rdmsrl_safe(MSR_IA32_ARCH_CAPABILITIES, &data); + + /* + * If we're doing cache flushes (either "always" or "cond") + * we will do one whenever the guest does a vmlaunch/vmresume. + * If an outer hypervisor is doing the cache flush for us + * (VMENTER_L1D_FLUSH_NESTED_VM), we can safely pass that + * capability to the guest too, and if EPT is disabled we're not + * vulnerable. Overall, only VMENTER_L1D_FLUSH_NEVER will + * require a nested hypervisor to do a flush of its own. + */ + if (l1tf_vmx_mitigation != VMENTER_L1D_FLUSH_NEVER) + data |= ARCH_CAP_SKIP_VMENTRY_L1DFLUSH; + + return data; +} +EXPORT_SYMBOL_GPL(kvm_get_arch_capabilities); + static int kvm_get_msr_feature(struct kvm_msr_entry *msr) { switch (msr->index) { - case MSR_IA32_UCODE_REV: case MSR_IA32_ARCH_CAPABILITIES: + msr->data = kvm_get_arch_capabilities(); + break; + case MSR_IA32_UCODE_REV: rdmsrl_safe(msr->index, &msr->data); break; default: From c504b9fce7ba7a2ff96f857d609c69e291553ef0 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Tue, 7 Aug 2018 08:19:57 +0200 Subject: [PATCH 608/970] cpu/hotplug: Fix SMT supported evaluation commit bc2d8d262cba5736332cbc866acb11b1c5748aa9 upstream Josh reported that the late SMT evaluation in cpu_smt_state_init() sets cpu_smt_control to CPU_SMT_NOT_SUPPORTED in case that 'nosmt' was supplied on the kernel command line as it cannot differentiate between SMT disabled by BIOS and SMT soft disable via 'nosmt'. That wreckages the state and makes the sysfs interface unusable. Rework this so that during bringup of the non boot CPUs the availability of SMT is determined in cpu_smt_allowed(). If a newly booted CPU is not a 'primary' thread then set the local cpu_smt_available marker and evaluate this explicitely right after the initial SMP bringup has finished. SMT evaulation on x86 is a trainwreck as the firmware has all the information _before_ booting the kernel, but there is no interface to query it. Fixes: 73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS") Reported-by: Josh Poimboeuf Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/bugs.c | 2 +- include/linux/cpu.h | 2 ++ kernel/cpu.c | 41 ++++++++++++++++++++++++++------------ kernel/smp.c | 2 ++ 4 files changed, 33 insertions(+), 14 deletions(-) diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index aa1bc4f88f82..5229eaf73828 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -61,7 +61,7 @@ void __init check_bugs(void) * identify_boot_cpu() initialized SMT support information, let the * core code know. */ - cpu_smt_check_topology(); + cpu_smt_check_topology_early(); if (!IS_ENABLED(CONFIG_SMP)) { pr_info("CPU: "); diff --git a/include/linux/cpu.h b/include/linux/cpu.h index d059c67e4658..ae5ac89324df 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -267,10 +267,12 @@ enum cpuhp_smt_control { #if defined(CONFIG_SMP) && defined(CONFIG_HOTPLUG_SMT) extern enum cpuhp_smt_control cpu_smt_control; extern void cpu_smt_disable(bool force); +extern void cpu_smt_check_topology_early(void); extern void cpu_smt_check_topology(void); #else # define cpu_smt_control (CPU_SMT_ENABLED) static inline void cpu_smt_disable(bool force) { } +static inline void cpu_smt_check_topology_early(void) { } static inline void cpu_smt_check_topology(void) { } #endif diff --git a/kernel/cpu.c b/kernel/cpu.c index 3b3a3d782c0f..58f55c2815cd 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -360,6 +360,8 @@ EXPORT_SYMBOL_GPL(cpu_hotplug_enable); enum cpuhp_smt_control cpu_smt_control __read_mostly = CPU_SMT_ENABLED; EXPORT_SYMBOL_GPL(cpu_smt_control); +static bool cpu_smt_available __read_mostly; + void __init cpu_smt_disable(bool force) { if (cpu_smt_control == CPU_SMT_FORCE_DISABLED || @@ -376,11 +378,25 @@ void __init cpu_smt_disable(bool force) /* * The decision whether SMT is supported can only be done after the full - * CPU identification. Called from architecture code. + * CPU identification. Called from architecture code before non boot CPUs + * are brought up. + */ +void __init cpu_smt_check_topology_early(void) +{ + if (!topology_smt_supported()) + cpu_smt_control = CPU_SMT_NOT_SUPPORTED; +} + +/* + * If SMT was disabled by BIOS, detect it here, after the CPUs have been + * brought online. This ensures the smt/l1tf sysfs entries are consistent + * with reality. cpu_smt_available is set to true during the bringup of non + * boot CPUs when a SMT sibling is detected. Note, this may overwrite + * cpu_smt_control's previous setting. */ void __init cpu_smt_check_topology(void) { - if (!topology_smt_supported()) + if (!cpu_smt_available) cpu_smt_control = CPU_SMT_NOT_SUPPORTED; } @@ -393,10 +409,18 @@ early_param("nosmt", smt_cmdline_disable); static inline bool cpu_smt_allowed(unsigned int cpu) { - if (cpu_smt_control == CPU_SMT_ENABLED) + if (topology_is_primary_thread(cpu)) return true; - if (topology_is_primary_thread(cpu)) + /* + * If the CPU is not a 'primary' thread and the booted_once bit is + * set then the processor has SMT support. Store this information + * for the late check of SMT support in cpu_smt_check_topology(). + */ + if (per_cpu(cpuhp_state, cpu).booted_once) + cpu_smt_available = true; + + if (cpu_smt_control == CPU_SMT_ENABLED) return true; /* @@ -2063,15 +2087,6 @@ static const struct attribute_group cpuhp_smt_attr_group = { static int __init cpu_smt_state_init(void) { - /* - * If SMT was disabled by BIOS, detect it here, after the CPUs have - * been brought online. This ensures the smt/l1tf sysfs entries are - * consistent with reality. Note this may overwrite cpu_smt_control's - * previous setting. - */ - if (topology_max_smt_threads() == 1) - cpu_smt_control = CPU_SMT_NOT_SUPPORTED; - return sysfs_create_group(&cpu_subsys.dev_root->kobj, &cpuhp_smt_attr_group); } diff --git a/kernel/smp.c b/kernel/smp.c index bba3b201668d..399905fdfa3f 100644 --- a/kernel/smp.c +++ b/kernel/smp.c @@ -564,6 +564,8 @@ void __init smp_init(void) cpu_up(cpu); } + /* Final decision about SMT support */ + cpu_smt_check_topology(); /* Any cleanup work */ smp_announce(); smp_cpus_done(setup_max_cpus); From 4656dfb6b5ddb2c7e6120b8a8d0b144445bf5914 Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Tue, 7 Aug 2018 15:09:36 -0700 Subject: [PATCH 609/970] x86/speculation/l1tf: Invert all not present mappings commit f22cc87f6c1f771b57c407555cfefd811cdd9507 upstream For kernel mappings PAGE_PROTNONE is not necessarily set for a non present mapping, but the inversion logic explicitely checks for !PRESENT and PROT_NONE. Remove the PROT_NONE check and make the inversion unconditional for all not present mappings. Signed-off-by: Andi Kleen Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/pgtable-invert.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/include/asm/pgtable-invert.h b/arch/x86/include/asm/pgtable-invert.h index 177564187fc0..44b1203ece12 100644 --- a/arch/x86/include/asm/pgtable-invert.h +++ b/arch/x86/include/asm/pgtable-invert.h @@ -6,7 +6,7 @@ static inline bool __pte_needs_invert(u64 val) { - return (val & (_PAGE_PRESENT|_PAGE_PROTNONE)) == _PAGE_PROTNONE; + return !(val & _PAGE_PRESENT); } /* Get a mask to xor with the page table entry to get the correct pfn. */ From 5ebf3f8d5b56412973ca3f2363dae52f795c6700 Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Tue, 7 Aug 2018 15:09:37 -0700 Subject: [PATCH 610/970] x86/speculation/l1tf: Make pmd/pud_mknotpresent() invert commit 0768f91530ff46683e0b372df14fd79fe8d156e5 upstream Some cases in THP like: - MADV_FREE - mprotect - split mark the PMD non present for temporarily to prevent races. The window for an L1TF attack in these contexts is very small, but it wants to be fixed for correctness sake. Use the proper low level functions for pmd/pud_mknotpresent() to address this. Signed-off-by: Andi Kleen Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/pgtable.h | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 4e063582f1f0..47b37affddcf 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -350,11 +350,6 @@ static inline pmd_t pmd_mkwrite(pmd_t pmd) return pmd_set_flags(pmd, _PAGE_RW); } -static inline pmd_t pmd_mknotpresent(pmd_t pmd) -{ - return pmd_clear_flags(pmd, _PAGE_PRESENT | _PAGE_PROTNONE); -} - #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY static inline int pte_soft_dirty(pte_t pte) { @@ -418,6 +413,12 @@ static inline pmd_t pfn_pmd(unsigned long page_nr, pgprot_t pgprot) return __pmd(pfn | massage_pgprot(pgprot)); } +static inline pmd_t pmd_mknotpresent(pmd_t pmd) +{ + return pfn_pmd(pmd_pfn(pmd), + __pgprot(pmd_flags(pmd) & ~(_PAGE_PRESENT|_PAGE_PROTNONE))); +} + static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask); static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) From 7e464373357dd6ff33a1a7373d5e596ed1dbb219 Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Tue, 7 Aug 2018 15:09:39 -0700 Subject: [PATCH 611/970] x86/mm/pat: Make set_memory_np() L1TF safe commit 958f79b9ee55dfaf00c8106ed1c22a2919e0028b upstream set_memory_np() is used to mark kernel mappings not present, but it has it's own open coded mechanism which does not have the L1TF protection of inverting the address bits. Replace the open coded PTE manipulation with the L1TF protecting low level PTE routines. Passes the CPA self test. Signed-off-by: Andi Kleen Signed-off-by: Thomas Gleixner [ dwmw2: Pull in pud_mkhuge() from commit a00cc7d9dd, and pfn_pud() ] Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/pgtable.h | 27 +++++++++++++++++++++++++++ arch/x86/mm/pageattr.c | 8 ++++---- 2 files changed, 31 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 47b37affddcf..5008be1ab183 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -413,12 +413,39 @@ static inline pmd_t pfn_pmd(unsigned long page_nr, pgprot_t pgprot) return __pmd(pfn | massage_pgprot(pgprot)); } +static inline pud_t pfn_pud(unsigned long page_nr, pgprot_t pgprot) +{ + phys_addr_t pfn = page_nr << PAGE_SHIFT; + pfn ^= protnone_mask(pgprot_val(pgprot)); + pfn &= PHYSICAL_PUD_PAGE_MASK; + return __pud(pfn | massage_pgprot(pgprot)); +} + static inline pmd_t pmd_mknotpresent(pmd_t pmd) { return pfn_pmd(pmd_pfn(pmd), __pgprot(pmd_flags(pmd) & ~(_PAGE_PRESENT|_PAGE_PROTNONE))); } +static inline pud_t pud_set_flags(pud_t pud, pudval_t set) +{ + pudval_t v = native_pud_val(pud); + + return __pud(v | set); +} + +static inline pud_t pud_clear_flags(pud_t pud, pudval_t clear) +{ + pudval_t v = native_pud_val(pud); + + return __pud(v & ~clear); +} + +static inline pud_t pud_mkhuge(pud_t pud) +{ + return pud_set_flags(pud, _PAGE_PSE); +} + static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask); static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c index dcd671467154..1271bc9fa3c6 100644 --- a/arch/x86/mm/pageattr.c +++ b/arch/x86/mm/pageattr.c @@ -1001,8 +1001,8 @@ static long populate_pmd(struct cpa_data *cpa, pmd = pmd_offset(pud, start); - set_pmd(pmd, __pmd(cpa->pfn << PAGE_SHIFT | _PAGE_PSE | - massage_pgprot(pmd_pgprot))); + set_pmd(pmd, pmd_mkhuge(pfn_pmd(cpa->pfn, + canon_pgprot(pmd_pgprot)))); start += PMD_SIZE; cpa->pfn += PMD_SIZE >> PAGE_SHIFT; @@ -1074,8 +1074,8 @@ static long populate_pud(struct cpa_data *cpa, unsigned long start, pgd_t *pgd, * Map everything starting from the Gb boundary, possibly with 1G pages */ while (boot_cpu_has(X86_FEATURE_GBPAGES) && end - start >= PUD_SIZE) { - set_pud(pud, __pud(cpa->pfn << PAGE_SHIFT | _PAGE_PSE | - massage_pgprot(pud_pgprot))); + set_pud(pud, pud_mkhuge(pfn_pud(cpa->pfn, + canon_pgprot(pud_pgprot)))); start += PUD_SIZE; cpa->pfn += PUD_SIZE >> PAGE_SHIFT; From e79d049743f1466084df5708cd0e052b0d586548 Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Tue, 7 Aug 2018 15:09:38 -0700 Subject: [PATCH 612/970] x86/mm/kmmio: Make the tracer robust against L1TF commit 1063711b57393c1999248cccb57bebfaf16739e7 upstream The mmio tracer sets io mapping PTEs and PMDs to non present when enabled without inverting the address bits, which makes the PTE entry vulnerable for L1TF. Make it use the right low level macros to actually invert the address bits to protect against L1TF. In principle this could be avoided because MMIO tracing is not likely to be enabled on production machines, but the fix is straigt forward and for consistency sake it's better to get rid of the open coded PTE manipulation. Signed-off-by: Andi Kleen Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/mm/kmmio.c | 25 +++++++++++++++---------- 1 file changed, 15 insertions(+), 10 deletions(-) diff --git a/arch/x86/mm/kmmio.c b/arch/x86/mm/kmmio.c index cadb82be5f36..c695272d89be 100644 --- a/arch/x86/mm/kmmio.c +++ b/arch/x86/mm/kmmio.c @@ -125,24 +125,29 @@ static struct kmmio_fault_page *get_kmmio_fault_page(unsigned long addr) static void clear_pmd_presence(pmd_t *pmd, bool clear, pmdval_t *old) { + pmd_t new_pmd; pmdval_t v = pmd_val(*pmd); if (clear) { - *old = v & _PAGE_PRESENT; - v &= ~_PAGE_PRESENT; - } else /* presume this has been called with clear==true previously */ - v |= *old; - set_pmd(pmd, __pmd(v)); + *old = v; + new_pmd = pmd_mknotpresent(*pmd); + } else { + /* Presume this has been called with clear==true previously */ + new_pmd = __pmd(*old); + } + set_pmd(pmd, new_pmd); } static void clear_pte_presence(pte_t *pte, bool clear, pteval_t *old) { pteval_t v = pte_val(*pte); if (clear) { - *old = v & _PAGE_PRESENT; - v &= ~_PAGE_PRESENT; - } else /* presume this has been called with clear==true previously */ - v |= *old; - set_pte_atomic(pte, __pte(v)); + *old = v; + /* Nothing should care about address */ + pte_clear(&init_mm, 0, pte); + } else { + /* Presume this has been called with clear==true previously */ + set_pte_atomic(pte, __pte(*old)); + } } static int clear_page_presence(struct kmmio_fault_page *f, bool clear) From d21c27185b6f2c32d4b029d1b5c0661702099baf Mon Sep 17 00:00:00 2001 From: David Woodhouse Date: Wed, 8 Aug 2018 11:00:16 +0100 Subject: [PATCH 613/970] tools headers: Synchronise x86 cpufeatures.h for L1TF additions commit e24f14b0ff985f3e09e573ba1134bfdf42987e05 upstream Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- tools/arch/x86/include/asm/cpufeatures.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h index aea30afeddb8..fbc1474960e3 100644 --- a/tools/arch/x86/include/asm/cpufeatures.h +++ b/tools/arch/x86/include/asm/cpufeatures.h @@ -213,7 +213,7 @@ #define X86_FEATURE_IBPB ( 7*32+26) /* Indirect Branch Prediction Barrier */ #define X86_FEATURE_STIBP ( 7*32+27) /* Single Thread Indirect Branch Predictors */ #define X86_FEATURE_ZEN ( 7*32+28) /* "" CPU is AMD family 0x17 (Zen) */ - +#define X86_FEATURE_L1TF_PTEINV ( 7*32+29) /* "" L1TF workaround PTE inversion */ /* Virtualization flags: Linux defined, word 8 */ #define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */ @@ -317,6 +317,7 @@ #define X86_FEATURE_PCONFIG (18*32+18) /* Intel PCONFIG */ #define X86_FEATURE_SPEC_CTRL (18*32+26) /* "" Speculation Control (IBRS + IBPB) */ #define X86_FEATURE_INTEL_STIBP (18*32+27) /* "" Single Thread Indirect Branch Predictors */ +#define X86_FEATURE_FLUSH_L1D (18*32+28) /* Flush L1D cache */ #define X86_FEATURE_ARCH_CAPABILITIES (18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */ #define X86_FEATURE_SPEC_CTRL_SSBD (18*32+31) /* "" Speculative Store Bypass Disable */ @@ -349,5 +350,6 @@ #define X86_BUG_SPECTRE_V1 X86_BUG(15) /* CPU is affected by Spectre variant 1 attack with conditional branches */ #define X86_BUG_SPECTRE_V2 X86_BUG(16) /* CPU is affected by Spectre variant 2 attack with indirect branches */ #define X86_BUG_SPEC_STORE_BYPASS X86_BUG(17) /* CPU is affected by speculative store bypass attack */ +#define X86_BUG_L1TF X86_BUG(18) /* CPU is affected by L1 Terminal Fault */ #endif /* _ASM_X86_CPUFEATURES_H */ From 760f9488c13b7d2da69b152a55069e0267ca1477 Mon Sep 17 00:00:00 2001 From: Ashok Raj Date: Wed, 28 Feb 2018 11:28:43 +0100 Subject: [PATCH 614/970] x86/microcode: Do not upload microcode if CPUs are offline commit 30ec26da9967d0d785abc24073129a34c3211777 upstream. Avoid loading microcode if any of the CPUs are offline, and issue a warning. Having different microcode revisions on the system at any time is outright dangerous. [ Borislav: Massage changelog. ] Signed-off-by: Ashok Raj Signed-off-by: Borislav Petkov Signed-off-by: Thomas Gleixner Tested-by: Tom Lendacky Tested-by: Ashok Raj Reviewed-by: Tom Lendacky Cc: Arjan Van De Ven Link: http://lkml.kernel.org/r/1519352533-15992-4-git-send-email-ashok.raj@intel.com Link: https://lkml.kernel.org/r/20180228102846.13447-5-bp@alien8.de Signed-off-by: Greg Kroah-Hartman Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/microcode/core.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c index 0afaf00b029b..2da1c6aaf800 100644 --- a/arch/x86/kernel/cpu/microcode/core.c +++ b/arch/x86/kernel/cpu/microcode/core.c @@ -384,6 +384,16 @@ static void __exit microcode_dev_exit(void) /* fake device for request_firmware */ static struct platform_device *microcode_pdev; +static int check_online_cpus(void) +{ + if (num_online_cpus() == num_present_cpus()) + return 0; + + pr_err("Not all CPUs online, aborting microcode update.\n"); + + return -EINVAL; +} + static int reload_for_cpu(int cpu) { struct ucode_cpu_info *uci = ucode_cpu_info + cpu; @@ -418,7 +428,13 @@ static ssize_t reload_store(struct device *dev, return size; get_online_cpus(); + + ret = check_online_cpus(); + if (ret) + goto put; + mutex_lock(µcode_mutex); + for_each_online_cpu(cpu) { tmp_ret = reload_for_cpu(cpu); if (tmp_ret != 0) @@ -431,6 +447,8 @@ static ssize_t reload_store(struct device *dev, if (!ret) perf_check_microcode(); mutex_unlock(µcode_mutex); + +put: put_online_cpus(); if (!ret) From da540c063b06b18f77168c8a52ee5a9c783a7481 Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Fri, 10 Aug 2018 08:31:10 +0100 Subject: [PATCH 615/970] x86/microcode: Allow late microcode loading with SMT disabled commit 07d981ad4cf1e78361c6db1c28ee5ba105f96cc1 upstream The kernel unnecessarily prevents late microcode loading when SMT is disabled. It should be safe to allow it if all the primary threads are online. Signed-off-by: Josh Poimboeuf Acked-by: Borislav Petkov Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/microcode/core.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c index 2da1c6aaf800..b53a6579767d 100644 --- a/arch/x86/kernel/cpu/microcode/core.c +++ b/arch/x86/kernel/cpu/microcode/core.c @@ -386,12 +386,20 @@ static struct platform_device *microcode_pdev; static int check_online_cpus(void) { - if (num_online_cpus() == num_present_cpus()) - return 0; + unsigned int cpu; - pr_err("Not all CPUs online, aborting microcode update.\n"); + /* + * Make sure all CPUs are online. It's fine for SMT to be disabled if + * all the primary threads are still online. + */ + for_each_present_cpu(cpu) { + if (topology_is_primary_thread(cpu) && !cpu_online(cpu)) { + pr_err("Not all CPUs online, aborting microcode update.\n"); + return -EINVAL; + } + } - return -EINVAL; + return 0; } static int reload_for_cpu(int cpu) From 59a6e1f27602b24f7919e188ff54561e0653620b Mon Sep 17 00:00:00 2001 From: Vlastimil Babka Date: Tue, 14 Aug 2018 23:38:57 +0200 Subject: [PATCH 616/970] x86/smp: fix non-SMP broken build due to redefinition of apic_id_is_primary_thread MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit d0055f351e647f33f3b0329bff022213bf8aa085 upstream. The function has an inline "return false;" definition with CONFIG_SMP=n but the "real" definition is also visible leading to "redefinition of ‘apic_id_is_primary_thread’" compiler error. Guard it with #ifdef CONFIG_SMP Signed-off-by: Vlastimil Babka Fixes: 6a4d2657e048 ("x86/smp: Provide topology_is_primary_thread()") Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/apic/apic.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index e85080b27231..4f2af1ee09cb 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -2043,6 +2043,7 @@ static int cpuid_to_apicid[] = { [0 ... NR_CPUS - 1] = -1, }; +#ifdef CONFIG_SMP /** * apic_id_is_primary_thread - Check whether APIC ID belongs to a primary thread * @id: APIC ID to check @@ -2057,6 +2058,7 @@ bool apic_id_is_primary_thread(unsigned int apicid) mask = (1U << (fls(smp_num_siblings) - 1)) - 1; return !(apicid & mask); } +#endif /* * Should use this API to allocate logical CPU IDs to keep nr_logical_cpuids From aee0861fbe95f2311c81b8162bbd1eb196cdf5f2 Mon Sep 17 00:00:00 2001 From: Abel Vesa Date: Wed, 15 Aug 2018 00:26:00 +0300 Subject: [PATCH 617/970] cpu/hotplug: Non-SMP machines do not make use of booted_once commit 269777aa530f3438ec1781586cdac0b5fe47b061 upstream. Commit 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once") breaks non-SMP builds. [ I suspect the 'bool' fields should just be made to be bitfields and be exposed regardless of configuration, but that's a separate cleanup that I'll leave to the owners of this file for later. - Linus ] Fixes: 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once") Cc: Dave Hansen Cc: Thomas Gleixner Cc: Tony Luck Signed-off-by: Abel Vesa Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- kernel/cpu.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/cpu.c b/kernel/cpu.c index 58f55c2815cd..b5a0165b7300 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -2201,6 +2201,8 @@ void __init boot_cpu_init(void) */ void __init boot_cpu_hotplug_init(void) { +#ifdef CONFIG_SMP this_cpu_write(cpuhp_state.booted_once, true); +#endif this_cpu_write(cpuhp_state.state, CPUHP_ONLINE); } From 16848eb10e9e0989e5898dec204f0967c483f044 Mon Sep 17 00:00:00 2001 From: Vlastimil Babka Date: Tue, 14 Aug 2018 20:50:47 +0200 Subject: [PATCH 618/970] x86/init: fix build with CONFIG_SWAP=n commit 792adb90fa724ce07c0171cbc96b9215af4b1045 upstream. The introduction of generic_max_swapfile_size and arch-specific versions has broken linking on x86 with CONFIG_SWAP=n due to undefined reference to 'generic_max_swapfile_size'. Fix it by compiling the x86-specific max_swapfile_size() only with CONFIG_SWAP=y. Reported-by: Tomas Pruzina Fixes: 377eeaa8e11f ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2") Signed-off-by: Vlastimil Babka Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- arch/x86/mm/init.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index 4393af3c2bd5..5d35b555115a 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -783,6 +783,7 @@ void update_cache_mode_entry(unsigned entry, enum page_cache_mode cache) __pte2cachemode_tbl[entry] = cache; } +#ifdef CONFIG_SWAP unsigned long max_swapfile_size(void) { unsigned long pages; @@ -803,3 +804,4 @@ unsigned long max_swapfile_size(void) } return pages; } +#endif From b4f17de89e7aaecfc67a173ca8607899ee8707c3 Mon Sep 17 00:00:00 2001 From: Jiri Kosina Date: Sat, 14 Jul 2018 21:56:13 +0200 Subject: [PATCH 619/970] x86/speculation/l1tf: Unbreak !__HAVE_ARCH_PFN_MODIFY_ALLOWED architectures commit 6c26fcd2abfe0a56bbd95271fce02df2896cfd24 upstream. pfn_modify_allowed() and arch_has_pfn_modify_check() are outside of the !__ASSEMBLY__ section in include/asm-generic/pgtable.h, which confuses assembler on archs that don't have __HAVE_ARCH_PFN_MODIFY_ALLOWED (e.g. ia64) and breaks build: include/asm-generic/pgtable.h: Assembler messages: include/asm-generic/pgtable.h:538: Error: Unknown opcode `static inline bool pfn_modify_allowed(unsigned long pfn,pgprot_t prot)' include/asm-generic/pgtable.h:540: Error: Unknown opcode `return true' include/asm-generic/pgtable.h:543: Error: Unknown opcode `static inline bool arch_has_pfn_modify_check(void)' include/asm-generic/pgtable.h:545: Error: Unknown opcode `return false' arch/ia64/kernel/entry.S:69: Error: `mov' does not fit into bundle Move those two static inlines into the !__ASSEMBLY__ section so that they don't confuse the asm build pass. Fixes: 42e4089c7890 ("x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings") Signed-off-by: Jiri Kosina Signed-off-by: Thomas Gleixner [groeck: Context changes] Signed-off-by: Guenter Roeck Signed-off-by: Greg Kroah-Hartman --- include/asm-generic/pgtable.h | 25 +++++++++++++------------ 1 file changed, 13 insertions(+), 12 deletions(-) diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h index 0ffe4051ae2e..a88ea9e37a25 100644 --- a/include/asm-generic/pgtable.h +++ b/include/asm-generic/pgtable.h @@ -828,6 +828,19 @@ static inline int pmd_free_pte_page(pmd_t *pmd) struct file; int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn, unsigned long size, pgprot_t *vma_prot); + +#ifndef __HAVE_ARCH_PFN_MODIFY_ALLOWED +static inline bool pfn_modify_allowed(unsigned long pfn, pgprot_t prot) +{ + return true; +} + +static inline bool arch_has_pfn_modify_check(void) +{ + return false; +} +#endif /* !_HAVE_ARCH_PFN_MODIFY_ALLOWED */ + #endif /* !__ASSEMBLY__ */ #ifndef io_remap_pfn_range @@ -842,16 +855,4 @@ int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn, #endif #endif -#ifndef __HAVE_ARCH_PFN_MODIFY_ALLOWED -static inline bool pfn_modify_allowed(unsigned long pfn, pgprot_t prot) -{ - return true; -} - -static inline bool arch_has_pfn_modify_check(void) -{ - return false; -} -#endif - #endif /* _ASM_GENERIC_PGTABLE_H */ From 4edf4ad2e7ee7d527fd8288c22d6ee608eae705c Mon Sep 17 00:00:00 2001 From: Suravee Suthikulpanit Date: Mon, 31 Jul 2017 10:51:58 +0200 Subject: [PATCH 620/970] x86/cpu/amd: Limit cpu_core_id fixup to families older than F17h commit b89b41d0b8414690ec0030c134b8bde209e6d06c upstream. Current cpu_core_id fixup causes downcored F17h configurations to be incorrect: NODE: 0 processor 0 core id : 0 processor 1 core id : 1 processor 2 core id : 2 processor 3 core id : 4 processor 4 core id : 5 processor 5 core id : 0 NODE: 1 processor 6 core id : 2 processor 7 core id : 3 processor 8 core id : 4 processor 9 core id : 0 processor 10 core id : 1 processor 11 core id : 2 Code that relies on the cpu_core_id, like match_smt(), for example, which builds the thread siblings masks used by the scheduler, is mislead. So, limit the fixup to pre-F17h machines. The new value for cpu_core_id for F17h and later will represent the CPUID_Fn8000001E_EBX[CoreId], which is guaranteed to be unique for each core within a socket. This way we have: NODE: 0 processor 0 core id : 0 processor 1 core id : 1 processor 2 core id : 2 processor 3 core id : 4 processor 4 core id : 5 processor 5 core id : 6 NODE: 1 processor 6 core id : 8 processor 7 core id : 9 processor 8 core id : 10 processor 9 core id : 12 processor 10 core id : 13 processor 11 core id : 14 Signed-off-by: Suravee Suthikulpanit [ Heavily massaged. ] Signed-off-by: Borislav Petkov Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Yazen Ghannam Link: http://lkml.kernel.org/r/20170731085159.9455-2-bp@alien8.de Signed-off-by: Ingo Molnar Signed-off-by: Guenter Roeck Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/amd.c | 24 +++++++++++++++++------- 1 file changed, 17 insertions(+), 7 deletions(-) diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index 6de596449488..e864ff6cd8bd 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -304,6 +304,22 @@ static void amd_get_topology_early(struct cpuinfo_x86 *c) smp_num_siblings = ((cpuid_ebx(0x8000001e) >> 8) & 0xff) + 1; } +/* + * Fix up cpu_core_id for pre-F17h systems to be in the + * [0 .. cores_per_node - 1] range. Not really needed but + * kept so as not to break existing setups. + */ +static void legacy_fixup_core_id(struct cpuinfo_x86 *c) +{ + u32 cus_per_node; + + if (c->x86 >= 0x17) + return; + + cus_per_node = c->x86_max_cores / nodes_per_socket; + c->cpu_core_id %= cus_per_node; +} + /* * Fixup core topology information for * (1) AMD multi-node processors @@ -359,15 +375,9 @@ static void amd_get_topology(struct cpuinfo_x86 *c) } else return; - /* fixup multi-node processor information */ if (nodes_per_socket > 1) { - u32 cus_per_node; - set_cpu_cap(c, X86_FEATURE_AMD_DCM); - cus_per_node = c->x86_max_cores / nodes_per_socket; - - /* core id has to be in the [0 .. cores_per_node - 1] range */ - c->cpu_core_id %= cus_per_node; + legacy_fixup_core_id(c); } } #endif From 7f5d090ffe9e7603265e7991aacec64d86cf70ab Mon Sep 17 00:00:00 2001 From: Borislav Petkov Date: Fri, 27 Apr 2018 16:34:34 -0500 Subject: [PATCH 621/970] x86/CPU/AMD: Have smp_num_siblings and cpu_llc_id always be present commit f8b64d08dde2714c62751d18ba77f4aeceb161d3 upstream. Move smp_num_siblings and cpu_llc_id to cpu/common.c so that they're always present as symbols and not only in the CONFIG_SMP case. Then, other code using them doesn't need ugly ifdeffery anymore. Get rid of some ifdeffery. Signed-off-by: Borislav Petkov Signed-off-by: Suravee Suthikulpanit Signed-off-by: Borislav Petkov Signed-off-by: Thomas Gleixner Link: http://lkml.kernel.org/r/1524864877-111962-2-git-send-email-suravee.suthikulpanit@amd.com Signed-off-by: Guenter Roeck Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/smp.h | 1 - arch/x86/kernel/cpu/amd.c | 11 +---------- arch/x86/kernel/cpu/common.c | 7 +++++++ arch/x86/kernel/smpboot.c | 7 ------- 4 files changed, 8 insertions(+), 18 deletions(-) diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h index 026ea82ecc60..d25fb6beb2f0 100644 --- a/arch/x86/include/asm/smp.h +++ b/arch/x86/include/asm/smp.h @@ -156,7 +156,6 @@ static inline int wbinvd_on_all_cpus(void) wbinvd(); return 0; } -#define smp_num_siblings 1 #endif /* CONFIG_SMP */ extern unsigned disabled_cpus; diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index e864ff6cd8bd..4c2648b96c9a 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -296,8 +296,6 @@ static int nearby_node(int apicid) } #endif -#ifdef CONFIG_SMP - static void amd_get_topology_early(struct cpuinfo_x86 *c) { if (cpu_has(c, X86_FEATURE_TOPOEXT)) @@ -380,7 +378,6 @@ static void amd_get_topology(struct cpuinfo_x86 *c) legacy_fixup_core_id(c); } } -#endif /* * On a AMD dual core setup the lower bits of the APIC id distinguish the cores. @@ -388,7 +385,6 @@ static void amd_get_topology(struct cpuinfo_x86 *c) */ static void amd_detect_cmp(struct cpuinfo_x86 *c) { -#ifdef CONFIG_SMP unsigned bits; int cpu = smp_processor_id(); @@ -400,16 +396,11 @@ static void amd_detect_cmp(struct cpuinfo_x86 *c) /* use socket ID also for last level cache */ per_cpu(cpu_llc_id, cpu) = c->phys_proc_id; amd_get_topology(c); -#endif } u16 amd_get_nb_id(int cpu) { - u16 id = 0; -#ifdef CONFIG_SMP - id = per_cpu(cpu_llc_id, cpu); -#endif - return id; + return per_cpu(cpu_llc_id, cpu); } EXPORT_SYMBOL_GPL(amd_get_nb_id); diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index bfd00fabe17b..13471b71bec7 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -61,6 +61,13 @@ cpumask_var_t cpu_callin_mask; /* representing cpus for which sibling maps can be computed */ cpumask_var_t cpu_sibling_setup_mask; +/* Number of siblings per CPU package */ +int smp_num_siblings = 1; +EXPORT_SYMBOL(smp_num_siblings); + +/* Last level cache ID of each logical CPU */ +DEFINE_PER_CPU_READ_MOSTLY(u16, cpu_llc_id) = BAD_APICID; + /* correctly size the local cpu masks */ void __init setup_cpu_local_masks(void) { diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 8e0fbce6dfdd..ef38bc1d1c00 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -78,13 +78,6 @@ #include #include -/* Number of siblings per CPU package */ -int smp_num_siblings = 1; -EXPORT_SYMBOL(smp_num_siblings); - -/* Last level cache ID of each logical CPU */ -DEFINE_PER_CPU_READ_MOSTLY(u16, cpu_llc_id) = BAD_APICID; - /* representing HT siblings of each logical CPU */ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map); EXPORT_PER_CPU_SYMBOL(cpu_sibling_map); From 93e02ae4200184bab43ce29966e895826a756a37 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Wed, 15 Aug 2018 18:14:55 +0200 Subject: [PATCH 622/970] Linux 4.9.120 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 0723bbe1d4a7..fea2fe577185 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 4 PATCHLEVEL = 9 -SUBLEVEL = 119 +SUBLEVEL = 120 EXTRAVERSION = NAME = Roaring Lionus From 61341a364d5594352f6e11b2254cb2e540d0c476 Mon Sep 17 00:00:00 2001 From: Guenter Roeck Date: Wed, 15 Aug 2018 08:38:33 -0700 Subject: [PATCH 623/970] x86/l1tf: Fix build error seen if CONFIG_KVM_INTEL is disabled commit 1eb46908b35dfbac0ec1848d4b1e39667e0187e9 upstream. allmodconfig+CONFIG_INTEL_KVM=n results in the following build error. ERROR: "l1tf_vmx_mitigation" [arch/x86/kvm/kvm.ko] undefined! Fixes: 5b76a3cff011 ("KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry") Reported-by: Meelis Roos Cc: Meelis Roos Cc: Paolo Bonzini Cc: Thomas Gleixner Signed-off-by: Guenter Roeck Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/bugs.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index 5229eaf73828..ac67a76550bd 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -647,10 +647,9 @@ void x86_spec_ctrl_setup_ap(void) enum l1tf_mitigations l1tf_mitigation __ro_after_init = L1TF_MITIGATION_FLUSH; #if IS_ENABLED(CONFIG_KVM_INTEL) EXPORT_SYMBOL_GPL(l1tf_mitigation); - +#endif enum vmx_l1d_flush_state l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_AUTO; EXPORT_SYMBOL_GPL(l1tf_vmx_mitigation); -#endif static void __init l1tf_select_mitigation(void) { From cc83ba490ddae49dc2c8ccdf2f32ae2ac4aeb562 Mon Sep 17 00:00:00 2001 From: Guenter Roeck Date: Wed, 15 Aug 2018 13:22:27 -0700 Subject: [PATCH 624/970] x86: i8259: Add missing include file commit 0a957467c5fd46142bc9c52758ffc552d4c5e2f7 upstream. i8259.h uses inb/outb and thus needs to include asm/io.h to avoid the following build error, as seen with x86_64:defconfig and CONFIG_SMP=n. In file included from drivers/rtc/rtc-cmos.c:45:0: arch/x86/include/asm/i8259.h: In function 'inb_pic': arch/x86/include/asm/i8259.h:32:24: error: implicit declaration of function 'inb' arch/x86/include/asm/i8259.h: In function 'outb_pic': arch/x86/include/asm/i8259.h:45:2: error: implicit declaration of function 'outb' Reported-by: Sebastian Gottschall Suggested-by: Sebastian Gottschall Fixes: 447ae3166702 ("x86: Don't include linux/irq.h from asm/hardirq.h") Signed-off-by: Guenter Roeck Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/i8259.h | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/include/asm/i8259.h b/arch/x86/include/asm/i8259.h index bb078786a323..be6492c0deae 100644 --- a/arch/x86/include/asm/i8259.h +++ b/arch/x86/include/asm/i8259.h @@ -2,6 +2,7 @@ #define _ASM_X86_I8259_H #include +#include extern unsigned int cached_irq_mask; From 2130e543ff1aa0885ac4fc5f4ecd98cfccdeed00 Mon Sep 17 00:00:00 2001 From: Toshi Kani Date: Wed, 27 Jun 2018 08:13:46 -0600 Subject: [PATCH 625/970] x86/mm: Disable ioremap free page handling on x86-PAE commit f967db0b9ed44ec3057a28f3b28efc51df51b835 upstream. ioremap() supports pmd mappings on x86-PAE. However, kernel's pmd tables are not shared among processes on x86-PAE. Therefore, any update to sync'd pmd entries need re-syncing. Freeing a pte page also leads to a vmalloc fault and hits the BUG_ON in vmalloc_sync_one(). Disable free page handling on x86-PAE. pud_free_pmd_page() and pmd_free_pte_page() simply return 0 if a given pud/pmd entry is present. This assures that ioremap() does not update sync'd pmd entries at the cost of falling back to pte mappings. Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces") Reported-by: Joerg Roedel Signed-off-by: Toshi Kani Signed-off-by: Thomas Gleixner Cc: mhocko@suse.com Cc: akpm@linux-foundation.org Cc: hpa@zytor.com Cc: cpandya@codeaurora.org Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: stable@vger.kernel.org Cc: Andrew Morton Cc: Michal Hocko Cc: "H. Peter Anvin" Cc: Link: https://lkml.kernel.org/r/20180627141348.21777-2-toshi.kani@hpe.com Signed-off-by: Greg Kroah-Hartman --- arch/x86/mm/pgtable.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index a3b63e5a527c..36d0b81933f8 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -653,6 +653,7 @@ int pmd_clear_huge(pmd_t *pmd) return 0; } +#ifdef CONFIG_X86_64 /** * pud_free_pmd_page - Clear pud entry and free pmd page. * @pud: Pointer to a PUD. @@ -700,4 +701,22 @@ int pmd_free_pte_page(pmd_t *pmd) return 1; } + +#else /* !CONFIG_X86_64 */ + +int pud_free_pmd_page(pud_t *pud) +{ + return pud_none(*pud); +} + +/* + * Disable free page handling on x86-PAE. This assures that ioremap() + * does not update sync'd pmd entries. See vmalloc_sync_one(). + */ +int pmd_free_pte_page(pmd_t *pmd) +{ + return pmd_none(*pmd); +} + +#endif /* CONFIG_X86_64 */ #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */ From 76b6f30f9443f57683d2bf428e5cba54ce9b1267 Mon Sep 17 00:00:00 2001 From: Andrey Konovalov Date: Tue, 6 Feb 2018 15:36:00 -0800 Subject: [PATCH 626/970] kasan: don't emit builtin calls when sanitization is off commit 0e410e158e5baa1300bdf678cea4f4e0cf9d8b94 upstream. With KASAN enabled the kernel has two different memset() functions, one with KASAN checks (memset) and one without (__memset). KASAN uses some macro tricks to use the proper version where required. For example memset() calls in mm/slub.c are without KASAN checks, since they operate on poisoned slab object metadata. The issue is that clang emits memset() calls even when there is no memset() in the source code. They get linked with improper memset() implementation and the kernel fails to boot due to a huge amount of KASAN reports during early boot stages. The solution is to add -fno-builtin flag for files with KASAN_SANITIZE := n marker. Link: http://lkml.kernel.org/r/8ffecfffe04088c52c42b92739c2bd8a0bcb3f5e.1516384594.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov Acked-by: Nick Desaulniers Cc: Masahiro Yamada Cc: Michal Marek Cc: Andrey Ryabinin Cc: Alexander Potapenko Cc: Dmitry Vyukov Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [ Sami: Backported to 4.9 avoiding c5caf21ab0cf8 and e7c52b84fb ] Signed-off-by: Sami Tolvanen Signed-off-by: Nick Desaulniers Signed-off-by: Greg Kroah-Hartman --- Makefile | 3 ++- scripts/Makefile.kasan | 3 +++ scripts/Makefile.lib | 2 +- 3 files changed, 6 insertions(+), 2 deletions(-) diff --git a/Makefile b/Makefile index fea2fe577185..62fbcc04bf70 100644 --- a/Makefile +++ b/Makefile @@ -417,7 +417,8 @@ export MAKE AWK GENKSYMS INSTALLKERNEL PERL PYTHON UTS_MACHINE export HOSTCXX HOSTCXXFLAGS LDFLAGS_MODULE CHECK CHECKFLAGS export KBUILD_CPPFLAGS NOSTDINC_FLAGS LINUXINCLUDE OBJCOPYFLAGS LDFLAGS -export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE CFLAGS_KASAN CFLAGS_UBSAN +export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE +export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_LDFLAGS_MODULE export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL diff --git a/scripts/Makefile.kasan b/scripts/Makefile.kasan index 37323b0df374..2624d4bf9a45 100644 --- a/scripts/Makefile.kasan +++ b/scripts/Makefile.kasan @@ -28,4 +28,7 @@ else CFLAGS_KASAN := $(CFLAGS_KASAN_MINIMAL) endif endif + +CFLAGS_KASAN_NOSANITIZE := -fno-builtin + endif diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib index ae0f9ab1a70d..c954040c3cf2 100644 --- a/scripts/Makefile.lib +++ b/scripts/Makefile.lib @@ -127,7 +127,7 @@ endif ifeq ($(CONFIG_KASAN),y) _c_flags += $(if $(patsubst n%,, \ $(KASAN_SANITIZE_$(basetarget).o)$(KASAN_SANITIZE)y), \ - $(CFLAGS_KASAN)) + $(CFLAGS_KASAN), $(CFLAGS_KASAN_NOSANITIZE)) endif ifeq ($(CONFIG_UBSAN),y) From 52b9b51a5f4f55d80ef7ccb4558520464d7be076 Mon Sep 17 00:00:00 2001 From: Liwei Song Date: Tue, 13 Jun 2017 00:59:53 -0400 Subject: [PATCH 627/970] i2c: ismt: fix wrong device address when unmap the data buffer commit 17e83549e199d89aace7788a9f11c108671eecf5 upstream. Fix the following kernel bug: kernel BUG at drivers/iommu/intel-iommu.c:3260! invalid opcode: 0000 [#5] PREEMPT SMP Hardware name: Intel Corp. Harcuvar/Server, BIOS HAVLCRB0.X64.0013.D39.1608311820 08/31/2016 task: ffff880175389950 ti: ffff880176bec000 task.ti: ffff880176bec000 RIP: 0010:[] [] intel_unmap+0x25b/0x260 RSP: 0018:ffff880176bef5e8 EFLAGS: 00010296 RAX: 0000000000000024 RBX: ffff8800773c7c88 RCX: 000000000000ce04 RDX: 0000000080000000 RSI: 0000000000000000 RDI: 0000000000000009 RBP: ffff880176bef638 R08: 0000000000000010 R09: 0000000000000004 R10: ffff880175389c78 R11: 0000000000000a4f R12: ffff8800773c7868 R13: 00000000ffffac88 R14: ffff8800773c7818 R15: 0000000000000001 FS: 00007fef21258700(0000) GS:ffff88017b5c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000066d6d8 CR3: 000000007118c000 CR4: 00000000003406e0 Stack: 00000000ffffac88 ffffffff8199867f ffff880176bef5f8 ffff880100000030 ffff880176bef668 ffff8800773c7c88 ffff880178288098 ffff8800772c0010 ffff8800773c7818 0000000000000001 ffff880176bef648 ffffffff8150a86e Call Trace: [] ? printk+0x46/0x48 [] intel_unmap_page+0xe/0x10 [] ismt_access+0x27b/0x8fa [i2c_ismt] [] ? __pm_runtime_suspend+0xa0/0xa0 [] ? pm_suspend_timer_fn+0x80/0x80 [] ? __pm_runtime_suspend+0xa0/0xa0 [] ? pm_suspend_timer_fn+0x80/0x80 [] ? pci_bus_read_dev_vendor_id+0xf0/0xf0 [] i2c_smbus_xfer+0xec/0x4b0 [] ? vprintk_emit+0x345/0x530 [] i2cdev_ioctl_smbus+0x12b/0x240 [i2c_dev] [] ? vprintk_default+0x29/0x40 [] i2cdev_ioctl+0x63/0x1ec [i2c_dev] [] do_vfs_ioctl+0x328/0x5d0 [] ? vfs_write+0x11c/0x190 [] ? rt_up_read+0x19/0x20 [] SyS_ioctl+0x81/0xa0 [] system_call_fastpath+0x16/0x6e This happen When run "i2cdetect -y 0" detect SMBus iSMT adapter. After finished I2C block read/write, when unmap the data buffer, a wrong device address was pass to dma_unmap_single(). To fix this, give dma_unmap_single() the "dev" parameter, just like what dma_map_single() does, then unmap can find the right devices. Fixes: 13f35ac14cd0 ("i2c: Adding support for Intel iSMT SMBus 2.0 host controller") Signed-off-by: Liwei Song Reviewed-by: Andy Shevchenko Signed-off-by: Wolfram Sang Signed-off-by: Greg Kroah-Hartman --- drivers/i2c/busses/i2c-ismt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/i2c/busses/i2c-ismt.c b/drivers/i2c/busses/i2c-ismt.c index 7aea28815d99..b51adffa4841 100644 --- a/drivers/i2c/busses/i2c-ismt.c +++ b/drivers/i2c/busses/i2c-ismt.c @@ -589,7 +589,7 @@ static int ismt_access(struct i2c_adapter *adap, u16 addr, /* unmap the data buffer */ if (dma_size != 0) - dma_unmap_single(&adap->dev, dma_addr, dma_size, dma_direction); + dma_unmap_single(dev, dma_addr, dma_size, dma_direction); if (unlikely(!time_left)) { dev_err(dev, "completion wait timed out\n"); From 2d43ff0ffcf4b4c6587f292fbda4b27786e5e8ba Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Sun, 1 Jul 2018 19:46:06 -0700 Subject: [PATCH 628/970] kbuild: verify that $DEPMOD is installed commit 934193a654c1f4d0643ddbf4b2529b508cae926e upstream. Verify that 'depmod' ($DEPMOD) is installed. This is a partial revert of commit 620c231c7a7f ("kbuild: do not check for ancient modutils tools"). Also update Documentation/process/changes.rst to refer to kmod instead of module-init-tools. Fixes kernel bugzilla #198965: https://bugzilla.kernel.org/show_bug.cgi?id=198965 Signed-off-by: Randy Dunlap Cc: Lucas De Marchi Cc: Lucas De Marchi Cc: Michal Marek Cc: Jessica Yu Cc: Chih-Wei Huang Cc: stable@vger.kernel.org # any kernel since 2012 Signed-off-by: Masahiro Yamada Signed-off-by: Greg Kroah-Hartman --- Documentation/Changes | 19 +++++++------------ scripts/depmod.sh | 8 +++++++- 2 files changed, 14 insertions(+), 13 deletions(-) diff --git a/Documentation/Changes b/Documentation/Changes index 22797a15dc24..76d6dc0d3227 100644 --- a/Documentation/Changes +++ b/Documentation/Changes @@ -33,7 +33,7 @@ GNU C 3.2 gcc --version GNU make 3.80 make --version binutils 2.12 ld -v util-linux 2.10o fdformat --version -module-init-tools 0.9.10 depmod -V +kmod 13 depmod -V e2fsprogs 1.41.4 e2fsck -V jfsutils 1.1.3 fsck.jfs -V reiserfsprogs 3.6.3 reiserfsck -V @@ -143,12 +143,6 @@ is not build with ``CONFIG_KALLSYMS`` and you have no way to rebuild and reproduce the Oops with that option, then you can still decode that Oops with ksymoops. -Module-Init-Tools ------------------ - -A new module loader is now in the kernel that requires ``module-init-tools`` -to use. It is backward compatible with the 2.4.x series kernels. - Mkinitrd -------- @@ -363,16 +357,17 @@ Util-linux - +Kmod +---- + +- +- + Ksymoops -------- - -Module-Init-Tools ------------------ - -- - Mkinitrd -------- diff --git a/scripts/depmod.sh b/scripts/depmod.sh index 122599b1c13b..ea1e96921e3b 100755 --- a/scripts/depmod.sh +++ b/scripts/depmod.sh @@ -10,10 +10,16 @@ DEPMOD=$1 KERNELRELEASE=$2 SYMBOL_PREFIX=$3 -if ! test -r System.map -a -x "$DEPMOD"; then +if ! test -r System.map ; then exit 0 fi +if [ -z $(command -v $DEPMOD) ]; then + echo "'make modules_install' requires $DEPMOD. Please install it." >&2 + echo "This is probably in the kmod package." >&2 + exit 1 +fi + # older versions of depmod don't support -P # support was added in module-init-tools 3.13 if test -n "$SYMBOL_PREFIX"; then From e87485a5542882280ce06c6aac1970a8c4881d73 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Fri, 29 Jun 2018 14:14:35 -0700 Subject: [PATCH 629/970] crypto: x86/sha256-mb - fix digest copy in sha256_mb_mgr_get_comp_job_avx2() commit af839b4e546613aed1fbd64def73956aa98631e7 upstream. There is a copy-paste error where sha256_mb_mgr_get_comp_job_avx2() copies the SHA-256 digest state from sha256_mb_mgr::args::digest to job_sha256::result_digest. Consequently, the sha256_mb algorithm sometimes calculates the wrong digest. Fix it. Reproducer using AF_ALG: #include #include #include #include #include #include static const __u8 expected[32] = "\xad\x7f\xac\xb2\x58\x6f\xc6\xe9\x66\xc0\x04\xd7\xd1\xd1\x6b\x02" "\x4f\x58\x05\xff\x7c\xb4\x7c\x7a\x85\xda\xbd\x8b\x48\x89\x2c\xa7"; int main() { int fd; struct sockaddr_alg addr = { .salg_type = "hash", .salg_name = "sha256_mb", }; __u8 data[4096] = { 0 }; __u8 digest[32]; int ret; int i; fd = socket(AF_ALG, SOCK_SEQPACKET, 0); bind(fd, (void *)&addr, sizeof(addr)); fork(); fd = accept(fd, 0, 0); do { ret = write(fd, data, 4096); assert(ret == 4096); ret = read(fd, digest, 32); assert(ret == 32); } while (memcmp(digest, expected, 32) == 0); printf("wrong digest: "); for (i = 0; i < 32; i++) printf("%02x", digest[i]); printf("\n"); } Output was: wrong digest: ad7facb2000000000000000000000000ffffffef7cb47c7a85dabd8b48892ca7 Fixes: 172b1d6b5a93 ("crypto: sha256-mb - fix ctx pointer and digest copy") Cc: # v4.8+ Signed-off-by: Eric Biggers Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S b/arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S index ec9bee661d50..b7f50427a3ef 100644 --- a/arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S +++ b/arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S @@ -265,7 +265,7 @@ ENTRY(sha256_mb_mgr_get_comp_job_avx2) vpinsrd $1, _args_digest+1*32(state, idx, 4), %xmm0, %xmm0 vpinsrd $2, _args_digest+2*32(state, idx, 4), %xmm0, %xmm0 vpinsrd $3, _args_digest+3*32(state, idx, 4), %xmm0, %xmm0 - vmovd _args_digest(state , idx, 4) , %xmm0 + vmovd _args_digest+4*32(state, idx, 4), %xmm1 vpinsrd $1, _args_digest+5*32(state, idx, 4), %xmm1, %xmm1 vpinsrd $2, _args_digest+6*32(state, idx, 4), %xmm1, %xmm1 vpinsrd $3, _args_digest+7*32(state, idx, 4), %xmm1, %xmm1 From 371c35cb8c77578528cad01f0e0c6dd369dbc730 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Mon, 18 Jun 2018 10:22:37 -0700 Subject: [PATCH 630/970] crypto: vmac - require a block cipher with 128-bit block size commit 73bf20ef3df262026c3470241ae4ac8196943ffa upstream. The VMAC template assumes the block cipher has a 128-bit block size, but it failed to check for that. Thus it was possible to instantiate it using a 64-bit block size cipher, e.g. "vmac(cast5)", causing uninitialized memory to be used. Add the needed check when instantiating the template. Fixes: f1939f7c5645 ("crypto: vmac - New hash algorithm for intel_txt support") Cc: # v2.6.32+ Signed-off-by: Eric Biggers Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- crypto/vmac.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/crypto/vmac.c b/crypto/vmac.c index df76a816cfb2..3034454a3713 100644 --- a/crypto/vmac.c +++ b/crypto/vmac.c @@ -655,6 +655,10 @@ static int vmac_create(struct crypto_template *tmpl, struct rtattr **tb) if (IS_ERR(alg)) return PTR_ERR(alg); + err = -EINVAL; + if (alg->cra_blocksize != 16) + goto out_put_alg; + inst = shash_alloc_instance("vmac", alg); err = PTR_ERR(inst); if (IS_ERR(inst)) From 81ad8a8e866755385a216cebbb9ae54ad0b31ea2 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Mon, 18 Jun 2018 10:22:38 -0700 Subject: [PATCH 631/970] crypto: vmac - separate tfm and request context commit bb29648102335586e9a66289a1d98a0cb392b6e5 upstream. syzbot reported a crash in vmac_final() when multiple threads concurrently use the same "vmac(aes)" transform through AF_ALG. The bug is pretty fundamental: the VMAC template doesn't separate per-request state from per-tfm (per-key) state like the other hash algorithms do, but rather stores it all in the tfm context. That's wrong. Also, vmac_final() incorrectly zeroes most of the state including the derived keys and cached pseudorandom pad. Therefore, only the first VMAC invocation with a given key calculates the correct digest. Fix these bugs by splitting the per-tfm state from the per-request state and using the proper init/update/final sequencing for requests. Reproducer for the crash: #include #include #include int main() { int fd; struct sockaddr_alg addr = { .salg_type = "hash", .salg_name = "vmac(aes)", }; char buf[256] = { 0 }; fd = socket(AF_ALG, SOCK_SEQPACKET, 0); bind(fd, (void *)&addr, sizeof(addr)); setsockopt(fd, SOL_ALG, ALG_SET_KEY, buf, 16); fork(); fd = accept(fd, NULL, NULL); for (;;) write(fd, buf, 256); } The immediate cause of the crash is that vmac_ctx_t.partial_size exceeds VMAC_NHBYTES, causing vmac_final() to memset() a negative length. Reported-by: syzbot+264bca3a6e8d645550d3@syzkaller.appspotmail.com Fixes: f1939f7c5645 ("crypto: vmac - New hash algorithm for intel_txt support") Cc: # v2.6.32+ Signed-off-by: Eric Biggers Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- crypto/vmac.c | 428 +++++++++++++++++++----------------------- include/crypto/vmac.h | 63 ------- 2 files changed, 191 insertions(+), 300 deletions(-) delete mode 100644 include/crypto/vmac.h diff --git a/crypto/vmac.c b/crypto/vmac.c index 3034454a3713..bb2fc787d615 100644 --- a/crypto/vmac.c +++ b/crypto/vmac.c @@ -1,6 +1,10 @@ /* - * Modified to interface to the Linux kernel + * VMAC: Message Authentication Code using Universal Hashing + * + * Reference: https://tools.ietf.org/html/draft-krovetz-vmac-01 + * * Copyright (c) 2009, Intel Corporation. + * Copyright (c) 2018, Google Inc. * * This program is free software; you can redistribute it and/or modify it * under the terms and conditions of the GNU General Public License, @@ -16,14 +20,15 @@ * Place - Suite 330, Boston, MA 02111-1307 USA. */ -/* -------------------------------------------------------------------------- - * VMAC and VHASH Implementation by Ted Krovetz (tdk@acm.org) and Wei Dai. - * This implementation is herby placed in the public domain. - * The authors offers no warranty. Use at your own risk. - * Please send bug reports to the authors. - * Last modified: 17 APR 08, 1700 PDT - * ----------------------------------------------------------------------- */ +/* + * Derived from: + * VMAC and VHASH Implementation by Ted Krovetz (tdk@acm.org) and Wei Dai. + * This implementation is herby placed in the public domain. + * The authors offers no warranty. Use at your own risk. + * Last modified: 17 APR 08, 1700 PDT + */ +#include #include #include #include @@ -31,9 +36,35 @@ #include #include #include -#include #include +/* + * User definable settings. + */ +#define VMAC_TAG_LEN 64 +#define VMAC_KEY_SIZE 128/* Must be 128, 192 or 256 */ +#define VMAC_KEY_LEN (VMAC_KEY_SIZE/8) +#define VMAC_NHBYTES 128/* Must 2^i for any 3 < i < 13 Standard = 128*/ + +/* per-transform (per-key) context */ +struct vmac_tfm_ctx { + struct crypto_cipher *cipher; + u64 nhkey[(VMAC_NHBYTES/8)+2*(VMAC_TAG_LEN/64-1)]; + u64 polykey[2*VMAC_TAG_LEN/64]; + u64 l3key[2*VMAC_TAG_LEN/64]; +}; + +/* per-request context */ +struct vmac_desc_ctx { + union { + u8 partial[VMAC_NHBYTES]; /* partial block */ + __le64 partial_words[VMAC_NHBYTES / 8]; + }; + unsigned int partial_size; /* size of the partial block */ + bool first_block_processed; + u64 polytmp[2*VMAC_TAG_LEN/64]; /* running total of L2-hash */ +}; + /* * Constants and masks */ @@ -318,13 +349,6 @@ static void poly_step_func(u64 *ahi, u64 *alo, } while (0) #endif -static void vhash_abort(struct vmac_ctx *ctx) -{ - ctx->polytmp[0] = ctx->polykey[0] ; - ctx->polytmp[1] = ctx->polykey[1] ; - ctx->first_block_processed = 0; -} - static u64 l3hash(u64 p1, u64 p2, u64 k1, u64 k2, u64 len) { u64 rh, rl, t, z = 0; @@ -364,280 +388,209 @@ static u64 l3hash(u64 p1, u64 p2, u64 k1, u64 k2, u64 len) return rl; } -static void vhash_update(const unsigned char *m, - unsigned int mbytes, /* Pos multiple of VMAC_NHBYTES */ - struct vmac_ctx *ctx) +/* L1 and L2-hash one or more VMAC_NHBYTES-byte blocks */ +static void vhash_blocks(const struct vmac_tfm_ctx *tctx, + struct vmac_desc_ctx *dctx, + const __le64 *mptr, unsigned int blocks) { - u64 rh, rl, *mptr; - const u64 *kptr = (u64 *)ctx->nhkey; - int i; - u64 ch, cl; - u64 pkh = ctx->polykey[0]; - u64 pkl = ctx->polykey[1]; + const u64 *kptr = tctx->nhkey; + const u64 pkh = tctx->polykey[0]; + const u64 pkl = tctx->polykey[1]; + u64 ch = dctx->polytmp[0]; + u64 cl = dctx->polytmp[1]; + u64 rh, rl; - if (!mbytes) - return; - - BUG_ON(mbytes % VMAC_NHBYTES); - - mptr = (u64 *)m; - i = mbytes / VMAC_NHBYTES; /* Must be non-zero */ - - ch = ctx->polytmp[0]; - cl = ctx->polytmp[1]; - - if (!ctx->first_block_processed) { - ctx->first_block_processed = 1; + if (!dctx->first_block_processed) { + dctx->first_block_processed = true; nh_vmac_nhbytes(mptr, kptr, VMAC_NHBYTES/8, rh, rl); rh &= m62; ADD128(ch, cl, rh, rl); mptr += (VMAC_NHBYTES/sizeof(u64)); - i--; + blocks--; } - while (i--) { + while (blocks--) { nh_vmac_nhbytes(mptr, kptr, VMAC_NHBYTES/8, rh, rl); rh &= m62; poly_step(ch, cl, pkh, pkl, rh, rl); mptr += (VMAC_NHBYTES/sizeof(u64)); } - ctx->polytmp[0] = ch; - ctx->polytmp[1] = cl; + dctx->polytmp[0] = ch; + dctx->polytmp[1] = cl; } -static u64 vhash(unsigned char m[], unsigned int mbytes, - u64 *tagl, struct vmac_ctx *ctx) +static int vmac_setkey(struct crypto_shash *tfm, + const u8 *key, unsigned int keylen) { - u64 rh, rl, *mptr; - const u64 *kptr = (u64 *)ctx->nhkey; - int i, remaining; - u64 ch, cl; - u64 pkh = ctx->polykey[0]; - u64 pkl = ctx->polykey[1]; + struct vmac_tfm_ctx *tctx = crypto_shash_ctx(tfm); + __be64 out[2]; + u8 in[16] = { 0 }; + unsigned int i; + int err; - mptr = (u64 *)m; - i = mbytes / VMAC_NHBYTES; - remaining = mbytes % VMAC_NHBYTES; - - if (ctx->first_block_processed) { - ch = ctx->polytmp[0]; - cl = ctx->polytmp[1]; - } else if (i) { - nh_vmac_nhbytes(mptr, kptr, VMAC_NHBYTES/8, ch, cl); - ch &= m62; - ADD128(ch, cl, pkh, pkl); - mptr += (VMAC_NHBYTES/sizeof(u64)); - i--; - } else if (remaining) { - nh_16(mptr, kptr, 2*((remaining+15)/16), ch, cl); - ch &= m62; - ADD128(ch, cl, pkh, pkl); - mptr += (VMAC_NHBYTES/sizeof(u64)); - goto do_l3; - } else {/* Empty String */ - ch = pkh; cl = pkl; - goto do_l3; + if (keylen != VMAC_KEY_LEN) { + crypto_shash_set_flags(tfm, CRYPTO_TFM_RES_BAD_KEY_LEN); + return -EINVAL; } - while (i--) { - nh_vmac_nhbytes(mptr, kptr, VMAC_NHBYTES/8, rh, rl); - rh &= m62; - poly_step(ch, cl, pkh, pkl, rh, rl); - mptr += (VMAC_NHBYTES/sizeof(u64)); - } - if (remaining) { - nh_16(mptr, kptr, 2*((remaining+15)/16), rh, rl); - rh &= m62; - poly_step(ch, cl, pkh, pkl, rh, rl); - } - -do_l3: - vhash_abort(ctx); - remaining *= 8; - return l3hash(ch, cl, ctx->l3key[0], ctx->l3key[1], remaining); -} - -static u64 vmac(unsigned char m[], unsigned int mbytes, - const unsigned char n[16], u64 *tagl, - struct vmac_ctx_t *ctx) -{ - u64 *in_n, *out_p; - u64 p, h; - int i; - - in_n = ctx->__vmac_ctx.cached_nonce; - out_p = ctx->__vmac_ctx.cached_aes; - - i = n[15] & 1; - if ((*(u64 *)(n+8) != in_n[1]) || (*(u64 *)(n) != in_n[0])) { - in_n[0] = *(u64 *)(n); - in_n[1] = *(u64 *)(n+8); - ((unsigned char *)in_n)[15] &= 0xFE; - crypto_cipher_encrypt_one(ctx->child, - (unsigned char *)out_p, (unsigned char *)in_n); - - ((unsigned char *)in_n)[15] |= (unsigned char)(1-i); - } - p = be64_to_cpup(out_p + i); - h = vhash(m, mbytes, (u64 *)0, &ctx->__vmac_ctx); - return le64_to_cpu(p + h); -} - -static int vmac_set_key(unsigned char user_key[], struct vmac_ctx_t *ctx) -{ - u64 in[2] = {0}, out[2]; - unsigned i; - int err = 0; - - err = crypto_cipher_setkey(ctx->child, user_key, VMAC_KEY_LEN); + err = crypto_cipher_setkey(tctx->cipher, key, keylen); if (err) return err; /* Fill nh key */ - ((unsigned char *)in)[0] = 0x80; - for (i = 0; i < sizeof(ctx->__vmac_ctx.nhkey)/8; i += 2) { - crypto_cipher_encrypt_one(ctx->child, - (unsigned char *)out, (unsigned char *)in); - ctx->__vmac_ctx.nhkey[i] = be64_to_cpup(out); - ctx->__vmac_ctx.nhkey[i+1] = be64_to_cpup(out+1); - ((unsigned char *)in)[15] += 1; + in[0] = 0x80; + for (i = 0; i < ARRAY_SIZE(tctx->nhkey); i += 2) { + crypto_cipher_encrypt_one(tctx->cipher, (u8 *)out, in); + tctx->nhkey[i] = be64_to_cpu(out[0]); + tctx->nhkey[i+1] = be64_to_cpu(out[1]); + in[15]++; } /* Fill poly key */ - ((unsigned char *)in)[0] = 0xC0; - in[1] = 0; - for (i = 0; i < sizeof(ctx->__vmac_ctx.polykey)/8; i += 2) { - crypto_cipher_encrypt_one(ctx->child, - (unsigned char *)out, (unsigned char *)in); - ctx->__vmac_ctx.polytmp[i] = - ctx->__vmac_ctx.polykey[i] = - be64_to_cpup(out) & mpoly; - ctx->__vmac_ctx.polytmp[i+1] = - ctx->__vmac_ctx.polykey[i+1] = - be64_to_cpup(out+1) & mpoly; - ((unsigned char *)in)[15] += 1; + in[0] = 0xC0; + in[15] = 0; + for (i = 0; i < ARRAY_SIZE(tctx->polykey); i += 2) { + crypto_cipher_encrypt_one(tctx->cipher, (u8 *)out, in); + tctx->polykey[i] = be64_to_cpu(out[0]) & mpoly; + tctx->polykey[i+1] = be64_to_cpu(out[1]) & mpoly; + in[15]++; } /* Fill ip key */ - ((unsigned char *)in)[0] = 0xE0; - in[1] = 0; - for (i = 0; i < sizeof(ctx->__vmac_ctx.l3key)/8; i += 2) { + in[0] = 0xE0; + in[15] = 0; + for (i = 0; i < ARRAY_SIZE(tctx->l3key); i += 2) { do { - crypto_cipher_encrypt_one(ctx->child, - (unsigned char *)out, (unsigned char *)in); - ctx->__vmac_ctx.l3key[i] = be64_to_cpup(out); - ctx->__vmac_ctx.l3key[i+1] = be64_to_cpup(out+1); - ((unsigned char *)in)[15] += 1; - } while (ctx->__vmac_ctx.l3key[i] >= p64 - || ctx->__vmac_ctx.l3key[i+1] >= p64); + crypto_cipher_encrypt_one(tctx->cipher, (u8 *)out, in); + tctx->l3key[i] = be64_to_cpu(out[0]); + tctx->l3key[i+1] = be64_to_cpu(out[1]); + in[15]++; + } while (tctx->l3key[i] >= p64 || tctx->l3key[i+1] >= p64); } - /* Invalidate nonce/aes cache and reset other elements */ - ctx->__vmac_ctx.cached_nonce[0] = (u64)-1; /* Ensure illegal nonce */ - ctx->__vmac_ctx.cached_nonce[1] = (u64)0; /* Ensure illegal nonce */ - ctx->__vmac_ctx.first_block_processed = 0; - - return err; -} - -static int vmac_setkey(struct crypto_shash *parent, - const u8 *key, unsigned int keylen) -{ - struct vmac_ctx_t *ctx = crypto_shash_ctx(parent); - - if (keylen != VMAC_KEY_LEN) { - crypto_shash_set_flags(parent, CRYPTO_TFM_RES_BAD_KEY_LEN); - return -EINVAL; - } - - return vmac_set_key((u8 *)key, ctx); -} - -static int vmac_init(struct shash_desc *pdesc) -{ - return 0; -} - -static int vmac_update(struct shash_desc *pdesc, const u8 *p, - unsigned int len) -{ - struct crypto_shash *parent = pdesc->tfm; - struct vmac_ctx_t *ctx = crypto_shash_ctx(parent); - int expand; - int min; - - expand = VMAC_NHBYTES - ctx->partial_size > 0 ? - VMAC_NHBYTES - ctx->partial_size : 0; - - min = len < expand ? len : expand; - - memcpy(ctx->partial + ctx->partial_size, p, min); - ctx->partial_size += min; - - if (len < expand) - return 0; - - vhash_update(ctx->partial, VMAC_NHBYTES, &ctx->__vmac_ctx); - ctx->partial_size = 0; - - len -= expand; - p += expand; - - if (len % VMAC_NHBYTES) { - memcpy(ctx->partial, p + len - (len % VMAC_NHBYTES), - len % VMAC_NHBYTES); - ctx->partial_size = len % VMAC_NHBYTES; - } - - vhash_update(p, len - len % VMAC_NHBYTES, &ctx->__vmac_ctx); - return 0; } -static int vmac_final(struct shash_desc *pdesc, u8 *out) +static int vmac_init(struct shash_desc *desc) { - struct crypto_shash *parent = pdesc->tfm; - struct vmac_ctx_t *ctx = crypto_shash_ctx(parent); - vmac_t mac; - u8 nonce[16] = {}; + const struct vmac_tfm_ctx *tctx = crypto_shash_ctx(desc->tfm); + struct vmac_desc_ctx *dctx = shash_desc_ctx(desc); - /* vmac() ends up accessing outside the array bounds that - * we specify. In appears to access up to the next 2-word - * boundary. We'll just be uber cautious and zero the - * unwritten bytes in the buffer. - */ - if (ctx->partial_size) { - memset(ctx->partial + ctx->partial_size, 0, - VMAC_NHBYTES - ctx->partial_size); + dctx->partial_size = 0; + dctx->first_block_processed = false; + memcpy(dctx->polytmp, tctx->polykey, sizeof(dctx->polytmp)); + return 0; +} + +static int vmac_update(struct shash_desc *desc, const u8 *p, unsigned int len) +{ + const struct vmac_tfm_ctx *tctx = crypto_shash_ctx(desc->tfm); + struct vmac_desc_ctx *dctx = shash_desc_ctx(desc); + unsigned int n; + + if (dctx->partial_size) { + n = min(len, VMAC_NHBYTES - dctx->partial_size); + memcpy(&dctx->partial[dctx->partial_size], p, n); + dctx->partial_size += n; + p += n; + len -= n; + if (dctx->partial_size == VMAC_NHBYTES) { + vhash_blocks(tctx, dctx, dctx->partial_words, 1); + dctx->partial_size = 0; + } } - mac = vmac(ctx->partial, ctx->partial_size, nonce, NULL, ctx); - memcpy(out, &mac, sizeof(vmac_t)); - memzero_explicit(&mac, sizeof(vmac_t)); - memset(&ctx->__vmac_ctx, 0, sizeof(struct vmac_ctx)); - ctx->partial_size = 0; + + if (len >= VMAC_NHBYTES) { + n = round_down(len, VMAC_NHBYTES); + /* TODO: 'p' may be misaligned here */ + vhash_blocks(tctx, dctx, (const __le64 *)p, n / VMAC_NHBYTES); + p += n; + len -= n; + } + + if (len) { + memcpy(dctx->partial, p, len); + dctx->partial_size = len; + } + + return 0; +} + +static u64 vhash_final(const struct vmac_tfm_ctx *tctx, + struct vmac_desc_ctx *dctx) +{ + unsigned int partial = dctx->partial_size; + u64 ch = dctx->polytmp[0]; + u64 cl = dctx->polytmp[1]; + + /* L1 and L2-hash the final block if needed */ + if (partial) { + /* Zero-pad to next 128-bit boundary */ + unsigned int n = round_up(partial, 16); + u64 rh, rl; + + memset(&dctx->partial[partial], 0, n - partial); + nh_16(dctx->partial_words, tctx->nhkey, n / 8, rh, rl); + rh &= m62; + if (dctx->first_block_processed) + poly_step(ch, cl, tctx->polykey[0], tctx->polykey[1], + rh, rl); + else + ADD128(ch, cl, rh, rl); + } + + /* L3-hash the 128-bit output of L2-hash */ + return l3hash(ch, cl, tctx->l3key[0], tctx->l3key[1], partial * 8); +} + +static int vmac_final(struct shash_desc *desc, u8 *out) +{ + const struct vmac_tfm_ctx *tctx = crypto_shash_ctx(desc->tfm); + struct vmac_desc_ctx *dctx = shash_desc_ctx(desc); + static const u8 nonce[16] = {}; /* TODO: this is insecure */ + union { + u8 bytes[16]; + __be64 pads[2]; + } block; + int index; + u64 hash, pad; + + /* Finish calculating the VHASH of the message */ + hash = vhash_final(tctx, dctx); + + /* Generate pseudorandom pad by encrypting the nonce */ + memcpy(&block, nonce, 16); + index = block.bytes[15] & 1; + block.bytes[15] &= ~1; + crypto_cipher_encrypt_one(tctx->cipher, block.bytes, block.bytes); + pad = be64_to_cpu(block.pads[index]); + + /* The VMAC is the sum of VHASH and the pseudorandom pad */ + put_unaligned_le64(hash + pad, out); return 0; } static int vmac_init_tfm(struct crypto_tfm *tfm) { - struct crypto_cipher *cipher; - struct crypto_instance *inst = (void *)tfm->__crt_alg; + struct crypto_instance *inst = crypto_tfm_alg_instance(tfm); struct crypto_spawn *spawn = crypto_instance_ctx(inst); - struct vmac_ctx_t *ctx = crypto_tfm_ctx(tfm); + struct vmac_tfm_ctx *tctx = crypto_tfm_ctx(tfm); + struct crypto_cipher *cipher; cipher = crypto_spawn_cipher(spawn); if (IS_ERR(cipher)) return PTR_ERR(cipher); - ctx->child = cipher; + tctx->cipher = cipher; return 0; } static void vmac_exit_tfm(struct crypto_tfm *tfm) { - struct vmac_ctx_t *ctx = crypto_tfm_ctx(tfm); - crypto_free_cipher(ctx->child); + struct vmac_tfm_ctx *tctx = crypto_tfm_ctx(tfm); + + crypto_free_cipher(tctx->cipher); } static int vmac_create(struct crypto_template *tmpl, struct rtattr **tb) @@ -674,11 +627,12 @@ static int vmac_create(struct crypto_template *tmpl, struct rtattr **tb) inst->alg.base.cra_blocksize = alg->cra_blocksize; inst->alg.base.cra_alignmask = alg->cra_alignmask; - inst->alg.digestsize = sizeof(vmac_t); - inst->alg.base.cra_ctxsize = sizeof(struct vmac_ctx_t); + inst->alg.base.cra_ctxsize = sizeof(struct vmac_tfm_ctx); inst->alg.base.cra_init = vmac_init_tfm; inst->alg.base.cra_exit = vmac_exit_tfm; + inst->alg.descsize = sizeof(struct vmac_desc_ctx); + inst->alg.digestsize = VMAC_TAG_LEN / 8; inst->alg.init = vmac_init; inst->alg.update = vmac_update; inst->alg.final = vmac_final; diff --git a/include/crypto/vmac.h b/include/crypto/vmac.h deleted file mode 100644 index 6b700c7b2fe1..000000000000 --- a/include/crypto/vmac.h +++ /dev/null @@ -1,63 +0,0 @@ -/* - * Modified to interface to the Linux kernel - * Copyright (c) 2009, Intel Corporation. - * - * This program is free software; you can redistribute it and/or modify it - * under the terms and conditions of the GNU General Public License, - * version 2, as published by the Free Software Foundation. - * - * This program is distributed in the hope it will be useful, but WITHOUT - * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or - * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for - * more details. - * - * You should have received a copy of the GNU General Public License along with - * this program; if not, write to the Free Software Foundation, Inc., 59 Temple - * Place - Suite 330, Boston, MA 02111-1307 USA. - */ - -#ifndef __CRYPTO_VMAC_H -#define __CRYPTO_VMAC_H - -/* -------------------------------------------------------------------------- - * VMAC and VHASH Implementation by Ted Krovetz (tdk@acm.org) and Wei Dai. - * This implementation is herby placed in the public domain. - * The authors offers no warranty. Use at your own risk. - * Please send bug reports to the authors. - * Last modified: 17 APR 08, 1700 PDT - * ----------------------------------------------------------------------- */ - -/* - * User definable settings. - */ -#define VMAC_TAG_LEN 64 -#define VMAC_KEY_SIZE 128/* Must be 128, 192 or 256 */ -#define VMAC_KEY_LEN (VMAC_KEY_SIZE/8) -#define VMAC_NHBYTES 128/* Must 2^i for any 3 < i < 13 Standard = 128*/ - -/* - * This implementation uses u32 and u64 as names for unsigned 32- - * and 64-bit integer types. These are defined in C99 stdint.h. The - * following may need adaptation if you are not running a C99 or - * Microsoft C environment. - */ -struct vmac_ctx { - u64 nhkey[(VMAC_NHBYTES/8)+2*(VMAC_TAG_LEN/64-1)]; - u64 polykey[2*VMAC_TAG_LEN/64]; - u64 l3key[2*VMAC_TAG_LEN/64]; - u64 polytmp[2*VMAC_TAG_LEN/64]; - u64 cached_nonce[2]; - u64 cached_aes[2]; - int first_block_processed; -}; - -typedef u64 vmac_t; - -struct vmac_ctx_t { - struct crypto_cipher *child; - struct vmac_ctx __vmac_ctx; - u8 partial[VMAC_NHBYTES]; /* partial block */ - int partial_size; /* size of the partial block */ -}; - -#endif /* __CRYPTO_VMAC_H */ From afd5c42dea3f17932059471ce80cfd6d8fa93dfb Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Mon, 23 Jul 2018 10:54:57 -0700 Subject: [PATCH 632/970] crypto: blkcipher - fix crash flushing dcache in error path commit 0868def3e4100591e7a1fdbf3eed1439cc8f7ca3 upstream. Like the skcipher_walk case: scatterwalk_done() is only meant to be called after a nonzero number of bytes have been processed, since scatterwalk_pagedone() will flush the dcache of the *previous* page. But in the error case of blkcipher_walk_done(), e.g. if the input wasn't an integer number of blocks, scatterwalk_done() was actually called after advancing 0 bytes. This caused a crash ("BUG: unable to handle kernel paging request") during '!PageSlab(page)' on architectures like arm and arm64 that define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE, provided that the input was page-aligned as in that case walk->offset == 0. Fix it by reorganizing blkcipher_walk_done() to skip the scatterwalk_advance() and scatterwalk_done() if an error has occurred. This bug was found by syzkaller fuzzing. Reproducer, assuming ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE: #include #include #include int main() { struct sockaddr_alg addr = { .salg_type = "skcipher", .salg_name = "ecb(aes-generic)", }; char buffer[4096] __attribute__((aligned(4096))) = { 0 }; int fd; fd = socket(AF_ALG, SOCK_SEQPACKET, 0); bind(fd, (void *)&addr, sizeof(addr)); setsockopt(fd, SOL_ALG, ALG_SET_KEY, buffer, 16); fd = accept(fd, NULL, NULL); write(fd, buffer, 15); read(fd, buffer, 15); } Reported-by: Liu Chao Fixes: 5cde0af2a982 ("[CRYPTO] cipher: Added block cipher type") Cc: # v2.6.19+ Signed-off-by: Eric Biggers Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- crypto/blkcipher.c | 54 ++++++++++++++++++++++------------------------ 1 file changed, 26 insertions(+), 28 deletions(-) diff --git a/crypto/blkcipher.c b/crypto/blkcipher.c index a832426820e8..27f98666763a 100644 --- a/crypto/blkcipher.c +++ b/crypto/blkcipher.c @@ -70,19 +70,18 @@ static inline u8 *blkcipher_get_spot(u8 *start, unsigned int len) return max(start, end_page); } -static inline unsigned int blkcipher_done_slow(struct blkcipher_walk *walk, - unsigned int bsize) +static inline void blkcipher_done_slow(struct blkcipher_walk *walk, + unsigned int bsize) { u8 *addr; addr = (u8 *)ALIGN((unsigned long)walk->buffer, walk->alignmask + 1); addr = blkcipher_get_spot(addr, bsize); scatterwalk_copychunks(addr, &walk->out, bsize, 1); - return bsize; } -static inline unsigned int blkcipher_done_fast(struct blkcipher_walk *walk, - unsigned int n) +static inline void blkcipher_done_fast(struct blkcipher_walk *walk, + unsigned int n) { if (walk->flags & BLKCIPHER_WALK_COPY) { blkcipher_map_dst(walk); @@ -96,49 +95,48 @@ static inline unsigned int blkcipher_done_fast(struct blkcipher_walk *walk, scatterwalk_advance(&walk->in, n); scatterwalk_advance(&walk->out, n); - - return n; } int blkcipher_walk_done(struct blkcipher_desc *desc, struct blkcipher_walk *walk, int err) { - unsigned int nbytes = 0; + unsigned int n; /* bytes processed */ + bool more; - if (likely(err >= 0)) { - unsigned int n = walk->nbytes - err; + if (unlikely(err < 0)) + goto finish; - if (likely(!(walk->flags & BLKCIPHER_WALK_SLOW))) - n = blkcipher_done_fast(walk, n); - else if (WARN_ON(err)) { + n = walk->nbytes - err; + walk->total -= n; + more = (walk->total != 0); + + if (likely(!(walk->flags & BLKCIPHER_WALK_SLOW))) { + blkcipher_done_fast(walk, n); + } else { + if (WARN_ON(err)) { + /* unexpected case; didn't process all bytes */ err = -EINVAL; - goto err; - } else - n = blkcipher_done_slow(walk, n); - - nbytes = walk->total - n; - err = 0; + goto finish; + } + blkcipher_done_slow(walk, n); } - scatterwalk_done(&walk->in, 0, nbytes); - scatterwalk_done(&walk->out, 1, nbytes); + scatterwalk_done(&walk->in, 0, more); + scatterwalk_done(&walk->out, 1, more); -err: - walk->total = nbytes; - walk->nbytes = nbytes; - - if (nbytes) { + if (more) { crypto_yield(desc->flags); return blkcipher_walk_next(desc, walk); } - + err = 0; +finish: + walk->nbytes = 0; if (walk->iv != desc->info) memcpy(desc->info, walk->iv, walk->ivsize); if (walk->buffer != walk->page) kfree(walk->buffer); if (walk->page) free_page((unsigned long)walk->page); - return err; } EXPORT_SYMBOL_GPL(blkcipher_walk_done); From b7c2b69911f84b6819d300c8b2849c04a1b782f0 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Mon, 23 Jul 2018 10:54:58 -0700 Subject: [PATCH 633/970] crypto: ablkcipher - fix crash flushing dcache in error path commit 318abdfbe708aaaa652c79fb500e9bd60521f9dc upstream. Like the skcipher_walk and blkcipher_walk cases: scatterwalk_done() is only meant to be called after a nonzero number of bytes have been processed, since scatterwalk_pagedone() will flush the dcache of the *previous* page. But in the error case of ablkcipher_walk_done(), e.g. if the input wasn't an integer number of blocks, scatterwalk_done() was actually called after advancing 0 bytes. This caused a crash ("BUG: unable to handle kernel paging request") during '!PageSlab(page)' on architectures like arm and arm64 that define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE, provided that the input was page-aligned as in that case walk->offset == 0. Fix it by reorganizing ablkcipher_walk_done() to skip the scatterwalk_advance() and scatterwalk_done() if an error has occurred. Reported-by: Liu Chao Fixes: bf06099db18a ("crypto: skcipher - Add ablkcipher_walk interfaces") Cc: # v2.6.35+ Signed-off-by: Eric Biggers Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- crypto/ablkcipher.c | 57 +++++++++++++++++++++------------------------ 1 file changed, 26 insertions(+), 31 deletions(-) diff --git a/crypto/ablkcipher.c b/crypto/ablkcipher.c index d676fc59521a..860c9e5dfd7a 100644 --- a/crypto/ablkcipher.c +++ b/crypto/ablkcipher.c @@ -70,11 +70,9 @@ static inline u8 *ablkcipher_get_spot(u8 *start, unsigned int len) return max(start, end_page); } -static inline unsigned int ablkcipher_done_slow(struct ablkcipher_walk *walk, - unsigned int bsize) +static inline void ablkcipher_done_slow(struct ablkcipher_walk *walk, + unsigned int n) { - unsigned int n = bsize; - for (;;) { unsigned int len_this_page = scatterwalk_pagelen(&walk->out); @@ -86,17 +84,13 @@ static inline unsigned int ablkcipher_done_slow(struct ablkcipher_walk *walk, n -= len_this_page; scatterwalk_start(&walk->out, sg_next(walk->out.sg)); } - - return bsize; } -static inline unsigned int ablkcipher_done_fast(struct ablkcipher_walk *walk, - unsigned int n) +static inline void ablkcipher_done_fast(struct ablkcipher_walk *walk, + unsigned int n) { scatterwalk_advance(&walk->in, n); scatterwalk_advance(&walk->out, n); - - return n; } static int ablkcipher_walk_next(struct ablkcipher_request *req, @@ -106,39 +100,40 @@ int ablkcipher_walk_done(struct ablkcipher_request *req, struct ablkcipher_walk *walk, int err) { struct crypto_tfm *tfm = req->base.tfm; - unsigned int nbytes = 0; + unsigned int n; /* bytes processed */ + bool more; - if (likely(err >= 0)) { - unsigned int n = walk->nbytes - err; + if (unlikely(err < 0)) + goto finish; - if (likely(!(walk->flags & ABLKCIPHER_WALK_SLOW))) - n = ablkcipher_done_fast(walk, n); - else if (WARN_ON(err)) { + n = walk->nbytes - err; + walk->total -= n; + more = (walk->total != 0); + + if (likely(!(walk->flags & ABLKCIPHER_WALK_SLOW))) { + ablkcipher_done_fast(walk, n); + } else { + if (WARN_ON(err)) { + /* unexpected case; didn't process all bytes */ err = -EINVAL; - goto err; - } else - n = ablkcipher_done_slow(walk, n); - - nbytes = walk->total - n; - err = 0; + goto finish; + } + ablkcipher_done_slow(walk, n); } - scatterwalk_done(&walk->in, 0, nbytes); - scatterwalk_done(&walk->out, 1, nbytes); + scatterwalk_done(&walk->in, 0, more); + scatterwalk_done(&walk->out, 1, more); -err: - walk->total = nbytes; - walk->nbytes = nbytes; - - if (nbytes) { + if (more) { crypto_yield(req->base.flags); return ablkcipher_walk_next(req, walk); } - + err = 0; +finish: + walk->nbytes = 0; if (walk->iv != req->info) memcpy(req->info, walk->iv, tfm->crt_ablkcipher.ivsize); kfree(walk->iv_buffer); - return err; } EXPORT_SYMBOL_GPL(ablkcipher_walk_done); From 5daf24711f653f695352c3f916b70fa227bf9a78 Mon Sep 17 00:00:00 2001 From: Thierry Escande Date: Fri, 8 Sep 2017 00:13:08 -0500 Subject: [PATCH 634/970] ASoC: Intel: cht_bsw_max98090_ti: Fix jack initialization commit 3bbda5a38601f7675a214be2044e41d7749e6c7b upstream. If the ts3a227e audio accessory detection hardware is present and its driver probed, the jack needs to be created before enabling jack detection in the ts3a227e driver. With this patch, the jack is instantiated in the max98090 headset init function if the ts3a227e is present. This fixes a null pointer dereference as the jack detection enabling function in the ts3a driver was called before the jack is created. [minor correction to keep error handling on jack creation the same as before by Pierre Bossart] Signed-off-by: Thierry Escande Signed-off-by: Pierre-Louis Bossart Acked-By: Vinod Koul Signed-off-by: Mark Brown Signed-off-by: Sudip Mukherjee Signed-off-by: Greg Kroah-Hartman --- sound/soc/intel/boards/cht_bsw_max98090_ti.c | 45 ++++++++++++++------ 1 file changed, 31 insertions(+), 14 deletions(-) diff --git a/sound/soc/intel/boards/cht_bsw_max98090_ti.c b/sound/soc/intel/boards/cht_bsw_max98090_ti.c index cdcced9f32b6..b7c1e3d74ccc 100644 --- a/sound/soc/intel/boards/cht_bsw_max98090_ti.c +++ b/sound/soc/intel/boards/cht_bsw_max98090_ti.c @@ -128,23 +128,19 @@ static int cht_codec_init(struct snd_soc_pcm_runtime *runtime) struct cht_mc_private *ctx = snd_soc_card_get_drvdata(runtime->card); struct snd_soc_jack *jack = &ctx->jack; - /** - * TI supports 4 butons headset detection - * KEY_MEDIA - * KEY_VOICECOMMAND - * KEY_VOLUMEUP - * KEY_VOLUMEDOWN - */ - if (ctx->ts3a227e_present) - jack_type = SND_JACK_HEADPHONE | SND_JACK_MICROPHONE | - SND_JACK_BTN_0 | SND_JACK_BTN_1 | - SND_JACK_BTN_2 | SND_JACK_BTN_3; - else - jack_type = SND_JACK_HEADPHONE | SND_JACK_MICROPHONE; + if (ctx->ts3a227e_present) { + /* + * The jack has already been created in the + * cht_max98090_headset_init() function. + */ + snd_soc_jack_notifier_register(jack, &cht_jack_nb); + return 0; + } + + jack_type = SND_JACK_HEADPHONE | SND_JACK_MICROPHONE; ret = snd_soc_card_jack_new(runtime->card, "Headset Jack", jack_type, jack, NULL, 0); - if (ret) { dev_err(runtime->dev, "Headset Jack creation failed %d\n", ret); return ret; @@ -200,6 +196,27 @@ static int cht_max98090_headset_init(struct snd_soc_component *component) { struct snd_soc_card *card = component->card; struct cht_mc_private *ctx = snd_soc_card_get_drvdata(card); + struct snd_soc_jack *jack = &ctx->jack; + int jack_type; + int ret; + + /* + * TI supports 4 butons headset detection + * KEY_MEDIA + * KEY_VOICECOMMAND + * KEY_VOLUMEUP + * KEY_VOLUMEDOWN + */ + jack_type = SND_JACK_HEADPHONE | SND_JACK_MICROPHONE | + SND_JACK_BTN_0 | SND_JACK_BTN_1 | + SND_JACK_BTN_2 | SND_JACK_BTN_3; + + ret = snd_soc_card_jack_new(card, "Headset Jack", jack_type, + jack, NULL, 0); + if (ret) { + dev_err(card->dev, "Headset Jack creation failed %d\n", ret); + return ret; + } return ts3a227e_enable_jack_detect(component, &ctx->jack); } From 7c7940ffbaefdbb189f78a48b4e64b6f268b1dbf Mon Sep 17 00:00:00 2001 From: Mark Salyzyn Date: Tue, 31 Jul 2018 15:02:13 -0700 Subject: [PATCH 635/970] Bluetooth: hidp: buffer overflow in hidp_process_report commit 7992c18810e568b95c869b227137a2215702a805 upstream. CVE-2018-9363 The buffer length is unsigned at all layers, but gets cast to int and checked in hidp_process_report and can lead to a buffer overflow. Switch len parameter to unsigned int to resolve issue. This affects 3.18 and newer kernels. Signed-off-by: Mark Salyzyn Fixes: a4b1b5877b514b276f0f31efe02388a9c2836728 ("HID: Bluetooth: hidp: make sure input buffers are big enough") Cc: Marcel Holtmann Cc: Johan Hedberg Cc: "David S. Miller" Cc: Kees Cook Cc: Benjamin Tissoires Cc: linux-bluetooth@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: security@kernel.org Cc: kernel-team@android.com Acked-by: Kees Cook Signed-off-by: Marcel Holtmann Signed-off-by: Greg Kroah-Hartman --- net/bluetooth/hidp/core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/bluetooth/hidp/core.c b/net/bluetooth/hidp/core.c index 1fc076420d1e..1811f8e7ddf4 100644 --- a/net/bluetooth/hidp/core.c +++ b/net/bluetooth/hidp/core.c @@ -431,8 +431,8 @@ static void hidp_del_timer(struct hidp_session *session) del_timer(&session->timer); } -static void hidp_process_report(struct hidp_session *session, - int type, const u8 *data, int len, int intr) +static void hidp_process_report(struct hidp_session *session, int type, + const u8 *data, unsigned int len, int intr) { if (len > HID_MAX_BUFFER_SIZE) len = HID_MAX_BUFFER_SIZE; From 6e6b637779d78fd7d32e4e19d59cfe815e68857c Mon Sep 17 00:00:00 2001 From: Chintan Pandya Date: Wed, 27 Jun 2018 08:13:47 -0600 Subject: [PATCH 636/970] ioremap: Update pgtable free interfaces with addr commit 785a19f9d1dd8a4ab2d0633be4656653bd3de1fc upstream. The following kernel panic was observed on ARM64 platform due to a stale TLB entry. 1. ioremap with 4K size, a valid pte page table is set. 2. iounmap it, its pte entry is set to 0. 3. ioremap the same address with 2M size, update its pmd entry with a new value. 4. CPU may hit an exception because the old pmd entry is still in TLB, which leads to a kernel panic. Commit b6bdb7517c3d ("mm/vmalloc: add interfaces to free unmapped page table") has addressed this panic by falling to pte mappings in the above case on ARM64. To support pmd mappings in all cases, TLB purge needs to be performed in this case on ARM64. Add a new arg, 'addr', to pud_free_pmd_page() and pmd_free_pte_page() so that TLB purge can be added later in seprate patches. [toshi.kani@hpe.com: merge changes, rewrite patch description] Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces") Signed-off-by: Chintan Pandya Signed-off-by: Toshi Kani Signed-off-by: Thomas Gleixner Cc: mhocko@suse.com Cc: akpm@linux-foundation.org Cc: hpa@zytor.com Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: Will Deacon Cc: Joerg Roedel Cc: stable@vger.kernel.org Cc: Andrew Morton Cc: Michal Hocko Cc: "H. Peter Anvin" Cc: Link: https://lkml.kernel.org/r/20180627141348.21777-3-toshi.kani@hpe.com Signed-off-by: Greg Kroah-Hartman --- arch/arm64/mm/mmu.c | 4 ++-- arch/x86/mm/pgtable.c | 12 +++++++----- include/asm-generic/pgtable.h | 8 ++++---- lib/ioremap.c | 4 ++-- 4 files changed, 15 insertions(+), 13 deletions(-) diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index 4cd4862845cd..0a56898f8410 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -804,12 +804,12 @@ int pmd_clear_huge(pmd_t *pmd) return 1; } -int pud_free_pmd_page(pud_t *pud) +int pud_free_pmd_page(pud_t *pud, unsigned long addr) { return pud_none(*pud); } -int pmd_free_pte_page(pmd_t *pmd) +int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) { return pmd_none(*pmd); } diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index 36d0b81933f8..37786b101b51 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -657,11 +657,12 @@ int pmd_clear_huge(pmd_t *pmd) /** * pud_free_pmd_page - Clear pud entry and free pmd page. * @pud: Pointer to a PUD. + * @addr: Virtual address associated with pud. * * Context: The pud range has been unmaped and TLB purged. * Return: 1 if clearing the entry succeeded. 0 otherwise. */ -int pud_free_pmd_page(pud_t *pud) +int pud_free_pmd_page(pud_t *pud, unsigned long addr) { pmd_t *pmd; int i; @@ -672,7 +673,7 @@ int pud_free_pmd_page(pud_t *pud) pmd = (pmd_t *)pud_page_vaddr(*pud); for (i = 0; i < PTRS_PER_PMD; i++) - if (!pmd_free_pte_page(&pmd[i])) + if (!pmd_free_pte_page(&pmd[i], addr + (i * PMD_SIZE))) return 0; pud_clear(pud); @@ -684,11 +685,12 @@ int pud_free_pmd_page(pud_t *pud) /** * pmd_free_pte_page - Clear pmd entry and free pte page. * @pmd: Pointer to a PMD. + * @addr: Virtual address associated with pmd. * * Context: The pmd range has been unmaped and TLB purged. * Return: 1 if clearing the entry succeeded. 0 otherwise. */ -int pmd_free_pte_page(pmd_t *pmd) +int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) { pte_t *pte; @@ -704,7 +706,7 @@ int pmd_free_pte_page(pmd_t *pmd) #else /* !CONFIG_X86_64 */ -int pud_free_pmd_page(pud_t *pud) +int pud_free_pmd_page(pud_t *pud, unsigned long addr) { return pud_none(*pud); } @@ -713,7 +715,7 @@ int pud_free_pmd_page(pud_t *pud) * Disable free page handling on x86-PAE. This assures that ioremap() * does not update sync'd pmd entries. See vmalloc_sync_one(). */ -int pmd_free_pte_page(pmd_t *pmd) +int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) { return pmd_none(*pmd); } diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h index a88ea9e37a25..0a4c2d4d9f8d 100644 --- a/include/asm-generic/pgtable.h +++ b/include/asm-generic/pgtable.h @@ -779,8 +779,8 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot); int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot); int pud_clear_huge(pud_t *pud); int pmd_clear_huge(pmd_t *pmd); -int pud_free_pmd_page(pud_t *pud); -int pmd_free_pte_page(pmd_t *pmd); +int pud_free_pmd_page(pud_t *pud, unsigned long addr); +int pmd_free_pte_page(pmd_t *pmd, unsigned long addr); #else /* !CONFIG_HAVE_ARCH_HUGE_VMAP */ static inline int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot) { @@ -798,11 +798,11 @@ static inline int pmd_clear_huge(pmd_t *pmd) { return 0; } -static inline int pud_free_pmd_page(pud_t *pud) +static inline int pud_free_pmd_page(pud_t *pud, unsigned long addr) { return 0; } -static inline int pmd_free_pte_page(pmd_t *pmd) +static inline int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) { return 0; } diff --git a/lib/ioremap.c b/lib/ioremap.c index 5323b59ca393..b9462037868d 100644 --- a/lib/ioremap.c +++ b/lib/ioremap.c @@ -84,7 +84,7 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr, if (ioremap_pmd_enabled() && ((next - addr) == PMD_SIZE) && IS_ALIGNED(phys_addr + addr, PMD_SIZE) && - pmd_free_pte_page(pmd)) { + pmd_free_pte_page(pmd, addr)) { if (pmd_set_huge(pmd, phys_addr + addr, prot)) continue; } @@ -111,7 +111,7 @@ static inline int ioremap_pud_range(pgd_t *pgd, unsigned long addr, if (ioremap_pud_enabled() && ((next - addr) == PUD_SIZE) && IS_ALIGNED(phys_addr + addr, PUD_SIZE) && - pud_free_pmd_page(pud)) { + pud_free_pmd_page(pud, addr)) { if (pud_set_huge(pud, phys_addr + addr, prot)) continue; } From e853786d3c72e1e1445cfd144f44fb9ba3b211f4 Mon Sep 17 00:00:00 2001 From: Toshi Kani Date: Wed, 27 Jun 2018 08:13:48 -0600 Subject: [PATCH 637/970] x86/mm: Add TLB purge to free pmd/pte page interfaces commit 5e0fb5df2ee871b841f96f9cb6a7f2784e96aa4e upstream. ioremap() calls pud_free_pmd_page() / pmd_free_pte_page() when it creates a pud / pmd map. The following preconditions are met at their entry. - All pte entries for a target pud/pmd address range have been cleared. - System-wide TLB purges have been peformed for a target pud/pmd address range. The preconditions assure that there is no stale TLB entry for the range. Speculation may not cache TLB entries since it requires all levels of page entries, including ptes, to have P & A-bits set for an associated address. However, speculation may cache pud/pmd entries (paging-structure caches) when they have P-bit set. Add a system-wide TLB purge (INVLPG) to a single page after clearing pud/pmd entry's P-bit. SDM 4.10.4.1, Operation that Invalidate TLBs and Paging-Structure Caches, states that: INVLPG invalidates all paging-structure caches associated with the current PCID regardless of the liner addresses to which they correspond. Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces") Signed-off-by: Toshi Kani Signed-off-by: Thomas Gleixner Cc: mhocko@suse.com Cc: akpm@linux-foundation.org Cc: hpa@zytor.com Cc: cpandya@codeaurora.org Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: Joerg Roedel Cc: stable@vger.kernel.org Cc: Andrew Morton Cc: Michal Hocko Cc: "H. Peter Anvin" Cc: Link: https://lkml.kernel.org/r/20180627141348.21777-4-toshi.kani@hpe.com Signed-off-by: Greg Kroah-Hartman --- arch/x86/mm/pgtable.c | 36 ++++++++++++++++++++++++++++++------ 1 file changed, 30 insertions(+), 6 deletions(-) diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index 37786b101b51..e30baa8ad94f 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -659,24 +659,44 @@ int pmd_clear_huge(pmd_t *pmd) * @pud: Pointer to a PUD. * @addr: Virtual address associated with pud. * - * Context: The pud range has been unmaped and TLB purged. + * Context: The pud range has been unmapped and TLB purged. * Return: 1 if clearing the entry succeeded. 0 otherwise. + * + * NOTE: Callers must allow a single page allocation. */ int pud_free_pmd_page(pud_t *pud, unsigned long addr) { - pmd_t *pmd; + pmd_t *pmd, *pmd_sv; + pte_t *pte; int i; if (pud_none(*pud)) return 1; pmd = (pmd_t *)pud_page_vaddr(*pud); + pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL); + if (!pmd_sv) + return 0; - for (i = 0; i < PTRS_PER_PMD; i++) - if (!pmd_free_pte_page(&pmd[i], addr + (i * PMD_SIZE))) - return 0; + for (i = 0; i < PTRS_PER_PMD; i++) { + pmd_sv[i] = pmd[i]; + if (!pmd_none(pmd[i])) + pmd_clear(&pmd[i]); + } pud_clear(pud); + + /* INVLPG to clear all paging-structure caches */ + flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1); + + for (i = 0; i < PTRS_PER_PMD; i++) { + if (!pmd_none(pmd_sv[i])) { + pte = (pte_t *)pmd_page_vaddr(pmd_sv[i]); + free_page((unsigned long)pte); + } + } + + free_page((unsigned long)pmd_sv); free_page((unsigned long)pmd); return 1; @@ -687,7 +707,7 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr) * @pmd: Pointer to a PMD. * @addr: Virtual address associated with pmd. * - * Context: The pmd range has been unmaped and TLB purged. + * Context: The pmd range has been unmapped and TLB purged. * Return: 1 if clearing the entry succeeded. 0 otherwise. */ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) @@ -699,6 +719,10 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) pte = (pte_t *)pmd_page_vaddr(*pmd); pmd_clear(pmd); + + /* INVLPG to clear all paging-structure caches */ + flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1); + free_page((unsigned long)pte); return 1; From d0e3227f31c5cf6d063f956f37d8948959b60d08 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Fri, 17 Aug 2018 20:59:30 +0200 Subject: [PATCH 638/970] Linux 4.9.121 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 62fbcc04bf70..e54a126841a9 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 4 PATCHLEVEL = 9 -SUBLEVEL = 120 +SUBLEVEL = 121 EXTRAVERSION = NAME = Roaring Lionus From 7e5cac813b40cfd2ec77f7653287b388e3e59291 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Fri, 17 Aug 2018 10:27:36 -0700 Subject: [PATCH 639/970] x86/speculation/l1tf: Exempt zeroed PTEs from inversion commit f19f5c49bbc3ffcc9126cc245fc1b24cc29f4a37 upstream. It turns out that we should *not* invert all not-present mappings, because the all zeroes case is obviously special. clear_page() does not undergo the XOR logic to invert the address bits, i.e. PTE, PMD and PUD entries that have not been individually written will have val=0 and so will trigger __pte_needs_invert(). As a result, {pte,pmd,pud}_pfn() will return the wrong PFN value, i.e. all ones (adjusted by the max PFN mask) instead of zero. A zeroed entry is ok because the page at physical address 0 is reserved early in boot specifically to mitigate L1TF, so explicitly exempt them from the inversion when reading the PFN. Manifested as an unexpected mprotect(..., PROT_NONE) failure when called on a VMA that has VM_PFNMAP and was mmap'd to as something other than PROT_NONE but never used. mprotect() sends the PROT_NONE request down prot_none_walk(), which walks the PTEs to check the PFNs. prot_none_pte_entry() gets the bogus PFN from pte_pfn() and returns -EACCES because it thinks mprotect() is trying to adjust a high MMIO address. [ This is a very modified version of Sean's original patch, but all credit goes to Sean for doing this and also pointing out that sometimes the __pte_needs_invert() function only gets the protection bits, not the full eventual pte. But zero remains special even in just protection bits, so that's ok. - Linus ] Fixes: f22cc87f6c1f ("x86/speculation/l1tf: Invert all not present mappings") Signed-off-by: Sean Christopherson Acked-by: Andi Kleen Cc: Thomas Gleixner Cc: Josh Poimboeuf Cc: Michal Hocko Cc: Vlastimil Babka Cc: Dave Hansen Cc: Greg Kroah-Hartman Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/pgtable-invert.h | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/pgtable-invert.h b/arch/x86/include/asm/pgtable-invert.h index 44b1203ece12..a0c1525f1b6f 100644 --- a/arch/x86/include/asm/pgtable-invert.h +++ b/arch/x86/include/asm/pgtable-invert.h @@ -4,9 +4,18 @@ #ifndef __ASSEMBLY__ +/* + * A clear pte value is special, and doesn't get inverted. + * + * Note that even users that only pass a pgprot_t (rather + * than a full pte) won't trigger the special zero case, + * because even PAGE_NONE has _PAGE_PROTNONE | _PAGE_ACCESSED + * set. So the all zero case really is limited to just the + * cleared page table entry case. + */ static inline bool __pte_needs_invert(u64 val) { - return !(val & _PAGE_PRESENT); + return val && !(val & _PAGE_PRESENT); } /* Get a mask to xor with the page table entry to get the correct pfn. */ From ea101a702611f987c937a5fafe00f714fd9c1fe8 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Sat, 18 Aug 2018 10:47:20 +0200 Subject: [PATCH 640/970] Linux 4.9.122 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index e54a126841a9..1f44343a1e04 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 4 PATCHLEVEL = 9 -SUBLEVEL = 121 +SUBLEVEL = 122 EXTRAVERSION = NAME = Roaring Lionus From 4269a8f72efce7b4bf9aa127fa08b7a40af02eb1 Mon Sep 17 00:00:00 2001 From: Alexey Kodanev Date: Tue, 7 Aug 2018 20:03:57 +0300 Subject: [PATCH 641/970] dccp: fix undefined behavior with 'cwnd' shift in ccid2_cwnd_restart() [ Upstream commit 61ef4b07fcdc30535889990cf4229766502561cf ] The shift of 'cwnd' with '(now - hc->tx_lsndtime) / hc->tx_rto' value can lead to undefined behavior [1]. In order to fix this use a gradual shift of the window with a 'while' loop, similar to what tcp_cwnd_restart() is doing. When comparing delta and RTO there is a minor difference between TCP and DCCP, the last one also invokes dccp_cwnd_restart() and reduces 'cwnd' if delta equals RTO. That case is preserved in this change. [1]: [40850.963623] UBSAN: Undefined behaviour in net/dccp/ccids/ccid2.c:237:7 [40851.043858] shift exponent 67 is too large for 32-bit type 'unsigned int' [40851.127163] CPU: 3 PID: 15940 Comm: netstress Tainted: G W E 4.18.0-rc7.x86_64 #1 ... [40851.377176] Call Trace: [40851.408503] dump_stack+0xf1/0x17b [40851.451331] ? show_regs_print_info+0x5/0x5 [40851.503555] ubsan_epilogue+0x9/0x7c [40851.548363] __ubsan_handle_shift_out_of_bounds+0x25b/0x2b4 [40851.617109] ? __ubsan_handle_load_invalid_value+0x18f/0x18f [40851.686796] ? xfrm4_output_finish+0x80/0x80 [40851.739827] ? lock_downgrade+0x6d0/0x6d0 [40851.789744] ? xfrm4_prepare_output+0x160/0x160 [40851.845912] ? ip_queue_xmit+0x810/0x1db0 [40851.895845] ? ccid2_hc_tx_packet_sent+0xd36/0x10a0 [dccp] [40851.963530] ccid2_hc_tx_packet_sent+0xd36/0x10a0 [dccp] [40852.029063] dccp_xmit_packet+0x1d3/0x720 [dccp] [40852.086254] dccp_write_xmit+0x116/0x1d0 [dccp] [40852.142412] dccp_sendmsg+0x428/0xb20 [dccp] [40852.195454] ? inet_dccp_listen+0x200/0x200 [dccp] [40852.254833] ? sched_clock+0x5/0x10 [40852.298508] ? sched_clock+0x5/0x10 [40852.342194] ? inet_create+0xdf0/0xdf0 [40852.388988] sock_sendmsg+0xd9/0x160 ... Fixes: 113ced1f52e5 ("dccp ccid-2: Perform congestion-window validation") Signed-off-by: Alexey Kodanev Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/dccp/ccids/ccid2.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/dccp/ccids/ccid2.c b/net/dccp/ccids/ccid2.c index 86a2ed0fb219..161dfcf86126 100644 --- a/net/dccp/ccids/ccid2.c +++ b/net/dccp/ccids/ccid2.c @@ -228,14 +228,16 @@ static void ccid2_cwnd_restart(struct sock *sk, const u32 now) struct ccid2_hc_tx_sock *hc = ccid2_hc_tx_sk(sk); u32 cwnd = hc->tx_cwnd, restart_cwnd, iwnd = rfc3390_bytes_to_packets(dccp_sk(sk)->dccps_mss_cache); + s32 delta = now - hc->tx_lsndtime; hc->tx_ssthresh = max(hc->tx_ssthresh, (cwnd >> 1) + (cwnd >> 2)); /* don't reduce cwnd below the initial window (IW) */ restart_cwnd = min(cwnd, iwnd); - cwnd >>= (now - hc->tx_lsndtime) / hc->tx_rto; - hc->tx_cwnd = max(cwnd, restart_cwnd); + while ((delta -= hc->tx_rto) >= 0 && cwnd > restart_cwnd) + cwnd >>= 1; + hc->tx_cwnd = max(cwnd, restart_cwnd); hc->tx_cwnd_stamp = now; hc->tx_cwnd_used = 0; From ae7d506b72db2f7c1e37233fecd05393767e2dcb Mon Sep 17 00:00:00 2001 From: Wei Wang Date: Fri, 10 Aug 2018 11:14:56 -0700 Subject: [PATCH 642/970] l2tp: use sk_dst_check() to avoid race on sk->sk_dst_cache [ Upstream commit 6d37fa49da1e8db8fb1995be22ac837ca41ac8a8 ] In l2tp code, if it is a L2TP_UDP_ENCAP tunnel, tunnel->sk points to a UDP socket. User could call sendmsg() on both this tunnel and the UDP socket itself concurrently. As l2tp_xmit_skb() holds socket lock and call __sk_dst_check() to refresh sk->sk_dst_cache, while udpv6_sendmsg() is lockless and call sk_dst_check() to refresh sk->sk_dst_cache, there could be a race and cause the dst cache to be freed multiple times. So we fix l2tp side code to always call sk_dst_check() to garantee xchg() is called when refreshing sk->sk_dst_cache to avoid race conditions. Syzkaller reported stack trace: BUG: KASAN: use-after-free in atomic_read include/asm-generic/atomic-instrumented.h:21 [inline] BUG: KASAN: use-after-free in atomic_fetch_add_unless include/linux/atomic.h:575 [inline] BUG: KASAN: use-after-free in atomic_add_unless include/linux/atomic.h:597 [inline] BUG: KASAN: use-after-free in dst_hold_safe include/net/dst.h:308 [inline] BUG: KASAN: use-after-free in ip6_hold_safe+0xe6/0x670 net/ipv6/route.c:1029 Read of size 4 at addr ffff8801aea9a880 by task syz-executor129/4829 CPU: 0 PID: 4829 Comm: syz-executor129 Not tainted 4.18.0-rc7-next-20180802+ #30 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113 print_address_description+0x6c/0x20b mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.7+0x242/0x30d mm/kasan/report.c:412 check_memory_region_inline mm/kasan/kasan.c:260 [inline] check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267 kasan_check_read+0x11/0x20 mm/kasan/kasan.c:272 atomic_read include/asm-generic/atomic-instrumented.h:21 [inline] atomic_fetch_add_unless include/linux/atomic.h:575 [inline] atomic_add_unless include/linux/atomic.h:597 [inline] dst_hold_safe include/net/dst.h:308 [inline] ip6_hold_safe+0xe6/0x670 net/ipv6/route.c:1029 rt6_get_pcpu_route net/ipv6/route.c:1249 [inline] ip6_pol_route+0x354/0xd20 net/ipv6/route.c:1922 ip6_pol_route_output+0x54/0x70 net/ipv6/route.c:2098 fib6_rule_lookup+0x283/0x890 net/ipv6/fib6_rules.c:122 ip6_route_output_flags+0x2c5/0x350 net/ipv6/route.c:2126 ip6_dst_lookup_tail+0x1278/0x1da0 net/ipv6/ip6_output.c:978 ip6_dst_lookup_flow+0xc8/0x270 net/ipv6/ip6_output.c:1079 ip6_sk_dst_lookup_flow+0x5ed/0xc50 net/ipv6/ip6_output.c:1117 udpv6_sendmsg+0x2163/0x36b0 net/ipv6/udp.c:1354 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798 sock_sendmsg_nosec net/socket.c:622 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:632 ___sys_sendmsg+0x51d/0x930 net/socket.c:2115 __sys_sendmmsg+0x240/0x6f0 net/socket.c:2210 __do_sys_sendmmsg net/socket.c:2239 [inline] __se_sys_sendmmsg net/socket.c:2236 [inline] __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2236 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x446a29 Code: e8 ac b8 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f4de5532db8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133 RAX: ffffffffffffffda RBX: 00000000006dcc38 RCX: 0000000000446a29 RDX: 00000000000000b8 RSI: 0000000020001b00 RDI: 0000000000000003 RBP: 00000000006dcc30 R08: 00007f4de5533700 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006dcc3c R13: 00007ffe2b830fdf R14: 00007f4de55339c0 R15: 0000000000000001 Fixes: 71b1391a4128 ("l2tp: ensure sk->dst is still valid") Reported-by: syzbot+05f840f3b04f211bad55@syzkaller.appspotmail.com Signed-off-by: Wei Wang Signed-off-by: Martin KaFai Lau Cc: Guillaume Nault Cc: David Ahern Cc: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/l2tp/l2tp_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c index ead98e8e0b1f..a5333f6cb65a 100644 --- a/net/l2tp/l2tp_core.c +++ b/net/l2tp/l2tp_core.c @@ -1239,7 +1239,7 @@ int l2tp_xmit_skb(struct l2tp_session *session, struct sk_buff *skb, int hdr_len /* Get routing info from the tunnel socket */ skb_dst_drop(skb); - skb_dst_set(skb, dst_clone(__sk_dst_check(sk, 0))); + skb_dst_set(skb, sk_dst_check(sk, 0)); inet = inet_sk(sk); fl = &inet->cork.fl; From 2ff9f0820ddaad35bb2bca35b98f019a3b264d37 Mon Sep 17 00:00:00 2001 From: Cong Wang Date: Tue, 7 Aug 2018 12:41:38 -0700 Subject: [PATCH 643/970] llc: use refcount_inc_not_zero() for llc_sap_find() [ Upstream commit 0dcb82254d65f72333aa50ad626d1e9665ad093b ] llc_sap_put() decreases the refcnt before deleting sap from the global list. Therefore, there is a chance llc_sap_find() could find a sap with zero refcnt in this global list. Close this race condition by checking if refcnt is zero or not in llc_sap_find(), if it is zero then it is being removed so we can just treat it as gone. Reported-by: Signed-off-by: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/net/llc.h | 5 +++++ net/llc/llc_core.c | 4 ++-- 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/include/net/llc.h b/include/net/llc.h index e8e61d4fb458..82d989995d18 100644 --- a/include/net/llc.h +++ b/include/net/llc.h @@ -116,6 +116,11 @@ static inline void llc_sap_hold(struct llc_sap *sap) atomic_inc(&sap->refcnt); } +static inline bool llc_sap_hold_safe(struct llc_sap *sap) +{ + return atomic_inc_not_zero(&sap->refcnt); +} + void llc_sap_close(struct llc_sap *sap); static inline void llc_sap_put(struct llc_sap *sap) diff --git a/net/llc/llc_core.c b/net/llc/llc_core.c index 842851cef698..e896a2c53b12 100644 --- a/net/llc/llc_core.c +++ b/net/llc/llc_core.c @@ -73,8 +73,8 @@ struct llc_sap *llc_sap_find(unsigned char sap_value) rcu_read_lock_bh(); sap = __llc_sap_find(sap_value); - if (sap) - llc_sap_hold(sap); + if (!sap || !llc_sap_hold_safe(sap)) + sap = NULL; rcu_read_unlock_bh(); return sap; } From f6b82768f92dec3784fd8d1038342c4f46a222cd Mon Sep 17 00:00:00 2001 From: Cong Wang Date: Mon, 6 Aug 2018 11:06:02 -0700 Subject: [PATCH 644/970] vsock: split dwork to avoid reinitializations [ Upstream commit 455f05ecd2b219e9a216050796d30c830d9bc393 ] syzbot reported that we reinitialize an active delayed work in vsock_stream_connect(): ODEBUG: init active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x90 kernel/workqueue.c:1414 WARNING: CPU: 1 PID: 11518 at lib/debugobjects.c:329 debug_print_object+0x16a/0x210 lib/debugobjects.c:326 The pattern is apparently wrong, we should only initialize the dealyed work once and could repeatly schedule it. So we have to move out the initializations to allocation side. And to avoid confusion, we can split the shared dwork into two, instead of re-using the same one. Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Reported-by: Cc: Andy king Cc: Stefan Hajnoczi Cc: Jorgen Hansen Signed-off-by: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/net/af_vsock.h | 4 ++-- net/vmw_vsock/af_vsock.c | 15 ++++++++------- net/vmw_vsock/vmci_transport.c | 3 +-- 3 files changed, 11 insertions(+), 11 deletions(-) diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h index f32ed9ac181a..f38fe1c00564 100644 --- a/include/net/af_vsock.h +++ b/include/net/af_vsock.h @@ -62,7 +62,8 @@ struct vsock_sock { struct list_head pending_links; struct list_head accept_queue; bool rejected; - struct delayed_work dwork; + struct delayed_work connect_work; + struct delayed_work pending_work; struct delayed_work close_work; bool close_work_scheduled; u32 peer_shutdown; @@ -75,7 +76,6 @@ struct vsock_sock { s64 vsock_stream_has_data(struct vsock_sock *vsk); s64 vsock_stream_has_space(struct vsock_sock *vsk); -void vsock_pending_work(struct work_struct *work); struct sock *__vsock_create(struct net *net, struct socket *sock, struct sock *parent, diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index ee12e176256c..7566395e526d 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -448,14 +448,14 @@ static int vsock_send_shutdown(struct sock *sk, int mode) return transport->shutdown(vsock_sk(sk), mode); } -void vsock_pending_work(struct work_struct *work) +static void vsock_pending_work(struct work_struct *work) { struct sock *sk; struct sock *listener; struct vsock_sock *vsk; bool cleanup; - vsk = container_of(work, struct vsock_sock, dwork.work); + vsk = container_of(work, struct vsock_sock, pending_work.work); sk = sk_vsock(vsk); listener = vsk->listener; cleanup = true; @@ -495,7 +495,6 @@ void vsock_pending_work(struct work_struct *work) sock_put(sk); sock_put(listener); } -EXPORT_SYMBOL_GPL(vsock_pending_work); /**** SOCKET OPERATIONS ****/ @@ -594,6 +593,8 @@ static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr) return retval; } +static void vsock_connect_timeout(struct work_struct *work); + struct sock *__vsock_create(struct net *net, struct socket *sock, struct sock *parent, @@ -636,6 +637,8 @@ struct sock *__vsock_create(struct net *net, vsk->sent_request = false; vsk->ignore_connecting_rst = false; vsk->peer_shutdown = 0; + INIT_DELAYED_WORK(&vsk->connect_work, vsock_connect_timeout); + INIT_DELAYED_WORK(&vsk->pending_work, vsock_pending_work); psk = parent ? vsock_sk(parent) : NULL; if (parent) { @@ -1115,7 +1118,7 @@ static void vsock_connect_timeout(struct work_struct *work) struct vsock_sock *vsk; int cancel = 0; - vsk = container_of(work, struct vsock_sock, dwork.work); + vsk = container_of(work, struct vsock_sock, connect_work.work); sk = sk_vsock(vsk); lock_sock(sk); @@ -1219,9 +1222,7 @@ static int vsock_stream_connect(struct socket *sock, struct sockaddr *addr, * timeout fires. */ sock_hold(sk); - INIT_DELAYED_WORK(&vsk->dwork, - vsock_connect_timeout); - schedule_delayed_work(&vsk->dwork, timeout); + schedule_delayed_work(&vsk->connect_work, timeout); /* Skip ahead to preserve error code set above. */ goto out_wait; diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c index 4be4fbbc0b50..4aa391c5c733 100644 --- a/net/vmw_vsock/vmci_transport.c +++ b/net/vmw_vsock/vmci_transport.c @@ -1099,8 +1099,7 @@ static int vmci_transport_recv_listen(struct sock *sk, vpending->listener = sk; sock_hold(sk); sock_hold(pending); - INIT_DELAYED_WORK(&vpending->dwork, vsock_pending_work); - schedule_delayed_work(&vpending->dwork, HZ); + schedule_delayed_work(&vpending->pending_work, HZ); out: return err; From 87e7e8d40512331d01b7ab9d1f2e652443d9aff2 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Sun, 5 Aug 2018 22:46:07 +0800 Subject: [PATCH 645/970] ip6_tunnel: use the right value for ipv4 min mtu check in ip6_tnl_xmit [ Upstream commit 82a40777de12728dedf4075453b694f0d1baee80 ] According to RFC791, 68 bytes is the minimum size of IPv4 datagram every device must be able to forward without further fragmentation while 576 bytes is the minimum size of IPv4 datagram every device has to be able to receive, so in ip6_tnl_xmit(), 68(IPV4_MIN_MTU) should be the right value for the ipv4 min mtu check in ip6_tnl_xmit. While at it, change to use max() instead of if statement. Fixes: c9fefa08190f ("ip6_tunnel: get the min mtu properly in ip6_tnl_xmit") Reported-by: Sabrina Dubroca Signed-off-by: Xin Long Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv6/ip6_tunnel.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c index c7b202c1720d..cda63426eefb 100644 --- a/net/ipv6/ip6_tunnel.c +++ b/net/ipv6/ip6_tunnel.c @@ -1133,12 +1133,8 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield, max_headroom += 8; mtu -= 8; } - if (skb->protocol == htons(ETH_P_IPV6)) { - if (mtu < IPV6_MIN_MTU) - mtu = IPV6_MIN_MTU; - } else if (mtu < 576) { - mtu = 576; - } + mtu = max(mtu, skb->protocol == htons(ETH_P_IPV6) ? + IPV6_MIN_MTU : IPV4_MIN_MTU); if (skb_dst(skb) && !t->parms.collect_md) skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu); From fa284d29d323c7bcaac35452ed22c9aec490c51c Mon Sep 17 00:00:00 2001 From: Hangbin Liu Date: Mon, 13 Aug 2018 18:44:04 +0800 Subject: [PATCH 646/970] net_sched: Fix missing res info when create new tc_index filter [ Upstream commit 008369dcc5f7bfba526c98054f8525322acf0ea3 ] Li Shuang reported the following warn: [ 733.484610] WARNING: CPU: 6 PID: 21123 at net/sched/sch_cbq.c:1418 cbq_destroy_class+0x5d/0x70 [sch_cbq] [ 733.495190] Modules linked in: sch_cbq cls_tcindex sch_dsmark rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat l [ 733.574155] syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm igb ixgbe ahci libahci i2c_algo_bit libata i40e i2c_core dca mdio megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [ 733.592500] CPU: 6 PID: 21123 Comm: tc Not tainted 4.18.0-rc8.latest+ #131 [ 733.600169] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.1.5 04/11/2016 [ 733.608518] RIP: 0010:cbq_destroy_class+0x5d/0x70 [sch_cbq] [ 733.614734] Code: e7 d9 d2 48 8b 7b 48 e8 61 05 da d2 48 8d bb f8 00 00 00 e8 75 ae d5 d2 48 39 eb 74 0a 48 89 df 5b 5d e9 16 6c 94 d2 5b 5d c3 <0f> 0b eb b6 0f 1f 44 00 00 66 2e 0f 1f 84 [ 733.635798] RSP: 0018:ffffbfbb066bb9d8 EFLAGS: 00010202 [ 733.641627] RAX: 0000000000000001 RBX: ffff9cdd17392800 RCX: 000000008010000f [ 733.649588] RDX: ffff9cdd1df547e0 RSI: ffff9cdd17392800 RDI: ffff9cdd0f84c800 [ 733.657547] RBP: ffff9cdd0f84c800 R08: 0000000000000001 R09: 0000000000000000 [ 733.665508] R10: ffff9cdd0f84d000 R11: 0000000000000001 R12: 0000000000000001 [ 733.673469] R13: 0000000000000000 R14: 0000000000000001 R15: ffff9cdd17392200 [ 733.681430] FS: 00007f911890a740(0000) GS:ffff9cdd1f8c0000(0000) knlGS:0000000000000000 [ 733.690456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 733.696864] CR2: 0000000000b5544c CR3: 0000000859374002 CR4: 00000000001606e0 [ 733.704826] Call Trace: [ 733.707554] cbq_destroy+0xa1/0xd0 [sch_cbq] [ 733.712318] qdisc_destroy+0x62/0x130 [ 733.716401] dsmark_destroy+0x2a/0x70 [sch_dsmark] [ 733.721745] qdisc_destroy+0x62/0x130 [ 733.725829] qdisc_graft+0x3ba/0x470 [ 733.729817] tc_get_qdisc+0x2a6/0x2c0 [ 733.733901] ? cred_has_capability+0x7d/0x130 [ 733.738761] rtnetlink_rcv_msg+0x263/0x2d0 [ 733.743330] ? rtnl_calcit.isra.30+0x110/0x110 [ 733.748287] netlink_rcv_skb+0x4d/0x130 [ 733.752576] netlink_unicast+0x1a3/0x250 [ 733.756949] netlink_sendmsg+0x2ae/0x3a0 [ 733.761324] sock_sendmsg+0x36/0x40 [ 733.765213] ___sys_sendmsg+0x26f/0x2d0 [ 733.769493] ? handle_pte_fault+0x586/0xdf0 [ 733.774158] ? __handle_mm_fault+0x389/0x500 [ 733.778919] ? __sys_sendmsg+0x5e/0xa0 [ 733.783099] __sys_sendmsg+0x5e/0xa0 [ 733.787087] do_syscall_64+0x5b/0x180 [ 733.791171] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 733.796805] RIP: 0033:0x7f9117f23f10 [ 733.800791] Code: c3 48 8b 05 82 6f 2c 00 f7 db 64 89 18 48 83 cb ff eb dd 0f 1f 80 00 00 00 00 83 3d 8d d0 2c 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 [ 733.821873] RSP: 002b:00007ffe96818398 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 733.830319] RAX: ffffffffffffffda RBX: 000000005b71244c RCX: 00007f9117f23f10 [ 733.838280] RDX: 0000000000000000 RSI: 00007ffe968183e0 RDI: 0000000000000003 [ 733.846241] RBP: 00007ffe968183e0 R08: 000000000000ffff R09: 0000000000000003 [ 733.854202] R10: 00007ffe96817e20 R11: 0000000000000246 R12: 0000000000000000 [ 733.862161] R13: 0000000000662ee0 R14: 0000000000000000 R15: 0000000000000000 [ 733.870121] ---[ end trace 28edd4aad712ddca ]--- This is because we didn't update f->result.res when create new filter. Then in tcindex_delete() -> tcf_unbind_filter(), we will failed to find out the res and unbind filter, which will trigger the WARN_ON() in cbq_destroy_class(). Fix it by updating f->result.res when create new filter. Fixes: 6e0565697a106 ("net_sched: fix another crash in cls_tcindex") Reported-by: Li Shuang Signed-off-by: Hangbin Liu Acked-by: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/sched/cls_tcindex.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c index 0751245a6ace..b3db441bf03f 100644 --- a/net/sched/cls_tcindex.c +++ b/net/sched/cls_tcindex.c @@ -435,6 +435,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base, struct tcindex_filter *nfp; struct tcindex_filter __rcu **fp; + f->result.res = r->res; tcf_exts_change(tp, &f->result.exts, &r->exts); fp = cp->h + (handle % cp->hash); From ce494b382c83883e8f67ffbdd724a7be90ee78b6 Mon Sep 17 00:00:00 2001 From: Hangbin Liu Date: Mon, 13 Aug 2018 18:44:03 +0800 Subject: [PATCH 647/970] net_sched: fix NULL pointer dereference when delete tcindex filter [ Upstream commit 2df8bee5654bb2b7312662ca6810d4dc16b0b67f ] Li Shuang reported the following crash: [ 71.267724] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 [ 71.276456] PGD 800000085d9bd067 P4D 800000085d9bd067 PUD 859a0b067 PMD 0 [ 71.284127] Oops: 0000 [#1] SMP PTI [ 71.288015] CPU: 12 PID: 2386 Comm: tc Not tainted 4.18.0-rc8.latest+ #131 [ 71.295686] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.1.5 04/11/2016 [ 71.304037] RIP: 0010:tcindex_delete+0x72/0x280 [cls_tcindex] [ 71.310446] Code: 00 31 f6 48 87 75 20 48 85 f6 74 11 48 8b 47 18 48 8b 40 08 48 8b 40 50 e8 fb a6 f8 fc 48 85 db 0f 84 dc 00 00 00 48 8b 73 18 <8b> 56 04 48 8d 7e 04 85 d2 0f 84 7b 01 00 [ 71.331517] RSP: 0018:ffffb45207b3f898 EFLAGS: 00010282 [ 71.337345] RAX: ffff8ad3d72d6360 RBX: ffff8acc84393680 RCX: 000000000000002e [ 71.345306] RDX: ffff8ad3d72c8570 RSI: 0000000000000000 RDI: ffff8ad847a45800 [ 71.353277] RBP: ffff8acc84393688 R08: ffff8ad3d72c8400 R09: 0000000000000000 [ 71.361238] R10: ffff8ad3de786e00 R11: 0000000000000000 R12: ffffb45207b3f8c7 [ 71.369199] R13: ffff8ad3d93bd2a0 R14: 000000000000002e R15: ffff8ad3d72c9600 [ 71.377161] FS: 00007f9d3ec3e740(0000) GS:ffff8ad3df980000(0000) knlGS:0000000000000000 [ 71.386188] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 71.392597] CR2: 0000000000000004 CR3: 0000000852f06003 CR4: 00000000001606e0 [ 71.400558] Call Trace: [ 71.403299] tcindex_destroy_element+0x25/0x40 [cls_tcindex] [ 71.409611] tcindex_walk+0xbb/0x110 [cls_tcindex] [ 71.414953] tcindex_destroy+0x44/0x90 [cls_tcindex] [ 71.420492] ? tcindex_delete+0x280/0x280 [cls_tcindex] [ 71.426323] tcf_proto_destroy+0x16/0x40 [ 71.430696] tcf_chain_flush+0x51/0x70 [ 71.434876] tcf_block_put_ext.part.30+0x8f/0x1b0 [ 71.440122] tcf_block_put+0x4d/0x70 [ 71.444108] cbq_destroy+0x4d/0xd0 [sch_cbq] [ 71.448869] qdisc_destroy+0x62/0x130 [ 71.452951] dsmark_destroy+0x2a/0x70 [sch_dsmark] [ 71.458300] qdisc_destroy+0x62/0x130 [ 71.462373] qdisc_graft+0x3ba/0x470 [ 71.466359] tc_get_qdisc+0x2a6/0x2c0 [ 71.470443] ? cred_has_capability+0x7d/0x130 [ 71.475307] rtnetlink_rcv_msg+0x263/0x2d0 [ 71.479875] ? rtnl_calcit.isra.30+0x110/0x110 [ 71.484832] netlink_rcv_skb+0x4d/0x130 [ 71.489109] netlink_unicast+0x1a3/0x250 [ 71.493482] netlink_sendmsg+0x2ae/0x3a0 [ 71.497859] sock_sendmsg+0x36/0x40 [ 71.501748] ___sys_sendmsg+0x26f/0x2d0 [ 71.506029] ? handle_pte_fault+0x586/0xdf0 [ 71.510694] ? __handle_mm_fault+0x389/0x500 [ 71.515457] ? __sys_sendmsg+0x5e/0xa0 [ 71.519636] __sys_sendmsg+0x5e/0xa0 [ 71.523626] do_syscall_64+0x5b/0x180 [ 71.527711] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 71.533345] RIP: 0033:0x7f9d3e257f10 [ 71.537331] Code: c3 48 8b 05 82 6f 2c 00 f7 db 64 89 18 48 83 cb ff eb dd 0f 1f 80 00 00 00 00 83 3d 8d d0 2c 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 [ 71.558401] RSP: 002b:00007fff6f893398 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 71.566848] RAX: ffffffffffffffda RBX: 000000005b71274d RCX: 00007f9d3e257f10 [ 71.574810] RDX: 0000000000000000 RSI: 00007fff6f8933e0 RDI: 0000000000000003 [ 71.582770] RBP: 00007fff6f8933e0 R08: 000000000000ffff R09: 0000000000000003 [ 71.590729] R10: 00007fff6f892e20 R11: 0000000000000246 R12: 0000000000000000 [ 71.598689] R13: 0000000000662ee0 R14: 0000000000000000 R15: 0000000000000000 [ 71.606651] Modules linked in: sch_cbq cls_tcindex sch_dsmark xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_coni [ 71.685425] libahci i2c_algo_bit i2c_core i40e libata dca mdio megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [ 71.697075] CR2: 0000000000000004 [ 71.700792] ---[ end trace f604eb1acacd978b ]--- Reproducer: tc qdisc add dev lo handle 1:0 root dsmark indices 64 set_tc_index tc filter add dev lo parent 1:0 protocol ip prio 1 tcindex mask 0xfc shift 2 tc qdisc add dev lo parent 1:0 handle 2:0 cbq bandwidth 10Mbit cell 8 avpkt 1000 mpu 64 tc class add dev lo parent 2:0 classid 2:1 cbq bandwidth 10Mbit rate 1500Kbit avpkt 1000 prio 1 bounded isolated allot 1514 weight 1 maxburst 10 tc filter add dev lo parent 2:0 protocol ip prio 1 handle 0x2e tcindex classid 2:1 pass_on tc qdisc add dev lo parent 2:1 pfifo limit 5 tc qdisc del dev lo root This is because in tcindex_set_parms, when there is no old_r, we set new exts to cr.exts. And we didn't set it to filter when r == &new_filter_result. Then in tcindex_delete() -> tcf_exts_get_net(), we will get NULL pointer dereference as we didn't init exts. Fix it by moving tcf_exts_change() after "if (old_r && old_r != r)" check. Then we don't need "cr" as there is no errout after that. Fixes: bf63ac73b3e13 ("net_sched: fix an oops in tcindex filter") Reported-by: Li Shuang Signed-off-by: Hangbin Liu Acked-by: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/sched/cls_tcindex.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c index b3db441bf03f..db80a6440f37 100644 --- a/net/sched/cls_tcindex.c +++ b/net/sched/cls_tcindex.c @@ -414,11 +414,6 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base, tcf_bind_filter(tp, &cr.res, base); } - if (old_r) - tcf_exts_change(tp, &r->exts, &e); - else - tcf_exts_change(tp, &cr.exts, &e); - if (old_r && old_r != r) { err = tcindex_filter_result_init(old_r); if (err < 0) { @@ -429,6 +424,8 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base, oldp = p; r->res = cr.res; + tcf_exts_change(tp, &r->exts, &e); + rcu_assign_pointer(tp->root, cp); if (r == &new_filter_result) { From 2d65d06d39c7a452422de3ea0e6e79bfb31edd63 Mon Sep 17 00:00:00 2001 From: Park Ju Hyung Date: Sat, 28 Jul 2018 03:16:42 +0900 Subject: [PATCH 648/970] ALSA: hda - Sleep for 10ms after entering D3 on Conexant codecs commit f59cf9a0551dd954ad8b752461cf19d9789f4b1d upstream. On rare occasions, we are still noticing that the internal speaker spitting out spurious noises even after adding the problematic codec to the list. Adding a 10ms artificial delay before rebooting fixes the issue entirely. Patch for Realtek codecs also adds the same amount of delay after entering D3. Signed-off-by: Park Ju Hyung Cc: Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/pci/hda/patch_conexant.c | 1 + 1 file changed, 1 insertion(+) diff --git a/sound/pci/hda/patch_conexant.c b/sound/pci/hda/patch_conexant.c index 6b5804e063a3..97649c0a4974 100644 --- a/sound/pci/hda/patch_conexant.c +++ b/sound/pci/hda/patch_conexant.c @@ -219,6 +219,7 @@ static void cx_auto_reboot_notify(struct hda_codec *codec) snd_hda_codec_set_power_to_all(codec, codec->core.afg, AC_PWRST_D3); snd_hda_codec_write(codec, codec->core.afg, 0, AC_VERB_SET_POWER_STATE, AC_PWRST_D3); + msleep(10); } static void cx_auto_free(struct hda_codec *codec) From e89ba2cfdb39adb6b8ee8bde09b2f5b70046e0e3 Mon Sep 17 00:00:00 2001 From: Park Ju Hyung Date: Sat, 28 Jul 2018 03:16:21 +0900 Subject: [PATCH 649/970] ALSA: hda - Turn CX8200 into D3 as well upon reboot commit d77a4b4a5b0b2ebcbc9840995d91311ef28302ab upstream. As an equivalent codec with CX20724, CX8200 is also subject to the reboot bug. Late 2017 and 2018 LG Gram and some HP Spectre laptops are known victims to this issue, causing extremely loud noises upon reboot. Now that we know that this bug is subject to multiple codecs, fix the comment as well. Signed-off-by: Park Ju Hyung Cc: Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/pci/hda/patch_conexant.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/sound/pci/hda/patch_conexant.c b/sound/pci/hda/patch_conexant.c index 97649c0a4974..a6e98a4d6834 100644 --- a/sound/pci/hda/patch_conexant.c +++ b/sound/pci/hda/patch_conexant.c @@ -205,6 +205,7 @@ static void cx_auto_reboot_notify(struct hda_codec *codec) struct conexant_spec *spec = codec->spec; switch (codec->core.vendor_id) { + case 0x14f12008: /* CX8200 */ case 0x14f150f2: /* CX20722 */ case 0x14f150f4: /* CX20724 */ break; @@ -212,7 +213,7 @@ static void cx_auto_reboot_notify(struct hda_codec *codec) return; } - /* Turn the CX20722 codec into D3 to avoid spurious noises + /* Turn the problematic codec into D3 to avoid spurious noises from the internal speaker during (and after) reboot */ cx_auto_turn_eapd(codec, spec->num_eapds, spec->eapds, false); From 5cb120335cf46d87cdbe83b341bce3e9da03fee0 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Wed, 25 Jul 2018 17:10:11 +0200 Subject: [PATCH 650/970] ALSA: vx222: Fix invalid endian conversions commit fff71a4c050ba46e305d910c837b99ba1728135e upstream. The endian conversions used in vx2_dma_read() and vx2_dma_write() are superfluous and even wrong on big-endian machines, as inl() and outl() already do conversions. Kill them. Spotted by sparse, a warning like: sound/pci/vx222/vx222_ops.c:278:30: warning: incorrect type in argument 1 (different base types) Cc: Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/pci/vx222/vx222_ops.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/sound/pci/vx222/vx222_ops.c b/sound/pci/vx222/vx222_ops.c index 8e457ea27f89..1997bb048d8b 100644 --- a/sound/pci/vx222/vx222_ops.c +++ b/sound/pci/vx222/vx222_ops.c @@ -275,7 +275,7 @@ static void vx2_dma_write(struct vx_core *chip, struct snd_pcm_runtime *runtime, length >>= 2; /* in 32bit words */ /* Transfer using pseudo-dma. */ for (; length > 0; length--) { - outl(cpu_to_le32(*addr), port); + outl(*addr, port); addr++; } addr = (u32 *)runtime->dma_area; @@ -285,7 +285,7 @@ static void vx2_dma_write(struct vx_core *chip, struct snd_pcm_runtime *runtime, count >>= 2; /* in 32bit words */ /* Transfer using pseudo-dma. */ for (; count > 0; count--) { - outl(cpu_to_le32(*addr), port); + outl(*addr, port); addr++; } @@ -313,7 +313,7 @@ static void vx2_dma_read(struct vx_core *chip, struct snd_pcm_runtime *runtime, length >>= 2; /* in 32bit words */ /* Transfer using pseudo-dma. */ for (; length > 0; length--) - *addr++ = le32_to_cpu(inl(port)); + *addr++ = inl(port); addr = (u32 *)runtime->dma_area; pipe->hw_ptr = 0; } @@ -321,7 +321,7 @@ static void vx2_dma_read(struct vx_core *chip, struct snd_pcm_runtime *runtime, count >>= 2; /* in 32bit words */ /* Transfer using pseudo-dma. */ for (; count > 0; count--) - *addr++ = le32_to_cpu(inl(port)); + *addr++ = inl(port); vx2_release_pseudo_dma(chip); } From b1e4b1ca28eab0f7ade5cfa46a41aaf072cbd4d1 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Thu, 26 Jul 2018 14:27:59 +0200 Subject: [PATCH 651/970] ALSA: virmidi: Fix too long output trigger loop commit 50e9ffb1996a5d11ff5040a266585bad4ceeca0a upstream. The virmidi output trigger tries to parse the all available bytes and process sequencer events as much as possible. In a normal situation, this is supposed to be relatively short, but a program may give a huge buffer and it'll take a long time in a single spin lock, which may eventually lead to a soft lockup. This patch simply adds a workaround, a cond_resched() call in the loop if applicable. A better solution would be to move the event processor into a work, but let's put a duct-tape quickly at first. Reported-and-tested-by: Dae R. Jeong Reported-by: syzbot+619d9f40141d826b097e@syzkaller.appspotmail.com Cc: Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/core/seq/seq_virmidi.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/sound/core/seq/seq_virmidi.c b/sound/core/seq/seq_virmidi.c index 8bdc4c961f04..1ebb34656241 100644 --- a/sound/core/seq/seq_virmidi.c +++ b/sound/core/seq/seq_virmidi.c @@ -163,6 +163,7 @@ static void snd_virmidi_output_trigger(struct snd_rawmidi_substream *substream, int count, res; unsigned char buf[32], *pbuf; unsigned long flags; + bool check_resched = !in_atomic(); if (up) { vmidi->trigger = 1; @@ -200,6 +201,15 @@ static void snd_virmidi_output_trigger(struct snd_rawmidi_substream *substream, vmidi->event.type = SNDRV_SEQ_EVENT_NONE; } } + if (!check_resched) + continue; + /* do temporary unlock & cond_resched() for avoiding + * CPU soft lockup, which may happen via a write from + * a huge rawmidi buffer + */ + spin_unlock_irqrestore(&substream->runtime->lock, flags); + cond_resched(); + spin_lock_irqsave(&substream->runtime->lock, flags); } out: spin_unlock_irqrestore(&substream->runtime->lock, flags); From 6479c9e55354c81fcbe6a52f75cf56c4f91c2e5b Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Wed, 25 Jul 2018 17:59:26 +0200 Subject: [PATCH 652/970] ALSA: cs5535audio: Fix invalid endian conversion commit 69756930f2de0457d51db7d505a1e4f40e9fd116 upstream. One place in cs5535audio_build_dma_packets() does an extra conversion via cpu_to_le32(); namely jmpprd_addr is passed to setup_prd() ops, which writes the value via cs_writel(). That is, the callback does the conversion by itself, and we don't need to convert beforehand. This patch fixes that bogus conversion. Cc: Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/pci/cs5535audio/cs5535audio.h | 6 +++--- sound/pci/cs5535audio/cs5535audio_pcm.c | 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/sound/pci/cs5535audio/cs5535audio.h b/sound/pci/cs5535audio/cs5535audio.h index 0579daa62215..425d1b664029 100644 --- a/sound/pci/cs5535audio/cs5535audio.h +++ b/sound/pci/cs5535audio/cs5535audio.h @@ -66,9 +66,9 @@ struct cs5535audio_dma_ops { }; struct cs5535audio_dma_desc { - u32 addr; - u16 size; - u16 ctlreserved; + __le32 addr; + __le16 size; + __le16 ctlreserved; }; struct cs5535audio_dma { diff --git a/sound/pci/cs5535audio/cs5535audio_pcm.c b/sound/pci/cs5535audio/cs5535audio_pcm.c index c208c1d8dbb2..b9912ec2375b 100644 --- a/sound/pci/cs5535audio/cs5535audio_pcm.c +++ b/sound/pci/cs5535audio/cs5535audio_pcm.c @@ -158,8 +158,8 @@ static int cs5535audio_build_dma_packets(struct cs5535audio *cs5535au, lastdesc->addr = cpu_to_le32((u32) dma->desc_buf.addr); lastdesc->size = 0; lastdesc->ctlreserved = cpu_to_le16(PRD_JMP); - jmpprd_addr = cpu_to_le32(lastdesc->addr + - (sizeof(struct cs5535audio_dma_desc)*periods)); + jmpprd_addr = (u32)dma->desc_buf.addr + + sizeof(struct cs5535audio_dma_desc) * periods; dma->substream = substream; dma->period_bytes = period_bytes; From 323cb0fa957b8bd4d3580ffc9d1cdbbb0e837796 Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Thu, 2 Aug 2018 14:04:45 +0200 Subject: [PATCH 653/970] ALSA: hda: Correct Asrock B85M-ITX power_save blacklist entry commit 8e82a728792bf66b9f0a29c9d4c4b0630f7b9c79 upstream. I added the subsys product-id for the HDMI HDA device rather then for the PCH one, this commit fixes this. BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1525104 Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/pci/hda/hda_intel.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c index 4e9112001306..4e331dd5ff47 100644 --- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -2058,7 +2058,7 @@ static int azx_probe(struct pci_dev *pci, */ static struct snd_pci_quirk power_save_blacklist[] = { /* https://bugzilla.redhat.com/show_bug.cgi?id=1525104 */ - SND_PCI_QUIRK(0x1849, 0x0c0c, "Asrock B85M-ITX", 0), + SND_PCI_QUIRK(0x1849, 0xc892, "Asrock B85M-ITX", 0), /* https://bugzilla.redhat.com/show_bug.cgi?id=1525104 */ SND_PCI_QUIRK(0x1043, 0x8733, "Asus Prime X370-Pro", 0), /* https://bugzilla.redhat.com/show_bug.cgi?id=1572975 */ From 674ed567bf8410a2c1c9f5d1ff3149e32548575c Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Thu, 19 Jul 2018 11:01:04 +0200 Subject: [PATCH 654/970] ALSA: memalloc: Don't exceed over the requested size commit dfef01e150824b0e6da750cacda8958188d29aea upstream. snd_dma_alloc_pages_fallback() tries to allocate pages again when the allocation fails with reduced size. But the first try actually *increases* the size to power-of-two, which may give back a larger chunk than the requested size. This confuses the callers, e.g. sgbuf assumes that the size is equal or less, and it may result in a bad loop due to the underflow and eventually lead to Oops. The code of this function seems incorrectly assuming the usage of get_order(). We need to decrease at first, then align to power-of-two. Reported-and-tested-by: he, bo Reported-by: zhang jun Cc: Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/core/memalloc.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/sound/core/memalloc.c b/sound/core/memalloc.c index f05cb6a8cbe0..78ffe445d775 100644 --- a/sound/core/memalloc.c +++ b/sound/core/memalloc.c @@ -239,16 +239,12 @@ int snd_dma_alloc_pages_fallback(int type, struct device *device, size_t size, int err; while ((err = snd_dma_alloc_pages(type, device, size, dmab)) < 0) { - size_t aligned_size; if (err != -ENOMEM) return err; if (size <= PAGE_SIZE) return -ENOMEM; - aligned_size = PAGE_SIZE << get_order(size); - if (size != aligned_size) - size = aligned_size; - else - size >>= 1; + size >>= 1; + size = PAGE_SIZE << get_order(size); } if (! dmab->area) return -ENOMEM; From cdd187871caddae0ded62e6d5a3f8622b31c67a4 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Wed, 25 Jul 2018 17:11:38 +0200 Subject: [PATCH 655/970] ALSA: vxpocket: Fix invalid endian conversions commit 3acd3e3bab95ec3622ff98da313290ee823a0f68 upstream. The endian conversions used in vxp_dma_read() and vxp_dma_write() are superfluous and even wrong on big-endian machines, as inw() and outw() already do conversions. Kill them. Cc: Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/pcmcia/vx/vxp_ops.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/sound/pcmcia/vx/vxp_ops.c b/sound/pcmcia/vx/vxp_ops.c index 56aa1ba73ccc..49a883341eff 100644 --- a/sound/pcmcia/vx/vxp_ops.c +++ b/sound/pcmcia/vx/vxp_ops.c @@ -375,7 +375,7 @@ static void vxp_dma_write(struct vx_core *chip, struct snd_pcm_runtime *runtime, length >>= 1; /* in 16bit words */ /* Transfer using pseudo-dma. */ for (; length > 0; length--) { - outw(cpu_to_le16(*addr), port); + outw(*addr, port); addr++; } addr = (unsigned short *)runtime->dma_area; @@ -385,7 +385,7 @@ static void vxp_dma_write(struct vx_core *chip, struct snd_pcm_runtime *runtime, count >>= 1; /* in 16bit words */ /* Transfer using pseudo-dma. */ for (; count > 0; count--) { - outw(cpu_to_le16(*addr), port); + outw(*addr, port); addr++; } vx_release_pseudo_dma(chip); @@ -417,7 +417,7 @@ static void vxp_dma_read(struct vx_core *chip, struct snd_pcm_runtime *runtime, length >>= 1; /* in 16bit words */ /* Transfer using pseudo-dma. */ for (; length > 0; length--) - *addr++ = le16_to_cpu(inw(port)); + *addr++ = inw(port); addr = (unsigned short *)runtime->dma_area; pipe->hw_ptr = 0; } @@ -425,12 +425,12 @@ static void vxp_dma_read(struct vx_core *chip, struct snd_pcm_runtime *runtime, count >>= 1; /* in 16bit words */ /* Transfer using pseudo-dma. */ for (; count > 1; count--) - *addr++ = le16_to_cpu(inw(port)); + *addr++ = inw(port); /* Disable DMA */ pchip->regDIALOG &= ~VXP_DLG_DMAREAD_SEL_MASK; vx_outb(chip, DIALOG, pchip->regDIALOG); /* Read the last word (16 bits) */ - *addr = le16_to_cpu(inw(port)); + *addr = inw(port); /* Disable 16-bit accesses */ pchip->regDIALOG &= ~VXP_DLG_DMA16_SEL_MASK; vx_outb(chip, DIALOG, pchip->regDIALOG); From 4daf820df7b2917175c9fe010335bf9051d3f383 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Wed, 15 Aug 2018 12:14:05 -0700 Subject: [PATCH 656/970] isdn: Disable IIOCDBGVAR [ Upstream commit 5e22002aa8809e2efab2da95855f73f63e14a36c ] It was possible to directly leak the kernel address where the isdn_dev structure pointer was stored. This is a kernel ASLR bypass for anyone with access to the ioctl. The code had been present since the beginning of git history, though this shouldn't ever be needed for normal operation, therefore remove it. Reported-by: Al Viro Cc: Karsten Keil Signed-off-by: Kees Cook Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/isdn/i4l/isdn_common.c | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/drivers/isdn/i4l/isdn_common.c b/drivers/isdn/i4l/isdn_common.c index e4c43a17b333..8088c34336aa 100644 --- a/drivers/isdn/i4l/isdn_common.c +++ b/drivers/isdn/i4l/isdn_common.c @@ -1655,13 +1655,7 @@ isdn_ioctl(struct file *file, uint cmd, ulong arg) } else return -EINVAL; case IIOCDBGVAR: - if (arg) { - if (copy_to_user(argp, &dev, sizeof(ulong))) - return -EFAULT; - return 0; - } else - return -EINVAL; - break; + return -EINVAL; default: if ((cmd & IIOCDRVCTL) == IIOCDRVCTL) cmd = ((cmd >> _IOC_NRSHIFT) & _IOC_NRMASK) & ISDN_DRVIOCTL_MASK; From c4db09a6f60a4b70e3845ab9978cf61bcb2a456e Mon Sep 17 00:00:00 2001 From: Hangbin Liu Date: Tue, 14 Aug 2018 17:28:26 +0800 Subject: [PATCH 657/970] cls_matchall: fix tcf_unbind_filter missing [ Upstream commit a51c76b4dfb30496dc65396a957ef0f06af7fb22 ] Fix tcf_unbind_filter missing in cls_matchall as this will trigger WARN_ON() in cbq_destroy_class(). Fixes: fd62d9f5c575f ("net/sched: matchall: Fix configuration race") Reported-by: Li Shuang Signed-off-by: Hangbin Liu Acked-by: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/sched/cls_matchall.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c index e75fb65037d7..61ddfbad2aae 100644 --- a/net/sched/cls_matchall.c +++ b/net/sched/cls_matchall.c @@ -94,6 +94,8 @@ static bool mall_destroy(struct tcf_proto *tp, bool force) if (!head) return true; + tcf_unbind_filter(tp, &head->res); + if (tc_should_offload(dev, tp, head->flags)) mall_destroy_hw_filter(tp, head, (unsigned long) head); From 5823374e9c4bd92d5c3c4527e55774b2fe52175b Mon Sep 17 00:00:00 2001 From: John Ogness Date: Sun, 24 Jun 2018 00:32:11 +0200 Subject: [PATCH 658/970] USB: serial: sierra: fix potential deadlock at close commit e60870012e5a35b1506d7b376fddfb30e9da0b27 upstream. The portdata spinlock can be taken in interrupt context (via sierra_outdat_callback()). Disable interrupts when taking the portdata spinlock when discarding deferred URBs during close to prevent a possible deadlock. Fixes: 014333f77c0b ("USB: sierra: fix urb and memory leak on disconnect") Cc: stable Signed-off-by: John Ogness Signed-off-by: Sebastian Andrzej Siewior [ johan: amend commit message and add fixes and stable tags ] Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/usb/serial/sierra.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/usb/serial/sierra.c b/drivers/usb/serial/sierra.c index e1994e264cc0..fbc7b29e855e 100644 --- a/drivers/usb/serial/sierra.c +++ b/drivers/usb/serial/sierra.c @@ -790,9 +790,9 @@ static void sierra_close(struct usb_serial_port *port) kfree(urb->transfer_buffer); usb_free_urb(urb); usb_autopm_put_interface_async(serial->interface); - spin_lock(&portdata->lock); + spin_lock_irq(&portdata->lock); portdata->outstanding_urbs--; - spin_unlock(&portdata->lock); + spin_unlock_irq(&portdata->lock); } sierra_stop_rx_urbs(port); From f1fe792771466ef3e24cdbb0f402d3f7b9353e8a Mon Sep 17 00:00:00 2001 From: Aleksander Morgado Date: Tue, 24 Jul 2018 01:34:01 +0200 Subject: [PATCH 659/970] USB: option: add support for DW5821e commit 7bab01ecc6c43da882333c6db39741cb43677004 upstream. The device exposes AT, NMEA and DIAG ports in both USB configurations. The patch explicitly ignores interfaces 0 and 1, as they're bound to other drivers already; and also interface 6, which is a GNSS interface for which we don't have a driver yet. T: Bus=01 Lev=03 Prnt=04 Port=00 Cnt=01 Dev#= 18 Spd=480 MxCh= 0 D: Ver= 2.10 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs= 2 P: Vendor=413c ProdID=81d7 Rev=03.18 S: Manufacturer=DELL S: Product=DW5821e Snapdragon X20 LTE S: SerialNumber=0123456789ABCDEF C: #Ifs= 7 Cfg#= 2 Atr=a0 MxPwr=500mA I: If#= 0 Alt= 0 #EPs= 1 Cls=02(commc) Sub=0e Prot=00 Driver=cdc_mbim I: If#= 1 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=02 Driver=cdc_mbim I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#= 5 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option I: If#= 6 Alt= 0 #EPs= 1 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none) T: Bus=01 Lev=03 Prnt=04 Port=00 Cnt=01 Dev#= 16 Spd=480 MxCh= 0 D: Ver= 2.10 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs= 2 P: Vendor=413c ProdID=81d7 Rev=03.18 S: Manufacturer=DELL S: Product=DW5821e Snapdragon X20 LTE S: SerialNumber=0123456789ABCDEF C: #Ifs= 6 Cfg#= 1 Atr=a0 MxPwr=500mA I: If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan I: If#= 1 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=00 Prot=00 Driver=usbhid I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#= 5 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option Signed-off-by: Aleksander Morgado Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/usb/serial/option.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c index d982c455e18e..2b81939fecd7 100644 --- a/drivers/usb/serial/option.c +++ b/drivers/usb/serial/option.c @@ -199,6 +199,8 @@ static void option_instat_callback(struct urb *urb); #define DELL_PRODUCT_5800_V2_MINICARD_VZW 0x8196 /* Novatel E362 */ #define DELL_PRODUCT_5804_MINICARD_ATT 0x819b /* Novatel E371 */ +#define DELL_PRODUCT_5821E 0x81d7 + #define KYOCERA_VENDOR_ID 0x0c88 #define KYOCERA_PRODUCT_KPC650 0x17da #define KYOCERA_PRODUCT_KPC680 0x180a @@ -1033,6 +1035,8 @@ static const struct usb_device_id option_ids[] = { { USB_DEVICE_AND_INTERFACE_INFO(DELL_VENDOR_ID, DELL_PRODUCT_5800_MINICARD_VZW, 0xff, 0xff, 0xff) }, { USB_DEVICE_AND_INTERFACE_INFO(DELL_VENDOR_ID, DELL_PRODUCT_5800_V2_MINICARD_VZW, 0xff, 0xff, 0xff) }, { USB_DEVICE_AND_INTERFACE_INFO(DELL_VENDOR_ID, DELL_PRODUCT_5804_MINICARD_ATT, 0xff, 0xff, 0xff) }, + { USB_DEVICE(DELL_VENDOR_ID, DELL_PRODUCT_5821E), + .driver_info = RSVD(0) | RSVD(1) | RSVD(6) }, { USB_DEVICE(ANYDATA_VENDOR_ID, ANYDATA_PRODUCT_ADU_E100A) }, /* ADU-E100, ADU-310 */ { USB_DEVICE(ANYDATA_VENDOR_ID, ANYDATA_PRODUCT_ADU_500A) }, { USB_DEVICE(ANYDATA_VENDOR_ID, ANYDATA_PRODUCT_ADU_620UW) }, From a469b811851e94b02792609b452b53fbe1844063 Mon Sep 17 00:00:00 2001 From: Willy Tarreau Date: Mon, 9 Jul 2018 14:03:55 +0200 Subject: [PATCH 660/970] ACPI / PM: save NVS memory for ASUS 1025C laptop commit 231f9415001138a000cd0f881c46654b7ea3f8c5 upstream. Every time I tried to upgrade my laptop from 3.10.x to 4.x I faced an issue by which the fan would run at full speed upon resume. Bisecting it showed me the issue was introduced in 3.17 by commit 821d6f0359b0 (ACPI / sleep: Do not save NVS for new machines to accelerate S3). This code only affects machines built starting as of 2012, but this Asus 1025C laptop was made in 2012 and apparently needs the NVS data to be saved, otherwise the CPU's thermal state is not properly reported on resume and the fan runs at full speed upon resume. Here's a very simple way to check if such a machine is affected : # cat /sys/class/thermal/thermal_zone0/temp 55000 ( now suspend, wait one second and resume ) # cat /sys/class/thermal/thermal_zone0/temp 0 (and after ~15 seconds the fan starts to spin) Let's apply the same quirk as commit cbc00c13 (ACPI: save NVS memory for Lenovo G50-45) and reuse the function it provides. Note that this commit was already backported to 4.9.x but not 4.4.x. Cc: 3.17+ # 3.17+: requires cbc00c13 Signed-off-by: Willy Tarreau Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman --- drivers/acpi/sleep.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/acpi/sleep.c b/drivers/acpi/sleep.c index 097d630ab886..f0633169c35b 100644 --- a/drivers/acpi/sleep.c +++ b/drivers/acpi/sleep.c @@ -330,6 +330,14 @@ static struct dmi_system_id acpisleep_dmi_table[] __initdata = { DMI_MATCH(DMI_PRODUCT_NAME, "K54HR"), }, }, + { + .callback = init_nvs_save_s3, + .ident = "Asus 1025C", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."), + DMI_MATCH(DMI_PRODUCT_NAME, "1025C"), + }, + }, /* * https://bugzilla.kernel.org/show_bug.cgi?id=189431 * Lenovo G50-45 is a platform later than 2012, but needs nvs memory From c39998a1541ec1fa1e33f154ef34c0b68ec5b217 Mon Sep 17 00:00:00 2001 From: Mark Date: Sun, 12 Aug 2018 11:47:16 -0400 Subject: [PATCH 661/970] tty: serial: 8250: Revert NXP SC16C2552 workaround commit 47ac76662ca9c5852fd353093f19de3ae85f2e66 upstream. Revert commit ecb988a3b7985913d1f0112f66667cdd15e40711: tty: serial: 8250: 8250_core: NXP SC16C2552 workaround The above commit causes userland application to no longer write correctly its first write to a dumb terminal connected to /dev/ttyS0. This commit seems to be the culprit. It's as though the TX FIFO is being reset during that write. What should be displayed is: PSW 80000000 INST 00000000 HALT // What is displayed is some variation of: T 00000000 HAL// Reverting this commit via this patch fixes my problem. Signed-off-by: Mark Hounschell Fixes: ecb988a3b798 ("tty: serial: 8250: 8250_core: NXP SC16C2552 workaround") Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/8250/8250_port.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c index 5d9038a5bbc4..5b54439a8a9b 100644 --- a/drivers/tty/serial/8250/8250_port.c +++ b/drivers/tty/serial/8250/8250_port.c @@ -83,8 +83,7 @@ static const struct serial8250_config uart_config[] = { .name = "16550A", .fifo_size = 16, .tx_loadsz = 16, - .fcr = UART_FCR_ENABLE_FIFO | UART_FCR_R_TRIG_10 | - UART_FCR_CLEAR_RCVR | UART_FCR_CLEAR_XMIT, + .fcr = UART_FCR_ENABLE_FIFO | UART_FCR_R_TRIG_10, .rxtrig_bytes = {1, 4, 8, 14}, .flags = UART_CAP_FIFO, }, From 86698956fcdb42e3680c3295e5b6bd697eef3ca9 Mon Sep 17 00:00:00 2001 From: Chen Hu Date: Fri, 27 Jul 2018 18:32:41 +0800 Subject: [PATCH 662/970] serial: 8250_dw: always set baud rate in dw8250_set_termios commit dfcab6ba573445c703235ab6c83758eec12d7f28 upstream. dw8250_set_termios() doesn't set baud rate if the arg "old ktermios" is NULL. This happens during resume. Call Trace: ... [ 54.928108] dw8250_set_termios+0x162/0x170 [ 54.928114] serial8250_set_termios+0x17/0x20 [ 54.928117] uart_change_speed+0x64/0x160 [ 54.928119] uart_resume_port ... So the baud rate is not restored after S3 and breaks the apps who use UART, for example, console and bluetooth etc. We address this issue by setting the baud rate irrespective of arg "old", just like the drivers for other 8250 IPs. This is tested with Intel Broxton platform. Signed-off-by: Chen Hu Fixes: 4e26b134bd17 ("serial: 8250_dw: clock rate handling for all ACPI platforms") Cc: Heikki Krogerus Cc: stable Reviewed-by: Andy Shevchenko Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/8250/8250_dw.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/tty/serial/8250/8250_dw.c b/drivers/tty/serial/8250/8250_dw.c index 3eb01a719d22..970ec8c74a2a 100644 --- a/drivers/tty/serial/8250/8250_dw.c +++ b/drivers/tty/serial/8250/8250_dw.c @@ -235,7 +235,7 @@ static void dw8250_set_termios(struct uart_port *p, struct ktermios *termios, unsigned int rate; int ret; - if (IS_ERR(d->clk) || !old) + if (IS_ERR(d->clk)) goto out; clk_disable_unprepare(d->clk); From 0f9f323b82a6c754a1c16a0db2f0387e913e5353 Mon Sep 17 00:00:00 2001 From: Srinath Mannam Date: Sat, 28 Jul 2018 20:55:15 +0530 Subject: [PATCH 663/970] serial: 8250_dw: Add ACPI support for uart on Broadcom SoC commit 784c29eda5b4e28c3a56aa90b3815f9a1b0cfdc1 upstream. Add ACPI identifier HID for UART DW 8250 on Broadcom SoCs to match the HID passed through ACPI tables to enable UART controller. Signed-off-by: Srinath Mannam Reviewed-by: Vladimir Olovyannikov Tested-by: Vladimir Olovyannikov Reviewed-by: Andy Shevchenko Cc: stable Signed-off-by: Greg Kroah-Hartman Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/8250/8250_dw.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/tty/serial/8250/8250_dw.c b/drivers/tty/serial/8250/8250_dw.c index 970ec8c74a2a..3177264a1166 100644 --- a/drivers/tty/serial/8250/8250_dw.c +++ b/drivers/tty/serial/8250/8250_dw.c @@ -626,6 +626,7 @@ static const struct acpi_device_id dw8250_acpi_match[] = { { "APMC0D08", 0}, { "AMD0020", 0 }, { "AMDI0020", 0 }, + { "BRCM2032", 0 }, { "HISI0031", 0 }, { }, }; From 89c059b66a08634c4fa91082f6b600daa2a6cd6a Mon Sep 17 00:00:00 2001 From: Tom Lendacky Date: Mon, 17 Jul 2017 16:10:06 -0500 Subject: [PATCH 664/970] x86/mm: Simplify p[g4um]d_page() macros MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit fd7e315988b784509ba3f1b42f539bd0b1fca9bb upstream. Create a pgd_pfn() macro similar to the p[4um]d_pfn() macros and then use the p[g4um]d_pfn() macros in the p[g4um]d_page() macros instead of duplicating the code. Signed-off-by: Tom Lendacky Reviewed-by: Thomas Gleixner Reviewed-by: Borislav Petkov Cc: Alexander Potapenko Cc: Andrey Ryabinin Cc: Andy Lutomirski Cc: Arnd Bergmann Cc: Borislav Petkov Cc: Brijesh Singh Cc: Dave Young Cc: Dmitry Vyukov Cc: Jonathan Corbet Cc: Konrad Rzeszutek Wilk Cc: Larry Woodman Cc: Linus Torvalds Cc: Matt Fleming Cc: Michael S. Tsirkin Cc: Paolo Bonzini Cc: Peter Zijlstra Cc: Radim Krčmář Cc: Rik van Riel Cc: Toshimitsu Kani Cc: kasan-dev@googlegroups.com Cc: kvm@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/e61eb533a6d0aac941db2723d8aa63ef6b882dee.1500319216.git.thomas.lendacky@amd.com Signed-off-by: Ingo Molnar [Backported to 4.9 stable by AK, suggested by Michael Hocko] Signed-off-by: Andi Kleen Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/pgtable.h | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 5008be1ab183..c535012bdb56 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -190,6 +190,11 @@ static inline unsigned long pud_pfn(pud_t pud) return (pfn & pud_pfn_mask(pud)) >> PAGE_SHIFT; } +static inline unsigned long pgd_pfn(pgd_t pgd) +{ + return (pgd_val(pgd) & PTE_PFN_MASK) >> PAGE_SHIFT; +} + #define pte_page(pte) pfn_to_page(pte_pfn(pte)) static inline int pmd_large(pmd_t pte) @@ -621,8 +626,7 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd) * Currently stuck as a macro due to indirect forward reference to * linux/mmzone.h's __section_mem_map_addr() definition: */ -#define pmd_page(pmd) \ - pfn_to_page((pmd_val(pmd) & pmd_pfn_mask(pmd)) >> PAGE_SHIFT) +#define pmd_page(pmd) pfn_to_page(pmd_pfn(pmd)) /* * the pmd page can be thought of an array like this: pmd_t[PTRS_PER_PMD] @@ -690,8 +694,7 @@ static inline unsigned long pud_page_vaddr(pud_t pud) * Currently stuck as a macro due to indirect forward reference to * linux/mmzone.h's __section_mem_map_addr() definition: */ -#define pud_page(pud) \ - pfn_to_page((pud_val(pud) & pud_pfn_mask(pud)) >> PAGE_SHIFT) +#define pud_page(pud) pfn_to_page(pud_pfn(pud)) /* Find an entry in the second-level page table.. */ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address) @@ -731,7 +734,7 @@ static inline unsigned long pgd_page_vaddr(pgd_t pgd) * Currently stuck as a macro due to indirect forward reference to * linux/mmzone.h's __section_mem_map_addr() definition: */ -#define pgd_page(pgd) pfn_to_page(pgd_val(pgd) >> PAGE_SHIFT) +#define pgd_page(pgd) pfn_to_page(pgd_pfn(pgd)) /* to find an entry in a page-table-directory. */ static inline unsigned long pud_index(unsigned long address) From 34a806bb5cdbdff868a1da31ed1273c27d862d51 Mon Sep 17 00:00:00 2001 From: Sudip Mukherjee Date: Sun, 15 Jul 2018 20:36:50 +0100 Subject: [PATCH 665/970] Bluetooth: avoid killing an already killed socket commit 4e1a720d0312fd510699032c7694a362a010170f upstream. slub debug reported: [ 440.648642] ============================================================================= [ 440.648649] BUG kmalloc-1024 (Tainted: G BU O ): Poison overwritten [ 440.648651] ----------------------------------------------------------------------------- [ 440.648655] INFO: 0xe70f4bec-0xe70f4bec. First byte 0x6a instead of 0x6b [ 440.648665] INFO: Allocated in sk_prot_alloc+0x6b/0xc6 age=33155 cpu=1 pid=1047 [ 440.648671] ___slab_alloc.constprop.24+0x1fc/0x292 [ 440.648675] __slab_alloc.isra.18.constprop.23+0x1c/0x25 [ 440.648677] __kmalloc+0xb6/0x17f [ 440.648680] sk_prot_alloc+0x6b/0xc6 [ 440.648683] sk_alloc+0x1e/0xa1 [ 440.648700] sco_sock_alloc.constprop.6+0x26/0xaf [bluetooth] [ 440.648716] sco_connect_cfm+0x166/0x281 [bluetooth] [ 440.648731] hci_conn_request_evt.isra.53+0x258/0x281 [bluetooth] [ 440.648746] hci_event_packet+0x28b/0x2326 [bluetooth] [ 440.648759] hci_rx_work+0x161/0x291 [bluetooth] [ 440.648764] process_one_work+0x163/0x2b2 [ 440.648767] worker_thread+0x1a9/0x25c [ 440.648770] kthread+0xf8/0xfd [ 440.648774] ret_from_fork+0x2e/0x38 [ 440.648779] INFO: Freed in __sk_destruct+0xd3/0xdf age=3815 cpu=1 pid=1047 [ 440.648782] __slab_free+0x4b/0x27a [ 440.648784] kfree+0x12e/0x155 [ 440.648787] __sk_destruct+0xd3/0xdf [ 440.648790] sk_destruct+0x27/0x29 [ 440.648793] __sk_free+0x75/0x91 [ 440.648795] sk_free+0x1c/0x1e [ 440.648810] sco_sock_kill+0x5a/0x5f [bluetooth] [ 440.648825] sco_conn_del+0x8e/0xba [bluetooth] [ 440.648840] sco_disconn_cfm+0x3a/0x41 [bluetooth] [ 440.648855] hci_event_packet+0x45e/0x2326 [bluetooth] [ 440.648868] hci_rx_work+0x161/0x291 [bluetooth] [ 440.648872] process_one_work+0x163/0x2b2 [ 440.648875] worker_thread+0x1a9/0x25c [ 440.648877] kthread+0xf8/0xfd [ 440.648880] ret_from_fork+0x2e/0x38 [ 440.648884] INFO: Slab 0xf4718580 objects=27 used=27 fp=0x (null) flags=0x40008100 [ 440.648886] INFO: Object 0xe70f4b88 @offset=19336 fp=0xe70f54f8 When KASAN was enabled, it reported: [ 210.096613] ================================================================== [ 210.096634] BUG: KASAN: use-after-free in ex_handler_refcount+0x5b/0x127 [ 210.096641] Write of size 4 at addr ffff880107e17160 by task kworker/u9:1/2040 [ 210.096651] CPU: 1 PID: 2040 Comm: kworker/u9:1 Tainted: G U O 4.14.47-20180606+ #2 [ 210.096654] Hardware name: , BIOS 2017.01-00087-g43e04de 08/30/2017 [ 210.096693] Workqueue: hci0 hci_rx_work [bluetooth] [ 210.096698] Call Trace: [ 210.096711] dump_stack+0x46/0x59 [ 210.096722] print_address_description+0x6b/0x23b [ 210.096729] ? ex_handler_refcount+0x5b/0x127 [ 210.096736] kasan_report+0x220/0x246 [ 210.096744] ex_handler_refcount+0x5b/0x127 [ 210.096751] ? ex_handler_clear_fs+0x85/0x85 [ 210.096757] fixup_exception+0x8c/0x96 [ 210.096766] do_trap+0x66/0x2c1 [ 210.096773] do_error_trap+0x152/0x180 [ 210.096781] ? fixup_bug+0x78/0x78 [ 210.096817] ? hci_debugfs_create_conn+0x244/0x26a [bluetooth] [ 210.096824] ? __schedule+0x113b/0x1453 [ 210.096830] ? sysctl_net_exit+0xe/0xe [ 210.096837] ? __wake_up_common+0x343/0x343 [ 210.096843] ? insert_work+0x107/0x163 [ 210.096850] invalid_op+0x1b/0x40 [ 210.096888] RIP: 0010:hci_debugfs_create_conn+0x244/0x26a [bluetooth] [ 210.096892] RSP: 0018:ffff880094a0f970 EFLAGS: 00010296 [ 210.096898] RAX: 0000000000000000 RBX: ffff880107e170e8 RCX: ffff880107e17160 [ 210.096902] RDX: 000000000000002f RSI: ffff88013b80ed40 RDI: ffffffffa058b940 [ 210.096906] RBP: ffff88011b2b0578 R08: 00000000852f0ec9 R09: ffffffff81cfcf9b [ 210.096909] R10: 00000000d21bdad7 R11: 0000000000000001 R12: ffff8800967b0488 [ 210.096913] R13: ffff880107e17168 R14: 0000000000000068 R15: ffff8800949c0008 [ 210.096920] ? __sk_destruct+0x2c6/0x2d4 [ 210.096959] hci_event_packet+0xff5/0x7de2 [bluetooth] [ 210.096969] ? __local_bh_enable_ip+0x43/0x5b [ 210.097004] ? l2cap_sock_recv_cb+0x158/0x166 [bluetooth] [ 210.097039] ? hci_le_meta_evt+0x2bb3/0x2bb3 [bluetooth] [ 210.097075] ? l2cap_ertm_init+0x94e/0x94e [bluetooth] [ 210.097093] ? xhci_urb_enqueue+0xbd8/0xcf5 [xhci_hcd] [ 210.097102] ? __accumulate_pelt_segments+0x24/0x33 [ 210.097109] ? __accumulate_pelt_segments+0x24/0x33 [ 210.097115] ? __update_load_avg_se.isra.2+0x217/0x3a4 [ 210.097122] ? set_next_entity+0x7c3/0x12cd [ 210.097128] ? pick_next_entity+0x25e/0x26c [ 210.097135] ? pick_next_task_fair+0x2ca/0xc1a [ 210.097141] ? switch_mm_irqs_off+0x346/0xb4f [ 210.097147] ? __switch_to+0x769/0xbc4 [ 210.097153] ? compat_start_thread+0x66/0x66 [ 210.097188] ? hci_conn_check_link_mode+0x1cd/0x1cd [bluetooth] [ 210.097195] ? finish_task_switch+0x392/0x431 [ 210.097228] ? hci_rx_work+0x154/0x487 [bluetooth] [ 210.097260] hci_rx_work+0x154/0x487 [bluetooth] [ 210.097269] process_one_work+0x579/0x9e9 [ 210.097277] worker_thread+0x68f/0x804 [ 210.097285] kthread+0x31c/0x32b [ 210.097292] ? rescuer_thread+0x70c/0x70c [ 210.097299] ? kthread_create_on_node+0xa3/0xa3 [ 210.097306] ret_from_fork+0x35/0x40 [ 210.097314] Allocated by task 2040: [ 210.097323] kasan_kmalloc.part.1+0x51/0xc7 [ 210.097328] __kmalloc+0x17f/0x1b6 [ 210.097335] sk_prot_alloc+0xf2/0x1a3 [ 210.097340] sk_alloc+0x22/0x297 [ 210.097375] sco_sock_alloc.constprop.7+0x23/0x202 [bluetooth] [ 210.097410] sco_connect_cfm+0x2d0/0x566 [bluetooth] [ 210.097443] hci_conn_request_evt.isra.53+0x6d3/0x762 [bluetooth] [ 210.097476] hci_event_packet+0x85e/0x7de2 [bluetooth] [ 210.097507] hci_rx_work+0x154/0x487 [bluetooth] [ 210.097512] process_one_work+0x579/0x9e9 [ 210.097517] worker_thread+0x68f/0x804 [ 210.097523] kthread+0x31c/0x32b [ 210.097529] ret_from_fork+0x35/0x40 [ 210.097533] Freed by task 2040: [ 210.097539] kasan_slab_free+0xb3/0x15e [ 210.097544] kfree+0x103/0x1a9 [ 210.097549] __sk_destruct+0x2c6/0x2d4 [ 210.097584] sco_conn_del.isra.1+0xba/0x10e [bluetooth] [ 210.097617] hci_event_packet+0xff5/0x7de2 [bluetooth] [ 210.097648] hci_rx_work+0x154/0x487 [bluetooth] [ 210.097653] process_one_work+0x579/0x9e9 [ 210.097658] worker_thread+0x68f/0x804 [ 210.097663] kthread+0x31c/0x32b [ 210.097670] ret_from_fork+0x35/0x40 [ 210.097676] The buggy address belongs to the object at ffff880107e170e8 which belongs to the cache kmalloc-1024 of size 1024 [ 210.097681] The buggy address is located 120 bytes inside of 1024-byte region [ffff880107e170e8, ffff880107e174e8) [ 210.097683] The buggy address belongs to the page: [ 210.097689] page:ffffea00041f8400 count:1 mapcount:0 mapping: (null) index:0xffff880107e15b68 compound_mapcount: 0 [ 210.110194] flags: 0x8000000000008100(slab|head) [ 210.115441] raw: 8000000000008100 0000000000000000 ffff880107e15b68 0000000100170016 [ 210.115448] raw: ffffea0004a47620 ffffea0004b48e20 ffff88013b80ed40 0000000000000000 [ 210.115451] page dumped because: kasan: bad access detected [ 210.115454] Memory state around the buggy address: [ 210.115460] ffff880107e17000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 210.115465] ffff880107e17080: fc fc fc fc fc fc fc fc fc fc fc fc fc fb fb fb [ 210.115469] >ffff880107e17100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 210.115472] ^ [ 210.115477] ffff880107e17180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 210.115481] ffff880107e17200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 210.115483] ================================================================== And finally when BT_DBG() and ftrace was enabled it showed: <...>-14979 [001] .... 186.104191: sco_sock_kill <-sco_sock_close <...>-14979 [001] .... 186.104191: sco_sock_kill <-sco_sock_release <...>-14979 [001] .... 186.104192: sco_sock_kill: sk ef0497a0 state 9 <...>-14979 [001] .... 186.104193: bt_sock_unlink <-sco_sock_kill kworker/u9:2-792 [001] .... 186.104246: sco_sock_kill <-sco_conn_del kworker/u9:2-792 [001] .... 186.104248: sco_sock_kill: sk ef0497a0 state 9 kworker/u9:2-792 [001] .... 186.104249: bt_sock_unlink <-sco_sock_kill kworker/u9:2-792 [001] .... 186.104250: sco_sock_destruct <-__sk_destruct kworker/u9:2-792 [001] .... 186.104250: sco_sock_destruct: sk ef0497a0 kworker/u9:2-792 [001] .... 186.104860: hci_conn_del <-hci_event_packet kworker/u9:2-792 [001] .... 186.104864: hci_conn_del: hci0 hcon ef0484c0 handle 266 Only in the failed case, sco_sock_kill() gets called with the same sock pointer two times. Add a check for SOCK_DEAD to avoid continue killing a socket which has already been killed. Signed-off-by: Sudip Mukherjee Signed-off-by: Marcel Holtmann Signed-off-by: Greg Kroah-Hartman --- net/bluetooth/sco.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c index 3125ce670c2f..95fd7a837dc5 100644 --- a/net/bluetooth/sco.c +++ b/net/bluetooth/sco.c @@ -392,7 +392,8 @@ static void sco_sock_cleanup_listen(struct sock *parent) */ static void sco_sock_kill(struct sock *sk) { - if (!sock_flag(sk, SOCK_ZAPPED) || sk->sk_socket) + if (!sock_flag(sk, SOCK_ZAPPED) || sk->sk_socket || + sock_flag(sk, SOCK_DEAD)) return; BT_DBG("sk %p state %d", sk, sk->sk_state); From 676054232ecfeaf9392201fd5fc3f74a6d39d9f3 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Wed, 22 Aug 2018 07:47:16 +0200 Subject: [PATCH 666/970] Linux 4.9.123 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 1f44343a1e04..b11e375bb18e 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 4 PATCHLEVEL = 9 -SUBLEVEL = 122 +SUBLEVEL = 123 EXTRAVERSION = NAME = Roaring Lionus From 987156381c5f875d75ef1f7cc29994d82f646dad Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Sun, 22 Jul 2018 11:05:09 -0700 Subject: [PATCH 667/970] x86/entry/64: Remove %ebx handling from error_entry/exit commit b3681dd548d06deb2e1573890829dff4b15abf46 upstream. error_entry and error_exit communicate the user vs. kernel status of the frame using %ebx. This is unnecessary -- the information is in regs->cs. Just use regs->cs. This makes error_entry simpler and makes error_exit more robust. It also fixes a nasty bug. Before all the Spectre nonsense, the xen_failsafe_callback entry point returned like this: ALLOC_PT_GPREGS_ON_STACK SAVE_C_REGS SAVE_EXTRA_REGS ENCODE_FRAME_POINTER jmp error_exit And it did not go through error_entry. This was bogus: RBX contained garbage, and error_exit expected a flag in RBX. Fortunately, it generally contained *nonzero* garbage, so the correct code path was used. As part of the Spectre fixes, code was added to clear RBX to mitigate certain speculation attacks. Now, depending on kernel configuration, RBX got zeroed and, when running some Wine workloads, the kernel crashes. This was introduced by: commit 3ac6d8c787b8 ("x86/entry/64: Clear registers for exceptions/interrupts, to reduce speculation attack surface") With this patch applied, RBX is no longer needed as a flag, and the problem goes away. I suspect that malicious userspace could use this bug to crash the kernel even without the offending patch applied, though. [ Historical note: I wrote this patch as a cleanup before I was aware of the bug it fixed. ] [ Note to stable maintainers: this should probably get applied to all kernels. If you're nervous about that, a more conservative fix to add xorl %ebx,%ebx; incl %ebx before the jump to error_exit should also fix the problem. ] Reported-and-tested-by: M. Vefa Bicakci Signed-off-by: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Denys Vlasenko Cc: Dominik Brodowski Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: stable@vger.kernel.org Cc: xen-devel@lists.xenproject.org Fixes: 3ac6d8c787b8 ("x86/entry/64: Clear registers for exceptions/interrupts, to reduce speculation attack surface") Link: http://lkml.kernel.org/r/b5010a090d3586b2d6e06c7ad3ec5542d1241c45.1532282627.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Sarah Newman Signed-off-by: Greg Kroah-Hartman --- arch/x86/entry/entry_64.S | 20 ++++---------------- 1 file changed, 4 insertions(+), 16 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index d58d8dcb8245..76c1d85e749b 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -774,7 +774,7 @@ ENTRY(\sym) call \do_sym - jmp error_exit /* %ebx: no swapgs flag */ + jmp error_exit .endif END(\sym) .endm @@ -1043,7 +1043,6 @@ END(paranoid_exit) /* * Save all registers in pt_regs, and switch gs if needed. - * Return: EBX=0: came from user mode; EBX=1: otherwise */ ENTRY(error_entry) cld @@ -1056,7 +1055,6 @@ ENTRY(error_entry) * the kernel CR3 here. */ SWITCH_KERNEL_CR3 - xorl %ebx, %ebx testb $3, CS+8(%rsp) jz .Lerror_kernelspace @@ -1087,7 +1085,6 @@ ENTRY(error_entry) * for these here too. */ .Lerror_kernelspace: - incl %ebx leaq native_irq_return_iret(%rip), %rcx cmpq %rcx, RIP+8(%rsp) je .Lerror_bad_iret @@ -1119,28 +1116,19 @@ ENTRY(error_entry) /* * Pretend that the exception came from user mode: set up pt_regs - * as if we faulted immediately after IRET and clear EBX so that - * error_exit knows that we will be returning to user mode. + * as if we faulted immediately after IRET. */ mov %rsp, %rdi call fixup_bad_iret mov %rax, %rsp - decl %ebx jmp .Lerror_entry_from_usermode_after_swapgs END(error_entry) - -/* - * On entry, EBX is a "return to kernel mode" flag: - * 1: already in kernel mode, don't need SWAPGS - * 0: user gsbase is loaded, we need SWAPGS and standard preparation for return to usermode - */ ENTRY(error_exit) - movl %ebx, %eax DISABLE_INTERRUPTS(CLBR_NONE) TRACE_IRQS_OFF - testl %eax, %eax - jnz retint_kernel + testb $3, CS(%rsp) + jz retint_kernel jmp retint_user END(error_exit) From e7a0393b52bc0580b23ff1d0e97a4c3a53536f78 Mon Sep 17 00:00:00 2001 From: Alexey Brodkin Date: Fri, 1 Jun 2018 14:34:33 +0300 Subject: [PATCH 668/970] ARC: Explicitly add -mmedium-calls to CFLAGS [ Upstream commit 74c11e300c103af47db5b658fdcf28002421e250 ] GCC built for arc*-*-linux has "-mmedium-calls" implicitly enabled by default thus we don't see any problems during Linux kernel compilation. ----------------------------->8------------------------ arc-linux-gcc -mcpu=arc700 -Q --help=target | grep calls -mlong-calls [disabled] -mmedium-calls [enabled] ----------------------------->8------------------------ But if we try to use so-called Elf32 toolchain with GCC configured for arc*-*-elf* then we'd see the following failure: ----------------------------->8------------------------ init/do_mounts.o: In function 'init_rootfs': do_mounts.c:(.init.text+0x108): relocation truncated to fit: R_ARC_S21W_PCREL against symbol 'unregister_filesystem' defined in .text section in fs/filesystems.o arc-elf32-ld: final link failed: Symbol needs debug section which does not exist make: *** [vmlinux] Error 1 ----------------------------->8------------------------ That happens because neither "-mmedium-calls" nor "-mlong-calls" are enabled in Elf32 GCC: ----------------------------->8------------------------ arc-elf32-gcc -mcpu=arc700 -Q --help=target | grep calls -mlong-calls [disabled] -mmedium-calls [disabled] ----------------------------->8------------------------ Now to make it possible to use Elf32 toolchain for building Linux kernel we're explicitly add "-mmedium-calls" to CFLAGS. And since we add "-mmedium-calls" to the global CFLAGS there's no point in having per-file copies thus removing them. Signed-off-by: Alexey Brodkin Signed-off-by: Vineet Gupta Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/arc/Makefile | 15 +-------------- 1 file changed, 1 insertion(+), 14 deletions(-) diff --git a/arch/arc/Makefile b/arch/arc/Makefile index 19cce226d1a8..8447eed836ef 100644 --- a/arch/arc/Makefile +++ b/arch/arc/Makefile @@ -18,7 +18,7 @@ endif KBUILD_DEFCONFIG := nsim_700_defconfig -cflags-y += -fno-common -pipe -fno-builtin -D__linux__ +cflags-y += -fno-common -pipe -fno-builtin -mmedium-calls -D__linux__ cflags-$(CONFIG_ISA_ARCOMPACT) += -mA7 cflags-$(CONFIG_ISA_ARCV2) += -mcpu=archs @@ -141,16 +141,3 @@ dtbs: scripts archclean: $(Q)$(MAKE) $(clean)=$(boot) - -# Hacks to enable final link due to absence of link-time branch relexation -# and gcc choosing optimal(shorter) branches at -O3 -# -# vineetg Feb 2010: -mlong-calls switched off for overall kernel build -# However lib/decompress_inflate.o (.init.text) calls -# zlib_inflate_workspacesize (.text) causing relocation errors. -# Thus forcing all exten calls in this file to be long calls -export CFLAGS_decompress_inflate.o = -mmedium-calls -export CFLAGS_initramfs.o = -mmedium-calls -ifdef CONFIG_SMP -export CFLAGS_core.o = -mmedium-calls -endif From 00fb7e1404725edced5d6f77b0333b94bdd3413e Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Thu, 31 May 2018 16:45:52 +0200 Subject: [PATCH 669/970] usb: dwc3: of-simple: fix use-after-free on remove [ Upstream commit 896e518883f18e601335908192e33426c1f599a4 ] The clocks have already been explicitly disabled and put as part of remove() so the runtime suspend callback must not be run when balancing the runtime PM usage count before returning. Fixes: 16adc674d0d6 ("usb: dwc3: add generic OF glue layer") Signed-off-by: Johan Hovold Signed-off-by: Felipe Balbi Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/usb/dwc3/dwc3-of-simple.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/usb/dwc3/dwc3-of-simple.c b/drivers/usb/dwc3/dwc3-of-simple.c index a3e2200f5b5f..58526932d2b6 100644 --- a/drivers/usb/dwc3/dwc3-of-simple.c +++ b/drivers/usb/dwc3/dwc3-of-simple.c @@ -132,8 +132,9 @@ static int dwc3_of_simple_remove(struct platform_device *pdev) of_platform_depopulate(dev); - pm_runtime_put_sync(dev); pm_runtime_disable(dev); + pm_runtime_put_noidle(dev); + pm_runtime_set_suspended(dev); return 0; } From a677cc36431f68c1208bfe288fa5f407c0114094 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 13 Jun 2018 10:11:56 -0700 Subject: [PATCH 670/970] netfilter: ipv6: nf_defrag: reduce struct net memory waste [ Upstream commit 9ce7bc036ae4cfe3393232c86e9e1fea2153c237 ] It is a waste of memory to use a full "struct netns_sysctl_ipv6" while only one pointer is really used, considering netns_sysctl_ipv6 keeps growing. Also, since "struct netns_frags" has cache line alignment, it is better to move the frags_hdr pointer outside, otherwise we spend a full cache line for this pointer. This saves 192 bytes of memory per netns. Fixes: c038a767cd69 ("ipv6: add a new namespace for nf_conntrack_reasm") Signed-off-by: Eric Dumazet Signed-off-by: Pablo Neira Ayuso Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- include/net/net_namespace.h | 1 + include/net/netns/ipv6.h | 1 - net/ipv6/netfilter/nf_conntrack_reasm.c | 6 +++--- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index 23102da24dd9..c05db6ff2515 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -116,6 +116,7 @@ struct net { #endif #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6) struct netns_nf_frag nf_frag; + struct ctl_table_header *nf_frag_frags_hdr; #endif struct sock *nfnl; struct sock *nfnl_stash; diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h index 10d0848f5b8a..5cae575db07b 100644 --- a/include/net/netns/ipv6.h +++ b/include/net/netns/ipv6.h @@ -89,7 +89,6 @@ struct netns_ipv6 { #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6) struct netns_nf_frag { - struct netns_sysctl_ipv6 sysctl; struct netns_frags frags; }; #endif diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c index 722a9db8c6a7..ee33a6743f3b 100644 --- a/net/ipv6/netfilter/nf_conntrack_reasm.c +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c @@ -117,7 +117,7 @@ static int nf_ct_frag6_sysctl_register(struct net *net) if (hdr == NULL) goto err_reg; - net->nf_frag.sysctl.frags_hdr = hdr; + net->nf_frag_frags_hdr = hdr; return 0; err_reg: @@ -131,8 +131,8 @@ static void __net_exit nf_ct_frags6_sysctl_unregister(struct net *net) { struct ctl_table *table; - table = net->nf_frag.sysctl.frags_hdr->ctl_table_arg; - unregister_net_sysctl_table(net->nf_frag.sysctl.frags_hdr); + table = net->nf_frag_frags_hdr->ctl_table_arg; + unregister_net_sysctl_table(net->nf_frag_frags_hdr); if (!net_eq(net, &init_net)) kfree(table); } From 879beb74aa0800ff1abece7f1b0e95a1f55e219e Mon Sep 17 00:00:00 2001 From: "Shuah Khan (Samsung OSG)" Date: Tue, 12 Jun 2018 16:46:03 -0600 Subject: [PATCH 671/970] selftests: pstore: return Kselftest Skip code for skipped tests [ Upstream commit 856e7c4b619af622d56b3b454f7bec32a170ac99 ] When pstore_post_reboot test gets skipped because of unmet dependencies and/or unsupported configuration, it returns 0 which is treated as a pass by the Kselftest framework. This leads to false positive result even when the test could not be run. Change it to return kselftest skip code when a test gets skipped to clearly report that the test could not be run. Kselftest framework SKIP code is 4 and the framework prints appropriate messages to indicate that the test is skipped. Signed-off-by: Shuah Khan (Samsung OSG) Reviewed-by: Kees Cook Signed-off-by: Shuah Khan (Samsung OSG) Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- tools/testing/selftests/pstore/pstore_post_reboot_tests | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/pstore/pstore_post_reboot_tests b/tools/testing/selftests/pstore/pstore_post_reboot_tests index 6ccb154cb4aa..22f8df1ad7d4 100755 --- a/tools/testing/selftests/pstore/pstore_post_reboot_tests +++ b/tools/testing/selftests/pstore/pstore_post_reboot_tests @@ -7,13 +7,16 @@ # # Released under the terms of the GPL v2. +# Kselftest framework requirement - SKIP code is 4. +ksft_skip=4 + . ./common_tests if [ -e $REBOOT_FLAG ]; then rm $REBOOT_FLAG else prlog "pstore_crash_test has not been executed yet. we skip further tests." - exit 0 + exit $ksft_skip fi prlog -n "Mounting pstore filesystem ... " From ccb8eef63e3152c60752e7d37aabd01d5c22d10c Mon Sep 17 00:00:00 2001 From: "Shuah Khan (Samsung OSG)" Date: Tue, 12 Jun 2018 17:40:31 -0600 Subject: [PATCH 672/970] selftests: static_keys: return Kselftest Skip code for skipped tests [ Upstream commit 8781578087b8fb8829558bac96c3c24e5ba26f82 ] When static_keys test is skipped because of unmet dependencies and/or unsupported configuration, it exits with error which is treated as a fail by the Kselftest framework. This leads to false negative result even when the test could not be run. Change it to return kselftest skip code when a test gets skipped to clearly report that the test could not be run. Added an explicit searches for test_static_key_base and test_static_keys modules and return skip code if they aren't found to differentiate between the failure to load the module condition and module not found condition. Kselftest framework SKIP code is 4 and the framework prints appropriate messages to indicate that the test is skipped. Signed-off-by: Shuah Khan (Samsung OSG) Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- .../selftests/static_keys/test_static_keys.sh | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/tools/testing/selftests/static_keys/test_static_keys.sh b/tools/testing/selftests/static_keys/test_static_keys.sh index 1261e3fa1e3a..5bba7796fb34 100755 --- a/tools/testing/selftests/static_keys/test_static_keys.sh +++ b/tools/testing/selftests/static_keys/test_static_keys.sh @@ -1,6 +1,19 @@ #!/bin/sh # Runs static keys kernel module tests +# Kselftest framework requirement - SKIP code is 4. +ksft_skip=4 + +if ! /sbin/modprobe -q -n test_static_key_base; then + echo "static_key: module test_static_key_base is not found [SKIP]" + exit $ksft_skip +fi + +if ! /sbin/modprobe -q -n test_static_keys; then + echo "static_key: module test_static_keys is not found [SKIP]" + exit $ksft_skip +fi + if /sbin/modprobe -q test_static_key_base; then if /sbin/modprobe -q test_static_keys; then echo "static_key: ok" From dbd816e106229adb3ded86ce233878962bd2b469 Mon Sep 17 00:00:00 2001 From: "Shuah Khan (Samsung OSG)" Date: Wed, 13 Jun 2018 21:10:48 -0600 Subject: [PATCH 673/970] selftests: user: return Kselftest Skip code for skipped tests [ Upstream commit d7d5311d4aa9611fe1a5a851e6f75733237a668a ] When user test is skipped because of unmet dependencies and/or unsupported configuration, it exits with error which is treated as a fail by the Kselftest framework. This leads to false negative result even when the test could not be run. Change it to return kselftest skip code when a test gets skipped to clearly report that the test could not be run. Add an explicit check for module presence and return skip code if module isn't present. Kselftest framework SKIP code is 4 and the framework prints appropriate messages to indicate that the test is skipped. Signed-off-by: Shuah Khan (Samsung OSG) Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- tools/testing/selftests/user/test_user_copy.sh | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/tools/testing/selftests/user/test_user_copy.sh b/tools/testing/selftests/user/test_user_copy.sh index 350107f40c1d..0409270f998c 100755 --- a/tools/testing/selftests/user/test_user_copy.sh +++ b/tools/testing/selftests/user/test_user_copy.sh @@ -1,6 +1,13 @@ #!/bin/sh # Runs copy_to/from_user infrastructure using test_user_copy kernel module +# Kselftest framework requirement - SKIP code is 4. +ksft_skip=4 + +if ! /sbin/modprobe -q -n test_user_copy; then + echo "user: module test_user_copy is not found [SKIP]" + exit $ksft_skip +fi if /sbin/modprobe -q test_user_copy; then /sbin/modprobe -q -r test_user_copy echo "user_copy: ok" From dd91188f5611657c16c064fd0d739a994b8262f8 Mon Sep 17 00:00:00 2001 From: "Shuah Khan (Samsung OSG)" Date: Thu, 14 Jun 2018 16:56:13 -0600 Subject: [PATCH 674/970] selftests: zram: return Kselftest Skip code for skipped tests [ Upstream commit 685814466bf8398192cf855415a0bb2cefc1930e ] When zram test is skipped because of unmet dependencies and/or unsupported configuration, it exits with error which is treated as a fail by the Kselftest framework. This leads to false negative result even when the test could not be run. Change it to return kselftest skip code when a test gets skipped to clearly report that the test could not be run. Kselftest framework SKIP code is 4 and the framework prints appropriate messages to indicate that the test is skipped. Signed-off-by: Shuah Khan (Samsung OSG) Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- tools/testing/selftests/zram/zram.sh | 5 ++++- tools/testing/selftests/zram/zram_lib.sh | 5 ++++- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/zram/zram.sh b/tools/testing/selftests/zram/zram.sh index 683a292e3290..9399c4aeaa26 100755 --- a/tools/testing/selftests/zram/zram.sh +++ b/tools/testing/selftests/zram/zram.sh @@ -1,6 +1,9 @@ #!/bin/bash TCID="zram.sh" +# Kselftest framework requirement - SKIP code is 4. +ksft_skip=4 + . ./zram_lib.sh run_zram () { @@ -23,5 +26,5 @@ elif [ -b /dev/zram0 ]; then else echo "$TCID : No zram.ko module or /dev/zram0 device file not found" echo "$TCID : CONFIG_ZRAM is not set" - exit 1 + exit $ksft_skip fi diff --git a/tools/testing/selftests/zram/zram_lib.sh b/tools/testing/selftests/zram/zram_lib.sh index f6a9c73e7a44..9e73a4fb9b0a 100755 --- a/tools/testing/selftests/zram/zram_lib.sh +++ b/tools/testing/selftests/zram/zram_lib.sh @@ -18,6 +18,9 @@ MODULE=0 dev_makeswap=-1 dev_mounted=-1 +# Kselftest framework requirement - SKIP code is 4. +ksft_skip=4 + trap INT check_prereqs() @@ -27,7 +30,7 @@ check_prereqs() if [ $uid -ne 0 ]; then echo $msg must be run as root >&2 - exit 0 + exit $ksft_skip fi } From 0fab6c6dd55c6ff0e6b92862c7521e6f810ecd12 Mon Sep 17 00:00:00 2001 From: Fathi Boudra Date: Thu, 14 Jun 2018 11:57:08 +0200 Subject: [PATCH 675/970] selftests: sync: add config fragment for testing sync framework [ Upstream commit d6a3e55131fcb1e5ca1753f4b6f297a177b2fc91 ] Unless the software synchronization objects (CONFIG_SW_SYNC) is enabled, the sync test will be skipped: TAP version 13 1..0 # Skipped: Sync framework not supported by kernel Add a config fragment file to be able to run "make kselftest-merge" to enable relevant configuration required in order to run the sync test. Signed-off-by: Fathi Boudra Link: https://lkml.org/lkml/2017/5/5/14 Signed-off-by: Anders Roxell Signed-off-by: Shuah Khan (Samsung OSG) Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- tools/testing/selftests/sync/config | 4 ++++ 1 file changed, 4 insertions(+) create mode 100644 tools/testing/selftests/sync/config diff --git a/tools/testing/selftests/sync/config b/tools/testing/selftests/sync/config new file mode 100644 index 000000000000..1ab7e8130db2 --- /dev/null +++ b/tools/testing/selftests/sync/config @@ -0,0 +1,4 @@ +CONFIG_STAGING=y +CONFIG_ANDROID=y +CONFIG_SYNC=y +CONFIG_SW_SYNC=y From 31ceb5846b02f844957e9e66358055db5bd1cbad Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Mon, 11 Jun 2018 15:47:12 -0700 Subject: [PATCH 676/970] ARM: dts: NSP: Fix i2c controller interrupt type [ Upstream commit a3e32e78a40017756c71ef6dad429ffe3301126a ] The i2c controller should use IRQ_TYPE_LEVEL_HIGH instead of IRQ_TYPE_NONE. Fixes: 0f9f27a36d09 ("ARM: dts: NSP: Add I2C support to the DT") Signed-off-by: Florian Fainelli Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/arm/boot/dts/bcm-nsp.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/boot/dts/bcm-nsp.dtsi b/arch/arm/boot/dts/bcm-nsp.dtsi index 65e0db1d3bd7..42ab132e361e 100644 --- a/arch/arm/boot/dts/bcm-nsp.dtsi +++ b/arch/arm/boot/dts/bcm-nsp.dtsi @@ -288,7 +288,7 @@ reg = <0x38000 0x50>; #address-cells = <1>; #size-cells = <0>; - interrupts = ; + interrupts = ; clock-frequency = <100000>; }; From e63c10bc9b5afe7446b5c4f754717d4a283ec6c0 Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Mon, 11 Jun 2018 15:47:13 -0700 Subject: [PATCH 677/970] ARM: dts: NSP: Fix PCIe controllers interrupt types [ Upstream commit 403fde644855bc71318c8db65646383e22653b13 ] The interrupts for the PCIe controllers should all be of type IRQ_TYPE_LEVEL_HIGH instead of IRQ_TYPE_NONE. Fixes: d71eb9412088 ("ARM: dts: NSP: Add MSI support on PCI") Fixes: 522199029fdc ("ARM: dts: NSP: Fix PCIE DT issue") Signed-off-by: Florian Fainelli Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/arm/boot/dts/bcm-nsp.dtsi | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/arch/arm/boot/dts/bcm-nsp.dtsi b/arch/arm/boot/dts/bcm-nsp.dtsi index 42ab132e361e..6e3d3b50824e 100644 --- a/arch/arm/boot/dts/bcm-nsp.dtsi +++ b/arch/arm/boot/dts/bcm-nsp.dtsi @@ -375,7 +375,7 @@ #interrupt-cells = <1>; interrupt-map-mask = <0 0 0 0>; - interrupt-map = <0 0 0 0 &gic GIC_SPI 131 IRQ_TYPE_NONE>; + interrupt-map = <0 0 0 0 &gic GIC_SPI 131 IRQ_TYPE_LEVEL_HIGH>; linux,pci-domain = <0>; @@ -397,10 +397,10 @@ compatible = "brcm,iproc-msi"; msi-controller; interrupt-parent = <&gic>; - interrupts = , - , - , - ; + interrupts = , + , + , + ; brcm,pcie-msi-inten; }; }; @@ -411,7 +411,7 @@ #interrupt-cells = <1>; interrupt-map-mask = <0 0 0 0>; - interrupt-map = <0 0 0 0 &gic GIC_SPI 137 IRQ_TYPE_NONE>; + interrupt-map = <0 0 0 0 &gic GIC_SPI 137 IRQ_TYPE_LEVEL_HIGH>; linux,pci-domain = <1>; @@ -433,10 +433,10 @@ compatible = "brcm,iproc-msi"; msi-controller; interrupt-parent = <&gic>; - interrupts = , - , - , - ; + interrupts = , + , + , + ; brcm,pcie-msi-inten; }; }; @@ -447,7 +447,7 @@ #interrupt-cells = <1>; interrupt-map-mask = <0 0 0 0>; - interrupt-map = <0 0 0 0 &gic GIC_SPI 143 IRQ_TYPE_NONE>; + interrupt-map = <0 0 0 0 &gic GIC_SPI 143 IRQ_TYPE_LEVEL_HIGH>; linux,pci-domain = <2>; @@ -469,10 +469,10 @@ compatible = "brcm,iproc-msi"; msi-controller; interrupt-parent = <&gic>; - interrupts = , - , - , - ; + interrupts = , + , + , + ; brcm,pcie-msi-inten; }; }; From 7f9391ce4a6dd69a49db8ecd581714ba5607ad16 Mon Sep 17 00:00:00 2001 From: Ray Jui Date: Tue, 12 Jun 2018 13:21:27 -0700 Subject: [PATCH 678/970] ARM: dts: Cygnus: Fix I2C controller interrupt type [ Upstream commit 71ca3409703b62b6a092d0d9d13f366c121bc5d3 ] Fix I2C controller interrupt to use IRQ_TYPE_LEVEL_HIGH for Broadcom Cygnus SoC. Fixes: b51c05a331ff ("ARM: dts: add I2C device nodes for Broadcom Cygnus") Signed-off-by: Ray Jui Signed-off-by: Florian Fainelli Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/arm/boot/dts/bcm-cygnus.dtsi | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm/boot/dts/bcm-cygnus.dtsi b/arch/arm/boot/dts/bcm-cygnus.dtsi index fabc9f36c408..c62d778a4cde 100644 --- a/arch/arm/boot/dts/bcm-cygnus.dtsi +++ b/arch/arm/boot/dts/bcm-cygnus.dtsi @@ -128,7 +128,7 @@ reg = <0x18008000 0x100>; #address-cells = <1>; #size-cells = <0>; - interrupts = ; + interrupts = ; clock-frequency = <100000>; status = "disabled"; }; @@ -157,7 +157,7 @@ reg = <0x1800b000 0x100>; #address-cells = <1>; #size-cells = <0>; - interrupts = ; + interrupts = ; clock-frequency = <100000>; status = "disabled"; }; From a5865bbc4cf446606859373f217eee41889ce8b0 Mon Sep 17 00:00:00 2001 From: Ray Jui Date: Tue, 12 Jun 2018 13:21:28 -0700 Subject: [PATCH 679/970] ARM: dts: Cygnus: Fix PCIe controller interrupt type [ Upstream commit 6cb1628ad3506b315cdddd7676db0ff2af378d28 ] Fix PCIe controller interrupt to use IRQ_TYPE_LEVEL_HIGH for Broadcom Cygnus SoC Fixes: cd590b50a936 ("ARM: dts: enable PCIe support for Cygnus") Fixes: f6b889358a82 ("ARM: dts: Enable MSI support for Broadcom Cygnus") Signed-off-by: Ray Jui Signed-off-by: Florian Fainelli Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/arm/boot/dts/bcm-cygnus.dtsi | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/arch/arm/boot/dts/bcm-cygnus.dtsi b/arch/arm/boot/dts/bcm-cygnus.dtsi index c62d778a4cde..5ad61537990e 100644 --- a/arch/arm/boot/dts/bcm-cygnus.dtsi +++ b/arch/arm/boot/dts/bcm-cygnus.dtsi @@ -168,7 +168,7 @@ #interrupt-cells = <1>; interrupt-map-mask = <0 0 0 0>; - interrupt-map = <0 0 0 0 &gic GIC_SPI 100 IRQ_TYPE_NONE>; + interrupt-map = <0 0 0 0 &gic GIC_SPI 100 IRQ_TYPE_LEVEL_HIGH>; linux,pci-domain = <0>; @@ -190,10 +190,10 @@ compatible = "brcm,iproc-msi"; msi-controller; interrupt-parent = <&gic>; - interrupts = , - , - , - ; + interrupts = , + , + , + ; }; }; @@ -203,7 +203,7 @@ #interrupt-cells = <1>; interrupt-map-mask = <0 0 0 0>; - interrupt-map = <0 0 0 0 &gic GIC_SPI 106 IRQ_TYPE_NONE>; + interrupt-map = <0 0 0 0 &gic GIC_SPI 106 IRQ_TYPE_LEVEL_HIGH>; linux,pci-domain = <1>; @@ -225,10 +225,10 @@ compatible = "brcm,iproc-msi"; msi-controller; interrupt-parent = <&gic>; - interrupts = , - , - , - ; + interrupts = , + , + , + ; }; }; From 9eb1a106268e47b242e90e79b1d89e1b1cab7cca Mon Sep 17 00:00:00 2001 From: Ray Jui Date: Tue, 12 Jun 2018 13:21:29 -0700 Subject: [PATCH 680/970] arm64: dts: ns2: Fix I2C controller interrupt type [ Upstream commit e605c287deed45624e8d35a15e3f0b4faab1a62d ] Fix I2C controller interrupt to use IRQ_TYPE_LEVEL_HIGH for Broadcom NS2 SoC. Fixes: 7ac674e8df7a ("arm64: dts: Add I2C nodes for NS2") Signed-off-by: Ray Jui Signed-off-by: Florian Fainelli Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/arm64/boot/dts/broadcom/ns2.dtsi | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm64/boot/dts/broadcom/ns2.dtsi b/arch/arm64/boot/dts/broadcom/ns2.dtsi index a16b1b3f9fc6..8a94ec8035d3 100644 --- a/arch/arm64/boot/dts/broadcom/ns2.dtsi +++ b/arch/arm64/boot/dts/broadcom/ns2.dtsi @@ -393,7 +393,7 @@ reg = <0x66080000 0x100>; #address-cells = <1>; #size-cells = <0>; - interrupts = ; + interrupts = ; clock-frequency = <100000>; status = "disabled"; }; @@ -421,7 +421,7 @@ reg = <0x660b0000 0x100>; #address-cells = <1>; #size-cells = <0>; - interrupts = ; + interrupts = ; clock-frequency = <100000>; status = "disabled"; }; From fc1241bcd1a916effa3b147a2a27a458f428eb52 Mon Sep 17 00:00:00 2001 From: Alison Wang Date: Tue, 24 Apr 2018 10:42:32 +0800 Subject: [PATCH 681/970] drm: mali-dp: Enable Global SE interrupts mask for DP500 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 89610dc2c235e7b02bb9fba0ce247e12d4dde7cd ] In the situation that DE and SE aren’t shared the same interrupt number, the Global SE interrupts mask bit MASK_IRQ_EN in MASKIRQ must be set, or else other mask bits will not work and no SE interrupt will occur. This patch enables MASK_IRQ_EN for SE to fix this problem. Signed-off-by: Alison Wang Acked-by: Liviu Dudau Signed-off-by: Liviu Dudau Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/arm/malidp_hw.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/arm/malidp_hw.c b/drivers/gpu/drm/arm/malidp_hw.c index a6132f1d58c1..5eee325feb29 100644 --- a/drivers/gpu/drm/arm/malidp_hw.c +++ b/drivers/gpu/drm/arm/malidp_hw.c @@ -432,7 +432,8 @@ const struct malidp_hw_device malidp_device[MALIDP_MAX_DEVICES] = { .vsync_irq = MALIDP500_DE_IRQ_VSYNC, }, .se_irq_map = { - .irq_mask = MALIDP500_SE_IRQ_CONF_MODE, + .irq_mask = MALIDP500_SE_IRQ_CONF_MODE | + MALIDP500_SE_IRQ_GLOBAL, .vsync_irq = 0, }, .dc_irq_map = { From aad3fdc0468b20cca5ee5949af95035e63b3dfee Mon Sep 17 00:00:00 2001 From: Vijay Immanuel Date: Tue, 12 Jun 2018 18:16:05 -0700 Subject: [PATCH 682/970] IB/rxe: Fix missing completion for mem_reg work requests [ Upstream commit 375dc53d032fc11e98036b5f228ad13f7c5933f5 ] Run the completer task to post a work completion after processing a memory registration or invalidate work request. This covers the case where the memory registration or invalidate was the last work request posted to the qp. Signed-off-by: Vijay Immanuel Reviewed-by: Yonatan Cohen Signed-off-by: Jason Gunthorpe Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/sw/rxe/rxe_req.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c index 5b0ca35c06ab..47219ebd8ff2 100644 --- a/drivers/infiniband/sw/rxe/rxe_req.c +++ b/drivers/infiniband/sw/rxe/rxe_req.c @@ -648,6 +648,9 @@ int rxe_requester(void *arg) } else { goto exit; } + if ((wqe->wr.send_flags & IB_SEND_SIGNALED) || + qp->sq_sig_type == IB_SIGNAL_ALL_WR) + rxe_run_task(&qp->comp.task, 1); qp->req.wqe_index = next_index(qp->sq.queue, qp->req.wqe_index); goto next_wqe; From 5600d61e7d9521f69bf500744cd6e72dd96d37ba Mon Sep 17 00:00:00 2001 From: John Garry Date: Fri, 8 Jun 2018 18:26:33 +0800 Subject: [PATCH 683/970] libahci: Fix possible Spectre-v1 pmp indexing in ahci_led_store() [ Upstream commit fae2a63737e5973f1426bc139935a0f42e232844 ] Currently smatch warns of possible Spectre-V1 issue in ahci_led_store(): drivers/ata/libahci.c:1150 ahci_led_store() warn: potential spectre issue 'pp->em_priv' (local cap) Userspace controls @pmp from following callchain: em_message->store() ->ata_scsi_em_message_store() -->ap->ops->em_store() --->ahci_led_store() After the mask+shift @pmp is effectively an 8b value, which is used to index into an array of length 8, so sanitize the array index. Signed-off-by: John Garry Signed-off-by: Tejun Heo Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/ata/libahci.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c index 0d028ead99e8..d81b984b72e4 100644 --- a/drivers/ata/libahci.c +++ b/drivers/ata/libahci.c @@ -35,6 +35,7 @@ #include #include #include +#include #include #include #include @@ -1124,10 +1125,12 @@ static ssize_t ahci_led_store(struct ata_port *ap, const char *buf, /* get the slot number from the message */ pmp = (state & EM_MSG_LED_PMP_SLOT) >> 8; - if (pmp < EM_MAX_SLOTS) + if (pmp < EM_MAX_SLOTS) { + pmp = array_index_nospec(pmp, EM_MAX_SLOTS); emp = &pp->em_priv[pmp]; - else + } else { return -EINVAL; + } /* mask off the activity bits if we are in sw_activity * mode, user should turn off sw_activity before setting From 77569fc23d0ab96dd7a6d7ae1a1f01f1b84a6070 Mon Sep 17 00:00:00 2001 From: William Wu Date: Fri, 11 May 2018 17:46:32 +0800 Subject: [PATCH 684/970] usb: dwc2: fix isoc split in transfer with no data [ Upstream commit 70c3c8cb83856758025c2a211dd022bc0478922a ] If isoc split in transfer with no data (the length of DATA0 packet is zero), we can't simply return immediately. Because the DATA0 can be the first transaction or the second transaction for the isoc split in transaction. If the DATA0 packet with no data is in the first transaction, we can return immediately. But if the DATA0 packet with no data is in the second transaction of isoc split in transaction sequence, we need to increase the qtd->isoc_frame_index and giveback urb to device driver if needed, otherwise, the MDATA packet will be lost. A typical test case is that connect the dwc2 controller with an usb hs Hub (GL852G-12), and plug an usb fs audio device (Plantronics headset) into the downstream port of Hub. Then use the usb mic to record, we can find noise when playback. In the case, the isoc split in transaction sequence like this: - SSPLIT IN transaction - CSPLIT IN transaction - MDATA packet (176 bytes) - CSPLIT IN transaction - DATA0 packet (0 byte) This patch use both the length of DATA0 and qtd->isoc_split_offset to check if the DATA0 is in the second transaction. Tested-by: Gevorg Sahakyan Tested-by: Heiko Stuebner Acked-by: Minas Harutyunyan hminas@synopsys.com> Signed-off-by: William Wu Signed-off-by: Felipe Balbi Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/usb/dwc2/hcd_intr.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/usb/dwc2/hcd_intr.c b/drivers/usb/dwc2/hcd_intr.c index 906f223542ee..8066fa9ac97b 100644 --- a/drivers/usb/dwc2/hcd_intr.c +++ b/drivers/usb/dwc2/hcd_intr.c @@ -922,9 +922,8 @@ static int dwc2_xfercomp_isoc_split_in(struct dwc2_hsotg *hsotg, frame_desc = &qtd->urb->iso_descs[qtd->isoc_frame_index]; len = dwc2_get_actual_xfer_length(hsotg, chan, chnum, qtd, DWC2_HC_XFER_COMPLETE, NULL); - if (!len) { + if (!len && !qtd->isoc_split_offset) { qtd->complete_split = 0; - qtd->isoc_split_offset = 0; return 0; } From 121621e73737af151ef66704ac739cf58a8f1e3c Mon Sep 17 00:00:00 2001 From: Chunfeng Yun Date: Fri, 25 May 2018 17:24:57 +0800 Subject: [PATCH 685/970] usb: gadget: composite: fix delayed_status race condition when set_interface [ Upstream commit 980900d6318066b9f8314bfb87329a20fd0d1ca4 ] It happens when enable debug log, if set_alt() returns USB_GADGET_DELAYED_STATUS and usb_composite_setup_continue() is called before increasing count of @delayed_status, so fix it by using spinlock of @cdev->lock. Signed-off-by: Chunfeng Yun Tested-by: Jay Hsu Signed-off-by: Felipe Balbi Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/usb/gadget/composite.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/usb/gadget/composite.c b/drivers/usb/gadget/composite.c index ca97f5b36e1b..2c022a08f163 100644 --- a/drivers/usb/gadget/composite.c +++ b/drivers/usb/gadget/composite.c @@ -1712,6 +1712,8 @@ composite_setup(struct usb_gadget *gadget, const struct usb_ctrlrequest *ctrl) */ if (w_value && !f->get_alt) break; + + spin_lock(&cdev->lock); value = f->set_alt(f, w_index, w_value); if (value == USB_GADGET_DELAYED_STATUS) { DBG(cdev, @@ -1721,6 +1723,7 @@ composite_setup(struct usb_gadget *gadget, const struct usb_ctrlrequest *ctrl) DBG(cdev, "delayed_status count %d\n", cdev->delayed_status); } + spin_unlock(&cdev->lock); break; case USB_REQ_GET_INTERFACE: if (ctrl->bRequestType != (USB_DIR_IN|USB_RECIP_INTERFACE)) From 44af104eabf9ef43179c55478b3e160770e7dec1 Mon Sep 17 00:00:00 2001 From: Grigor Tovmasyan Date: Thu, 24 May 2018 18:22:30 +0400 Subject: [PATCH 686/970] usb: gadget: dwc2: fix memory leak in gadget_init() [ Upstream commit 9bb073a053f0464ea74a4d4c331fdb7da58568d6 ] Freed allocated request for ep0 to prevent memory leak in case when dwc2_driver_probe() failed. Cc: Stefan Wahren Cc: Marek Szyprowski Tested-by: Stefan Wahren Tested-by: Marek Szyprowski Acked-by: Minas Harutyunyan Signed-off-by: Grigor Tovmasyan Signed-off-by: Felipe Balbi Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/usb/dwc2/gadget.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/usb/dwc2/gadget.c b/drivers/usb/dwc2/gadget.c index 09921ef07ac5..3ae27b6ed07c 100644 --- a/drivers/usb/dwc2/gadget.c +++ b/drivers/usb/dwc2/gadget.c @@ -3948,9 +3948,11 @@ int dwc2_gadget_init(struct dwc2_hsotg *hsotg, int irq) } ret = usb_add_gadget_udc(dev, &hsotg->gadget); - if (ret) + if (ret) { + dwc2_hsotg_ep_free_request(&hsotg->eps_out[0]->ep, + hsotg->ctrl_req); return ret; - + } dwc2_hsotg_dump(hsotg); return 0; @@ -3963,6 +3965,7 @@ int dwc2_gadget_init(struct dwc2_hsotg *hsotg, int irq) int dwc2_hsotg_remove(struct dwc2_hsotg *hsotg) { usb_del_gadget_udc(&hsotg->gadget); + dwc2_hsotg_ep_free_request(&hsotg->eps_out[0]->ep, hsotg->ctrl_req); return 0; } From 10a4d81876f31c72201a05f23dbce28de64e86cd Mon Sep 17 00:00:00 2001 From: Zhouyang Jia Date: Fri, 15 Jun 2018 07:34:52 +0800 Subject: [PATCH 687/970] xen: add error handling for xenbus_printf [ Upstream commit 84c029a73327cef571eaa61c7d6e67e8031b52ec ] When xenbus_printf fails, the lack of error-handling code may cause unexpected results. This patch adds error-handling code after calling xenbus_printf. Signed-off-by: Zhouyang Jia Reviewed-by: Boris Ostrovsky Signed-off-by: Juergen Gross Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/xen/manage.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c index 9122ba25bb00..7abaaa5f0f67 100644 --- a/drivers/xen/manage.c +++ b/drivers/xen/manage.c @@ -291,8 +291,15 @@ static void sysrq_handler(struct xenbus_watch *watch, const char **vec, return; } - if (sysrq_key != '\0') - xenbus_printf(xbt, "control", "sysrq", "%c", '\0'); + if (sysrq_key != '\0') { + err = xenbus_printf(xbt, "control", "sysrq", "%c", '\0'); + if (err) { + pr_err("%s: Error %d writing sysrq in control/sysrq\n", + __func__, err); + xenbus_transaction_end(xbt, 1); + return; + } + } err = xenbus_transaction_end(xbt, 0); if (err == -EAGAIN) @@ -344,7 +351,12 @@ static int setup_shutdown_watcher(void) continue; snprintf(node, FEATURE_PATH_SIZE, "feature-%s", shutdown_handlers[idx].command); - xenbus_printf(XBT_NIL, "control", node, "%u", 1); + err = xenbus_printf(XBT_NIL, "control", node, "%u", 1); + if (err) { + pr_err("%s: Error %d writing %s\n", __func__, + err, node); + return err; + } } return 0; From 2941d96a731a52b2aff8f4bf6fc7e7f97ed5274f Mon Sep 17 00:00:00 2001 From: Zhouyang Jia Date: Sat, 16 Jun 2018 01:05:01 +0800 Subject: [PATCH 688/970] scsi: xen-scsifront: add error handling for xenbus_printf [ Upstream commit 93efbd39870474cc536b9caf4a6efeb03b0bc56f ] When xenbus_printf fails, the lack of error-handling code may cause unexpected results. This patch adds error-handling code after calling xenbus_printf. Signed-off-by: Zhouyang Jia Reviewed-by: Juergen Gross Signed-off-by: Juergen Gross Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/xen-scsifront.c | 33 ++++++++++++++++++++++++++------- 1 file changed, 26 insertions(+), 7 deletions(-) diff --git a/drivers/scsi/xen-scsifront.c b/drivers/scsi/xen-scsifront.c index 9dc8687bf048..e1b32ed0aa20 100644 --- a/drivers/scsi/xen-scsifront.c +++ b/drivers/scsi/xen-scsifront.c @@ -676,10 +676,17 @@ static int scsifront_dev_reset_handler(struct scsi_cmnd *sc) static int scsifront_sdev_configure(struct scsi_device *sdev) { struct vscsifrnt_info *info = shost_priv(sdev->host); + int err; - if (info && current == info->curr) - xenbus_printf(XBT_NIL, info->dev->nodename, + if (info && current == info->curr) { + err = xenbus_printf(XBT_NIL, info->dev->nodename, info->dev_state_path, "%d", XenbusStateConnected); + if (err) { + xenbus_dev_error(info->dev, err, + "%s: writing dev_state_path", __func__); + return err; + } + } return 0; } @@ -687,10 +694,15 @@ static int scsifront_sdev_configure(struct scsi_device *sdev) static void scsifront_sdev_destroy(struct scsi_device *sdev) { struct vscsifrnt_info *info = shost_priv(sdev->host); + int err; - if (info && current == info->curr) - xenbus_printf(XBT_NIL, info->dev->nodename, + if (info && current == info->curr) { + err = xenbus_printf(XBT_NIL, info->dev->nodename, info->dev_state_path, "%d", XenbusStateClosed); + if (err) + xenbus_dev_error(info->dev, err, + "%s: writing dev_state_path", __func__); + } } static struct scsi_host_template scsifront_sht = { @@ -1025,9 +1037,12 @@ static void scsifront_do_lun_hotplug(struct vscsifrnt_info *info, int op) if (scsi_add_device(info->host, chn, tgt, lun)) { dev_err(&dev->dev, "scsi_add_device\n"); - xenbus_printf(XBT_NIL, dev->nodename, + err = xenbus_printf(XBT_NIL, dev->nodename, info->dev_state_path, "%d", XenbusStateClosed); + if (err) + xenbus_dev_error(dev, err, + "%s: writing dev_state_path", __func__); } break; case VSCSIFRONT_OP_DEL_LUN: @@ -1041,10 +1056,14 @@ static void scsifront_do_lun_hotplug(struct vscsifrnt_info *info, int op) } break; case VSCSIFRONT_OP_READD_LUN: - if (device_state == XenbusStateConnected) - xenbus_printf(XBT_NIL, dev->nodename, + if (device_state == XenbusStateConnected) { + err = xenbus_printf(XBT_NIL, dev->nodename, info->dev_state_path, "%d", XenbusStateConnected); + if (err) + xenbus_dev_error(dev, err, + "%s: writing dev_state_path", __func__); + } break; default: break; From c0a01787462a3a370006cdf5d14d7e7967ad59aa Mon Sep 17 00:00:00 2001 From: Zhouyang Jia Date: Sat, 16 Jun 2018 08:14:37 +0800 Subject: [PATCH 689/970] xen/scsiback: add error handling for xenbus_printf [ Upstream commit 7c63ca24c878e0051c91904b72174029320ef4bd ] When xenbus_printf fails, the lack of error-handling code may cause unexpected results. This patch adds error-handling code after calling xenbus_printf. Signed-off-by: Zhouyang Jia Reviewed-by: Juergen Gross Signed-off-by: Juergen Gross Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/xen/xen-scsiback.c | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/drivers/xen/xen-scsiback.c b/drivers/xen/xen-scsiback.c index 980f32817305..992cb8fa272c 100644 --- a/drivers/xen/xen-scsiback.c +++ b/drivers/xen/xen-scsiback.c @@ -1014,6 +1014,7 @@ static void scsiback_do_add_lun(struct vscsibk_info *info, const char *state, { struct v2p_entry *entry; unsigned long flags; + int err; if (try) { spin_lock_irqsave(&info->v2p_lock, flags); @@ -1029,8 +1030,11 @@ static void scsiback_do_add_lun(struct vscsibk_info *info, const char *state, scsiback_del_translation_entry(info, vir); } } else if (!try) { - xenbus_printf(XBT_NIL, info->dev->nodename, state, + err = xenbus_printf(XBT_NIL, info->dev->nodename, state, "%d", XenbusStateClosed); + if (err) + xenbus_dev_error(info->dev, err, + "%s: writing %s", __func__, state); } } @@ -1069,8 +1073,11 @@ static void scsiback_do_1lun_hotplug(struct vscsibk_info *info, int op, snprintf(str, sizeof(str), "vscsi-devs/%s/p-dev", ent); val = xenbus_read(XBT_NIL, dev->nodename, str, NULL); if (IS_ERR(val)) { - xenbus_printf(XBT_NIL, dev->nodename, state, + err = xenbus_printf(XBT_NIL, dev->nodename, state, "%d", XenbusStateClosed); + if (err) + xenbus_dev_error(info->dev, err, + "%s: writing %s", __func__, state); return; } strlcpy(phy, val, VSCSI_NAMELEN); @@ -1081,8 +1088,11 @@ static void scsiback_do_1lun_hotplug(struct vscsibk_info *info, int op, err = xenbus_scanf(XBT_NIL, dev->nodename, str, "%u:%u:%u:%u", &vir.hst, &vir.chn, &vir.tgt, &vir.lun); if (XENBUS_EXIST_ERR(err)) { - xenbus_printf(XBT_NIL, dev->nodename, state, + err = xenbus_printf(XBT_NIL, dev->nodename, state, "%d", XenbusStateClosed); + if (err) + xenbus_dev_error(info->dev, err, + "%s: writing %s", __func__, state); return; } From 40137ff99323f88357a55120a4e86471149e3670 Mon Sep 17 00:00:00 2001 From: Zhizhou Zhang Date: Tue, 12 Jun 2018 17:07:37 +0800 Subject: [PATCH 690/970] arm64: make secondary_start_kernel() notrace [ Upstream commit b154886f7892499d0d3054026e19dfb9a731df61 ] We can't call function trace hook before setup percpu offset. When entering secondary_start_kernel(), percpu offset has not been initialized. So this lead hotplug malfunction. Here is the flow to reproduce this bug: echo 0 > /sys/devices/system/cpu/cpu1/online echo function > /sys/kernel/debug/tracing/current_tracer echo 1 > /sys/kernel/debug/tracing/tracing_on echo 1 > /sys/devices/system/cpu/cpu1/online Acked-by: Mark Rutland Tested-by: Suzuki K Poulose Signed-off-by: Zhizhou Zhang Signed-off-by: Catalin Marinas Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/arm64/kernel/smp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index a70f7d3361c4..cfd33f18f437 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -205,7 +205,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle) * This is the secondary CPU boot entry. We're using this CPUs * idle thread stack, but a set of temporary page tables. */ -asmlinkage void secondary_start_kernel(void) +asmlinkage notrace void secondary_start_kernel(void) { struct mm_struct *mm = &init_mm; unsigned int cpu = smp_processor_id(); From 4186504db3a9d543ba07fd39121378f5ef18829b Mon Sep 17 00:00:00 2001 From: Sudarsana Reddy Kalluru Date: Mon, 18 Jun 2018 21:58:01 -0700 Subject: [PATCH 691/970] qed: Add sanity check for SIMD fastpath handler. [ Upstream commit 3935a70968820c3994db4de7e6e1c7e814bff875 ] Avoid calling a SIMD fastpath handler if it is NULL. The check is needed to handle an unlikely scenario where unsolicited interrupt is destined to a PF in INTa mode. Fixes: fe56b9e6a ("qed: Add module with basic common support") Signed-off-by: Sudarsana Reddy Kalluru Signed-off-by: Ariel Elior Signed-off-by: Michal Kalderon Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/qlogic/qed/qed_main.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c index f36bd0bd37da..1ed13a165e58 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_main.c +++ b/drivers/net/ethernet/qlogic/qed/qed_main.c @@ -502,8 +502,16 @@ static irqreturn_t qed_single_int(int irq, void *dev_instance) /* Fastpath interrupts */ for (j = 0; j < 64; j++) { if ((0x2ULL << j) & status) { - hwfn->simd_proto_handler[j].func( - hwfn->simd_proto_handler[j].token); + struct qed_simd_fp_handler *p_handler = + &hwfn->simd_proto_handler[j]; + + if (p_handler->func) + p_handler->func(p_handler->token); + else + DP_NOTICE(hwfn, + "Not calling fastpath handler as it is NULL [handler #%d, status 0x%llx]\n", + j, status); + status &= ~(0x2ULL << j); rc = IRQ_HANDLED; } From bed5acf33a65fb3ed07d3efbc29f6ab6e7f5145e Mon Sep 17 00:00:00 2001 From: Govindarajulu Varadarajan Date: Tue, 19 Jun 2018 08:15:24 -0700 Subject: [PATCH 692/970] enic: initialize enic->rfs_h.lock in enic_probe [ Upstream commit 3256d29fc7aecdf99feb1cb9475ed2252769a8a7 ] lockdep spotted that we are using rfs_h.lock in enic_get_rxnfc() without initializing. rfs_h.lock is initialized in enic_open(). But ethtool_ops can be called when interface is down. Move enic_rfs_flw_tbl_init to enic_probe. INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. CPU: 18 PID: 1189 Comm: ethtool Not tainted 4.17.0-rc7-devel+ #27 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014 Call Trace: dump_stack+0x85/0xc0 register_lock_class+0x550/0x560 ? __handle_mm_fault+0xa8b/0x1100 __lock_acquire+0x81/0x670 lock_acquire+0xb9/0x1e0 ? enic_get_rxnfc+0x139/0x2b0 [enic] _raw_spin_lock_bh+0x38/0x80 ? enic_get_rxnfc+0x139/0x2b0 [enic] enic_get_rxnfc+0x139/0x2b0 [enic] ethtool_get_rxnfc+0x8d/0x1c0 dev_ethtool+0x16c8/0x2400 ? __mutex_lock+0x64d/0xa00 ? dev_load+0x6a/0x150 dev_ioctl+0x253/0x4b0 sock_do_ioctl+0x9a/0x130 sock_ioctl+0x1af/0x350 do_vfs_ioctl+0x8e/0x670 ? syscall_trace_enter+0x1e2/0x380 ksys_ioctl+0x60/0x90 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x5a/0x170 entry_SYSCALL_64_after_hwframe+0x49/0xbe Signed-off-by: Govindarajulu Varadarajan Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/cisco/enic/enic_clsf.c | 3 +-- drivers/net/ethernet/cisco/enic/enic_main.c | 3 ++- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/cisco/enic/enic_clsf.c b/drivers/net/ethernet/cisco/enic/enic_clsf.c index 3c677ed3c29e..4d9014d5b36d 100644 --- a/drivers/net/ethernet/cisco/enic/enic_clsf.c +++ b/drivers/net/ethernet/cisco/enic/enic_clsf.c @@ -78,7 +78,6 @@ void enic_rfs_flw_tbl_init(struct enic *enic) enic->rfs_h.max = enic->config.num_arfs; enic->rfs_h.free = enic->rfs_h.max; enic->rfs_h.toclean = 0; - enic_rfs_timer_start(enic); } void enic_rfs_flw_tbl_free(struct enic *enic) @@ -87,7 +86,6 @@ void enic_rfs_flw_tbl_free(struct enic *enic) enic_rfs_timer_stop(enic); spin_lock_bh(&enic->rfs_h.lock); - enic->rfs_h.free = 0; for (i = 0; i < (1 << ENIC_RFS_FLW_BITSHIFT); i++) { struct hlist_head *hhead; struct hlist_node *tmp; @@ -98,6 +96,7 @@ void enic_rfs_flw_tbl_free(struct enic *enic) enic_delfltr(enic, n->fltr_id); hlist_del(&n->node); kfree(n); + enic->rfs_h.free++; } } spin_unlock_bh(&enic->rfs_h.lock); diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c index 99f593be26e2..2e9bab45d419 100644 --- a/drivers/net/ethernet/cisco/enic/enic_main.c +++ b/drivers/net/ethernet/cisco/enic/enic_main.c @@ -1760,7 +1760,7 @@ static int enic_open(struct net_device *netdev) vnic_intr_unmask(&enic->intr[i]); enic_notify_timer_start(enic); - enic_rfs_flw_tbl_init(enic); + enic_rfs_timer_start(enic); return 0; @@ -2692,6 +2692,7 @@ static int enic_probe(struct pci_dev *pdev, const struct pci_device_id *ent) enic->notify_timer.function = enic_notify_timer; enic->notify_timer.data = (unsigned long)enic; + enic_rfs_flw_tbl_init(enic); enic_set_rx_coal_setting(enic); INIT_WORK(&enic->reset, enic_reset); INIT_WORK(&enic->tx_hang_reset, enic_tx_hang_reset); From 7694397e695186e4d89e777ff51fceee75bcc01c Mon Sep 17 00:00:00 2001 From: Stefan Agner Date: Sun, 17 Jun 2018 23:40:53 +0200 Subject: [PATCH 693/970] net: hamradio: use eth_broadcast_addr [ Upstream commit 4e8439aa34802deab11cee68b0ecb18f887fb153 ] The array bpq_eth_addr is only used to get the size of an address, whereas the bcast_addr is used to set the broadcast address. This leads to a warning when using clang: drivers/net/hamradio/bpqether.c:94:13: warning: variable 'bpq_eth_addr' is not needed and will not be emitted [-Wunneeded-internal-declaration] static char bpq_eth_addr[6]; ^ Remove both variables and use the common eth_broadcast_addr to set the broadcast address. Signed-off-by: Stefan Agner Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/hamradio/bpqether.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/drivers/net/hamradio/bpqether.c b/drivers/net/hamradio/bpqether.c index 622ab3ab9e93..f5e0983ae2a1 100644 --- a/drivers/net/hamradio/bpqether.c +++ b/drivers/net/hamradio/bpqether.c @@ -89,10 +89,6 @@ static const char banner[] __initconst = KERN_INFO \ "AX.25: bpqether driver version 004\n"; -static char bcast_addr[6]={0xFF,0xFF,0xFF,0xFF,0xFF,0xFF}; - -static char bpq_eth_addr[6]; - static int bpq_rcv(struct sk_buff *, struct net_device *, struct packet_type *, struct net_device *); static int bpq_device_event(struct notifier_block *, unsigned long, void *); @@ -515,8 +511,8 @@ static int bpq_new_device(struct net_device *edev) bpq->ethdev = edev; bpq->axdev = ndev; - memcpy(bpq->dest_addr, bcast_addr, sizeof(bpq_eth_addr)); - memcpy(bpq->acpt_addr, bcast_addr, sizeof(bpq_eth_addr)); + eth_broadcast_addr(bpq->dest_addr); + eth_broadcast_addr(bpq->acpt_addr); err = register_netdevice(ndev); if (err) From 53adf7365536614fa72e5c9493a3d5d72b0ca1c2 Mon Sep 17 00:00:00 2001 From: Li RongQing Date: Tue, 19 Jun 2018 17:23:17 +0800 Subject: [PATCH 694/970] net: propagate dev_get_valid_name return code [ Upstream commit 7892bd081045222b9e4027fec279a28d6fe7aa66 ] if dev_get_valid_name failed, propagate its return code and remove the setting err to ENODEV, it will be set to 0 again before dev_change_net_namespace exits. Signed-off-by: Li RongQing Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/core/dev.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index 5407d5f7b2d0..b85e789044d5 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -7945,7 +7945,8 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char /* We get here if we can't use the current device name */ if (!pat) goto out; - if (dev_get_valid_name(net, dev, pat) < 0) + err = dev_get_valid_name(net, dev, pat); + if (err < 0) goto out; } @@ -7957,7 +7958,6 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char dev_close(dev); /* And unlink it from device chain */ - err = -ENODEV; unlist_netdevice(dev); synchronize_net(); From 6f37f7b62baa6a71d7f3f298acb64de51275e724 Mon Sep 17 00:00:00 2001 From: Dinh Nguyen Date: Tue, 19 Jun 2018 10:35:38 -0500 Subject: [PATCH 695/970] net: stmmac: socfpga: add additional ocp reset line for Stratix10 [ Upstream commit bc8a2d9bcbf1ca548b1deb315d14e1da81945bea ] The Stratix10 platform has an additional reset line, OCP(Open Core Protocol), that also needs to get deasserted for the stmmac ethernet controller to work. Thus we need to update the Kconfig to include ARCH_STRATIX10 in order to build dwmac-socfpga. Also, remove the redundant check for the reset controller pointer. The reset driver already checks for the pointer and returns 0 if the pointer is NULL. Signed-off-by: Dinh Nguyen Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/stmicro/stmmac/Kconfig | 2 +- .../ethernet/stmicro/stmmac/dwmac-socfpga.c | 18 ++++++++++++++---- 2 files changed, 15 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/Kconfig b/drivers/net/ethernet/stmicro/stmmac/Kconfig index 4b78168a5f3c..0d03682faffb 100644 --- a/drivers/net/ethernet/stmicro/stmmac/Kconfig +++ b/drivers/net/ethernet/stmicro/stmmac/Kconfig @@ -83,7 +83,7 @@ config DWMAC_ROCKCHIP config DWMAC_SOCFPGA tristate "SOCFPGA dwmac support" default ARCH_SOCFPGA - depends on OF && (ARCH_SOCFPGA || COMPILE_TEST) + depends on OF && (ARCH_SOCFPGA || ARCH_STRATIX10 || COMPILE_TEST) select MFD_SYSCON help Support for ethernet controller on Altera SOCFPGA diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c index 0c420e97de1e..c3a78c113424 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c @@ -55,6 +55,7 @@ struct socfpga_dwmac { struct device *dev; struct regmap *sys_mgr_base_addr; struct reset_control *stmmac_rst; + struct reset_control *stmmac_ocp_rst; void __iomem *splitter_base; bool f2h_ptp_ref_clk; struct tse_pcs pcs; @@ -262,8 +263,8 @@ static int socfpga_dwmac_set_phy_mode(struct socfpga_dwmac *dwmac) val = SYSMGR_EMACGRP_CTRL_PHYSEL_ENUM_GMII_MII; /* Assert reset to the enet controller before changing the phy mode */ - if (dwmac->stmmac_rst) - reset_control_assert(dwmac->stmmac_rst); + reset_control_assert(dwmac->stmmac_ocp_rst); + reset_control_assert(dwmac->stmmac_rst); regmap_read(sys_mgr_base_addr, reg_offset, &ctrl); ctrl &= ~(SYSMGR_EMACGRP_CTRL_PHYSEL_MASK << reg_shift); @@ -285,8 +286,8 @@ static int socfpga_dwmac_set_phy_mode(struct socfpga_dwmac *dwmac) /* Deassert reset for the phy configuration to be sampled by * the enet controller, and operation to start in requested mode */ - if (dwmac->stmmac_rst) - reset_control_deassert(dwmac->stmmac_rst); + reset_control_deassert(dwmac->stmmac_ocp_rst); + reset_control_deassert(dwmac->stmmac_rst); if (phymode == PHY_INTERFACE_MODE_SGMII) { if (tse_pcs_init(dwmac->pcs.tse_pcs_base, &dwmac->pcs) != 0) { dev_err(dwmac->dev, "Unable to initialize TSE PCS"); @@ -321,6 +322,15 @@ static int socfpga_dwmac_probe(struct platform_device *pdev) goto err_remove_config_dt; } + dwmac->stmmac_ocp_rst = devm_reset_control_get_optional(dev, "stmmaceth-ocp"); + if (IS_ERR(dwmac->stmmac_ocp_rst)) { + ret = PTR_ERR(dwmac->stmmac_ocp_rst); + dev_err(dev, "error getting reset control of ocp %d\n", ret); + goto err_remove_config_dt; + } + + reset_control_deassert(dwmac->stmmac_ocp_rst); + ret = socfpga_dwmac_parse_data(dwmac, dev); if (ret) { dev_err(dev, "Unable to parse OF data\n"); From 313d65a648d07bd5081205b4655750efd8f67252 Mon Sep 17 00:00:00 2001 From: Max Gurtuvoy Date: Tue, 19 Jun 2018 15:45:33 +0300 Subject: [PATCH 696/970] nvmet: reset keep alive timer in controller enable [ Upstream commit d68a90e148f5a82aa67654c5012071e31c0e4baa ] Controllers that are not yet enabled should not really enforce keep alive timeouts, but we still want to track a timeout and cleanup in case a host died before it enabled the controller. Hence, simply reset the keep alive timer when the controller is enabled. Suggested-by: Max Gurtovoy Signed-off-by: Sagi Grimberg Signed-off-by: Christoph Hellwig Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/nvme/target/core.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c index 64b40a12abcf..f12753eb3216 100644 --- a/drivers/nvme/target/core.c +++ b/drivers/nvme/target/core.c @@ -578,6 +578,14 @@ static void nvmet_start_ctrl(struct nvmet_ctrl *ctrl) } ctrl->csts = NVME_CSTS_RDY; + + /* + * Controllers that are not yet enabled should not really enforce the + * keep alive timeout, but we still want to track a timeout and cleanup + * in case a host died before it enabled the controller. Hence, simply + * reset the keep alive timer when the controller is enabled. + */ + mod_delayed_work(system_wq, &ctrl->ka_work, ctrl->kato * HZ); } static void nvmet_clear_ctrl(struct nvmet_ctrl *ctrl) From 8cc70515536870ebcd9d81c54fc4d6451e4e6912 Mon Sep 17 00:00:00 2001 From: Alexey Brodkin Date: Wed, 29 Nov 2017 11:21:45 +0300 Subject: [PATCH 697/970] ARC: Enable machine_desc->init_per_cpu for !CONFIG_SMP [ Upstream commit 2f24ef7413a4d91657ef04e77c27ce0b313e6c95 ] machine_desc->init_per_cpu() hook is supposed to be per cpu initialization and would seem to apply equally to UP and/or SMP. Infact the comment in header file seems to suggest it works for UP too, which was not the case and this patch. This enables !CONFIG_SMP build for platforms such as hsdk. Signed-off-by: Alexey Brodkin Signed-off-by: Vineet Gupta [vgupta: trimmeed changelog] Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/arc/include/asm/mach_desc.h | 2 -- arch/arc/kernel/irq.c | 2 +- 2 files changed, 1 insertion(+), 3 deletions(-) diff --git a/arch/arc/include/asm/mach_desc.h b/arch/arc/include/asm/mach_desc.h index c28e6c347b49..871f3cb16af9 100644 --- a/arch/arc/include/asm/mach_desc.h +++ b/arch/arc/include/asm/mach_desc.h @@ -34,9 +34,7 @@ struct machine_desc { const char *name; const char **dt_compat; void (*init_early)(void); -#ifdef CONFIG_SMP void (*init_per_cpu)(unsigned int); -#endif void (*init_machine)(void); void (*init_late)(void); diff --git a/arch/arc/kernel/irq.c b/arch/arc/kernel/irq.c index 538b36afe89e..62b185057c04 100644 --- a/arch/arc/kernel/irq.c +++ b/arch/arc/kernel/irq.c @@ -31,10 +31,10 @@ void __init init_IRQ(void) /* a SMP H/w block could do IPI IRQ request here */ if (plat_smp_ops.init_per_cpu) plat_smp_ops.init_per_cpu(smp_processor_id()); +#endif if (machine_desc->init_per_cpu) machine_desc->init_per_cpu(smp_processor_id()); -#endif } /* From 69534ac30288f5e2d61c081efad27f136d4ee83f Mon Sep 17 00:00:00 2001 From: Bartosz Golaszewski Date: Wed, 20 Jun 2018 10:03:56 +0200 Subject: [PATCH 698/970] net: davinci_emac: match the mdio device against its compatible if possible [ Upstream commit ea0820bb771175c7d4192fc6f5b5c56b3c6d5239 ] Device tree based systems without of_dev_auxdata will have the mdio device named differently than "davinci_mdio(.0)". In this case use the device's parent's compatible string for matching Signed-off-by: Bartosz Golaszewski Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/ti/davinci_emac.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c index 481c7bf0395b..413cf14dbacd 100644 --- a/drivers/net/ethernet/ti/davinci_emac.c +++ b/drivers/net/ethernet/ti/davinci_emac.c @@ -1387,6 +1387,10 @@ static int emac_devioctl(struct net_device *ndev, struct ifreq *ifrq, int cmd) static int match_first_device(struct device *dev, void *data) { + if (dev->parent && dev->parent->of_node) + return of_device_is_compatible(dev->parent->of_node, + "ti,davinci_mdio"); + return !strncmp(dev_name(dev), "davinci_mdio", 12); } From ee838b3f825a397deff94936e0d5537c1ddcde38 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Fri, 1 Jun 2018 17:06:28 +0200 Subject: [PATCH 699/970] KVM: arm/arm64: Drop resource size check for GICV window [ Upstream commit ba56bc3a0786992755e6804fbcbdc60ef6cfc24c ] When booting a 64 KB pages kernel on a ACPI GICv3 system that implements support for v2 emulation, the following warning is produced GICV size 0x2000 not a multiple of page size 0x10000 and support for v2 emulation is disabled, preventing GICv2 VMs from being able to run on such hosts. The reason is that vgic_v3_probe() performs a sanity check on the size of the window (it should be a multiple of the page size), while the ACPI MADT parsing code hardcodes the size of the window to 8 KB. This makes sense, considering that ACPI does not bother to describe the size in the first place, under the assumption that platforms implementing ACPI will follow the architecture and not put anything else in the same 64 KB window. So let's just drop the sanity check altogether, and assume that the window is at least 64 KB in size. Fixes: 909777324588 ("KVM: arm/arm64: vgic-new: vgic_init: implement kvm_vgic_hyp_init") Signed-off-by: Ard Biesheuvel Signed-off-by: Marc Zyngier Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- virt/kvm/arm/vgic/vgic-v3.c | 5 ----- 1 file changed, 5 deletions(-) diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c index f1320063db28..c7924718990e 100644 --- a/virt/kvm/arm/vgic/vgic-v3.c +++ b/virt/kvm/arm/vgic/vgic-v3.c @@ -337,11 +337,6 @@ int vgic_v3_probe(const struct gic_kvm_info *info) pr_warn("GICV physical address 0x%llx not page aligned\n", (unsigned long long)info->vcpu.start); kvm_vgic_global_state.vcpu_base = 0; - } else if (!PAGE_ALIGNED(resource_size(&info->vcpu))) { - pr_warn("GICV size 0x%llx not a multiple of page size 0x%lx\n", - (unsigned long long)resource_size(&info->vcpu), - PAGE_SIZE); - kvm_vgic_global_state.vcpu_base = 0; } else { kvm_vgic_global_state.vcpu_base = info->vcpu.start; kvm_vgic_global_state.can_emulate_gicv2 = true; From a8b8e5276788c998dcaeff7c4fcf1755d07af0c3 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (VMware)" Date: Wed, 4 Apr 2018 14:06:30 -0400 Subject: [PATCH 700/970] locking/lockdep: Do not record IRQ state within lockdep code [ Upstream commit fcc784be837714a9173b372ff9fb9b514590dad9 ] While debugging where things were going wrong with mapping enabling/disabling interrupts with the lockdep state and actual real enabling and disabling interrupts, I had to silent the IRQ disabling/enabling in debug_check_no_locks_freed() because it was always showing up as it was called before the splat was. Use raw_local_irq_save/restore() for not only debug_check_no_locks_freed() but for all internal lockdep functions, as they hide useful information about where interrupts were used incorrectly last. Signed-off-by: Steven Rostedt (VMware) Cc: Andrew Morton Cc: Linus Torvalds Cc: Paul E. McKenney Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Will Deacon Link: https://lkml.kernel.org/lkml/20180404140630.3f4f4c7a@gandalf.local.home Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- kernel/locking/lockdep.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index 6599c7f3071d..61a15e538435 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -1240,11 +1240,11 @@ unsigned long lockdep_count_forward_deps(struct lock_class *class) this.parent = NULL; this.class = class; - local_irq_save(flags); + raw_local_irq_save(flags); arch_spin_lock(&lockdep_lock); ret = __lockdep_count_forward_deps(&this); arch_spin_unlock(&lockdep_lock); - local_irq_restore(flags); + raw_local_irq_restore(flags); return ret; } @@ -1267,11 +1267,11 @@ unsigned long lockdep_count_backward_deps(struct lock_class *class) this.parent = NULL; this.class = class; - local_irq_save(flags); + raw_local_irq_save(flags); arch_spin_lock(&lockdep_lock); ret = __lockdep_count_backward_deps(&this); arch_spin_unlock(&lockdep_lock); - local_irq_restore(flags); + raw_local_irq_restore(flags); return ret; } @@ -4273,7 +4273,7 @@ void debug_check_no_locks_freed(const void *mem_from, unsigned long mem_len) if (unlikely(!debug_locks)) return; - local_irq_save(flags); + raw_local_irq_save(flags); for (i = 0; i < curr->lockdep_depth; i++) { hlock = curr->held_locks + i; @@ -4284,7 +4284,7 @@ void debug_check_no_locks_freed(const void *mem_from, unsigned long mem_len) print_freed_lock_bug(curr, mem_from, mem_from + mem_len, hlock); break; } - local_irq_restore(flags); + raw_local_irq_restore(flags); } EXPORT_SYMBOL_GPL(debug_check_no_locks_freed); From f12a825bbd39600c72a833c42a8f4621c351530d Mon Sep 17 00:00:00 2001 From: Hangbin Liu Date: Thu, 21 Jun 2018 19:49:36 +0800 Subject: [PATCH 701/970] ipv6: mcast: fix unsolicited report interval after receiving querys [ Upstream commit 6c6da92808442908287fae8ebb0ca041a52469f4 ] After recieving MLD querys, we update idev->mc_maxdelay with max_delay from query header. This make the later unsolicited reports have the same interval with mc_maxdelay, which means we may send unsolicited reports with long interval time instead of default configured interval time. Also as we will not call ipv6_mc_reset() after device up. This issue will be there even after leave the group and join other groups. Fixes: fc4eba58b4c14 ("ipv6: make unsolicited report intervals configurable for mld") Signed-off-by: Hangbin Liu Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/ipv6/mcast.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c index 918c161e5b55..6c54c76847bf 100644 --- a/net/ipv6/mcast.c +++ b/net/ipv6/mcast.c @@ -2084,7 +2084,8 @@ void ipv6_mc_dad_complete(struct inet6_dev *idev) mld_send_initial_cr(idev); idev->mc_dad_count--; if (idev->mc_dad_count) - mld_dad_start_timer(idev, idev->mc_maxdelay); + mld_dad_start_timer(idev, + unsolicited_report_interval(idev)); } } @@ -2096,7 +2097,8 @@ static void mld_dad_timer_expire(unsigned long data) if (idev->mc_dad_count) { idev->mc_dad_count--; if (idev->mc_dad_count) - mld_dad_start_timer(idev, idev->mc_maxdelay); + mld_dad_start_timer(idev, + unsolicited_report_interval(idev)); } in6_dev_put(idev); } @@ -2454,7 +2456,8 @@ static void mld_ifc_timer_expire(unsigned long data) if (idev->mc_ifc_count) { idev->mc_ifc_count--; if (idev->mc_ifc_count) - mld_ifc_start_timer(idev, idev->mc_maxdelay); + mld_ifc_start_timer(idev, + unsolicited_report_interval(idev)); } in6_dev_put(idev); } From ebc6dcb6de73781730b64237f4ace0952027da95 Mon Sep 17 00:00:00 2001 From: Casey Schaufler Date: Fri, 22 Jun 2018 10:54:45 -0700 Subject: [PATCH 702/970] Smack: Mark inode instant in smack_task_to_inode [ Upstream commit 7b4e88434c4e7982fb053c49657e1c8bbb8692d9 ] Smack: Mark inode instant in smack_task_to_inode /proc clean-up in commit 1bbc55131e59bd099fdc568d3aa0b42634dbd188 resulted in smack_task_to_inode() being called before smack_d_instantiate. This resulted in the smk_inode value being ignored, even while present for files in /proc/self. Marking the inode as instant here fixes that. Signed-off-by: Casey Schaufler Signed-off-by: James Morris Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- security/smack/smack_lsm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c index a8a7fbc377f6..ca3ea985c100 100644 --- a/security/smack/smack_lsm.c +++ b/security/smack/smack_lsm.c @@ -2307,6 +2307,7 @@ static void smack_task_to_inode(struct task_struct *p, struct inode *inode) struct smack_known *skp = smk_of_task_struct(p); isp->smk_inode = skp; + isp->smk_flags |= SMK_INODE_INSTANT; } /* From b5f6a0ab2a21bc4872bd7bdae5dc0ae1f6cbc7a6 Mon Sep 17 00:00:00 2001 From: Sven Eckelmann Date: Sat, 2 Jun 2018 17:26:34 +0200 Subject: [PATCH 703/970] batman-adv: Fix bat_ogm_iv best gw refcnt after netlink dump [ Upstream commit b5685d2687d6612adf5eac519eb7008f74dfd1ec ] A reference for the best gateway is taken when the list of gateways in the mesh is sent via netlink. This is necessary to check whether the currently dumped entry is the currently selected gateway or not. This information is then transferred as flag BATADV_ATTR_FLAG_BEST. After the comparison of the current entry is done, batadv_iv_gw_dump_entry() has to decrease the reference counter again. Otherwise the reference will be held and thus prevents a proper shutdown of the batman-adv interfaces (and some of the interfaces enslaved in it). Fixes: efb766af06e3 ("batman-adv: add B.A.T.M.A.N. IV bat_gw_dump implementations") Reported-by: Andreas Ziegler Tested-by: Andreas Ziegler Signed-off-by: Sven Eckelmann Acked-by: Marek Lindner Signed-off-by: Simon Wunderlich Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/batman-adv/bat_iv_ogm.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c index 946f1c269b1f..1ae8c59fcb2d 100644 --- a/net/batman-adv/bat_iv_ogm.c +++ b/net/batman-adv/bat_iv_ogm.c @@ -2704,7 +2704,7 @@ static int batadv_iv_gw_dump_entry(struct sk_buff *msg, u32 portid, u32 seq, { struct batadv_neigh_ifinfo *router_ifinfo = NULL; struct batadv_neigh_node *router; - struct batadv_gw_node *curr_gw; + struct batadv_gw_node *curr_gw = NULL; int ret = 0; void *hdr; @@ -2752,6 +2752,8 @@ static int batadv_iv_gw_dump_entry(struct sk_buff *msg, u32 portid, u32 seq, ret = 0; out: + if (curr_gw) + batadv_gw_node_put(curr_gw); if (router_ifinfo) batadv_neigh_ifinfo_put(router_ifinfo); if (router) From 40ae32c6c487c1b4d4d99c1ba41297aaa630cbed Mon Sep 17 00:00:00 2001 From: Sven Eckelmann Date: Sat, 2 Jun 2018 17:26:35 +0200 Subject: [PATCH 704/970] batman-adv: Fix bat_v best gw refcnt after netlink dump [ Upstream commit 9713cb0cf19f1cec6c007e3b37be0697042b6720 ] A reference for the best gateway is taken when the list of gateways in the mesh is sent via netlink. This is necessary to check whether the currently dumped entry is the currently selected gateway or not. This information is then transferred as flag BATADV_ATTR_FLAG_BEST. After the comparison of the current entry is done, batadv_v_gw_dump_entry() has to decrease the reference counter again. Otherwise the reference will be held and thus prevents a proper shutdown of the batman-adv interfaces (and some of the interfaces enslaved in it). Fixes: b71bb6f924fe ("batman-adv: add B.A.T.M.A.N. V bat_gw_dump implementations") Signed-off-by: Sven Eckelmann Acked-by: Marek Lindner Signed-off-by: Simon Wunderlich Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/batman-adv/bat_v.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/batman-adv/bat_v.c b/net/batman-adv/bat_v.c index ed4ddf2059a6..4348118e7eac 100644 --- a/net/batman-adv/bat_v.c +++ b/net/batman-adv/bat_v.c @@ -919,7 +919,7 @@ static int batadv_v_gw_dump_entry(struct sk_buff *msg, u32 portid, u32 seq, { struct batadv_neigh_ifinfo *router_ifinfo = NULL; struct batadv_neigh_node *router; - struct batadv_gw_node *curr_gw; + struct batadv_gw_node *curr_gw = NULL; int ret = 0; void *hdr; @@ -987,6 +987,8 @@ static int batadv_v_gw_dump_entry(struct sk_buff *msg, u32 portid, u32 seq, ret = 0; out: + if (curr_gw) + batadv_gw_node_put(curr_gw); if (router_ifinfo) batadv_neigh_ifinfo_put(router_ifinfo); if (router) From 6c500d00cc7b1ca915a10dc771073d962eb7bdea Mon Sep 17 00:00:00 2001 From: Ganesh Goudar Date: Sat, 23 Jun 2018 20:28:26 +0530 Subject: [PATCH 705/970] cxgb4: when disabling dcb set txq dcb priority to 0 [ Upstream commit 5ce36338a30f9814fc4824f9fe6c20cd83d872c7 ] When we are disabling DCB, store "0" in txq->dcb_prio since that's used for future TX Work Request "OVLAN_IDX" values. Setting non zero priority upon disabling DCB would halt the traffic. Reported-by: AMG Zollner Robert CC: David Ahern Signed-off-by: Casey Leedom Signed-off-by: Ganesh Goudar Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c index c395b21cb57b..c71a52a2f801 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c @@ -274,7 +274,7 @@ static void dcb_tx_queue_prio_enable(struct net_device *dev, int enable) "Can't %s DCB Priority on port %d, TX Queue %d: err=%d\n", enable ? "set" : "unset", pi->port_id, i, -err); else - txq->dcb_prio = value; + txq->dcb_prio = enable ? value : 0; } } From b4164f8c69cf6cd11e6859dd0065826fed4cb779 Mon Sep 17 00:00:00 2001 From: Tomasz Duszynski Date: Mon, 28 May 2018 17:38:59 +0200 Subject: [PATCH 706/970] iio: pressure: bmp280: fix relative humidity unit [ Upstream commit 13399ff25f179811ce9c1df1523eb39f9e4a4772 ] According to IIO ABI relative humidity reading should be returned in milli percent. This patch addresses that by applying proper scaling and returning integer instead of fractional format type specifier. Note that the fixes tag is before the driver was heavily refactored to introduce spi support, so the patch won't apply that far back. Signed-off-by: Tomasz Duszynski Fixes: 14beaa8f5ab1 ("iio: pressure: bmp280: add humidity support") Acked-by: Matt Ranostay Signed-off-by: Jonathan Cameron Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/iio/pressure/bmp280-core.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/iio/pressure/bmp280-core.c b/drivers/iio/pressure/bmp280-core.c index 19aa957bd454..c9263acc190b 100644 --- a/drivers/iio/pressure/bmp280-core.c +++ b/drivers/iio/pressure/bmp280-core.c @@ -347,10 +347,9 @@ static int bmp280_read_humid(struct bmp280_data *data, int *val, int *val2) adc_humidity = be16_to_cpu(tmp); comp_humidity = bmp280_compensate_humidity(data, adc_humidity); - *val = comp_humidity; - *val2 = 1024; + *val = comp_humidity * 1000 / 1024; - return IIO_VAL_FRACTIONAL; + return IIO_VAL_INT; } static int bmp280_read_raw(struct iio_dev *indio_dev, From 89efc936f85d81ecb9e0ed225501a122fa6b9144 Mon Sep 17 00:00:00 2001 From: Michael Trimarchi Date: Wed, 30 May 2018 11:06:34 +0200 Subject: [PATCH 707/970] brcmfmac: stop watchdog before detach and free everything [ Upstream commit 373c83a801f15b1e3d02d855fad89112bd4ccbe0 ] Using built-in in kernel image without a firmware in filesystem or in the kernel image can lead to a kernel NULL pointer deference. Watchdog need to be stopped in brcmf_sdio_remove The system is going down NOW! [ 1348.110759] Unable to handle kernel NULL pointer dereference at virtual address 000002f8 Sent SIGTERM to all processes [ 1348.121412] Mem abort info: [ 1348.126962] ESR = 0x96000004 [ 1348.130023] Exception class = DABT (current EL), IL = 32 bits [ 1348.135948] SET = 0, FnV = 0 [ 1348.138997] EA = 0, S1PTW = 0 [ 1348.142154] Data abort info: [ 1348.145045] ISV = 0, ISS = 0x00000004 [ 1348.148884] CM = 0, WnR = 0 [ 1348.151861] user pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____) [ 1348.158475] [00000000000002f8] pgd=0000000000000000 [ 1348.163364] Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 1348.168927] Modules linked in: ipv6 [ 1348.172421] CPU: 3 PID: 1421 Comm: brcmf_wdog/mmc0 Not tainted 4.17.0-rc5-next-20180517 #18 [ 1348.180757] Hardware name: Amarula A64-Relic (DT) [ 1348.185455] pstate: 60000005 (nZCv daif -PAN -UAO) [ 1348.190251] pc : brcmf_sdiod_freezer_count+0x0/0x20 [ 1348.195124] lr : brcmf_sdio_watchdog_thread+0x64/0x290 [ 1348.200253] sp : ffff00000b85be30 [ 1348.203561] x29: ffff00000b85be30 x28: 0000000000000000 [ 1348.208868] x27: ffff00000b6cb918 x26: ffff80003b990638 [ 1348.214176] x25: ffff0000087b1a20 x24: ffff80003b94f800 [ 1348.219483] x23: ffff000008e620c8 x22: ffff000008f0b660 [ 1348.224790] x21: ffff000008c6a858 x20: 00000000fffffe00 [ 1348.230097] x19: ffff80003b94f800 x18: 0000000000000001 [ 1348.235404] x17: 0000ffffab2e8a74 x16: ffff0000080d7de8 [ 1348.240711] x15: 0000000000000000 x14: 0000000000000400 [ 1348.246018] x13: 0000000000000400 x12: 0000000000000001 [ 1348.251324] x11: 00000000000002c4 x10: 0000000000000a10 [ 1348.256631] x9 : ffff00000b85bc40 x8 : ffff80003be11870 [ 1348.261937] x7 : ffff80003dfc7308 x6 : 000000078ff08b55 [ 1348.267243] x5 : 00000139e1058400 x4 : 0000000000000000 [ 1348.272550] x3 : dead000000000100 x2 : 958f2788d6618100 [ 1348.277856] x1 : 00000000fffffe00 x0 : 0000000000000000 Signed-off-by: Michael Trimarchi Acked-by: Arend van Spriel Tested-by: Andy Shevchenko Signed-off-by: Kalle Valo Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c index d46f086e6360..de52d826eb24 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c @@ -4229,6 +4229,13 @@ void brcmf_sdio_remove(struct brcmf_sdio *bus) brcmf_dbg(TRACE, "Enter\n"); if (bus) { + /* Stop watchdog task */ + if (bus->watchdog_tsk) { + send_sig(SIGTERM, bus->watchdog_tsk, 1); + kthread_stop(bus->watchdog_tsk); + bus->watchdog_tsk = NULL; + } + /* De-register interrupt handler */ brcmf_sdiod_intr_unregister(bus->sdiodev); From 8982dcff574304b26b3a22e046e08f526daa2307 Mon Sep 17 00:00:00 2001 From: Daniel Mack Date: Sun, 17 Jun 2018 13:53:09 +0200 Subject: [PATCH 708/970] ARM: dts: am437x: make edt-ft5x06 a wakeup source [ Upstream commit 49a6ec5b807ea4ad7ebe1f58080ebb8497cb2d2c ] The touchscreen driver no longer configures the device as wakeup source by default. A "wakeup-source" property is needed. Signed-off-by: Daniel Mack Signed-off-by: Tony Lindgren Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/arm/boot/dts/am437x-sk-evm.dts | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/arm/boot/dts/am437x-sk-evm.dts b/arch/arm/boot/dts/am437x-sk-evm.dts index 319d94205350..6482ada22015 100644 --- a/arch/arm/boot/dts/am437x-sk-evm.dts +++ b/arch/arm/boot/dts/am437x-sk-evm.dts @@ -533,6 +533,8 @@ touchscreen-size-x = <480>; touchscreen-size-y = <272>; + + wakeup-source; }; tlv320aic3106: tlv320aic3106@1b { From 2f483f92ce2a53b27ebead4a2ee903ec83822904 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Mon, 25 Jun 2018 11:13:59 +0200 Subject: [PATCH 709/970] ALSA: seq: Fix UBSAN warning at SNDRV_SEQ_IOCTL_QUERY_NEXT_CLIENT ioctl [ Upstream commit c9a4c63888dbb79ce4d068ca1dd8b05bc3f156b1 ] The kernel may spew a WARNING with UBSAN undefined behavior at handling ALSA sequencer ioctl SNDRV_SEQ_IOCTL_QUERY_NEXT_CLIENT: UBSAN: Undefined behaviour in sound/core/seq/seq_clientmgr.c:2007:14 signed integer overflow: 2147483647 + 1 cannot be represented in type 'int' Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x122/0x1c8 lib/dump_stack.c:113 ubsan_epilogue+0x12/0x86 lib/ubsan.c:159 handle_overflow+0x1c2/0x21f lib/ubsan.c:190 __ubsan_handle_add_overflow+0x2a/0x31 lib/ubsan.c:198 snd_seq_ioctl_query_next_client+0x1ac/0x1d0 sound/core/seq/seq_clientmgr.c:2007 snd_seq_ioctl+0x264/0x3d0 sound/core/seq/seq_clientmgr.c:2144 .... It happens only when INT_MAX is passed there, as we're incrementing it unconditionally. So the fix is trivial, check the value with INT_MAX. Although the bug itself is fairly harmless, it's better to fix it so that fuzzers won't hit this again later. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200211 Signed-off-by: Takashi Iwai Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- sound/core/seq/seq_clientmgr.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/sound/core/seq/seq_clientmgr.c b/sound/core/seq/seq_clientmgr.c index ecd1c5fc8db8..965473d4129c 100644 --- a/sound/core/seq/seq_clientmgr.c +++ b/sound/core/seq/seq_clientmgr.c @@ -2002,7 +2002,8 @@ static int snd_seq_ioctl_query_next_client(struct snd_seq_client *client, struct snd_seq_client *cptr = NULL; /* search for next client */ - info->client++; + if (info->client < INT_MAX) + info->client++; if (info->client < 0) info->client = 0; for (; info->client < SNDRV_SEQ_MAX_CLIENTS; info->client++) { From 613c5948d9193c89e77ba28423774c46b1d45f3d Mon Sep 17 00:00:00 2001 From: Dongjiu Geng Date: Thu, 21 Jun 2018 16:19:43 +0300 Subject: [PATCH 710/970] usb: xhci: remove the code build warning [ Upstream commit 36eb93509c45d0bdbd8d09a01ab9d857972f5963 ] Initialize the 'err' variate to remove the build warning, the warning is shown as below: drivers/usb/host/xhci-tegra.c: In function 'tegra_xusb_mbox_thread': drivers/usb/host/xhci-tegra.c:552:6: warning: 'err' may be used uninitialized in this function [-Wuninitialized] drivers/usb/host/xhci-tegra.c:482:6: note: 'err' was declared here Fixes: e84fce0f8837 ("usb: xhci: Add NVIDIA Tegra XUSB controller driver") Signed-off-by: Dongjiu Geng Acked-by: Thierry Reding Acked-by: Jon Hunter Signed-off-by: Mathias Nyman Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/usb/host/xhci-tegra.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/usb/host/xhci-tegra.c b/drivers/usb/host/xhci-tegra.c index a59fafb4b329..97d57a94776a 100644 --- a/drivers/usb/host/xhci-tegra.c +++ b/drivers/usb/host/xhci-tegra.c @@ -482,7 +482,7 @@ static void tegra_xusb_mbox_handle(struct tegra_xusb *tegra, unsigned long mask; unsigned int port; bool idle, enable; - int err; + int err = 0; memset(&rsp, 0, sizeof(rsp)); From d5ff711a000e2682ce4beed95daaa5256a0991b2 Mon Sep 17 00:00:00 2001 From: Ajay Gupta Date: Thu, 21 Jun 2018 16:19:45 +0300 Subject: [PATCH 711/970] usb: xhci: increase CRS timeout value [ Upstream commit 305886ca87be480ae159908c2affd135c04215cf ] Some controllers take almost 55ms to complete controller restore state (CRS). There is no timeout limit mentioned in xhci specification so fixing the issue by increasing the timeout limit to 100ms [reformat code comment -Mathias] Signed-off-by: Ajay Gupta Signed-off-by: Nagaraj Annaiah Signed-off-by: Mathias Nyman Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/usb/host/xhci.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c index 81f25213cb41..c190fabd1875 100644 --- a/drivers/usb/host/xhci.c +++ b/drivers/usb/host/xhci.c @@ -1056,8 +1056,13 @@ int xhci_resume(struct xhci_hcd *xhci, bool hibernated) command = readl(&xhci->op_regs->command); command |= CMD_CRS; writel(command, &xhci->op_regs->command); + /* + * Some controllers take up to 55+ ms to complete the controller + * restore so setting the timeout to 100ms. Xhci specification + * doesn't mention any timeout value. + */ if (xhci_handshake(&xhci->op_regs->status, - STS_RESTORE, 0, 10 * 1000)) { + STS_RESTORE, 0, 100 * 1000)) { xhci_warn(xhci, "WARN: xHC restore state timeout\n"); spin_unlock_irq(&xhci->lock); return -ETIMEDOUT; From 5113cb753af23b4d0e89004ccf1cda3446bf6ac3 Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Thu, 7 Jun 2018 15:54:48 +0200 Subject: [PATCH 712/970] NFC: pn533: Fix wrong GFP flag usage [ Upstream commit ecc443c03fb14abfb8a6af5e3b2d43b5257e60f2 ] pn533_recv_response() is an urb completion handler, so it must use GFP_ATOMIC. pn533_usb_send_frame() OTOH runs from a regular sleeping context, so the pn533_submit_urb_for_response() there (and only there) can use the regular GFP_KERNEL flags. BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1514134 Fixes: 9815c7cf22da ("NFC: pn533: Separate physical layer from ...") Cc: Michael Thalmeier Signed-off-by: Hans de Goede Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/nfc/pn533/usb.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/nfc/pn533/usb.c b/drivers/nfc/pn533/usb.c index 33ed78be2750..3a897f57cac5 100644 --- a/drivers/nfc/pn533/usb.c +++ b/drivers/nfc/pn533/usb.c @@ -71,7 +71,7 @@ static void pn533_recv_response(struct urb *urb) struct sk_buff *skb = NULL; if (!urb->status) { - skb = alloc_skb(urb->actual_length, GFP_KERNEL); + skb = alloc_skb(urb->actual_length, GFP_ATOMIC); if (!skb) { nfc_err(&phy->udev->dev, "failed to alloc memory\n"); } else { @@ -180,7 +180,7 @@ static int pn533_usb_send_frame(struct pn533 *dev, if (dev->protocol_type == PN533_PROTO_REQ_RESP) { /* request for response for sent packet directly */ - rc = pn533_submit_urb_for_response(phy, GFP_ATOMIC); + rc = pn533_submit_urb_for_response(phy, GFP_KERNEL); if (rc) goto error; } else if (dev->protocol_type == PN533_PROTO_REQ_ACK_RESP) { From 241ad31f0944306c9af526cf22b86e6eebf9e45e Mon Sep 17 00:00:00 2001 From: Thomas Richter Date: Mon, 11 Jun 2018 09:31:53 +0200 Subject: [PATCH 713/970] perf test session topology: Fix test on s390 [ Upstream commit b930e62ecd362843002bdf84c2940439822af321 ] On s390 this test case fails because the socket identifiction numbers assigned to the CPU are higher than the CPU identification numbers. F/ix this by adding the platform architecture into the perf data header flag information. This helps identifiing the test platform and handles s390 specifics in process_cpu_topology(). Before: [root@p23lp27 perf]# perf test -vvvvv -F 39 39: Session topology : --- start --- templ file: /tmp/perf-test-iUv755 socket_id number is too big.You may need to upgrade the perf tool. ---- end ---- Session topology: Skip [root@p23lp27 perf]# After: [root@p23lp27 perf]# perf test -vvvvv -F 39 39: Session topology : --- start --- templ file: /tmp/perf-test-8X8VTs CPU 0, core 0, socket 6 CPU 1, core 1, socket 3 ---- end ---- Session topology: Ok [root@p23lp27 perf]# Signed-off-by: Thomas Richter Reviewed-by: Hendrik Brueckner Cc: Heiko Carstens Cc: Martin Schwidefsky Fixes: c84974ed9fb6 ("perf test: Add entry to test cpu topology") Link: http://lkml.kernel.org/r/20180611073153.15592-2-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- tools/perf/tests/topology.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/perf/tests/topology.c b/tools/perf/tests/topology.c index 98fe69ac553c..3e7cdefb0817 100644 --- a/tools/perf/tests/topology.c +++ b/tools/perf/tests/topology.c @@ -42,6 +42,7 @@ static int session_write_header(char *path) perf_header__set_feat(&session->header, HEADER_CPU_TOPOLOGY); perf_header__set_feat(&session->header, HEADER_NRCPUS); + perf_header__set_feat(&session->header, HEADER_ARCH); session->header.data_size += DATA_SIZE; From dcc0fbf12d91b05a640db8b40f7a9ffdbae53f7e Mon Sep 17 00:00:00 2001 From: Sandipan Das Date: Mon, 11 Jun 2018 16:10:49 +0530 Subject: [PATCH 714/970] perf report powerpc: Fix crash if callchain is empty [ Upstream commit 143c99f6ac6812d23254e80844d6e34be897d3e1 ] For some cases, the callchain provided by the kernel may be empty. So, the callchain ip filtering code will cause a crash if we do not check whether the struct ip_callchain pointer is NULL before accessing any members. This can be observed on a powerpc64le system running Fedora 27 as shown below. # perf record -b -e cycles:u ls Before: # perf report --branch-history perf: Segmentation fault -------- backtrace -------- perf[0x1027615c] linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x7fff856304d8] perf(arch_skip_callchain_idx+0x44)[0x10257c58] perf[0x1017f2e4] perf(thread__resolve_callchain+0x124)[0x1017ff5c] perf(sample__resolve_callchain+0xf0)[0x10172788] ... After: # perf report --branch-history Samples: 25 of event 'cycles:u', Event count (approx.): 2306870 Overhead Source:Line Symbol Shared Object + 11.60% _init+35736 [.] _init ls + 9.84% strcoll_l.c:137 [.] __strcoll_l libc-2.26.so + 9.16% memcpy.S:175 [.] __memcpy_power7 libc-2.26.so + 9.01% gconv_charset.h:54 [.] _nl_find_locale libc-2.26.so + 8.87% dl-addr.c:52 [.] _dl_addr libc-2.26.so + 8.83% _init+236 [.] _init ls ... Reported-by: Ravi Bangoria Signed-off-by: Sandipan Das Acked-by: Ravi Bangoria Cc: Jiri Olsa Cc: Naveen N. Rao Cc: Sukadev Bhattiprolu Link: http://lkml.kernel.org/r/20180611104049.11048-1-sandipan@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- tools/perf/arch/powerpc/util/skip-callchain-idx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/perf/arch/powerpc/util/skip-callchain-idx.c b/tools/perf/arch/powerpc/util/skip-callchain-idx.c index 0c370f81e002..bd630c222e65 100644 --- a/tools/perf/arch/powerpc/util/skip-callchain-idx.c +++ b/tools/perf/arch/powerpc/util/skip-callchain-idx.c @@ -243,7 +243,7 @@ int arch_skip_callchain_idx(struct thread *thread, struct ip_callchain *chain) u64 ip; u64 skip_slot = -1; - if (chain->nr < 3) + if (!chain || chain->nr < 3) return skip_slot; ip = chain->ips[2]; From f6e22734fba141b631965c38d04365e430673e0c Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Wed, 20 Jun 2018 11:40:36 +0200 Subject: [PATCH 715/970] perf bench: Fix numa report output code [ Upstream commit 983107072be1a39cbde67d45cb0059138190e015 ] Currently we can hit following assert when running numa bench: $ perf bench numa mem -p 3 -t 1 -P 512 -s 100 -zZ0cm --thp 1 perf: bench/numa.c:1577: __bench_numa: Assertion `!(!(((wait_stat) & 0x7f) == 0))' failed. The assertion is correct, because we hit the SIGFPE in following line: Thread 2.2 "thread 0/0" received signal SIGFPE, Arithmetic exception. [Switching to Thread 0x7fffd28c6700 (LWP 11750)] 0x000.. in worker_thread (__tdata=0x7.. ) at bench/numa.c:1257 1257 td->speed_gbs = bytes_done / (td->runtime_ns / NSEC_PER_SEC) / 1e9; We don't check if the runtime is actually bigger than 1 second, and thus this might end up with zero division within FPU. Adding the check to prevent this. Signed-off-by: Jiri Olsa Cc: Alexander Shishkin Cc: David Ahern Cc: Namhyung Kim Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/20180620094036.17278-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- tools/perf/bench/numa.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c index 23cce5e5197a..ee9565a033f4 100644 --- a/tools/perf/bench/numa.c +++ b/tools/perf/bench/numa.c @@ -1093,7 +1093,7 @@ static void *worker_thread(void *__tdata) u8 *global_data; u8 *process_data; u8 *thread_data; - u64 bytes_done; + u64 bytes_done, secs; long work_done; u32 l; struct rusage rusage; @@ -1249,7 +1249,8 @@ static void *worker_thread(void *__tdata) timersub(&stop, &start0, &diff); td->runtime_ns = diff.tv_sec * NSEC_PER_SEC; td->runtime_ns += diff.tv_usec * NSEC_PER_USEC; - td->speed_gbs = bytes_done / (td->runtime_ns / NSEC_PER_SEC) / 1e9; + secs = td->runtime_ns / NSEC_PER_SEC; + td->speed_gbs = secs ? bytes_done / secs / 1e9 : 0; getrusage(RUSAGE_THREAD, &rusage); td->system_time_ns = rusage.ru_stime.tv_sec * NSEC_PER_SEC; From f6a67684626cf03b317ccc4c6d5bac99822d1b06 Mon Sep 17 00:00:00 2001 From: Jann Horn Date: Wed, 20 Jun 2018 18:33:45 +0200 Subject: [PATCH 716/970] netfilter: nf_log: fix uninit read in nf_log_proc_dostring [ Upstream commit dffd22aed2aa1e804bccf19b30a421e89ee2ae61 ] When proc_dostring() is called with a non-zero offset in strict mode, it doesn't just write to the ->data buffer, it also reads. Make sure it doesn't read uninitialized data. Fixes: c6ac37d8d884 ("netfilter: nf_log: fix error on write NONE to [...]") Signed-off-by: Jann Horn Signed-off-by: Pablo Neira Ayuso Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/netfilter/nf_log.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/net/netfilter/nf_log.c b/net/netfilter/nf_log.c index e02fed784cd0..42938f9c467e 100644 --- a/net/netfilter/nf_log.c +++ b/net/netfilter/nf_log.c @@ -426,6 +426,10 @@ static int nf_log_proc_dostring(struct ctl_table *table, int write, if (write) { struct ctl_table tmp = *table; + /* proc_dostring() can append to existing strings, so we need to + * initialize it as an empty string. + */ + buf[0] = '\0'; tmp.data = buf; r = proc_dostring(&tmp, write, buffer, lenp, ppos); if (r) From 01d012fe089af02616c00cde45665fa6d895a4ff Mon Sep 17 00:00:00 2001 From: "Yan, Zheng" Date: Tue, 19 Jun 2018 18:20:34 +0800 Subject: [PATCH 717/970] ceph: fix dentry leak in splice_dentry() [ Upstream commit 8b8f53af1ed9df88a4c0fbfdf3db58f62060edf3 ] In any case, d_splice_alias() does not drop reference of original dentry. Signed-off-by: "Yan, Zheng" Reviewed-by: Jeff Layton Signed-off-by: Ilya Dryomov Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/ceph/inode.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c index 4a6df2ce0f76..1f754336f801 100644 --- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@ -1077,6 +1077,7 @@ static struct dentry *splice_dentry(struct dentry *dn, struct inode *in) if (IS_ERR(realdn)) { pr_err("splice_dentry error %ld %p inode %p ino %llx.%llx\n", PTR_ERR(realdn), dn, in, ceph_vinop(in)); + dput(dn); dn = realdn; /* note realdn contains the error */ goto out; } else if (realdn) { From bc53be37d3e833a8d6af14abb17828a3ff6d942f Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Tue, 26 Jun 2018 22:17:17 -0700 Subject: [PATCH 718/970] selftests/x86/sigreturn/64: Fix spurious failures on AMD CPUs [ Upstream commit ec348020566009d3da9b99f07c05814d13969c78 ] When I wrote the sigreturn test, I didn't realize that AMD's busted IRET behavior was different from Intel's busted IRET behavior: On AMD CPUs, the CPU leaks the high 32 bits of the kernel stack pointer to certain userspace contexts. Gee, thanks. There's very little the kernel can do about it. Modify the test so it passes. Signed-off-by: Andy Lutomirski Cc: Borislav Petkov Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/86e7fd3564497f657de30a36da4505799eebef01.1530076529.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- tools/testing/selftests/x86/sigreturn.c | 46 ++++++++++++++++--------- 1 file changed, 29 insertions(+), 17 deletions(-) diff --git a/tools/testing/selftests/x86/sigreturn.c b/tools/testing/selftests/x86/sigreturn.c index 246145b84a12..2559e2c01793 100644 --- a/tools/testing/selftests/x86/sigreturn.c +++ b/tools/testing/selftests/x86/sigreturn.c @@ -612,19 +612,38 @@ static int test_valid_sigreturn(int cs_bits, bool use_16bit_ss, int force_ss) greg_t req = requested_regs[i], res = resulting_regs[i]; if (i == REG_TRAPNO || i == REG_IP) continue; /* don't care */ - if (i == REG_SP) { - printf("\tSP: %llx -> %llx\n", (unsigned long long)req, - (unsigned long long)res); + if (i == REG_SP) { /* - * In many circumstances, the high 32 bits of rsp - * are zeroed. For example, we could be a real - * 32-bit program, or we could hit any of a number - * of poorly-documented IRET or segmented ESP - * oddities. If this happens, it's okay. + * If we were using a 16-bit stack segment, then + * the kernel is a bit stuck: IRET only restores + * the low 16 bits of ESP/RSP if SS is 16-bit. + * The kernel uses a hack to restore bits 31:16, + * but that hack doesn't help with bits 63:32. + * On Intel CPUs, bits 63:32 end up zeroed, and, on + * AMD CPUs, they leak the high bits of the kernel + * espfix64 stack pointer. There's very little that + * the kernel can do about it. + * + * Similarly, if we are returning to a 32-bit context, + * the CPU will often lose the high 32 bits of RSP. */ - if (res == (req & 0xFFFFFFFF)) - continue; /* OK; not expected to work */ + + if (res == req) + continue; + + if (cs_bits != 64 && ((res ^ req) & 0xFFFFFFFF) == 0) { + printf("[NOTE]\tSP: %llx -> %llx\n", + (unsigned long long)req, + (unsigned long long)res); + continue; + } + + printf("[FAIL]\tSP mismatch: requested 0x%llx; got 0x%llx\n", + (unsigned long long)requested_regs[i], + (unsigned long long)resulting_regs[i]); + nerrs++; + continue; } bool ignore_reg = false; @@ -663,13 +682,6 @@ static int test_valid_sigreturn(int cs_bits, bool use_16bit_ss, int force_ss) } if (requested_regs[i] != resulting_regs[i] && !ignore_reg) { - /* - * SP is particularly interesting here. The - * usual cause of failures is that we hit the - * nasty IRET case of returning to a 16-bit SS, - * in which case bits 16:31 of the *kernel* - * stack pointer persist in ESP. - */ printf("[FAIL]\tReg %d mismatch: requested 0x%llx; got 0x%llx\n", i, (unsigned long long)requested_regs[i], (unsigned long long)resulting_regs[i]); From 7aa92621b27740b2542d48023d82b7935a1751d7 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Tue, 26 Jun 2018 22:17:18 -0700 Subject: [PATCH 719/970] selftests/x86/sigreturn: Do minor cleanups [ Upstream commit e8a445dea219c32727016af14f847d2e8f7ebec8 ] We have short names for the requested and resulting register values. Use them instead of spelling out the whole register entry for each case. Signed-off-by: Andy Lutomirski Cc: Borislav Petkov Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/bb3bc1f923a2f6fe7912d22a1068fe29d6033d38.1530076529.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- tools/testing/selftests/x86/sigreturn.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/x86/sigreturn.c b/tools/testing/selftests/x86/sigreturn.c index 2559e2c01793..4d9dc3f2fd70 100644 --- a/tools/testing/selftests/x86/sigreturn.c +++ b/tools/testing/selftests/x86/sigreturn.c @@ -610,6 +610,7 @@ static int test_valid_sigreturn(int cs_bits, bool use_16bit_ss, int force_ss) */ for (int i = 0; i < NGREG; i++) { greg_t req = requested_regs[i], res = resulting_regs[i]; + if (i == REG_TRAPNO || i == REG_IP) continue; /* don't care */ @@ -673,18 +674,18 @@ static int test_valid_sigreturn(int cs_bits, bool use_16bit_ss, int force_ss) #endif /* Sanity check on the kernel */ - if (i == REG_CX && requested_regs[i] != resulting_regs[i]) { + if (i == REG_CX && req != res) { printf("[FAIL]\tCX (saved SP) mismatch: requested 0x%llx; got 0x%llx\n", - (unsigned long long)requested_regs[i], - (unsigned long long)resulting_regs[i]); + (unsigned long long)req, + (unsigned long long)res); nerrs++; continue; } - if (requested_regs[i] != resulting_regs[i] && !ignore_reg) { + if (req != res && !ignore_reg) { printf("[FAIL]\tReg %d mismatch: requested 0x%llx; got 0x%llx\n", - i, (unsigned long long)requested_regs[i], - (unsigned long long)resulting_regs[i]); + i, (unsigned long long)req, + (unsigned long long)res); nerrs++; } } From 680fc01a4a1eb193b97216b903ff348465b0a149 Mon Sep 17 00:00:00 2001 From: Keerthy Date: Tue, 5 Jun 2018 15:37:51 +0530 Subject: [PATCH 720/970] ARM: dts: da850: Fix interrups property for gpio [ Upstream commit 3eb1b955cd7ed1e621ace856710006c2a8a7f231 ] The intc #interrupt-cells is equal to 1. Currently gpio node has 2 cells per IRQ which is wrong. Remove the additional cell for each of the interrupts. Signed-off-by: Keerthy Fixes: 2e38b946dc54 ("ARM: davinci: da850: add GPIO DT node") Signed-off-by: Sekhar Nori Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/arm/boot/dts/da850.dtsi | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/arch/arm/boot/dts/da850.dtsi b/arch/arm/boot/dts/da850.dtsi index f79e1b91c680..51ab92fa7a6a 100644 --- a/arch/arm/boot/dts/da850.dtsi +++ b/arch/arm/boot/dts/da850.dtsi @@ -377,11 +377,7 @@ gpio-controller; #gpio-cells = <2>; reg = <0x226000 0x1000>; - interrupts = <42 IRQ_TYPE_EDGE_BOTH - 43 IRQ_TYPE_EDGE_BOTH 44 IRQ_TYPE_EDGE_BOTH - 45 IRQ_TYPE_EDGE_BOTH 46 IRQ_TYPE_EDGE_BOTH - 47 IRQ_TYPE_EDGE_BOTH 48 IRQ_TYPE_EDGE_BOTH - 49 IRQ_TYPE_EDGE_BOTH 50 IRQ_TYPE_EDGE_BOTH>; + interrupts = <42 43 44 45 46 47 48 49 50>; ti,ngpio = <144>; ti,davinci-gpio-unbanked = <0>; status = "disabled"; From 8c5fd3d56f87e0dd401ba49c6061e15ac0c0860b Mon Sep 17 00:00:00 2001 From: Marek Szyprowski Date: Tue, 19 Jun 2018 15:20:50 +0200 Subject: [PATCH 721/970] dmaengine: pl330: report BURST residue granularity [ Upstream commit e3f329c600033f011a978a8bc4ddb1e2e94c4f4d ] The reported residue is already calculated in BURST unit granularity, so advertise this capability properly to other devices in the system. Fixes: aee4d1fac887 ("dmaengine: pl330: improve pl330_tx_status() function") Signed-off-by: Marek Szyprowski Signed-off-by: Vinod Koul Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/dma/pl330.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c index 2c449bdacb91..b9dad4e40bb8 100644 --- a/drivers/dma/pl330.c +++ b/drivers/dma/pl330.c @@ -2951,7 +2951,7 @@ pl330_probe(struct amba_device *adev, const struct amba_id *id) pd->src_addr_widths = PL330_DMA_BUSWIDTHS; pd->dst_addr_widths = PL330_DMA_BUSWIDTHS; pd->directions = BIT(DMA_DEV_TO_MEM) | BIT(DMA_MEM_TO_DEV); - pd->residue_granularity = DMA_RESIDUE_GRANULARITY_SEGMENT; + pd->residue_granularity = DMA_RESIDUE_GRANULARITY_BURST; pd->max_burst = ((pl330->quirks & PL330_QUIRK_BROKEN_NO_FLUSHP) ? 1 : PL330_MAX_BURST); From 87aea47373e6064140506316052f22f52c9e08c0 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Fri, 22 Jun 2018 14:15:47 +0300 Subject: [PATCH 722/970] dmaengine: k3dma: Off by one in k3_of_dma_simple_xlate() [ Upstream commit c4c2b7644cc9a41f17a8cc8904efe3f66ae4c7ed ] The d->chans[] array has d->dma_requests elements so the > should be >= here. Fixes: 8e6152bc660e ("dmaengine: Add hisilicon k3 DMA engine driver") Signed-off-by: Dan Carpenter Signed-off-by: Vinod Koul Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/dma/k3dma.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/dma/k3dma.c b/drivers/dma/k3dma.c index aabcb7934b05..cd7f67b2545c 100644 --- a/drivers/dma/k3dma.c +++ b/drivers/dma/k3dma.c @@ -792,7 +792,7 @@ static struct dma_chan *k3_of_dma_simple_xlate(struct of_phandle_args *dma_spec, struct k3_dma_dev *d = ofdma->of_dma_data; unsigned int request = dma_spec->args[0]; - if (request > d->dma_requests) + if (request >= d->dma_requests) return NULL; return dma_get_slave_channel(&(d->chans[request].vc.chan)); From e303840d5030de3ef4e751985b5458c6f095a539 Mon Sep 17 00:00:00 2001 From: BingJing Chang Date: Thu, 28 Jun 2018 18:40:11 +0800 Subject: [PATCH 723/970] md/raid10: fix that replacement cannot complete recovery after reassemble [ Upstream commit bda3153998f3eb2cafa4a6311971143628eacdbc ] During assemble, the spare marked for replacement is not checked. conf->fullsync cannot be updated to be 1. As a result, recovery will treat it as a clean array. All recovering sectors are skipped. Original device is replaced with the not-recovered spare. mdadm -C /dev/md0 -l10 -n4 -pn2 /dev/loop[0123] mdadm /dev/md0 -a /dev/loop4 mdadm /dev/md0 --replace /dev/loop0 mdadm -S /dev/md0 # stop array during recovery mdadm -A /dev/md0 /dev/loop[01234] After reassemble, you can see recovery go on, but it completes immediately. In fact, recovery is not actually processed. To solve this problem, we just add the missing logics for replacment spares. (In raid1.c or raid5.c, they have already been checked.) Reported-by: Alex Chen Reviewed-by: Alex Wu Reviewed-by: Chung-Chiang Cheng Signed-off-by: BingJing Chang Signed-off-by: Shaohua Li Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/md/raid10.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index b138b5cba286..6da66c3acd46 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -3734,6 +3734,13 @@ static int raid10_run(struct mddev *mddev) disk->rdev->saved_raid_disk < 0) conf->fullsync = 1; } + + if (disk->replacement && + !test_bit(In_sync, &disk->replacement->flags) && + disk->replacement->saved_raid_disk < 0) { + conf->fullsync = 1; + } + disk->recovery_disabled = mddev->recovery_disabled - 1; } From 15a7879dbcc7305df58ef91ea187601a1a36bf14 Mon Sep 17 00:00:00 2001 From: Bob Copeland Date: Sun, 24 Jun 2018 21:10:49 -0400 Subject: [PATCH 724/970] nl80211: relax ht operation checks for mesh [ Upstream commit 188f60ab8e787fcbb5ac9d64ede23a0070231f09 ] Commit 9757235f451c, "nl80211: correct checks for NL80211_MESHCONF_HT_OPMODE value") relaxed the range for the HT operation field in meshconf, while also adding checks requiring the non-greenfield and non-ht-sta bits to be set in certain circumstances. The latter bit is actually reserved for mesh BSSes according to Table 9-168 in 802.11-2016, so in fact it should not be set. wpa_supplicant sets these bits because the mesh and AP code share the same implementation, but authsae does not. As a result, some meshconf updates from authsae which set only the NONHT_MIXED protection bits were being rejected. In order to avoid breaking userspace by changing the rules again, simply accept the values with or without the bits set, and mask off the reserved bit to match the spec. While in here, update the 802.11-2012 reference to 802.11-2016. Fixes: 9757235f451c ("nl80211: correct checks for NL80211_MESHCONF_HT_OPMODE value") Cc: Masashi Honma Signed-off-by: Bob Copeland Reviewed-by: Masashi Honma Reviewed-by: Masashi Honma Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/wireless/nl80211.c | 19 +++---------------- 1 file changed, 3 insertions(+), 16 deletions(-) diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index 36280e114959..5b75468b5acd 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -5847,7 +5847,7 @@ do { \ nl80211_check_s32); /* * Check HT operation mode based on - * IEEE 802.11 2012 8.4.2.59 HT Operation element. + * IEEE 802.11-2016 9.4.2.57 HT Operation element. */ if (tb[NL80211_MESHCONF_HT_OPMODE]) { ht_opmode = nla_get_u16(tb[NL80211_MESHCONF_HT_OPMODE]); @@ -5857,22 +5857,9 @@ do { \ IEEE80211_HT_OP_MODE_NON_HT_STA_PRSNT)) return -EINVAL; - if ((ht_opmode & IEEE80211_HT_OP_MODE_NON_GF_STA_PRSNT) && - (ht_opmode & IEEE80211_HT_OP_MODE_NON_HT_STA_PRSNT)) - return -EINVAL; + /* NON_HT_STA bit is reserved, but some programs set it */ + ht_opmode &= ~IEEE80211_HT_OP_MODE_NON_HT_STA_PRSNT; - switch (ht_opmode & IEEE80211_HT_OP_MODE_PROTECTION) { - case IEEE80211_HT_OP_MODE_PROTECTION_NONE: - case IEEE80211_HT_OP_MODE_PROTECTION_20MHZ: - if (ht_opmode & IEEE80211_HT_OP_MODE_NON_HT_STA_PRSNT) - return -EINVAL; - break; - case IEEE80211_HT_OP_MODE_PROTECTION_NONMEMBER: - case IEEE80211_HT_OP_MODE_PROTECTION_NONHT_MIXED: - if (!(ht_opmode & IEEE80211_HT_OP_MODE_NON_HT_STA_PRSNT)) - return -EINVAL; - break; - } cfg->ht_opmode = ht_opmode; mask |= (1 << (NL80211_MESHCONF_HT_OPMODE - 1)); } From f17bac067e01198717b9d4f53ffb8d240d845408 Mon Sep 17 00:00:00 2001 From: Marek Szyprowski Date: Thu, 7 Jun 2018 13:06:13 +0200 Subject: [PATCH 725/970] drm/exynos: gsc: Fix support for NV16/61, YUV420/YVU420 and YUV422 modes [ Upstream commit dd209ef809080ced903e7747ee3ef640c923a1d2 ] Fix following issues related to planar YUV pixel format configuration: - NV16/61 modes were incorrectly programmed as NV12/21, - YVU420 was programmed as YUV420 on source, - YVU420 and YUV422 were programmed as YUV420 on output. Signed-off-by: Marek Szyprowski Signed-off-by: Inki Dae Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/exynos/exynos_drm_gsc.c | 29 +++++++++++++++++-------- drivers/gpu/drm/exynos/regs-gsc.h | 1 + 2 files changed, 21 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/exynos/exynos_drm_gsc.c b/drivers/gpu/drm/exynos/exynos_drm_gsc.c index 52a9d269484e..4c81c79b15ea 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_gsc.c +++ b/drivers/gpu/drm/exynos/exynos_drm_gsc.c @@ -532,21 +532,25 @@ static int gsc_src_set_fmt(struct device *dev, u32 fmt) GSC_IN_CHROMA_ORDER_CRCB); break; case DRM_FORMAT_NV21: + cfg |= (GSC_IN_CHROMA_ORDER_CRCB | GSC_IN_YUV420_2P); + break; case DRM_FORMAT_NV61: - cfg |= (GSC_IN_CHROMA_ORDER_CRCB | - GSC_IN_YUV420_2P); + cfg |= (GSC_IN_CHROMA_ORDER_CRCB | GSC_IN_YUV422_2P); break; case DRM_FORMAT_YUV422: cfg |= GSC_IN_YUV422_3P; break; case DRM_FORMAT_YUV420: + cfg |= (GSC_IN_CHROMA_ORDER_CBCR | GSC_IN_YUV420_3P); + break; case DRM_FORMAT_YVU420: - cfg |= GSC_IN_YUV420_3P; + cfg |= (GSC_IN_CHROMA_ORDER_CRCB | GSC_IN_YUV420_3P); break; case DRM_FORMAT_NV12: + cfg |= (GSC_IN_CHROMA_ORDER_CBCR | GSC_IN_YUV420_2P); + break; case DRM_FORMAT_NV16: - cfg |= (GSC_IN_CHROMA_ORDER_CBCR | - GSC_IN_YUV420_2P); + cfg |= (GSC_IN_CHROMA_ORDER_CBCR | GSC_IN_YUV422_2P); break; default: dev_err(ippdrv->dev, "invalid target yuv order 0x%x.\n", fmt); @@ -806,18 +810,25 @@ static int gsc_dst_set_fmt(struct device *dev, u32 fmt) GSC_OUT_CHROMA_ORDER_CRCB); break; case DRM_FORMAT_NV21: - case DRM_FORMAT_NV61: cfg |= (GSC_OUT_CHROMA_ORDER_CRCB | GSC_OUT_YUV420_2P); break; + case DRM_FORMAT_NV61: + cfg |= (GSC_OUT_CHROMA_ORDER_CRCB | GSC_OUT_YUV422_2P); + break; case DRM_FORMAT_YUV422: + cfg |= GSC_OUT_YUV422_3P; + break; case DRM_FORMAT_YUV420: + cfg |= (GSC_OUT_CHROMA_ORDER_CBCR | GSC_OUT_YUV420_3P); + break; case DRM_FORMAT_YVU420: - cfg |= GSC_OUT_YUV420_3P; + cfg |= (GSC_OUT_CHROMA_ORDER_CRCB | GSC_OUT_YUV420_3P); break; case DRM_FORMAT_NV12: + cfg |= (GSC_OUT_CHROMA_ORDER_CBCR | GSC_OUT_YUV420_2P); + break; case DRM_FORMAT_NV16: - cfg |= (GSC_OUT_CHROMA_ORDER_CBCR | - GSC_OUT_YUV420_2P); + cfg |= (GSC_OUT_CHROMA_ORDER_CBCR | GSC_OUT_YUV422_2P); break; default: dev_err(ippdrv->dev, "invalid target yuv order 0x%x.\n", fmt); diff --git a/drivers/gpu/drm/exynos/regs-gsc.h b/drivers/gpu/drm/exynos/regs-gsc.h index 4704a993cbb7..16b39734115c 100644 --- a/drivers/gpu/drm/exynos/regs-gsc.h +++ b/drivers/gpu/drm/exynos/regs-gsc.h @@ -138,6 +138,7 @@ #define GSC_OUT_YUV420_3P (3 << 4) #define GSC_OUT_YUV422_1P (4 << 4) #define GSC_OUT_YUV422_2P (5 << 4) +#define GSC_OUT_YUV422_3P (6 << 4) #define GSC_OUT_YUV444 (7 << 4) #define GSC_OUT_TILE_TYPE_MASK (1 << 2) #define GSC_OUT_TILE_C_16x8 (0 << 2) From 29d33ed428503c7a7c39bedf692adafb331abfe7 Mon Sep 17 00:00:00 2001 From: Marek Szyprowski Date: Thu, 7 Jun 2018 13:07:40 +0200 Subject: [PATCH 726/970] drm/exynos: decon5433: Fix per-plane global alpha for XRGB modes [ Upstream commit ab337fc274a1957ff0771f19e826c736253f7c39 ] Set per-plane global alpha to maximum value to get proper blending of XRGB and ARGB planes. This fixes the strange order of overlapping planes. Signed-off-by: Marek Szyprowski Signed-off-by: Inki Dae Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/exynos/exynos5433_drm_decon.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/exynos/exynos5433_drm_decon.c b/drivers/gpu/drm/exynos/exynos5433_drm_decon.c index 6dd09c306bc1..ab6878b402e2 100644 --- a/drivers/gpu/drm/exynos/exynos5433_drm_decon.c +++ b/drivers/gpu/drm/exynos/exynos5433_drm_decon.c @@ -291,8 +291,8 @@ static void decon_update_plane(struct exynos_drm_crtc *crtc, COORDINATE_Y(state->crtc.y + state->crtc.h - 1); writel(val, ctx->addr + DECON_VIDOSDxB(win)); - val = VIDOSD_Wx_ALPHA_R_F(0x0) | VIDOSD_Wx_ALPHA_G_F(0x0) | - VIDOSD_Wx_ALPHA_B_F(0x0); + val = VIDOSD_Wx_ALPHA_R_F(0xff) | VIDOSD_Wx_ALPHA_G_F(0xff) | + VIDOSD_Wx_ALPHA_B_F(0xff); writel(val, ctx->addr + DECON_VIDOSDxC(win)); val = VIDOSD_Wx_ALPHA_R_F(0x0) | VIDOSD_Wx_ALPHA_G_F(0x0) | From 1db00def7103c388efdaaba96e16b0b07e7cf9c1 Mon Sep 17 00:00:00 2001 From: Marek Szyprowski Date: Thu, 7 Jun 2018 13:07:49 +0200 Subject: [PATCH 727/970] drm/exynos: decon5433: Fix WINCONx reset value [ Upstream commit 7b7aa62c05eac9789c208b946f515983a9255d8d ] The only bits that should be preserved in decon_win_set_fmt() is WINCONx_ENWIN_F. All other bits depends on the selected pixel formats and are set by the mentioned function. Signed-off-by: Marek Szyprowski Signed-off-by: Inki Dae Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/exynos/exynos5433_drm_decon.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/exynos/exynos5433_drm_decon.c b/drivers/gpu/drm/exynos/exynos5433_drm_decon.c index ab6878b402e2..bdcc6ec3c67b 100644 --- a/drivers/gpu/drm/exynos/exynos5433_drm_decon.c +++ b/drivers/gpu/drm/exynos/exynos5433_drm_decon.c @@ -199,7 +199,7 @@ static void decon_win_set_pixfmt(struct decon_context *ctx, unsigned int win, unsigned long val; val = readl(ctx->addr + DECON_WINCONx(win)); - val &= ~WINCONx_BPPMODE_MASK; + val &= WINCONx_ENWIN_F; switch (fb->pixel_format) { case DRM_FORMAT_XRGB1555: From 36b0779e9e289aeaf7efc7db977b1e4e5b95aaeb Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Thu, 28 Jun 2018 23:34:58 +0200 Subject: [PATCH 728/970] bpf, s390: fix potential memleak when later bpf_jit_prog fails [ Upstream commit f605ce5eb26ac934fb8106d75d46a2c875a2bf23 ] If we would ever fail in the bpf_jit_prog() pass that writes the actual insns to the image after we got header via bpf_jit_binary_alloc() then we also need to make sure to free it through bpf_jit_binary_free() again when bailing out. Given we had prior bpf_jit_prog() passes to initially probe for clobbered registers, program size and to fill in addrs arrray for jump targets, this is more of a theoretical one, but at least make sure this doesn't break with future changes. Fixes: 054623105728 ("s390/bpf: Add s390x eBPF JIT compiler backend") Signed-off-by: Daniel Borkmann Cc: Martin Schwidefsky Acked-by: Alexei Starovoitov Signed-off-by: Alexei Starovoitov Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/s390/net/bpf_jit_comp.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c index e7ce2577f0c9..949a871e9506 100644 --- a/arch/s390/net/bpf_jit_comp.c +++ b/arch/s390/net/bpf_jit_comp.c @@ -1386,6 +1386,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp) goto free_addrs; } if (bpf_jit_prog(&jit, fp)) { + bpf_jit_binary_free(header); fp = orig_fp; goto free_addrs; } From b61fc97ca5d5898c4621ef7785d7df53be5b10df Mon Sep 17 00:00:00 2001 From: Nicholas Mc Guire Date: Fri, 29 Jun 2018 13:49:54 -0500 Subject: [PATCH 729/970] PCI: xilinx: Add missing of_node_put() [ Upstream commit 8c3f9bd851a4d3acf0a0f222d4e9e41c0cd1ea8e ] The call to of_get_next_child() returns a node pointer with refcount incremented thus it must be explicitly decremented here after the last usage. Fixes: 8961def56845 ("PCI: xilinx: Add Xilinx AXI PCIe Host Bridge IP driver") Signed-off-by: Nicholas Mc Guire [lorenzo.pieralisi@arm.com: reworked commit log] Signed-off-by: Lorenzo Pieralisi Signed-off-by: Bjorn Helgaas Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/pci/host/pcie-xilinx.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/pci/host/pcie-xilinx.c b/drivers/pci/host/pcie-xilinx.c index c8616fadccf1..61332f4d51c3 100644 --- a/drivers/pci/host/pcie-xilinx.c +++ b/drivers/pci/host/pcie-xilinx.c @@ -527,6 +527,7 @@ static int xilinx_pcie_init_irq_domain(struct xilinx_pcie_port *port) port->leg_domain = irq_domain_add_linear(pcie_intc_node, 4, &intx_domain_ops, port); + of_node_put(pcie_intc_node); if (!port->leg_domain) { dev_err(dev, "Failed to get a INTx IRQ domain\n"); return -ENODEV; From 3b8eeaed767a2dbb7577a10c44e05612a32711c6 Mon Sep 17 00:00:00 2001 From: Nicholas Mc Guire Date: Fri, 29 Jun 2018 13:50:10 -0500 Subject: [PATCH 730/970] PCI: xilinx-nwl: Add missing of_node_put() [ Upstream commit 342639d996f18bc0a4db2f42a84230c0a966dc94 ] The call to of_get_next_child() returns a node pointer with refcount incremented thus it must be explicitly decremented here after the last usage. Fixes: ab597d35ef11 ("PCI: xilinx-nwl: Add support for Xilinx NWL PCIe Host Controller") Signed-off-by: Nicholas Mc Guire [lorenzo.pieralisi@arm.com: updated commit log] Signed-off-by: Lorenzo Pieralisi Signed-off-by: Bjorn Helgaas Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/pci/host/pcie-xilinx-nwl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/pci/host/pcie-xilinx-nwl.c b/drivers/pci/host/pcie-xilinx-nwl.c index 43eaa4afab94..94fdd295aae2 100644 --- a/drivers/pci/host/pcie-xilinx-nwl.c +++ b/drivers/pci/host/pcie-xilinx-nwl.c @@ -532,7 +532,7 @@ static int nwl_pcie_init_irq_domain(struct nwl_pcie *pcie) INTX_NUM, &legacy_domain_ops, pcie); - + of_node_put(legacy_intc_node); if (!pcie->legacy_irq_domain) { dev_err(dev, "failed to create IRQ domain\n"); return -ENOMEM; From 21fe14fa3f42e27cd771acf108c4dd05f13a718c Mon Sep 17 00:00:00 2001 From: Sudarsana Reddy Kalluru Date: Thu, 28 Jun 2018 04:52:15 -0700 Subject: [PATCH 731/970] bnx2x: Fix receiving tx-timeout in error or recovery state. [ Upstream commit 484c016d9392786ce5c74017c206c706f29f823d ] Driver performs the internal reload when it receives tx-timeout event from the OS. Internal reload might fail in some scenarios e.g., fatal HW issues. In such cases OS still see the link, which would result in undesirable functionalities such as re-generation of tx-timeouts. The patch addresses this issue by indicating the link-down to OS when tx-timeout is detected, and keeping the link in down state till the internal reload is successful. Please consider applying it to 'net' branch. Signed-off-by: Sudarsana Reddy Kalluru Signed-off-by: Ariel Elior Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/broadcom/bnx2x/bnx2x.h | 1 + drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 6 ++++++ drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 6 ++++++ 3 files changed, 13 insertions(+) diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h index 7dd7490fdac1..f3f2d66432ab 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h @@ -1529,6 +1529,7 @@ struct bnx2x { struct link_vars link_vars; u32 link_cnt; struct bnx2x_link_report_data last_reported_link; + bool force_link_down; struct mdio_if_info mdio; diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c index 31287cec6e3a..2cd1dcd77559 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c @@ -1265,6 +1265,11 @@ void __bnx2x_link_report(struct bnx2x *bp) { struct bnx2x_link_report_data cur_data; + if (bp->force_link_down) { + bp->link_vars.link_up = 0; + return; + } + /* reread mf_cfg */ if (IS_PF(bp) && !CHIP_IS_E1(bp)) bnx2x_read_mf_cfg(bp); @@ -2822,6 +2827,7 @@ int bnx2x_nic_load(struct bnx2x *bp, int load_mode) bp->pending_max = 0; } + bp->force_link_down = false; if (bp->port.pmf) { rc = bnx2x_initial_phy_init(bp, load_mode); if (rc) diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c index 554c4086b3c6..54dab4eac804 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c @@ -10279,6 +10279,12 @@ static void bnx2x_sp_rtnl_task(struct work_struct *work) bp->sp_rtnl_state = 0; smp_mb(); + /* Immediately indicate link as down */ + bp->link_vars.link_up = 0; + bp->force_link_down = true; + netif_carrier_off(bp->dev); + BNX2X_ERR("Indicating link is down due to Tx-timeout\n"); + bnx2x_nic_unload(bp, UNLOAD_NORMAL, true); bnx2x_nic_load(bp, LOAD_NORMAL); From 23458d7f976c4fedfa473cbb0d7620be4847f783 Mon Sep 17 00:00:00 2001 From: Dave Jiang Date: Thu, 28 Jun 2018 09:56:55 -0700 Subject: [PATCH 732/970] acpi/nfit: fix cmd_rc for acpi_nfit_ctl to always return a value [ Upstream commit c1985cefd844e26bd19673a6df8d8f0b1918c2db ] cmd_rc is passed in by reference to the acpi_nfit_ctl() function and the caller expects a value returned. However, when the package is pass through via the ND_CMD_CALL command, cmd_rc is not touched. Make sure cmd_rc is always set. Fixes: aef253382266 ("libnvdimm, nfit: centralize command status translation") Signed-off-by: Dave Jiang Signed-off-by: Dan Williams Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/acpi/nfit/core.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c index 3874eec972cd..fb73b4a9cf34 100644 --- a/drivers/acpi/nfit/core.c +++ b/drivers/acpi/nfit/core.c @@ -201,6 +201,7 @@ int acpi_nfit_ctl(struct nvdimm_bus_descriptor *nd_desc, struct nvdimm *nvdimm, const u8 *uuid; int rc, i; + *cmd_rc = -EINVAL; func = cmd; if (cmd == ND_CMD_CALL) { call_pkg = buf; @@ -288,6 +289,7 @@ int acpi_nfit_ctl(struct nvdimm_bus_descriptor *nd_desc, struct nvdimm *nvdimm, * If we return an error (like elsewhere) then caller wouldn't * be able to rely upon data returned to make calculation. */ + *cmd_rc = 0; return 0; } From b9ce3ceff7fadf319cefba6d9e8635c2f2b87176 Mon Sep 17 00:00:00 2001 From: Greg Ungerer Date: Mon, 18 Jun 2018 15:34:14 +1000 Subject: [PATCH 733/970] m68k: fix "bad page state" oops on ColdFire boot MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit ecd60532e060e45c63c57ecf1c8549b1d656d34d ] Booting a ColdFire m68k core with MMU enabled causes a "bad page state" oops since commit 1d40a5ea01d5 ("mm: mark pages in use for page tables"): BUG: Bad page state in process sh pfn:01ce2 page:004fefc8 count:0 mapcount:-1024 mapping:00000000 index:0x0 flags: 0x0() raw: 00000000 00000000 00000000 fffffbff 00000000 00000100 00000200 00000000 raw: 039c4000 page dumped because: nonzero mapcount Modules linked in: CPU: 0 PID: 22 Comm: sh Not tainted 4.17.0-07461-g1d40a5ea01d5 #13 Fix by calling pgtable_page_dtor() in our __pte_free_tlb() code path, so that the PG_table flag is cleared before we free the pte page. Note that I had to change the type of pte_free() to be static from extern. Otherwise you get a lot of warnings like this: ./arch/m68k/include/asm/mcf_pgalloc.h:80:2: warning: ‘pgtable_page_dtor’ is static but used in inline function ‘pte_free’ which is not static pgtable_page_dtor(page); ^ And making it static is consistent with our use of this in the other m68k pgalloc definitions of pte_free(). Signed-off-by: Greg Ungerer CC: Matthew Wilcox Reviewed-by: Geert Uytterhoeven Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/m68k/include/asm/mcf_pgalloc.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/m68k/include/asm/mcf_pgalloc.h b/arch/m68k/include/asm/mcf_pgalloc.h index fb95aed5f428..dac7564a48da 100644 --- a/arch/m68k/include/asm/mcf_pgalloc.h +++ b/arch/m68k/include/asm/mcf_pgalloc.h @@ -43,6 +43,7 @@ extern inline pmd_t *pmd_alloc_kernel(pgd_t *pgd, unsigned long address) static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t page, unsigned long address) { + pgtable_page_dtor(page); __free_page(page); } @@ -73,8 +74,9 @@ static inline struct page *pte_alloc_one(struct mm_struct *mm, return page; } -extern inline void pte_free(struct mm_struct *mm, struct page *page) +static inline void pte_free(struct mm_struct *mm, struct page *page) { + pgtable_page_dtor(page); __free_page(page); } From ce94ead62008756c64537f01f6da546c98949081 Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Wed, 27 Jun 2018 17:03:45 -0500 Subject: [PATCH 734/970] objtool: Support GCC 8 '-fnoreorder-functions' [ Upstream commit 08b393d01c88aff27347ed2b1b354eb4db2f1532 ] Since the following commit: cd77849a69cf ("objtool: Fix GCC 8 cold subfunction detection for aliased functions") ... if the kernel is built with EXTRA_CFLAGS='-fno-reorder-functions', objtool can get stuck in an infinite loop. That flag causes the new GCC 8 cold subfunctions to be placed in .text instead of .text.unlikely. But it also has an unfortunate quirk: in the symbol table, the subfunction (e.g., nmi_panic.cold.7) is nested inside the parent (nmi_panic). That function overlap confuses objtool, and causes it to get into an infinite loop in next_insn_same_func(). Here's Allan's description of the loop: "Objtool iterates through the instructions in nmi_panic using next_insn_same_func. Once it reaches the end of nmi_panic at 0x534 it jumps to 0x528 as that's the start of nmi_panic.cold.7. However, since the instructions starting at 0x528 are still associated with nmi_panic objtool will get stuck in a loop, continually jumping back to 0x528 after reaching 0x534." Fix it by shortening the length of the parent function so that the functions no longer overlap. Reported-and-analyzed-by: Allan Xavier Signed-off-by: Josh Poimboeuf Cc: Allan Xavier Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/9e704c52bee651129b036be14feda317ae5606ae.1530136978.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- tools/objtool/elf.c | 37 ++++++++++++++++++++++++++----------- 1 file changed, 26 insertions(+), 11 deletions(-) diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c index 4e60e105583e..0d1acb704f64 100644 --- a/tools/objtool/elf.c +++ b/tools/objtool/elf.c @@ -302,19 +302,34 @@ static int read_symbols(struct elf *elf) continue; sym->pfunc = sym->cfunc = sym; coldstr = strstr(sym->name, ".cold."); - if (coldstr) { - coldstr[0] = '\0'; - pfunc = find_symbol_by_name(elf, sym->name); - coldstr[0] = '.'; + if (!coldstr) + continue; - if (!pfunc) { - WARN("%s(): can't find parent function", - sym->name); - goto err; - } + coldstr[0] = '\0'; + pfunc = find_symbol_by_name(elf, sym->name); + coldstr[0] = '.'; - sym->pfunc = pfunc; - pfunc->cfunc = sym; + if (!pfunc) { + WARN("%s(): can't find parent function", + sym->name); + goto err; + } + + sym->pfunc = pfunc; + pfunc->cfunc = sym; + + /* + * Unfortunately, -fnoreorder-functions puts the child + * inside the parent. Remove the overlap so we can + * have sane assumptions. + * + * Note that pfunc->len now no longer matches + * pfunc->sym.st_size. + */ + if (sym->sec == pfunc->sec && + sym->offset >= pfunc->offset && + sym->offset + sym->len == pfunc->offset + pfunc->len) { + pfunc->len -= sym->len; } } } From 377c72c803e802926a947567ac30053697c413d7 Mon Sep 17 00:00:00 2001 From: Hangbin Liu Date: Sun, 1 Jul 2018 16:21:21 +0800 Subject: [PATCH 735/970] ipvlan: call dev_change_flags when ipvlan mode is reset [ Upstream commit 5dc2d3996a8b221c20dd0900bdad45031a572530 ] After we change the ipvlan mode from l3 to l2, or vice versa, we only reset IFF_NOARP flag, but don't flush the ARP table cache, which will cause eth->h_dest to be equal to eth->h_source in ipvlan_xmit_mode_l2(). Then the message will not come out of host. Here is the reproducer on local host: ip link set eth1 up ip addr add 192.168.1.1/24 dev eth1 ip link add link eth1 ipvlan1 type ipvlan mode l3 ip netns add net1 ip link set ipvlan1 netns net1 ip netns exec net1 ip link set ipvlan1 up ip netns exec net1 ip addr add 192.168.2.1/24 dev ipvlan1 ip route add 192.168.2.0/24 via 192.168.1.2 ping 192.168.2.2 -c 2 ip netns exec net1 ip link set ipvlan1 type ipvlan mode l2 ping 192.168.2.2 -c 2 Add the same configuration on remote host. After we set the mode to l2, we could find that the src/dst MAC addresses are the same on eth1: 21:26:06.648565 00:b7:13:ad:d3:05 > 00:b7:13:ad:d3:05, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 58356, offset 0, flags [DF], proto ICMP (1), length 84) 192.168.2.1 > 192.168.2.2: ICMP echo request, id 22686, seq 1, length 64 Fix this by calling dev_change_flags(), which will call netdevice notifier with flag change info. v2: a) As pointed out by Wang Cong, check return value for dev_change_flags() when change dev flags. b) As suggested by Stefano and Sabrina, move flags setting before l3mdev_ops. So we don't need to redo ipvlan_{, un}register_nf_hook() again in err path. Reported-by: Jianlin Shi Reviewed-by: Stefano Brivio Reviewed-by: Sabrina Dubroca Fixes: 2ad7bf3638411 ("ipvlan: Initial check-in of the IPVLAN driver.") Signed-off-by: Hangbin Liu Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ipvlan/ipvlan_main.c | 36 +++++++++++++++++++++++++------- 1 file changed, 28 insertions(+), 8 deletions(-) diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c index 24eb5755604f..b299277361b7 100644 --- a/drivers/net/ipvlan/ipvlan_main.c +++ b/drivers/net/ipvlan/ipvlan_main.c @@ -63,10 +63,23 @@ static int ipvlan_set_port_mode(struct ipvl_port *port, u16 nval) { struct ipvl_dev *ipvlan; struct net_device *mdev = port->dev; - int err = 0; + unsigned int flags; + int err; ASSERT_RTNL(); if (port->mode != nval) { + list_for_each_entry(ipvlan, &port->ipvlans, pnode) { + flags = ipvlan->dev->flags; + if (nval == IPVLAN_MODE_L3 || nval == IPVLAN_MODE_L3S) { + err = dev_change_flags(ipvlan->dev, + flags | IFF_NOARP); + } else { + err = dev_change_flags(ipvlan->dev, + flags & ~IFF_NOARP); + } + if (unlikely(err)) + goto fail; + } if (nval == IPVLAN_MODE_L3S) { /* New mode is L3S */ err = ipvlan_register_nf_hook(); @@ -74,21 +87,28 @@ static int ipvlan_set_port_mode(struct ipvl_port *port, u16 nval) mdev->l3mdev_ops = &ipvl_l3mdev_ops; mdev->priv_flags |= IFF_L3MDEV_MASTER; } else - return err; + goto fail; } else if (port->mode == IPVLAN_MODE_L3S) { /* Old mode was L3S */ mdev->priv_flags &= ~IFF_L3MDEV_MASTER; ipvlan_unregister_nf_hook(); mdev->l3mdev_ops = NULL; } - list_for_each_entry(ipvlan, &port->ipvlans, pnode) { - if (nval == IPVLAN_MODE_L3 || nval == IPVLAN_MODE_L3S) - ipvlan->dev->flags |= IFF_NOARP; - else - ipvlan->dev->flags &= ~IFF_NOARP; - } port->mode = nval; } + return 0; + +fail: + /* Undo the flags changes that have been done so far. */ + list_for_each_entry_continue_reverse(ipvlan, &port->ipvlans, pnode) { + flags = ipvlan->dev->flags; + if (port->mode == IPVLAN_MODE_L3 || + port->mode == IPVLAN_MODE_L3S) + dev_change_flags(ipvlan->dev, flags | IFF_NOARP); + else + dev_change_flags(ipvlan->dev, flags & ~IFF_NOARP); + } + return err; } From 53a93eb7a8243e9a0ddd2404823a54b2d93ecd4e Mon Sep 17 00:00:00 2001 From: Jason Gerecke Date: Tue, 26 Jun 2018 09:58:02 -0700 Subject: [PATCH 736/970] HID: wacom: Correct touch maximum XY of 2nd-gen Intuos [ Upstream commit 3b8d573586d1b9dee33edf6cb6f2ca05f4bca568 ] The touch sensors on the 2nd-gen Intuos tablets don't use a 4096x4096 sensor like other similar tablets (3rd-gen Bamboo, Intuos5, etc.). The incorrect maximum XY values don't normally affect userspace since touch input from these devices is typically relative rather than absolute. It does, however, cause problems when absolute distances need to be measured, e.g. for gesture recognition. Since the resolution of the touch sensor on these devices is 10 units / mm (versus 100 for the pen sensor), the proper maximum values can be calculated by simply dividing by 10. Fixes: b5fd2a3e92 ("Input: wacom - add support for three new Intuos devices") Signed-off-by: Jason Gerecke Signed-off-by: Jiri Kosina Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/hid/wacom_wac.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/hid/wacom_wac.c b/drivers/hid/wacom_wac.c index db951c4fd6dd..b1ad378cb2a6 100644 --- a/drivers/hid/wacom_wac.c +++ b/drivers/hid/wacom_wac.c @@ -2429,8 +2429,14 @@ void wacom_setup_device_quirks(struct wacom *wacom) if (features->type >= INTUOSHT && features->type <= BAMBOO_PT) features->device_type |= WACOM_DEVICETYPE_PAD; - features->x_max = 4096; - features->y_max = 4096; + if (features->type == INTUOSHT2) { + features->x_max = features->x_max / 10; + features->y_max = features->y_max / 10; + } + else { + features->x_max = 4096; + features->y_max = 4096; + } } else if (features->pktlen == WACOM_PKGLEN_BBTOUCH) { features->device_type |= WACOM_DEVICETYPE_PAD; From 721476147fd2571309b6aa6daa695b39170602ef Mon Sep 17 00:00:00 2001 From: Fabio Estevam Date: Mon, 25 Jun 2018 09:34:03 -0300 Subject: [PATCH 737/970] ARM: imx_v6_v7_defconfig: Select ULPI support [ Upstream commit 157bcc06094c3c5800d3f4676527047b79b618e7 ] Select CONFIG_USB_CHIPIDEA_ULPI and CONFIG_USB_ULPI_BUS so that USB ULPI can be functional on some boards like imx51-babbge. This fixes a kernel hang in 4.18-rc1 on i.mx51-babbage, caused by commit 03e6275ae381 ("usb: chipidea: Fix ULPI on imx51"). Suggested-by: Andrey Smirnov Signed-off-by: Fabio Estevam Signed-off-by: Shawn Guo Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/arm/configs/imx_v6_v7_defconfig | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/arm/configs/imx_v6_v7_defconfig b/arch/arm/configs/imx_v6_v7_defconfig index 8ec4dbbb50b0..6b7d4f535984 100644 --- a/arch/arm/configs/imx_v6_v7_defconfig +++ b/arch/arm/configs/imx_v6_v7_defconfig @@ -271,6 +271,7 @@ CONFIG_USB_STORAGE=y CONFIG_USB_CHIPIDEA=y CONFIG_USB_CHIPIDEA_UDC=y CONFIG_USB_CHIPIDEA_HOST=y +CONFIG_USB_CHIPIDEA_ULPI=y CONFIG_USB_SERIAL=m CONFIG_USB_SERIAL_GENERIC=y CONFIG_USB_SERIAL_FTDI_SIO=m @@ -307,6 +308,7 @@ CONFIG_USB_GADGETFS=m CONFIG_USB_FUNCTIONFS=m CONFIG_USB_MASS_STORAGE=m CONFIG_USB_G_SERIAL=m +CONFIG_USB_ULPI_BUS=y CONFIG_MMC=y CONFIG_MMC_SDHCI=y CONFIG_MMC_SDHCI_PLTFM=y From eaccc6f0b8ed90d23d3712690c2575fd6a572d67 Mon Sep 17 00:00:00 2001 From: Fabio Estevam Date: Tue, 26 Jun 2018 08:37:09 -0300 Subject: [PATCH 738/970] ARM: imx_v4_v5_defconfig: Select ULPI support [ Upstream commit 2ceb2780b790b74bc408a949f6aedbad8afa693e ] Select CONFIG_USB_CHIPIDEA_ULPI and CONFIG_USB_ULPI_BUS so that USB ULPI can be functional on some boards like that use ULPI interface. Signed-off-by: Fabio Estevam Signed-off-by: Shawn Guo Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/arm/configs/imx_v4_v5_defconfig | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/arm/configs/imx_v4_v5_defconfig b/arch/arm/configs/imx_v4_v5_defconfig index 5f013c9fc1ed..8290c9ace6e9 100644 --- a/arch/arm/configs/imx_v4_v5_defconfig +++ b/arch/arm/configs/imx_v4_v5_defconfig @@ -145,9 +145,11 @@ CONFIG_USB_STORAGE=y CONFIG_USB_CHIPIDEA=y CONFIG_USB_CHIPIDEA_UDC=y CONFIG_USB_CHIPIDEA_HOST=y +CONFIG_USB_CHIPIDEA_ULPI=y CONFIG_NOP_USB_XCEIV=y CONFIG_USB_GADGET=y CONFIG_USB_ETH=m +CONFIG_USB_ULPI_BUS=y CONFIG_MMC=y CONFIG_MMC_SDHCI=y CONFIG_MMC_SDHCI_PLTFM=y From bca139fc9b67c865cb94b2780d203f2d0ac5a353 Mon Sep 17 00:00:00 2001 From: Mathieu Malaterre Date: Thu, 8 Mar 2018 21:58:43 +0100 Subject: [PATCH 739/970] tracing: Use __printf markup to silence compiler MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 26b68dd2f48fe7699a89f0cfbb9f4a650dc1c837 ] Silence warnings (triggered at W=1) by adding relevant __printf attributes. CC kernel/trace/trace.o kernel/trace/trace.c: In function ‘__trace_array_vprintk’: kernel/trace/trace.c:2979:2: warning: function might be possible candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format] len = vscnprintf(tbuffer, TRACE_BUF_SIZE, fmt, args); ^~~ AR kernel/trace/built-in.o Link: http://lkml.kernel.org/r/20180308205843.27447-1-malat@debian.org Signed-off-by: Mathieu Malaterre Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- kernel/trace/trace.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 901c7f15f6e2..148c2210a2b8 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -2525,6 +2525,7 @@ int trace_vbprintk(unsigned long ip, const char *fmt, va_list args) } EXPORT_SYMBOL_GPL(trace_vbprintk); +__printf(3, 0) static int __trace_array_vprintk(struct ring_buffer *buffer, unsigned long ip, const char *fmt, va_list args) @@ -2579,12 +2580,14 @@ __trace_array_vprintk(struct ring_buffer *buffer, return len; } +__printf(3, 0) int trace_array_vprintk(struct trace_array *tr, unsigned long ip, const char *fmt, va_list args) { return __trace_array_vprintk(tr->trace_buffer.buffer, ip, fmt, args); } +__printf(3, 0) int trace_array_printk(struct trace_array *tr, unsigned long ip, const char *fmt, ...) { @@ -2600,6 +2603,7 @@ int trace_array_printk(struct trace_array *tr, return ret; } +__printf(3, 4) int trace_array_printk_buf(struct ring_buffer *buffer, unsigned long ip, const char *fmt, ...) { @@ -2615,6 +2619,7 @@ int trace_array_printk_buf(struct ring_buffer *buffer, return ret; } +__printf(2, 0) int trace_vprintk(unsigned long ip, const char *fmt, va_list args) { return trace_array_vprintk(&global_trace, ip, fmt, args); From 2b7f885326c3d8b286ff93ba82e58013c7ee0136 Mon Sep 17 00:00:00 2001 From: Zhen Lei Date: Tue, 3 Jul 2018 17:02:46 -0700 Subject: [PATCH 740/970] kasan: fix shadow_size calculation error in kasan_module_alloc [ Upstream commit 1e8e18f694a52d703665012ca486826f64bac29d ] There is a special case that the size is "(N << KASAN_SHADOW_SCALE_SHIFT) Pages plus X", the value of X is [1, KASAN_SHADOW_SCALE_SIZE-1]. The operation "size >> KASAN_SHADOW_SCALE_SHIFT" will drop X, and the roundup operation can not retrieve the missed one page. For example: size=0x28006, PAGE_SIZE=0x1000, KASAN_SHADOW_SCALE_SHIFT=3, we will get shadow_size=0x5000, but actually we need 6 pages. shadow_size = round_up(size >> KASAN_SHADOW_SCALE_SHIFT, PAGE_SIZE); This can lead to a kernel crash when kasan is enabled and the value of mod->core_layout.size or mod->init_layout.size is like above. Because the shadow memory of X has not been allocated and mapped. move_module: ptr = module_alloc(mod->core_layout.size); ... memset(ptr, 0, mod->core_layout.size); //crashed Unable to handle kernel paging request at virtual address ffff0fffff97b000 ...... Call trace: __asan_storeN+0x174/0x1a8 memset+0x24/0x48 layout_and_allocate+0xcd8/0x1800 load_module+0x190/0x23e8 SyS_finit_module+0x148/0x180 Link: http://lkml.kernel.org/r/1529659626-12660-1-git-send-email-thunder.leizhen@huawei.com Signed-off-by: Zhen Lei Reviewed-by: Dmitriy Vyukov Acked-by: Andrey Ryabinin Cc: Alexander Potapenko Cc: Hanjun Guo Cc: Libin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- mm/kasan/kasan.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c index 73c258129257..4ce386c44cf1 100644 --- a/mm/kasan/kasan.c +++ b/mm/kasan/kasan.c @@ -660,12 +660,13 @@ void kasan_kfree_large(const void *ptr) int kasan_module_alloc(void *addr, size_t size) { void *ret; + size_t scaled_size; size_t shadow_size; unsigned long shadow_start; shadow_start = (unsigned long)kasan_mem_to_shadow(addr); - shadow_size = round_up(size >> KASAN_SHADOW_SCALE_SHIFT, - PAGE_SIZE); + scaled_size = (size + KASAN_SHADOW_MASK) >> KASAN_SHADOW_SCALE_SHIFT; + shadow_size = round_up(scaled_size, PAGE_SIZE); if (WARN_ON(!PAGE_ALIGNED(shadow_start))) return -EINVAL; From 667f0367f27c79c98f1c4b2ea089dcc11f198a05 Mon Sep 17 00:00:00 2001 From: Yuiko Oshino Date: Tue, 3 Jul 2018 11:21:46 -0400 Subject: [PATCH 741/970] smsc75xx: Add workaround for gigabit link up hardware errata. [ Upstream commit d461e3da905332189aad546b2ad9adbe6071c7cc ] In certain conditions, the device may not be able to link in gigabit mode. This software workaround ensures that the device will not enter the failure state. Fixes: d0cad871703b898a442e4049c532ec39168e5b57 ("SMSC75XX USB 2.0 Gigabit Ethernet Devices") Signed-off-by: Yuiko Oshino Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/usb/smsc75xx.c | 62 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 62 insertions(+) diff --git a/drivers/net/usb/smsc75xx.c b/drivers/net/usb/smsc75xx.c index 2cc0f28f4fd2..03d04011d653 100644 --- a/drivers/net/usb/smsc75xx.c +++ b/drivers/net/usb/smsc75xx.c @@ -82,6 +82,9 @@ static bool turbo_mode = true; module_param(turbo_mode, bool, 0644); MODULE_PARM_DESC(turbo_mode, "Enable multiple frames per Rx transaction"); +static int smsc75xx_link_ok_nopm(struct usbnet *dev); +static int smsc75xx_phy_gig_workaround(struct usbnet *dev); + static int __must_check __smsc75xx_read_reg(struct usbnet *dev, u32 index, u32 *data, int in_pm) { @@ -852,6 +855,9 @@ static int smsc75xx_phy_initialize(struct usbnet *dev) return -EIO; } + /* phy workaround for gig link */ + smsc75xx_phy_gig_workaround(dev); + smsc75xx_mdio_write(dev->net, dev->mii.phy_id, MII_ADVERTISE, ADVERTISE_ALL | ADVERTISE_CSMA | ADVERTISE_PAUSE_CAP | ADVERTISE_PAUSE_ASYM); @@ -990,6 +996,62 @@ static int smsc75xx_wait_ready(struct usbnet *dev, int in_pm) return -EIO; } +static int smsc75xx_phy_gig_workaround(struct usbnet *dev) +{ + struct mii_if_info *mii = &dev->mii; + int ret = 0, timeout = 0; + u32 buf, link_up = 0; + + /* Set the phy in Gig loopback */ + smsc75xx_mdio_write(dev->net, mii->phy_id, MII_BMCR, 0x4040); + + /* Wait for the link up */ + do { + link_up = smsc75xx_link_ok_nopm(dev); + usleep_range(10000, 20000); + timeout++; + } while ((!link_up) && (timeout < 1000)); + + if (timeout >= 1000) { + netdev_warn(dev->net, "Timeout waiting for PHY link up\n"); + return -EIO; + } + + /* phy reset */ + ret = smsc75xx_read_reg(dev, PMT_CTL, &buf); + if (ret < 0) { + netdev_warn(dev->net, "Failed to read PMT_CTL: %d\n", ret); + return ret; + } + + buf |= PMT_CTL_PHY_RST; + + ret = smsc75xx_write_reg(dev, PMT_CTL, buf); + if (ret < 0) { + netdev_warn(dev->net, "Failed to write PMT_CTL: %d\n", ret); + return ret; + } + + timeout = 0; + do { + usleep_range(10000, 20000); + ret = smsc75xx_read_reg(dev, PMT_CTL, &buf); + if (ret < 0) { + netdev_warn(dev->net, "Failed to read PMT_CTL: %d\n", + ret); + return ret; + } + timeout++; + } while ((buf & PMT_CTL_PHY_RST) && (timeout < 100)); + + if (timeout >= 100) { + netdev_warn(dev->net, "timeout waiting for PHY Reset\n"); + return -EIO; + } + + return 0; +} + static int smsc75xx_reset(struct usbnet *dev) { struct smsc75xx_priv *pdata = (struct smsc75xx_priv *)(dev->data[0]); From f4bc80f5a51d6fe6f3c31c6e22e5dd410e4651a8 Mon Sep 17 00:00:00 2001 From: Taeung Song Date: Wed, 4 Jul 2018 22:36:36 +0900 Subject: [PATCH 742/970] samples/bpf: add missing [ Upstream commit 4d5d33a085335ef469c9a87792bcaaaa8e64d8c4 ] This fixes build error regarding redefinition: CLANG-bpf samples/bpf/parse_varlen.o samples/bpf/parse_varlen.c:111:8: error: redefinition of 'vlan_hdr' struct vlan_hdr { ^ ./include/linux/if_vlan.h:38:8: note: previous definition is here So remove duplicate 'struct vlan_hdr' in sample code and include if_vlan.h Signed-off-by: Taeung Song Acked-by: David S. Miller Signed-off-by: Daniel Borkmann Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- samples/bpf/parse_varlen.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/samples/bpf/parse_varlen.c b/samples/bpf/parse_varlen.c index 95c16324760c..0b6f22feb2c9 100644 --- a/samples/bpf/parse_varlen.c +++ b/samples/bpf/parse_varlen.c @@ -6,6 +6,7 @@ */ #define KBUILD_MODNAME "foo" #include +#include #include #include #include @@ -108,11 +109,6 @@ static int parse_ipv6(void *data, uint64_t nh_off, void *data_end) return 0; } -struct vlan_hdr { - uint16_t h_vlan_TCI; - uint16_t h_vlan_encapsulated_proto; -}; - SEC("varlen") int handle_ingress(struct __sk_buff *skb) { From fbfd753aebcac74be5b2a90c55da8e3a90018cf3 Mon Sep 17 00:00:00 2001 From: Taeung Song Date: Wed, 4 Jul 2018 22:36:38 +0900 Subject: [PATCH 743/970] samples/bpf: Check the error of write() and read() [ Upstream commit 02a2f000a3629274bfad60bfc4de9edec49e63e7 ] test_task_rename() and test_urandom_read() can be failed during write() and read(), So check the result of them. Reviewed-by: David Laight Signed-off-by: Taeung Song Acked-by: David S. Miller Signed-off-by: Daniel Borkmann Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- samples/bpf/test_overhead_user.c | 19 +++++++++++++++---- 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/samples/bpf/test_overhead_user.c b/samples/bpf/test_overhead_user.c index d291167fd3c7..7dad9a3168e1 100644 --- a/samples/bpf/test_overhead_user.c +++ b/samples/bpf/test_overhead_user.c @@ -6,6 +6,7 @@ */ #define _GNU_SOURCE #include +#include #include #include #include @@ -44,8 +45,13 @@ static void test_task_rename(int cpu) exit(1); } start_time = time_get_ns(); - for (i = 0; i < MAX_CNT; i++) - write(fd, buf, sizeof(buf)); + for (i = 0; i < MAX_CNT; i++) { + if (write(fd, buf, sizeof(buf)) < 0) { + printf("task rename failed: %s\n", strerror(errno)); + close(fd); + return; + } + } printf("task_rename:%d: %lld events per sec\n", cpu, MAX_CNT * 1000000000ll / (time_get_ns() - start_time)); close(fd); @@ -63,8 +69,13 @@ static void test_urandom_read(int cpu) exit(1); } start_time = time_get_ns(); - for (i = 0; i < MAX_CNT; i++) - read(fd, buf, sizeof(buf)); + for (i = 0; i < MAX_CNT; i++) { + if (read(fd, buf, sizeof(buf)) < 0) { + printf("failed to read from /dev/urandom: %s\n", strerror(errno)); + close(fd); + return; + } + } printf("urandom_read:%d: %lld events per sec\n", cpu, MAX_CNT * 1000000000ll / (time_get_ns() - start_time)); close(fd); From 1c7e225fa83ee3f22b3278ba55b900b7405ee008 Mon Sep 17 00:00:00 2001 From: Lubomir Rintel Date: Mon, 2 Jul 2018 11:21:47 +0200 Subject: [PATCH 744/970] ieee802154: 6lowpan: set IFLA_LINK [ Upstream commit b30c122c0bbb0a1dc413085e177ea09467e65fdb ] Otherwise NetworkManager (and iproute alike) is not able to identify the parent IEEE 802.15.4 interface of a 6LoWPAN link. Signed-off-by: Lubomir Rintel Acked-by: Alexander Aring Signed-off-by: Stefan Schmidt Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/ieee802154/6lowpan/core.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/net/ieee802154/6lowpan/core.c b/net/ieee802154/6lowpan/core.c index 83af5339e582..ba8bd24eb8b6 100644 --- a/net/ieee802154/6lowpan/core.c +++ b/net/ieee802154/6lowpan/core.c @@ -90,12 +90,18 @@ static int lowpan_neigh_construct(struct net_device *dev, struct neighbour *n) return 0; } +static int lowpan_get_iflink(const struct net_device *dev) +{ + return lowpan_802154_dev(dev)->wdev->ifindex; +} + static const struct net_device_ops lowpan_netdev_ops = { .ndo_init = lowpan_dev_init, .ndo_start_xmit = lowpan_xmit, .ndo_open = lowpan_open, .ndo_stop = lowpan_stop, .ndo_neigh_construct = lowpan_neigh_construct, + .ndo_get_iflink = lowpan_get_iflink, }; static void lowpan_setup(struct net_device *ldev) From 894b753c4ee7a600307266e8f0f028815f1eee03 Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Wed, 4 Jul 2018 20:25:32 +0200 Subject: [PATCH 745/970] netfilter: x_tables: set module owner for icmp(6) matches [ Upstream commit d376bef9c29b3c65aeee4e785fffcd97ef0a9a81 ] nft_compat relies on xt_request_find_match to increment refcount of the module that provides the match/target. The (builtin) icmp matches did't set the module owner so it was possible to rmmod ip(6)tables while icmp extensions were still in use. Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/ipv4/netfilter/ip_tables.c | 1 + net/ipv6/netfilter/ip6_tables.c | 1 + 2 files changed, 2 insertions(+) diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c index 06aa4948d0c0..4822459e8f42 100644 --- a/net/ipv4/netfilter/ip_tables.c +++ b/net/ipv4/netfilter/ip_tables.c @@ -1913,6 +1913,7 @@ static struct xt_match ipt_builtin_mt[] __read_mostly = { .checkentry = icmp_checkentry, .proto = IPPROTO_ICMP, .family = NFPROTO_IPV4, + .me = THIS_MODULE, }, }; diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c index 180f19526a80..21cad30e4546 100644 --- a/net/ipv6/netfilter/ip6_tables.c +++ b/net/ipv6/netfilter/ip6_tables.c @@ -1934,6 +1934,7 @@ static struct xt_match ip6t_builtin_mt[] __read_mostly = { .checkentry = icmp6_checkentry, .proto = IPPROTO_ICMPV6, .family = NFPROTO_IPV6, + .me = THIS_MODULE, }, }; From 8823c73bd719e74378dd77bc0d894815d0ef4102 Mon Sep 17 00:00:00 2001 From: Paul Moore Date: Wed, 4 Jul 2018 09:58:05 -0400 Subject: [PATCH 746/970] ipv6: make ipv6_renew_options() interrupt/kernel safe [ Upstream commit a9ba23d48dbc6ffd08426bb10f05720e0b9f5c14 ] At present the ipv6_renew_options_kern() function ends up calling into access_ok() which is problematic if done from inside an interrupt as access_ok() calls WARN_ON_IN_IRQ() on some (all?) architectures (x86-64 is affected). Example warning/backtrace is shown below: WARNING: CPU: 1 PID: 3144 at lib/usercopy.c:11 _copy_from_user+0x85/0x90 ... Call Trace: ipv6_renew_option+0xb2/0xf0 ipv6_renew_options+0x26a/0x340 ipv6_renew_options_kern+0x2c/0x40 calipso_req_setattr+0x72/0xe0 netlbl_req_setattr+0x126/0x1b0 selinux_netlbl_inet_conn_request+0x80/0x100 selinux_inet_conn_request+0x6d/0xb0 security_inet_conn_request+0x32/0x50 tcp_conn_request+0x35f/0xe00 ? __lock_acquire+0x250/0x16c0 ? selinux_socket_sock_rcv_skb+0x1ae/0x210 ? tcp_rcv_state_process+0x289/0x106b tcp_rcv_state_process+0x289/0x106b ? tcp_v6_do_rcv+0x1a7/0x3c0 tcp_v6_do_rcv+0x1a7/0x3c0 tcp_v6_rcv+0xc82/0xcf0 ip6_input_finish+0x10d/0x690 ip6_input+0x45/0x1e0 ? ip6_rcv_finish+0x1d0/0x1d0 ipv6_rcv+0x32b/0x880 ? ip6_make_skb+0x1e0/0x1e0 __netif_receive_skb_core+0x6f2/0xdf0 ? process_backlog+0x85/0x250 ? process_backlog+0x85/0x250 ? process_backlog+0xec/0x250 process_backlog+0xec/0x250 net_rx_action+0x153/0x480 __do_softirq+0xd9/0x4f7 do_softirq_own_stack+0x2a/0x40 ... While not present in the backtrace, ipv6_renew_option() ends up calling access_ok() via the following chain: access_ok() _copy_from_user() copy_from_user() ipv6_renew_option() The fix presented in this patch is to perform the userspace copy earlier in the call chain such that it is only called when the option data is actually coming from userspace; that place is do_ipv6_setsockopt(). Not only does this solve the problem seen in the backtrace above, it also allows us to simplify the code quite a bit by removing ipv6_renew_options_kern() completely. We also take this opportunity to cleanup ipv6_renew_options()/ipv6_renew_option() a small amount as well. This patch is heavily based on a rough patch by Al Viro. I've taken his original patch, converted a kmemdup() call in do_ipv6_setsockopt() to a memdup_user() call, made better use of the e_inval jump target in the same function, and cleaned up the use ipv6_renew_option() by ipv6_renew_options(). CC: Al Viro Signed-off-by: Paul Moore Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- include/net/ipv6.h | 9 +--- net/ipv6/calipso.c | 9 ++-- net/ipv6/exthdrs.c | 111 +++++++++++---------------------------- net/ipv6/ipv6_sockglue.c | 27 +++++++--- 4 files changed, 53 insertions(+), 103 deletions(-) diff --git a/include/net/ipv6.h b/include/net/ipv6.h index 407087d686a7..64b0e9df31c7 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -312,14 +312,7 @@ struct ipv6_txoptions *ipv6_dup_options(struct sock *sk, struct ipv6_txoptions *ipv6_renew_options(struct sock *sk, struct ipv6_txoptions *opt, int newtype, - struct ipv6_opt_hdr __user *newopt, - int newoptlen); -struct ipv6_txoptions * -ipv6_renew_options_kern(struct sock *sk, - struct ipv6_txoptions *opt, - int newtype, - struct ipv6_opt_hdr *newopt, - int newoptlen); + struct ipv6_opt_hdr *newopt); struct ipv6_txoptions *ipv6_fixup_options(struct ipv6_txoptions *opt_space, struct ipv6_txoptions *opt); diff --git a/net/ipv6/calipso.c b/net/ipv6/calipso.c index 8d772fea1dde..9742abf5ac26 100644 --- a/net/ipv6/calipso.c +++ b/net/ipv6/calipso.c @@ -799,8 +799,7 @@ static int calipso_opt_update(struct sock *sk, struct ipv6_opt_hdr *hop) { struct ipv6_txoptions *old = txopt_get(inet6_sk(sk)), *txopts; - txopts = ipv6_renew_options_kern(sk, old, IPV6_HOPOPTS, - hop, hop ? ipv6_optlen(hop) : 0); + txopts = ipv6_renew_options(sk, old, IPV6_HOPOPTS, hop); txopt_put(old); if (IS_ERR(txopts)) return PTR_ERR(txopts); @@ -1222,8 +1221,7 @@ static int calipso_req_setattr(struct request_sock *req, if (IS_ERR(new)) return PTR_ERR(new); - txopts = ipv6_renew_options_kern(sk, req_inet->ipv6_opt, IPV6_HOPOPTS, - new, new ? ipv6_optlen(new) : 0); + txopts = ipv6_renew_options(sk, req_inet->ipv6_opt, IPV6_HOPOPTS, new); kfree(new); @@ -1260,8 +1258,7 @@ static void calipso_req_delattr(struct request_sock *req) if (calipso_opt_del(req_inet->ipv6_opt->hopopt, &new)) return; /* Nothing to do */ - txopts = ipv6_renew_options_kern(sk, req_inet->ipv6_opt, IPV6_HOPOPTS, - new, new ? ipv6_optlen(new) : 0); + txopts = ipv6_renew_options(sk, req_inet->ipv6_opt, IPV6_HOPOPTS, new); if (!IS_ERR(txopts)) { txopts = xchg(&req_inet->ipv6_opt, txopts); diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c index 139ceb68bd37..b909c772453f 100644 --- a/net/ipv6/exthdrs.c +++ b/net/ipv6/exthdrs.c @@ -760,29 +760,21 @@ ipv6_dup_options(struct sock *sk, struct ipv6_txoptions *opt) } EXPORT_SYMBOL_GPL(ipv6_dup_options); -static int ipv6_renew_option(void *ohdr, - struct ipv6_opt_hdr __user *newopt, int newoptlen, - int inherit, - struct ipv6_opt_hdr **hdr, - char **p) +static void ipv6_renew_option(int renewtype, + struct ipv6_opt_hdr **dest, + struct ipv6_opt_hdr *old, + struct ipv6_opt_hdr *new, + int newtype, char **p) { - if (inherit) { - if (ohdr) { - memcpy(*p, ohdr, ipv6_optlen((struct ipv6_opt_hdr *)ohdr)); - *hdr = (struct ipv6_opt_hdr *)*p; - *p += CMSG_ALIGN(ipv6_optlen(*hdr)); - } - } else { - if (newopt) { - if (copy_from_user(*p, newopt, newoptlen)) - return -EFAULT; - *hdr = (struct ipv6_opt_hdr *)*p; - if (ipv6_optlen(*hdr) > newoptlen) - return -EINVAL; - *p += CMSG_ALIGN(newoptlen); - } - } - return 0; + struct ipv6_opt_hdr *src; + + src = (renewtype == newtype ? new : old); + if (!src) + return; + + memcpy(*p, src, ipv6_optlen(src)); + *dest = (struct ipv6_opt_hdr *)*p; + *p += CMSG_ALIGN(ipv6_optlen(*dest)); } /** @@ -808,13 +800,11 @@ static int ipv6_renew_option(void *ohdr, */ struct ipv6_txoptions * ipv6_renew_options(struct sock *sk, struct ipv6_txoptions *opt, - int newtype, - struct ipv6_opt_hdr __user *newopt, int newoptlen) + int newtype, struct ipv6_opt_hdr *newopt) { int tot_len = 0; char *p; struct ipv6_txoptions *opt2; - int err; if (opt) { if (newtype != IPV6_HOPOPTS && opt->hopopt) @@ -827,8 +817,8 @@ ipv6_renew_options(struct sock *sk, struct ipv6_txoptions *opt, tot_len += CMSG_ALIGN(ipv6_optlen(opt->dst1opt)); } - if (newopt && newoptlen) - tot_len += CMSG_ALIGN(newoptlen); + if (newopt) + tot_len += CMSG_ALIGN(ipv6_optlen(newopt)); if (!tot_len) return NULL; @@ -843,29 +833,19 @@ ipv6_renew_options(struct sock *sk, struct ipv6_txoptions *opt, opt2->tot_len = tot_len; p = (char *)(opt2 + 1); - err = ipv6_renew_option(opt ? opt->hopopt : NULL, newopt, newoptlen, - newtype != IPV6_HOPOPTS, - &opt2->hopopt, &p); - if (err) - goto out; - - err = ipv6_renew_option(opt ? opt->dst0opt : NULL, newopt, newoptlen, - newtype != IPV6_RTHDRDSTOPTS, - &opt2->dst0opt, &p); - if (err) - goto out; - - err = ipv6_renew_option(opt ? opt->srcrt : NULL, newopt, newoptlen, - newtype != IPV6_RTHDR, - (struct ipv6_opt_hdr **)&opt2->srcrt, &p); - if (err) - goto out; - - err = ipv6_renew_option(opt ? opt->dst1opt : NULL, newopt, newoptlen, - newtype != IPV6_DSTOPTS, - &opt2->dst1opt, &p); - if (err) - goto out; + ipv6_renew_option(IPV6_HOPOPTS, &opt2->hopopt, + (opt ? opt->hopopt : NULL), + newopt, newtype, &p); + ipv6_renew_option(IPV6_RTHDRDSTOPTS, &opt2->dst0opt, + (opt ? opt->dst0opt : NULL), + newopt, newtype, &p); + ipv6_renew_option(IPV6_RTHDR, + (struct ipv6_opt_hdr **)&opt2->srcrt, + (opt ? (struct ipv6_opt_hdr *)opt->srcrt : NULL), + newopt, newtype, &p); + ipv6_renew_option(IPV6_DSTOPTS, &opt2->dst1opt, + (opt ? opt->dst1opt : NULL), + newopt, newtype, &p); opt2->opt_nflen = (opt2->hopopt ? ipv6_optlen(opt2->hopopt) : 0) + (opt2->dst0opt ? ipv6_optlen(opt2->dst0opt) : 0) + @@ -873,37 +853,6 @@ ipv6_renew_options(struct sock *sk, struct ipv6_txoptions *opt, opt2->opt_flen = (opt2->dst1opt ? ipv6_optlen(opt2->dst1opt) : 0); return opt2; -out: - sock_kfree_s(sk, opt2, opt2->tot_len); - return ERR_PTR(err); -} - -/** - * ipv6_renew_options_kern - replace a specific ext hdr with a new one. - * - * @sk: sock from which to allocate memory - * @opt: original options - * @newtype: option type to replace in @opt - * @newopt: new option of type @newtype to replace (kernel-mem) - * @newoptlen: length of @newopt - * - * See ipv6_renew_options(). The difference is that @newopt is - * kernel memory, rather than user memory. - */ -struct ipv6_txoptions * -ipv6_renew_options_kern(struct sock *sk, struct ipv6_txoptions *opt, - int newtype, struct ipv6_opt_hdr *newopt, - int newoptlen) -{ - struct ipv6_txoptions *ret_val; - const mm_segment_t old_fs = get_fs(); - - set_fs(KERNEL_DS); - ret_val = ipv6_renew_options(sk, opt, newtype, - (struct ipv6_opt_hdr __user *)newopt, - newoptlen); - set_fs(old_fs); - return ret_val; } struct ipv6_txoptions *ipv6_fixup_options(struct ipv6_txoptions *opt_space, diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c index c66b9a87e995..81fd35ed8732 100644 --- a/net/ipv6/ipv6_sockglue.c +++ b/net/ipv6/ipv6_sockglue.c @@ -390,6 +390,12 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname, case IPV6_DSTOPTS: { struct ipv6_txoptions *opt; + struct ipv6_opt_hdr *new = NULL; + + /* hop-by-hop / destination options are privileged option */ + retv = -EPERM; + if (optname != IPV6_RTHDR && !ns_capable(net->user_ns, CAP_NET_RAW)) + break; /* remove any sticky options header with a zero option * length, per RFC3542. @@ -401,17 +407,22 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname, else if (optlen < sizeof(struct ipv6_opt_hdr) || optlen & 0x7 || optlen > 8 * 255) goto e_inval; - - /* hop-by-hop / destination options are privileged option */ - retv = -EPERM; - if (optname != IPV6_RTHDR && !ns_capable(net->user_ns, CAP_NET_RAW)) - break; + else { + new = memdup_user(optval, optlen); + if (IS_ERR(new)) { + retv = PTR_ERR(new); + break; + } + if (unlikely(ipv6_optlen(new) > optlen)) { + kfree(new); + goto e_inval; + } + } opt = rcu_dereference_protected(np->opt, lockdep_sock_is_held(sk)); - opt = ipv6_renew_options(sk, opt, optname, - (struct ipv6_opt_hdr __user *)optval, - optlen); + opt = ipv6_renew_options(sk, opt, optname, new); + kfree(new); if (IS_ERR(opt)) { retv = PTR_ERR(opt); break; From 865c4f9a0c336c1d24d09f1d792e1060b1b6e850 Mon Sep 17 00:00:00 2001 From: Arun Kumar Neelakantam Date: Wed, 4 Jul 2018 19:49:32 +0530 Subject: [PATCH 747/970] net: qrtr: Broadcast messages only from control port [ Upstream commit fdf5fd3975666804118e62c69de25dc85cc0909c ] The broadcast node id should only be sent with the control port id. Signed-off-by: Arun Kumar Neelakantam Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/qrtr/qrtr.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/net/qrtr/qrtr.c b/net/qrtr/qrtr.c index ae5ac175b2be..7b670a9a375e 100644 --- a/net/qrtr/qrtr.c +++ b/net/qrtr/qrtr.c @@ -621,6 +621,10 @@ static int qrtr_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) node = NULL; if (addr->sq_node == QRTR_NODE_BCAST) { enqueue_fn = qrtr_bcast_enqueue; + if (addr->sq_port != QRTR_PORT_CTRL) { + release_sock(sk); + return -ENOTCONN; + } } else if (addr->sq_node == ipc->us.sq_node) { enqueue_fn = qrtr_local_enqueue; } else { From e90f9f50bc03da0fd2ddc6bb2f2002bae7220c18 Mon Sep 17 00:00:00 2001 From: Vladimir Zapolskiy Date: Wed, 4 Jul 2018 11:12:39 +0300 Subject: [PATCH 748/970] sh_eth: fix invalid context bug while calling auto-negotiation by ethtool [ Upstream commit 53a710b5044d8475faa6813000b6dd659400ef7b ] Since commit 35b5f6b1a82b ("PHYLIB: Locking fixes for PHY I/O potentially sleeping") phy_start_aneg() function utilizes a mutex to serialize changes to phy state, however the helper function is called in atomic context. The bug can be reproduced by running "ethtool -r" command, the bug is reported if CONFIG_DEBUG_ATOMIC_SLEEP build option is enabled. Fixes: dc19e4e5e02f ("sh: sh_eth: Add support ethtool") Signed-off-by: Vladimir Zapolskiy Reviewed-by: Sergei Shtylyov Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/renesas/sh_eth.c | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c index c8fd99b3ca29..4eba8eaabdab 100644 --- a/drivers/net/ethernet/renesas/sh_eth.c +++ b/drivers/net/ethernet/renesas/sh_eth.c @@ -2079,18 +2079,10 @@ static void sh_eth_get_regs(struct net_device *ndev, struct ethtool_regs *regs, static int sh_eth_nway_reset(struct net_device *ndev) { - struct sh_eth_private *mdp = netdev_priv(ndev); - unsigned long flags; - int ret; - if (!ndev->phydev) return -ENODEV; - spin_lock_irqsave(&mdp->lock, flags); - ret = phy_start_aneg(ndev->phydev); - spin_unlock_irqrestore(&mdp->lock, flags); - - return ret; + return phy_start_aneg(ndev->phydev); } static u32 sh_eth_get_msglevel(struct net_device *ndev) From c5c80efb2b8850a79fc3d9e15990c852d3c9bbc7 Mon Sep 17 00:00:00 2001 From: Vladimir Zapolskiy Date: Wed, 4 Jul 2018 11:12:40 +0300 Subject: [PATCH 749/970] sh_eth: fix invalid context bug while changing link options by ethtool [ Upstream commit 5cb3f52a11e18628fc4bee76dd14b1f0b76349de ] The change fixes sleep in atomic context bug, which is encountered every time when link settings are changed by ethtool. Since commit 35b5f6b1a82b ("PHYLIB: Locking fixes for PHY I/O potentially sleeping") phy_start_aneg() function utilizes a mutex to serialize changes to phy state, however that helper function is called in atomic context under a grabbed spinlock, because phy_start_aneg() is called by phy_ethtool_ksettings_set() and by replaced phy_ethtool_sset() helpers from phylib. Now duplex mode setting is enforced in sh_eth_adjust_link() only, also now RX/TX is disabled when link is put down or modifications to E-MAC registers ECMR and GECMR are expected for both cases of checked and ignored link status pin state from E-MAC interrupt handler. For reference the change is a partial rework of commit 1e1b812bbe10 ("sh_eth: fix handling of no LINK signal"). Fixes: dc19e4e5e02f ("sh: sh_eth: Add support ethtool") Signed-off-by: Vladimir Zapolskiy Reviewed-by: Sergei Shtylyov Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/renesas/sh_eth.c | 49 ++++++++------------------- 1 file changed, 15 insertions(+), 34 deletions(-) diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c index 4eba8eaabdab..c59e8fe37069 100644 --- a/drivers/net/ethernet/renesas/sh_eth.c +++ b/drivers/net/ethernet/renesas/sh_eth.c @@ -1743,8 +1743,15 @@ static void sh_eth_adjust_link(struct net_device *ndev) { struct sh_eth_private *mdp = netdev_priv(ndev); struct phy_device *phydev = ndev->phydev; + unsigned long flags; int new_state = 0; + spin_lock_irqsave(&mdp->lock, flags); + + /* Disable TX and RX right over here, if E-MAC change is ignored */ + if (mdp->cd->no_psr || mdp->no_ether_link) + sh_eth_rcv_snd_disable(ndev); + if (phydev->link) { if (phydev->duplex != mdp->duplex) { new_state = 1; @@ -1763,18 +1770,21 @@ static void sh_eth_adjust_link(struct net_device *ndev) sh_eth_modify(ndev, ECMR, ECMR_TXF, 0); new_state = 1; mdp->link = phydev->link; - if (mdp->cd->no_psr || mdp->no_ether_link) - sh_eth_rcv_snd_enable(ndev); } } else if (mdp->link) { new_state = 1; mdp->link = 0; mdp->speed = 0; mdp->duplex = -1; - if (mdp->cd->no_psr || mdp->no_ether_link) - sh_eth_rcv_snd_disable(ndev); } + /* Enable TX and RX right over here, if E-MAC change is ignored */ + if ((mdp->cd->no_psr || mdp->no_ether_link) && phydev->link) + sh_eth_rcv_snd_enable(ndev); + + mmiowb(); + spin_unlock_irqrestore(&mdp->lock, flags); + if (new_state && netif_msg_link(mdp)) phy_print_status(phydev); } @@ -1856,39 +1866,10 @@ static int sh_eth_get_link_ksettings(struct net_device *ndev, static int sh_eth_set_link_ksettings(struct net_device *ndev, const struct ethtool_link_ksettings *cmd) { - struct sh_eth_private *mdp = netdev_priv(ndev); - unsigned long flags; - int ret; - if (!ndev->phydev) return -ENODEV; - spin_lock_irqsave(&mdp->lock, flags); - - /* disable tx and rx */ - sh_eth_rcv_snd_disable(ndev); - - ret = phy_ethtool_ksettings_set(ndev->phydev, cmd); - if (ret) - goto error_exit; - - if (cmd->base.duplex == DUPLEX_FULL) - mdp->duplex = 1; - else - mdp->duplex = 0; - - if (mdp->cd->set_duplex) - mdp->cd->set_duplex(ndev); - -error_exit: - mdelay(1); - - /* enable tx and rx */ - sh_eth_rcv_snd_enable(ndev); - - spin_unlock_irqrestore(&mdp->lock, flags); - - return ret; + return phy_ethtool_ksettings_set(ndev->phydev, cmd); } /* If it is ever necessary to increase SH_ETH_REG_DUMP_MAX_REGS, the From 354077c09b58d119fd7dfef976711adfe9aaf8d5 Mon Sep 17 00:00:00 2001 From: Vladimir Zapolskiy Date: Wed, 4 Jul 2018 11:14:50 +0300 Subject: [PATCH 750/970] ravb: fix invalid context bug while calling auto-negotiation by ethtool [ Upstream commit 0973a4dd79fe56a3beecfcff675ba4c01df0b0c1 ] Since commit 35b5f6b1a82b ("PHYLIB: Locking fixes for PHY I/O potentially sleeping") phy_start_aneg() function utilizes a mutex to serialize changes to phy state, however the helper function is called in atomic context. The bug can be reproduced by running "ethtool -r" command, the bug is reported if CONFIG_DEBUG_ATOMIC_SLEEP build option is enabled. Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") Signed-off-by: Vladimir Zapolskiy Reviewed-by: Sergei Shtylyov Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/renesas/ravb_main.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c index 10d3a9f6349e..316db1fd371d 100644 --- a/drivers/net/ethernet/renesas/ravb_main.c +++ b/drivers/net/ethernet/renesas/ravb_main.c @@ -1122,15 +1122,10 @@ static int ravb_set_link_ksettings(struct net_device *ndev, static int ravb_nway_reset(struct net_device *ndev) { - struct ravb_private *priv = netdev_priv(ndev); int error = -ENODEV; - unsigned long flags; - if (ndev->phydev) { - spin_lock_irqsave(&priv->lock, flags); + if (ndev->phydev) error = phy_start_aneg(ndev->phydev); - spin_unlock_irqrestore(&priv->lock, flags); - } return error; } From fb96d97a4af4007bf72ca42d9067092b74f5929f Mon Sep 17 00:00:00 2001 From: Vladimir Zapolskiy Date: Wed, 4 Jul 2018 11:14:51 +0300 Subject: [PATCH 751/970] ravb: fix invalid context bug while changing link options by ethtool [ Upstream commit 05925e52a7d379192a5fdff2c33710f573190ead ] The change fixes sleep in atomic context bug, which is encountered every time when link settings are changed by ethtool. Since commit 35b5f6b1a82b ("PHYLIB: Locking fixes for PHY I/O potentially sleeping") phy_start_aneg() function utilizes a mutex to serialize changes to phy state, however that helper function is called in atomic context under a grabbed spinlock, because phy_start_aneg() is called by phy_ethtool_ksettings_set() and by replaced phy_ethtool_sset() helpers from phylib. Now duplex mode setting is enforced in ravb_adjust_link() only, also now RX/TX is disabled when link is put down or modifications to E-MAC registers ECMR and GECMR are expected for both cases of checked and ignored link status pin state from E-MAC interrupt handler. Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") Signed-off-by: Vladimir Zapolskiy Reviewed-by: Sergei Shtylyov Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/renesas/ravb_main.c | 49 ++++++++---------------- 1 file changed, 15 insertions(+), 34 deletions(-) diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c index 316db1fd371d..307ecd500dac 100644 --- a/drivers/net/ethernet/renesas/ravb_main.c +++ b/drivers/net/ethernet/renesas/ravb_main.c @@ -955,6 +955,13 @@ static void ravb_adjust_link(struct net_device *ndev) struct ravb_private *priv = netdev_priv(ndev); struct phy_device *phydev = ndev->phydev; bool new_state = false; + unsigned long flags; + + spin_lock_irqsave(&priv->lock, flags); + + /* Disable TX and RX right over here, if E-MAC change is ignored */ + if (priv->no_avb_link) + ravb_rcv_snd_disable(ndev); if (phydev->link) { if (phydev->duplex != priv->duplex) { @@ -972,18 +979,21 @@ static void ravb_adjust_link(struct net_device *ndev) ravb_modify(ndev, ECMR, ECMR_TXF, 0); new_state = true; priv->link = phydev->link; - if (priv->no_avb_link) - ravb_rcv_snd_enable(ndev); } } else if (priv->link) { new_state = true; priv->link = 0; priv->speed = 0; priv->duplex = -1; - if (priv->no_avb_link) - ravb_rcv_snd_disable(ndev); } + /* Enable TX and RX right over here, if E-MAC change is ignored */ + if (priv->no_avb_link && phydev->link) + ravb_rcv_snd_enable(ndev); + + mmiowb(); + spin_unlock_irqrestore(&priv->lock, flags); + if (new_state && netif_msg_link(priv)) phy_print_status(phydev); } @@ -1085,39 +1095,10 @@ static int ravb_get_link_ksettings(struct net_device *ndev, static int ravb_set_link_ksettings(struct net_device *ndev, const struct ethtool_link_ksettings *cmd) { - struct ravb_private *priv = netdev_priv(ndev); - unsigned long flags; - int error; - if (!ndev->phydev) return -ENODEV; - spin_lock_irqsave(&priv->lock, flags); - - /* Disable TX and RX */ - ravb_rcv_snd_disable(ndev); - - error = phy_ethtool_ksettings_set(ndev->phydev, cmd); - if (error) - goto error_exit; - - if (cmd->base.duplex == DUPLEX_FULL) - priv->duplex = 1; - else - priv->duplex = 0; - - ravb_set_duplex(ndev); - -error_exit: - mdelay(1); - - /* Enable TX and RX */ - ravb_rcv_snd_enable(ndev); - - mmiowb(); - spin_unlock_irqrestore(&priv->lock, flags); - - return error; + return phy_ethtool_ksettings_set(ndev->phydev, cmd); } static int ravb_nway_reset(struct net_device *ndev) From ecbef3e398c1df665e9e4429a44c2367364cfc4b Mon Sep 17 00:00:00 2001 From: Daniel Mack Date: Fri, 6 Jul 2018 22:15:00 +0200 Subject: [PATCH 752/970] ARM: pxa: irq: fix handling of ICMR registers in suspend/resume [ Upstream commit 0c1049dcb4ceec640d8bd797335bcbebdcab44d2 ] PXA3xx platforms have 56 interrupts that are stored in two ICMR registers. The code in pxa_irq_suspend() and pxa_irq_resume() however does a simple division by 32 which only leads to one register being saved at suspend and restored at resume time. The NAND interrupt setting, for instance, is lost. Fix this by using DIV_ROUND_UP() instead. Signed-off-by: Daniel Mack Signed-off-by: Robert Jarzmik Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/arm/mach-pxa/irq.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm/mach-pxa/irq.c b/arch/arm/mach-pxa/irq.c index 9c10248fadcc..4e8c2116808e 100644 --- a/arch/arm/mach-pxa/irq.c +++ b/arch/arm/mach-pxa/irq.c @@ -185,7 +185,7 @@ static int pxa_irq_suspend(void) { int i; - for (i = 0; i < pxa_internal_irq_nr / 32; i++) { + for (i = 0; i < DIV_ROUND_UP(pxa_internal_irq_nr, 32); i++) { void __iomem *base = irq_base(i); saved_icmr[i] = __raw_readl(base + ICMR); @@ -204,7 +204,7 @@ static void pxa_irq_resume(void) { int i; - for (i = 0; i < pxa_internal_irq_nr / 32; i++) { + for (i = 0; i < DIV_ROUND_UP(pxa_internal_irq_nr, 32); i++) { void __iomem *base = irq_base(i); __raw_writel(saved_icmr[i], base + ICMR); From 7754ed7d9d3fe775c58a04c9cbc33508db34458e Mon Sep 17 00:00:00 2001 From: Davide Caratti Date: Fri, 6 Jul 2018 21:01:06 +0200 Subject: [PATCH 753/970] net/sched: act_tunnel_key: fix NULL dereference when 'goto chain' is used [ Upstream commit 38230a3e0e0933bbcf5df6fa469ba0667f667568 ] the control action in the common member of struct tcf_tunnel_key must be a valid value, as it can contain the chain index when 'goto chain' is used. Ensure that the control action can be read as x->tcfa_action, when x is a pointer to struct tc_action and x->ops->type is TCA_ACT_TUNNEL_KEY, to prevent the following command: # tc filter add dev $h2 ingress protocol ip pref 1 handle 101 flower \ > $tcflags dst_mac $h2mac action tunnel_key unset goto chain 1 from causing a NULL dereference when a matching packet is received: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 PGD 80000001097ac067 P4D 80000001097ac067 PUD 103b0a067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 0 PID: 3491 Comm: mausezahn Tainted: G E 4.18.0-rc2.auguri+ #421 Hardware name: Hewlett-Packard HP Z220 CMT Workstation/1790, BIOS K51 v01.58 02/07/2013 RIP: 0010:tcf_action_exec+0xb8/0x100 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 RSP: 0018:ffff95145ea03c40 EFLAGS: 00010246 RAX: 0000000020000001 RBX: ffff9514499e5800 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000 RBP: ffff95145ea03e60 R08: 0000000000000000 R09: ffff95145ea03c9c R10: ffff95145ea03c78 R11: 0000000000000008 R12: ffff951456a69800 R13: ffff951456a69808 R14: 0000000000000001 R15: ffff95144965ee40 FS: 00007fd67ee11740(0000) GS:ffff95145ea00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000001038a2006 CR4: 00000000001606f0 Call Trace: fl_classify+0x1ad/0x1c0 [cls_flower] ? __update_load_avg_se.isra.47+0x1ca/0x1d0 ? __update_load_avg_se.isra.47+0x1ca/0x1d0 ? update_load_avg+0x665/0x690 ? update_load_avg+0x665/0x690 ? kmem_cache_alloc+0x38/0x1c0 tcf_classify+0x89/0x140 __netif_receive_skb_core+0x5ea/0xb70 ? enqueue_entity+0xd0/0x270 ? process_backlog+0x97/0x150 process_backlog+0x97/0x150 net_rx_action+0x14b/0x3e0 __do_softirq+0xde/0x2b4 do_softirq_own_stack+0x2a/0x40 do_softirq.part.18+0x49/0x50 __local_bh_enable_ip+0x49/0x50 __dev_queue_xmit+0x4ab/0x8a0 ? wait_woken+0x80/0x80 ? packet_sendmsg+0x38f/0x810 ? __dev_queue_xmit+0x8a0/0x8a0 packet_sendmsg+0x38f/0x810 sock_sendmsg+0x36/0x40 __sys_sendto+0x10e/0x140 ? do_vfs_ioctl+0xa4/0x630 ? syscall_trace_enter+0x1df/0x2e0 ? __audit_syscall_exit+0x22a/0x290 __x64_sys_sendto+0x24/0x30 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fd67e18dc93 Code: 48 8b 0d 18 83 20 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 59 c7 20 00 00 75 13 49 89 ca b8 2c 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 2b f7 ff ff 48 89 04 24 RSP: 002b:00007ffe0189b748 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00000000020ca010 RCX: 00007fd67e18dc93 RDX: 0000000000000062 RSI: 00000000020ca322 RDI: 0000000000000003 RBP: 00007ffe0189b780 R08: 00007ffe0189b760 R09: 0000000000000014 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000062 R13: 00000000020ca322 R14: 00007ffe0189b760 R15: 0000000000000003 Modules linked in: act_tunnel_key act_gact cls_flower sch_ingress vrf veth act_csum(E) xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter intel_rapl snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek coretemp snd_hda_codec_generic kvm_intel kvm irqbypass snd_hda_intel crct10dif_pclmul crc32_pclmul hp_wmi ghash_clmulni_intel pcbc snd_hda_codec aesni_intel sparse_keymap rfkill snd_hda_core snd_hwdep snd_seq crypto_simd iTCO_wdt gpio_ich iTCO_vendor_support wmi_bmof cryptd mei_wdt glue_helper snd_seq_device snd_pcm pcspkr snd_timer snd i2c_i801 lpc_ich sg soundcore wmi mei_me mei ie31200_edac nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod sr_mod cdrom i915 video i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ahci crc32c_intel libahci serio_raw sfc libata mtd drm ixgbe mdio i2c_core e1000e dca CR2: 0000000000000000 ---[ end trace 1ab8b5b5d4639dfc ]--- RIP: 0010:tcf_action_exec+0xb8/0x100 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 RSP: 0018:ffff95145ea03c40 EFLAGS: 00010246 RAX: 0000000020000001 RBX: ffff9514499e5800 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000 RBP: ffff95145ea03e60 R08: 0000000000000000 R09: ffff95145ea03c9c R10: ffff95145ea03c78 R11: 0000000000000008 R12: ffff951456a69800 R13: ffff951456a69808 R14: 0000000000000001 R15: ffff95144965ee40 FS: 00007fd67ee11740(0000) GS:ffff95145ea00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000001038a2006 CR4: 00000000001606f0 Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: 0x11400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Fixes: d0f6dd8a914f ("net/sched: Introduce act_tunnel_key") Signed-off-by: Davide Caratti Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- include/net/tc_act/tc_tunnel_key.h | 1 - net/sched/act_tunnel_key.c | 6 +++--- 2 files changed, 3 insertions(+), 4 deletions(-) diff --git a/include/net/tc_act/tc_tunnel_key.h b/include/net/tc_act/tc_tunnel_key.h index 253f8da6c2a6..2dcd80d62206 100644 --- a/include/net/tc_act/tc_tunnel_key.h +++ b/include/net/tc_act/tc_tunnel_key.h @@ -16,7 +16,6 @@ struct tcf_tunnel_key_params { struct rcu_head rcu; int tcft_action; - int action; struct metadata_dst *tcft_enc_metadata; }; diff --git a/net/sched/act_tunnel_key.c b/net/sched/act_tunnel_key.c index 901fb8bb9dce..41835f6e86bc 100644 --- a/net/sched/act_tunnel_key.c +++ b/net/sched/act_tunnel_key.c @@ -39,7 +39,7 @@ static int tunnel_key_act(struct sk_buff *skb, const struct tc_action *a, tcf_lastuse_update(&t->tcf_tm); bstats_cpu_update(this_cpu_ptr(t->common.cpu_bstats), skb); - action = params->action; + action = READ_ONCE(t->tcf_action); switch (params->tcft_action) { case TCA_TUNNEL_KEY_ACT_RELEASE: @@ -170,7 +170,7 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla, params_old = rtnl_dereference(t->params); - params_new->action = parm->action; + t->tcf_action = parm->action; params_new->tcft_action = parm->t_action; params_new->tcft_enc_metadata = metadata; @@ -242,13 +242,13 @@ static int tunnel_key_dump(struct sk_buff *skb, struct tc_action *a, .index = t->tcf_index, .refcnt = t->tcf_refcnt - ref, .bindcnt = t->tcf_bindcnt - bind, + .action = t->tcf_action, }; struct tcf_t tm; params = rtnl_dereference(t->params); opt.t_action = params->tcft_action; - opt.action = params->action; if (nla_put(skb, TCA_TUNNEL_KEY_PARMS, sizeof(opt), &opt)) goto nla_put_failure; From afb72ef0cc0e85c75352ae5d4dfa41fbcad9da59 Mon Sep 17 00:00:00 2001 From: Stefan Schmidt Date: Fri, 22 Sep 2017 14:13:53 +0200 Subject: [PATCH 754/970] ieee802154: at86rf230: switch from BUG_ON() to WARN_ON() on problem [ Upstream commit 20f330452ad8814f2289a589baf65e21270879a7 ] The check is valid but it does not warrant to crash the kernel. A WARN_ON() is good enough here. Found by checkpatch. Signed-off-by: Stefan Schmidt Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ieee802154/at86rf230.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ieee802154/at86rf230.c b/drivers/net/ieee802154/at86rf230.c index 9f10da60e02d..7d392e93faa1 100644 --- a/drivers/net/ieee802154/at86rf230.c +++ b/drivers/net/ieee802154/at86rf230.c @@ -941,7 +941,7 @@ at86rf230_xmit(struct ieee802154_hw *hw, struct sk_buff *skb) static int at86rf230_ed(struct ieee802154_hw *hw, u8 *level) { - BUG_ON(!level); + WARN_ON(!level); *level = 0xbe; return 0; } From 7a389e0d192a9f84a7b85784f32823b9de12f31e Mon Sep 17 00:00:00 2001 From: Stefan Schmidt Date: Fri, 22 Sep 2017 14:13:54 +0200 Subject: [PATCH 755/970] ieee802154: at86rf230: use __func__ macro for debug messages [ Upstream commit 8a81388ec27c4c0adbdecd20e67bb5f411ab46b2 ] Instead of having the function name hard-coded (it might change and we forgot to update them in the debug output) we can use __func__ instead and also shorter the line so we do not need to break it. Also fix an extra blank line while being here. Found by checkpatch. Signed-off-by: Stefan Schmidt Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ieee802154/at86rf230.c | 13 ++++--------- 1 file changed, 4 insertions(+), 9 deletions(-) diff --git a/drivers/net/ieee802154/at86rf230.c b/drivers/net/ieee802154/at86rf230.c index 7d392e93faa1..ce3b7fb7eda0 100644 --- a/drivers/net/ieee802154/at86rf230.c +++ b/drivers/net/ieee802154/at86rf230.c @@ -1117,8 +1117,7 @@ at86rf230_set_hw_addr_filt(struct ieee802154_hw *hw, if (changed & IEEE802154_AFILT_SADDR_CHANGED) { u16 addr = le16_to_cpu(filt->short_addr); - dev_vdbg(&lp->spi->dev, - "at86rf230_set_hw_addr_filt called for saddr\n"); + dev_vdbg(&lp->spi->dev, "%s called for saddr\n", __func__); __at86rf230_write(lp, RG_SHORT_ADDR_0, addr); __at86rf230_write(lp, RG_SHORT_ADDR_1, addr >> 8); } @@ -1126,8 +1125,7 @@ at86rf230_set_hw_addr_filt(struct ieee802154_hw *hw, if (changed & IEEE802154_AFILT_PANID_CHANGED) { u16 pan = le16_to_cpu(filt->pan_id); - dev_vdbg(&lp->spi->dev, - "at86rf230_set_hw_addr_filt called for pan id\n"); + dev_vdbg(&lp->spi->dev, "%s called for pan id\n", __func__); __at86rf230_write(lp, RG_PAN_ID_0, pan); __at86rf230_write(lp, RG_PAN_ID_1, pan >> 8); } @@ -1136,15 +1134,13 @@ at86rf230_set_hw_addr_filt(struct ieee802154_hw *hw, u8 i, addr[8]; memcpy(addr, &filt->ieee_addr, 8); - dev_vdbg(&lp->spi->dev, - "at86rf230_set_hw_addr_filt called for IEEE addr\n"); + dev_vdbg(&lp->spi->dev, "%s called for IEEE addr\n", __func__); for (i = 0; i < 8; i++) __at86rf230_write(lp, RG_IEEE_ADDR_0 + i, addr[i]); } if (changed & IEEE802154_AFILT_PANC_CHANGED) { - dev_vdbg(&lp->spi->dev, - "at86rf230_set_hw_addr_filt called for panc change\n"); + dev_vdbg(&lp->spi->dev, "%s called for panc change\n", __func__); if (filt->pan_coord) at86rf230_write_subreg(lp, SR_AACK_I_AM_COORD, 1); else @@ -1248,7 +1244,6 @@ at86rf230_set_cca_mode(struct ieee802154_hw *hw, return at86rf230_write_subreg(lp, SR_CCA_MODE, val); } - static int at86rf230_set_cca_ed_level(struct ieee802154_hw *hw, s32 mbm) { From f8b8e026d51dba150fa04af39bd1163fbdb787ca Mon Sep 17 00:00:00 2001 From: Stefan Schmidt Date: Fri, 22 Sep 2017 14:14:05 +0200 Subject: [PATCH 756/970] ieee802154: fakelb: switch from BUG_ON() to WARN_ON() on problem [ Upstream commit 8f2fbc6c60ff213369e06a73610fc882a42fdf20 ] The check is valid but it does not warrant to crash the kernel. A WARN_ON() is good enough here. Found by checkpatch. Signed-off-by: Stefan Schmidt Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ieee802154/fakelb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ieee802154/fakelb.c b/drivers/net/ieee802154/fakelb.c index ec387efb61d0..685398191995 100644 --- a/drivers/net/ieee802154/fakelb.c +++ b/drivers/net/ieee802154/fakelb.c @@ -49,7 +49,7 @@ struct fakelb_phy { static int fakelb_hw_ed(struct ieee802154_hw *hw, u8 *level) { - BUG_ON(!level); + WARN_ON(!level); *level = 0xbe; return 0; From b96834386f9e827c201553a0cd0ab8da0df2c07b Mon Sep 17 00:00:00 2001 From: Russell King Date: Sun, 24 Jun 2018 14:35:10 +0100 Subject: [PATCH 757/970] drm/armada: fix colorkey mode property [ Upstream commit d378859a667edc99e3473704847698cae97ca2b1 ] The colorkey mode property was not correctly disabling the colorkeying when "disabled" mode was selected. Arrange for this to work as one would expect. Signed-off-by: Russell King Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/armada/armada_hw.h | 1 + drivers/gpu/drm/armada/armada_overlay.c | 30 ++++++++++++++++++------- 2 files changed, 23 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/armada/armada_hw.h b/drivers/gpu/drm/armada/armada_hw.h index 27319a8335e2..345dc4d0851e 100644 --- a/drivers/gpu/drm/armada/armada_hw.h +++ b/drivers/gpu/drm/armada/armada_hw.h @@ -160,6 +160,7 @@ enum { CFG_ALPHAM_GRA = 0x1 << 16, CFG_ALPHAM_CFG = 0x2 << 16, CFG_ALPHA_MASK = 0xff << 8, +#define CFG_ALPHA(x) ((x) << 8) CFG_PIXCMD_MASK = 0xff, }; diff --git a/drivers/gpu/drm/armada/armada_overlay.c b/drivers/gpu/drm/armada/armada_overlay.c index 152b4e716269..6a9bba7206df 100644 --- a/drivers/gpu/drm/armada/armada_overlay.c +++ b/drivers/gpu/drm/armada/armada_overlay.c @@ -27,6 +27,7 @@ struct armada_ovl_plane_properties { uint16_t contrast; uint16_t saturation; uint32_t colorkey_mode; + uint32_t colorkey_enable; }; struct armada_ovl_plane { @@ -62,11 +63,13 @@ armada_ovl_update_attr(struct armada_ovl_plane_properties *prop, writel_relaxed(0x00002000, dcrtc->base + LCD_SPU_CBSH_HUE); spin_lock_irq(&dcrtc->irq_lock); - armada_updatel(prop->colorkey_mode | CFG_ALPHAM_GRA, - CFG_CKMODE_MASK | CFG_ALPHAM_MASK | CFG_ALPHA_MASK, - dcrtc->base + LCD_SPU_DMA_CTRL1); - - armada_updatel(ADV_GRACOLORKEY, 0, dcrtc->base + LCD_SPU_ADV_REG); + armada_updatel(prop->colorkey_mode, + CFG_CKMODE_MASK | CFG_ALPHAM_MASK | CFG_ALPHA_MASK, + dcrtc->base + LCD_SPU_DMA_CTRL1); + if (dcrtc->variant->has_spu_adv_reg) + armada_updatel(prop->colorkey_enable, + ADV_GRACOLORKEY | ADV_VIDCOLORKEY, + dcrtc->base + LCD_SPU_ADV_REG); spin_unlock_irq(&dcrtc->irq_lock); } @@ -340,8 +343,17 @@ static int armada_ovl_plane_set_property(struct drm_plane *plane, dplane->prop.colorkey_vb |= K2B(val); update_attr = true; } else if (property == priv->colorkey_mode_prop) { - dplane->prop.colorkey_mode &= ~CFG_CKMODE_MASK; - dplane->prop.colorkey_mode |= CFG_CKMODE(val); + if (val == CKMODE_DISABLE) { + dplane->prop.colorkey_mode = + CFG_CKMODE(CKMODE_DISABLE) | + CFG_ALPHAM_CFG | CFG_ALPHA(255); + dplane->prop.colorkey_enable = 0; + } else { + dplane->prop.colorkey_mode = + CFG_CKMODE(val) | + CFG_ALPHAM_GRA | CFG_ALPHA(0); + dplane->prop.colorkey_enable = ADV_GRACOLORKEY; + } update_attr = true; } else if (property == priv->brightness_prop) { dplane->prop.brightness = val - 256; @@ -470,7 +482,9 @@ int armada_overlay_plane_create(struct drm_device *dev, unsigned long crtcs) dplane->prop.colorkey_yr = 0xfefefe00; dplane->prop.colorkey_ug = 0x01010100; dplane->prop.colorkey_vb = 0x01010100; - dplane->prop.colorkey_mode = CFG_CKMODE(CKMODE_RGB); + dplane->prop.colorkey_mode = CFG_CKMODE(CKMODE_RGB) | + CFG_ALPHAM_GRA | CFG_ALPHA(0); + dplane->prop.colorkey_enable = ADV_GRACOLORKEY; dplane->prop.brightness = 0; dplane->prop.contrast = 0x4000; dplane->prop.saturation = 0x4000; From d8a77d118ccd6056228f3fc1bc75c99cdeecfab9 Mon Sep 17 00:00:00 2001 From: Andrey Ryabinin Date: Fri, 6 Jul 2018 16:38:53 +0300 Subject: [PATCH 758/970] netfilter: nf_conntrack: Fix possible possible crash on module loading. [ Upstream commit 2045cdfa1b40d66f126f3fd05604fc7c754f0022 ] Loading the nf_conntrack module with doubled hashsize parameter, i.e. modprobe nf_conntrack hashsize=12345 hashsize=12345 causes NULL-ptr deref. If 'hashsize' specified twice, the nf_conntrack_set_hashsize() function will be called also twice. The first nf_conntrack_set_hashsize() call will set the 'nf_conntrack_htable_size' variable: nf_conntrack_set_hashsize() ... /* On boot, we can set this without any fancy locking. */ if (!nf_conntrack_htable_size) return param_set_uint(val, kp); But on the second invocation, the nf_conntrack_htable_size is already set, so the nf_conntrack_set_hashsize() will take a different path and call the nf_conntrack_hash_resize() function. Which will crash on the attempt to dereference 'nf_conntrack_hash' pointer: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 RIP: 0010:nf_conntrack_hash_resize+0x255/0x490 [nf_conntrack] Call Trace: nf_conntrack_set_hashsize+0xcd/0x100 [nf_conntrack] parse_args+0x1f9/0x5a0 load_module+0x1281/0x1a50 __se_sys_finit_module+0xbe/0xf0 do_syscall_64+0x7c/0x390 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fix this, by checking !nf_conntrack_hash instead of !nf_conntrack_htable_size. nf_conntrack_hash will be initialized only after the module loaded, so the second invocation of the nf_conntrack_set_hashsize() won't crash, it will just reinitialize nf_conntrack_htable_size again. Signed-off-by: Andrey Ryabinin Signed-off-by: Pablo Neira Ayuso Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/netfilter/nf_conntrack_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index 2039fd7daf4e..db3586ba1211 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -1822,7 +1822,7 @@ int nf_conntrack_set_hashsize(const char *val, struct kernel_param *kp) return -EOPNOTSUPP; /* On boot, we can set this without any fancy locking. */ - if (!nf_conntrack_htable_size) + if (!nf_conntrack_hash) return param_set_uint(val, kp); rc = kstrtouint(val, 0, &hashsize); From 6cfe79de46e812aa83aabc1e97cb6d0ebaa793da Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Tue, 19 Jun 2018 17:22:05 +0300 Subject: [PATCH 759/970] ARC: Improve cmpxchg syscall implementation [ Upstream commit e8708786d4fe21c043d38d760f768949a3d71185 ] This is used in configs lacking hardware atomics to emulate atomic r-m-w for user space, implemented by disabling preemption in kernel. However there are issues in current implementation: 1. Process not terminated if invalid user pointer passed: i.e. __get_user() failed. 2. The reason for this patch was __put_user() failure not being handled either, specifically for the COW break scenario. The zero page is initially wired up and read from __get_user() succeeds. A subsequent write by __put_user() induces a Protection Violation, but COW can't finish as Linux page fault handler is disabled due to preempt disable. And what's worse is we silently return the stale value to user space. Fix this specific case by re-enabling preemption and explicitly fixing up the fault and retrying the whole sequence over. Cc: Max Filippov Cc: linux-arch@vger.kernel.org Signed-off-by: Alexey Brodkin Signed-off-by: Peter Zijlstra Signed-off-by: Vineet Gupta [vgupta: rewrote the changelog] Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/arc/kernel/process.c | 47 ++++++++++++++++++++++++++++++--------- 1 file changed, 36 insertions(+), 11 deletions(-) diff --git a/arch/arc/kernel/process.c b/arch/arc/kernel/process.c index a41a79a4f4fe..0e8c0151a390 100644 --- a/arch/arc/kernel/process.c +++ b/arch/arc/kernel/process.c @@ -44,7 +44,8 @@ SYSCALL_DEFINE0(arc_gettls) SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new) { struct pt_regs *regs = current_pt_regs(); - int uval = -EFAULT; + u32 uval; + int ret; /* * This is only for old cores lacking LLOCK/SCOND, which by defintion @@ -57,23 +58,47 @@ SYSCALL_DEFINE3(arc_usr_cmpxchg, int *, uaddr, int, expected, int, new) /* Z indicates to userspace if operation succeded */ regs->status32 &= ~STATUS_Z_MASK; - if (!access_ok(VERIFY_WRITE, uaddr, sizeof(int))) - return -EFAULT; + ret = access_ok(VERIFY_WRITE, uaddr, sizeof(*uaddr)); + if (!ret) + goto fail; +again: preempt_disable(); - if (__get_user(uval, uaddr)) - goto done; + ret = __get_user(uval, uaddr); + if (ret) + goto fault; - if (uval == expected) { - if (!__put_user(new, uaddr)) - regs->status32 |= STATUS_Z_MASK; - } + if (uval != expected) + goto out; -done: + ret = __put_user(new, uaddr); + if (ret) + goto fault; + + regs->status32 |= STATUS_Z_MASK; + +out: + preempt_enable(); + return uval; + +fault: preempt_enable(); - return uval; + if (unlikely(ret != -EFAULT)) + goto fail; + + down_read(¤t->mm->mmap_sem); + ret = fixup_user_fault(current, current->mm, (unsigned long) uaddr, + FAULT_FLAG_WRITE, NULL); + up_read(¤t->mm->mmap_sem); + + if (likely(!ret)) + goto again; + +fail: + force_sig(SIGSEGV, current); + return ret; } void arch_cpu_idle(void) From f4a1792da49fb7be8485e70e84710bc4201800c1 Mon Sep 17 00:00:00 2001 From: Michael Chan Date: Mon, 9 Jul 2018 02:24:49 -0400 Subject: [PATCH 760/970] bnxt_en: Always set output parameters in bnxt_get_max_rings(). [ Upstream commit 78f058a4aa0f2280dc4d45d2c4a95728398ef857 ] The current code returns -ENOMEM and does not bother to set the output parameters to 0 when no rings are available. Some callers, such as bnxt_get_channels() will display garbage ring numbers when that happens. Fix it by always setting the output parameters. Fixes: 6e6c5a57fbe1 ("bnxt_en: Modify bnxt_get_max_rings() to support shared or non shared rings.") Signed-off-by: Michael Chan Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 8777c3a4c095..d5f388511b41 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -6862,11 +6862,11 @@ int bnxt_get_max_rings(struct bnxt *bp, int *max_rx, int *max_tx, bool shared) int rx, tx, cp; _bnxt_get_max_rings(bp, &rx, &tx, &cp); + *max_rx = rx; + *max_tx = tx; if (!rx || !tx || !cp) return -ENOMEM; - *max_rx = rx; - *max_tx = tx; return bnxt_trim_rings(bp, max_rx, max_tx, cp, shared); } From 0bf550c07194dae60fbbeacb0c7b0714c2e1425e Mon Sep 17 00:00:00 2001 From: Vikas Gupta Date: Mon, 9 Jul 2018 02:24:52 -0400 Subject: [PATCH 761/970] bnxt_en: Fix for system hang if request_irq fails [ Upstream commit c58387ab1614f6d7fb9e244f214b61e7631421fc ] Fix bug in the error code path when bnxt_request_irq() returns failure. bnxt_disable_napi() should not be called in this error path because NAPI has not been enabled yet. Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.") Signed-off-by: Vikas Gupta Signed-off-by: Michael Chan Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index d5f388511b41..72297b76944f 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -5560,7 +5560,7 @@ static int __bnxt_open_nic(struct bnxt *bp, bool irq_re_init, bool link_re_init) rc = bnxt_request_irq(bp); if (rc) { netdev_err(bp->dev, "bnxt_request_irq err: %x\n", rc); - goto open_err; + goto open_err_irq; } } @@ -5593,6 +5593,8 @@ static int __bnxt_open_nic(struct bnxt *bp, bool irq_re_init, bool link_re_init) open_err: bnxt_disable_napi(bp); + +open_err_irq: bnxt_del_napi(bp); open_err_free_mem: From b81825b74dd40e10de83f336a24bec375f497819 Mon Sep 17 00:00:00 2001 From: Kim Phillips Date: Fri, 29 Jun 2018 12:46:52 -0500 Subject: [PATCH 762/970] perf llvm-utils: Remove bashism from kernel include fetch script [ Upstream commit f6432b9f65001651412dbc3589d251534822d4ab ] Like system(), popen() calls /bin/sh, which may/may not be bash. Script when run on dash and encounters the line, yields: exit: Illegal number: -1 checkbashisms report on script content: possible bashism (exit|return with negative status code): exit -1 Remove the bashism and use the more portable non-zero failure status code 1. Signed-off-by: Kim Phillips Cc: Alexander Shishkin Cc: Hendrik Brueckner Cc: Jiri Olsa Cc: Michael Petlan Cc: Namhyung Kim Cc: Peter Zijlstra Cc: Sandipan Das Cc: Thomas Richter Link: http://lkml.kernel.org/r/20180629124652.8d0af7e2281fd3fd8262cacc@arm.com Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- tools/perf/util/llvm-utils.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/tools/perf/util/llvm-utils.c b/tools/perf/util/llvm-utils.c index bf7216b8731d..621f6527b790 100644 --- a/tools/perf/util/llvm-utils.c +++ b/tools/perf/util/llvm-utils.c @@ -260,16 +260,16 @@ static const char *kinc_fetch_script = "#!/usr/bin/env sh\n" "if ! test -d \"$KBUILD_DIR\"\n" "then\n" -" exit -1\n" +" exit 1\n" "fi\n" "if ! test -f \"$KBUILD_DIR/include/generated/autoconf.h\"\n" "then\n" -" exit -1\n" +" exit 1\n" "fi\n" "TMPDIR=`mktemp -d`\n" "if test -z \"$TMPDIR\"\n" "then\n" -" exit -1\n" +" exit 1\n" "fi\n" "cat << EOF > $TMPDIR/Makefile\n" "obj-y := dummy.o\n" From 8e4449f431d8d57cf56f279ab784722b78f4acb3 Mon Sep 17 00:00:00 2001 From: Dave Jiang Date: Wed, 11 Jul 2018 10:10:11 -0700 Subject: [PATCH 763/970] nfit: fix unchecked dereference in acpi_nfit_ctl [ Upstream commit ee6581ceba7f8314b81b2f2a81f1cf3f67c679e2 ] Incremental patch to fix the unchecked dereference in acpi_nfit_ctl. Reported by Dan Carpenter: "acpi/nfit: fix cmd_rc for acpi_nfit_ctl to always return a value" from Jun 28, 2018, leads to the following Smatch complaint: drivers/acpi/nfit/core.c:578 acpi_nfit_ctl() warn: variable dereferenced before check 'cmd_rc' (see line 411) drivers/acpi/nfit/core.c 410 411 *cmd_rc = -EINVAL; ^^^^^^^^^^^^^^^^^^ Patch adds unchecked dereference. Fixes: c1985cefd844 ("acpi/nfit: fix cmd_rc for acpi_nfit_ctl to always return a value") Signed-off-by: Dave Jiang Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/acpi/nfit/core.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c index fb73b4a9cf34..ef32e5766a01 100644 --- a/drivers/acpi/nfit/core.c +++ b/drivers/acpi/nfit/core.c @@ -201,7 +201,8 @@ int acpi_nfit_ctl(struct nvdimm_bus_descriptor *nd_desc, struct nvdimm *nvdimm, const u8 *uuid; int rc, i; - *cmd_rc = -EINVAL; + if (cmd_rc) + *cmd_rc = -EINVAL; func = cmd; if (cmd == ND_CMD_CALL) { call_pkg = buf; @@ -289,7 +290,8 @@ int acpi_nfit_ctl(struct nvdimm_bus_descriptor *nd_desc, struct nvdimm *nvdimm, * If we return an error (like elsewhere) then caller wouldn't * be able to rely upon data returned to make calculation. */ - *cmd_rc = 0; + if (cmd_rc) + *cmd_rc = 0; return 0; } From 199b59a524af7f615087d40ff20bd159b1da26d5 Mon Sep 17 00:00:00 2001 From: Kamal Heib Date: Tue, 10 Jul 2018 11:56:50 +0300 Subject: [PATCH 764/970] RDMA/mlx5: Fix memory leak in mlx5_ib_create_srq() error path [ Upstream commit d63c46734c545ad0488761059004a65c46efdde3 ] Fix memory leak in the error path of mlx5_ib_create_srq() by making sure to free the allocated srq. Fixes: c2b37f76485f ("IB/mlx5: Fix integer overflows in mlx5_ib_create_srq") Signed-off-by: Kamal Heib Acked-by: Leon Romanovsky Signed-off-by: Jason Gunthorpe Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/hw/mlx5/srq.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/srq.c b/drivers/infiniband/hw/mlx5/srq.c index 5c1dbe2f8757..d6e5002edb92 100644 --- a/drivers/infiniband/hw/mlx5/srq.c +++ b/drivers/infiniband/hw/mlx5/srq.c @@ -268,18 +268,24 @@ struct ib_srq *mlx5_ib_create_srq(struct ib_pd *pd, desc_size = sizeof(struct mlx5_wqe_srq_next_seg) + srq->msrq.max_gs * sizeof(struct mlx5_wqe_data_seg); - if (desc_size == 0 || srq->msrq.max_gs > desc_size) - return ERR_PTR(-EINVAL); + if (desc_size == 0 || srq->msrq.max_gs > desc_size) { + err = -EINVAL; + goto err_srq; + } desc_size = roundup_pow_of_two(desc_size); desc_size = max_t(size_t, 32, desc_size); - if (desc_size < sizeof(struct mlx5_wqe_srq_next_seg)) - return ERR_PTR(-EINVAL); + if (desc_size < sizeof(struct mlx5_wqe_srq_next_seg)) { + err = -EINVAL; + goto err_srq; + } srq->msrq.max_avail_gather = (desc_size - sizeof(struct mlx5_wqe_srq_next_seg)) / sizeof(struct mlx5_wqe_data_seg); srq->msrq.wqe_shift = ilog2(desc_size); buf_size = srq->msrq.max * desc_size; - if (buf_size < desc_size) - return ERR_PTR(-EINVAL); + if (buf_size < desc_size) { + err = -EINVAL; + goto err_srq; + } in.type = init_attr->srq_type; if (pd->uobject) From b1baa1168712eb9d2240a0e6efa5e3422ec3bcc7 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (VMware)" Date: Tue, 10 Jul 2018 08:22:40 +0100 Subject: [PATCH 765/970] ARM: 8780/1: ftrace: Only set kernel memory back to read-only after boot [ Upstream commit b4c7e2bd2eb4764afe3af9409ff3b1b87116fa30 ] Dynamic ftrace requires modifying the code segments that are usually set to read-only. To do this, a per arch function is called both before and after the ftrace modifications are performed. The "before" function will set kernel code text to read-write to allow for ftrace to make the modifications, and the "after" function will set the kernel code text back to "read-only" to keep the kernel code text protected. The issue happens when dynamic ftrace is tested at boot up. The test is done before the kernel code text has been set to read-only. But the "before" and "after" calls are still performed. The "after" call will change the kernel code text to read-only prematurely, and other boot code that expects this code to be read-write will fail. The solution is to add a variable that is set when the kernel code text is expected to be converted to read-only, and make the ftrace "before" and "after" calls do nothing if that variable is not yet set. This is similar to the x86 solution from commit 162396309745 ("ftrace, x86: make kernel text writable only for conversions"). Link: http://lkml.kernel.org/r/20180620212906.24b7b66e@vmware.local.home Reported-by: Stefan Agner Tested-by: Stefan Agner Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Russell King Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/arm/mm/init.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c index 4c587ad8bfe3..1565d6b67163 100644 --- a/arch/arm/mm/init.c +++ b/arch/arm/mm/init.c @@ -722,19 +722,28 @@ int __mark_rodata_ro(void *unused) return 0; } +static int kernel_set_to_readonly __read_mostly; + void mark_rodata_ro(void) { + kernel_set_to_readonly = 1; stop_machine(__mark_rodata_ro, NULL, NULL); } void set_kernel_text_rw(void) { + if (!kernel_set_to_readonly) + return; + set_section_perms(ro_perms, ARRAY_SIZE(ro_perms), false, current->active_mm); } void set_kernel_text_ro(void) { + if (!kernel_set_to_readonly) + return; + set_section_perms(ro_perms, ARRAY_SIZE(ro_perms), true, current->active_mm); } From 66b29e23fba6cce2f16e2637c90415d7fc31c96d Mon Sep 17 00:00:00 2001 From: Nishanth Menon Date: Tue, 10 Jul 2018 14:47:25 -0500 Subject: [PATCH 766/970] ARM: DRA7/OMAP5: Enable ACTLR[0] (Enable invalidates of BTB) for secondary cores [ Upstream commit 2f8b5b21830aea95989a6e67d8a971297272a086 ] Call secure services to enable ACTLR[0] (Enable invalidates of BTB with ICIALLU) when branch hardening is enabled for kernel. On GP devices OMAP5/DRA7, there is no possibility to update secure side since "secure world" is ROM and there are no override mechanisms possible. On HS devices, appropriate PPA should do the workarounds as well. However, the configuration is only done for secondary core, since it is expected that firmware/bootloader will have enabled the required configuration for the primary boot core (note: bootloaders typically will NOT enable secondary processors, since it has no need to do so). Signed-off-by: Nishanth Menon Signed-off-by: Tony Lindgren Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/arm/mach-omap2/omap-smp.c | 41 ++++++++++++++++++++++++++++++++++ 1 file changed, 41 insertions(+) diff --git a/arch/arm/mach-omap2/omap-smp.c b/arch/arm/mach-omap2/omap-smp.c index b4de3da6dffa..f7f36da80f40 100644 --- a/arch/arm/mach-omap2/omap-smp.c +++ b/arch/arm/mach-omap2/omap-smp.c @@ -104,6 +104,45 @@ void omap5_erratum_workaround_801819(void) static inline void omap5_erratum_workaround_801819(void) { } #endif +#ifdef CONFIG_HARDEN_BRANCH_PREDICTOR +/* + * Configure ACR and enable ACTLR[0] (Enable invalidates of BTB with + * ICIALLU) to activate the workaround for secondary Core. + * NOTE: it is assumed that the primary core's configuration is done + * by the boot loader (kernel will detect a misconfiguration and complain + * if this is not done). + * + * In General Purpose(GP) devices, ACR bit settings can only be done + * by ROM code in "secure world" using the smc call and there is no + * option to update the "firmware" on such devices. This also works for + * High security(HS) devices, as a backup option in case the + * "update" is not done in the "security firmware". + */ +static void omap5_secondary_harden_predictor(void) +{ + u32 acr, acr_mask; + + asm volatile ("mrc p15, 0, %0, c1, c0, 1" : "=r" (acr)); + + /* + * ACTLR[0] (Enable invalidates of BTB with ICIALLU) + */ + acr_mask = BIT(0); + + /* Do we already have it done.. if yes, skip expensive smc */ + if ((acr & acr_mask) == acr_mask) + return; + + acr |= acr_mask; + omap_smc1(OMAP5_DRA7_MON_SET_ACR_INDEX, acr); + + pr_debug("%s: ARM ACR setup for CVE_2017_5715 applied on CPU%d\n", + __func__, smp_processor_id()); +} +#else +static inline void omap5_secondary_harden_predictor(void) { } +#endif + static void omap4_secondary_init(unsigned int cpu) { /* @@ -126,6 +165,8 @@ static void omap4_secondary_init(unsigned int cpu) set_cntfreq(); /* Configure ACR to disable streaming WA for 801819 */ omap5_erratum_workaround_801819(); + /* Enable ACR to allow for ICUALLU workaround */ + omap5_secondary_harden_predictor(); } /* From c8c9e45f9225b908bafe26fb265985aef65fb4cd Mon Sep 17 00:00:00 2001 From: Adam Ford Date: Wed, 11 Jul 2018 12:54:54 -0500 Subject: [PATCH 767/970] ARM: dts: am3517.dtsi: Disable reference to OMAP3 OTG controller [ Upstream commit 923847413f7316b5ced3491769b3fefa6c56a79a ] The AM3517 has a different OTG controller location than the OMAP3, which is included from omap3.dtsi. This results in a hwmod error. Since the AM3517 has a different OTG controller address, this patch disabes one that is isn't available. Signed-off-by: Adam Ford Signed-off-by: Tony Lindgren Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/arm/boot/dts/am3517.dtsi | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/arch/arm/boot/dts/am3517.dtsi b/arch/arm/boot/dts/am3517.dtsi index 0db19d39d24c..d022b6b03273 100644 --- a/arch/arm/boot/dts/am3517.dtsi +++ b/arch/arm/boot/dts/am3517.dtsi @@ -74,6 +74,11 @@ }; }; +/* Table Table 5-79 of the TRM shows 480ab000 is reserved */ +&usb_otg_hs { + status = "disabled"; +}; + &iva { status = "disabled"; }; From 43db78fd44d7fd5790d13b6f0a69b93d7864477d Mon Sep 17 00:00:00 2001 From: Alexander Duyck Date: Mon, 18 Jun 2018 12:02:00 -0400 Subject: [PATCH 768/970] ixgbe: Be more careful when modifying MAC filters [ Upstream commit d14c780c11fbc10f66c43e7b64eefe87ca442bd3 ] This change makes it so that we are much more explicit about the ordering of updates to the receive address register (RAR) table. Prior to this patch I believe we may have been updating the table while entries were still active, or possibly allowing for reordering of things since we weren't explicitly flushing writes to either the lower or upper portion of the register prior to accessing the other half. Signed-off-by: Alexander Duyck Reviewed-by: Shannon Nelson Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c index ad3362293cbd..0d2baec546e1 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c @@ -1847,7 +1847,12 @@ s32 ixgbe_set_rar_generic(struct ixgbe_hw *hw, u32 index, u8 *addr, u32 vmdq, if (enable_addr != 0) rar_high |= IXGBE_RAH_AV; + /* Record lower 32 bits of MAC address and then make + * sure that write is flushed to hardware before writing + * the upper 16 bits and setting the valid bit. + */ IXGBE_WRITE_REG(hw, IXGBE_RAL(index), rar_low); + IXGBE_WRITE_FLUSH(hw); IXGBE_WRITE_REG(hw, IXGBE_RAH(index), rar_high); return 0; @@ -1879,8 +1884,13 @@ s32 ixgbe_clear_rar_generic(struct ixgbe_hw *hw, u32 index) rar_high = IXGBE_READ_REG(hw, IXGBE_RAH(index)); rar_high &= ~(0x0000FFFF | IXGBE_RAH_AV); - IXGBE_WRITE_REG(hw, IXGBE_RAL(index), 0); + /* Clear the address valid bit and upper 16 bits of the address + * before clearing the lower bits. This way we aren't updating + * a live filter. + */ IXGBE_WRITE_REG(hw, IXGBE_RAH(index), rar_high); + IXGBE_WRITE_FLUSH(hw); + IXGBE_WRITE_REG(hw, IXGBE_RAL(index), 0); /* clear VMDq pool/queue selection for this RAR */ hw->mac.ops.clear_vmdq(hw, index, IXGBE_CLEAR_VMDQ_ALL); From faf0464e333010ecc3c4214e7942633d9f761296 Mon Sep 17 00:00:00 2001 From: Laura Abbott Date: Mon, 9 Jul 2018 17:45:57 -0700 Subject: [PATCH 769/970] tools: build: Use HOSTLDFLAGS with fixdep [ Upstream commit 8b247a92ebd0cda7dec49a6f771d9c4950f3d3ad ] The final link of fixdep uses LDFLAGS but not the existing HOSTLDFLAGS. Fix this. Signed-off-by: Laura Abbott Acked-by: Jiri Olsa Signed-off-by: Masahiro Yamada Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- tools/build/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/build/Makefile b/tools/build/Makefile index 8332959fbca4..20d9ae11b6e1 100644 --- a/tools/build/Makefile +++ b/tools/build/Makefile @@ -42,7 +42,7 @@ $(OUTPUT)fixdep-in.o: FORCE $(Q)$(MAKE) $(build)=fixdep $(OUTPUT)fixdep: $(OUTPUT)fixdep-in.o - $(QUIET_LINK)$(HOSTCC) $(LDFLAGS) -o $@ $< + $(QUIET_LINK)$(HOSTCC) $(HOSTLDFLAGS) -o $@ $< FORCE: From 4770fdc633650794717f5f1c2011a8c570e09cce Mon Sep 17 00:00:00 2001 From: Willem de Bruijn Date: Wed, 11 Jul 2018 12:00:45 -0400 Subject: [PATCH 770/970] packet: reset network header if packet shorter than ll reserved space [ Upstream commit 993675a3100b16a4c80dfd70cbcde8ea7127b31d ] If variable length link layer headers result in a packet shorter than dev->hard_header_len, reset the network header offset. Else skb->mac_len may exceed skb->len after skb_mac_reset_len. packet_sendmsg_spkt already has similar logic. Fixes: b84bbaf7a6c8 ("packet: in packet_snd start writing at link layer allocation") Signed-off-by: Willem de Bruijn Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/packet/af_packet.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index ea601f7ca2f8..61eec3e9ef66 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -2917,6 +2917,8 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len) goto out_free; } else if (reserve) { skb_reserve(skb, -reserve); + if (len < reserve) + skb_reset_network_header(skb); } /* Returns -EFAULT on error */ From d793d5ba6b5f4bce4c3d7994064ada548d65fedb Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Thu, 12 Jul 2018 15:23:45 +0300 Subject: [PATCH 771/970] qlogic: check kstrtoul() for errors [ Upstream commit 5fc853cc01c68f84984ecc2d5fd777ecad78240f ] We accidentally left out the error handling for kstrtoul(). Fixes: a520030e326a ("qlcnic: Implement flash sysfs callback for 83xx adapter") Signed-off-by: Dan Carpenter Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/qlogic/qlcnic/qlcnic_sysfs.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sysfs.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sysfs.c index ccbb04503b27..b53a18e365c2 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sysfs.c +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sysfs.c @@ -1128,6 +1128,8 @@ static ssize_t qlcnic_83xx_sysfs_flash_write_handler(struct file *filp, struct qlcnic_adapter *adapter = dev_get_drvdata(dev); ret = kstrtoul(buf, 16, &data); + if (ret) + return ret; switch (data) { case QLC_83XX_FLASH_SECTOR_ERASE_CMD: From d4efb85f5f5a5eeba2a461b579bbe8b4af843f95 Mon Sep 17 00:00:00 2001 From: Yuchung Cheng Date: Thu, 12 Jul 2018 06:04:53 -0700 Subject: [PATCH 772/970] tcp: remove DELAYED ACK events in DCTCP [ Upstream commit a69258f7aa2623e0930212f09c586fd06674ad79 ] After fixing the way DCTCP tracking delayed ACKs, the delayed-ACK related callbacks are no longer needed Signed-off-by: Yuchung Cheng Signed-off-by: Eric Dumazet Acked-by: Neal Cardwell Acked-by: Lawrence Brakmo Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- include/net/tcp.h | 2 -- net/ipv4/tcp_dctcp.c | 25 ------------------------- net/ipv4/tcp_output.c | 4 ---- 3 files changed, 31 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index 97d210535cdd..c3f4f6a9e6c3 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -850,8 +850,6 @@ enum tcp_ca_event { CA_EVENT_LOSS, /* loss timeout */ CA_EVENT_ECN_NO_CE, /* ECT set, but not CE marked */ CA_EVENT_ECN_IS_CE, /* received CE marked IP packet */ - CA_EVENT_DELAYED_ACK, /* Delayed ack is sent */ - CA_EVENT_NON_DELAYED_ACK, }; /* Information about inbound ACK, passed to cong_ops->in_ack_event() */ diff --git a/net/ipv4/tcp_dctcp.c b/net/ipv4/tcp_dctcp.c index 8905a0aec8ee..a08cedf9d286 100644 --- a/net/ipv4/tcp_dctcp.c +++ b/net/ipv4/tcp_dctcp.c @@ -55,7 +55,6 @@ struct dctcp { u32 dctcp_alpha; u32 next_seq; u32 ce_state; - u32 delayed_ack_reserved; u32 loss_cwnd; }; @@ -96,7 +95,6 @@ static void dctcp_init(struct sock *sk) ca->dctcp_alpha = min(dctcp_alpha_on_init, DCTCP_MAX_ALPHA); - ca->delayed_ack_reserved = 0; ca->loss_cwnd = 0; ca->ce_state = 0; @@ -230,25 +228,6 @@ static void dctcp_state(struct sock *sk, u8 new_state) } } -static void dctcp_update_ack_reserved(struct sock *sk, enum tcp_ca_event ev) -{ - struct dctcp *ca = inet_csk_ca(sk); - - switch (ev) { - case CA_EVENT_DELAYED_ACK: - if (!ca->delayed_ack_reserved) - ca->delayed_ack_reserved = 1; - break; - case CA_EVENT_NON_DELAYED_ACK: - if (ca->delayed_ack_reserved) - ca->delayed_ack_reserved = 0; - break; - default: - /* Don't care for the rest. */ - break; - } -} - static void dctcp_cwnd_event(struct sock *sk, enum tcp_ca_event ev) { switch (ev) { @@ -258,10 +237,6 @@ static void dctcp_cwnd_event(struct sock *sk, enum tcp_ca_event ev) case CA_EVENT_ECN_NO_CE: dctcp_ce_state_1_to_0(sk); break; - case CA_EVENT_DELAYED_ACK: - case CA_EVENT_NON_DELAYED_ACK: - dctcp_update_ack_reserved(sk, ev); - break; default: /* Don't care for the rest. */ break; diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 5f916953b28e..bd68f073570b 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -3444,8 +3444,6 @@ void tcp_send_delayed_ack(struct sock *sk) int ato = icsk->icsk_ack.ato; unsigned long timeout; - tcp_ca_event(sk, CA_EVENT_DELAYED_ACK); - if (ato > TCP_DELACK_MIN) { const struct tcp_sock *tp = tcp_sk(sk); int max_ato = HZ / 2; @@ -3502,8 +3500,6 @@ void __tcp_send_ack(struct sock *sk, u32 rcv_nxt) if (sk->sk_state == TCP_CLOSE) return; - tcp_ca_event(sk, CA_EVENT_NON_DELAYED_ACK); - /* We are not putting this on the write queue, so * tcp_transmit_skb() will set the ownership to this * sock. From 6219a83d92f3316c690408858e9453c691bd7d11 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Tue, 3 Jul 2018 15:04:25 +0300 Subject: [PATCH 773/970] pinctrl: nsp: off by ones in nsp_pinmux_enable() [ Upstream commit f90a21c898db58eaea14b8ad7e9af3b9e15e5f8a ] The > comparisons should be >= or else we read beyond the end of the pinctrl->functions[] array. Fixes: cc4fa83f66e9 ("pinctrl: nsp: add pinmux driver support for Broadcom NSP SoC") Signed-off-by: Dan Carpenter Reviewed-by: Ray Jui Signed-off-by: Linus Walleij Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/pinctrl/bcm/pinctrl-nsp-mux.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/pinctrl/bcm/pinctrl-nsp-mux.c b/drivers/pinctrl/bcm/pinctrl-nsp-mux.c index 35c17653c694..5cd8166fbbc8 100644 --- a/drivers/pinctrl/bcm/pinctrl-nsp-mux.c +++ b/drivers/pinctrl/bcm/pinctrl-nsp-mux.c @@ -460,8 +460,8 @@ static int nsp_pinmux_enable(struct pinctrl_dev *pctrl_dev, const struct nsp_pin_function *func; const struct nsp_pin_group *grp; - if (grp_select > pinctrl->num_groups || - func_select > pinctrl->num_functions) + if (grp_select >= pinctrl->num_groups || + func_select >= pinctrl->num_functions) return -EINVAL; func = &pinctrl->functions[func_select]; From 3a2b9faaaca3e462d123693c9371eba087632a50 Mon Sep 17 00:00:00 2001 From: Wei Yongjun Date: Wed, 11 Jul 2018 12:34:21 +0000 Subject: [PATCH 774/970] pinctrl: nsp: Fix potential NULL dereference [ Upstream commit c29e9da56bebb4c2c794e871b0dc0298bbf08142 ] platform_get_resource() may fail and return NULL, so we should better check it's return value to avoid a NULL pointer dereference a bit later in the code. This is detected by Coccinelle semantic patch. @@ expression pdev, res, n, t, e, e1, e2; @@ res = platform_get_resource(pdev, t, n); + if (!res) + return -EINVAL; ... when != res == NULL e = devm_ioremap_nocache(e1, res->start, e2); Fixes: cc4fa83f66e9 ("pinctrl: nsp: add pinmux driver support for Broadcom NSP SoC") Signed-off-by: Wei Yongjun Reviewed-by: Ray Jui Signed-off-by: Linus Walleij Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/pinctrl/bcm/pinctrl-nsp-mux.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/pinctrl/bcm/pinctrl-nsp-mux.c b/drivers/pinctrl/bcm/pinctrl-nsp-mux.c index 5cd8166fbbc8..87618a4e90e4 100644 --- a/drivers/pinctrl/bcm/pinctrl-nsp-mux.c +++ b/drivers/pinctrl/bcm/pinctrl-nsp-mux.c @@ -577,6 +577,8 @@ static int nsp_pinmux_probe(struct platform_device *pdev) return PTR_ERR(pinctrl->base0); res = platform_get_resource(pdev, IORESOURCE_MEM, 1); + if (!res) + return -EINVAL; pinctrl->base1 = devm_ioremap_nocache(&pdev->dev, res->start, resource_size(res)); if (!pinctrl->base1) { From 700cbb69ed547f9f4428dc8db0dd1089c12d8186 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Tue, 3 Jul 2018 15:30:56 +0300 Subject: [PATCH 775/970] drm/nouveau/gem: off by one bugs in nouveau_gem_pushbuf_reloc_apply() [ Upstream commit 7f073d011f93e92d4d225526b9ab6b8b0bbd6613 ] The bo array has req->nr_buffers elements so the > should be >= so we don't read beyond the end of the array. Fixes: a1606a9596e5 ("drm/nouveau: new gem pushbuf interface, bump to 0.0.16") Signed-off-by: Dan Carpenter Signed-off-by: Ben Skeggs Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/nouveau/nouveau_gem.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c index 909f69a302ee..505dca48b9f8 100644 --- a/drivers/gpu/drm/nouveau/nouveau_gem.c +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c @@ -601,7 +601,7 @@ nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli, struct nouveau_bo *nvbo; uint32_t data; - if (unlikely(r->bo_index > req->nr_buffers)) { + if (unlikely(r->bo_index >= req->nr_buffers)) { NV_PRINTK(err, cli, "reloc bo index invalid\n"); ret = -EINVAL; break; @@ -611,7 +611,7 @@ nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli, if (b->presumed.valid) continue; - if (unlikely(r->reloc_bo_index > req->nr_buffers)) { + if (unlikely(r->reloc_bo_index >= req->nr_buffers)) { NV_PRINTK(err, cli, "reloc container bo index invalid\n"); ret = -EINVAL; break; From d2a4505931ffa75fdcad2722787dc1a8793d3382 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Fri, 13 Jul 2018 21:25:19 -0700 Subject: [PATCH 776/970] net/ethernet/freescale/fman: fix cross-build error [ Upstream commit c133459765fae249ba482f62e12f987aec4376f0 ] CC [M] drivers/net/ethernet/freescale/fman/fman.o In file included from ../drivers/net/ethernet/freescale/fman/fman.c:35: ../include/linux/fsl/guts.h: In function 'guts_set_dmacr': ../include/linux/fsl/guts.h:165:2: error: implicit declaration of function 'clrsetbits_be32' [-Werror=implicit-function-declaration] clrsetbits_be32(&guts->dmacr, 3 << shift, device << shift); ^~~~~~~~~~~~~~~ Signed-off-by: Randy Dunlap Cc: Madalin Bucur Cc: netdev@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- include/linux/fsl/guts.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/linux/fsl/guts.h b/include/linux/fsl/guts.h index 649e9171a9b3..270eef7d2267 100644 --- a/include/linux/fsl/guts.h +++ b/include/linux/fsl/guts.h @@ -16,6 +16,7 @@ #define __FSL_GUTS_H__ #include +#include /** * Global Utility Registers. From 97093827cd5e0331da3cd0d07678911576ed766a Mon Sep 17 00:00:00 2001 From: David Lechner Date: Mon, 16 Jul 2018 17:58:10 -0500 Subject: [PATCH 777/970] net: usb: rtl8150: demote allmulti message to dev_dbg() [ Upstream commit 3a9b0455062ffb9d2f6cd4473a76e3456f318c9f ] This driver can spam the kernel log with multiple messages of: net eth0: eth0: allmulti set Usually 4 or 8 at a time (probably because of using ConnMan). This message doesn't seem useful, so let's demote it from dev_info() to dev_dbg(). Signed-off-by: David Lechner Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/usb/rtl8150.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/usb/rtl8150.c b/drivers/net/usb/rtl8150.c index dc4f7ea95c9b..9504800217ed 100644 --- a/drivers/net/usb/rtl8150.c +++ b/drivers/net/usb/rtl8150.c @@ -681,7 +681,7 @@ static void rtl8150_set_multicast(struct net_device *netdev) (netdev->flags & IFF_ALLMULTI)) { rx_creg &= 0xfffe; rx_creg |= 0x0002; - dev_info(&netdev->dev, "%s: allmulti set\n", netdev->name); + dev_dbg(&netdev->dev, "%s: allmulti set\n", netdev->name); } else { /* ~RX_MULTICAST, ~RX_PROMISCUOUS */ rx_creg &= 0x00fc; From 3934e010d80dd944c4de659e24f7546b3828c0ec Mon Sep 17 00:00:00 2001 From: Sergei Shtylyov Date: Wed, 18 Jul 2018 15:40:26 -0500 Subject: [PATCH 778/970] PCI: OF: Fix I/O space page leak commit a5fb9fb023a1435f2b42bccd7f547560f3a21dc3 upstream. When testing the R-Car PCIe driver on the Condor board, if the PCIe PHY driver was left disabled, the kernel crashed with this BUG: kernel BUG at lib/ioremap.c:72! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 39 Comm: kworker/0:1 Not tainted 4.17.0-dirty #1092 Hardware name: Renesas Condor board based on r8a77980 (DT) Workqueue: events deferred_probe_work_func pstate: 80000005 (Nzcv daif -PAN -UAO) pc : ioremap_page_range+0x370/0x3c8 lr : ioremap_page_range+0x40/0x3c8 sp : ffff000008da39e0 x29: ffff000008da39e0 x28: 00e8000000000f07 x27: ffff7dfffee00000 x26: 0140000000000000 x25: ffff7dfffef00000 x24: 00000000000fe100 x23: ffff80007b906000 x22: ffff000008ab8000 x21: ffff000008bb1d58 x20: ffff7dfffef00000 x19: ffff800009c30fb8 x18: 0000000000000001 x17: 00000000000152d0 x16: 00000000014012d0 x15: 0000000000000000 x14: 0720072007200720 x13: 0720072007200720 x12: 0720072007200720 x11: 0720072007300730 x10: 00000000000000ae x9 : 0000000000000000 x8 : ffff7dffff000000 x7 : 0000000000000000 x6 : 0000000000000100 x5 : 0000000000000000 x4 : 000000007b906000 x3 : ffff80007c61a880 x2 : ffff7dfffeefffff x1 : 0000000040000000 x0 : 00e80000fe100f07 Process kworker/0:1 (pid: 39, stack limit = 0x (ptrval)) Call trace: ioremap_page_range+0x370/0x3c8 pci_remap_iospace+0x7c/0xac pci_parse_request_of_pci_ranges+0x13c/0x190 rcar_pcie_probe+0x4c/0xb04 platform_drv_probe+0x50/0xbc driver_probe_device+0x21c/0x308 __device_attach_driver+0x98/0xc8 bus_for_each_drv+0x54/0x94 __device_attach+0xc4/0x12c device_initial_probe+0x10/0x18 bus_probe_device+0x90/0x98 deferred_probe_work_func+0xb0/0x150 process_one_work+0x12c/0x29c worker_thread+0x200/0x3fc kthread+0x108/0x134 ret_from_fork+0x10/0x18 Code: f9004ba2 54000080 aa0003fb 17ffff48 (d4210000) It turned out that pci_remap_iospace() wasn't undone when the driver's probe failed, and since devm_phy_optional_get() returned -EPROBE_DEFER, the probe was retried, finally causing the BUG due to trying to remap already remapped pages. Introduce the devm_pci_remap_iospace() managed API and replace the pci_remap_iospace() call with it to fix the bug. Fixes: dbf9826d5797 ("PCI: generic: Convert to DT resource parsing API") Signed-off-by: Sergei Shtylyov [lorenzo.pieralisi@arm.com: split commit/updated the commit log] Signed-off-by: Lorenzo Pieralisi Signed-off-by: Bjorn Helgaas Reviewed-by: Linus Walleij Signed-off-by: Greg Kroah-Hartman --- drivers/pci/host/pci-host-common.c | 2 +- drivers/pci/host/pcie-rcar.c | 2 +- drivers/pci/pci.c | 38 ++++++++++++++++++++++++++++++ include/linux/pci.h | 2 ++ 4 files changed, 42 insertions(+), 2 deletions(-) diff --git a/drivers/pci/host/pci-host-common.c b/drivers/pci/host/pci-host-common.c index e3c48b5deb93..5c90d7be2184 100644 --- a/drivers/pci/host/pci-host-common.c +++ b/drivers/pci/host/pci-host-common.c @@ -45,7 +45,7 @@ static int gen_pci_parse_request_of_pci_ranges(struct device *dev, switch (resource_type(res)) { case IORESOURCE_IO: - err = pci_remap_iospace(res, iobase); + err = devm_pci_remap_iospace(dev, res, iobase); if (err) { dev_warn(dev, "error %d: failed to map resource %pR\n", err, res); diff --git a/drivers/pci/host/pcie-rcar.c b/drivers/pci/host/pcie-rcar.c index 62700d1896f4..d6196f7b1d58 100644 --- a/drivers/pci/host/pcie-rcar.c +++ b/drivers/pci/host/pcie-rcar.c @@ -1102,7 +1102,7 @@ static int rcar_pcie_parse_request_of_pci_ranges(struct rcar_pcie *pci) struct resource *res = win->res; if (resource_type(res) == IORESOURCE_IO) { - err = pci_remap_iospace(res, iobase); + err = devm_pci_remap_iospace(dev, res, iobase); if (err) { dev_warn(dev, "error %d: failed to map resource %pR\n", err, res); diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 9c13aeeeb973..6b3c5c4cbb37 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -3407,6 +3407,44 @@ void pci_unmap_iospace(struct resource *res) #endif } +static void devm_pci_unmap_iospace(struct device *dev, void *ptr) +{ + struct resource **res = ptr; + + pci_unmap_iospace(*res); +} + +/** + * devm_pci_remap_iospace - Managed pci_remap_iospace() + * @dev: Generic device to remap IO address for + * @res: Resource describing the I/O space + * @phys_addr: physical address of range to be mapped + * + * Managed pci_remap_iospace(). Map is automatically unmapped on driver + * detach. + */ +int devm_pci_remap_iospace(struct device *dev, const struct resource *res, + phys_addr_t phys_addr) +{ + const struct resource **ptr; + int error; + + ptr = devres_alloc(devm_pci_unmap_iospace, sizeof(*ptr), GFP_KERNEL); + if (!ptr) + return -ENOMEM; + + error = pci_remap_iospace(res, phys_addr); + if (error) { + devres_free(ptr); + } else { + *ptr = res; + devres_add(dev, ptr); + } + + return error; +} +EXPORT_SYMBOL(devm_pci_remap_iospace); + static void __pci_set_master(struct pci_dev *dev, bool enable) { u16 old_cmd, cmd; diff --git a/include/linux/pci.h b/include/linux/pci.h index 36522905685b..78c9f4a91d94 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1190,6 +1190,8 @@ int pci_register_io_range(phys_addr_t addr, resource_size_t size); unsigned long pci_address_to_pio(phys_addr_t addr); phys_addr_t pci_pio_to_address(unsigned long pio); int pci_remap_iospace(const struct resource *res, phys_addr_t phys_addr); +int devm_pci_remap_iospace(struct device *dev, const struct resource *res, + phys_addr_t phys_addr); void pci_unmap_iospace(struct resource *res); static inline pci_bus_addr_t pci_bus_address(struct pci_dev *pdev, int bar) From 90788ea4ff753732b6958a628914094e5bd93670 Mon Sep 17 00:00:00 2001 From: Sergei Shtylyov Date: Wed, 18 Jul 2018 15:40:40 -0500 Subject: [PATCH 779/970] PCI: versatile: Fix I/O space page leak [ Upstream commit 0018b265adf7e251f90d3ca1c7c0e32e2a0ad262 ] When testing the R-Car PCIe driver on the Condor board, if the PCIe PHY driver was left disabled, the kernel crashed with this BUG: kernel BUG at lib/ioremap.c:72! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 39 Comm: kworker/0:1 Not tainted 4.17.0-dirty #1092 Hardware name: Renesas Condor board based on r8a77980 (DT) Workqueue: events deferred_probe_work_func pstate: 80000005 (Nzcv daif -PAN -UAO) pc : ioremap_page_range+0x370/0x3c8 lr : ioremap_page_range+0x40/0x3c8 sp : ffff000008da39e0 x29: ffff000008da39e0 x28: 00e8000000000f07 x27: ffff7dfffee00000 x26: 0140000000000000 x25: ffff7dfffef00000 x24: 00000000000fe100 x23: ffff80007b906000 x22: ffff000008ab8000 x21: ffff000008bb1d58 x20: ffff7dfffef00000 x19: ffff800009c30fb8 x18: 0000000000000001 x17: 00000000000152d0 x16: 00000000014012d0 x15: 0000000000000000 x14: 0720072007200720 x13: 0720072007200720 x12: 0720072007200720 x11: 0720072007300730 x10: 00000000000000ae x9 : 0000000000000000 x8 : ffff7dffff000000 x7 : 0000000000000000 x6 : 0000000000000100 x5 : 0000000000000000 x4 : 000000007b906000 x3 : ffff80007c61a880 x2 : ffff7dfffeefffff x1 : 0000000040000000 x0 : 00e80000fe100f07 Process kworker/0:1 (pid: 39, stack limit = 0x (ptrval)) Call trace: ioremap_page_range+0x370/0x3c8 pci_remap_iospace+0x7c/0xac pci_parse_request_of_pci_ranges+0x13c/0x190 rcar_pcie_probe+0x4c/0xb04 platform_drv_probe+0x50/0xbc driver_probe_device+0x21c/0x308 __device_attach_driver+0x98/0xc8 bus_for_each_drv+0x54/0x94 __device_attach+0xc4/0x12c device_initial_probe+0x10/0x18 bus_probe_device+0x90/0x98 deferred_probe_work_func+0xb0/0x150 process_one_work+0x12c/0x29c worker_thread+0x200/0x3fc kthread+0x108/0x134 ret_from_fork+0x10/0x18 Code: f9004ba2 54000080 aa0003fb 17ffff48 (d4210000) It turned out that pci_remap_iospace() wasn't undone when the driver's probe failed, and since devm_phy_optional_get() returned -EPROBE_DEFER, the probe was retried, finally causing the BUG due to trying to remap already remapped pages. The Versatile PCI controller driver has the same issue. Replace pci_remap_iospace() with the devm_ managed version to fix the bug. Fixes: b7e78170efd4 ("PCI: versatile: Add DT-based ARM Versatile PB PCIe host driver") Signed-off-by: Sergei Shtylyov [lorenzo.pieralisi@arm.com: updated the commit log] Signed-off-by: Lorenzo Pieralisi Signed-off-by: Bjorn Helgaas Reviewed-by: Linus Walleij Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/pci/host/pci-versatile.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/pci/host/pci-versatile.c b/drivers/pci/host/pci-versatile.c index b7dc07002f13..4096cce0ef0e 100644 --- a/drivers/pci/host/pci-versatile.c +++ b/drivers/pci/host/pci-versatile.c @@ -89,7 +89,7 @@ static int versatile_pci_parse_request_of_pci_ranges(struct device *dev, switch (resource_type(res)) { case IORESOURCE_IO: - err = pci_remap_iospace(res, iobase); + err = devm_pci_remap_iospace(dev, res, iobase); if (err) { dev_warn(dev, "error %d: failed to map resource %pR\n", err, res); From 167e93c3e131b29a117a13719d052dbe752290ae Mon Sep 17 00:00:00 2001 From: Stefan Wahren Date: Wed, 18 Jul 2018 08:31:43 +0200 Subject: [PATCH 780/970] net: qca_spi: Avoid packet drop during initial sync [ Upstream commit b2bab426dc715de147f8039a3fccff27d795f4eb ] As long as the synchronization with the QCA7000 isn't finished, we cannot accept packets from the upper layers. So let the SPI thread enable the TX queue after sync and avoid unwanted packet drop. Signed-off-by: Stefan Wahren Fixes: 291ab06ecf67 ("net: qualcomm: new Ethernet over SPI driver for QCA7000") Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/qualcomm/qca_spi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/qualcomm/qca_spi.c b/drivers/net/ethernet/qualcomm/qca_spi.c index 8bbb55f31909..d8ec8194f99d 100644 --- a/drivers/net/ethernet/qualcomm/qca_spi.c +++ b/drivers/net/ethernet/qualcomm/qca_spi.c @@ -635,7 +635,7 @@ qcaspi_netdev_open(struct net_device *dev) return ret; } - netif_start_queue(qca->net_dev); + /* SPI thread takes care of TX queue */ return 0; } From c8697ad8ec0e7b08a61efee60b4359e533dc038a Mon Sep 17 00:00:00 2001 From: Stefan Wahren Date: Wed, 18 Jul 2018 08:31:44 +0200 Subject: [PATCH 781/970] net: qca_spi: Make sure the QCA7000 reset is triggered [ Upstream commit 711c62dfa6bdb4326ca6c587f295ea5c4f7269de ] In case the SPI thread is not running, a simple reset of sync state won't fix the transmit timeout. We also need to wake up the kernel thread. Signed-off-by: Stefan Wahren Fixes: ed7d42e24eff ("net: qca_spi: fix transmit queue timeout handling") Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/qualcomm/qca_spi.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/ethernet/qualcomm/qca_spi.c b/drivers/net/ethernet/qualcomm/qca_spi.c index d8ec8194f99d..837134f9f921 100644 --- a/drivers/net/ethernet/qualcomm/qca_spi.c +++ b/drivers/net/ethernet/qualcomm/qca_spi.c @@ -739,6 +739,9 @@ qcaspi_netdev_tx_timeout(struct net_device *dev) qca->net_dev->stats.tx_errors++; /* Trigger tx queue flush and QCA7000 reset */ qca->sync = QCASPI_SYNC_UNKNOWN; + + if (qca->spi_thread) + wake_up_process(qca->spi_thread); } static int From 1e15542511f37da441934cbe1c672ed68781833b Mon Sep 17 00:00:00 2001 From: Stefan Wahren Date: Wed, 18 Jul 2018 08:31:45 +0200 Subject: [PATCH 782/970] net: qca_spi: Fix log level if probe fails [ Upstream commit 50973993260a6934f0a00da53d9b746cfbea89ab ] In cases the probing fails the log level of the messages should be an error. Signed-off-by: Stefan Wahren Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/qualcomm/qca_spi.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/qualcomm/qca_spi.c b/drivers/net/ethernet/qualcomm/qca_spi.c index 837134f9f921..21f546587e3d 100644 --- a/drivers/net/ethernet/qualcomm/qca_spi.c +++ b/drivers/net/ethernet/qualcomm/qca_spi.c @@ -868,22 +868,22 @@ qca_spi_probe(struct spi_device *spi) if ((qcaspi_clkspeed < QCASPI_CLK_SPEED_MIN) || (qcaspi_clkspeed > QCASPI_CLK_SPEED_MAX)) { - dev_info(&spi->dev, "Invalid clkspeed: %d\n", - qcaspi_clkspeed); + dev_err(&spi->dev, "Invalid clkspeed: %d\n", + qcaspi_clkspeed); return -EINVAL; } if ((qcaspi_burst_len < QCASPI_BURST_LEN_MIN) || (qcaspi_burst_len > QCASPI_BURST_LEN_MAX)) { - dev_info(&spi->dev, "Invalid burst len: %d\n", - qcaspi_burst_len); + dev_err(&spi->dev, "Invalid burst len: %d\n", + qcaspi_burst_len); return -EINVAL; } if ((qcaspi_pluggable < QCASPI_PLUGGABLE_MIN) || (qcaspi_pluggable > QCASPI_PLUGGABLE_MAX)) { - dev_info(&spi->dev, "Invalid pluggable: %d\n", - qcaspi_pluggable); + dev_err(&spi->dev, "Invalid pluggable: %d\n", + qcaspi_pluggable); return -EINVAL; } @@ -944,8 +944,8 @@ qca_spi_probe(struct spi_device *spi) } if (register_netdev(qcaspi_devs)) { - dev_info(&spi->dev, "Unable to register net device %s\n", - qcaspi_devs->name); + dev_err(&spi->dev, "Unable to register net device %s\n", + qcaspi_devs->name); free_netdev(qcaspi_devs); return -EFAULT; } From e63303e63cdcc3ae6fd51c39e3400f58c43fee55 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Tue, 17 Jul 2018 18:27:45 -0700 Subject: [PATCH 783/970] tcp: identify cryptic messages as TCP seq # bugs MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit e56b8ce363a36fb7b74b80aaa5cc9084f2c908b4 ] Attempt to make cryptic TCP seq number error messages clearer by (1) identifying the source of the message as "TCP", (2) identifying the errors as "seq # bug", and (3) grouping the field identifiers and values by separating them with commas. E.g., the following message is changed from: recvmsg bug 2: copied 73BCB6CD seq 70F17CBE rcvnxt 73BCB9AA fl 0 WARNING: CPU: 2 PID: 1501 at /linux/net/ipv4/tcp.c:1881 tcp_recvmsg+0x649/0xb90 to: TCP recvmsg seq # bug 2: copied 73BCB6CD, seq 70F17CBE, rcvnxt 73BCB9AA, fl 0 WARNING: CPU: 2 PID: 1501 at /linux/net/ipv4/tcp.c:2011 tcp_recvmsg+0x694/0xba0 Suggested-by: 積丹尼 Dan Jacobson Signed-off-by: Randy Dunlap Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/ipv4/tcp.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 8f15eae3325b..9de77d946f5a 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1696,7 +1696,7 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock, * shouldn't happen. */ if (WARN(before(*seq, TCP_SKB_CB(skb)->seq), - "recvmsg bug: copied %X seq %X rcvnxt %X fl %X\n", + "TCP recvmsg seq # bug: copied %X, seq %X, rcvnxt %X, fl %X\n", *seq, TCP_SKB_CB(skb)->seq, tp->rcv_nxt, flags)) break; @@ -1711,7 +1711,7 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock, if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) goto found_fin_ok; WARN(!(flags & MSG_PEEK), - "recvmsg bug 2: copied %X seq %X rcvnxt %X fl %X\n", + "TCP recvmsg seq # bug 2: copied %X, seq %X, rcvnxt %X, fl %X\n", *seq, TCP_SKB_CB(skb)->seq, tp->rcv_nxt, flags); } From 1cd0c7d732b9e59449ef57daf245bbfafa956601 Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Mon, 28 May 2018 13:31:13 +0200 Subject: [PATCH 784/970] KVM: irqfd: fix race between EPOLLHUP and irq_bypass_register_consumer commit 9432a3175770e06cb83eada2d91fac90c977cb99 upstream. A comment warning against this bug is there, but the code is not doing what the comment says. Therefore it is possible that an EPOLLHUP races against irq_bypass_register_consumer. The EPOLLHUP handler schedules irqfd_shutdown, and if that runs soon enough, you get a use-after-free. Reported-by: syzbot Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini Reviewed-by: David Hildenbrand Signed-off-by: Sudip Mukherjee Signed-off-by: Greg Kroah-Hartman --- virt/kvm/eventfd.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 3f24eb1e8554..16e17ea06e1e 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -405,11 +405,6 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args) if (events & POLLIN) schedule_work(&irqfd->inject); - /* - * do not drop the file until the irqfd is fully initialized, otherwise - * we might race against the POLLHUP - */ - fdput(f); #ifdef CONFIG_HAVE_KVM_IRQ_BYPASS if (kvm_arch_has_irq_bypass()) { irqfd->consumer.token = (void *)irqfd->eventfd; @@ -425,6 +420,12 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args) #endif srcu_read_unlock(&kvm->irq_srcu, idx); + + /* + * do not drop the file until the irqfd is fully initialized, otherwise + * we might race against the POLLHUP + */ + fdput(f); return 0; fail: From 51ada11083605de581b3c29212b5641d18cc8fcb Mon Sep 17 00:00:00 2001 From: Jeremy Cline Date: Thu, 2 Aug 2018 00:03:40 -0400 Subject: [PATCH 785/970] ext4: fix spectre gadget in ext4_mb_regular_allocator() commit 1a5d5e5d51e75a5bca67dadbcea8c841934b7b85 upstream. 'ac->ac_g_ex.fe_len' is a user-controlled value which is used in the derivation of 'ac->ac_2order'. 'ac->ac_2order', in turn, is used to index arrays which makes it a potential spectre gadget. Fix this by sanitizing the value assigned to 'ac->ac2_order'. This covers the following accesses found with the help of smatch: * fs/ext4/mballoc.c:1896 ext4_mb_simple_scan_group() warn: potential spectre issue 'grp->bb_counters' [w] (local cap) * fs/ext4/mballoc.c:445 mb_find_buddy() warn: potential spectre issue 'EXT4_SB(e4b->bd_sb)->s_mb_offsets' [r] (local cap) * fs/ext4/mballoc.c:446 mb_find_buddy() warn: potential spectre issue 'EXT4_SB(e4b->bd_sb)->s_mb_maxs' [r] (local cap) Suggested-by: Josh Poimboeuf Signed-off-by: Jeremy Cline Signed-off-by: Theodore Ts'o Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- fs/ext4/mballoc.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 53e1890660a2..a49d0e5d7baf 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -26,6 +26,7 @@ #include #include #include +#include #include #include @@ -2144,7 +2145,8 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac) * This should tell if fe_len is exactly power of 2 */ if ((ac->ac_g_ex.fe_len & (~(1 << (i - 1)))) == 0) - ac->ac_2order = i - 1; + ac->ac_2order = array_index_nospec(i - 1, + sb->s_blocksize_bits + 2); } /* if stream allocation is enabled, use global goal */ From 8725807e91e293552a130acb120f316f6f28938a Mon Sep 17 00:00:00 2001 From: John David Anglin Date: Sun, 12 Aug 2018 16:38:03 -0400 Subject: [PATCH 786/970] parisc: Remove ordered stores from syscall.S MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 7797167ffde1f00446301cb22b37b7c03194cfaf upstream. Now that we use a sync prior to releasing the locks in syscall.S, we don't need the PA 2.0 ordered stores used to release some locks.  Using an ordered store, potentially slows the release and subsequent code. There are a number of other ordered stores and loads that serve no purpose.  I have converted these to normal stores. Signed-off-by: John David Anglin Cc: stable@vger.kernel.org # 4.0+ Signed-off-by: Helge Deller Signed-off-by: Greg Kroah-Hartman --- arch/parisc/kernel/syscall.S | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/arch/parisc/kernel/syscall.S b/arch/parisc/kernel/syscall.S index 4886a6db42e9..5f7e57fcaeef 100644 --- a/arch/parisc/kernel/syscall.S +++ b/arch/parisc/kernel/syscall.S @@ -629,12 +629,12 @@ cas_action: stw %r1, 4(%sr2,%r20) #endif /* The load and store could fail */ -1: ldw,ma 0(%r26), %r28 +1: ldw 0(%r26), %r28 sub,<> %r28, %r25, %r0 -2: stw,ma %r24, 0(%r26) +2: stw %r24, 0(%r26) /* Free lock */ sync - stw,ma %r20, 0(%sr2,%r20) + stw %r20, 0(%sr2,%r20) #if ENABLE_LWS_DEBUG /* Clear thread register indicator */ stw %r0, 4(%sr2,%r20) @@ -798,30 +798,30 @@ cas2_action: ldo 1(%r0),%r28 /* 8bit CAS */ -13: ldb,ma 0(%r26), %r29 +13: ldb 0(%r26), %r29 sub,= %r29, %r25, %r0 b,n cas2_end -14: stb,ma %r24, 0(%r26) +14: stb %r24, 0(%r26) b cas2_end copy %r0, %r28 nop nop /* 16bit CAS */ -15: ldh,ma 0(%r26), %r29 +15: ldh 0(%r26), %r29 sub,= %r29, %r25, %r0 b,n cas2_end -16: sth,ma %r24, 0(%r26) +16: sth %r24, 0(%r26) b cas2_end copy %r0, %r28 nop nop /* 32bit CAS */ -17: ldw,ma 0(%r26), %r29 +17: ldw 0(%r26), %r29 sub,= %r29, %r25, %r0 b,n cas2_end -18: stw,ma %r24, 0(%r26) +18: stw %r24, 0(%r26) b cas2_end copy %r0, %r28 nop @@ -829,10 +829,10 @@ cas2_action: /* 64bit CAS */ #ifdef CONFIG_64BIT -19: ldd,ma 0(%r26), %r29 +19: ldd 0(%r26), %r29 sub,*= %r29, %r25, %r0 b,n cas2_end -20: std,ma %r24, 0(%r26) +20: std %r24, 0(%r26) copy %r0, %r28 #else /* Compare first word */ @@ -851,7 +851,7 @@ cas2_action: cas2_end: /* Free lock */ sync - stw,ma %r20, 0(%sr2,%r20) + stw %r20, 0(%sr2,%r20) /* Enable interrupts */ ssm PSW_SM_I, %r0 /* Return to userspace, set no error */ From 2038a9e1c7f037bf811111b552487c3a39501a38 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 18 Jun 2018 21:35:07 -0700 Subject: [PATCH 787/970] xfrm_user: prevent leaking 2 bytes of kernel memory commit 45c180bc29babbedd6b8c01b975780ef44d9d09c upstream. struct xfrm_userpolicy_type has two holes, so we should not use C99 style initializer. KMSAN report: BUG: KMSAN: kernel-infoleak in copyout lib/iov_iter.c:140 [inline] BUG: KMSAN: kernel-infoleak in _copy_to_iter+0x1b14/0x2800 lib/iov_iter.c:571 CPU: 1 PID: 4520 Comm: syz-executor841 Not tainted 4.17.0+ #5 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:113 kmsan_report+0x188/0x2a0 mm/kmsan/kmsan.c:1117 kmsan_internal_check_memory+0x138/0x1f0 mm/kmsan/kmsan.c:1211 kmsan_copy_to_user+0x7a/0x160 mm/kmsan/kmsan.c:1253 copyout lib/iov_iter.c:140 [inline] _copy_to_iter+0x1b14/0x2800 lib/iov_iter.c:571 copy_to_iter include/linux/uio.h:106 [inline] skb_copy_datagram_iter+0x422/0xfa0 net/core/datagram.c:431 skb_copy_datagram_msg include/linux/skbuff.h:3268 [inline] netlink_recvmsg+0x6f1/0x1900 net/netlink/af_netlink.c:1959 sock_recvmsg_nosec net/socket.c:802 [inline] sock_recvmsg+0x1d6/0x230 net/socket.c:809 ___sys_recvmsg+0x3fe/0x810 net/socket.c:2279 __sys_recvmmsg+0x58e/0xe30 net/socket.c:2391 do_sys_recvmmsg+0x2a6/0x3e0 net/socket.c:2472 __do_sys_recvmmsg net/socket.c:2485 [inline] __se_sys_recvmmsg net/socket.c:2481 [inline] __x64_sys_recvmmsg+0x15d/0x1c0 net/socket.c:2481 do_syscall_64+0x15b/0x230 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x446ce9 RSP: 002b:00007fc307918db8 EFLAGS: 00000293 ORIG_RAX: 000000000000012b RAX: ffffffffffffffda RBX: 00000000006dbc24 RCX: 0000000000446ce9 RDX: 000000000000000a RSI: 0000000020005040 RDI: 0000000000000003 RBP: 00000000006dbc20 R08: 0000000020004e40 R09: 0000000000000000 R10: 0000000040000000 R11: 0000000000000293 R12: 0000000000000000 R13: 00007ffc8d2df32f R14: 00007fc3079199c0 R15: 0000000000000001 Uninit was stored to memory at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline] kmsan_save_stack mm/kmsan/kmsan.c:294 [inline] kmsan_internal_chain_origin+0x12b/0x210 mm/kmsan/kmsan.c:685 kmsan_memcpy_origins+0x11d/0x170 mm/kmsan/kmsan.c:527 __msan_memcpy+0x109/0x160 mm/kmsan/kmsan_instr.c:413 __nla_put lib/nlattr.c:569 [inline] nla_put+0x276/0x340 lib/nlattr.c:627 copy_to_user_policy_type net/xfrm/xfrm_user.c:1678 [inline] dump_one_policy+0xbe1/0x1090 net/xfrm/xfrm_user.c:1708 xfrm_policy_walk+0x45a/0xd00 net/xfrm/xfrm_policy.c:1013 xfrm_dump_policy+0x1c0/0x2a0 net/xfrm/xfrm_user.c:1749 netlink_dump+0x9b5/0x1550 net/netlink/af_netlink.c:2226 __netlink_dump_start+0x1131/0x1270 net/netlink/af_netlink.c:2323 netlink_dump_start include/linux/netlink.h:214 [inline] xfrm_user_rcv_msg+0x8a3/0x9b0 net/xfrm/xfrm_user.c:2577 netlink_rcv_skb+0x37e/0x600 net/netlink/af_netlink.c:2448 xfrm_netlink_rcv+0xb2/0xf0 net/xfrm/xfrm_user.c:2598 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline] netlink_unicast+0x1680/0x1750 net/netlink/af_netlink.c:1336 netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg net/socket.c:639 [inline] ___sys_sendmsg+0xec8/0x1320 net/socket.c:2117 __sys_sendmsg net/socket.c:2155 [inline] __do_sys_sendmsg net/socket.c:2164 [inline] __se_sys_sendmsg net/socket.c:2162 [inline] __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162 do_syscall_64+0x15b/0x230 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Local variable description: ----upt.i@dump_one_policy Variable was created at: dump_one_policy+0x78/0x1090 net/xfrm/xfrm_user.c:1689 xfrm_policy_walk+0x45a/0xd00 net/xfrm/xfrm_policy.c:1013 Byte 130 of 137 is uninitialized Memory access starts at ffff88019550407f Fixes: c0144beaeca42 ("[XFRM] netlink: Use nla_put()/NLA_PUT() variantes") Signed-off-by: Eric Dumazet Reported-by: syzbot Cc: Steffen Klassert Cc: Herbert Xu Signed-off-by: Steffen Klassert Signed-off-by: Greg Kroah-Hartman --- net/xfrm/xfrm_user.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c index 6a029358bfd1..bb61956c0f9c 100644 --- a/net/xfrm/xfrm_user.c +++ b/net/xfrm/xfrm_user.c @@ -1628,9 +1628,11 @@ static inline size_t userpolicy_type_attrsize(void) #ifdef CONFIG_XFRM_SUB_POLICY static int copy_to_user_policy_type(u8 type, struct sk_buff *skb) { - struct xfrm_userpolicy_type upt = { - .type = type, - }; + struct xfrm_userpolicy_type upt; + + /* Sadly there are two holes in struct xfrm_userpolicy_type */ + memset(&upt, 0, sizeof(upt)); + upt.type = type; return nla_put(skb, XFRMA_POLICY_TYPE, sizeof(upt), &upt); } From f29eb8ee47957892155c8b7ad33abe29f50bc4e4 Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Tue, 17 Jul 2018 21:03:15 +0200 Subject: [PATCH 788/970] netfilter: conntrack: dccp: treat SYNC/SYNCACK as invalid if no prior state commit 6613b6173dee098997229caf1f3b961c49da75e6 upstream. When first DCCP packet is SYNC or SYNCACK, we insert a new conntrack that has an un-initialized timeout value, i.e. such entry could be reaped at any time. Mark them as INVALID and only ignore SYNC/SYNCACK when connection had an old state. Reported-by: syzbot+6f18401420df260e37ed@syzkaller.appspotmail.com Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman --- net/netfilter/nf_conntrack_proto_dccp.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/net/netfilter/nf_conntrack_proto_dccp.c b/net/netfilter/nf_conntrack_proto_dccp.c index a45bee52dccc..d5560ae8c4b3 100644 --- a/net/netfilter/nf_conntrack_proto_dccp.c +++ b/net/netfilter/nf_conntrack_proto_dccp.c @@ -244,14 +244,14 @@ dccp_state_table[CT_DCCP_ROLE_MAX + 1][DCCP_PKT_SYNCACK + 1][CT_DCCP_MAX + 1] = * We currently ignore Sync packets * * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */ - sIG, sIG, sIG, sIG, sIG, sIG, sIG, sIG, + sIV, sIG, sIG, sIG, sIG, sIG, sIG, sIG, }, [DCCP_PKT_SYNCACK] = { /* * We currently ignore SyncAck packets * * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */ - sIG, sIG, sIG, sIG, sIG, sIG, sIG, sIG, + sIV, sIG, sIG, sIG, sIG, sIG, sIG, sIG, }, }, [CT_DCCP_ROLE_SERVER] = { @@ -372,14 +372,14 @@ dccp_state_table[CT_DCCP_ROLE_MAX + 1][DCCP_PKT_SYNCACK + 1][CT_DCCP_MAX + 1] = * We currently ignore Sync packets * * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */ - sIG, sIG, sIG, sIG, sIG, sIG, sIG, sIG, + sIV, sIG, sIG, sIG, sIG, sIG, sIG, sIG, }, [DCCP_PKT_SYNCACK] = { /* * We currently ignore SyncAck packets * * sNO, sRQ, sRS, sPO, sOP, sCR, sCG, sTW */ - sIG, sIG, sIG, sIG, sIG, sIG, sIG, sIG, + sIV, sIG, sIG, sIG, sIG, sIG, sIG, sIG, }, }, }; From 59629848737a9b3a1ff461e3ece1ec3d3bf87ced Mon Sep 17 00:00:00 2001 From: Willem de Bruijn Date: Mon, 6 Aug 2018 10:38:34 -0400 Subject: [PATCH 789/970] packet: refine ring v3 block size test to hold one frame commit 4576cd469d980317c4edd9173f8b694aa71ea3a3 upstream. TPACKET_V3 stores variable length frames in fixed length blocks. Blocks must be able to store a block header, optional private space and at least one minimum sized frame. Frames, even for a zero snaplen packet, store metadata headers and optional reserved space. In the block size bounds check, ensure that the frame of the chosen configuration fits. This includes sockaddr_ll and optional tp_reserve. Syzbot was able to construct a ring with insuffient room for the sockaddr_ll in the header of a zero-length frame, triggering an out-of-bounds write in dev_parse_header. Convert the comparison to less than, as zero is a valid snap len. This matches the test for minimum tp_frame_size immediately below. Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.") Fixes: eb73190f4fbe ("net/packet: refine check for priv area size") Reported-by: syzbot Signed-off-by: Willem de Bruijn Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/packet/af_packet.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 61eec3e9ef66..24412e8f4061 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -4275,6 +4275,8 @@ static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u, } if (req->tp_block_nr) { + unsigned int min_frame_size; + /* Sanity tests and some calculations */ err = -EBUSY; if (unlikely(rb->pg_vec)) @@ -4297,12 +4299,12 @@ static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u, goto out; if (unlikely(!PAGE_ALIGNED(req->tp_block_size))) goto out; + min_frame_size = po->tp_hdrlen + po->tp_reserve; if (po->tp_version >= TPACKET_V3 && - req->tp_block_size <= - BLK_PLUS_PRIV((u64)req_u->req3.tp_sizeof_priv) + sizeof(struct tpacket3_hdr)) + req->tp_block_size < + BLK_PLUS_PRIV((u64)req_u->req3.tp_sizeof_priv) + min_frame_size) goto out; - if (unlikely(req->tp_frame_size < po->tp_hdrlen + - po->tp_reserve)) + if (unlikely(req->tp_frame_size < min_frame_size)) goto out; if (unlikely(req->tp_frame_size & (TPACKET_ALIGNMENT - 1))) goto out; From eba0611e98f117f034efa6ad7ff488892c33e601 Mon Sep 17 00:00:00 2001 From: John David Anglin Date: Sun, 12 Aug 2018 16:31:17 -0400 Subject: [PATCH 790/970] parisc: Remove unnecessary barriers from spinlock.h MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 3b885ac1dc35b87a39ee176a6c7e2af9c789d8b8 upstream. Now that mb() is an instruction barrier, it will slow performance if we issue unnecessary barriers. The spinlock defines have a number of unnecessary barriers.  The __ldcw() define is both a hardware and compiler barrier.  The mb() barriers in the routines using __ldcw() serve no purpose. The only barrier needed is the one in arch_spin_unlock().  We need to ensure all accesses are complete prior to releasing the lock. Signed-off-by: John David Anglin Cc: stable@vger.kernel.org # 4.0+ Signed-off-by: Helge Deller Signed-off-by: Greg Kroah-Hartman --- arch/parisc/include/asm/spinlock.h | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/arch/parisc/include/asm/spinlock.h b/arch/parisc/include/asm/spinlock.h index e32936cd7f10..7031483b110c 100644 --- a/arch/parisc/include/asm/spinlock.h +++ b/arch/parisc/include/asm/spinlock.h @@ -26,7 +26,6 @@ static inline void arch_spin_lock_flags(arch_spinlock_t *x, { volatile unsigned int *a; - mb(); a = __ldcw_align(x); while (__ldcw(a) == 0) while (*a == 0) @@ -36,16 +35,15 @@ static inline void arch_spin_lock_flags(arch_spinlock_t *x, local_irq_disable(); } else cpu_relax(); - mb(); } static inline void arch_spin_unlock(arch_spinlock_t *x) { volatile unsigned int *a; - mb(); + a = __ldcw_align(x); - *a = 1; mb(); + *a = 1; } static inline int arch_spin_trylock(arch_spinlock_t *x) @@ -53,10 +51,8 @@ static inline int arch_spin_trylock(arch_spinlock_t *x) volatile unsigned int *a; int ret; - mb(); a = __ldcw_align(x); ret = __ldcw(a) != 0; - mb(); return ret; } From f2842452e21d088ca997a9eb26669023812402a8 Mon Sep 17 00:00:00 2001 From: Lukas Wunner Date: Thu, 19 Jul 2018 17:27:31 -0500 Subject: [PATCH 791/970] PCI: hotplug: Don't leak pci_slot on registration failure commit 4ce6435820d1f1cc2c2788e232735eb244bcc8a3 upstream. If addition of sysfs files fails on registration of a hotplug slot, the struct pci_slot as well as the entry in the slot_list is leaked. The issue has been present since the hotplug core was introduced in 2002: https://git.kernel.org/tglx/history/c/a8a2069f432c Perhaps the idea was that even though sysfs addition fails, the slot should still be usable. But that's not how drivers use the interface, they abort probe if a non-zero value is returned. Signed-off-by: Lukas Wunner Signed-off-by: Bjorn Helgaas Cc: stable@vger.kernel.org # v2.4.15+ Cc: Greg Kroah-Hartman Signed-off-by: Greg Kroah-Hartman --- drivers/pci/hotplug/pci_hotplug_core.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/pci/hotplug/pci_hotplug_core.c b/drivers/pci/hotplug/pci_hotplug_core.c index fea0b8b33589..0a3b3f72c94b 100644 --- a/drivers/pci/hotplug/pci_hotplug_core.c +++ b/drivers/pci/hotplug/pci_hotplug_core.c @@ -455,8 +455,17 @@ int __pci_hp_register(struct hotplug_slot *slot, struct pci_bus *bus, list_add(&slot->slot_list, &pci_hotplug_slot_list); result = fs_add_slot(pci_slot); + if (result) + goto err_list_del; + kobject_uevent(&pci_slot->kobj, KOBJ_ADD); dbg("Added slot %s to the list\n", name); + goto out; + +err_list_del: + list_del(&slot->slot_list); + pci_slot->hotplug = NULL; + pci_destroy_slot(pci_slot); out: mutex_unlock(&pci_hp_mutex); return result; From 73aae596d106859e1e297ff9a106a8222721b4bd Mon Sep 17 00:00:00 2001 From: Myron Stowe Date: Mon, 13 Aug 2018 12:19:39 -0600 Subject: [PATCH 792/970] PCI: Skip MPS logic for Virtual Functions (VFs) commit 3dbe97efe8bf450b183d6dee2305cbc032e6b8a4 upstream. PCIe r4.0, sec 9.3.5.4, "Device Control Register", shows both Max_Payload_Size (MPS) and Max_Read_request_Size (MRRS) to be 'RsvdP' for VFs. Just prior to the table it states: "PF and VF functionality is defined in Section 7.5.3.4 except where noted in Table 9-16. For VF fields marked 'RsvdP', the PF setting applies to the VF." All of which implies that with respect to Max_Payload_Size Supported (MPSS), MPS, and MRRS values, we should not be paying any attention to the VF's fields, but rather only to the PF's. Only looking at the PF's fields also logically makes sense as it's the sole physical interface to the PCIe bus. Link: https://bugzilla.kernel.org/show_bug.cgi?id=200527 Fixes: 27d868b5e6cf ("PCI: Set MPS to match upstream bridge") Signed-off-by: Myron Stowe Signed-off-by: Bjorn Helgaas Cc: stable@vger.kernel.org # 4.3+ Cc: Keith Busch Cc: Sinan Kaya Cc: Dongdong Liu Cc: Jon Mason Signed-off-by: Greg Kroah-Hartman --- drivers/pci/probe.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 56340abe4fc6..16611cf3aba4 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -1363,6 +1363,10 @@ static void pci_configure_mps(struct pci_dev *dev) if (!pci_is_pcie(dev) || !bridge || !pci_is_pcie(bridge)) return; + /* MPS and MRRS fields are of type 'RsvdP' for VFs, short-circuit out */ + if (dev->is_virtfn) + return; + mps = pcie_get_mps(dev); p_mps = pcie_get_mps(bridge); From 3fcdcdd50c1f9e60bbaab05f9a42079e481c2454 Mon Sep 17 00:00:00 2001 From: Lukas Wunner Date: Thu, 19 Jul 2018 17:27:32 -0500 Subject: [PATCH 793/970] PCI: pciehp: Fix use-after-free on unplug commit 281e878eab191cce4259abbbf1a0322e3adae02c upstream. When pciehp is unbound (e.g. on unplug of a Thunderbolt device), the hotplug_slot struct is deregistered and thus freed before freeing the IRQ. The IRQ handler and the work items it schedules print the slot name referenced from the freed structure in various informational and debug log messages, each time resulting in a quadruple dereference of freed pointers (hotplug_slot -> pci_slot -> kobject -> name). At best the slot name is logged as "(null)", at worst kernel memory is exposed in logs or the driver crashes: pciehp 0000:10:00.0:pcie204: Slot((null)): Card not present An attacker may provoke the bug by unplugging multiple devices on a Thunderbolt daisy chain at once. Unplugging can also be simulated by powering down slots via sysfs. The bug is particularly easy to trigger in poll mode. It has been present since the driver's introduction in 2004: https://git.kernel.org/tglx/history/c/c16b4b14d980 Fix by rearranging teardown such that the IRQ is freed first. Run the work items queued by the IRQ handler to completion before freeing the hotplug_slot struct by draining the work queue from the ->release_slot callback which is invoked by pci_hp_deregister(). Signed-off-by: Lukas Wunner Signed-off-by: Bjorn Helgaas Cc: stable@vger.kernel.org # v2.6.4 Signed-off-by: Greg Kroah-Hartman --- drivers/pci/hotplug/pciehp.h | 1 + drivers/pci/hotplug/pciehp_core.c | 7 +++++++ drivers/pci/hotplug/pciehp_hpc.c | 5 ++--- 3 files changed, 10 insertions(+), 3 deletions(-) diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h index 2bba8481beb1..56c0b60df25e 100644 --- a/drivers/pci/hotplug/pciehp.h +++ b/drivers/pci/hotplug/pciehp.h @@ -132,6 +132,7 @@ int pciehp_unconfigure_device(struct slot *p_slot); void pciehp_queue_pushbutton_work(struct work_struct *work); struct controller *pcie_init(struct pcie_device *dev); int pcie_init_notification(struct controller *ctrl); +void pcie_shutdown_notification(struct controller *ctrl); int pciehp_enable_slot(struct slot *p_slot); int pciehp_disable_slot(struct slot *p_slot); void pcie_reenable_notification(struct controller *ctrl); diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c index 6620b1095046..a7485bc1e975 100644 --- a/drivers/pci/hotplug/pciehp_core.c +++ b/drivers/pci/hotplug/pciehp_core.c @@ -76,6 +76,12 @@ static int reset_slot(struct hotplug_slot *slot, int probe); */ static void release_slot(struct hotplug_slot *hotplug_slot) { + struct slot *slot = hotplug_slot->private; + + /* queued work needs hotplug_slot name */ + cancel_delayed_work(&slot->work); + drain_workqueue(slot->wq); + kfree(hotplug_slot->ops); kfree(hotplug_slot->info); kfree(hotplug_slot); @@ -278,6 +284,7 @@ static void pciehp_remove(struct pcie_device *dev) { struct controller *ctrl = get_service_data(dev); + pcie_shutdown_notification(ctrl); cleanup_slot(ctrl); pciehp_release_ctrl(ctrl); } diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c index 8d811ea353c8..32b97881c7c5 100644 --- a/drivers/pci/hotplug/pciehp_hpc.c +++ b/drivers/pci/hotplug/pciehp_hpc.c @@ -786,7 +786,7 @@ int pcie_init_notification(struct controller *ctrl) return 0; } -static void pcie_shutdown_notification(struct controller *ctrl) +void pcie_shutdown_notification(struct controller *ctrl) { if (ctrl->notification_enabled) { pcie_disable_notification(ctrl); @@ -821,7 +821,7 @@ static int pcie_init_slot(struct controller *ctrl) static void pcie_cleanup_slot(struct controller *ctrl) { struct slot *slot = ctrl->slot; - cancel_delayed_work(&slot->work); + destroy_workqueue(slot->wq); kfree(slot); } @@ -902,7 +902,6 @@ struct controller *pcie_init(struct pcie_device *dev) void pciehp_release_ctrl(struct controller *ctrl) { - pcie_shutdown_notification(ctrl); pcie_cleanup_slot(ctrl); kfree(ctrl); } From 86a3d597235d7305e3acad62fe97dc59ec2db76f Mon Sep 17 00:00:00 2001 From: Lukas Wunner Date: Thu, 19 Jul 2018 17:27:34 -0500 Subject: [PATCH 794/970] PCI: pciehp: Fix unprotected list iteration in IRQ handler commit 1204e35bedf4e5015cda559ed8c84789a6dae24e upstream. Commit b440bde74f04 ("PCI: Add pci_ignore_hotplug() to ignore hotplug events for a device") iterates over the devices on a hotplug port's subordinate bus in pciehp's IRQ handler without acquiring pci_bus_sem. It is thus possible for a user to cause a crash by concurrently manipulating the device list, e.g. by disabling slot power via sysfs on a different CPU or by initiating a remove/rescan via sysfs. This can't be fixed by acquiring pci_bus_sem because it may sleep. The simplest fix is to avoid the list iteration altogether and just check the ignore_hotplug flag on the port itself. This works because pci_ignore_hotplug() sets the flag both on the device as well as on its parent bridge. We do lose the ability to print the name of the device blocking hotplug in the debug message, but that's probably bearable. Fixes: b440bde74f04 ("PCI: Add pci_ignore_hotplug() to ignore hotplug events for a device") Signed-off-by: Lukas Wunner Signed-off-by: Bjorn Helgaas Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- drivers/pci/hotplug/pciehp_hpc.c | 13 +++---------- 1 file changed, 3 insertions(+), 10 deletions(-) diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c index 32b97881c7c5..8b8b096167d7 100644 --- a/drivers/pci/hotplug/pciehp_hpc.c +++ b/drivers/pci/hotplug/pciehp_hpc.c @@ -562,8 +562,6 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id) { struct controller *ctrl = (struct controller *)dev_id; struct pci_dev *pdev = ctrl_dev(ctrl); - struct pci_bus *subordinate = pdev->subordinate; - struct pci_dev *dev; struct slot *slot = ctrl->slot; u16 status, events; u8 present; @@ -611,14 +609,9 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id) wake_up(&ctrl->queue); } - if (subordinate) { - list_for_each_entry(dev, &subordinate->devices, bus_list) { - if (dev->ignore_hotplug) { - ctrl_dbg(ctrl, "ignoring hotplug event %#06x (%s requested no hotplug)\n", - events, pci_name(dev)); - return IRQ_HANDLED; - } - } + if (pdev->ignore_hotplug) { + ctrl_dbg(ctrl, "ignoring hotplug event %#06x\n", events); + return IRQ_HANDLED; } /* Check Attention Button Pressed */ From 44745bd15116818e831e4d1ec5fbe668c2fdf942 Mon Sep 17 00:00:00 2001 From: Esben Haabendal Date: Thu, 16 Aug 2018 10:43:12 +0200 Subject: [PATCH 795/970] i2c: imx: Fix race condition in dma read MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit bed4ff1ed4d8f2ef5007c5c6ae1b29c5677a3632 upstream. This fixes a race condition, where the DMAEN bit ends up being set after I2C slave has transmitted a byte following the dummy read. When that happens, an interrupt is generated instead, and no DMA request is generated to kickstart the DMA read, and a timeout happens after DMA_TIMEOUT (1 sec). Fixed by setting the DMAEN bit before the dummy read. Signed-off-by: Esben Haabendal Acked-by: Uwe Kleine-König Signed-off-by: Wolfram Sang Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman --- drivers/i2c/busses/i2c-imx.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/i2c/busses/i2c-imx.c b/drivers/i2c/busses/i2c-imx.c index 4d297d554e52..c4188308cefa 100644 --- a/drivers/i2c/busses/i2c-imx.c +++ b/drivers/i2c/busses/i2c-imx.c @@ -665,9 +665,6 @@ static int i2c_imx_dma_read(struct imx_i2c_struct *i2c_imx, struct imx_i2c_dma *dma = i2c_imx->dma; struct device *dev = &i2c_imx->adapter.dev; - temp = imx_i2c_read_reg(i2c_imx, IMX_I2C_I2CR); - temp |= I2CR_DMAEN; - imx_i2c_write_reg(temp, i2c_imx, IMX_I2C_I2CR); dma->chan_using = dma->chan_rx; dma->dma_transfer_dir = DMA_DEV_TO_MEM; @@ -780,6 +777,7 @@ static int i2c_imx_read(struct imx_i2c_struct *i2c_imx, struct i2c_msg *msgs, bo int i, result; unsigned int temp; int block_data = msgs->flags & I2C_M_RECV_LEN; + int use_dma = i2c_imx->dma && msgs->len >= DMA_THRESHOLD && !block_data; dev_dbg(&i2c_imx->adapter.dev, "<%s> write slave address: addr=0x%x\n", @@ -806,12 +804,14 @@ static int i2c_imx_read(struct imx_i2c_struct *i2c_imx, struct i2c_msg *msgs, bo */ if ((msgs->len - 1) || block_data) temp &= ~I2CR_TXAK; + if (use_dma) + temp |= I2CR_DMAEN; imx_i2c_write_reg(temp, i2c_imx, IMX_I2C_I2CR); imx_i2c_read_reg(i2c_imx, IMX_I2C_I2DR); /* dummy read */ dev_dbg(&i2c_imx->adapter.dev, "<%s> read data\n", __func__); - if (i2c_imx->dma && msgs->len >= DMA_THRESHOLD && !block_data) + if (use_dma) return i2c_imx_dma_read(i2c_imx, msgs, is_lastmsg); /* read data */ From 696d906b168d5b8082bf87dcbb92c67218a7fb1f Mon Sep 17 00:00:00 2001 From: Jann Horn Date: Tue, 21 Aug 2018 21:59:37 -0700 Subject: [PATCH 796/970] reiserfs: fix broken xattr handling (heap corruption, bad retval) commit a13f085d111e90469faf2d9965eb39b11c114d7e upstream. This fixes the following issues: - When a buffer size is supplied to reiserfs_listxattr() such that each individual name fits, but the concatenation of all names doesn't fit, reiserfs_listxattr() overflows the supplied buffer. This leads to a kernel heap overflow (verified using KASAN) followed by an out-of-bounds usercopy and is therefore a security bug. - When a buffer size is supplied to reiserfs_listxattr() such that a name doesn't fit, -ERANGE should be returned. But reiserfs instead just truncates the list of names; I have verified that if the only xattr on a file has a longer name than the supplied buffer length, listxattr() incorrectly returns zero. With my patch applied, -ERANGE is returned in both cases and the memory corruption doesn't happen anymore. Credit for making me clean this code up a bit goes to Al Viro, who pointed out that the ->actor calling convention is suboptimal and should be changed. Link: http://lkml.kernel.org/r/20180802151539.5373-1-jannh@google.com Fixes: 48b32a3553a5 ("reiserfs: use generic xattr handlers") Signed-off-by: Jann Horn Acked-by: Jeff Mahoney Cc: Eric Biggers Cc: Al Viro Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/reiserfs/xattr.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/reiserfs/xattr.c b/fs/reiserfs/xattr.c index e87aa21c30de..06a9fae202a7 100644 --- a/fs/reiserfs/xattr.c +++ b/fs/reiserfs/xattr.c @@ -791,8 +791,10 @@ static int listxattr_filler(struct dir_context *ctx, const char *name, return 0; size = namelen + 1; if (b->buf) { - if (size > b->size) + if (b->pos + size > b->size) { + b->pos = -ERANGE; return -ERANGE; + } memcpy(b->buf + b->pos, name, namelen); b->buf[b->pos + namelen] = 0; } From e8d49e4292d9156a081752dee3f5a0cd12857da9 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Fri, 24 Aug 2018 13:12:43 +0200 Subject: [PATCH 797/970] Linux 4.9.124 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index b11e375bb18e..53d57acfc17e 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 4 PATCHLEVEL = 9 -SUBLEVEL = 123 +SUBLEVEL = 124 EXTRAVERSION = NAME = Roaring Lionus From ce723f865c98ac68e37815c6f9000f6ce5d2b3dd Mon Sep 17 00:00:00 2001 From: Eyal Birger Date: Thu, 7 Jun 2018 10:11:02 +0300 Subject: [PATCH 798/970] vti6: fix PMTU caching and reporting on xmit [ Upstream commit d6990976af7c5d8f55903bfb4289b6fb030bf754 ] When setting the skb->dst before doing the MTU check, the route PMTU caching and reporting is done on the new dst which is about to be released. Instead, PMTU handling should be done using the original dst. This is aligned with IPv4 VTI. Fixes: ccd740cbc6 ("vti6: Add pmtu handling to vti6_xmit.") Signed-off-by: Eyal Birger Signed-off-by: Steffen Klassert Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/ipv6/ip6_vti.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c index beae93fd66d5..a5aeeb613fac 100644 --- a/net/ipv6/ip6_vti.c +++ b/net/ipv6/ip6_vti.c @@ -480,10 +480,6 @@ vti6_xmit(struct sk_buff *skb, struct net_device *dev, struct flowi *fl) goto tx_err_dst_release; } - skb_scrub_packet(skb, !net_eq(t->net, dev_net(dev))); - skb_dst_set(skb, dst); - skb->dev = skb_dst(skb)->dev; - mtu = dst_mtu(dst); if (!skb->ignore_df && skb->len > mtu) { skb_dst(skb)->ops->update_pmtu(dst, NULL, skb, mtu); @@ -498,9 +494,14 @@ vti6_xmit(struct sk_buff *skb, struct net_device *dev, struct flowi *fl) htonl(mtu)); } - return -EMSGSIZE; + err = -EMSGSIZE; + goto tx_err_dst_release; } + skb_scrub_packet(skb, !net_eq(t->net, dev_net(dev))); + skb_dst_set(skb, dst); + skb->dev = skb_dst(skb)->dev; + err = dst_output(t->net, skb->sk, skb); if (net_xmit_eval(err) == 0) { struct pcpu_sw_netstats *tstats = this_cpu_ptr(dev->tstats); From 590f312a78bf86a1c9230ef6bb43795ac71e14b1 Mon Sep 17 00:00:00 2001 From: Tommi Rantala Date: Thu, 21 Jun 2018 09:30:47 +0300 Subject: [PATCH 799/970] xfrm: fix missing dst_release() after policy blocking lbcast and multicast [ Upstream commit 8cc88773855f988d6a3bbf102bbd9dd9c828eb81 ] Fix missing dst_release() when local broadcast or multicast traffic is xfrm policy blocked. For IPv4 this results to dst leak: ip_route_output_flow() allocates dst_entry via __ip_route_output_key() and passes it to xfrm_lookup_route(). xfrm_lookup returns ERR_PTR(-EPERM) that is propagated. The dst that was allocated is never released. IPv4 local broadcast testcase: ping -b 192.168.1.255 & sleep 1 ip xfrm policy add src 0.0.0.0/0 dst 192.168.1.255/32 dir out action block IPv4 multicast testcase: ping 224.0.0.1 & sleep 1 ip xfrm policy add src 0.0.0.0/0 dst 224.0.0.1/32 dir out action block For IPv6 the missing dst_release() causes trouble e.g. when used in netns: ip netns add TEST ip netns exec TEST ip link set lo up ip link add dummy0 type dummy ip link set dev dummy0 netns TEST ip netns exec TEST ip addr add fd00::1111 dev dummy0 ip netns exec TEST ip link set dummy0 up ip netns exec TEST ping -6 -c 5 ff02::1%dummy0 & sleep 1 ip netns exec TEST ip xfrm policy add src ::/0 dst ff02::1 dir out action block wait ip netns del TEST After netns deletion we see: [ 258.239097] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 268.279061] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 278.367018] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 288.375259] unregister_netdevice: waiting for lo to become free. Usage count = 2 Fixes: ac37e2515c1a ("xfrm: release dst_orig in case of error in xfrm_lookup()") Signed-off-by: Tommi Rantala Signed-off-by: Steffen Klassert Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/xfrm/xfrm_policy.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index 5b8fa6832687..1f943d97dc29 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -2354,6 +2354,9 @@ struct dst_entry *xfrm_lookup_route(struct net *net, struct dst_entry *dst_orig, if (IS_ERR(dst) && PTR_ERR(dst) == -EREMOTE) return make_blackhole(net, dst_orig->ops->family, dst_orig); + if (IS_ERR(dst)) + dst_release(dst_orig); + return dst; } EXPORT_SYMBOL(xfrm_lookup_route); From 301a6da4c8bee125c0e2e2fa4380a2cf9619c6cc Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Mon, 25 Jun 2018 14:00:07 +0200 Subject: [PATCH 800/970] xfrm: free skb if nlsk pointer is NULL [ Upstream commit 86126b77dcd551ce223e7293bb55854e3df05646 ] nlmsg_multicast() always frees the skb, so in case we cannot call it we must do that ourselves. Fixes: 21ee543edc0dea ("xfrm: fix race between netns cleanup and state expire notification") Signed-off-by: Florian Westphal Signed-off-by: Steffen Klassert Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/xfrm/xfrm_user.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c index bb61956c0f9c..6e768093d7c8 100644 --- a/net/xfrm/xfrm_user.c +++ b/net/xfrm/xfrm_user.c @@ -984,10 +984,12 @@ static inline int xfrm_nlmsg_multicast(struct net *net, struct sk_buff *skb, { struct sock *nlsk = rcu_dereference(net->xfrm.nlsk); - if (nlsk) - return nlmsg_multicast(nlsk, skb, pid, group, GFP_ATOMIC); - else - return -1; + if (!nlsk) { + kfree_skb(skb); + return -EPIPE; + } + + return nlmsg_multicast(nlsk, skb, pid, group, GFP_ATOMIC); } static inline size_t xfrm_spdinfo_msgsize(void) From f1ddbb19469bf071453928c9776e2aa28507a9e1 Mon Sep 17 00:00:00 2001 From: "mpubbise@codeaurora.org" Date: Mon, 2 Jul 2018 15:40:14 +0530 Subject: [PATCH 801/970] mac80211: add stations tied to AP_VLANs during hw reconfig [ Upstream commit 19103a4bfb42f320395daa5616ece3e89e759d63 ] As part of hw reconfig, only stations linked to AP interfaces are added back to the driver ignoring those which are tied to AP_VLAN interfaces. It is true that there could be stations tied to the AP_VLAN interface while serving 4addr clients or when using AP_VLAN for VLAN operations; we should be adding these stations back to the driver as part of hw reconfig, failing to do so can cause functional issues. In the case of ath10k driver, the following errors were observed. ath10k_pci : failed to install key for non-existent peer XX:XX:XX:XX:XX:XX Workqueue: events_freezable ieee80211_restart_work [mac80211] (unwind_backtrace) from (show_stack+0x10/0x14) (show_stack) (dump_stack+0x80/0xa0) (dump_stack) (warn_slowpath_common+0x68/0x8c) (warn_slowpath_common) (warn_slowpath_null+0x18/0x20) (warn_slowpath_null) (ieee80211_enable_keys+0x88/0x154 [mac80211]) (ieee80211_enable_keys) (ieee80211_reconfig+0xc90/0x19c8 [mac80211]) (ieee80211_reconfig]) (ieee80211_restart_work+0x8c/0xa0 [mac80211]) (ieee80211_restart_work) (process_one_work+0x284/0x488) (process_one_work) (worker_thread+0x228/0x360) (worker_thread) (kthread+0xd8/0xec) (kthread) (ret_from_fork+0x14/0x24) Also while bringing down the AP VAP, WARN_ONs and errors related to peer removal were observed. ath10k_pci : failed to clear all peer wep keys for vdev 0: -2 ath10k_pci : failed to disassociate station: 8c:fd:f0:0a:8c:f5 vdev 0: -2 (unwind_backtrace) (show_stack+0x10/0x14) (show_stack) (dump_stack+0x80/0xa0) (dump_stack) (warn_slowpath_common+0x68/0x8c) (warn_slowpath_common) (warn_slowpath_null+0x18/0x20) (warn_slowpath_null) (sta_set_sinfo+0xb98/0xc9c [mac80211]) (sta_set_sinfo [mac80211]) (__sta_info_flush+0xf0/0x134 [mac80211]) (__sta_info_flush [mac80211]) (ieee80211_stop_ap+0xe8/0x390 [mac80211]) (ieee80211_stop_ap [mac80211]) (__cfg80211_stop_ap+0xe0/0x3dc [cfg80211]) (__cfg80211_stop_ap [cfg80211]) (cfg80211_stop_ap+0x30/0x44 [cfg80211]) (cfg80211_stop_ap [cfg80211]) (genl_rcv_msg+0x274/0x30c) (genl_rcv_msg) (netlink_rcv_skb+0x58/0xac) (netlink_rcv_skb) (genl_rcv+0x20/0x34) (genl_rcv) (netlink_unicast+0x11c/0x204) (netlink_unicast) (netlink_sendmsg+0x30c/0x370) (netlink_sendmsg) (sock_sendmsg+0x70/0x84) (sock_sendmsg) (___sys_sendmsg.part.3+0x188/0x228) (___sys_sendmsg.part.3) (__sys_sendmsg+0x4c/0x70) (__sys_sendmsg) (ret_fast_syscall+0x0/0x44) These issues got fixed by adding the stations which are tied to AP_VLANs back to the driver. Signed-off-by: Manikanta Pubbisetty Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/mac80211/util.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/mac80211/util.c b/net/mac80211/util.c index a2756096b94a..ca7de02e0a6e 100644 --- a/net/mac80211/util.c +++ b/net/mac80211/util.c @@ -2061,7 +2061,8 @@ int ieee80211_reconfig(struct ieee80211_local *local) if (!sta->uploaded) continue; - if (sta->sdata->vif.type != NL80211_IFTYPE_AP) + if (sta->sdata->vif.type != NL80211_IFTYPE_AP && + sta->sdata->vif.type != NL80211_IFTYPE_AP_VLAN) continue; for (state = IEEE80211_STA_NOTEXIST; From 4fd089723f6f4bd77c67134c3643b37033e61b4d Mon Sep 17 00:00:00 2001 From: Bernd Edlinger Date: Sun, 8 Jul 2018 09:57:22 +0000 Subject: [PATCH 802/970] nl80211: Add a missing break in parse_station_flags [ Upstream commit 5cf3006cc81d9aa09a10aa781fc065546b12919d ] I was looking at usually suppressed gcc warnings, [-Wimplicit-fallthrough=] in this case: The code definitely looks like a break is missing here. However I am not able to test the NL80211_IFTYPE_MESH_POINT, nor do I actually know what might be :) So please use this patch with caution and only if you are able to do some testing. Signed-off-by: Bernd Edlinger [johannes: looks obvious enough to apply as is, interesting though that it never seems to have been a problem] Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/wireless/nl80211.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index 5b75468b5acd..146d83785b37 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -4058,6 +4058,7 @@ static int parse_station_flags(struct genl_info *info, params->sta_flags_mask = BIT(NL80211_STA_FLAG_AUTHENTICATED) | BIT(NL80211_STA_FLAG_MFP) | BIT(NL80211_STA_FLAG_AUTHORIZED); + break; default: return -EINVAL; } From fa1c6d2371b170a8c89de50b25612673080620c8 Mon Sep 17 00:00:00 2001 From: Sean Paul Date: Tue, 3 Jul 2018 12:56:03 -0400 Subject: [PATCH 803/970] drm/bridge: adv7511: Reset registers on hotplug [ Upstream commit 5f3417569165a8ee57654217f73e0160312f409c ] The bridge loses its hw state when the cable is unplugged. If we detect this case in the hpd handler, reset its state. Reported-by: Rob Clark Tested-by: Rob Clark Reviewed-by: Archit Taneja Signed-off-by: Sean Paul Link: https://patchwork.freedesktop.org/patch/msgid/20180703165648.120401-1-seanpaul@chromium.org Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/bridge/adv7511/adv7511_drv.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/drivers/gpu/drm/bridge/adv7511/adv7511_drv.c b/drivers/gpu/drm/bridge/adv7511/adv7511_drv.c index a68f94daf9b6..32ab5c32834b 100644 --- a/drivers/gpu/drm/bridge/adv7511/adv7511_drv.c +++ b/drivers/gpu/drm/bridge/adv7511/adv7511_drv.c @@ -424,6 +424,18 @@ static void adv7511_hpd_work(struct work_struct *work) else status = connector_status_disconnected; + /* + * The bridge resets its registers on unplug. So when we get a plug + * event and we're already supposed to be powered, cycle the bridge to + * restore its state. + */ + if (status == connector_status_connected && + adv7511->connector.status == connector_status_disconnected && + adv7511->powered) { + regcache_mark_dirty(adv7511->regmap); + adv7511_power_on(adv7511); + } + if (adv7511->connector.status != status) { adv7511->connector.status = status; drm_kms_helper_hotplug_event(adv7511->connector.dev); From 0a04fdb19d03408ad9bc7aecbbdb53667243ae35 Mon Sep 17 00:00:00 2001 From: Varun Prakash Date: Wed, 11 Jul 2018 22:09:52 +0530 Subject: [PATCH 804/970] scsi: libiscsi: fix possible NULL pointer dereference in case of TMF [ Upstream commit a17037e7d59075053b522048742a08ac9500bde8 ] In iscsi_check_tmf_restrictions() task->hdr is dereferenced to print the opcode, it is possible that task->hdr is NULL. There are two cases based on opcode argument: 1. ISCSI_OP_SCSI_CMD - In this case alloc_pdu() is called after iscsi_check_tmf_restrictions() iscsi_prep_scsi_cmd_pdu() -> iscsi_check_tmf_restrictions() -> alloc_pdu(). Transport drivers allocate memory for iSCSI hdr in alloc_pdu() and assign it to task->hdr. In case of TMF task->hdr will be NULL resulting in NULL pointer dereference. 2. ISCSI_OP_SCSI_DATA_OUT - In this case transport driver can free the memory for iSCSI hdr after transmitting the pdu so task->hdr can be NULL or invalid. This patch fixes this issue by removing task->hdr->opcode from the printk statement. Signed-off-by: Varun Prakash Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/libiscsi.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c index c2b682916337..cc8f2a7c2463 100644 --- a/drivers/scsi/libiscsi.c +++ b/drivers/scsi/libiscsi.c @@ -283,11 +283,11 @@ static int iscsi_check_tmf_restrictions(struct iscsi_task *task, int opcode) */ if (opcode != ISCSI_OP_SCSI_DATA_OUT) { iscsi_conn_printk(KERN_INFO, conn, - "task [op %x/%x itt " + "task [op %x itt " "0x%x/0x%x] " "rejected.\n", - task->hdr->opcode, opcode, - task->itt, task->hdr_itt); + opcode, task->itt, + task->hdr_itt); return -EACCES; } /* @@ -296,10 +296,10 @@ static int iscsi_check_tmf_restrictions(struct iscsi_task *task, int opcode) */ if (conn->session->fast_abort) { iscsi_conn_printk(KERN_INFO, conn, - "task [op %x/%x itt " + "task [op %x itt " "0x%x/0x%x] fast abort.\n", - task->hdr->opcode, opcode, - task->itt, task->hdr_itt); + opcode, task->itt, + task->hdr_itt); return -EACCES; } break; From cb94b5ea2455abc66740cf7b6e4b50e53254891a Mon Sep 17 00:00:00 2001 From: Lucas Stach Date: Wed, 11 Apr 2018 17:31:35 +0200 Subject: [PATCH 805/970] drm/imx: imx-ldb: disable LDB on driver bind [ Upstream commit b58262396fabd43dc869b576e3defdd23b32fe94 ] The LVDS signal integrity is only guaranteed when the correct enable sequence (first IPU DI, then LDB) is used. If the LDB display output was active before the imx-drm driver is loaded (like when a bootsplash was active) the DI will be disabled by the full IPU reset we do when loading the driver. The LDB control registers are not part of the IPU range and thus will remain unchanged. This leads to the LDB still being active when the DI is getting enabled, effectively reversing the required enable sequence. Fix this by also disabling the LDB on driver bind. Signed-off-by: Lucas Stach Signed-off-by: Philipp Zabel Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/imx/imx-ldb.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/imx/imx-ldb.c b/drivers/gpu/drm/imx/imx-ldb.c index 3ce391c239b0..7c563b5f0683 100644 --- a/drivers/gpu/drm/imx/imx-ldb.c +++ b/drivers/gpu/drm/imx/imx-ldb.c @@ -634,6 +634,9 @@ static int imx_ldb_bind(struct device *dev, struct device *master, void *data) return PTR_ERR(imx_ldb->regmap); } + /* disable LDB by resetting the control register to POR default */ + regmap_write(imx_ldb->regmap, IOMUXC_GPR2, 0); + imx_ldb->dev = dev; if (of_id) From 13f03bd19c9b7af30ddd12283a6345b0f88eb7e0 Mon Sep 17 00:00:00 2001 From: Lucas Stach Date: Wed, 11 Apr 2018 17:31:36 +0200 Subject: [PATCH 806/970] drm/imx: imx-ldb: check if channel is enabled before printing warning [ Upstream commit c80d673b91a6c81d765864e10f2b15110ee900ad ] If the second LVDS channel has been disabled in the DT when using dual-channel mode we should not print a warning. Signed-off-by: Lucas Stach Signed-off-by: Philipp Zabel Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/imx/imx-ldb.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/imx/imx-ldb.c b/drivers/gpu/drm/imx/imx-ldb.c index 7c563b5f0683..67881e5517fb 100644 --- a/drivers/gpu/drm/imx/imx-ldb.c +++ b/drivers/gpu/drm/imx/imx-ldb.c @@ -678,14 +678,14 @@ static int imx_ldb_bind(struct device *dev, struct device *master, void *data) if (ret || i < 0 || i > 1) return -EINVAL; + if (!of_device_is_available(child)) + continue; + if (dual && i > 0) { dev_warn(dev, "dual-channel mode, ignoring second output\n"); continue; } - if (!of_device_is_available(child)) - continue; - channel = &imx_ldb->channel[i]; channel->ldb = imx_ldb; channel->chno = i; From 62dd6eda364145ddc918e771a4fc47017ce809cb Mon Sep 17 00:00:00 2001 From: Jia-Ju Bai Date: Wed, 20 Jun 2018 11:54:53 +0800 Subject: [PATCH 807/970] usb: gadget: r8a66597: Fix two possible sleep-in-atomic-context bugs in init_controller() [ Upstream commit 0602088b10a7c0b4e044a810678ef93d7cc5bf48 ] The driver may sleep with holding a spinlock. The function call paths (from bottom to top) in Linux-4.16.7 are: [FUNC] msleep drivers/usb/gadget/udc/r8a66597-udc.c, 839: msleep in init_controller drivers/usb/gadget/udc/r8a66597-udc.c, 96: init_controller in r8a66597_usb_disconnect drivers/usb/gadget/udc/r8a66597-udc.c, 93: spin_lock in r8a66597_usb_disconnect [FUNC] msleep drivers/usb/gadget/udc/r8a66597-udc.c, 835: msleep in init_controller drivers/usb/gadget/udc/r8a66597-udc.c, 96: init_controller in r8a66597_usb_disconnect drivers/usb/gadget/udc/r8a66597-udc.c, 93: spin_lock in r8a66597_usb_disconnect To fix these bugs, msleep() is replaced with mdelay(). This bug is found by my static analysis tool (DSAC-2) and checked by my code review. Signed-off-by: Jia-Ju Bai Signed-off-by: Felipe Balbi Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/usb/gadget/udc/r8a66597-udc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/usb/gadget/udc/r8a66597-udc.c b/drivers/usb/gadget/udc/r8a66597-udc.c index f2c8862093a2..2cd748da4289 100644 --- a/drivers/usb/gadget/udc/r8a66597-udc.c +++ b/drivers/usb/gadget/udc/r8a66597-udc.c @@ -835,11 +835,11 @@ static void init_controller(struct r8a66597 *r8a66597) r8a66597_bset(r8a66597, XCKE, SYSCFG0); - msleep(3); + mdelay(3); r8a66597_bset(r8a66597, PLLC, SYSCFG0); - msleep(1); + mdelay(1); r8a66597_bset(r8a66597, SCKE, SYSCFG0); From 02e8b4fb4b50de59f79aa5e31d7a9932acf47167 Mon Sep 17 00:00:00 2001 From: Jia-Ju Bai Date: Wed, 20 Jun 2018 11:55:08 +0800 Subject: [PATCH 808/970] usb: gadget: r8a66597: Fix a possible sleep-in-atomic-context bugs in r8a66597_queue() [ Upstream commit f36b507c14c4b6e634463a610294e9cb0065c8ea ] The driver may sleep in an interrupt handler. The function call path (from bottom to top) in Linux-4.16.7 is: [FUNC] r8a66597_queue(GFP_KERNEL) drivers/usb/gadget/udc/r8a66597-udc.c, 1193: r8a66597_queue in get_status drivers/usb/gadget/udc/r8a66597-udc.c, 1301: get_status in setup_packet drivers/usb/gadget/udc/r8a66597-udc.c, 1381: setup_packet in irq_control_stage drivers/usb/gadget/udc/r8a66597-udc.c, 1508: irq_control_stage in r8a66597_irq (interrupt handler) To fix this bug, GFP_KERNEL is replaced with GFP_ATOMIC. This bug is found by my static analysis tool (DSAC-2) and checked by my code review. Signed-off-by: Jia-Ju Bai Signed-off-by: Felipe Balbi Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/usb/gadget/udc/r8a66597-udc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/usb/gadget/udc/r8a66597-udc.c b/drivers/usb/gadget/udc/r8a66597-udc.c index 2cd748da4289..230e3248f386 100644 --- a/drivers/usb/gadget/udc/r8a66597-udc.c +++ b/drivers/usb/gadget/udc/r8a66597-udc.c @@ -1193,7 +1193,7 @@ __acquires(r8a66597->lock) r8a66597->ep0_req->length = 2; /* AV: what happens if we get called again before that gets through? */ spin_unlock(&r8a66597->lock); - r8a66597_queue(r8a66597->gadget.ep0, r8a66597->ep0_req, GFP_KERNEL); + r8a66597_queue(r8a66597->gadget.ep0, r8a66597->ep0_req, GFP_ATOMIC); spin_lock(&r8a66597->lock); } From 2b4fd19878c75b13c45cdd2c4a3478f1e687373e Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Sun, 15 Jul 2018 10:37:37 -0700 Subject: [PATCH 809/970] usb/phy: fix PPC64 build errors in phy-fsl-usb.c [ Upstream commit a39ba90a1cc7010edb0a7132e1b67f3d80b994e9 ] Fix build errors when built for PPC64: These variables are only used on PPC32 so they don't need to be initialized for PPC64. ../drivers/usb/phy/phy-fsl-usb.c: In function 'usb_otg_start': ../drivers/usb/phy/phy-fsl-usb.c:865:3: error: '_fsl_readl' undeclared (first use in this function); did you mean 'fsl_readl'? _fsl_readl = _fsl_readl_be; ../drivers/usb/phy/phy-fsl-usb.c:865:16: error: '_fsl_readl_be' undeclared (first use in this function); did you mean 'fsl_readl'? _fsl_readl = _fsl_readl_be; ../drivers/usb/phy/phy-fsl-usb.c:866:3: error: '_fsl_writel' undeclared (first use in this function); did you mean 'fsl_writel'? _fsl_writel = _fsl_writel_be; ../drivers/usb/phy/phy-fsl-usb.c:866:17: error: '_fsl_writel_be' undeclared (first use in this function); did you mean 'fsl_writel'? _fsl_writel = _fsl_writel_be; ../drivers/usb/phy/phy-fsl-usb.c:868:16: error: '_fsl_readl_le' undeclared (first use in this function); did you mean 'fsl_readl'? _fsl_readl = _fsl_readl_le; ../drivers/usb/phy/phy-fsl-usb.c:869:17: error: '_fsl_writel_le' undeclared (first use in this function); did you mean 'fsl_writel'? _fsl_writel = _fsl_writel_le; and the sysfs "show" function return type should be ssize_t, not int: ../drivers/usb/phy/phy-fsl-usb.c:1042:49: error: initialization of 'ssize_t (*)(struct device *, struct device_attribute *, char *)' {aka 'long int (*)(struct device *, struct device_attribute *, char *)'} from incompatible pointer type 'int (*)(struct device *, struct device_attribute *, char *)' [-Werror=incompatible-pointer-types] static DEVICE_ATTR(fsl_usb2_otg_state, S_IRUGO, show_fsl_usb2_otg_state, NULL); Signed-off-by: Randy Dunlap Cc: Felipe Balbi Cc: linux-usb@vger.kernel.org Cc: Michael Ellerman Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Felipe Balbi Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/usb/phy/phy-fsl-usb.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/usb/phy/phy-fsl-usb.c b/drivers/usb/phy/phy-fsl-usb.c index 94eb2923afed..85d031ce85c1 100644 --- a/drivers/usb/phy/phy-fsl-usb.c +++ b/drivers/usb/phy/phy-fsl-usb.c @@ -879,6 +879,7 @@ int usb_otg_start(struct platform_device *pdev) if (pdata->init && pdata->init(pdev) != 0) return -EINVAL; +#ifdef CONFIG_PPC32 if (pdata->big_endian_mmio) { _fsl_readl = _fsl_readl_be; _fsl_writel = _fsl_writel_be; @@ -886,6 +887,7 @@ int usb_otg_start(struct platform_device *pdev) _fsl_readl = _fsl_readl_le; _fsl_writel = _fsl_writel_le; } +#endif /* request irq */ p_otg->irq = platform_get_irq(pdev, 0); @@ -976,7 +978,7 @@ int usb_otg_start(struct platform_device *pdev) /* * state file in sysfs */ -static int show_fsl_usb2_otg_state(struct device *dev, +static ssize_t show_fsl_usb2_otg_state(struct device *dev, struct device_attribute *attr, char *buf) { struct otg_fsm *fsm = &fsl_otg_dev->fsm; From 3b7c96a9b0ae1c4973d93a4a30939462153c9d73 Mon Sep 17 00:00:00 2001 From: Peter Senna Tschudin Date: Tue, 10 Jul 2018 16:01:45 +0200 Subject: [PATCH 810/970] tools: usb: ffs-test: Fix build on big endian systems MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit a2b22dddc7bb6110ac3b5ed1a60aa9279836fadb ] The tools/usb/ffs-test.c file defines cpu_to_le16/32 by using the C library htole16/32 function calls. However, cpu_to_le16/32 are used when initializing structures, i.e in a context where a function call is not allowed. It works fine on little endian systems because htole16/32 are defined by the C library as no-ops. But on big-endian systems, they are actually doing something, which might involve calling a function, causing build failures, such as: ffs-test.c:48:25: error: initializer element is not constant #define cpu_to_le32(x) htole32(x) ^~~~~~~ ffs-test.c:128:12: note: in expansion of macro ‘cpu_to_le32’ .magic = cpu_to_le32(FUNCTIONFS_DESCRIPTORS_MAGIC_V2), ^~~~~~~~~~~ To solve this, we code cpu_to_le16/32 in a way that allows them to be used when initializing structures. This fix was imported from meta-openembedded/android-tools/fix-big-endian-build.patch written by Thomas Petazzoni . CC: Thomas Petazzoni Signed-off-by: Peter Senna Tschudin Signed-off-by: Felipe Balbi Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- tools/usb/ffs-test.c | 19 ++++++++++++++++--- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/tools/usb/ffs-test.c b/tools/usb/ffs-test.c index 88d5e71be044..47dfa0b0fcd7 100644 --- a/tools/usb/ffs-test.c +++ b/tools/usb/ffs-test.c @@ -44,12 +44,25 @@ /******************** Little Endian Handling ********************************/ -#define cpu_to_le16(x) htole16(x) -#define cpu_to_le32(x) htole32(x) +/* + * cpu_to_le16/32 are used when initializing structures, a context where a + * function call is not allowed. To solve this, we code cpu_to_le16/32 in a way + * that allows them to be used when initializing structures. + */ + +#if __BYTE_ORDER == __LITTLE_ENDIAN +#define cpu_to_le16(x) (x) +#define cpu_to_le32(x) (x) +#else +#define cpu_to_le16(x) ((((x) >> 8) & 0xffu) | (((x) & 0xffu) << 8)) +#define cpu_to_le32(x) \ + ((((x) & 0xff000000u) >> 24) | (((x) & 0x00ff0000u) >> 8) | \ + (((x) & 0x0000ff00u) << 8) | (((x) & 0x000000ffu) << 24)) +#endif + #define le32_to_cpu(x) le32toh(x) #define le16_to_cpu(x) le16toh(x) - /******************** Messages and Errors ***********************************/ static const char argv0[] = "ffs-test"; From e2838a2262fdf216c090279956375bacb2dabbba Mon Sep 17 00:00:00 2001 From: Eugeniu Rosca Date: Mon, 2 Jul 2018 23:46:47 +0200 Subject: [PATCH 811/970] usb: gadget: f_uac2: fix endianness of 'struct cntrl_*_lay3' [ Upstream commit eec24f2a0d4dc3b1d95a3ccd2feb523ede3ba775 ] The list [1] of commits doing endianness fixes in USB subsystem is long due to below quote from USB spec Revision 2.0 from April 27, 2000: ------------ 8.1 Byte/Bit Ordering Multiple byte fields in standard descriptors, requests, and responses are interpreted as and moved over the bus in little-endian order, i.e. LSB to MSB. ------------ This commit belongs to the same family. [1] Example of endianness fixes in USB subsystem: commit 14e1d56cbea6 ("usb: gadget: f_uac2: endianness fixes.") commit 42370b821168 ("usb: gadget: f_uac1: endianness fixes.") commit 63afd5cc7877 ("USB: chaoskey: fix Alea quirk on big-endian hosts") commit 74098c4ac782 ("usb: gadget: acm: fix endianness in notifications") commit cdd7928df0d2 ("ACM gadget: fix endianness in notifications") commit 323ece54e076 ("cdc-wdm: fix endianness bug in debug statements") commit e102609f1072 ("usb: gadget: uvc: Fix endianness mismatches") list goes on Fixes: 132fcb460839 ("usb: gadget: Add Audio Class 2.0 Driver") Signed-off-by: Eugeniu Rosca Reviewed-by: Ruslan Bilovol Signed-off-by: Felipe Balbi Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/usb/gadget/function/f_uac2.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/drivers/usb/gadget/function/f_uac2.c b/drivers/usb/gadget/function/f_uac2.c index 5474b5187be0..f4bd08cfac11 100644 --- a/drivers/usb/gadget/function/f_uac2.c +++ b/drivers/usb/gadget/function/f_uac2.c @@ -929,14 +929,14 @@ static struct usb_descriptor_header *hs_audio_desc[] = { }; struct cntrl_cur_lay3 { - __u32 dCUR; + __le32 dCUR; }; struct cntrl_range_lay3 { - __u16 wNumSubRanges; - __u32 dMIN; - __u32 dMAX; - __u32 dRES; + __le16 wNumSubRanges; + __le32 dMIN; + __le32 dMAX; + __le32 dRES; } __packed; static inline void @@ -1285,9 +1285,9 @@ in_rq_cur(struct usb_function *fn, const struct usb_ctrlrequest *cr) memset(&c, 0, sizeof(struct cntrl_cur_lay3)); if (entity_id == USB_IN_CLK_ID) - c.dCUR = p_srate; + c.dCUR = cpu_to_le32(p_srate); else if (entity_id == USB_OUT_CLK_ID) - c.dCUR = c_srate; + c.dCUR = cpu_to_le32(c_srate); value = min_t(unsigned, w_length, sizeof c); memcpy(req->buf, &c, value); @@ -1325,15 +1325,15 @@ in_rq_range(struct usb_function *fn, const struct usb_ctrlrequest *cr) if (control_selector == UAC2_CS_CONTROL_SAM_FREQ) { if (entity_id == USB_IN_CLK_ID) - r.dMIN = p_srate; + r.dMIN = cpu_to_le32(p_srate); else if (entity_id == USB_OUT_CLK_ID) - r.dMIN = c_srate; + r.dMIN = cpu_to_le32(c_srate); else return -EOPNOTSUPP; r.dMAX = r.dMIN; r.dRES = 0; - r.wNumSubRanges = 1; + r.wNumSubRanges = cpu_to_le16(1); value = min_t(unsigned, w_length, sizeof r); memcpy(req->buf, &r, value); From 6e9261aac3d8216563647e0c671750783ccf1993 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Thu, 19 Jul 2018 18:18:35 +0200 Subject: [PATCH 812/970] bpf, ppc64: fix unexpected r0=0 exit path inside bpf_xadd [ Upstream commit b9c1e60e7bf4e64ac1b4f4d6d593f0bb57886973 ] None of the JITs is allowed to implement exit paths from the BPF insn mappings other than BPF_JMP | BPF_EXIT. In the BPF core code we have a couple of rewrites in eBPF (e.g. LD_ABS / LD_IND) and in eBPF to cBPF translation to retain old existing behavior where exceptions may occur; they are also tightly controlled by the verifier where it disallows some of the features such as BPF to BPF calls when legacy LD_ABS / LD_IND ops are present in the BPF program. During recent review of all BPF_XADD JIT implementations I noticed that the ppc64 one is buggy in that it contains two jumps to exit paths. This is problematic as this can bypass verifier expectations e.g. pointed out in commit f6b1b3bf0d5f ("bpf: fix subprog verifier bypass by div/mod by 0 exception"). The first exit path is obsoleted by the fix in ca36960211eb ("bpf: allow xadd only on aligned memory") anyway, and for the second one we need to do a fetch, add and store loop if the reservation from lwarx/ldarx was lost in the meantime. Fixes: 156d0e290e96 ("powerpc/ebpf/jit: Implement JIT compiler for extended BPF") Reviewed-by: Naveen N. Rao Reviewed-by: Sandipan Das Tested-by: Sandipan Das Signed-off-by: Daniel Borkmann Signed-off-by: Alexei Starovoitov Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/net/bpf_jit_comp64.c | 29 +++++------------------------ 1 file changed, 5 insertions(+), 24 deletions(-) diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c index c0e817f35e69..bdbbc320b006 100644 --- a/arch/powerpc/net/bpf_jit_comp64.c +++ b/arch/powerpc/net/bpf_jit_comp64.c @@ -326,6 +326,7 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u64 imm64; u8 *func; u32 true_cond; + u32 tmp_idx; /* * addrs[] maps a BPF bytecode address into a real offset from @@ -685,11 +686,7 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, case BPF_STX | BPF_XADD | BPF_W: /* Get EA into TMP_REG_1 */ PPC_ADDI(b2p[TMP_REG_1], dst_reg, off); - /* error if EA is not word-aligned */ - PPC_ANDI(b2p[TMP_REG_2], b2p[TMP_REG_1], 0x03); - PPC_BCC_SHORT(COND_EQ, (ctx->idx * 4) + 12); - PPC_LI(b2p[BPF_REG_0], 0); - PPC_JMP(exit_addr); + tmp_idx = ctx->idx * 4; /* load value from memory into TMP_REG_2 */ PPC_BPF_LWARX(b2p[TMP_REG_2], 0, b2p[TMP_REG_1], 0); /* add value from src_reg into this */ @@ -697,32 +694,16 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, /* store result back */ PPC_BPF_STWCX(b2p[TMP_REG_2], 0, b2p[TMP_REG_1]); /* we're done if this succeeded */ - PPC_BCC_SHORT(COND_EQ, (ctx->idx * 4) + (7*4)); - /* otherwise, let's try once more */ - PPC_BPF_LWARX(b2p[TMP_REG_2], 0, b2p[TMP_REG_1], 0); - PPC_ADD(b2p[TMP_REG_2], b2p[TMP_REG_2], src_reg); - PPC_BPF_STWCX(b2p[TMP_REG_2], 0, b2p[TMP_REG_1]); - /* exit if the store was not successful */ - PPC_LI(b2p[BPF_REG_0], 0); - PPC_BCC(COND_NE, exit_addr); + PPC_BCC_SHORT(COND_NE, tmp_idx); break; /* *(u64 *)(dst + off) += src */ case BPF_STX | BPF_XADD | BPF_DW: PPC_ADDI(b2p[TMP_REG_1], dst_reg, off); - /* error if EA is not doubleword-aligned */ - PPC_ANDI(b2p[TMP_REG_2], b2p[TMP_REG_1], 0x07); - PPC_BCC_SHORT(COND_EQ, (ctx->idx * 4) + (3*4)); - PPC_LI(b2p[BPF_REG_0], 0); - PPC_JMP(exit_addr); + tmp_idx = ctx->idx * 4; PPC_BPF_LDARX(b2p[TMP_REG_2], 0, b2p[TMP_REG_1], 0); PPC_ADD(b2p[TMP_REG_2], b2p[TMP_REG_2], src_reg); PPC_BPF_STDCX(b2p[TMP_REG_2], 0, b2p[TMP_REG_1]); - PPC_BCC_SHORT(COND_EQ, (ctx->idx * 4) + (7*4)); - PPC_BPF_LDARX(b2p[TMP_REG_2], 0, b2p[TMP_REG_1], 0); - PPC_ADD(b2p[TMP_REG_2], b2p[TMP_REG_2], src_reg); - PPC_BPF_STDCX(b2p[TMP_REG_2], 0, b2p[TMP_REG_1]); - PPC_LI(b2p[BPF_REG_0], 0); - PPC_BCC(COND_NE, exit_addr); + PPC_BCC_SHORT(COND_NE, tmp_idx); break; /* From 354d5a345b910fc0d4b74f0f7bec1bc6e7b6ea42 Mon Sep 17 00:00:00 2001 From: Len Brown Date: Fri, 20 Jul 2018 14:47:03 -0400 Subject: [PATCH 813/970] tools/power turbostat: fix -S on UP systems [ Upstream commit 9d83601a9cc1884d1b5706ee2acc661d558c6838 ] The -S (system summary) option failed to print any data on a 1-processor system. Reported-by: Artem Bityutskiy Signed-off-by: Len Brown Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- tools/power/x86/turbostat/turbostat.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c index 9664b1ff4285..6e22798a367d 100644 --- a/tools/power/x86/turbostat/turbostat.c +++ b/tools/power/x86/turbostat/turbostat.c @@ -733,9 +733,7 @@ void format_all_counters(struct thread_data *t, struct core_data *c, struct pkg_ if (!printed || !summary_only) print_header(); - if (topo.num_cpus > 1) - format_counters(&average.threads, &average.cores, - &average.packages); + format_counters(&average.threads, &average.cores, &average.packages); printed = 1; From eca9953f31a7365767c17a4018c058951d2b056f Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Thu, 19 Jul 2018 10:27:13 +0800 Subject: [PATCH 814/970] net: caif: Add a missing rcu_read_unlock() in caif_flow_cb [ Upstream commit 64119e05f7b31e83e2555f6782e6cdc8f81c63f4 ] Add a missing rcu_read_unlock in the error path Fixes: c95567c80352 ("caif: added check for potential null return") Signed-off-by: YueHaibing Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/caif/caif_dev.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/caif/caif_dev.c b/net/caif/caif_dev.c index d730a0f68f46..a0443d40d677 100644 --- a/net/caif/caif_dev.c +++ b/net/caif/caif_dev.c @@ -131,8 +131,10 @@ static void caif_flow_cb(struct sk_buff *skb) caifd = caif_get(skb->dev); WARN_ON(caifd == NULL); - if (caifd == NULL) + if (!caifd) { + rcu_read_unlock(); return; + } caifd_hold(caifd); rcu_read_unlock(); From 4646860d187c19f62c84647e966f66de1063953b Mon Sep 17 00:00:00 2001 From: Sudarsana Reddy Kalluru Date: Wed, 18 Jul 2018 22:50:03 -0700 Subject: [PATCH 815/970] qed: Fix possible race for the link state value. [ Upstream commit 58874c7b246109d8efb2b0099d1aa296d6bfc3fa ] There's a possible race where driver can read link status in mid-transition and see that virtual-link is up yet speed is 0. Since in this mid-transition we're guaranteed to see a mailbox from MFW soon, we can afford to treat this as link down. Fixes: cc875c2e ("qed: Add link support") Signed-off-by: Sudarsana Reddy Kalluru Signed-off-by: Ariel Elior Signed-off-by: Michal Kalderon Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/qlogic/qed/qed_mcp.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/qlogic/qed/qed_mcp.c b/drivers/net/ethernet/qlogic/qed/qed_mcp.c index 8b7d2f963ee1..eaa242df4131 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_mcp.c +++ b/drivers/net/ethernet/qlogic/qed/qed_mcp.c @@ -613,6 +613,7 @@ static void qed_mcp_handle_link_change(struct qed_hwfn *p_hwfn, break; default: p_link->speed = 0; + p_link->link_up = 0; } if (p_link->link_up && p_link->speed) From 13afaaecac6e015068fec10bc47c701d31a10695 Mon Sep 17 00:00:00 2001 From: Sudarsana Reddy Kalluru Date: Wed, 18 Jul 2018 22:50:04 -0700 Subject: [PATCH 816/970] qed: Correct Multicast API to reflect existence of 256 approximate buckets. [ Upstream commit 25c020a90919632b3425c19dc09188d56b9ed59a ] FW hsi contains 256 approximation buckets which are split in ramrod into eight u32 values, but driver is using eight 'unsigned long' variables. This patch fixes the mcast logic by making the API utilize u32. Fixes: 83aeb933 ("qed*: Trivial modifications") Signed-off-by: Sudarsana Reddy Kalluru Signed-off-by: Ariel Elior Signed-off-by: Michal Kalderon Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/qlogic/qed/qed_l2.c | 15 +++++++-------- drivers/net/ethernet/qlogic/qed/qed_l2.h | 2 +- drivers/net/ethernet/qlogic/qed/qed_sriov.c | 2 +- drivers/net/ethernet/qlogic/qed/qed_vf.c | 4 ++-- drivers/net/ethernet/qlogic/qed/qed_vf.h | 7 ++++++- 5 files changed, 17 insertions(+), 13 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed_l2.c b/drivers/net/ethernet/qlogic/qed/qed_l2.c index ddd410a91e13..715776e2cfe5 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_l2.c +++ b/drivers/net/ethernet/qlogic/qed/qed_l2.c @@ -313,7 +313,7 @@ qed_sp_update_mcast_bin(struct qed_hwfn *p_hwfn, p_ramrod->common.update_approx_mcast_flg = 1; for (i = 0; i < ETH_MULTICAST_MAC_BINS_IN_REGS; i++) { - u32 *p_bins = (u32 *)p_params->bins; + u32 *p_bins = p_params->bins; p_ramrod->approx_mcast.bins[i] = cpu_to_le32(p_bins[i]); } @@ -1182,8 +1182,8 @@ qed_sp_eth_filter_mcast(struct qed_hwfn *p_hwfn, enum spq_mode comp_mode, struct qed_spq_comp_cb *p_comp_data) { - unsigned long bins[ETH_MULTICAST_MAC_BINS_IN_REGS]; struct vport_update_ramrod_data *p_ramrod = NULL; + u32 bins[ETH_MULTICAST_MAC_BINS_IN_REGS]; struct qed_spq_entry *p_ent = NULL; struct qed_sp_init_data init_data; u8 abs_vport_id = 0; @@ -1219,26 +1219,25 @@ qed_sp_eth_filter_mcast(struct qed_hwfn *p_hwfn, /* explicitly clear out the entire vector */ memset(&p_ramrod->approx_mcast.bins, 0, sizeof(p_ramrod->approx_mcast.bins)); - memset(bins, 0, sizeof(unsigned long) * - ETH_MULTICAST_MAC_BINS_IN_REGS); + memset(bins, 0, sizeof(bins)); /* filter ADD op is explicit set op and it removes * any existing filters for the vport */ if (p_filter_cmd->opcode == QED_FILTER_ADD) { for (i = 0; i < p_filter_cmd->num_mc_addrs; i++) { - u32 bit; + u32 bit, nbits; bit = qed_mcast_bin_from_mac(p_filter_cmd->mac[i]); - __set_bit(bit, bins); + nbits = sizeof(u32) * BITS_PER_BYTE; + bins[bit / nbits] |= 1 << (bit % nbits); } /* Convert to correct endianity */ for (i = 0; i < ETH_MULTICAST_MAC_BINS_IN_REGS; i++) { struct vport_update_ramrod_mcast *p_ramrod_bins; - u32 *p_bins = (u32 *)bins; p_ramrod_bins = &p_ramrod->approx_mcast; - p_ramrod_bins->bins[i] = cpu_to_le32(p_bins[i]); + p_ramrod_bins->bins[i] = cpu_to_le32(bins[i]); } } diff --git a/drivers/net/ethernet/qlogic/qed/qed_l2.h b/drivers/net/ethernet/qlogic/qed/qed_l2.h index e495d62fcc03..14d00173cad0 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_l2.h +++ b/drivers/net/ethernet/qlogic/qed/qed_l2.h @@ -156,7 +156,7 @@ struct qed_sp_vport_update_params { u8 anti_spoofing_en; u8 update_accept_any_vlan_flg; u8 accept_any_vlan; - unsigned long bins[8]; + u32 bins[8]; struct qed_rss_params *rss_params; struct qed_filter_accept_flags accept_flags; struct qed_sge_tpa_params *sge_tpa_params; diff --git a/drivers/net/ethernet/qlogic/qed/qed_sriov.c b/drivers/net/ethernet/qlogic/qed/qed_sriov.c index 48bc5c151336..6379bfedc9f0 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_sriov.c +++ b/drivers/net/ethernet/qlogic/qed/qed_sriov.c @@ -2157,7 +2157,7 @@ qed_iov_vp_update_mcast_bin_param(struct qed_hwfn *p_hwfn, p_data->update_approx_mcast_flg = 1; memcpy(p_data->bins, p_mcast_tlv->bins, - sizeof(unsigned long) * ETH_MULTICAST_MAC_BINS_IN_REGS); + sizeof(u32) * ETH_MULTICAST_MAC_BINS_IN_REGS); *tlvs_mask |= 1 << QED_IOV_VP_UPDATE_MCAST; } diff --git a/drivers/net/ethernet/qlogic/qed/qed_vf.c b/drivers/net/ethernet/qlogic/qed/qed_vf.c index 0645124a887b..faf8215872de 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_vf.c +++ b/drivers/net/ethernet/qlogic/qed/qed_vf.c @@ -786,7 +786,7 @@ int qed_vf_pf_vport_update(struct qed_hwfn *p_hwfn, resp_size += sizeof(struct pfvf_def_resp_tlv); memcpy(p_mcast_tlv->bins, p_params->bins, - sizeof(unsigned long) * ETH_MULTICAST_MAC_BINS_IN_REGS); + sizeof(u32) * ETH_MULTICAST_MAC_BINS_IN_REGS); } update_rx = p_params->accept_flags.update_rx_mode_config; @@ -972,7 +972,7 @@ void qed_vf_pf_filter_mcast(struct qed_hwfn *p_hwfn, u32 bit; bit = qed_mcast_bin_from_mac(p_filter_cmd->mac[i]); - __set_bit(bit, sp_params.bins); + sp_params.bins[bit / 32] |= 1 << (bit % 32); } } diff --git a/drivers/net/ethernet/qlogic/qed/qed_vf.h b/drivers/net/ethernet/qlogic/qed/qed_vf.h index 35db7a28aa13..b962ef8e98ef 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_vf.h +++ b/drivers/net/ethernet/qlogic/qed/qed_vf.h @@ -336,7 +336,12 @@ struct vfpf_vport_update_mcast_bin_tlv { struct channel_tlv tl; u8 padding[4]; - u64 bins[8]; + /* There are only 256 approx bins, and in HSI they're divided into + * 32-bit values. As old VFs used to set-bit to the values on its side, + * the upper half of the array is never expected to contain any data. + */ + u64 bins[4]; + u64 obsolete_bins[4]; }; struct vfpf_vport_update_accept_param_tlv { From 0924ac4e7b3138959e9ece8621cd9c562fc511e3 Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Fri, 20 Jul 2018 19:30:57 +0200 Subject: [PATCH 817/970] atl1c: reserve min skb headroom [ Upstream commit 6e56830776828d8ca9897fc4429eeab47c3bb432 ] Got crash report with following backtrace: BUG: unable to handle kernel paging request at ffff8801869daffe RIP: 0010:[] [] ip6_finish_output2+0x394/0x4c0 RSP: 0018:ffff880186c83a98 EFLAGS: 00010283 RAX: ffff8801869db00e ... [] ip6_finish_output+0x8c/0xf0 [] ip6_output+0x57/0x100 [] ip6_forward+0x4b9/0x840 [] ip6_rcv_finish+0x66/0xc0 [] ipv6_rcv+0x319/0x530 [] netif_receive_skb+0x1c/0x70 [] atl1c_clean+0x1ec/0x310 [atl1c] ... The bad access is in neigh_hh_output(), at skb->data - 16 (HH_DATA_MOD). atl1c driver provided skb with no headroom, so 14 bytes (ethernet header) got pulled, but then 16 are copied. Reserve NET_SKB_PAD bytes headroom, like netdev_alloc_skb(). Compile tested only; I lack hardware. Fixes: 7b7017642199 ("atl1c: Fix misuse of netdev_alloc_skb in refilling rx ring") Signed-off-by: Florian Westphal Reviewed-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/atheros/atl1c/atl1c_main.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c index a3200ea6d765..85e7177c479f 100644 --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c @@ -1678,6 +1678,7 @@ static struct sk_buff *atl1c_alloc_skb(struct atl1c_adapter *adapter) skb = build_skb(page_address(page) + adapter->rx_page_offset, adapter->rx_frag_size); if (likely(skb)) { + skb_reserve(skb, NET_SKB_PAD); adapter->rx_page_offset += adapter->rx_frag_size; if (adapter->rx_page_offset >= PAGE_SIZE) adapter->rx_page = NULL; From 76a4e0e6c4e34286140fe1f827facd90268d32d9 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Sat, 21 Jul 2018 12:59:25 -0700 Subject: [PATCH 818/970] net: prevent ISA drivers from building on PPC32 [ Upstream commit c9ce1fa1c24b08e13c2a3b5b1f94a19c9eaa982c ] Prevent drivers from building on PPC32 if they use isa_bus_to_virt(), isa_virt_to_bus(), or isa_page_to_bus(), which are not available and thus cause build errors. ../drivers/net/ethernet/3com/3c515.c: In function 'corkscrew_open': ../drivers/net/ethernet/3com/3c515.c:824:9: error: implicit declaration of function 'isa_virt_to_bus'; did you mean 'virt_to_bus'? [-Werror=implicit-function-declaration] ../drivers/net/ethernet/amd/lance.c: In function 'lance_rx': ../drivers/net/ethernet/amd/lance.c:1203:23: error: implicit declaration of function 'isa_bus_to_virt'; did you mean 'bus_to_virt'? [-Werror=implicit-function-declaration] ../drivers/net/ethernet/amd/ni65.c: In function 'ni65_init_lance': ../drivers/net/ethernet/amd/ni65.c:585:20: error: implicit declaration of function 'isa_virt_to_bus'; did you mean 'virt_to_bus'? [-Werror=implicit-function-declaration] ../drivers/net/ethernet/cirrus/cs89x0.c: In function 'net_open': ../drivers/net/ethernet/cirrus/cs89x0.c:897:20: error: implicit declaration of function 'isa_virt_to_bus'; did you mean 'virt_to_bus'? [-Werror=implicit-function-declaration] Signed-off-by: Randy Dunlap Suggested-by: Michael Ellerman Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/3com/Kconfig | 2 +- drivers/net/ethernet/amd/Kconfig | 4 ++-- drivers/net/ethernet/cirrus/Kconfig | 1 + 3 files changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/3com/Kconfig b/drivers/net/ethernet/3com/Kconfig index 5b7658bcf020..5c3ef9fc8207 100644 --- a/drivers/net/ethernet/3com/Kconfig +++ b/drivers/net/ethernet/3com/Kconfig @@ -32,7 +32,7 @@ config EL3 config 3C515 tristate "3c515 ISA \"Fast EtherLink\"" - depends on ISA && ISA_DMA_API + depends on ISA && ISA_DMA_API && !PPC32 ---help--- If you have a 3Com ISA EtherLink XL "Corkscrew" 3c515 Fast Ethernet network card, say Y here. diff --git a/drivers/net/ethernet/amd/Kconfig b/drivers/net/ethernet/amd/Kconfig index 0038709fd317..ec59425fdbff 100644 --- a/drivers/net/ethernet/amd/Kconfig +++ b/drivers/net/ethernet/amd/Kconfig @@ -44,7 +44,7 @@ config AMD8111_ETH config LANCE tristate "AMD LANCE and PCnet (AT1500 and NE2100) support" - depends on ISA && ISA_DMA_API && !ARM + depends on ISA && ISA_DMA_API && !ARM && !PPC32 ---help--- If you have a network (Ethernet) card of this type, say Y here. Some LinkSys cards are of this type. @@ -138,7 +138,7 @@ config PCMCIA_NMCLAN config NI65 tristate "NI6510 support" - depends on ISA && ISA_DMA_API && !ARM + depends on ISA && ISA_DMA_API && !ARM && !PPC32 ---help--- If you have a network (Ethernet) card of this type, say Y here. diff --git a/drivers/net/ethernet/cirrus/Kconfig b/drivers/net/ethernet/cirrus/Kconfig index 5ab912937aff..ec0b545197e2 100644 --- a/drivers/net/ethernet/cirrus/Kconfig +++ b/drivers/net/ethernet/cirrus/Kconfig @@ -19,6 +19,7 @@ if NET_VENDOR_CIRRUS config CS89x0 tristate "CS89x0 support" depends on ISA || EISA || ARM + depends on !PPC32 ---help--- Support for CS89x0 chipset based Ethernet cards. If you have a network (Ethernet) card of this type, say Y and read the file From 82ad267b27b8c874617765eaa883885fecc12dc0 Mon Sep 17 00:00:00 2001 From: Nicholas Mc Guire Date: Mon, 9 Jul 2018 21:16:40 +0200 Subject: [PATCH 819/970] can: mpc5xxx_can: check of_iomap return before use [ Upstream commit b5c1a23b17e563b656cc9bb76ce5323b997d90e8 ] of_iomap() can return NULL so that return needs to be checked and NULL treated as failure. While at it also take care of the missing of_node_put() in the error path. Signed-off-by: Nicholas Mc Guire Fixes: commit afa17a500a36 ("net/can: add driver for mscan family & mpc52xx_mscan") Signed-off-by: Marc Kleine-Budde Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/can/mscan/mpc5xxx_can.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/net/can/mscan/mpc5xxx_can.c b/drivers/net/can/mscan/mpc5xxx_can.c index c7427bdd3a4b..2949a381a94d 100644 --- a/drivers/net/can/mscan/mpc5xxx_can.c +++ b/drivers/net/can/mscan/mpc5xxx_can.c @@ -86,6 +86,11 @@ static u32 mpc52xx_can_get_clock(struct platform_device *ofdev, return 0; } cdm = of_iomap(np_cdm, 0); + if (!cdm) { + of_node_put(np_cdm); + dev_err(&ofdev->dev, "can't map clock node!\n"); + return 0; + } if (in_8(&cdm->ipb_clk_sel) & 0x1) freq *= 2; From c5d7e5e83b2f94deacf891fba538a93ee9a6052c Mon Sep 17 00:00:00 2001 From: Alexander Sverdlin Date: Fri, 13 Jul 2018 17:20:17 +0200 Subject: [PATCH 820/970] i2c: davinci: Avoid zero value of CLKH [ Upstream commit cc8de9a68599b261244ea453b38678229f06ada7 ] If CLKH is set to 0 I2C clock is not generated at all, so avoid this value and stretch the clock in this case. Signed-off-by: Alexander Sverdlin Acked-by: Sekhar Nori Signed-off-by: Wolfram Sang Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/i2c/busses/i2c-davinci.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/i2c/busses/i2c-davinci.c b/drivers/i2c/busses/i2c-davinci.c index 9e7ef5cf5d49..b2d8b63176db 100644 --- a/drivers/i2c/busses/i2c-davinci.c +++ b/drivers/i2c/busses/i2c-davinci.c @@ -234,12 +234,16 @@ static void i2c_davinci_calc_clk_dividers(struct davinci_i2c_dev *dev) /* * It's not always possible to have 1 to 2 ratio when d=7, so fall back * to minimal possible clkh in this case. + * + * Note: + * CLKH is not allowed to be 0, in this case I2C clock is not generated + * at all */ - if (clk >= clkl + d) { + if (clk > clkl + d) { clkh = clk - clkl - d; clkl -= d; } else { - clkh = 0; + clkh = 1; clkl = clk - (d << 1); } From b3c2509872504a99eec049fac370906bf1864539 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Fri, 20 Jul 2018 10:39:07 +0200 Subject: [PATCH 821/970] perf/x86/amd/ibs: Don't access non-started event [ Upstream commit d2753e6b4882a637a0e8fb3b9c2e15f33265300e ] Paul Menzel reported the following bug: > Enabling the undefined behavior sanitizer and building GNU/Linux 4.18-rc5+ > (with some unrelated commits) with GCC 8.1.0 from Debian Sid/unstable, the > warning below is shown. > > > [ 2.111913] > > ================================================================================ > > [ 2.111917] UBSAN: Undefined behaviour in arch/x86/events/amd/ibs.c:582:24 > > [ 2.111919] member access within null pointer of type 'struct perf_event' > > [ 2.111926] CPU: 0 PID: 144 Comm: udevadm Not tainted 4.18.0-rc5-00316-g4864b68cedf2 #104 > > [ 2.111928] Hardware name: ASROCK E350M1/E350M1, BIOS TIMELESS 01/01/1970 > > [ 2.111930] Call Trace: > > [ 2.111943] dump_stack+0x55/0x89 > > [ 2.111949] ubsan_epilogue+0xb/0x33 > > [ 2.111953] handle_null_ptr_deref+0x7f/0x90 > > [ 2.111958] __ubsan_handle_type_mismatch_v1+0x55/0x60 > > [ 2.111964] perf_ibs_handle_irq+0x596/0x620 The code dereferences event before checking the STARTED bit. Patch below should cure the issue. The warning should not trigger, if I analyzed the thing correctly. (And Paul's testing confirms this.) Reported-by: Paul Menzel Tested-by: Paul Menzel Signed-off-by: Thomas Gleixner Cc: Alexander Shishkin Cc: Arnaldo Carvalho de Melo Cc: Borislav Petkov Cc: Jiri Olsa Cc: Linus Torvalds Cc: Paul Menzel Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Vince Weaver Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1807200958390.1580@nanos.tec.linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/x86/events/amd/ibs.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c index b26ee32f73e8..fd4484ae3ffc 100644 --- a/arch/x86/events/amd/ibs.c +++ b/arch/x86/events/amd/ibs.c @@ -578,7 +578,7 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs) { struct cpu_perf_ibs *pcpu = this_cpu_ptr(perf_ibs->pcpu); struct perf_event *event = pcpu->event; - struct hw_perf_event *hwc = &event->hw; + struct hw_perf_event *hwc; struct perf_sample_data data; struct perf_raw_record raw; struct pt_regs regs; @@ -601,6 +601,10 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs) return 0; } + if (WARN_ON_ONCE(!event)) + goto fail; + + hwc = &event->hw; msr = hwc->config_base; buf = ibs_data.regs; rdmsrl(msr, *buf); From d3313fe179a2f221b9a9b309819157acc39fdcda Mon Sep 17 00:00:00 2001 From: Guenter Roeck Date: Mon, 23 Jul 2018 14:39:33 -0700 Subject: [PATCH 822/970] media: staging: omap4iss: Include asm/cacheflush.h after generic includes [ Upstream commit 0894da849f145af51bde88a6b84f95b9c9e0bc66 ] Including asm/cacheflush.h first results in the following build error when trying to build sparc32:allmodconfig, because 'struct page' has not been declared, and the function declaration ends up creating a separate (private) declaration of struct page (as a result of function arguments being in the scope of the function declaration and definition, not in global scope). The C scoping rules do not just affect variable visibility, they also affect type declaration visibility. The end result is that when the actual call site is seen in , the 'struct page' type in the caller is not the same 'struct page' that the function was declared with, resulting in: In file included from arch/sparc/include/asm/page.h:10:0, ... from drivers/staging/media/omap4iss/iss_video.c:15: include/linux/highmem.h: In function 'clear_user_highpage': include/linux/highmem.h:137:31: error: passing argument 1 of 'sparc_flush_page_to_ram' from incompatible pointer type Include generic includes files first to fix the problem. Fixes: fc96d58c10162 ("[media] v4l: omap4iss: Add support for OMAP4 camera interface - Video devices") Suggested-by: Linus Torvalds Acked-by: David S. Miller Cc: Randy Dunlap Signed-off-by: Guenter Roeck [ Added explanation of C scope rules - Linus ] Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/staging/media/omap4iss/iss_video.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/staging/media/omap4iss/iss_video.c b/drivers/staging/media/omap4iss/iss_video.c index c16927ac8eb0..395c7a2244ff 100644 --- a/drivers/staging/media/omap4iss/iss_video.c +++ b/drivers/staging/media/omap4iss/iss_video.c @@ -11,7 +11,6 @@ * (at your option) any later version. */ -#include #include #include #include @@ -24,6 +23,8 @@ #include #include +#include + #include "iss_video.h" #include "iss.h" From 32680dc030c1ef6a1a9751be65c1eb7566b5fbdc Mon Sep 17 00:00:00 2001 From: Sudarsana Reddy Kalluru Date: Tue, 24 Jul 2018 02:43:52 -0700 Subject: [PATCH 823/970] bnx2x: Fix invalid memory access in rss hash config path. [ Upstream commit ae2dcb28c24794a87e424a726a1cf1a61980f52d ] Rx hash/filter table configuration uses rss_conf_obj to configure filters in the hardware. This object is initialized only when the interface is brought up. This patch adds driver changes to configure rss params only when the device is in opened state. In port disabled case, the config will be cached in the driver structure which will be applied in the successive load path. Please consider applying it to 'net' branch. Signed-off-by: Sudarsana Reddy Kalluru Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c index 5f19427c7b27..8aecd8ef6542 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c @@ -3367,14 +3367,18 @@ static int bnx2x_set_rss_flags(struct bnx2x *bp, struct ethtool_rxnfc *info) DP(BNX2X_MSG_ETHTOOL, "rss re-configured, UDP 4-tupple %s\n", udp_rss_requested ? "enabled" : "disabled"); - return bnx2x_rss(bp, &bp->rss_conf_obj, false, true); + if (bp->state == BNX2X_STATE_OPEN) + return bnx2x_rss(bp, &bp->rss_conf_obj, false, + true); } else if ((info->flow_type == UDP_V6_FLOW) && (bp->rss_conf_obj.udp_rss_v6 != udp_rss_requested)) { bp->rss_conf_obj.udp_rss_v6 = udp_rss_requested; DP(BNX2X_MSG_ETHTOOL, "rss re-configured, UDP 4-tupple %s\n", udp_rss_requested ? "enabled" : "disabled"); - return bnx2x_rss(bp, &bp->rss_conf_obj, false, true); + if (bp->state == BNX2X_STATE_OPEN) + return bnx2x_rss(bp, &bp->rss_conf_obj, false, + true); } return 0; @@ -3488,7 +3492,10 @@ static int bnx2x_set_rxfh(struct net_device *dev, const u32 *indir, bp->rss_conf_obj.ind_table[i] = indir[i] + bp->fp->cl_id; } - return bnx2x_config_rss_eth(bp, false); + if (bp->state == BNX2X_STATE_OPEN) + return bnx2x_config_rss_eth(bp, false); + + return 0; } /** From d638725a56a38b6976d96c21564db6b8e5613bc4 Mon Sep 17 00:00:00 2001 From: Aleksander Morgado Date: Tue, 24 Jul 2018 01:31:07 +0200 Subject: [PATCH 824/970] qmi_wwan: fix interface number for DW5821e production firmware MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit f25e1392fdb556290957142ac2da33a02cbff403 ] The original mapping for the DW5821e was done using a development version of the firmware. Confirmed with the vendor that the final USB layout ends up exposing the QMI control/data ports in USB config #1, interface #0, not in interface #1 (which is now a HID interface). T: Bus=01 Lev=03 Prnt=04 Port=00 Cnt=01 Dev#= 16 Spd=480 MxCh= 0 D: Ver= 2.10 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs= 2 P: Vendor=413c ProdID=81d7 Rev=03.18 S: Manufacturer=DELL S: Product=DW5821e Snapdragon X20 LTE S: SerialNumber=0123456789ABCDEF C: #Ifs= 6 Cfg#= 1 Atr=a0 MxPwr=500mA I: If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan I: If#= 1 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=00 Prot=00 Driver=usbhid I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#= 5 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option Fixes: e7e197edd09c25 ("qmi_wwan: add support for the Dell Wireless 5821e module") Signed-off-by: Aleksander Morgado Acked-by: Bjørn Mork Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/usb/qmi_wwan.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 31a6d87b61b2..0d4440f28f6b 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -946,7 +946,7 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x413c, 0x81b3, 8)}, /* Dell Wireless 5809e Gobi(TM) 4G LTE Mobile Broadband Card (rev3) */ {QMI_FIXED_INTF(0x413c, 0x81b6, 8)}, /* Dell Wireless 5811e */ {QMI_FIXED_INTF(0x413c, 0x81b6, 10)}, /* Dell Wireless 5811e */ - {QMI_FIXED_INTF(0x413c, 0x81d7, 1)}, /* Dell Wireless 5821e */ + {QMI_FIXED_INTF(0x413c, 0x81d7, 0)}, /* Dell Wireless 5821e */ {QMI_FIXED_INTF(0x03f0, 0x4e1d, 8)}, /* HP lt4111 LTE/EV-DO/HSPA+ Gobi 4G Module */ {QMI_FIXED_INTF(0x03f0, 0x9d1d, 1)}, /* HP lt4120 Snapdragon X5 LTE */ {QMI_FIXED_INTF(0x22de, 0x9061, 3)}, /* WeTelecom WPD-600N */ From 88adb09fe82652a87d6fdf7a721942210719cdee Mon Sep 17 00:00:00 2001 From: Shubhrajyoti Datta Date: Tue, 24 Jul 2018 10:09:53 +0530 Subject: [PATCH 825/970] net: axienet: Fix double deregister of mdio [ Upstream commit 03bc7cab7d7218088412a75e141696a89059ab00 ] If the registration fails then mdio_unregister is called. However at unbind the unregister ia attempted again resulting in the below crash [ 73.544038] kernel BUG at drivers/net/phy/mdio_bus.c:415! [ 73.549362] Internal error: Oops - BUG: 0 [#1] SMP [ 73.554127] Modules linked in: [ 73.557168] CPU: 0 PID: 2249 Comm: sh Not tainted 4.14.0 #183 [ 73.562895] Hardware name: xlnx,zynqmp (DT) [ 73.567062] task: ffffffc879e41180 task.stack: ffffff800cbe0000 [ 73.572973] PC is at mdiobus_unregister+0x84/0x88 [ 73.577656] LR is at axienet_mdio_teardown+0x18/0x30 [ 73.582601] pc : [] lr : [] pstate: 20000145 [ 73.589981] sp : ffffff800cbe3c30 [ 73.593277] x29: ffffff800cbe3c30 x28: ffffffc879e41180 [ 73.598573] x27: ffffff8008a21000 x26: 0000000000000040 [ 73.603868] x25: 0000000000000124 x24: ffffffc879efe920 [ 73.609164] x23: 0000000000000060 x22: ffffffc879e02000 [ 73.614459] x21: ffffffc879e02800 x20: ffffffc87b0b8870 [ 73.619754] x19: ffffffc879e02800 x18: 000000000000025d [ 73.625050] x17: 0000007f9a719ad0 x16: ffffff8008195bd8 [ 73.630345] x15: 0000007f9a6b3d00 x14: 0000000000000010 [ 73.635640] x13: 74656e7265687465 x12: 0000000000000030 [ 73.640935] x11: 0000000000000030 x10: 0101010101010101 [ 73.646231] x9 : 241f394f42533300 x8 : ffffffc8799f6e98 [ 73.651526] x7 : ffffffc8799f6f18 x6 : ffffffc87b0ba318 [ 73.656822] x5 : ffffffc87b0ba498 x4 : 0000000000000000 [ 73.662117] x3 : 0000000000000000 x2 : 0000000000000008 [ 73.667412] x1 : 0000000000000004 x0 : ffffffc8799f4000 [ 73.672708] Process sh (pid: 2249, stack limit = 0xffffff800cbe0000) Fix the same by making the bus NULL on unregister. Signed-off-by: Shubhrajyoti Datta Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/xilinx/xilinx_axienet_mdio.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_mdio.c b/drivers/net/ethernet/xilinx/xilinx_axienet_mdio.c index 63307ea97846..9beea13e2e1f 100644 --- a/drivers/net/ethernet/xilinx/xilinx_axienet_mdio.c +++ b/drivers/net/ethernet/xilinx/xilinx_axienet_mdio.c @@ -217,6 +217,7 @@ int axienet_mdio_setup(struct axienet_local *lp, struct device_node *np) ret = of_mdiobus_register(bus, np1); if (ret) { mdiobus_free(bus); + lp->mii_bus = NULL; return ret; } return 0; From f3c28466520c5436decdae19b5d4515bad07d956 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Tue, 24 Jul 2018 16:08:27 -0700 Subject: [PATCH 826/970] x86/boot: Fix if_changed build flip/flop bug [ Upstream commit 92a4728608a8fd228c572bc8ff50dd98aa0ddf2a ] Dirk Gouders reported that two consecutive "make" invocations on an already compiled tree will show alternating behaviors: $ make CALL scripts/checksyscalls.sh DESCEND objtool CHK include/generated/compile.h DATAREL arch/x86/boot/compressed/vmlinux Kernel: arch/x86/boot/bzImage is ready (#48) Building modules, stage 2. MODPOST 165 modules $ make CALL scripts/checksyscalls.sh DESCEND objtool CHK include/generated/compile.h LD arch/x86/boot/compressed/vmlinux ZOFFSET arch/x86/boot/zoffset.h AS arch/x86/boot/header.o LD arch/x86/boot/setup.elf OBJCOPY arch/x86/boot/setup.bin OBJCOPY arch/x86/boot/vmlinux.bin BUILD arch/x86/boot/bzImage Setup is 15644 bytes (padded to 15872 bytes). System is 6663 kB CRC 3eb90f40 Kernel: arch/x86/boot/bzImage is ready (#48) Building modules, stage 2. MODPOST 165 modules He bisected it back to: commit 98f78525371b ("x86/boot: Refuse to build with data relocations") The root cause was the use of the "if_changed" kbuild function multiple times for the same target. It was designed to only be used once per target, otherwise it will effectively always trigger, flipping back and forth between the two commands getting recorded by "if_changed". Instead, this patch merges the two commands into a single function to get stable build artifacts (i.e. .vmlinux.cmd), and a single build behavior. Bisected-and-Reported-by: Dirk Gouders Fix-Suggested-by: Masahiro Yamada Signed-off-by: Kees Cook Reviewed-by: Masahiro Yamada Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/20180724230827.GA37823@beast Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/x86/boot/compressed/Makefile | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile index 4669b3a931ed..cda8e14bd72a 100644 --- a/arch/x86/boot/compressed/Makefile +++ b/arch/x86/boot/compressed/Makefile @@ -101,9 +101,13 @@ define cmd_check_data_rel done endef +# We need to run two commands under "if_changed", so merge them into a +# single invocation. +quiet_cmd_check-and-link-vmlinux = LD $@ + cmd_check-and-link-vmlinux = $(cmd_check_data_rel); $(cmd_ld) + $(obj)/vmlinux: $(vmlinux-objs-y) FORCE - $(call if_changed,check_data_rel) - $(call if_changed,ld) + $(call if_changed,check-and-link-vmlinux) OBJCOPYFLAGS_vmlinux.bin := -R .comment -S $(obj)/vmlinux.bin: vmlinux FORCE From 1a0ffb53002914c72b358365240bedc67dc6a6a1 Mon Sep 17 00:00:00 2001 From: Kiran Kumar Modukuri Date: Wed, 25 Jul 2018 14:31:20 +0100 Subject: [PATCH 827/970] fscache: Allow cancelled operations to be enqueued [ Upstream commit d0eb06afe712b7b103b6361f40a9a0c638524669 ] Alter the state-check assertion in fscache_enqueue_operation() to allow cancelled operations to be given processing time so they can be cleaned up. Also fix a debugging statement that was requiring such operations to have an object assigned. Fixes: 9ae326a69004 ("CacheFiles: A cache that backs onto a mounted filesystem") Reported-by: Kiran Kumar Modukuri Signed-off-by: David Howells Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/fscache/operation.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/fscache/operation.c b/fs/fscache/operation.c index de67745e1cd7..77946d6f617d 100644 --- a/fs/fscache/operation.c +++ b/fs/fscache/operation.c @@ -66,7 +66,8 @@ void fscache_enqueue_operation(struct fscache_operation *op) ASSERT(op->processor != NULL); ASSERT(fscache_object_is_available(op->object)); ASSERTCMP(atomic_read(&op->usage), >, 0); - ASSERTCMP(op->state, ==, FSCACHE_OP_ST_IN_PROGRESS); + ASSERTIFCMP(op->state != FSCACHE_OP_ST_IN_PROGRESS, + op->state, ==, FSCACHE_OP_ST_CANCELLED); fscache_stat(&fscache_n_op_enqueue); switch (op->flags & FSCACHE_OP_TYPE) { @@ -481,7 +482,8 @@ void fscache_put_operation(struct fscache_operation *op) struct fscache_cache *cache; _enter("{OBJ%x OP%x,%d}", - op->object->debug_id, op->debug_id, atomic_read(&op->usage)); + op->object ? op->object->debug_id : 0, + op->debug_id, atomic_read(&op->usage)); ASSERTCMP(atomic_read(&op->usage), >, 0); From 75960a41a598cd279d70a6d9d92e77896c6b8248 Mon Sep 17 00:00:00 2001 From: Kiran Kumar Modukuri Date: Tue, 18 Jul 2017 16:25:49 -0700 Subject: [PATCH 828/970] cachefiles: Fix refcounting bug in backing-file read monitoring [ Upstream commit 934140ab028713a61de8bca58c05332416d037d1 ] cachefiles_read_waiter() has the right to access a 'monitor' object by virtue of being called under the waitqueue lock for one of the pages in its purview. However, it has no ref on that monitor object or on the associated operation. What it is allowed to do is to move the monitor object to the operation's to_do list, but once it drops the work_lock, it's actually no longer permitted to access that object. However, it is trying to enqueue the retrieval operation for processing - but it can only do this via a pointer in the monitor object, something it shouldn't be doing. If it doesn't enqueue the operation, the operation may not get processed. If the order is flipped so that the enqueue is first, then it's possible for the work processor to look at the to_do list before the monitor is enqueued upon it. Fix this by getting a ref on the operation so that we can trust that it will still be there once we've added the monitor to the to_do list and dropped the work_lock. The op can then be enqueued after the lock is dropped. The bug can manifest in one of a couple of ways. The first manifestation looks like: FS-Cache: FS-Cache: Assertion failed FS-Cache: 6 == 5 is false ------------[ cut here ]------------ kernel BUG at fs/fscache/operation.c:494! RIP: 0010:fscache_put_operation+0x1e3/0x1f0 ... fscache_op_work_func+0x26/0x50 process_one_work+0x131/0x290 worker_thread+0x45/0x360 kthread+0xf8/0x130 ? create_worker+0x190/0x190 ? kthread_cancel_work_sync+0x10/0x10 ret_from_fork+0x1f/0x30 This is due to the operation being in the DEAD state (6) rather than INITIALISED, COMPLETE or CANCELLED (5) because it's already passed through fscache_put_operation(). The bug can also manifest like the following: kernel BUG at fs/fscache/operation.c:69! ... [exception RIP: fscache_enqueue_operation+246] ... #7 [ffff883fff083c10] fscache_enqueue_operation at ffffffffa0b793c6 #8 [ffff883fff083c28] cachefiles_read_waiter at ffffffffa0b15a48 #9 [ffff883fff083c48] __wake_up_common at ffffffff810af028 I'm not entirely certain as to which is line 69 in Lei's kernel, so I'm not entirely clear which assertion failed. Fixes: 9ae326a69004 ("CacheFiles: A cache that backs onto a mounted filesystem") Reported-by: Lei Xue Reported-by: Vegard Nossum Reported-by: Anthony DeRobertis Reported-by: NeilBrown Reported-by: Daniel Axtens Reported-by: Kiran Kumar Modukuri Signed-off-by: David Howells Reviewed-by: Daniel Axtens Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/cachefiles/rdwr.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c index afbdc418966d..5e3bc9de7a16 100644 --- a/fs/cachefiles/rdwr.c +++ b/fs/cachefiles/rdwr.c @@ -27,6 +27,7 @@ static int cachefiles_read_waiter(wait_queue_t *wait, unsigned mode, struct cachefiles_one_read *monitor = container_of(wait, struct cachefiles_one_read, monitor); struct cachefiles_object *object; + struct fscache_retrieval *op = monitor->op; struct wait_bit_key *key = _key; struct page *page = wait->private; @@ -51,16 +52,22 @@ static int cachefiles_read_waiter(wait_queue_t *wait, unsigned mode, list_del(&wait->task_list); /* move onto the action list and queue for FS-Cache thread pool */ - ASSERT(monitor->op); + ASSERT(op); - object = container_of(monitor->op->op.object, - struct cachefiles_object, fscache); + /* We need to temporarily bump the usage count as we don't own a ref + * here otherwise cachefiles_read_copier() may free the op between the + * monitor being enqueued on the op->to_do list and the op getting + * enqueued on the work queue. + */ + fscache_get_retrieval(op); + object = container_of(op->op.object, struct cachefiles_object, fscache); spin_lock(&object->work_lock); - list_add_tail(&monitor->op_link, &monitor->op->to_do); + list_add_tail(&monitor->op_link, &op->to_do); spin_unlock(&object->work_lock); - fscache_enqueue_retrieval(monitor->op); + fscache_enqueue_retrieval(op); + fscache_put_retrieval(op); return 0; } From 6194fba249cd4ffc05d2b6311b5364d5b8e57338 Mon Sep 17 00:00:00 2001 From: Kiran Kumar Modukuri Date: Thu, 21 Jun 2018 13:25:53 -0700 Subject: [PATCH 829/970] cachefiles: Wait rather than BUG'ing on "Unexpected object collision" [ Upstream commit c2412ac45a8f8f1cd582723c1a139608694d410d ] If we meet a conflicting object that is marked FSCACHE_OBJECT_IS_LIVE in the active object tree, we have been emitting a BUG after logging information about it and the new object. Instead, we should wait for the CACHEFILES_OBJECT_ACTIVE flag to be cleared on the old object (or return an error). The ACTIVE flag should be cleared after it has been removed from the active object tree. A timeout of 60s is used in the wait, so we shouldn't be able to get stuck there. Fixes: 9ae326a69004 ("CacheFiles: A cache that backs onto a mounted filesystem") Signed-off-by: Kiran Kumar Modukuri Signed-off-by: David Howells Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/cachefiles/namei.c | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c index 41df8a27d7eb..2026885702a2 100644 --- a/fs/cachefiles/namei.c +++ b/fs/cachefiles/namei.c @@ -195,7 +195,6 @@ static int cachefiles_mark_object_active(struct cachefiles_cache *cache, pr_err("\n"); pr_err("Error: Unexpected object collision\n"); cachefiles_printk_object(object, xobject); - BUG(); } atomic_inc(&xobject->usage); write_unlock(&cache->active_lock); From 579381fca56b15cad7b809ab6e8a54d3d9cdcd17 Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Sat, 14 Jul 2018 01:28:44 +0900 Subject: [PATCH 830/970] selftests/ftrace: Add snapshot and tracing_on test case [ Upstream commit 82f4f3e69c5c29bce940dd87a2c0f16c51d48d17 ] Add a testcase for checking snapshot and tracing_on relationship. This ensures that the snapshotting doesn't affect current tracing on/off settings. Link: http://lkml.kernel.org/r/153149932412.11274.15289227592627901488.stgit@devbox Cc: Tom Zanussi Cc: Hiraku Toyooka Signed-off-by: Masami Hiramatsu Cc: Ingo Molnar Cc: Shuah Khan Cc: linux-kselftest@vger.kernel.org Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- .../ftrace/test.d/00basic/snapshot.tc | 28 +++++++++++++++++++ 1 file changed, 28 insertions(+) create mode 100644 tools/testing/selftests/ftrace/test.d/00basic/snapshot.tc diff --git a/tools/testing/selftests/ftrace/test.d/00basic/snapshot.tc b/tools/testing/selftests/ftrace/test.d/00basic/snapshot.tc new file mode 100644 index 000000000000..3b1f45e13a2e --- /dev/null +++ b/tools/testing/selftests/ftrace/test.d/00basic/snapshot.tc @@ -0,0 +1,28 @@ +#!/bin/sh +# description: Snapshot and tracing setting +# flags: instance + +[ ! -f snapshot ] && exit_unsupported + +echo "Set tracing off" +echo 0 > tracing_on + +echo "Allocate and take a snapshot" +echo 1 > snapshot + +# Since trace buffer is empty, snapshot is also empty, but allocated +grep -q "Snapshot is allocated" snapshot + +echo "Ensure keep tracing off" +test `cat tracing_on` -eq 0 + +echo "Set tracing on" +echo 1 > tracing_on + +echo "Take a snapshot again" +echo 1 > snapshot + +echo "Ensure keep tracing on" +test `cat tracing_on` -eq 1 + +exit 0 From b55993f4b2eb14e34ccede29de892471588f2bb1 Mon Sep 17 00:00:00 2001 From: Li Wang Date: Thu, 26 Jul 2018 16:37:42 -0700 Subject: [PATCH 831/970] zswap: re-check zswap_is_full() after do zswap_shrink() [ Upstream commit 16e536ef47f567289a5699abee9ff7bb304bc12d ] /sys/../zswap/stored_pages keeps rising in a zswap test with "zswap.max_pool_percent=0" parameter. But it should not compress or store pages any more since there is no space in the compressed pool. Reproduce steps: 1. Boot kernel with "zswap.enabled=1" 2. Set the max_pool_percent to 0 # echo 0 > /sys/module/zswap/parameters/max_pool_percent 3. Do memory stress test to see if some pages have been compressed # stress --vm 1 --vm-bytes $mem_available"M" --timeout 60s 4. Watching the 'stored_pages' number increasing or not The root cause is: When zswap_max_pool_percent is set to 0 via kernel parameter, zswap_is_full() will always return true due to zswap_shrink(). But if the shinking is able to reclain a page successfully the code then proceeds to compressing/storing another page, so the value of stored_pages will keep changing. To solve the issue, this patch adds a zswap_is_full() check again after zswap_shrink() to make sure it's now under the max_pool_percent, and to not compress/store if we reached the limit. Link: http://lkml.kernel.org/r/20180530103936.17812-1-liwang@redhat.com Signed-off-by: Li Wang Acked-by: Dan Streetman Cc: Seth Jennings Cc: Huang Ying Cc: Yu Zhao Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- mm/zswap.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/mm/zswap.c b/mm/zswap.c index ded051e3433d..c2b5435fe617 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -1018,6 +1018,15 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, ret = -ENOMEM; goto reject; } + + /* A second zswap_is_full() check after + * zswap_shrink() to make sure it's now + * under the max_pool_percent + */ + if (zswap_is_full()) { + ret = -ENOMEM; + goto reject; + } } /* allocate entry */ From d0995e103e19a26f71a8e3c45f721d6f3af5f202 Mon Sep 17 00:00:00 2001 From: Calvin Walton Date: Fri, 27 Jul 2018 07:50:53 -0400 Subject: [PATCH 832/970] tools/power turbostat: Read extended processor family from CPUID [ Upstream commit 5aa3d1a20a233d4a5f1ec3d62da3f19d9afea682 ] This fixes the reported family on modern AMD processors (e.g. Ryzen, which is family 0x17). Previously these processors all showed up as family 0xf. See the document https://support.amd.com/TechDocs/56255_OSRR.pdf section CPUID_Fn00000001_EAX for how to calculate the family from the BaseFamily and ExtFamily values. This matches the code in arch/x86/lib/cpu.c Signed-off-by: Calvin Walton Signed-off-by: Len Brown Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- tools/power/x86/turbostat/turbostat.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c index 6e22798a367d..5ec2de8f49b4 100644 --- a/tools/power/x86/turbostat/turbostat.c +++ b/tools/power/x86/turbostat/turbostat.c @@ -3200,7 +3200,9 @@ void process_cpuid() family = (fms >> 8) & 0xf; model = (fms >> 4) & 0xf; stepping = fms & 0xf; - if (family == 6 || family == 0xf) + if (family == 0xf) + family += (fms >> 20) & 0xff; + if (family >= 6) model += ((fms >> 16) & 0xf) << 4; if (debug) { From 5f56ddcc078dd556ee206b44bfc8c0f13f9f50b5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Rafa=C5=82=20Mi=C5=82ecki?= Date: Fri, 27 Jul 2018 13:13:39 +0200 Subject: [PATCH 833/970] Revert "MIPS: BCM47XX: Enable 74K Core ExternalSync for PCIe erratum" MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit d5ea019f8a381f88545bb26993b62ec24a2796b7 ] This reverts commit 2a027b47dba6 ("MIPS: BCM47XX: Enable 74K Core ExternalSync for PCIe erratum"). Enabling ExternalSync caused a regression for BCM4718A1 (used e.g. in Netgear E3000 and ASUS RT-N16): it simply hangs during PCIe initialization. It's likely that BCM4717A1 is also affected. I didn't notice that earlier as the only BCM47XX devices with PCIe I own are: 1) BCM4706 with 2 x 14e4:4331 2) BCM4706 with 14e4:4360 and 14e4:4331 it appears that BCM4706 is unaffected. While BCM5300X-ES300-RDS.pdf seems to document that erratum and its workarounds (according to quotes provided by Tokunori) it seems not even Broadcom follows them. According to the provided info Broadcom should define CONF7_ES in their SDK's mipsinc.h and implement workaround in the si_mips_init(). Checking both didn't reveal such code. It *could* mean Broadcom also had some problems with the given workaround. Signed-off-by: Rafał Miłecki Signed-off-by: Paul Burton Reported-by: Michael Marley Patchwork: https://patchwork.linux-mips.org/patch/20032/ URL: https://bugs.openwrt.org/index.php?do=details&task_id=1688 Cc: Tokunori Ikegami Cc: Hauke Mehrtens Cc: Chris Packham Cc: James Hogan Cc: Ralf Baechle Cc: linux-mips@linux-mips.org Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/mips/bcm47xx/setup.c | 6 ------ arch/mips/include/asm/mipsregs.h | 3 --- 2 files changed, 9 deletions(-) diff --git a/arch/mips/bcm47xx/setup.c b/arch/mips/bcm47xx/setup.c index 8c9cbf13d32a..6054d49e608e 100644 --- a/arch/mips/bcm47xx/setup.c +++ b/arch/mips/bcm47xx/setup.c @@ -212,12 +212,6 @@ static int __init bcm47xx_cpu_fixes(void) */ if (bcm47xx_bus.bcma.bus.chipinfo.id == BCMA_CHIP_ID_BCM4706) cpu_wait = NULL; - - /* - * BCM47XX Erratum "R10: PCIe Transactions Periodically Fail" - * Enable ExternalSync for sync instruction to take effect - */ - set_c0_config7(MIPS_CONF7_ES); break; #endif } diff --git a/arch/mips/include/asm/mipsregs.h b/arch/mips/include/asm/mipsregs.h index 22a6782f84f5..df78b2ca70eb 100644 --- a/arch/mips/include/asm/mipsregs.h +++ b/arch/mips/include/asm/mipsregs.h @@ -663,8 +663,6 @@ #define MIPS_CONF7_WII (_ULCAST_(1) << 31) #define MIPS_CONF7_RPS (_ULCAST_(1) << 2) -/* ExternalSync */ -#define MIPS_CONF7_ES (_ULCAST_(1) << 8) #define MIPS_CONF7_IAR (_ULCAST_(1) << 10) #define MIPS_CONF7_AR (_ULCAST_(1) << 16) @@ -2643,7 +2641,6 @@ __BUILD_SET_C0(status) __BUILD_SET_C0(cause) __BUILD_SET_C0(config) __BUILD_SET_C0(config5) -__BUILD_SET_C0(config7) __BUILD_SET_C0(intcontrol) __BUILD_SET_C0(intctl) __BUILD_SET_C0(srsmap) From 9bf9e4ad5ebef0dd00327369f804d78b166e5b61 Mon Sep 17 00:00:00 2001 From: Govindarajulu Varadarajan Date: Fri, 27 Jul 2018 11:19:29 -0700 Subject: [PATCH 834/970] enic: handle mtu change for vf properly [ Upstream commit ab123fe071c9aa9680ecd62eb080eb26cff4892c ] When driver gets notification for mtu change, driver does not handle it for all RQs. It handles only RQ[0]. Fix is to use enic_change_mtu() interface to change mtu for vf. Signed-off-by: Govindarajulu Varadarajan Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/cisco/enic/enic_main.c | 78 +++++++-------------- 1 file changed, 27 insertions(+), 51 deletions(-) diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c index 2e9bab45d419..f7e7b79c6050 100644 --- a/drivers/net/ethernet/cisco/enic/enic_main.c +++ b/drivers/net/ethernet/cisco/enic/enic_main.c @@ -1842,10 +1842,32 @@ static int enic_stop(struct net_device *netdev) return 0; } +static int _enic_change_mtu(struct net_device *netdev, int new_mtu) +{ + bool running = netif_running(netdev); + int err = 0; + + ASSERT_RTNL(); + if (running) { + err = enic_stop(netdev); + if (err) + return err; + } + + netdev->mtu = new_mtu; + + if (running) { + err = enic_open(netdev); + if (err) + return err; + } + + return 0; +} + static int enic_change_mtu(struct net_device *netdev, int new_mtu) { struct enic *enic = netdev_priv(netdev); - int running = netif_running(netdev); if (new_mtu < ENIC_MIN_MTU || new_mtu > ENIC_MAX_MTU) return -EINVAL; @@ -1853,20 +1875,12 @@ static int enic_change_mtu(struct net_device *netdev, int new_mtu) if (enic_is_dynamic(enic) || enic_is_sriov_vf(enic)) return -EOPNOTSUPP; - if (running) - enic_stop(netdev); - - netdev->mtu = new_mtu; - if (netdev->mtu > enic->port_mtu) netdev_warn(netdev, - "interface MTU (%d) set higher than port MTU (%d)\n", - netdev->mtu, enic->port_mtu); + "interface MTU (%d) set higher than port MTU (%d)\n", + netdev->mtu, enic->port_mtu); - if (running) - enic_open(netdev); - - return 0; + return _enic_change_mtu(netdev, new_mtu); } static void enic_change_mtu_work(struct work_struct *work) @@ -1874,47 +1888,9 @@ static void enic_change_mtu_work(struct work_struct *work) struct enic *enic = container_of(work, struct enic, change_mtu_work); struct net_device *netdev = enic->netdev; int new_mtu = vnic_dev_mtu(enic->vdev); - int err; - unsigned int i; - - new_mtu = max_t(int, ENIC_MIN_MTU, min_t(int, ENIC_MAX_MTU, new_mtu)); rtnl_lock(); - - /* Stop RQ */ - del_timer_sync(&enic->notify_timer); - - for (i = 0; i < enic->rq_count; i++) - napi_disable(&enic->napi[i]); - - vnic_intr_mask(&enic->intr[0]); - enic_synchronize_irqs(enic); - err = vnic_rq_disable(&enic->rq[0]); - if (err) { - rtnl_unlock(); - netdev_err(netdev, "Unable to disable RQ.\n"); - return; - } - vnic_rq_clean(&enic->rq[0], enic_free_rq_buf); - vnic_cq_clean(&enic->cq[0]); - vnic_intr_clean(&enic->intr[0]); - - /* Fill RQ with new_mtu-sized buffers */ - netdev->mtu = new_mtu; - vnic_rq_fill(&enic->rq[0], enic_rq_alloc_buf); - /* Need at least one buffer on ring to get going */ - if (vnic_rq_desc_used(&enic->rq[0]) == 0) { - rtnl_unlock(); - netdev_err(netdev, "Unable to alloc receive buffers.\n"); - return; - } - - /* Start RQ */ - vnic_rq_enable(&enic->rq[0]); - napi_enable(&enic->napi[0]); - vnic_intr_unmask(&enic->intr[0]); - enic_notify_timer_start(enic); - + (void)_enic_change_mtu(netdev, new_mtu); rtnl_unlock(); netdev_info(netdev, "interface MTU set as %d\n", netdev->mtu); From fcc80f8ed96db26ffbf4a7c5049b7e99b0bcedbe Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Sun, 29 Jul 2018 11:10:51 -0700 Subject: [PATCH 835/970] arc: [plat-eznps] fix data type errors in platform headers [ Upstream commit b1f32ce1c3d2c11959b7e6a2c58dc5197c581966 ] Add to fix build errors. Both ctop.h and use u32 types and cause many errors. Examples: ../include/soc/nps/common.h:71:4: error: unknown type name 'u32' u32 __reserved:20, cluster:4, core:4, thread:4; ../include/soc/nps/common.h:76:3: error: unknown type name 'u32' u32 value; ../include/soc/nps/common.h:124:4: error: unknown type name 'u32' u32 base:8, cl_x:4, cl_y:4, ../include/soc/nps/common.h:127:3: error: unknown type name 'u32' u32 value; ../arch/arc/plat-eznps/include/plat/ctop.h:83:4: error: unknown type name 'u32' u32 gen:1, gdis:1, clk_gate_dis:1, asb:1, ../arch/arc/plat-eznps/include/plat/ctop.h:86:3: error: unknown type name 'u32' u32 value; ../arch/arc/plat-eznps/include/plat/ctop.h:93:4: error: unknown type name 'u32' u32 csa:22, dmsid:6, __reserved:3, cs:1; ../arch/arc/plat-eznps/include/plat/ctop.h:95:3: error: unknown type name 'u32' u32 value; Cc: linux-snps-arc@lists.infradead.org Cc: Ofer Levi Reviewed-by: Leon Romanovsky Signed-off-by: Randy Dunlap Signed-off-by: Vineet Gupta Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/arc/plat-eznps/include/plat/ctop.h | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arc/plat-eznps/include/plat/ctop.h b/arch/arc/plat-eznps/include/plat/ctop.h index 9d6718c1a199..3c401ce0351e 100644 --- a/arch/arc/plat-eznps/include/plat/ctop.h +++ b/arch/arc/plat-eznps/include/plat/ctop.h @@ -21,6 +21,7 @@ #error "Incorrect ctop.h include" #endif +#include #include /* core auxiliary registers */ From 6e64609123832dfa093ae8cce3d15d116ab3383c Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Thu, 26 Jul 2018 20:16:35 -0700 Subject: [PATCH 836/970] arc: fix build errors in arc/include/asm/delay.h [ Upstream commit 2423665ec53f2a29191b35382075e9834288a975 ] Fix build errors in arch/arc/'s delay.h: - add "extern unsigned long loops_per_jiffy;" - add for "u64" In file included from ../drivers/infiniband/hw/cxgb3/cxio_hal.c:32: ../arch/arc/include/asm/delay.h: In function '__udelay': ../arch/arc/include/asm/delay.h:61:12: error: 'u64' undeclared (first use in this function) loops = ((u64) usecs * 4295 * HZ * loops_per_jiffy) >> 32; ^~~ In file included from ../drivers/infiniband/hw/cxgb3/cxio_hal.c:32: ../arch/arc/include/asm/delay.h: In function '__udelay': ../arch/arc/include/asm/delay.h:63:37: error: 'loops_per_jiffy' undeclared (first use in this function) loops = ((u64) usecs * 4295 * HZ * loops_per_jiffy) >> 32; ^~~~~~~~~~~~~~~ Signed-off-by: Randy Dunlap Cc: Vineet Gupta Cc: linux-snps-arc@lists.infradead.org Cc: Elad Kanfi Cc: Leon Romanovsky Cc: Ofer Levi Signed-off-by: Vineet Gupta Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/arc/include/asm/delay.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/arc/include/asm/delay.h b/arch/arc/include/asm/delay.h index d5da2115d78a..03d6bb0f4e13 100644 --- a/arch/arc/include/asm/delay.h +++ b/arch/arc/include/asm/delay.h @@ -17,8 +17,11 @@ #ifndef __ASM_ARC_UDELAY_H #define __ASM_ARC_UDELAY_H +#include #include /* HZ */ +extern unsigned long loops_per_jiffy; + static inline void __delay(unsigned long loops) { __asm__ __volatile__( From 8620db886d78954e656a47320f5d3593efac992d Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Thu, 26 Jul 2018 20:16:35 -0700 Subject: [PATCH 837/970] arc: fix type warnings in arc/mm/cache.c [ Upstream commit ec837d620c750c0d4996a907c8c4f7febe1bbeee ] Fix type warnings in arch/arc/mm/cache.c. ../arch/arc/mm/cache.c: In function 'flush_anon_page': ../arch/arc/mm/cache.c:1062:55: warning: passing argument 2 of '__flush_dcache_page' makes integer from pointer without a cast [-Wint-conversion] __flush_dcache_page((phys_addr_t)page_address(page), page_address(page)); ^~~~~~~~~~~~~~~~~~ ../arch/arc/mm/cache.c:1013:59: note: expected 'long unsigned int' but argument is of type 'void *' void __flush_dcache_page(phys_addr_t paddr, unsigned long vaddr) ~~~~~~~~~~~~~~^~~~~ Signed-off-by: Randy Dunlap Cc: Vineet Gupta Cc: linux-snps-arc@lists.infradead.org Cc: Elad Kanfi Cc: Leon Romanovsky Cc: Ofer Levi Signed-off-by: Vineet Gupta Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/arc/mm/cache.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/arch/arc/mm/cache.c b/arch/arc/mm/cache.c index bbdfeb31dee6..fefe357c3d31 100644 --- a/arch/arc/mm/cache.c +++ b/arch/arc/mm/cache.c @@ -840,7 +840,7 @@ void flush_cache_mm(struct mm_struct *mm) void flush_cache_page(struct vm_area_struct *vma, unsigned long u_vaddr, unsigned long pfn) { - unsigned int paddr = pfn << PAGE_SHIFT; + phys_addr_t paddr = pfn << PAGE_SHIFT; u_vaddr &= PAGE_MASK; @@ -860,8 +860,9 @@ void flush_anon_page(struct vm_area_struct *vma, struct page *page, unsigned long u_vaddr) { /* TBD: do we really need to clear the kernel mapping */ - __flush_dcache_page(page_address(page), u_vaddr); - __flush_dcache_page(page_address(page), page_address(page)); + __flush_dcache_page((phys_addr_t)page_address(page), u_vaddr); + __flush_dcache_page((phys_addr_t)page_address(page), + (phys_addr_t)page_address(page)); } From ab99a2bac2ea8ff6fc55437523e29a984040f505 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Wed, 1 Aug 2018 10:38:43 -0700 Subject: [PATCH 838/970] squashfs metadata 2: electric boogaloo [ Upstream commit cdbb65c4c7ead680ebe54f4f0d486e2847a500ea ] Anatoly continues to find issues with fuzzed squashfs images. This time, corrupt, missing, or undersized data for the page filling wasn't checked for, because the squashfs_{copy,read}_cache() functions did the squashfs_copy_data() call without checking the resulting data size. Which could result in the page cache pages being incompletely filled in, and no error indication to the user space reading garbage data. So make a helper function for the "fill in pages" case, because the exact same incomplete sequence existed in two places. [ I should have made a squashfs branch for these things, but I didn't intend to start doing them in the first place. My historical connection through cramfs is why I got into looking at these issues at all, and every time I (continue to) think it's a one-off. Because _this_ time is always the last time. Right? - Linus ] Reported-by: Anatoly Trosinenko Tested-by: Willy Tarreau Cc: Al Viro Cc: Phillip Lougher Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/squashfs/file.c | 25 ++++++++++++++++++------- fs/squashfs/file_direct.c | 8 +------- fs/squashfs/squashfs.h | 1 + 3 files changed, 20 insertions(+), 14 deletions(-) diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c index fcff2e0487fe..cce3060650ae 100644 --- a/fs/squashfs/file.c +++ b/fs/squashfs/file.c @@ -374,13 +374,29 @@ static int read_blocklist(struct inode *inode, int index, u64 *block) return squashfs_block_size(size); } +void squashfs_fill_page(struct page *page, struct squashfs_cache_entry *buffer, int offset, int avail) +{ + int copied; + void *pageaddr; + + pageaddr = kmap_atomic(page); + copied = squashfs_copy_data(pageaddr, buffer, offset, avail); + memset(pageaddr + copied, 0, PAGE_SIZE - copied); + kunmap_atomic(pageaddr); + + flush_dcache_page(page); + if (copied == avail) + SetPageUptodate(page); + else + SetPageError(page); +} + /* Copy data into page cache */ void squashfs_copy_cache(struct page *page, struct squashfs_cache_entry *buffer, int bytes, int offset) { struct inode *inode = page->mapping->host; struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info; - void *pageaddr; int i, mask = (1 << (msblk->block_log - PAGE_SHIFT)) - 1; int start_index = page->index & ~mask, end_index = start_index | mask; @@ -406,12 +422,7 @@ void squashfs_copy_cache(struct page *page, struct squashfs_cache_entry *buffer, if (PageUptodate(push_page)) goto skip_page; - pageaddr = kmap_atomic(push_page); - squashfs_copy_data(pageaddr, buffer, offset, avail); - memset(pageaddr + avail, 0, PAGE_SIZE - avail); - kunmap_atomic(pageaddr); - flush_dcache_page(push_page); - SetPageUptodate(push_page); + squashfs_fill_page(push_page, buffer, offset, avail); skip_page: unlock_page(push_page); if (i != page->index) diff --git a/fs/squashfs/file_direct.c b/fs/squashfs/file_direct.c index cb485d8e0e91..096990254a2e 100644 --- a/fs/squashfs/file_direct.c +++ b/fs/squashfs/file_direct.c @@ -144,7 +144,6 @@ static int squashfs_read_cache(struct page *target_page, u64 block, int bsize, struct squashfs_cache_entry *buffer = squashfs_get_datablock(i->i_sb, block, bsize); int bytes = buffer->length, res = buffer->error, n, offset = 0; - void *pageaddr; if (res) { ERROR("Unable to read page, block %llx, size %x\n", block, @@ -159,12 +158,7 @@ static int squashfs_read_cache(struct page *target_page, u64 block, int bsize, if (page[n] == NULL) continue; - pageaddr = kmap_atomic(page[n]); - squashfs_copy_data(pageaddr, buffer, offset, avail); - memset(pageaddr + avail, 0, PAGE_SIZE - avail); - kunmap_atomic(pageaddr); - flush_dcache_page(page[n]); - SetPageUptodate(page[n]); + squashfs_fill_page(page[n], buffer, offset, avail); unlock_page(page[n]); if (page[n] != target_page) put_page(page[n]); diff --git a/fs/squashfs/squashfs.h b/fs/squashfs/squashfs.h index 887d6d270080..d8d43724cf2a 100644 --- a/fs/squashfs/squashfs.h +++ b/fs/squashfs/squashfs.h @@ -67,6 +67,7 @@ extern __le64 *squashfs_read_fragment_index_table(struct super_block *, u64, u64, unsigned int); /* file.c */ +void squashfs_fill_page(struct page *, struct squashfs_cache_entry *, int, int); void squashfs_copy_cache(struct page *, struct squashfs_cache_entry *, int, int); From 6b9882c69ee12e1a1437e3ea9fa71edb1d36313f Mon Sep 17 00:00:00 2001 From: Phillip Lougher Date: Thu, 2 Aug 2018 16:45:15 +0100 Subject: [PATCH 839/970] Squashfs: Compute expected length from inode size rather than block length MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit a3f94cb99a854fa381fe7fadd97c4f61633717a5 ] Previously in squashfs_readpage() when copying data into the page cache, it used the length of the datablock read from the filesystem (after decompression). However, if the filesystem has been corrupted this data block may be short, which will leave pages unfilled. The fix for this is to compute the expected number of bytes to copy from the inode size, and use this to detect if the block is short. Signed-off-by: Phillip Lougher Tested-by: Willy Tarreau Cc: Анатолий Тросиненко Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/squashfs/file.c | 25 ++++++++++--------------- fs/squashfs/file_cache.c | 4 ++-- fs/squashfs/file_direct.c | 16 +++++++++++----- fs/squashfs/squashfs.h | 2 +- 4 files changed, 24 insertions(+), 23 deletions(-) diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c index cce3060650ae..f1c1430ae721 100644 --- a/fs/squashfs/file.c +++ b/fs/squashfs/file.c @@ -431,10 +431,9 @@ void squashfs_copy_cache(struct page *page, struct squashfs_cache_entry *buffer, } /* Read datablock stored packed inside a fragment (tail-end packed block) */ -static int squashfs_readpage_fragment(struct page *page) +static int squashfs_readpage_fragment(struct page *page, int expected) { struct inode *inode = page->mapping->host; - struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info; struct squashfs_cache_entry *buffer = squashfs_get_fragment(inode->i_sb, squashfs_i(inode)->fragment_block, squashfs_i(inode)->fragment_size); @@ -445,23 +444,16 @@ static int squashfs_readpage_fragment(struct page *page) squashfs_i(inode)->fragment_block, squashfs_i(inode)->fragment_size); else - squashfs_copy_cache(page, buffer, i_size_read(inode) & - (msblk->block_size - 1), + squashfs_copy_cache(page, buffer, expected, squashfs_i(inode)->fragment_offset); squashfs_cache_put(buffer); return res; } -static int squashfs_readpage_sparse(struct page *page, int index, int file_end) +static int squashfs_readpage_sparse(struct page *page, int expected) { - struct inode *inode = page->mapping->host; - struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info; - int bytes = index == file_end ? - (i_size_read(inode) & (msblk->block_size - 1)) : - msblk->block_size; - - squashfs_copy_cache(page, NULL, bytes, 0); + squashfs_copy_cache(page, NULL, expected, 0); return 0; } @@ -471,6 +463,9 @@ static int squashfs_readpage(struct file *file, struct page *page) struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info; int index = page->index >> (msblk->block_log - PAGE_SHIFT); int file_end = i_size_read(inode) >> msblk->block_log; + int expected = index == file_end ? + (i_size_read(inode) & (msblk->block_size - 1)) : + msblk->block_size; int res; void *pageaddr; @@ -489,11 +484,11 @@ static int squashfs_readpage(struct file *file, struct page *page) goto error_out; if (bsize == 0) - res = squashfs_readpage_sparse(page, index, file_end); + res = squashfs_readpage_sparse(page, expected); else - res = squashfs_readpage_block(page, block, bsize); + res = squashfs_readpage_block(page, block, bsize, expected); } else - res = squashfs_readpage_fragment(page); + res = squashfs_readpage_fragment(page, expected); if (!res) return 0; diff --git a/fs/squashfs/file_cache.c b/fs/squashfs/file_cache.c index f2310d2a2019..a9ba8d96776a 100644 --- a/fs/squashfs/file_cache.c +++ b/fs/squashfs/file_cache.c @@ -20,7 +20,7 @@ #include "squashfs.h" /* Read separately compressed datablock and memcopy into page cache */ -int squashfs_readpage_block(struct page *page, u64 block, int bsize) +int squashfs_readpage_block(struct page *page, u64 block, int bsize, int expected) { struct inode *i = page->mapping->host; struct squashfs_cache_entry *buffer = squashfs_get_datablock(i->i_sb, @@ -31,7 +31,7 @@ int squashfs_readpage_block(struct page *page, u64 block, int bsize) ERROR("Unable to read page, block %llx, size %x\n", block, bsize); else - squashfs_copy_cache(page, buffer, buffer->length, 0); + squashfs_copy_cache(page, buffer, expected, 0); squashfs_cache_put(buffer); return res; diff --git a/fs/squashfs/file_direct.c b/fs/squashfs/file_direct.c index 096990254a2e..80db1b86a27c 100644 --- a/fs/squashfs/file_direct.c +++ b/fs/squashfs/file_direct.c @@ -21,10 +21,11 @@ #include "page_actor.h" static int squashfs_read_cache(struct page *target_page, u64 block, int bsize, - int pages, struct page **page); + int pages, struct page **page, int bytes); /* Read separately compressed datablock directly into page cache */ -int squashfs_readpage_block(struct page *target_page, u64 block, int bsize) +int squashfs_readpage_block(struct page *target_page, u64 block, int bsize, + int expected) { struct inode *inode = target_page->mapping->host; @@ -83,7 +84,7 @@ int squashfs_readpage_block(struct page *target_page, u64 block, int bsize) * using an intermediate buffer. */ res = squashfs_read_cache(target_page, block, bsize, pages, - page); + page, expected); if (res < 0) goto mark_errored; @@ -95,6 +96,11 @@ int squashfs_readpage_block(struct page *target_page, u64 block, int bsize) if (res < 0) goto mark_errored; + if (res != expected) { + res = -EIO; + goto mark_errored; + } + /* Last page may have trailing bytes not filled */ bytes = res % PAGE_SIZE; if (bytes) { @@ -138,12 +144,12 @@ int squashfs_readpage_block(struct page *target_page, u64 block, int bsize) static int squashfs_read_cache(struct page *target_page, u64 block, int bsize, - int pages, struct page **page) + int pages, struct page **page, int bytes) { struct inode *i = target_page->mapping->host; struct squashfs_cache_entry *buffer = squashfs_get_datablock(i->i_sb, block, bsize); - int bytes = buffer->length, res = buffer->error, n, offset = 0; + int res = buffer->error, n, offset = 0; if (res) { ERROR("Unable to read page, block %llx, size %x\n", block, diff --git a/fs/squashfs/squashfs.h b/fs/squashfs/squashfs.h index d8d43724cf2a..f89f8a74c6ce 100644 --- a/fs/squashfs/squashfs.h +++ b/fs/squashfs/squashfs.h @@ -72,7 +72,7 @@ void squashfs_copy_cache(struct page *, struct squashfs_cache_entry *, int, int); /* file_xxx.c */ -extern int squashfs_readpage_block(struct page *, u64, int); +extern int squashfs_readpage_block(struct page *, u64, int, int); /* id.c */ extern int squashfs_get_id(struct super_block *, unsigned int, unsigned int *); From a6c6516eee1591bc09cf3dd839f71e023dae5ddf Mon Sep 17 00:00:00 2001 From: Colin Ian King Date: Wed, 1 Aug 2018 18:22:41 +0100 Subject: [PATCH 840/970] drivers: net: lmc: fix case value for target abort error [ Upstream commit afb41bb039656f0cecb54eeb8b2e2088201295f5 ] Current value for a target abort error is 0x010, however, this value should in fact be 0x002. As it stands, the range of error is 0..7 so it is currently never being detected. This bug has been in the driver since the early 2.6.12 days (or before). Detected by CoverityScan, CID#744290 ("Logically dead code") Signed-off-by: Colin Ian King Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/wan/lmc/lmc_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/wan/lmc/lmc_main.c b/drivers/net/wan/lmc/lmc_main.c index 299140c04556..04b60ed59ea0 100644 --- a/drivers/net/wan/lmc/lmc_main.c +++ b/drivers/net/wan/lmc/lmc_main.c @@ -1372,7 +1372,7 @@ static irqreturn_t lmc_interrupt (int irq, void *dev_instance) /*fold00*/ case 0x001: printk(KERN_WARNING "%s: Master Abort (naughty)\n", dev->name); break; - case 0x010: + case 0x002: printk(KERN_WARNING "%s: Target Abort (not so naughty)\n", dev->name); break; default: From f108e46efaac72dedc96204c9f235d6aa31c80fd Mon Sep 17 00:00:00 2001 From: Kirill Tkhai Date: Thu, 2 Aug 2018 15:36:01 -0700 Subject: [PATCH 841/970] memcg: remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure [ Upstream commit 7e97de0b033bcac4fa9a35cef72e0c06e6a22c67 ] In case of memcg_online_kmem() failure, memcg_cgroup::id remains hashed in mem_cgroup_idr even after memcg memory is freed. This leads to leak of ID in mem_cgroup_idr. This patch adds removal into mem_cgroup_css_alloc(), which fixes the problem. For better readability, it adds a generic helper which is used in mem_cgroup_alloc() and mem_cgroup_id_put_many() as well. Link: http://lkml.kernel.org/r/152354470916.22460.14397070748001974638.stgit@localhost.localdomain Fixes 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after many small jobs") Signed-off-by: Kirill Tkhai Acked-by: Johannes Weiner Acked-by: Vladimir Davydov Cc: Michal Hocko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- mm/memcontrol.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 349f4a8e3c4f..86a6b331b964 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -4072,6 +4072,14 @@ static struct cftype mem_cgroup_legacy_files[] = { static DEFINE_IDR(mem_cgroup_idr); +static void mem_cgroup_id_remove(struct mem_cgroup *memcg) +{ + if (memcg->id.id > 0) { + idr_remove(&mem_cgroup_idr, memcg->id.id); + memcg->id.id = 0; + } +} + static void mem_cgroup_id_get_many(struct mem_cgroup *memcg, unsigned int n) { VM_BUG_ON(atomic_read(&memcg->id.ref) <= 0); @@ -4082,8 +4090,7 @@ static void mem_cgroup_id_put_many(struct mem_cgroup *memcg, unsigned int n) { VM_BUG_ON(atomic_read(&memcg->id.ref) < n); if (atomic_sub_and_test(n, &memcg->id.ref)) { - idr_remove(&mem_cgroup_idr, memcg->id.id); - memcg->id.id = 0; + mem_cgroup_id_remove(memcg); /* Memcg ID pins CSS */ css_put(&memcg->css); @@ -4208,8 +4215,7 @@ static struct mem_cgroup *mem_cgroup_alloc(void) idr_replace(&mem_cgroup_idr, memcg, memcg->id.id); return memcg; fail: - if (memcg->id.id > 0) - idr_remove(&mem_cgroup_idr, memcg->id.id); + mem_cgroup_id_remove(memcg); __mem_cgroup_free(memcg); return NULL; } @@ -4268,6 +4274,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css) return &memcg->css; fail: + mem_cgroup_id_remove(memcg); mem_cgroup_free(memcg); return ERR_PTR(-ENOMEM); } From cfbe3cabb589835e1426f0ac8339c19cf22eb005 Mon Sep 17 00:00:00 2001 From: Johannes Thumshirn Date: Tue, 31 Jul 2018 15:46:02 +0200 Subject: [PATCH 842/970] scsi: fcoe: drop frames in ELS LOGO error path [ Upstream commit 63d0e3dffda311e77b9a8c500d59084e960a824a ] Drop the frames in the ELS LOGO error path instead of just returning an error. This fixes the following kmemleak report: unreferenced object 0xffff880064cb1000 (size 424): comm "kworker/0:2", pid 24, jiffies 4294904293 (age 68.504s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<(____ptrval____)>] _fc_frame_alloc+0x2c/0x180 [libfc] [<(____ptrval____)>] fc_lport_enter_logo+0x106/0x360 [libfc] [<(____ptrval____)>] fc_fabric_logoff+0x8c/0xc0 [libfc] [<(____ptrval____)>] fcoe_if_destroy+0x79/0x3b0 [fcoe] [<(____ptrval____)>] fcoe_destroy_work+0xd2/0x170 [fcoe] [<(____ptrval____)>] process_one_work+0x7ff/0x1420 [<(____ptrval____)>] worker_thread+0x87/0xef0 [<(____ptrval____)>] kthread+0x2db/0x390 [<(____ptrval____)>] ret_from_fork+0x35/0x40 [<(____ptrval____)>] 0xffffffffffffffff which can be triggered by issuing echo eth0 > /sys/bus/fcoe/ctlr_destroy Signed-off-by: Johannes Thumshirn Reviewed-by: Hannes Reinecke Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/fcoe/fcoe_ctlr.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/fcoe/fcoe_ctlr.c b/drivers/scsi/fcoe/fcoe_ctlr.c index dcf36537a767..cc3994d4e7bc 100644 --- a/drivers/scsi/fcoe/fcoe_ctlr.c +++ b/drivers/scsi/fcoe/fcoe_ctlr.c @@ -755,9 +755,9 @@ int fcoe_ctlr_els_send(struct fcoe_ctlr *fip, struct fc_lport *lport, case ELS_LOGO: if (fip->mode == FIP_MODE_VN2VN) { if (fip->state != FIP_ST_VNMP_UP) - return -EINVAL; + goto drop; if (ntoh24(fh->fh_d_id) == FC_FID_FLOGI) - return -EINVAL; + goto drop; } else { if (fip->state != FIP_ST_ENABLED) return 0; From a8871571f3561251f5f92d8cb0d2cd4f11d38123 Mon Sep 17 00:00:00 2001 From: Johannes Thumshirn Date: Tue, 31 Jul 2018 15:46:03 +0200 Subject: [PATCH 843/970] scsi: fcoe: clear FC_RP_STARTED flags when receiving a LOGO [ Upstream commit 1550ec458e0cf1a40a170ab1f4c46e3f52860f65 ] When receiving a LOGO request we forget to clear the FC_RP_STARTED flag before starting the rport delete routine. As the started flag was not cleared, we're not deleting the rport but waiting for a restart and thus are keeping the reference count of the rdata object at 1. This leads to the following kmemleak report: unreferenced object 0xffff88006542aa00 (size 512): comm "kworker/0:2", pid 24, jiffies 4294899222 (age 226.880s) hex dump (first 32 bytes): 68 96 fe 65 00 88 ff ff 00 00 00 00 00 00 00 00 h..e............ 01 00 00 00 08 00 00 00 02 c5 45 24 ac b8 00 10 ..........E$.... backtrace: [<(____ptrval____)>] fcoe_ctlr_vn_add.isra.5+0x7f/0x770 [libfcoe] [<(____ptrval____)>] fcoe_ctlr_vn_recv+0x12af/0x27f0 [libfcoe] [<(____ptrval____)>] fcoe_ctlr_recv_work+0xd01/0x32f0 [libfcoe] [<(____ptrval____)>] process_one_work+0x7ff/0x1420 [<(____ptrval____)>] worker_thread+0x87/0xef0 [<(____ptrval____)>] kthread+0x2db/0x390 [<(____ptrval____)>] ret_from_fork+0x35/0x40 [<(____ptrval____)>] 0xffffffffffffffff Signed-off-by: Johannes Thumshirn Reported-by: ard Reviewed-by: Hannes Reinecke Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/libfc/fc_rport.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/scsi/libfc/fc_rport.c b/drivers/scsi/libfc/fc_rport.c index 97aeaddd600d..e3ffd244603e 100644 --- a/drivers/scsi/libfc/fc_rport.c +++ b/drivers/scsi/libfc/fc_rport.c @@ -1935,6 +1935,7 @@ static void fc_rport_recv_logo_req(struct fc_lport *lport, struct fc_frame *fp) FC_RPORT_DBG(rdata, "Received LOGO request while in state %s\n", fc_rport_state(rdata)); + rdata->flags &= ~FC_RP_STARTED; fc_rport_enter_delete(rdata, RPORT_EV_STOP); mutex_unlock(&rdata->rp_mutex); kref_put(&rdata->kref, rdata->local_port->tt.rport_destroy); From b88f17ee2fe315bbff767cdcf4ad6e1c952ee0ca Mon Sep 17 00:00:00 2001 From: Jim Gill Date: Thu, 2 Aug 2018 14:13:30 -0700 Subject: [PATCH 844/970] scsi: vmw_pvscsi: Return DID_RESET for status SAM_STAT_COMMAND_TERMINATED [ Upstream commit e95153b64d03c2b6e8d62e51bdcc33fcad6e0856 ] Commands that are reset are returned with status SAM_STAT_COMMAND_TERMINATED. PVSCSI currently returns DID_OK | SAM_STAT_COMMAND_TERMINATED which fails the command. Instead, set hostbyte to DID_RESET to allow upper layers to retry. Tested by copying a large file between two pvscsi disks on same adapter while performing a bus reset at 1-second intervals. Before fix, commands sometimes fail with DID_OK. After fix, commands observed to fail with DID_RESET. Signed-off-by: Jim Gill Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/vmw_pvscsi.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/drivers/scsi/vmw_pvscsi.c b/drivers/scsi/vmw_pvscsi.c index 15ca09cd16f3..874e9f085326 100644 --- a/drivers/scsi/vmw_pvscsi.c +++ b/drivers/scsi/vmw_pvscsi.c @@ -564,9 +564,14 @@ static void pvscsi_complete_request(struct pvscsi_adapter *adapter, (btstat == BTSTAT_SUCCESS || btstat == BTSTAT_LINKED_COMMAND_COMPLETED || btstat == BTSTAT_LINKED_COMMAND_COMPLETED_WITH_FLAG)) { - cmd->result = (DID_OK << 16) | sdstat; - if (sdstat == SAM_STAT_CHECK_CONDITION && cmd->sense_buffer) - cmd->result |= (DRIVER_SENSE << 24); + if (sdstat == SAM_STAT_COMMAND_TERMINATED) { + cmd->result = (DID_RESET << 16); + } else { + cmd->result = (DID_OK << 16) | sdstat; + if (sdstat == SAM_STAT_CHECK_CONDITION && + cmd->sense_buffer) + cmd->result |= (DRIVER_SENSE << 24); + } } else switch (btstat) { case BTSTAT_SUCCESS: From af669a0b2d165d2a88bff0f66776327b00e7ed98 Mon Sep 17 00:00:00 2001 From: "jie@chenjie6@huwei.com" Date: Fri, 10 Aug 2018 17:23:06 -0700 Subject: [PATCH 845/970] mm/memory.c: check return value of ioremap_prot [ Upstream commit 24eee1e4c47977bdfb71d6f15f6011e7b6188d04 ] ioremap_prot() can return NULL which could lead to an oops. Link: http://lkml.kernel.org/r/1533195441-58594-1-git-send-email-chenjie6@huawei.com Signed-off-by: chen jie Reviewed-by: Andrew Morton Cc: Li Zefan Cc: chenjie Cc: Yang Shi Cc: Alexey Dobriyan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- mm/memory.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/mm/memory.c b/mm/memory.c index 88f8d6a2af05..0ff735601654 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3861,6 +3861,9 @@ int generic_access_phys(struct vm_area_struct *vma, unsigned long addr, return -EINVAL; maddr = ioremap_prot(phys_addr, PAGE_ALIGN(len + offset), prot); + if (!maddr) + return -ENOMEM; + if (write) memcpy_toio(maddr + offset, buf, len); else From fe0034a08b49a25ad91376e03045dc2e7bcf3647 Mon Sep 17 00:00:00 2001 From: Ethan Zhao Date: Mon, 4 Sep 2017 13:59:34 +0800 Subject: [PATCH 846/970] sched/sysctl: Check user input value of sysctl_sched_time_avg commit 5ccba44ba118a5000cccc50076b0344632459779 upstream. System will hang if user set sysctl_sched_time_avg to 0: [root@XXX ~]# sysctl kernel.sched_time_avg_ms=0 Stack traceback for pid 0 0xffff883f6406c600 0 0 1 3 R 0xffff883f6406cf50 *swapper/3 ffff883f7ccc3ae8 0000000000000018 ffffffff810c4dd0 0000000000000000 0000000000017800 ffff883f7ccc3d78 0000000000000003 ffff883f7ccc3bf8 ffffffff810c4fc9 ffff883f7ccc3c08 00000000810c5043 ffff883f7ccc3c08 Call Trace: [] ? update_group_capacity+0x110/0x200 [] ? update_sd_lb_stats+0x109/0x600 [] ? find_busiest_group+0x47/0x530 [] ? load_balance+0x194/0x900 [] ? update_rq_clock.part.83+0x1a/0xe0 [] ? rebalance_domains+0x152/0x290 [] ? run_rebalance_domains+0xdc/0x1d0 [] ? __do_softirq+0xfb/0x320 [] ? irq_exit+0x125/0x130 [] ? scheduler_ipi+0x97/0x160 [] ? smp_reschedule_interrupt+0x29/0x30 [] ? reschedule_interrupt+0x6e/0x80 [] ? cpuidle_enter_state+0xcc/0x230 [] ? cpuidle_enter_state+0x9c/0x230 [] ? cpuidle_enter+0x17/0x20 [] ? cpu_startup_entry+0x38c/0x420 [] ? start_secondary+0x173/0x1e0 Because divide-by-zero error happens in function: update_group_capacity() update_cpu_capacity() scale_rt_capacity() { ... total = sched_avg_period() + delta; used = div_u64(avg, total); ... } To fix this issue, check user input value of sysctl_sched_time_avg, keep it unchanged when hitting invalid input, and set the minimum limit of sysctl_sched_time_avg to 1 ms. Reported-by: James Puthukattukaran Signed-off-by: Ethan Zhao Signed-off-by: Peter Zijlstra (Intel) Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: efault@gmx.de Cc: ethan.kernel@gmail.com Cc: keescook@chromium.org Cc: mcgrof@kernel.org Cc: Link: http://lkml.kernel.org/r/1504504774-18253-1-git-send-email-ethan.zhao@oracle.com Signed-off-by: Ingo Molnar Cc: Steve Muckle Signed-off-by: Greg Kroah-Hartman --- kernel/sysctl.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 24d603d29512..7df6be31be36 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -345,7 +345,8 @@ static struct ctl_table kern_table[] = { .data = &sysctl_sched_time_avg, .maxlen = sizeof(unsigned int), .mode = 0644, - .proc_handler = proc_dointvec, + .proc_handler = proc_dointvec_minmax, + .extra1 = &one, }, { .procname = "sched_shares_window_ns", From e8e519f8ec33ce670abef2cfc0613ec26319841e Mon Sep 17 00:00:00 2001 From: "yujuan.qi" Date: Mon, 31 Jul 2017 11:23:01 +0800 Subject: [PATCH 847/970] Cipso: cipso_v4_optptr enter infinite loop commit 40413955ee265a5e42f710940ec78f5450d49149 upstream. in for(),if((optlen > 0) && (optptr[1] == 0)), enter infinite loop. Test: receive a packet which the ip length > 20 and the first byte of ip option is 0, produce this issue Signed-off-by: yujuan.qi Acked-by: Paul Moore Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/cipso_ipv4.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c index 972353cd1778..65a15889d432 100644 --- a/net/ipv4/cipso_ipv4.c +++ b/net/ipv4/cipso_ipv4.c @@ -1523,9 +1523,17 @@ unsigned char *cipso_v4_optptr(const struct sk_buff *skb) int taglen; for (optlen = iph->ihl*4 - sizeof(struct iphdr); optlen > 0; ) { - if (optptr[0] == IPOPT_CIPSO) + switch (optptr[0]) { + case IPOPT_CIPSO: return optptr; - taglen = optptr[1]; + case IPOPT_END: + return NULL; + case IPOPT_NOOP: + taglen = 1; + break; + default: + taglen = optptr[1]; + } optlen -= taglen; optptr += taglen; } From b653d47ba52d11457397a84c32c3020c32e2c49d Mon Sep 17 00:00:00 2001 From: Alexander Usyskin Date: Mon, 9 Jul 2018 12:21:44 +0300 Subject: [PATCH 848/970] mei: don't update offset in write commit a103af1b64d74853a5e08ca6c86aeb0e5c6ca4f1 upstream. MEI enables writes of complete messages only while read can be performed in parts, hence write should not update the file offset to not break interleaving partial reads with writes. Cc: Signed-off-by: Alexander Usyskin Signed-off-by: Tomas Winkler Signed-off-by: Greg Kroah-Hartman --- drivers/misc/mei/main.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/misc/mei/main.c b/drivers/misc/mei/main.c index 60f5a8ded8dd..8904491dfda4 100644 --- a/drivers/misc/mei/main.c +++ b/drivers/misc/mei/main.c @@ -304,7 +304,6 @@ static ssize_t mei_write(struct file *file, const char __user *ubuf, goto out; } - *offset = 0; cb = mei_cl_alloc_cb(cl, length, MEI_FOP_WRITE, file); if (!cb) { rets = -ENOMEM; From fbd314e29d27cb38f56e528ef06fde2b96daf7d1 Mon Sep 17 00:00:00 2001 From: Steve French Date: Thu, 28 Jun 2018 18:46:40 -0500 Subject: [PATCH 849/970] cifs: add missing debug entries for kconfig options commit 950132afd59385caf6e2b84e5235d069fa10681d upstream. /proc/fs/cifs/DebugData displays the features (Kconfig options) used to build cifs.ko but it was missing some, and needed comma separator. These can be useful in debugging certain problems so we know which optional features were enabled in the user's build. Also clarify them, by making them more closely match the corresponding CONFIG_CIFS_* parm. Old format: Features: dfs fscache posix spnego xattr acl New format: Features: DFS,FSCACHE,SMB_DIRECT,STATS,DEBUG2,ALLOW_INSECURE_LEGACY,CIFS_POSIX,UPCALL(SPNEGO),XATTR,ACL Signed-off-by: Steve French Reviewed-by: Ronnie Sahlberg Reviewed-by: Pavel Shilovsky Reviewed-by: Paulo Alcantara CC: Stable Signed-off-by: Greg Kroah-Hartman --- fs/cifs/cifs_debug.c | 30 +++++++++++++++++++++++------- 1 file changed, 23 insertions(+), 7 deletions(-) diff --git a/fs/cifs/cifs_debug.c b/fs/cifs/cifs_debug.c index 3d03e48a9213..ad8bd96093f7 100644 --- a/fs/cifs/cifs_debug.c +++ b/fs/cifs/cifs_debug.c @@ -123,25 +123,41 @@ static int cifs_debug_data_proc_show(struct seq_file *m, void *v) seq_printf(m, "CIFS Version %s\n", CIFS_VERSION); seq_printf(m, "Features:"); #ifdef CONFIG_CIFS_DFS_UPCALL - seq_printf(m, " dfs"); + seq_printf(m, " DFS"); #endif #ifdef CONFIG_CIFS_FSCACHE - seq_printf(m, " fscache"); + seq_printf(m, ",FSCACHE"); +#endif +#ifdef CONFIG_CIFS_SMB_DIRECT + seq_printf(m, ",SMB_DIRECT"); +#endif +#ifdef CONFIG_CIFS_STATS2 + seq_printf(m, ",STATS2"); +#elif defined(CONFIG_CIFS_STATS) + seq_printf(m, ",STATS"); +#endif +#ifdef CONFIG_CIFS_DEBUG2 + seq_printf(m, ",DEBUG2"); +#elif defined(CONFIG_CIFS_DEBUG) + seq_printf(m, ",DEBUG"); +#endif +#ifdef CONFIG_CIFS_ALLOW_INSECURE_LEGACY + seq_printf(m, ",ALLOW_INSECURE_LEGACY"); #endif #ifdef CONFIG_CIFS_WEAK_PW_HASH - seq_printf(m, " lanman"); + seq_printf(m, ",WEAK_PW_HASH"); #endif #ifdef CONFIG_CIFS_POSIX - seq_printf(m, " posix"); + seq_printf(m, ",CIFS_POSIX"); #endif #ifdef CONFIG_CIFS_UPCALL - seq_printf(m, " spnego"); + seq_printf(m, ",UPCALL(SPNEGO)"); #endif #ifdef CONFIG_CIFS_XATTR - seq_printf(m, " xattr"); + seq_printf(m, ",XATTR"); #endif #ifdef CONFIG_CIFS_ACL - seq_printf(m, " acl"); + seq_printf(m, ",ACL"); #endif seq_putc(m, '\n'); seq_printf(m, "Active VFS Requests: %d\n", GlobalTotalActiveXid); From c773c4fbb22b8e31699c547b28ec3fb51555a810 Mon Sep 17 00:00:00 2001 From: Nicholas Mc Guire Date: Thu, 23 Aug 2018 12:24:02 +0200 Subject: [PATCH 850/970] cifs: check kmalloc before use commit 126c97f4d0d1b5b956e8b0740c81a2b2a2ae548c upstream. The kmalloc was not being checked - if it fails issue a warning and return -ENOMEM to the caller. Signed-off-by: Nicholas Mc Guire Fixes: b8da344b74c8 ("cifs: dynamic allocation of ntlmssp blob") Signed-off-by: Steve French Reviewed-by: Pavel Shilovsky cc: Stable ` Signed-off-by: Greg Kroah-Hartman --- fs/cifs/sess.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/fs/cifs/sess.c b/fs/cifs/sess.c index c3db2a882aee..bb208076cb71 100644 --- a/fs/cifs/sess.c +++ b/fs/cifs/sess.c @@ -398,6 +398,12 @@ int build_ntlmssp_auth_blob(unsigned char **pbuffer, goto setup_ntlmv2_ret; } *pbuffer = kmalloc(size_of_ntlmssp_blob(ses), GFP_KERNEL); + if (!*pbuffer) { + rc = -ENOMEM; + cifs_dbg(VFS, "Error %d during NTLMSSP allocation\n", rc); + *buflen = 0; + goto setup_ntlmv2_ret; + } sec_blob = (AUTHENTICATE_MESSAGE *)*pbuffer; memcpy(sec_blob->Signature, NTLMSSP_SIGNATURE, 8); From a94703ff8e3647f8a9a3a92a468450299a7b77e9 Mon Sep 17 00:00:00 2001 From: Steve French Date: Thu, 9 Aug 2018 14:33:12 -0500 Subject: [PATCH 851/970] smb3: enumerating snapshots was leaving part of the data off end commit e02789a53d71334b067ad72eee5d4e88a0158083 upstream. When enumerating snapshots, the last few bytes of the final snapshot could be left off since we were miscalculating the length returned (leaving off the sizeof struct SRV_SNAPSHOT_ARRAY) See MS-SMB2 section 2.2.32.2. In addition fixup the length used to allow smaller buffer to be passed in, in order to allow returning the size of the whole snapshot array more easily. Sample userspace output with a kernel patched with this (mounted to a Windows volume with two snapshots). Before this patch, the second snapshot would be missing a few bytes at the end. ~/cifs-2.6# ~/enum-snapshots /mnt/file press enter to issue the ioctl to retrieve snapshot information ... size of snapshot array = 102 Num snapshots: 2 Num returned: 2 Array Size: 102 Snapshot 0:@GMT-2018.06.30-19.34.17 Snapshot 1:@GMT-2018.06.30-19.33.37 CC: Stable Signed-off-by: Steve French Reviewed-by: Pavel Shilovsky Signed-off-by: Greg Kroah-Hartman --- fs/cifs/smb2ops.c | 34 +++++++++++++++++++++++++++------- 1 file changed, 27 insertions(+), 7 deletions(-) diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c index 812e4884c392..68622f1e706b 100644 --- a/fs/cifs/smb2ops.c +++ b/fs/cifs/smb2ops.c @@ -894,6 +894,13 @@ smb3_set_integrity(const unsigned int xid, struct cifs_tcon *tcon, } +/* GMT Token is @GMT-YYYY.MM.DD-HH.MM.SS Unicode which is 48 bytes + null */ +#define GMT_TOKEN_SIZE 50 + +/* + * Input buffer contains (empty) struct smb_snapshot array with size filled in + * For output see struct SRV_SNAPSHOT_ARRAY in MS-SMB2 section 2.2.32.2 + */ static int smb3_enum_snapshots(const unsigned int xid, struct cifs_tcon *tcon, struct cifsFileInfo *cfile, void __user *ioc_buf) @@ -922,14 +929,27 @@ smb3_enum_snapshots(const unsigned int xid, struct cifs_tcon *tcon, kfree(retbuf); return rc; } - if (snapshot_in.snapshot_array_size < sizeof(struct smb_snapshot_array)) { - rc = -ERANGE; - kfree(retbuf); - return rc; - } - if (ret_data_len > snapshot_in.snapshot_array_size) - ret_data_len = snapshot_in.snapshot_array_size; + /* + * Check for min size, ie not large enough to fit even one GMT + * token (snapshot). On the first ioctl some users may pass in + * smaller size (or zero) to simply get the size of the array + * so the user space caller can allocate sufficient memory + * and retry the ioctl again with larger array size sufficient + * to hold all of the snapshot GMT tokens on the second try. + */ + if (snapshot_in.snapshot_array_size < GMT_TOKEN_SIZE) + ret_data_len = sizeof(struct smb_snapshot_array); + + /* + * We return struct SRV_SNAPSHOT_ARRAY, followed by + * the snapshot array (of 50 byte GMT tokens) each + * representing an available previous version of the data + */ + if (ret_data_len > (snapshot_in.snapshot_array_size + + sizeof(struct smb_snapshot_array))) + ret_data_len = snapshot_in.snapshot_array_size + + sizeof(struct smb_snapshot_array); if (copy_to_user(ioc_buf, retbuf, ret_data_len)) rc = -EFAULT; From 707b0d2d9d3ae4b9cde197ffa1b237e29833b2a1 Mon Sep 17 00:00:00 2001 From: Steve French Date: Thu, 2 Aug 2018 20:28:18 -0500 Subject: [PATCH 852/970] smb3: Do not send SMB3 SET_INFO if nothing changed commit fd09b7d3b352105f08b8e02f7afecf7e816380ef upstream. An earlier commit had a typo which prevented the optimization from working: commit 18dd8e1a65dd ("Do not send SMB3 SET_INFO request if nothing is changing") Thank you to Metze for noticing this. Also clear a reserved field in the FILE_BASIC_INFO struct we send that should be zero (all the other fields in that struct were set or cleared explicitly already in cifs_set_file_info). Reviewed-by: Pavel Shilovsky CC: Stable # 4.9.x+ Reported-by: Stefan Metzmacher Signed-off-by: Steve French Signed-off-by: Greg Kroah-Hartman --- fs/cifs/inode.c | 2 ++ fs/cifs/smb2inode.c | 2 +- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c index 24c19eb94fa3..a012f70bba5c 100644 --- a/fs/cifs/inode.c +++ b/fs/cifs/inode.c @@ -1116,6 +1116,8 @@ cifs_set_file_info(struct inode *inode, struct iattr *attrs, unsigned int xid, if (!server->ops->set_file_info) return -ENOSYS; + info_buf.Pad = 0; + if (attrs->ia_valid & ATTR_ATIME) { set_time = true; info_buf.LastAccessTime = diff --git a/fs/cifs/smb2inode.c b/fs/cifs/smb2inode.c index 1238cd3552f9..0267d8cbc996 100644 --- a/fs/cifs/smb2inode.c +++ b/fs/cifs/smb2inode.c @@ -267,7 +267,7 @@ smb2_set_file_info(struct inode *inode, const char *full_path, int rc; if ((buf->CreationTime == 0) && (buf->LastAccessTime == 0) && - (buf->LastWriteTime == 0) && (buf->ChangeTime) && + (buf->LastWriteTime == 0) && (buf->ChangeTime == 0) && (buf->Attributes == 0)) return 0; /* would be a no op, no sense sending this */ From 893b282b7e4b88f2405cacf35e7109aab1cefe21 Mon Sep 17 00:00:00 2001 From: Steve French Date: Fri, 27 Jul 2018 22:01:49 -0500 Subject: [PATCH 853/970] smb3: don't request leases in symlink creation and query commit 22783155f4bf956c346a81624ec9258930a6fe06 upstream. Fixes problem pointed out by Pavel in discussions about commit 729c0c9dd55204f0c9a823ac8a7bfa83d36c7e78 Signed-off-by: Pavel Shilovsky Signed-off-by: Steve French Reviewed-by: Ronnie Sahlberg CC: Stable # 3.18.x+ Signed-off-by: Greg Kroah-Hartman --- fs/cifs/link.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/cifs/link.c b/fs/cifs/link.c index d031af8d3d4d..38d26cbcad07 100644 --- a/fs/cifs/link.c +++ b/fs/cifs/link.c @@ -419,7 +419,7 @@ smb3_query_mf_symlink(unsigned int xid, struct cifs_tcon *tcon, struct cifs_io_parms io_parms; int buf_type = CIFS_NO_BUFFER; __le16 *utf16_path; - __u8 oplock = SMB2_OPLOCK_LEVEL_II; + __u8 oplock = SMB2_OPLOCK_LEVEL_NONE; struct smb2_file_all_info *pfile_info = NULL; oparms.tcon = tcon; @@ -481,7 +481,7 @@ smb3_create_mf_symlink(unsigned int xid, struct cifs_tcon *tcon, struct cifs_io_parms io_parms; int create_options = CREATE_NOT_DIR; __le16 *utf16_path; - __u8 oplock = SMB2_OPLOCK_LEVEL_EXCLUSIVE; + __u8 oplock = SMB2_OPLOCK_LEVEL_NONE; struct kvec iov[2]; if (backup_cred(cifs_sb)) From 52ea94f6b328b8eb0c1a4499b1bc9ea966306cba Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Sat, 28 Apr 2018 21:38:04 +0900 Subject: [PATCH 854/970] kprobes/arm64: Fix %p uses in error messages commit 0722867dcbc28cc9b269b57acd847c7c1aa638d6 upstream. Fix %p uses in error messages by removing it because those are redundant or meaningless. Signed-off-by: Masami Hiramatsu Acked-by: Will Deacon Cc: Ananth N Mavinakayanahalli Cc: Anil S Keshavamurthy Cc: Arnd Bergmann Cc: David Howells Cc: David S . Miller Cc: Heiko Carstens Cc: Jon Medhurst Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Thomas Richter Cc: Tobin C . Harding Cc: acme@kernel.org Cc: akpm@linux-foundation.org Cc: brueckner@linux.vnet.ibm.com Cc: linux-arch@vger.kernel.org Cc: rostedt@goodmis.org Cc: schwidefsky@de.ibm.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/lkml/152491908405.9916.12425053035317241111.stgit@devbox Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/arm64/kernel/probes/kprobes.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c index f5077ea7af6d..30bcae0aef2a 100644 --- a/arch/arm64/kernel/probes/kprobes.c +++ b/arch/arm64/kernel/probes/kprobes.c @@ -274,7 +274,7 @@ static int __kprobes reenter_kprobe(struct kprobe *p, break; case KPROBE_HIT_SS: case KPROBE_REENTER: - pr_warn("Unrecoverable kprobe detected at %p.\n", p->addr); + pr_warn("Unrecoverable kprobe detected.\n"); dump_kprobe(p); BUG(); break; From a96371149006a872d00067c6684b487043f367ca Mon Sep 17 00:00:00 2001 From: Greg Hackmann Date: Wed, 15 Aug 2018 12:51:21 -0700 Subject: [PATCH 855/970] arm64: mm: check for upper PAGE_SHIFT bits in pfn_valid() commit 5ad356eabc47d26a92140a0c4b20eba471c10de3 upstream. ARM64's pfn_valid() shifts away the upper PAGE_SHIFT bits of the input before seeing if the PFN is valid. This leads to false positives when some of the upper bits are set, but the lower bits match a valid PFN. For example, the following userspace code looks up a bogus entry in /proc/kpageflags: int pagemap = open("/proc/self/pagemap", O_RDONLY); int pageflags = open("/proc/kpageflags", O_RDONLY); uint64_t pfn, val; lseek64(pagemap, [...], SEEK_SET); read(pagemap, &pfn, sizeof(pfn)); if (pfn & (1UL << 63)) { /* valid PFN */ pfn &= ((1UL << 55) - 1); /* clear flag bits */ pfn |= (1UL << 55); lseek64(pageflags, pfn * sizeof(uint64_t), SEEK_SET); read(pageflags, &val, sizeof(val)); } On ARM64 this causes the userspace process to crash with SIGSEGV rather than reading (1 << KPF_NOPAGE). kpageflags_read() treats the offset as valid, and stable_page_flags() will try to access an address between the user and kernel address ranges. Fixes: c1cc1552616d ("arm64: MMU initialisation") Cc: stable@vger.kernel.org Signed-off-by: Greg Hackmann Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman --- arch/arm64/mm/init.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index 9d07b421f090..fa6b2fad7a3d 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -147,7 +147,11 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max) #ifdef CONFIG_HAVE_ARCH_PFN_VALID int pfn_valid(unsigned long pfn) { - return memblock_is_map_memory(pfn << PAGE_SHIFT); + phys_addr_t addr = pfn << PAGE_SHIFT; + + if ((addr >> PAGE_SHIFT) != pfn) + return 0; + return memblock_is_map_memory(addr); } EXPORT_SYMBOL(pfn_valid); #endif From 1ad409840b14eedd28d4e057e028515027c03aa1 Mon Sep 17 00:00:00 2001 From: Claudio Imbrenda Date: Mon, 16 Jul 2018 10:38:57 +0200 Subject: [PATCH 856/970] s390/kvm: fix deadlock when killed by oom commit 306d6c49ac9ded11114cb53b0925da52f2c2ada1 upstream. When the oom killer kills a userspace process in the page fault handler while in guest context, the fault handler fails to release the mm_sem if the FAULT_FLAG_RETRY_NOWAIT option is set. This leads to a deadlock when tearing down the mm when the process terminates. This bug can only happen when pfault is enabled, so only KVM clients are affected. The problem arises in the rare cases in which handle_mm_fault does not release the mm_sem. This patch fixes the issue by manually releasing the mm_sem when needed. Fixes: 24eb3a824c4f3 ("KVM: s390: Add FAULT_FLAG_RETRY_NOWAIT for guest fault") Cc: # 3.15+ Signed-off-by: Claudio Imbrenda Signed-off-by: Martin Schwidefsky Signed-off-by: Greg Kroah-Hartman --- arch/s390/mm/fault.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c index 661d9fe63c43..ba2f21873cbd 100644 --- a/arch/s390/mm/fault.c +++ b/arch/s390/mm/fault.c @@ -462,6 +462,8 @@ static inline int do_exception(struct pt_regs *regs, int access) /* No reason to continue if interrupted by SIGKILL. */ if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) { fault = VM_FAULT_SIGNAL; + if (flags & FAULT_FLAG_RETRY_NOWAIT) + goto out_up; goto out; } if (unlikely(fault & VM_FAULT_ERROR)) From 9c2860f4564dfea57bbd670f3ecaf326b57180c1 Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Wed, 1 Aug 2018 12:36:52 -0400 Subject: [PATCH 857/970] ext4: check for NUL characters in extended attribute's name commit 7d95178c77014dbd8dce36ee40bbbc5e6c121ff5 upstream. Extended attribute names are defined to be NUL-terminated, so the name must not contain a NUL character. This is important because there are places when remove extended attribute, the code uses strlen to determine the length of the entry. That should probably be fixed at some point, but code is currently really messy, so the simplest fix for now is to simply validate that the extended attributes are sane. https://bugzilla.kernel.org/show_bug.cgi?id=200401 Reported-by: Wen Xu Signed-off-by: Theodore Ts'o Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- fs/ext4/xattr.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c index 3fadfabcac39..fdcbe0f2814f 100644 --- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -184,6 +184,8 @@ ext4_xattr_check_names(struct ext4_xattr_entry *entry, void *end, struct ext4_xattr_entry *next = EXT4_XATTR_NEXT(e); if ((void *)next >= end) return -EFSCORRUPTED; + if (strnlen(e->e_name, e->e_name_len) != e->e_name_len) + return -EFSCORRUPTED; e = next; } From e0cfc1c92e290963dc0cda5a1b4cf2a21cd7712d Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Sun, 29 Jul 2018 15:48:00 -0400 Subject: [PATCH 858/970] ext4: sysfs: print ext4_super_block fields as little-endian commit a4d2aadca184ece182418950d45ba4ffc7b652d2 upstream. While working on extended rand for last_error/first_error timestamps, I noticed that the endianess is wrong; we access the little-endian fields in struct ext4_super_block as native-endian when we print them. This adds a special case in ext4_attr_show() and ext4_attr_store() to byteswap the superblock fields if needed. In older kernels, this code was part of super.c, it got moved to sysfs.c in linux-4.4. Cc: stable@vger.kernel.org Fixes: 52c198c6820f ("ext4: add sysfs entry showing whether the fs contains errors") Reviewed-by: Andreas Dilger Signed-off-by: Arnd Bergmann Signed-off-by: Theodore Ts'o Signed-off-by: Greg Kroah-Hartman --- fs/ext4/sysfs.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c index 5dc655e410b4..54942d60e72a 100644 --- a/fs/ext4/sysfs.c +++ b/fs/ext4/sysfs.c @@ -277,8 +277,12 @@ static ssize_t ext4_attr_show(struct kobject *kobj, case attr_pointer_ui: if (!ptr) return 0; - return snprintf(buf, PAGE_SIZE, "%u\n", - *((unsigned int *) ptr)); + if (a->attr_ptr == ptr_ext4_super_block_offset) + return snprintf(buf, PAGE_SIZE, "%u\n", + le32_to_cpup(ptr)); + else + return snprintf(buf, PAGE_SIZE, "%u\n", + *((unsigned int *) ptr)); case attr_pointer_atomic: if (!ptr) return 0; @@ -311,7 +315,10 @@ static ssize_t ext4_attr_store(struct kobject *kobj, ret = kstrtoul(skip_spaces(buf), 0, &t); if (ret) return ret; - *((unsigned int *) ptr) = t; + if (a->attr_ptr == ptr_ext4_super_block_offset) + *((__le32 *) ptr) = cpu_to_le32(t); + else + *((unsigned int *) ptr) = t; return len; case attr_inode_readahead: return inode_readahead_blks_store(a, sbi, buf, len); From e85b9fbdf834c92a80e4eb2c2949f4d84e3edeb8 Mon Sep 17 00:00:00 2001 From: Eric Sandeen Date: Sun, 29 Jul 2018 17:13:42 -0400 Subject: [PATCH 859/970] ext4: reset error code in ext4_find_entry in fallback commit f39b3f45dbcb0343822cce31ea7636ad66e60bc2 upstream. When ext4_find_entry() falls back to "searching the old fashioned way" due to a corrupt dx dir, it needs to reset the error code to NULL so that the nonstandard ERR_BAD_DX_DIR code isn't returned to userspace. https://bugzilla.kernel.org/show_bug.cgi?id=199947 Reported-by: Anatoly Trosinenko Reviewed-by: Andreas Dilger Signed-off-by: Eric Sandeen Signed-off-by: Theodore Ts'o Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- fs/ext4/namei.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index 248c43b63f13..a225a21d04ad 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -1415,6 +1415,7 @@ static struct buffer_head * ext4_find_entry (struct inode *dir, goto cleanup_and_exit; dxtrace(printk(KERN_DEBUG "ext4_find_entry: dx failed, " "falling back\n")); + ret = NULL; } nblocks = dir->i_size >> EXT4_BLOCK_SIZE_BITS(sb); if (!nblocks) { From 3fedc0cd376b34ad48b5917e64de2a0bba44deb5 Mon Sep 17 00:00:00 2001 From: Greg Hackmann Date: Fri, 31 Aug 2018 13:06:27 -0700 Subject: [PATCH 860/970] staging: android: ion: fix ION_IOC_{MAP,SHARE} use-after-free This patch is 4.9.y only. Kernels 4.12 and later are unaffected, since all the underlying ion_handle infrastructure has been ripped out. The ION_IOC_{MAP,SHARE} ioctls drop and reacquire client->lock several times while operating on one of the client's ion_handles. This creates windows where userspace can call ION_IOC_FREE on the same client with the same handle, and effectively make the kernel drop its own reference. For example: - thread A: ION_IOC_ALLOC creates an ion_handle with refcount 1 - thread A: starts ION_IOC_MAP and increments the refcount to 2 - thread B: ION_IOC_FREE decrements the refcount to 1 - thread B: ION_IOC_FREE decrements the refcount to 0 and frees the handle - thread A: continues ION_IOC_MAP with a dangling ion_handle * to freed memory Fix this by holding client->lock for the duration of ION_IOC_{MAP,SHARE}, preventing the concurrent ION_IOC_FREE. Also remove ion_handle_get_by_id(), since there's literally no way to use it safely. Cc: stable@vger.kernel.org # v4.11- Signed-off-by: Greg Hackmann Signed-off-by: Greg Kroah-Hartman --- drivers/staging/android/ion/ion-ioctl.c | 12 ++++--- drivers/staging/android/ion/ion.c | 48 +++++++++++++++---------- drivers/staging/android/ion/ion_priv.h | 6 ++-- 3 files changed, 40 insertions(+), 26 deletions(-) diff --git a/drivers/staging/android/ion/ion-ioctl.c b/drivers/staging/android/ion/ion-ioctl.c index 2b700e8455c6..e3596855a703 100644 --- a/drivers/staging/android/ion/ion-ioctl.c +++ b/drivers/staging/android/ion/ion-ioctl.c @@ -128,11 +128,15 @@ long ion_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) { struct ion_handle *handle; - handle = ion_handle_get_by_id(client, data.handle.handle); - if (IS_ERR(handle)) + mutex_lock(&client->lock); + handle = ion_handle_get_by_id_nolock(client, data.handle.handle); + if (IS_ERR(handle)) { + mutex_unlock(&client->lock); return PTR_ERR(handle); - data.fd.fd = ion_share_dma_buf_fd(client, handle); - ion_handle_put(handle); + } + data.fd.fd = ion_share_dma_buf_fd_nolock(client, handle); + ion_handle_put_nolock(handle); + mutex_unlock(&client->lock); if (data.fd.fd < 0) ret = data.fd.fd; break; diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c index 6f9974cb0e15..403df8bf4b48 100644 --- a/drivers/staging/android/ion/ion.c +++ b/drivers/staging/android/ion/ion.c @@ -352,18 +352,6 @@ struct ion_handle *ion_handle_get_by_id_nolock(struct ion_client *client, return handle ? handle : ERR_PTR(-EINVAL); } -struct ion_handle *ion_handle_get_by_id(struct ion_client *client, - int id) -{ - struct ion_handle *handle; - - mutex_lock(&client->lock); - handle = ion_handle_get_by_id_nolock(client, id); - mutex_unlock(&client->lock); - - return handle; -} - static bool ion_handle_validate(struct ion_client *client, struct ion_handle *handle) { @@ -1029,24 +1017,28 @@ static struct dma_buf_ops dma_buf_ops = { .kunmap = ion_dma_buf_kunmap, }; -struct dma_buf *ion_share_dma_buf(struct ion_client *client, - struct ion_handle *handle) +static struct dma_buf *__ion_share_dma_buf(struct ion_client *client, + struct ion_handle *handle, + bool lock_client) { DEFINE_DMA_BUF_EXPORT_INFO(exp_info); struct ion_buffer *buffer; struct dma_buf *dmabuf; bool valid_handle; - mutex_lock(&client->lock); + if (lock_client) + mutex_lock(&client->lock); valid_handle = ion_handle_validate(client, handle); if (!valid_handle) { WARN(1, "%s: invalid handle passed to share.\n", __func__); - mutex_unlock(&client->lock); + if (lock_client) + mutex_unlock(&client->lock); return ERR_PTR(-EINVAL); } buffer = handle->buffer; ion_buffer_get(buffer); - mutex_unlock(&client->lock); + if (lock_client) + mutex_unlock(&client->lock); exp_info.ops = &dma_buf_ops; exp_info.size = buffer->size; @@ -1061,14 +1053,21 @@ struct dma_buf *ion_share_dma_buf(struct ion_client *client, return dmabuf; } + +struct dma_buf *ion_share_dma_buf(struct ion_client *client, + struct ion_handle *handle) +{ + return __ion_share_dma_buf(client, handle, true); +} EXPORT_SYMBOL(ion_share_dma_buf); -int ion_share_dma_buf_fd(struct ion_client *client, struct ion_handle *handle) +static int __ion_share_dma_buf_fd(struct ion_client *client, + struct ion_handle *handle, bool lock_client) { struct dma_buf *dmabuf; int fd; - dmabuf = ion_share_dma_buf(client, handle); + dmabuf = __ion_share_dma_buf(client, handle, lock_client); if (IS_ERR(dmabuf)) return PTR_ERR(dmabuf); @@ -1078,8 +1077,19 @@ int ion_share_dma_buf_fd(struct ion_client *client, struct ion_handle *handle) return fd; } + +int ion_share_dma_buf_fd(struct ion_client *client, struct ion_handle *handle) +{ + return __ion_share_dma_buf_fd(client, handle, true); +} EXPORT_SYMBOL(ion_share_dma_buf_fd); +int ion_share_dma_buf_fd_nolock(struct ion_client *client, + struct ion_handle *handle) +{ + return __ion_share_dma_buf_fd(client, handle, false); +} + struct ion_handle *ion_import_dma_buf(struct ion_client *client, struct dma_buf *dmabuf) { diff --git a/drivers/staging/android/ion/ion_priv.h b/drivers/staging/android/ion/ion_priv.h index 3c3b3245275d..760e41885448 100644 --- a/drivers/staging/android/ion/ion_priv.h +++ b/drivers/staging/android/ion/ion_priv.h @@ -463,11 +463,11 @@ void ion_free_nolock(struct ion_client *client, struct ion_handle *handle); int ion_handle_put_nolock(struct ion_handle *handle); -struct ion_handle *ion_handle_get_by_id(struct ion_client *client, - int id); - int ion_handle_put(struct ion_handle *handle); int ion_query_heaps(struct ion_client *client, struct ion_heap_query *query); +int ion_share_dma_buf_fd_nolock(struct ion_client *client, + struct ion_handle *handle); + #endif /* _ION_PRIV_H */ From a22bdef6b9f8482426455190f8c0ce207fcc3613 Mon Sep 17 00:00:00 2001 From: Punit Agrawal Date: Mon, 13 Aug 2018 11:43:51 +0100 Subject: [PATCH 861/970] KVM: arm/arm64: Skip updating PTE entry if no change commit 976d34e2dab10ece5ea8fe7090b7692913f89084 upstream. When there is contention on faulting in a particular page table entry at stage 2, the break-before-make requirement of the architecture can lead to additional refaulting due to TLB invalidation. Avoid this by skipping a page table update if the new value of the PTE matches the previous value. Cc: stable@vger.kernel.org Fixes: d5d8184d35c9 ("KVM: ARM: Memory virtualization setup") Reviewed-by: Suzuki Poulose Acked-by: Christoffer Dall Signed-off-by: Punit Agrawal Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman --- arch/arm/kvm/mmu.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c index 7f868d9bb5ed..cd62951b8c21 100644 --- a/arch/arm/kvm/mmu.c +++ b/arch/arm/kvm/mmu.c @@ -962,6 +962,10 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache, /* Create 2nd stage page table mapping - Level 3 */ old_pte = *pte; if (pte_present(old_pte)) { + /* Skip page table update if there is no change */ + if (pte_val(old_pte) == pte_val(*new_pte)) + return 0; + kvm_set_pte(pte, __pte(0)); kvm_tlb_flush_vmid_ipa(kvm, addr); } else { From 7fb227e97df9793c9d7b32962077bb3031cdb85c Mon Sep 17 00:00:00 2001 From: Punit Agrawal Date: Mon, 13 Aug 2018 11:43:50 +0100 Subject: [PATCH 862/970] KVM: arm/arm64: Skip updating PMD entry if no change commit 86658b819cd0a9aa584cd84453ed268a6f013770 upstream. Contention on updating a PMD entry by a large number of vcpus can lead to duplicate work when handling stage 2 page faults. As the page table update follows the break-before-make requirement of the architecture, it can lead to repeated refaults due to clearing the entry and flushing the tlbs. This problem is more likely when - * there are large number of vcpus * the mapping is large block mapping such as when using PMD hugepages (512MB) with 64k pages. Fix this by skipping the page table update if there is no change in the entry being updated. Cc: stable@vger.kernel.org Fixes: ad361f093c1e ("KVM: ARM: Support hugetlbfs backed huge pages") Reviewed-by: Suzuki Poulose Acked-by: Christoffer Dall Signed-off-by: Punit Agrawal Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman --- arch/arm/kvm/mmu.c | 38 +++++++++++++++++++++++++++----------- 1 file changed, 27 insertions(+), 11 deletions(-) diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c index cd62951b8c21..b3d268a79f05 100644 --- a/arch/arm/kvm/mmu.c +++ b/arch/arm/kvm/mmu.c @@ -894,19 +894,35 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache pmd = stage2_get_pmd(kvm, cache, addr); VM_BUG_ON(!pmd); - /* - * Mapping in huge pages should only happen through a fault. If a - * page is merged into a transparent huge page, the individual - * subpages of that huge page should be unmapped through MMU - * notifiers before we get here. - * - * Merging of CompoundPages is not supported; they should become - * splitting first, unmapped, merged, and mapped back in on-demand. - */ - VM_BUG_ON(pmd_present(*pmd) && pmd_pfn(*pmd) != pmd_pfn(*new_pmd)); - old_pmd = *pmd; if (pmd_present(old_pmd)) { + /* + * Multiple vcpus faulting on the same PMD entry, can + * lead to them sequentially updating the PMD with the + * same value. Following the break-before-make + * (pmd_clear() followed by tlb_flush()) process can + * hinder forward progress due to refaults generated + * on missing translations. + * + * Skip updating the page table if the entry is + * unchanged. + */ + if (pmd_val(old_pmd) == pmd_val(*new_pmd)) + return 0; + + /* + * Mapping in huge pages should only happen through a + * fault. If a page is merged into a transparent huge + * page, the individual subpages of that huge page + * should be unmapped through MMU notifiers before we + * get here. + * + * Merging of CompoundPages is not supported; they + * should become splitting first, unmapped, merged, + * and mapped back in on-demand. + */ + VM_BUG_ON(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd)); + pmd_clear(pmd); kvm_tlb_flush_vmid_ipa(kvm, addr); } else { From cd4fdbb85e7fa74ea88f2db949ceb65f0aa8295b Mon Sep 17 00:00:00 2001 From: Thomas Petazzoni Date: Sun, 13 Aug 2017 23:14:58 +0200 Subject: [PATCH 863/970] sparc: kernel/pcic: silence gcc 7.x warning in pcibios_fixup_bus() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 2dc77533f1e495788d73ffa4bee4323b2646d2bb upstream. When building the kernel for Sparc using gcc 7.x, the build fails with: arch/sparc/kernel/pcic.c: In function ‘pcibios_fixup_bus’: arch/sparc/kernel/pcic.c:647:8: error: ‘cmd’ may be used uninitialized in this function [-Werror=maybe-uninitialized] cmd |= PCI_COMMAND_IO; ^~ The simplified code looks like this: unsigned int cmd; [...] pcic_read_config(dev->bus, dev->devfn, PCI_COMMAND, 2, &cmd); [...] cmd |= PCI_COMMAND_IO; I.e, the code assumes that pcic_read_config() will always initialize cmd. But it's not the case. Looking at pcic_read_config(), if bus->number is != 0 or if the size is not one of 1, 2 or 4, *val will not be initialized. As a simple fix, we initialize cmd to zero at the beginning of pcibios_fixup_bus. Signed-off-by: Thomas Petazzoni Signed-off-by: David S. Miller Cc: Guenter Roeck Signed-off-by: Greg Kroah-Hartman --- arch/sparc/kernel/pcic.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/sparc/kernel/pcic.c b/arch/sparc/kernel/pcic.c index 24384e1dc33d..a7aeb036b070 100644 --- a/arch/sparc/kernel/pcic.c +++ b/arch/sparc/kernel/pcic.c @@ -602,7 +602,7 @@ void pcibios_fixup_bus(struct pci_bus *bus) { struct pci_dev *dev; int i, has_io, has_mem; - unsigned int cmd; + unsigned int cmd = 0; struct linux_pcic *pcic; /* struct linux_pbm_info* pbm = &pcic->pbm; */ int node; From fe0f40491bba65d904edd0147184bd9882cb7b7f Mon Sep 17 00:00:00 2001 From: Vlastimil Babka Date: Mon, 20 Aug 2018 11:58:35 +0200 Subject: [PATCH 864/970] x86/speculation/l1tf: Fix overflow in l1tf_pfn_limit() on 32bit commit 9df9516940a61d29aedf4d91b483ca6597e7d480 upstream. On 32bit PAE kernels on 64bit hardware with enough physical bits, l1tf_pfn_limit() will overflow unsigned long. This in turn affects max_swapfile_size() and can lead to swapon returning -EINVAL. This has been observed in a 32bit guest with 42 bits physical address size, where max_swapfile_size() overflows exactly to 1 << 32, thus zero, and produces the following warning to dmesg: [ 6.396845] Truncating oversized swap area, only using 0k out of 2047996k Fix this by using unsigned long long instead. Fixes: 17dbca119312 ("x86/speculation/l1tf: Add sysfs reporting for l1tf") Fixes: 377eeaa8e11f ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2") Reported-by: Dominique Leuenberger Reported-by: Adrian Schroeter Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner Acked-by: Andi Kleen Acked-by: Michal Hocko Cc: "H . Peter Anvin" Cc: Linus Torvalds Cc: Dave Hansen Cc: Michal Hocko Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180820095835.5298-1-vbabka@suse.cz Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/processor.h | 4 ++-- arch/x86/mm/init.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index d5525a7e119e..c3d53afef073 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -173,9 +173,9 @@ extern const struct seq_operations cpuinfo_op; extern void cpu_detect(struct cpuinfo_x86 *c); -static inline unsigned long l1tf_pfn_limit(void) +static inline unsigned long long l1tf_pfn_limit(void) { - return BIT(boot_cpu_data.x86_phys_bits - 1 - PAGE_SHIFT) - 1; + return BIT_ULL(boot_cpu_data.x86_phys_bits - 1 - PAGE_SHIFT) - 1; } extern void early_cpu_init(void); diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index 5d35b555115a..8c51000e3e61 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -792,7 +792,7 @@ unsigned long max_swapfile_size(void) if (boot_cpu_has_bug(X86_BUG_L1TF)) { /* Limit the swap file size to MAX_PA/2 for L1TF workaround */ - unsigned long l1tf_limit = l1tf_pfn_limit() + 1; + unsigned long long l1tf_limit = l1tf_pfn_limit() + 1; /* * We encode swap offsets also with 3 bits below those for pfn * which makes the usable limit higher. @@ -800,7 +800,7 @@ unsigned long max_swapfile_size(void) #if CONFIG_PGTABLE_LEVELS > 2 l1tf_limit <<= PAGE_SHIFT - SWP_OFFSET_FIRST_BIT; #endif - pages = min_t(unsigned long, l1tf_limit, pages); + pages = min_t(unsigned long long, l1tf_limit, pages); } return pages; } From f8d42d5c02084c936f0a7830e011b4395f30f06e Mon Sep 17 00:00:00 2001 From: Vlastimil Babka Date: Thu, 23 Aug 2018 15:44:18 +0200 Subject: [PATCH 865/970] x86/speculation/l1tf: Fix off-by-one error when warning that system has too much RAM commit b0a182f875689647b014bc01d36b340217792852 upstream. Two users have reported [1] that they have an "extremely unlikely" system with more than MAX_PA/2 memory and L1TF mitigation is not effective. In fact it's a CPU with 36bits phys limit (64GB) and 32GB memory, but due to holes in the e820 map, the main region is almost 500MB over the 32GB limit: [ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000081effffff] usable Suggestions to use 'mem=32G' to enable the L1TF mitigation while losing the 500MB revealed, that there's an off-by-one error in the check in l1tf_select_mitigation(). l1tf_pfn_limit() returns the last usable pfn (inclusive) and the range check in the mitigation path does not take this into account. Instead of amending the range check, make l1tf_pfn_limit() return the first PFN which is over the limit which is less error prone. Adjust the other users accordingly. [1] https://bugzilla.suse.com/show_bug.cgi?id=1105536 Fixes: 17dbca119312 ("x86/speculation/l1tf: Add sysfs reporting for l1tf") Reported-by: George Anchev Reported-by: Christopher Snowhill Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner Cc: "H . Peter Anvin" Cc: Linus Torvalds Cc: Andi Kleen Cc: Dave Hansen Cc: Michal Hocko Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180823134418.17008-1-vbabka@suse.cz Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/processor.h | 2 +- arch/x86/mm/init.c | 2 +- arch/x86/mm/mmap.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index c3d53afef073..8d9ea8a4f229 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -175,7 +175,7 @@ extern void cpu_detect(struct cpuinfo_x86 *c); static inline unsigned long long l1tf_pfn_limit(void) { - return BIT_ULL(boot_cpu_data.x86_phys_bits - 1 - PAGE_SHIFT) - 1; + return BIT_ULL(boot_cpu_data.x86_phys_bits - 1 - PAGE_SHIFT); } extern void early_cpu_init(void); diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index 8c51000e3e61..90801a8f19c9 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -792,7 +792,7 @@ unsigned long max_swapfile_size(void) if (boot_cpu_has_bug(X86_BUG_L1TF)) { /* Limit the swap file size to MAX_PA/2 for L1TF workaround */ - unsigned long long l1tf_limit = l1tf_pfn_limit() + 1; + unsigned long long l1tf_limit = l1tf_pfn_limit(); /* * We encode swap offsets also with 3 bits below those for pfn * which makes the usable limit higher. diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c index 5aad869fa205..74609a957c49 100644 --- a/arch/x86/mm/mmap.c +++ b/arch/x86/mm/mmap.c @@ -138,7 +138,7 @@ bool pfn_modify_allowed(unsigned long pfn, pgprot_t prot) /* If it's real memory always allow */ if (pfn_valid(pfn)) return true; - if (pfn > l1tf_pfn_limit() && !capable(CAP_SYS_ADMIN)) + if (pfn >= l1tf_pfn_limit() && !capable(CAP_SYS_ADMIN)) return false; return true; } From 2421738c164f79868d7bbe13a38129e7a3ffb6f3 Mon Sep 17 00:00:00 2001 From: Vlastimil Babka Date: Thu, 23 Aug 2018 16:21:29 +0200 Subject: [PATCH 866/970] x86/speculation/l1tf: Suggest what to do on systems with too much RAM commit 6a012288d6906fee1dbc244050ade1dafe4a9c8d upstream. Two users have reported [1] that they have an "extremely unlikely" system with more than MAX_PA/2 memory and L1TF mitigation is not effective. Make the warning more helpful by suggesting the proper mem=X kernel boot parameter to make it effective and a link to the L1TF document to help decide if the mitigation is worth the unusable RAM. [1] https://bugzilla.suse.com/show_bug.cgi?id=1105536 Suggested-by: Michal Hocko Signed-off-by: Vlastimil Babka Acked-by: Michal Hocko Cc: "H . Peter Anvin" Cc: Linus Torvalds Cc: Andi Kleen Cc: Dave Hansen Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/966571f0-9d7f-43dc-92c6-a10eec7a1254@suse.cz Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/bugs.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index ac67a76550bd..5acf477e8025 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -685,6 +685,10 @@ static void __init l1tf_select_mitigation(void) half_pa = (u64)l1tf_pfn_limit() << PAGE_SHIFT; if (e820_any_mapped(half_pa, ULLONG_MAX - half_pa, E820_RAM)) { pr_warn("System has more than MAX_PA/2 memory. L1TF mitigation not effective.\n"); + pr_info("You may make it effective by booting the kernel with mem=%llu parameter.\n", + half_pa); + pr_info("However, doing so will make a part of your RAM unusable.\n"); + pr_info("Reading https://www.kernel.org/doc/html/latest/admin-guide/l1tf.html might help you decide.\n"); return; } From 62cfd81bcf5425f6db1eed1eef045fbac970ec7e Mon Sep 17 00:00:00 2001 From: Rian Hunter Date: Sun, 19 Aug 2018 16:08:53 -0700 Subject: [PATCH 867/970] x86/process: Re-export start_thread() commit dc76803e57cc86589c4efcb5362918f9b0c0436f upstream. The consolidation of the start_thread() functions removed the export unintentionally. This breaks binfmt handlers built as a module. Add it back. Fixes: e634d8fc792c ("x86-64: merge the standard and compat start_thread() functions") Signed-off-by: Rian Hunter Signed-off-by: Thomas Gleixner Cc: "H. Peter Anvin" Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Vitaly Kuznetsov Cc: Joerg Roedel Cc: Dmitry Safonov Cc: Josh Poimboeuf Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180819230854.7275-1-rian@alum.mit.edu Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/process_64.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index dffe81d3c261..a2661814bde0 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -360,6 +360,7 @@ start_thread(struct pt_regs *regs, unsigned long new_ip, unsigned long new_sp) start_thread_common(regs, new_ip, new_sp, __USER_CS, __USER_DS, 0); } +EXPORT_SYMBOL_GPL(start_thread); #ifdef CONFIG_COMPAT void compat_start_thread(struct pt_regs *regs, u32 new_ip, u32 new_sp) From cd4574c1b94c7158be372624a970e25c22583423 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Sun, 12 Aug 2018 20:41:45 +0200 Subject: [PATCH 868/970] KVM: x86: SVM: Call x86_spec_ctrl_set_guest/host() with interrupts disabled MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 024d83cadc6b2af027e473720f3c3da97496c318 upstream. Mikhail reported the following lockdep splat: WARNING: possible irq lock inversion dependency detected CPU 0/KVM/10284 just changed the state of lock: 000000000d538a88 (&st->lock){+...}, at: speculative_store_bypass_update+0x10b/0x170 but this lock was taken by another, HARDIRQ-safe lock in the past: (&(&sighand->siglock)->rlock){-.-.} and interrupts could create inverse lock ordering between them. Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&st->lock); local_irq_disable(); lock(&(&sighand->siglock)->rlock); lock(&st->lock); lock(&(&sighand->siglock)->rlock); *** DEADLOCK *** The code path which connects those locks is: speculative_store_bypass_update() ssb_prctl_set() do_seccomp() do_syscall_64() In svm_vcpu_run() speculative_store_bypass_update() is called with interupts enabled via x86_virt_spec_ctrl_set_guest/host(). This is actually a false positive, because GIF=0 so interrupts are disabled even if IF=1; however, we can easily move the invocations of x86_virt_spec_ctrl_set_guest/host() into the interrupt disabled region to cure it, and it's a good idea to keep the GIF=0/IF=1 area as small and self-contained as possible. Fixes: 1f50ddb4f418 ("x86/speculation: Handle HT correctly on AMD") Reported-by: Mikhail Gavrilov Signed-off-by: Thomas Gleixner Tested-by: Mikhail Gavrilov Cc: Joerg Roedel Cc: Paolo Bonzini Cc: Radim Krčmář Cc: Matthew Wilcox Cc: Borislav Petkov Cc: Konrad Rzeszutek Wilk Cc: Tom Lendacky Cc: kvm@vger.kernel.org Cc: x86@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/svm.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index c855080c7a71..5f44d63a9d69 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -4973,8 +4973,6 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu) clgi(); - local_irq_enable(); - /* * If this vCPU has touched SPEC_CTRL, restore the guest's value if * it's non-zero. Since vmentry is serialising on affected CPUs, there @@ -4983,6 +4981,8 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu) */ x86_spec_ctrl_set_guest(svm->spec_ctrl, svm->virt_spec_ctrl); + local_irq_enable(); + asm volatile ( "push %%" _ASM_BP "; \n\t" "mov %c[rbx](%[svm]), %%" _ASM_BX " \n\t" @@ -5105,12 +5105,12 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu) if (unlikely(!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL))) svm->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL); - x86_spec_ctrl_restore_host(svm->spec_ctrl, svm->virt_spec_ctrl); - reload_tss(vcpu); local_irq_disable(); + x86_spec_ctrl_restore_host(svm->spec_ctrl, svm->virt_spec_ctrl); + vcpu->arch.cr2 = svm->vmcb->save.cr2; vcpu->arch.regs[VCPU_REGS_RAX] = svm->vmcb->save.rax; vcpu->arch.regs[VCPU_REGS_RSP] = svm->vmcb->save.rsp; From 6a2346f3229495f1338b7a2a869efd16822d4ee6 Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Tue, 14 Aug 2018 12:32:19 -0500 Subject: [PATCH 869/970] x86/kvm/vmx: Remove duplicate l1d flush definitions commit 94d7a86c21a3d6046bf4616272313cb7d525075a upstream. These are already defined higher up in the file. Fixes: 7db92e165ac8 ("x86/kvm: Move l1tf setup function") Signed-off-by: Josh Poimboeuf Signed-off-by: Thomas Gleixner Cc: Paolo Bonzini Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/d7ca03ae210d07173452aeed85ffe344301219a5.1534253536.git.jpoimboe@redhat.com Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 12826607a995..8e4ac0a91309 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -8670,9 +8670,6 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu) * information but as all relevant affected CPUs have 32KiB L1D cache size * there is no point in doing so. */ -#define L1D_CACHE_ORDER 4 -static void *vmx_l1d_flush_pages; - static void vmx_l1d_flush(struct kvm_vcpu *vcpu) { int size = PAGE_SIZE << L1D_CACHE_ORDER; From 67a9e4870ce1afcb66d314c57ca839aff421557e Mon Sep 17 00:00:00 2001 From: Andrey Ryabinin Date: Tue, 17 Jul 2018 19:00:33 +0300 Subject: [PATCH 870/970] fuse: Don't access pipe->buffers without pipe_lock() commit a2477b0e67c52f4364a47c3ad70902bc2a61bd4c upstream. fuse_dev_splice_write() reads pipe->buffers to determine the size of 'bufs' array before taking the pipe_lock(). This is not safe as another thread might change the 'pipe->buffers' between the allocation and taking the pipe_lock(). So we end up with too small 'bufs' array. Move the bufs allocations inside pipe_lock()/pipe_unlock() to fix this. Fixes: dd3bb14f44a6 ("fuse: support splice() writing to fuse device") Signed-off-by: Andrey Ryabinin Cc: # v2.6.35 Signed-off-by: Miklos Szeredi Signed-off-by: Greg Kroah-Hartman --- fs/fuse/dev.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index f11792672977..ce558c853722 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -1935,11 +1935,14 @@ static ssize_t fuse_dev_splice_write(struct pipe_inode_info *pipe, if (!fud) return -EPERM; - bufs = kmalloc(pipe->buffers * sizeof(struct pipe_buffer), GFP_KERNEL); - if (!bufs) - return -ENOMEM; - pipe_lock(pipe); + + bufs = kmalloc(pipe->buffers * sizeof(struct pipe_buffer), GFP_KERNEL); + if (!bufs) { + pipe_unlock(pipe); + return -ENOMEM; + } + nbuf = 0; rem = 0; for (idx = 0; idx < pipe->nrbufs && rem < len; idx++) From 68fbfcb7cd4375889bb000d181dcf46aa45d06a9 Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Thu, 26 Jul 2018 16:13:11 +0200 Subject: [PATCH 871/970] fuse: fix initial parallel dirops commit 63576c13bd17848376c8ba4a98f5d5151140c4ac upstream. If parallel dirops are enabled in FUSE_INIT reply, then first operation may leave fi->mutex held. Reported-by: syzbot Fixes: 5c672ab3f0ee ("fuse: serialize dirops by default") Cc: # v4.7 Signed-off-by: Miklos Szeredi Signed-off-by: Greg Kroah-Hartman --- fs/fuse/dir.c | 10 ++++++---- fs/fuse/fuse_i.h | 4 ++-- fs/fuse/inode.c | 14 ++++++++++---- 3 files changed, 18 insertions(+), 10 deletions(-) diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index cca8dd3bda09..60dd2bc10776 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -355,11 +355,12 @@ static struct dentry *fuse_lookup(struct inode *dir, struct dentry *entry, struct inode *inode; struct dentry *newent; bool outarg_valid = true; + bool locked; - fuse_lock_inode(dir); + locked = fuse_lock_inode(dir); err = fuse_lookup_name(dir->i_sb, get_node_id(dir), &entry->d_name, &outarg, &inode); - fuse_unlock_inode(dir); + fuse_unlock_inode(dir, locked); if (err == -ENOENT) { outarg_valid = false; err = 0; @@ -1336,6 +1337,7 @@ static int fuse_readdir(struct file *file, struct dir_context *ctx) struct fuse_conn *fc = get_fuse_conn(inode); struct fuse_req *req; u64 attr_version = 0; + bool locked; if (is_bad_inode(inode)) return -EIO; @@ -1363,9 +1365,9 @@ static int fuse_readdir(struct file *file, struct dir_context *ctx) fuse_read_fill(req, file, ctx->pos, PAGE_SIZE, FUSE_READDIR); } - fuse_lock_inode(inode); + locked = fuse_lock_inode(inode); fuse_request_send(fc, req); - fuse_unlock_inode(inode); + fuse_unlock_inode(inode, locked); nbytes = req->out.args[0].size; err = req->out.h.error; fuse_put_request(fc, req); diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 91307940c8ac..e48b10c88ce7 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -967,8 +967,8 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr, void fuse_set_initialized(struct fuse_conn *fc); -void fuse_unlock_inode(struct inode *inode); -void fuse_lock_inode(struct inode *inode); +void fuse_unlock_inode(struct inode *inode, bool locked); +bool fuse_lock_inode(struct inode *inode); int fuse_setxattr(struct inode *inode, const char *name, const void *value, size_t size, int flags); diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index f95e1d49b048..0c28d8b58f2e 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -356,15 +356,21 @@ int fuse_reverse_inval_inode(struct super_block *sb, u64 nodeid, return 0; } -void fuse_lock_inode(struct inode *inode) +bool fuse_lock_inode(struct inode *inode) { - if (!get_fuse_conn(inode)->parallel_dirops) + bool locked = false; + + if (!get_fuse_conn(inode)->parallel_dirops) { mutex_lock(&get_fuse_inode(inode)->mutex); + locked = true; + } + + return locked; } -void fuse_unlock_inode(struct inode *inode) +void fuse_unlock_inode(struct inode *inode, bool locked) { - if (!get_fuse_conn(inode)->parallel_dirops) + if (locked) mutex_unlock(&get_fuse_inode(inode)->mutex); } From 501c4caeb3e437eb18bb95b2c58d153aadc70d6e Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Thu, 26 Jul 2018 16:13:11 +0200 Subject: [PATCH 872/970] fuse: fix double request_end() commit 87114373ea507895a62afb10d2910bd9adac35a8 upstream. Refcounting of request is broken when fuse_abort_conn() is called and request is on the fpq->io list: - ref is taken too late - then it is not dropped Fixes: 0d8e84b0432b ("fuse: simplify request abort") Cc: # v4.2 Signed-off-by: Miklos Szeredi Signed-off-by: Greg Kroah-Hartman --- fs/fuse/dev.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index ce558c853722..9382ba430be5 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -363,7 +363,7 @@ static void request_end(struct fuse_conn *fc, struct fuse_req *req) struct fuse_iqueue *fiq = &fc->iq; if (test_and_set_bit(FR_FINISHED, &req->flags)) - return; + goto out_put_req; spin_lock(&fiq->waitq.lock); list_del_init(&req->intr_entry); @@ -393,6 +393,7 @@ static void request_end(struct fuse_conn *fc, struct fuse_req *req) wake_up(&req->waitq); if (req->end) req->end(fc, req); +out_put_req: fuse_put_request(fc, req); } @@ -2097,6 +2098,7 @@ void fuse_abort_conn(struct fuse_conn *fc) set_bit(FR_ABORTED, &req->flags); if (!test_bit(FR_LOCKED, &req->flags)) { set_bit(FR_PRIVATE, &req->flags); + __fuse_get_request(req); list_move(&req->list, &to_end1); } spin_unlock(&req->waitq.lock); @@ -2123,7 +2125,6 @@ void fuse_abort_conn(struct fuse_conn *fc) while (!list_empty(&to_end1)) { req = list_first_entry(&to_end1, struct fuse_req, list); - __fuse_get_request(req); list_del_init(&req->list); request_end(fc, req); } From c66025c1085c5983a64ac2a8ed733b8cefe1659d Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Thu, 26 Jul 2018 16:13:11 +0200 Subject: [PATCH 873/970] fuse: fix unlocked access to processing queue commit 45ff350bbd9d0f0977ff270a0d427c71520c0c37 upstream. fuse_dev_release() assumes that it's the only one referencing the fpq->processing list, but that's not true, since fuse_abort_conn() can be doing the same without any serialization between the two. Fixes: c3696046beb3 ("fuse: separate pqueue for clones") Cc: # v4.2 Signed-off-by: Miklos Szeredi Signed-off-by: Greg Kroah-Hartman --- fs/fuse/dev.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 9382ba430be5..d7c02ebb50cc 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -2142,9 +2142,15 @@ int fuse_dev_release(struct inode *inode, struct file *file) if (fud) { struct fuse_conn *fc = fud->fc; struct fuse_pqueue *fpq = &fud->pq; + LIST_HEAD(to_end); + spin_lock(&fpq->lock); WARN_ON(!list_empty(&fpq->io)); - end_requests(fc, &fpq->processing); + list_splice_init(&fpq->processing, &to_end); + spin_unlock(&fpq->lock); + + end_requests(fc, &to_end); + /* Are we the last open device? */ if (atomic_dec_and_test(&fc->dev_count)) { WARN_ON(fc->iq.fasync != NULL); From 6465d7688c2d568054c174231fb1d01e3d408e08 Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Thu, 26 Jul 2018 16:13:11 +0200 Subject: [PATCH 874/970] fuse: umount should wait for all requests commit b8f95e5d13f5f0191dcb4b9113113d241636e7cb upstream. fuse_abort_conn() does not guarantee that all async requests have actually finished aborting (i.e. their ->end() function is called). This could actually result in still used inodes after umount. Add a helper to wait until all requests are fully done. This is done by looking at the "num_waiting" counter. When this counter drops to zero, we can be sure that no more requests are outstanding. Fixes: 0d8e84b0432b ("fuse: simplify request abort") Cc: # v4.2 Signed-off-by: Miklos Szeredi Signed-off-by: Greg Kroah-Hartman --- fs/fuse/dev.c | 23 +++++++++++++++++++---- fs/fuse/fuse_i.h | 1 + fs/fuse/inode.c | 2 ++ 3 files changed, 22 insertions(+), 4 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index d7c02ebb50cc..c94bab6103f5 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -130,6 +130,16 @@ static bool fuse_block_alloc(struct fuse_conn *fc, bool for_background) return !fc->initialized || (for_background && fc->blocked); } +static void fuse_drop_waiting(struct fuse_conn *fc) +{ + if (fc->connected) { + atomic_dec(&fc->num_waiting); + } else if (atomic_dec_and_test(&fc->num_waiting)) { + /* wake up aborters */ + wake_up_all(&fc->blocked_waitq); + } +} + static struct fuse_req *__fuse_get_req(struct fuse_conn *fc, unsigned npages, bool for_background) { @@ -170,7 +180,7 @@ static struct fuse_req *__fuse_get_req(struct fuse_conn *fc, unsigned npages, return req; out: - atomic_dec(&fc->num_waiting); + fuse_drop_waiting(fc); return ERR_PTR(err); } @@ -277,7 +287,7 @@ void fuse_put_request(struct fuse_conn *fc, struct fuse_req *req) if (test_bit(FR_WAITING, &req->flags)) { __clear_bit(FR_WAITING, &req->flags); - atomic_dec(&fc->num_waiting); + fuse_drop_waiting(fc); } if (req->stolen_file) @@ -363,7 +373,7 @@ static void request_end(struct fuse_conn *fc, struct fuse_req *req) struct fuse_iqueue *fiq = &fc->iq; if (test_and_set_bit(FR_FINISHED, &req->flags)) - goto out_put_req; + goto put_request; spin_lock(&fiq->waitq.lock); list_del_init(&req->intr_entry); @@ -393,7 +403,7 @@ static void request_end(struct fuse_conn *fc, struct fuse_req *req) wake_up(&req->waitq); if (req->end) req->end(fc, req); -out_put_req: +put_request: fuse_put_request(fc, req); } @@ -2135,6 +2145,11 @@ void fuse_abort_conn(struct fuse_conn *fc) } EXPORT_SYMBOL_GPL(fuse_abort_conn); +void fuse_wait_aborted(struct fuse_conn *fc) +{ + wait_event(fc->blocked_waitq, atomic_read(&fc->num_waiting) == 0); +} + int fuse_dev_release(struct inode *inode, struct file *file) { struct fuse_dev *fud = fuse_get_dev(file); diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index e48b10c88ce7..1c905c7666de 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -854,6 +854,7 @@ void fuse_request_send_background_locked(struct fuse_conn *fc, /* Abort all requests */ void fuse_abort_conn(struct fuse_conn *fc); +void fuse_wait_aborted(struct fuse_conn *fc); /** * Invalidate inode attributes diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 0c28d8b58f2e..ce8f7f01107b 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -405,6 +405,8 @@ static void fuse_put_super(struct super_block *sb) fuse_send_destroy(fc); fuse_abort_conn(fc); + fuse_wait_aborted(fc); + mutex_lock(&fuse_mutex); list_del(&fc->entry); fuse_ctl_remove_conn(fc); From 263508c10106a97aba5f3e655bf145f3a671218a Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Thu, 26 Jul 2018 16:13:11 +0200 Subject: [PATCH 875/970] fuse: Fix oops at process_init_reply() commit e8f3bd773d22f488724dffb886a1618da85c2966 upstream. syzbot is hitting NULL pointer dereference at process_init_reply(). This is because deactivate_locked_super() is called before response for initial request is processed. Fix this by aborting and waiting for all requests (including FUSE_INIT) before resetting fc->sb. Original patch by Tetsuo Handa . Reported-by: syzbot Fixes: e27c9d3877a0 ("fuse: fuse: add time_gran to INIT_OUT") Cc: # v3.19 Signed-off-by: Miklos Szeredi Signed-off-by: Greg Kroah-Hartman --- fs/fuse/inode.c | 25 +++++++++++-------------- 1 file changed, 11 insertions(+), 14 deletions(-) diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index ce8f7f01107b..7a9b1069d267 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -402,11 +402,6 @@ static void fuse_put_super(struct super_block *sb) { struct fuse_conn *fc = get_fuse_conn_super(sb); - fuse_send_destroy(fc); - - fuse_abort_conn(fc); - fuse_wait_aborted(fc); - mutex_lock(&fuse_mutex); list_del(&fc->entry); fuse_ctl_remove_conn(fc); @@ -1206,16 +1201,25 @@ static struct dentry *fuse_mount(struct file_system_type *fs_type, return mount_nodev(fs_type, flags, raw_data, fuse_fill_super); } -static void fuse_kill_sb_anon(struct super_block *sb) +static void fuse_sb_destroy(struct super_block *sb) { struct fuse_conn *fc = get_fuse_conn_super(sb); if (fc) { + fuse_send_destroy(fc); + + fuse_abort_conn(fc); + fuse_wait_aborted(fc); + down_write(&fc->killsb); fc->sb = NULL; up_write(&fc->killsb); } +} +static void fuse_kill_sb_anon(struct super_block *sb) +{ + fuse_sb_destroy(sb); kill_anon_super(sb); } @@ -1238,14 +1242,7 @@ static struct dentry *fuse_mount_blk(struct file_system_type *fs_type, static void fuse_kill_sb_blk(struct super_block *sb) { - struct fuse_conn *fc = get_fuse_conn_super(sb); - - if (fc) { - down_write(&fc->killsb); - fc->sb = NULL; - up_write(&fc->killsb); - } - + fuse_sb_destroy(sb); kill_block_super(sb); } From 01f0772ad0063fbf34ec04fbac2070937cc3ce06 Mon Sep 17 00:00:00 2001 From: Kirill Tkhai Date: Thu, 19 Jul 2018 15:49:39 +0300 Subject: [PATCH 876/970] fuse: Add missed unlock_page() to fuse_readpages_fill() commit 109728ccc5933151c68d1106e4065478a487a323 upstream. The above error path returns with page unlocked, so this place seems also to behave the same. Fixes: f8dbdf81821b ("fuse: rework fuse_readpages()") Signed-off-by: Kirill Tkhai Signed-off-by: Miklos Szeredi Signed-off-by: Greg Kroah-Hartman --- fs/fuse/file.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 996aa23c409e..4408abf6675b 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -868,6 +868,7 @@ static int fuse_readpages_fill(void *_data, struct page *page) } if (WARN_ON(req->num_pages >= req->max_pages)) { + unlock_page(page); fuse_put_request(fc, req); return -EIO; } From e0d786f7113bd1298699b7702232bded956bce4e Mon Sep 17 00:00:00 2001 From: Mikulas Patocka Date: Sun, 3 Jun 2018 16:40:55 +0200 Subject: [PATCH 877/970] udl-kms: change down_interruptible to down commit 8456b99c16d193c4c3b7df305cf431e027f0189c upstream. If we leave urbs around, it causes not only leak, but also memory corruption. This patch fixes the function udl_free_urb_list, so that it always waits for all urbs that are in progress. Signed-off-by: Mikulas Patocka Cc: stable@vger.kernel.org Signed-off-by: Dave Airlie Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/udl/udl_main.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/drivers/gpu/drm/udl/udl_main.c b/drivers/gpu/drm/udl/udl_main.c index 873f010d9616..04bbe15137cb 100644 --- a/drivers/gpu/drm/udl/udl_main.c +++ b/drivers/gpu/drm/udl/udl_main.c @@ -169,18 +169,13 @@ static void udl_free_urb_list(struct drm_device *dev) struct list_head *node; struct urb_node *unode; struct urb *urb; - int ret; unsigned long flags; DRM_DEBUG("Waiting for completes and freeing all render urbs\n"); /* keep waiting and freeing, until we've got 'em all */ while (count--) { - - /* Getting interrupted means a leak, but ok at shutdown*/ - ret = down_interruptible(&udl->urbs.limit_sem); - if (ret) - break; + down(&udl->urbs.limit_sem); spin_lock_irqsave(&udl->urbs.lock, flags); From 73aa57ac7cc31b23e67d885015a61a247b3bc862 Mon Sep 17 00:00:00 2001 From: Mikulas Patocka Date: Sun, 3 Jun 2018 16:40:56 +0200 Subject: [PATCH 878/970] udl-kms: handle allocation failure commit 542bb9788a1f485eb1a2229178f665d8ea166156 upstream. Allocations larger than PAGE_ALLOC_COSTLY_ORDER are unreliable and they may fail anytime. This patch fixes the udl kms driver so that when a large alloactions fails, it tries to do multiple smaller allocations. Signed-off-by: Mikulas Patocka Cc: stable@vger.kernel.org Signed-off-by: Dave Airlie Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/udl/udl_main.c | 28 ++++++++++++++++++---------- 1 file changed, 18 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/udl/udl_main.c b/drivers/gpu/drm/udl/udl_main.c index 04bbe15137cb..10e2c198ad72 100644 --- a/drivers/gpu/drm/udl/udl_main.c +++ b/drivers/gpu/drm/udl/udl_main.c @@ -199,17 +199,22 @@ static void udl_free_urb_list(struct drm_device *dev) static int udl_alloc_urb_list(struct drm_device *dev, int count, size_t size) { struct udl_device *udl = dev->dev_private; - int i = 0; struct urb *urb; struct urb_node *unode; char *buf; + size_t wanted_size = count * size; spin_lock_init(&udl->urbs.lock); +retry: udl->urbs.size = size; INIT_LIST_HEAD(&udl->urbs.list); - while (i < count) { + sema_init(&udl->urbs.limit_sem, 0); + udl->urbs.count = 0; + udl->urbs.available = 0; + + while (udl->urbs.count * size < wanted_size) { unode = kzalloc(sizeof(struct urb_node), GFP_KERNEL); if (!unode) break; @@ -225,11 +230,16 @@ static int udl_alloc_urb_list(struct drm_device *dev, int count, size_t size) } unode->urb = urb; - buf = usb_alloc_coherent(udl->udev, MAX_TRANSFER, GFP_KERNEL, + buf = usb_alloc_coherent(udl->udev, size, GFP_KERNEL, &urb->transfer_dma); if (!buf) { kfree(unode); usb_free_urb(urb); + if (size > PAGE_SIZE) { + size /= 2; + udl_free_urb_list(dev); + goto retry; + } break; } @@ -240,16 +250,14 @@ static int udl_alloc_urb_list(struct drm_device *dev, int count, size_t size) list_add_tail(&unode->entry, &udl->urbs.list); - i++; + up(&udl->urbs.limit_sem); + udl->urbs.count++; + udl->urbs.available++; } - sema_init(&udl->urbs.limit_sem, i); - udl->urbs.count = i; - udl->urbs.available = i; + DRM_DEBUG("allocated %d %d byte urbs\n", udl->urbs.count, (int) size); - DRM_DEBUG("allocated %d %d byte urbs\n", i, (int) size); - - return i; + return udl->urbs.count; } struct urb *udl_get_urb(struct drm_device *dev) From 268143ee0b8d71975bf16056579da57ee9e91be0 Mon Sep 17 00:00:00 2001 From: Mikulas Patocka Date: Sun, 3 Jun 2018 16:40:57 +0200 Subject: [PATCH 879/970] udl-kms: fix crash due to uninitialized memory commit 09a00abe3a9941c2715ca83eb88172cd2f54d8fd upstream. We must use kzalloc when allocating the fb_deferred_io structure. Otherwise, the field first_io is undefined and it causes a crash. Signed-off-by: Mikulas Patocka Cc: stable@vger.kernel.org Signed-off-by: Dave Airlie Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/udl/udl_fb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/udl/udl_fb.c b/drivers/gpu/drm/udl/udl_fb.c index 39d0fdcb17d2..6a7994a79f55 100644 --- a/drivers/gpu/drm/udl/udl_fb.c +++ b/drivers/gpu/drm/udl/udl_fb.c @@ -217,7 +217,7 @@ static int udl_fb_open(struct fb_info *info, int user) struct fb_deferred_io *fbdefio; - fbdefio = kmalloc(sizeof(struct fb_deferred_io), GFP_KERNEL); + fbdefio = kzalloc(sizeof(struct fb_deferred_io), GFP_KERNEL); if (fbdefio) { fbdefio->delay = DL_DEFIO_WRITE_DELAY; From a8625b187d7e75af3962d0840e2595f1008ca52b Mon Sep 17 00:00:00 2001 From: Michael Buesch Date: Tue, 31 Jul 2018 21:14:16 +0200 Subject: [PATCH 880/970] b43legacy/leds: Ensure NUL-termination of LED name string commit 4d77a89e3924b12f4a5628b21237e57ab4703866 upstream. strncpy might not NUL-terminate the string, if the name equals the buffer size. Use strlcpy instead. Signed-off-by: Michael Buesch Cc: stable@vger.kernel.org Signed-off-by: Kalle Valo Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/broadcom/b43legacy/leds.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/wireless/broadcom/b43legacy/leds.c b/drivers/net/wireless/broadcom/b43legacy/leds.c index fd4565389c77..bc922118b6ac 100644 --- a/drivers/net/wireless/broadcom/b43legacy/leds.c +++ b/drivers/net/wireless/broadcom/b43legacy/leds.c @@ -101,7 +101,7 @@ static int b43legacy_register_led(struct b43legacy_wldev *dev, led->dev = dev; led->index = led_index; led->activelow = activelow; - strncpy(led->name, name, sizeof(led->name)); + strlcpy(led->name, name, sizeof(led->name)); led->led_dev.name = led->name; led->led_dev.default_trigger = default_trigger; From 4a92d74e4f311635d12c593b3ef2b66c460a843b Mon Sep 17 00:00:00 2001 From: Michael Buesch Date: Tue, 31 Jul 2018 21:14:04 +0200 Subject: [PATCH 881/970] b43/leds: Ensure NUL-termination of LED name string commit 2aa650d1950fce94f696ebd7db30b8830c2c946f upstream. strncpy might not NUL-terminate the string, if the name equals the buffer size. Use strlcpy instead. Signed-off-by: Michael Buesch Cc: stable@vger.kernel.org Signed-off-by: Kalle Valo Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/broadcom/b43/leds.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/wireless/broadcom/b43/leds.c b/drivers/net/wireless/broadcom/b43/leds.c index cb987c2ecc6b..87131f663292 100644 --- a/drivers/net/wireless/broadcom/b43/leds.c +++ b/drivers/net/wireless/broadcom/b43/leds.c @@ -131,7 +131,7 @@ static int b43_register_led(struct b43_wldev *dev, struct b43_led *led, led->wl = dev->wl; led->index = led_index; led->activelow = activelow; - strncpy(led->name, name, sizeof(led->name)); + strlcpy(led->name, name, sizeof(led->name)); atomic_set(&led->state, 0); led->led_dev.name = led->name; From 5304f2ac6b6b093a577498a3a00b3aef466b7a0c Mon Sep 17 00:00:00 2001 From: Jerome Brunet Date: Wed, 27 Jun 2018 17:36:38 +0200 Subject: [PATCH 882/970] ASoC: dpcm: don't merge format from invalid codec dai commit 4febced15ac8ddb9cf3e603edb111842e4863d9a upstream. When merging codec formats, dpcm_runtime_base_format() should skip the codecs which are not supporting the current stream direction. At the moment, if a BE link has more than one codec, and only one of these codecs has no capture DAI, it becomes impossible to start a capture stream because the merged format would be 0. Skipping invalid codec DAI solves the problem. Fixes: b073ed4e2126 ("ASoC: soc-pcm: DPCM cares BE format") Signed-off-by: Jerome Brunet Signed-off-by: Mark Brown Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- sound/soc/soc-pcm.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/sound/soc/soc-pcm.c b/sound/soc/soc-pcm.c index 20680a490897..b111ecda6439 100644 --- a/sound/soc/soc-pcm.c +++ b/sound/soc/soc-pcm.c @@ -1621,6 +1621,14 @@ static u64 dpcm_runtime_base_format(struct snd_pcm_substream *substream) int i; for (i = 0; i < be->num_codecs; i++) { + /* + * Skip CODECs which don't support the current stream + * type. See soc_pcm_init_runtime_hw() for more details + */ + if (!snd_soc_dai_stream_valid(be->codec_dais[i], + stream)) + continue; + codec_dai_drv = be->codec_dais[i]->driver; if (stream == SNDRV_PCM_STREAM_PLAYBACK) codec_stream = &codec_dai_drv->playback; From b71230cb582089e2309d25482494bc991eb51371 Mon Sep 17 00:00:00 2001 From: "Gustavo A. R. Silva" Date: Thu, 26 Jul 2018 15:49:10 -0500 Subject: [PATCH 883/970] ASoC: sirf: Fix potential NULL pointer dereference commit ae1c696a480c67c45fb23b35162183f72c6be0e1 upstream. There is a potential execution path in which function platform_get_resource() returns NULL. If this happens, we will end up having a NULL pointer dereference. Fix this by replacing devm_ioremap with devm_ioremap_resource, which has the NULL check and the memory region request. This code was detected with the help of Coccinelle. Cc: stable@vger.kernel.org Fixes: 2bd8d1d5cf89 ("ASoC: sirf: Add audio usp interface driver") Signed-off-by: Gustavo A. R. Silva Signed-off-by: Mark Brown Signed-off-by: Greg Kroah-Hartman --- sound/soc/sirf/sirf-usp.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/sound/soc/sirf/sirf-usp.c b/sound/soc/sirf/sirf-usp.c index 45fc06c0e0e5..6b504f407079 100644 --- a/sound/soc/sirf/sirf-usp.c +++ b/sound/soc/sirf/sirf-usp.c @@ -367,10 +367,9 @@ static int sirf_usp_pcm_probe(struct platform_device *pdev) platform_set_drvdata(pdev, usp); mem_res = platform_get_resource(pdev, IORESOURCE_MEM, 0); - base = devm_ioremap(&pdev->dev, mem_res->start, - resource_size(mem_res)); - if (base == NULL) - return -ENOMEM; + base = devm_ioremap_resource(&pdev->dev, mem_res); + if (IS_ERR(base)) + return PTR_ERR(base); usp->regmap = devm_regmap_init_mmio(&pdev->dev, base, &sirf_usp_regmap_config); if (IS_ERR(usp->regmap)) From 597ea10b9247ba2964a70fe5b0b6f07639602a79 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Fri, 13 Jul 2018 17:55:15 +0300 Subject: [PATCH 884/970] pinctrl: freescale: off by one in imx1_pinconf_group_dbg_show() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 19da44cd33a3a6ff7c97fff0189999ff15b241e4 upstream. The info->groups[] array is allocated in imx1_pinctrl_parse_dt(). It has info->ngroups elements. Thus the > here should be >= to prevent reading one element beyond the end of the array. Cc: stable@vger.kernel.org Fixes: 30612cd90005 ("pinctrl: imx1 core driver") Signed-off-by: Dan Carpenter Reviewed-by: Uwe Kleine-König Acked-by: Dong Aisheng Signed-off-by: Linus Walleij Signed-off-by: Greg Kroah-Hartman --- drivers/pinctrl/freescale/pinctrl-imx1-core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/pinctrl/freescale/pinctrl-imx1-core.c b/drivers/pinctrl/freescale/pinctrl-imx1-core.c index a4e9f430d452..e2cca91fd266 100644 --- a/drivers/pinctrl/freescale/pinctrl-imx1-core.c +++ b/drivers/pinctrl/freescale/pinctrl-imx1-core.c @@ -433,7 +433,7 @@ static void imx1_pinconf_group_dbg_show(struct pinctrl_dev *pctldev, const char *name; int i, ret; - if (group > info->ngroups) + if (group >= info->ngroups) return; seq_puts(s, "\n"); From c148246b87cef922d1773c21b33970fdde84809c Mon Sep 17 00:00:00 2001 From: Nick Desaulniers Date: Mon, 27 Aug 2018 14:40:09 -0700 Subject: [PATCH 885/970] x86/irqflags: Mark native_restore_fl extern inline commit 1f59a4581b5ecfe9b4f049a7a2cf904d8352842d upstream. This should have been marked extern inline in order to pick up the out of line definition in arch/x86/kernel/irqflags.S. Fixes: 208cbb325589 ("x86/irqflags: Provide a declaration for native_save_fl") Reported-by: Ben Hutchings Signed-off-by: Nick Desaulniers Signed-off-by: Thomas Gleixner Reviewed-by: Juergen Gross Cc: "H. Peter Anvin" Cc: Boris Ostrovsky Cc: Greg Kroah-Hartman Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180827214011.55428-1-ndesaulniers@google.com Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/irqflags.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h index 5b1177f5a963..508a062e6cf1 100644 --- a/arch/x86/include/asm/irqflags.h +++ b/arch/x86/include/asm/irqflags.h @@ -32,7 +32,8 @@ extern inline unsigned long native_save_fl(void) return flags; } -static inline void native_restore_fl(unsigned long flags) +extern inline void native_restore_fl(unsigned long flags); +extern inline void native_restore_fl(unsigned long flags) { asm volatile("push %0 ; popf" : /* no output */ From d8fa9ed041d31cd3a22fcc6498f12a63b70dfbfb Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Fri, 24 Aug 2018 10:03:51 -0700 Subject: [PATCH 886/970] x86/spectre: Add missing family 6 check to microcode check commit 1ab534e85c93945f7862378d8c8adcf408205b19 upstream. The check for Spectre microcodes does not check for family 6, only the model numbers. Add a family 6 check to avoid ambiguity with other families. Fixes: a5b296636453 ("x86/cpufeature: Blacklist SPEC_CTRL/PRED_CMD on early Spectre v2 microcodes") Signed-off-by: Andi Kleen Signed-off-by: Thomas Gleixner Cc: x86@kernel.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180824170351.34874-2-andi@firstfloor.org Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/intel.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index 9ad86c4bf360..cee0fec0d232 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -109,6 +109,9 @@ static bool bad_spectre_microcode(struct cpuinfo_x86 *c) if (cpu_has(c, X86_FEATURE_HYPERVISOR)) return false; + if (c->x86 != 6) + return false; + for (i = 0; i < ARRAY_SIZE(spectre_bad_microcodes); i++) { if (c->x86_model == spectre_bad_microcodes[i].model && c->x86_stepping == spectre_bad_microcodes[i].stepping) From ef3d45c9576415e657da3eca10216df1bb8d7ffa Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Fri, 24 Aug 2018 10:03:50 -0700 Subject: [PATCH 887/970] x86/speculation/l1tf: Increase l1tf memory limit for Nehalem+ commit cc51e5428ea54f575d49cfcede1d4cb3a72b4ec4 upstream. On Nehalem and newer core CPUs the CPU cache internally uses 44 bits physical address space. The L1TF workaround is limited by this internal cache address width, and needs to have one bit free there for the mitigation to work. Older client systems report only 36bit physical address space so the range check decides that L1TF is not mitigated for a 36bit phys/32GB system with some memory holes. But since these actually have the larger internal cache width this warning is bogus because it would only really be needed if the system had more than 43bits of memory. Add a new internal x86_cache_bits field. Normally it is the same as the physical bits field reported by CPUID, but for Nehalem and newerforce it to be at least 44bits. Change the L1TF memory size warning to use the new cache_bits field to avoid bogus warnings and remove the bogus comment about memory size. Fixes: 17dbca119312 ("x86/speculation/l1tf: Add sysfs reporting for l1tf") Reported-by: George Anchev Reported-by: Christopher Snowhill Signed-off-by: Andi Kleen Signed-off-by: Thomas Gleixner Cc: x86@kernel.org Cc: linux-kernel@vger.kernel.org Cc: Michael Hocko Cc: vbabka@suse.cz Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180824170351.34874-1-andi@firstfloor.org Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/processor.h | 4 ++- arch/x86/kernel/cpu/bugs.c | 46 ++++++++++++++++++++++++++++---- arch/x86/kernel/cpu/common.c | 1 + 3 files changed, 45 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 8d9ea8a4f229..ee8c6290c421 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -136,6 +136,8 @@ struct cpuinfo_x86 { /* Index into per_cpu list: */ u16 cpu_index; u32 microcode; + /* Address space bits used by the cache internally */ + u8 x86_cache_bits; }; #define X86_VENDOR_INTEL 0 @@ -175,7 +177,7 @@ extern void cpu_detect(struct cpuinfo_x86 *c); static inline unsigned long long l1tf_pfn_limit(void) { - return BIT_ULL(boot_cpu_data.x86_phys_bits - 1 - PAGE_SHIFT); + return BIT_ULL(boot_cpu_data.x86_cache_bits - 1 - PAGE_SHIFT); } extern void early_cpu_init(void); diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index 5acf477e8025..8103adacbc83 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -651,6 +651,45 @@ EXPORT_SYMBOL_GPL(l1tf_mitigation); enum vmx_l1d_flush_state l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_AUTO; EXPORT_SYMBOL_GPL(l1tf_vmx_mitigation); +/* + * These CPUs all support 44bits physical address space internally in the + * cache but CPUID can report a smaller number of physical address bits. + * + * The L1TF mitigation uses the top most address bit for the inversion of + * non present PTEs. When the installed memory reaches into the top most + * address bit due to memory holes, which has been observed on machines + * which report 36bits physical address bits and have 32G RAM installed, + * then the mitigation range check in l1tf_select_mitigation() triggers. + * This is a false positive because the mitigation is still possible due to + * the fact that the cache uses 44bit internally. Use the cache bits + * instead of the reported physical bits and adjust them on the affected + * machines to 44bit if the reported bits are less than 44. + */ +static void override_cache_bits(struct cpuinfo_x86 *c) +{ + if (c->x86 != 6) + return; + + switch (c->x86_model) { + case INTEL_FAM6_NEHALEM: + case INTEL_FAM6_WESTMERE: + case INTEL_FAM6_SANDYBRIDGE: + case INTEL_FAM6_IVYBRIDGE: + case INTEL_FAM6_HASWELL_CORE: + case INTEL_FAM6_HASWELL_ULT: + case INTEL_FAM6_HASWELL_GT3E: + case INTEL_FAM6_BROADWELL_CORE: + case INTEL_FAM6_BROADWELL_GT3E: + case INTEL_FAM6_SKYLAKE_MOBILE: + case INTEL_FAM6_SKYLAKE_DESKTOP: + case INTEL_FAM6_KABYLAKE_MOBILE: + case INTEL_FAM6_KABYLAKE_DESKTOP: + if (c->x86_cache_bits < 44) + c->x86_cache_bits = 44; + break; + } +} + static void __init l1tf_select_mitigation(void) { u64 half_pa; @@ -658,6 +697,8 @@ static void __init l1tf_select_mitigation(void) if (!boot_cpu_has_bug(X86_BUG_L1TF)) return; + override_cache_bits(&boot_cpu_data); + switch (l1tf_mitigation) { case L1TF_MITIGATION_OFF: case L1TF_MITIGATION_FLUSH_NOWARN: @@ -677,11 +718,6 @@ static void __init l1tf_select_mitigation(void) return; #endif - /* - * This is extremely unlikely to happen because almost all - * systems have far more MAX_PA/2 than RAM can be fit into - * DIMM slots. - */ half_pa = (u64)l1tf_pfn_limit() << PAGE_SHIFT; if (e820_any_mapped(half_pa, ULLONG_MAX - half_pa, E820_RAM)) { pr_warn("System has more than MAX_PA/2 memory. L1TF mitigation not effective.\n"); diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 13471b71bec7..dc0850bb74be 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -882,6 +882,7 @@ static void identify_cpu_without_cpuid(struct cpuinfo_x86 *c) } } #endif + c->x86_cache_bits = c->x86_phys_bits; } static const __initconst struct x86_cpu_id cpu_no_speculation[] = { From ba064e81be70ecc013f65f65c6c083ec155afd16 Mon Sep 17 00:00:00 2001 From: Jann Horn Date: Tue, 28 Aug 2018 20:40:33 +0200 Subject: [PATCH 888/970] x86/entry/64: Wipe KASAN stack shadow before rewind_stack_do_exit() commit f12d11c5c184626b4befdee3d573ec8237405a33 upstream. Reset the KASAN shadow state of the task stack before rewinding RSP. Without this, a kernel oops will leave parts of the stack poisoned, and code running under do_exit() can trip over such poisoned regions and cause nonsensical false-positive KASAN reports about stack-out-of-bounds bugs. This does not wipe the exception stacks; if an oops happens on an exception stack, it might result in random KASAN false-positives from other tasks afterwards. This is probably relatively uninteresting, since if the kernel oopses on an exception stack, there are most likely bigger things to worry about. It'd be more interesting if vmapped stacks and KASAN were compatible, since then handle_stack_overflow() would oops from exception stack context. Fixes: 2deb4be28077 ("x86/dumpstack: When OOPSing, rewind the stack before do_exit()") Signed-off-by: Jann Horn Signed-off-by: Thomas Gleixner Acked-by: Andrey Ryabinin Cc: Andy Lutomirski Cc: Dmitry Vyukov Cc: Alexander Potapenko Cc: Kees Cook Cc: kasan-dev@googlegroups.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180828184033.93712-1-jannh@google.com Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/dumpstack.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c index 85f854b98a9d..3576ece9ef88 100644 --- a/arch/x86/kernel/dumpstack.c +++ b/arch/x86/kernel/dumpstack.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include @@ -229,7 +230,10 @@ void oops_end(unsigned long flags, struct pt_regs *regs, int signr) * We're not going to return, but we might be on an IST stack or * have very little stack space left. Rewind the stack and kill * the task. + * Before we rewind the stack, we have to tell KASAN that we're going to + * reuse the task stack and that existing poisons are invalid. */ + kasan_unpoison_task_stack(current); rewind_stack_do_exit(signr); } NOKPROBE_SYMBOL(oops_end); From 33a9081eaa2c608001ed6dfe6a2e58c2cdd731c6 Mon Sep 17 00:00:00 2001 From: Martin Schwidefsky Date: Mon, 6 Aug 2018 14:26:39 +0200 Subject: [PATCH 889/970] s390: fix br_r1_trampoline for machines without exrl commit 26f843848bae973817b3587780ce6b7b0200d3e4 upstream. For machines without the exrl instruction the BFP jit generates code that uses an "br %r1" instruction located in the lowcore page. Unfortunately there is a cut & paste error that puts an additional "larl %r1,.+14" instruction in the code that clobbers the branch target address in %r1. Remove the larl instruction. Cc: # v4.17+ Fixes: de5cb6eb51 ("s390: use expoline thunks in the BPF JIT") Signed-off-by: Martin Schwidefsky Signed-off-by: Greg Kroah-Hartman --- arch/s390/net/bpf_jit_comp.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c index 949a871e9506..8bd25aebf488 100644 --- a/arch/s390/net/bpf_jit_comp.c +++ b/arch/s390/net/bpf_jit_comp.c @@ -517,8 +517,6 @@ static void bpf_jit_epilogue(struct bpf_jit *jit) /* br %r1 */ _EMIT2(0x07f1); } else { - /* larl %r1,.+14 */ - EMIT6_PCREL_RILB(0xc0000000, REG_1, jit->prg + 14); /* ex 0,S390_lowcore.br_r1_tampoline */ EMIT4_DISP(0x44000000, REG_0, REG_0, offsetof(struct lowcore, br_r1_trampoline)); From b9f66a2ba6e69f1b13e968af71cc5bd9c5ccb95a Mon Sep 17 00:00:00 2001 From: Julian Wiedmann Date: Wed, 16 May 2018 09:37:25 +0200 Subject: [PATCH 890/970] s390/qdio: reset old sbal_state flags commit 64e03ff72623b8c2ea89ca3cb660094e019ed4ae upstream. When allocating a new AOB fails, handle_outbound() is still capable of transmitting the selected buffer (just without async completion). But if a previous transfer on this queue slot used async completion, its sbal_state flags field is still set to QDIO_OUTBUF_STATE_FLAG_PENDING. So when the upper layer driver sees this stale flag, it expects an async completion that never happens. Fix this by unconditionally clearing the flags field. Fixes: 104ea556ee7f ("qdio: support asynchronous delivery of storage blocks") Cc: #v3.2+ Signed-off-by: Julian Wiedmann Signed-off-by: Martin Schwidefsky Signed-off-by: Greg Kroah-Hartman --- arch/s390/include/asm/qdio.h | 1 - drivers/s390/cio/qdio_main.c | 5 ++--- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/arch/s390/include/asm/qdio.h b/arch/s390/include/asm/qdio.h index 998b61cd0e56..4b39ba700d32 100644 --- a/arch/s390/include/asm/qdio.h +++ b/arch/s390/include/asm/qdio.h @@ -261,7 +261,6 @@ struct qdio_outbuf_state { void *user; }; -#define QDIO_OUTBUF_STATE_FLAG_NONE 0x00 #define QDIO_OUTBUF_STATE_FLAG_PENDING 0x01 #define CHSC_AC1_INITIATE_INPUTQ 0x80 diff --git a/drivers/s390/cio/qdio_main.c b/drivers/s390/cio/qdio_main.c index 66e9bb053629..18ab84e9c6b2 100644 --- a/drivers/s390/cio/qdio_main.c +++ b/drivers/s390/cio/qdio_main.c @@ -640,21 +640,20 @@ static inline unsigned long qdio_aob_for_buffer(struct qdio_output_q *q, unsigned long phys_aob = 0; if (!q->use_cq) - goto out; + return 0; if (!q->aobs[bufnr]) { struct qaob *aob = qdio_allocate_aob(); q->aobs[bufnr] = aob; } if (q->aobs[bufnr]) { - q->sbal_state[bufnr].flags = QDIO_OUTBUF_STATE_FLAG_NONE; q->sbal_state[bufnr].aob = q->aobs[bufnr]; q->aobs[bufnr]->user1 = (u64) q->sbal_state[bufnr].user; phys_aob = virt_to_phys(q->aobs[bufnr]); WARN_ON_ONCE(phys_aob & 0xFF); } -out: + q->sbal_state[bufnr].flags = 0; return phys_aob; } From d519aab009a132f66eac44f4d6ae32b615e97859 Mon Sep 17 00:00:00 2001 From: Martin Schwidefsky Date: Tue, 31 Jul 2018 16:14:18 +0200 Subject: [PATCH 891/970] s390/numa: move initial setup of node_to_cpumask_map commit fb7d7518b0d65955f91c7b875c36eae7694c69bd upstream. The numa_init_early initcall sets the node_to_cpumask_map[0] to the full cpu_possible_mask. Unfortunately this early_initcall is too late, the NUMA setup for numa=emu is done even earlier. The order of calls is numa_setup() -> emu_update_cpu_topology(), then the early_initcalls(), followed by sched_init_domains(). Starting with git commit 051f3ca02e46432c0965e8948f00c07d8a2f09c0 "sched/topology: Introduce NUMA identity node sched domain" the incorrect node_to_cpumask_map[0] really screws up the domain setup and the kernel panics with the follow oops: Cc: # v4.15+ Signed-off-by: Martin Schwidefsky Signed-off-by: Greg Kroah-Hartman --- arch/s390/numa/numa.c | 16 ++-------------- 1 file changed, 2 insertions(+), 14 deletions(-) diff --git a/arch/s390/numa/numa.c b/arch/s390/numa/numa.c index f576f1073378..0dac2640c3a7 100644 --- a/arch/s390/numa/numa.c +++ b/arch/s390/numa/numa.c @@ -133,26 +133,14 @@ void __init numa_setup(void) { pr_info("NUMA mode: %s\n", mode->name); nodes_clear(node_possible_map); + /* Initially attach all possible CPUs to node 0. */ + cpumask_copy(&node_to_cpumask_map[0], cpu_possible_mask); if (mode->setup) mode->setup(); numa_setup_memory(); memblock_dump_all(); } -/* - * numa_init_early() - Initialization initcall - * - * This runs when only one CPU is online and before the first - * topology update is called for by the scheduler. - */ -static int __init numa_init_early(void) -{ - /* Attach all possible CPUs to node 0 for now. */ - cpumask_copy(&node_to_cpumask_map[0], cpu_possible_mask); - return 0; -} -early_initcall(numa_init_early); - /* * numa_init_late() - Initialization initcall * From 664a64bd6df6028a2bbd3755959aa0916649dac2 Mon Sep 17 00:00:00 2001 From: Sebastian Ott Date: Mon, 13 Aug 2018 11:26:46 +0200 Subject: [PATCH 892/970] s390/pci: fix out of bounds access during irq setup commit 866f3576a72b2233a76dffb80290f8086dc49e17 upstream. During interrupt setup we allocate interrupt vectors, walk the list of msi descriptors, and fill in the message data. Requesting more interrupts than supported on s390 can lead to an out of bounds access. When we restrict the number of interrupts we should also stop walking the msi list after all supported interrupts are handled. Cc: stable@vger.kernel.org Signed-off-by: Sebastian Ott Signed-off-by: Heiko Carstens Signed-off-by: Greg Kroah-Hartman --- arch/s390/pci/pci.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c index 03a1d5976ff5..87574110394d 100644 --- a/arch/s390/pci/pci.c +++ b/arch/s390/pci/pci.c @@ -407,6 +407,8 @@ int arch_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type) hwirq = 0; for_each_pci_msi_entry(msi, pdev) { rc = -EIO; + if (hwirq >= msi_vecs) + break; irq = irq_alloc_desc(0); /* Alloc irq on node 0 */ if (irq < 0) goto out_msi; From 5749cd62208707fb0ae0bf6c50477d9ee860413e Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Sat, 28 Apr 2018 21:35:01 +0900 Subject: [PATCH 893/970] kprobes: Make list and blacklist root user read only commit f2a3ab36077222437b4826fc76111caa14562b7c upstream. Since the blacklist and list files on debugfs indicates a sensitive address information to reader, it should be restricted to the root user. Suggested-by: Thomas Richter Suggested-by: Ingo Molnar Signed-off-by: Masami Hiramatsu Cc: Ananth N Mavinakayanahalli Cc: Anil S Keshavamurthy Cc: Arnd Bergmann Cc: David Howells Cc: David S . Miller Cc: Heiko Carstens Cc: Jon Medhurst Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Tobin C . Harding Cc: Will Deacon Cc: acme@kernel.org Cc: akpm@linux-foundation.org Cc: brueckner@linux.vnet.ibm.com Cc: linux-arch@vger.kernel.org Cc: rostedt@goodmis.org Cc: schwidefsky@de.ibm.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/lkml/152491890171.9916.5183693615601334087.stgit@devbox Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- kernel/kprobes.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/kprobes.c b/kernel/kprobes.c index 69485183af79..b9e966bcdd20 100644 --- a/kernel/kprobes.c +++ b/kernel/kprobes.c @@ -2441,7 +2441,7 @@ static int __init debugfs_kprobe_init(void) if (!dir) return -ENOMEM; - file = debugfs_create_file("list", 0444, dir, NULL, + file = debugfs_create_file("list", 0400, dir, NULL, &debugfs_kprobes_operations); if (!file) goto error; @@ -2451,7 +2451,7 @@ static int __init debugfs_kprobe_init(void) if (!file) goto error; - file = debugfs_create_file("blacklist", 0444, dir, NULL, + file = debugfs_create_file("blacklist", 0400, dir, NULL, &debugfs_kprobe_blacklist_ops); if (!file) goto error; From eb3717fbcc3c371b0d4d173746d9c20facd33d65 Mon Sep 17 00:00:00 2001 From: "Maciej W. Rozycki" Date: Tue, 15 May 2018 23:33:26 +0100 Subject: [PATCH 894/970] MIPS: Correct the 64-bit DSP accumulator register size commit f5958b4cf4fc38ed4583ab83fb7c4cd1ab05f47b upstream. Use the `unsigned long' rather than `__u32' type for DSP accumulator registers, like with the regular MIPS multiply/divide accumulator and general-purpose registers, as all are 64-bit in 64-bit implementations and using a 32-bit data type leads to contents truncation on context saving. Update `arch_ptrace' and `compat_arch_ptrace' accordingly, removing casts that are similarly not used with multiply/divide accumulator or general-purpose register accesses. Signed-off-by: Maciej W. Rozycki Signed-off-by: Paul Burton Fixes: e50c0a8fa60d ("Support the MIPS32 / MIPS64 DSP ASE.") Patchwork: https://patchwork.linux-mips.org/patch/19329/ Cc: Alexander Viro Cc: James Hogan Cc: Ralf Baechle Cc: linux-fsdevel@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org # 2.6.15+ Signed-off-by: Greg Kroah-Hartman --- arch/mips/include/asm/processor.h | 2 +- arch/mips/kernel/ptrace.c | 2 +- arch/mips/kernel/ptrace32.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h index 0d36c87acbe2..ad6f019ff776 100644 --- a/arch/mips/include/asm/processor.h +++ b/arch/mips/include/asm/processor.h @@ -141,7 +141,7 @@ struct mips_fpu_struct { #define NUM_DSP_REGS 6 -typedef __u32 dspreg_t; +typedef unsigned long dspreg_t; struct mips_dsp_state { dspreg_t dspr[NUM_DSP_REGS]; diff --git a/arch/mips/kernel/ptrace.c b/arch/mips/kernel/ptrace.c index 4f64913b4b4c..b702ba3a0df3 100644 --- a/arch/mips/kernel/ptrace.c +++ b/arch/mips/kernel/ptrace.c @@ -876,7 +876,7 @@ long arch_ptrace(struct task_struct *child, long request, goto out; } dregs = __get_dsp_regs(child); - tmp = (unsigned long) (dregs[addr - DSP_BASE]); + tmp = dregs[addr - DSP_BASE]; break; } case DSP_CONTROL: diff --git a/arch/mips/kernel/ptrace32.c b/arch/mips/kernel/ptrace32.c index b1e945738138..4840af169683 100644 --- a/arch/mips/kernel/ptrace32.c +++ b/arch/mips/kernel/ptrace32.c @@ -140,7 +140,7 @@ long compat_arch_ptrace(struct task_struct *child, compat_long_t request, goto out; } dregs = __get_dsp_regs(child); - tmp = (unsigned long) (dregs[addr - DSP_BASE]); + tmp = dregs[addr - DSP_BASE]; break; } case DSP_CONTROL: From a6b728be40aef5cd66e7cbbd19c4c05db1cd0d22 Mon Sep 17 00:00:00 2001 From: Paul Burton Date: Tue, 21 Aug 2018 12:12:59 -0700 Subject: [PATCH 895/970] MIPS: lib: Provide MIPS64r6 __multi3() for GCC < 7 commit 690d9163bf4b8563a2682e619f938e6a0443947f upstream. Some versions of GCC suboptimally generate calls to the __multi3() intrinsic for MIPS64r6 builds, resulting in link failures due to the missing function: LD vmlinux.o MODPOST vmlinux.o kernel/bpf/verifier.o: In function `kmalloc_array': include/linux/slab.h:631: undefined reference to `__multi3' fs/select.o: In function `kmalloc_array': include/linux/slab.h:631: undefined reference to `__multi3' ... We already have a workaround for this in which we provide the instrinsic, but we do so selectively for GCC 7 only. Unfortunately the issue occurs with older GCC versions too - it has been observed with both GCC 5.4.0 & GCC 6.4.0. MIPSr6 support was introduced in GCC 5, so all major GCC versions prior to GCC 8 are affected and we extend our workaround accordingly to all MIPS64r6 builds using GCC versions older than GCC 8. Signed-off-by: Paul Burton Reported-by: Vladimir Kondratiev Fixes: ebabcf17bcd7 ("MIPS: Implement __multi3 for GCC7 MIPS64r6 builds") Patchwork: https://patchwork.linux-mips.org/patch/20297/ Cc: James Hogan Cc: Ralf Baechle Cc: linux-mips@linux-mips.org Cc: stable@vger.kernel.org # 4.15+ Signed-off-by: Greg Kroah-Hartman --- arch/mips/lib/multi3.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/mips/lib/multi3.c b/arch/mips/lib/multi3.c index 111ad475aa0c..4c2483f410c2 100644 --- a/arch/mips/lib/multi3.c +++ b/arch/mips/lib/multi3.c @@ -4,12 +4,12 @@ #include "libgcc.h" /* - * GCC 7 suboptimally generates __multi3 calls for mips64r6, so for that - * specific case only we'll implement it here. + * GCC 7 & older can suboptimally generate __multi3 calls for mips64r6, so for + * that specific case only we implement that intrinsic here. * * See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82981 */ -#if defined(CONFIG_64BIT) && defined(CONFIG_CPU_MIPSR6) && (__GNUC__ == 7) +#if defined(CONFIG_64BIT) && defined(CONFIG_CPU_MIPSR6) && (__GNUC__ < 8) /* multiply 64-bit values, low 64-bits returned */ static inline long long notrace dmulu(long long a, long long b) From 8a5e02a0f46ea33ed19e48e096a8e8d28e73d10a Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Thu, 2 Aug 2018 10:51:40 -0700 Subject: [PATCH 896/970] scsi: sysfs: Introduce sysfs_{un,}break_active_protection() commit 2afc9166f79b8f6da5f347f48515215ceee4ae37 upstream. Introduce these two functions and export them such that the next patch can add calls to these functions from the SCSI core. Signed-off-by: Bart Van Assche Acked-by: Tejun Heo Acked-by: Greg Kroah-Hartman Cc: Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman --- fs/sysfs/file.c | 44 +++++++++++++++++++++++++++++++++++++++++++ include/linux/sysfs.h | 14 ++++++++++++++ 2 files changed, 58 insertions(+) diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c index 39c75a86c67f..666986b95c5d 100644 --- a/fs/sysfs/file.c +++ b/fs/sysfs/file.c @@ -407,6 +407,50 @@ int sysfs_chmod_file(struct kobject *kobj, const struct attribute *attr, } EXPORT_SYMBOL_GPL(sysfs_chmod_file); +/** + * sysfs_break_active_protection - break "active" protection + * @kobj: The kernel object @attr is associated with. + * @attr: The attribute to break the "active" protection for. + * + * With sysfs, just like kernfs, deletion of an attribute is postponed until + * all active .show() and .store() callbacks have finished unless this function + * is called. Hence this function is useful in methods that implement self + * deletion. + */ +struct kernfs_node *sysfs_break_active_protection(struct kobject *kobj, + const struct attribute *attr) +{ + struct kernfs_node *kn; + + kobject_get(kobj); + kn = kernfs_find_and_get(kobj->sd, attr->name); + if (kn) + kernfs_break_active_protection(kn); + return kn; +} +EXPORT_SYMBOL_GPL(sysfs_break_active_protection); + +/** + * sysfs_unbreak_active_protection - restore "active" protection + * @kn: Pointer returned by sysfs_break_active_protection(). + * + * Undo the effects of sysfs_break_active_protection(). Since this function + * calls kernfs_put() on the kernfs node that corresponds to the 'attr' + * argument passed to sysfs_break_active_protection() that attribute may have + * been removed between the sysfs_break_active_protection() and + * sysfs_unbreak_active_protection() calls, it is not safe to access @kn after + * this function has returned. + */ +void sysfs_unbreak_active_protection(struct kernfs_node *kn) +{ + struct kobject *kobj = kn->parent->priv; + + kernfs_unbreak_active_protection(kn); + kernfs_put(kn); + kobject_put(kobj); +} +EXPORT_SYMBOL_GPL(sysfs_unbreak_active_protection); + /** * sysfs_remove_file_ns - remove an object attribute with a custom ns tag * @kobj: object we're acting for diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h index 00a1f330f93a..d3c19f8c4564 100644 --- a/include/linux/sysfs.h +++ b/include/linux/sysfs.h @@ -238,6 +238,9 @@ int __must_check sysfs_create_files(struct kobject *kobj, const struct attribute **attr); int __must_check sysfs_chmod_file(struct kobject *kobj, const struct attribute *attr, umode_t mode); +struct kernfs_node *sysfs_break_active_protection(struct kobject *kobj, + const struct attribute *attr); +void sysfs_unbreak_active_protection(struct kernfs_node *kn); void sysfs_remove_file_ns(struct kobject *kobj, const struct attribute *attr, const void *ns); bool sysfs_remove_file_self(struct kobject *kobj, const struct attribute *attr); @@ -351,6 +354,17 @@ static inline int sysfs_chmod_file(struct kobject *kobj, return 0; } +static inline struct kernfs_node * +sysfs_break_active_protection(struct kobject *kobj, + const struct attribute *attr) +{ + return NULL; +} + +static inline void sysfs_unbreak_active_protection(struct kernfs_node *kn) +{ +} + static inline void sysfs_remove_file_ns(struct kobject *kobj, const struct attribute *attr, const void *ns) From 6d70dea5113047bf1174c5efe116a946095736e3 Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Thu, 2 Aug 2018 10:51:41 -0700 Subject: [PATCH 897/970] scsi: core: Avoid that SCSI device removal through sysfs triggers a deadlock commit 0ee223b2e1f67cb2de9c0e3247c510d846e74d63 upstream. A long time ago the unfortunate decision was taken to add a self-deletion attribute to the sysfs SCSI device directory. That decision was unfortunate because self-deletion is really tricky. We can't drop that attribute because widely used user space software depends on it, namely the rescan-scsi-bus.sh script. Hence this patch that avoids that writing into that attribute triggers a deadlock. See also commit 7973cbd9fbd9 ("[PATCH] add sysfs attributes to scan and delete scsi_devices"). This patch avoids that self-removal triggers the following deadlock: ====================================================== WARNING: possible circular locking dependency detected 4.18.0-rc2-dbg+ #5 Not tainted ------------------------------------------------------ modprobe/6539 is trying to acquire lock: 000000008323c4cd (kn->count#202){++++}, at: kernfs_remove_by_name_ns+0x45/0x90 but task is already holding lock: 00000000a6ec2c69 (&shost->scan_mutex){+.+.}, at: scsi_remove_host+0x21/0x150 [scsi_mod] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&shost->scan_mutex){+.+.}: __mutex_lock+0xfe/0xc70 mutex_lock_nested+0x1b/0x20 scsi_remove_device+0x26/0x40 [scsi_mod] sdev_store_delete+0x27/0x30 [scsi_mod] dev_attr_store+0x3e/0x50 sysfs_kf_write+0x87/0xa0 kernfs_fop_write+0x190/0x230 __vfs_write+0xd2/0x3b0 vfs_write+0x101/0x270 ksys_write+0xab/0x120 __x64_sys_write+0x43/0x50 do_syscall_64+0x77/0x230 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #0 (kn->count#202){++++}: lock_acquire+0xd2/0x260 __kernfs_remove+0x424/0x4a0 kernfs_remove_by_name_ns+0x45/0x90 remove_files.isra.1+0x3a/0x90 sysfs_remove_group+0x5c/0xc0 sysfs_remove_groups+0x39/0x60 device_remove_attrs+0x82/0xb0 device_del+0x251/0x580 __scsi_remove_device+0x19f/0x1d0 [scsi_mod] scsi_forget_host+0x37/0xb0 [scsi_mod] scsi_remove_host+0x9b/0x150 [scsi_mod] sdebug_driver_remove+0x4b/0x150 [scsi_debug] device_release_driver_internal+0x241/0x360 device_release_driver+0x12/0x20 bus_remove_device+0x1bc/0x290 device_del+0x259/0x580 device_unregister+0x1a/0x70 sdebug_remove_adapter+0x8b/0xf0 [scsi_debug] scsi_debug_exit+0x76/0xe8 [scsi_debug] __x64_sys_delete_module+0x1c1/0x280 do_syscall_64+0x77/0x230 entry_SYSCALL_64_after_hwframe+0x49/0xbe other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&shost->scan_mutex); lock(kn->count#202); lock(&shost->scan_mutex); lock(kn->count#202); *** DEADLOCK *** 2 locks held by modprobe/6539: #0: 00000000efaf9298 (&dev->mutex){....}, at: device_release_driver_internal+0x68/0x360 #1: 00000000a6ec2c69 (&shost->scan_mutex){+.+.}, at: scsi_remove_host+0x21/0x150 [scsi_mod] stack backtrace: CPU: 10 PID: 6539 Comm: modprobe Not tainted 4.18.0-rc2-dbg+ #5 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 Call Trace: dump_stack+0xa4/0xf5 print_circular_bug.isra.34+0x213/0x221 __lock_acquire+0x1a7e/0x1b50 lock_acquire+0xd2/0x260 __kernfs_remove+0x424/0x4a0 kernfs_remove_by_name_ns+0x45/0x90 remove_files.isra.1+0x3a/0x90 sysfs_remove_group+0x5c/0xc0 sysfs_remove_groups+0x39/0x60 device_remove_attrs+0x82/0xb0 device_del+0x251/0x580 __scsi_remove_device+0x19f/0x1d0 [scsi_mod] scsi_forget_host+0x37/0xb0 [scsi_mod] scsi_remove_host+0x9b/0x150 [scsi_mod] sdebug_driver_remove+0x4b/0x150 [scsi_debug] device_release_driver_internal+0x241/0x360 device_release_driver+0x12/0x20 bus_remove_device+0x1bc/0x290 device_del+0x259/0x580 device_unregister+0x1a/0x70 sdebug_remove_adapter+0x8b/0xf0 [scsi_debug] scsi_debug_exit+0x76/0xe8 [scsi_debug] __x64_sys_delete_module+0x1c1/0x280 do_syscall_64+0x77/0x230 entry_SYSCALL_64_after_hwframe+0x49/0xbe See also https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg54525.html. Fixes: ac0ece9174ac ("scsi: use device_remove_file_self() instead of device_schedule_callback()") Signed-off-by: Bart Van Assche Cc: Greg Kroah-Hartman Acked-by: Tejun Heo Cc: Johannes Thumshirn Cc: Signed-off-by: Greg Kroah-Hartman Signed-off-by: Martin K. Petersen --- drivers/scsi/scsi_sysfs.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index 3a6f557ec128..56b65b85b121 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -709,8 +709,24 @@ static ssize_t sdev_store_delete(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) { - if (device_remove_file_self(dev, attr)) - scsi_remove_device(to_scsi_device(dev)); + struct kernfs_node *kn; + + kn = sysfs_break_active_protection(&dev->kobj, &attr->attr); + WARN_ON_ONCE(!kn); + /* + * Concurrent writes into the "delete" sysfs attribute may trigger + * concurrent calls to device_remove_file() and scsi_remove_device(). + * device_remove_file() handles concurrent removal calls by + * serializing these and by ignoring the second and later removal + * attempts. Concurrent calls of scsi_remove_device() are + * serialized. The second and later calls of scsi_remove_device() are + * ignored because the first call of that function changes the device + * state into SDEV_DEL. + */ + device_remove_file(dev, attr); + scsi_remove_device(to_scsi_device(dev)); + if (kn) + sysfs_unbreak_active_protection(kn); return count; }; static DEVICE_ATTR(delete, S_IWUSR, NULL, sdev_store_delete); From 00ee0e07c68d550e61e6e44dbeb1d589cfdb6a48 Mon Sep 17 00:00:00 2001 From: Mike Christie Date: Thu, 26 Jul 2018 12:13:49 -0500 Subject: [PATCH 898/970] iscsi target: fix session creation failure handling commit 26abc916a898d34c5ad159315a2f683def3c5555 upstream. The problem is that iscsi_login_zero_tsih_s1 sets conn->sess early in iscsi_login_set_conn_values. If the function fails later like when we alloc the idr it does kfree(sess) and leaves the conn->sess pointer set. iscsi_login_zero_tsih_s1 then returns -Exyz and we then call iscsi_target_login_sess_out and access the freed memory. This patch has iscsi_login_zero_tsih_s1 either completely setup the session or completely tear it down, so later in iscsi_target_login_sess_out we can just check for it being set to the connection. Cc: stable@vger.kernel.org Fixes: 0957627a9960 ("iscsi-target: Fix sess allocation leak in...") Signed-off-by: Mike Christie Acked-by: Martin K. Petersen Signed-off-by: Matthew Wilcox Signed-off-by: Greg Kroah-Hartman --- drivers/target/iscsi/iscsi_target_login.c | 35 ++++++++++++++--------- 1 file changed, 21 insertions(+), 14 deletions(-) diff --git a/drivers/target/iscsi/iscsi_target_login.c b/drivers/target/iscsi/iscsi_target_login.c index 9ccd5da8f204..d2f82aaf6a85 100644 --- a/drivers/target/iscsi/iscsi_target_login.c +++ b/drivers/target/iscsi/iscsi_target_login.c @@ -333,8 +333,7 @@ static int iscsi_login_zero_tsih_s1( pr_err("idr_alloc() for sess_idr failed\n"); iscsit_tx_login_rsp(conn, ISCSI_STATUS_CLS_TARGET_ERR, ISCSI_LOGIN_STATUS_NO_RESOURCES); - kfree(sess); - return -ENOMEM; + goto free_sess; } sess->creation_time = get_jiffies_64(); @@ -350,20 +349,28 @@ static int iscsi_login_zero_tsih_s1( ISCSI_LOGIN_STATUS_NO_RESOURCES); pr_err("Unable to allocate memory for" " struct iscsi_sess_ops.\n"); - kfree(sess); - return -ENOMEM; + goto remove_idr; } sess->se_sess = transport_init_session(TARGET_PROT_NORMAL); if (IS_ERR(sess->se_sess)) { iscsit_tx_login_rsp(conn, ISCSI_STATUS_CLS_TARGET_ERR, ISCSI_LOGIN_STATUS_NO_RESOURCES); - kfree(sess->sess_ops); - kfree(sess); - return -ENOMEM; + goto free_ops; } return 0; + +free_ops: + kfree(sess->sess_ops); +remove_idr: + spin_lock_bh(&sess_idr_lock); + idr_remove(&sess_idr, sess->session_index); + spin_unlock_bh(&sess_idr_lock); +free_sess: + kfree(sess); + conn->sess = NULL; + return -ENOMEM; } static int iscsi_login_zero_tsih_s2( @@ -1152,13 +1159,13 @@ void iscsi_target_login_sess_out(struct iscsi_conn *conn, ISCSI_LOGIN_STATUS_INIT_ERR); if (!zero_tsih || !conn->sess) goto old_sess_out; - if (conn->sess->se_sess) - transport_free_session(conn->sess->se_sess); - if (conn->sess->session_index != 0) { - spin_lock_bh(&sess_idr_lock); - idr_remove(&sess_idr, conn->sess->session_index); - spin_unlock_bh(&sess_idr_lock); - } + + transport_free_session(conn->sess->se_sess); + + spin_lock_bh(&sess_idr_lock); + idr_remove(&sess_idr, conn->sess->session_index); + spin_unlock_bh(&sess_idr_lock); + kfree(conn->sess->sess_ops); kfree(conn->sess); conn->sess = NULL; From 072555e660f68cd7d7fc4d7d803ffb968c64669f Mon Sep 17 00:00:00 2001 From: Alberto Panizzo Date: Fri, 6 Jul 2018 15:18:51 +0200 Subject: [PATCH 899/970] clk: rockchip: fix clk_i2sout parent selection bits on rk3399 commit a64ad008980c65d38e6cf6858429c78e6b740c41 upstream. Register, shift and mask were wrong according to datasheet. Fixes: 115510053e5e ("clk: rockchip: add clock controller for the RK3399") Cc: stable@vger.kernel.org Signed-off-by: Alberto Panizzo Signed-off-by: Anthony Brandon Signed-off-by: Heiko Stuebner Signed-off-by: Greg Kroah-Hartman --- drivers/clk/rockchip/clk-rk3399.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/clk/rockchip/clk-rk3399.c b/drivers/clk/rockchip/clk-rk3399.c index 8387c7a40bda..05671c03efe2 100644 --- a/drivers/clk/rockchip/clk-rk3399.c +++ b/drivers/clk/rockchip/clk-rk3399.c @@ -629,7 +629,7 @@ static struct rockchip_clk_branch rk3399_clk_branches[] __initdata = { MUX(0, "clk_i2sout_src", mux_i2sch_p, CLK_SET_RATE_PARENT, RK3399_CLKSEL_CON(31), 0, 2, MFLAGS), COMPOSITE_NODIV(SCLK_I2S_8CH_OUT, "clk_i2sout", mux_i2sout_p, CLK_SET_RATE_PARENT, - RK3399_CLKSEL_CON(30), 8, 2, MFLAGS, + RK3399_CLKSEL_CON(31), 2, 1, MFLAGS, RK3399_CLKGATE_CON(8), 12, GFLAGS), /* uart */ From d8467a6b6ddd957147e27bb328ac753e97debcfd Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Thu, 23 Aug 2018 16:59:25 +0300 Subject: [PATCH 900/970] PM / clk: signedness bug in of_pm_clk_add_clks() commit 5e2e2f9f76e157063a656351728703cb02b068f1 upstream. "count" needs to be signed for the error handling to work. I made "i" signed as well so they match. Fixes: 02113ba93ea4 (PM / clk: Add support for obtaining clocks from device-tree) Cc: 4.6+ # 4.6+ Signed-off-by: Dan Carpenter Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman --- drivers/base/power/clock_ops.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/base/power/clock_ops.c b/drivers/base/power/clock_ops.c index 8e2e4757adcb..5a42ae4078c2 100644 --- a/drivers/base/power/clock_ops.c +++ b/drivers/base/power/clock_ops.c @@ -185,7 +185,7 @@ EXPORT_SYMBOL_GPL(of_pm_clk_add_clk); int of_pm_clk_add_clks(struct device *dev) { struct clk **clks; - unsigned int i, count; + int i, count; int ret; if (!dev || !dev->of_node) From a3702bbac9e25aef34473b0a8cd220b73024462e Mon Sep 17 00:00:00 2001 From: "H. Nikolaus Schaller" Date: Tue, 26 Jun 2018 15:28:29 +0200 Subject: [PATCH 901/970] power: generic-adc-battery: fix out-of-bounds write when copying channel properties commit 932d47448c3caa0fa99e84d7f5bc302aa286efd8 upstream. We did have sporadic problems in the pinctrl framework during boot where a pin group name unexpectedly became NULL leading to a NULL dereference in strcmp. Detailled analysis of the failing cases did reveal that there were two devm allocated objects close to each other. The second one was the affected group_desc in pinmux and the first one was the psy_desc->properties buffer of the gab driver. Review of the gab code showed that the address calculation for one memcpy() is wrong. It does properties + sizeof(type) * index but C is defined to do the index multiplication already for pointer + integer additions. Hence the factor was applied twice and the memcpy() does write outside of the properties buffer. Sometimes it happened to be the pinctrl and triggered the strcmp(NULL). Anyways, it is overkill to use a memcpy() here instead of a simple assignment, which is easier to read and has less risk for wrong address calculations. So we change code to a simple assignment. If we initialize the index to the first free location, we can even remove the local variable 'properties'. This bug seems to exist right from the beginning in 3.7-rc1 in commit e60fea794e6e ("power: battery: Generic battery driver using IIO") Signed-off-by: H. Nikolaus Schaller Cc: stable@vger.kernel.org Fixes: e60fea794e6e ("power: battery: Generic battery driver using IIO") Signed-off-by: Sebastian Reichel Signed-off-by: Greg Kroah-Hartman --- drivers/power/supply/generic-adc-battery.c | 14 ++++---------- 1 file changed, 4 insertions(+), 10 deletions(-) diff --git a/drivers/power/supply/generic-adc-battery.c b/drivers/power/supply/generic-adc-battery.c index edb36bf781b0..6a544ea8a5ec 100644 --- a/drivers/power/supply/generic-adc-battery.c +++ b/drivers/power/supply/generic-adc-battery.c @@ -243,10 +243,9 @@ static int gab_probe(struct platform_device *pdev) struct power_supply_desc *psy_desc; struct power_supply_config psy_cfg = {}; struct gab_platform_data *pdata = pdev->dev.platform_data; - enum power_supply_property *properties; int ret = 0; int chan; - int index = 0; + int index = ARRAY_SIZE(gab_props); adc_bat = devm_kzalloc(&pdev->dev, sizeof(*adc_bat), GFP_KERNEL); if (!adc_bat) { @@ -280,8 +279,6 @@ static int gab_probe(struct platform_device *pdev) } memcpy(psy_desc->properties, gab_props, sizeof(gab_props)); - properties = (enum power_supply_property *) - ((char *)psy_desc->properties + sizeof(gab_props)); /* * getting channel from iio and copying the battery properties @@ -295,15 +292,12 @@ static int gab_probe(struct platform_device *pdev) adc_bat->channel[chan] = NULL; } else { /* copying properties for supported channels only */ - memcpy(properties + sizeof(*(psy_desc->properties)) * index, - &gab_dyn_props[chan], - sizeof(gab_dyn_props[chan])); - index++; + psy_desc->properties[index++] = gab_dyn_props[chan]; } } /* none of the channels are supported so let's bail out */ - if (index == 0) { + if (index == ARRAY_SIZE(gab_props)) { ret = -ENODEV; goto second_mem_fail; } @@ -314,7 +308,7 @@ static int gab_probe(struct platform_device *pdev) * as come channels may be not be supported by the device.So * we need to take care of that. */ - psy_desc->num_properties = ARRAY_SIZE(gab_props) + index; + psy_desc->num_properties = index; adc_bat->psy = power_supply_register(&pdev->dev, psy_desc, &psy_cfg); if (IS_ERR(adc_bat->psy)) { From c1ebdbe4ccf138f87c78f58931d74526a5d76bf8 Mon Sep 17 00:00:00 2001 From: "H. Nikolaus Schaller" Date: Tue, 26 Jun 2018 15:28:30 +0200 Subject: [PATCH 902/970] power: generic-adc-battery: check for duplicate properties copied from iio channels commit a427503edaaed9b75ed9746a654cece7e93e60a8 upstream. If an iio channel defines a basic property, there are duplicate entries in /sys/class/power/*/uevent. So add a check to avoid duplicates. Since all channels may be duplicates, we have to modify the related error check. Signed-off-by: H. Nikolaus Schaller Cc: stable@vger.kernel.org Fixes: e60fea794e6e ("power: battery: Generic battery driver using IIO") Signed-off-by: Sebastian Reichel Signed-off-by: Greg Kroah-Hartman --- drivers/power/supply/generic-adc-battery.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/drivers/power/supply/generic-adc-battery.c b/drivers/power/supply/generic-adc-battery.c index 6a544ea8a5ec..f627b39f64bf 100644 --- a/drivers/power/supply/generic-adc-battery.c +++ b/drivers/power/supply/generic-adc-battery.c @@ -246,6 +246,7 @@ static int gab_probe(struct platform_device *pdev) int ret = 0; int chan; int index = ARRAY_SIZE(gab_props); + bool any = false; adc_bat = devm_kzalloc(&pdev->dev, sizeof(*adc_bat), GFP_KERNEL); if (!adc_bat) { @@ -292,12 +293,22 @@ static int gab_probe(struct platform_device *pdev) adc_bat->channel[chan] = NULL; } else { /* copying properties for supported channels only */ - psy_desc->properties[index++] = gab_dyn_props[chan]; + int index2; + + for (index2 = 0; index2 < index; index2++) { + if (psy_desc->properties[index2] == + gab_dyn_props[chan]) + break; /* already known */ + } + if (index2 == index) /* really new */ + psy_desc->properties[index++] = + gab_dyn_props[chan]; + any = true; } } /* none of the channels are supported so let's bail out */ - if (index == ARRAY_SIZE(gab_props)) { + if (!any) { ret = -ENODEV; goto second_mem_fail; } From b8c0e15469bab732065e64f7dffadab0b7103990 Mon Sep 17 00:00:00 2001 From: Scott Bauer Date: Thu, 26 Apr 2018 11:51:08 -0600 Subject: [PATCH 903/970] cdrom: Fix info leak/OOB read in cdrom_ioctl_drive_status commit 8f3fafc9c2f0ece10832c25f7ffcb07c97a32ad4 upstream. Like d88b6d04: "cdrom: information leak in cdrom_ioctl_media_changed()" There is another cast from unsigned long to int which causes a bounds check to fail with specially crafted input. The value is then used as an index in the slot array in cdrom_slot_status(). Signed-off-by: Scott Bauer Signed-off-by: Scott Bauer Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman --- drivers/cdrom/cdrom.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c index 07b77fb102a1..987e8f503522 100644 --- a/drivers/cdrom/cdrom.c +++ b/drivers/cdrom/cdrom.c @@ -2536,7 +2536,7 @@ static int cdrom_ioctl_drive_status(struct cdrom_device_info *cdi, if (!CDROM_CAN(CDC_SELECT_DISC) || (arg == CDSL_CURRENT || arg == CDSL_NONE)) return cdi->ops->drive_status(cdi, CDSL_CURRENT); - if (((int)arg >= cdi->capacity)) + if (arg >= cdi->capacity) return -EINVAL; return cdrom_slot_status(cdi, arg); } From 3af20bddda6f24b5a1c0072f2f6e3bb3ed04bf78 Mon Sep 17 00:00:00 2001 From: Daniel Rosenberg Date: Thu, 30 Aug 2018 16:09:46 -0700 Subject: [PATCH 904/970] staging: android: ion: check for kref overflow This patch is against 4.9. It does not apply to master due to a large rework of ion in 4.12 which removed the affected functions altogther. 4c23cbff073f3b9b ("staging: android: ion: Remove import interface") Userspace can cause the kref to handles to increment arbitrarily high. Ensure it does not overflow. Signed-off-by: Daniel Rosenberg Signed-off-by: Greg Kroah-Hartman --- drivers/staging/android/ion/ion.c | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c index 403df8bf4b48..806e9b30b9dc 100644 --- a/drivers/staging/android/ion/ion.c +++ b/drivers/staging/android/ion/ion.c @@ -15,6 +15,7 @@ * */ +#include #include #include #include @@ -305,6 +306,16 @@ static void ion_handle_get(struct ion_handle *handle) kref_get(&handle->ref); } +/* Must hold the client lock */ +static struct ion_handle *ion_handle_get_check_overflow( + struct ion_handle *handle) +{ + if (atomic_read(&handle->ref.refcount) + 1 == 0) + return ERR_PTR(-EOVERFLOW); + ion_handle_get(handle); + return handle; +} + int ion_handle_put_nolock(struct ion_handle *handle) { return kref_put(&handle->ref, ion_handle_destroy); @@ -347,9 +358,9 @@ struct ion_handle *ion_handle_get_by_id_nolock(struct ion_client *client, handle = idr_find(&client->idr, id); if (handle) - ion_handle_get(handle); + return ion_handle_get_check_overflow(handle); - return handle ? handle : ERR_PTR(-EINVAL); + return ERR_PTR(-EINVAL); } static bool ion_handle_validate(struct ion_client *client, @@ -1110,7 +1121,7 @@ struct ion_handle *ion_import_dma_buf(struct ion_client *client, /* if a handle exists for this buffer just take a reference to it */ handle = ion_handle_lookup(client, buffer); if (!IS_ERR(handle)) { - ion_handle_get(handle); + handle = ion_handle_get_check_overflow(handle); mutex_unlock(&client->lock); goto end; } From 9eabacaf4ce59a07baacac5f31586de4ae7e9194 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Wed, 5 Sep 2018 09:20:11 +0200 Subject: [PATCH 905/970] Linux 4.9.125 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 53d57acfc17e..aef09ca7a924 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 4 PATCHLEVEL = 9 -SUBLEVEL = 124 +SUBLEVEL = 125 EXTRAVERSION = NAME = Roaring Lionus From 6aa4a723bd99ab424325426c7461ad95f8bdb7ef Mon Sep 17 00:00:00 2001 From: Alexander Aring Date: Sat, 14 Jul 2018 12:52:10 -0400 Subject: [PATCH 906/970] net: 6lowpan: fix reserved space for single frames commit ac74f87c789af40936a80131c4759f3e72579c3a upstream. This patch fixes patch add handling to take care tail and headroom for single 6lowpan frames. We need to be sure we have a skb with the right head and tailroom for single frames. This patch do it by using skb_copy_expand() if head and tailroom is not enough allocated by upper layer. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195059 Reported-by: David Palma Reported-by: Rabi Narayan Sahoo Cc: stable@vger.kernel.org Signed-off-by: Alexander Aring Signed-off-by: Stefan Schmidt Signed-off-by: Greg Kroah-Hartman --- net/ieee802154/6lowpan/tx.c | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/net/ieee802154/6lowpan/tx.c b/net/ieee802154/6lowpan/tx.c index dbb476d7d38f..50ed47559bb7 100644 --- a/net/ieee802154/6lowpan/tx.c +++ b/net/ieee802154/6lowpan/tx.c @@ -266,9 +266,24 @@ netdev_tx_t lowpan_xmit(struct sk_buff *skb, struct net_device *ldev) /* We must take a copy of the skb before we modify/replace the ipv6 * header as the header could be used elsewhere */ - skb = skb_unshare(skb, GFP_ATOMIC); - if (!skb) - return NET_XMIT_DROP; + if (unlikely(skb_headroom(skb) < ldev->needed_headroom || + skb_tailroom(skb) < ldev->needed_tailroom)) { + struct sk_buff *nskb; + + nskb = skb_copy_expand(skb, ldev->needed_headroom, + ldev->needed_tailroom, GFP_ATOMIC); + if (likely(nskb)) { + consume_skb(skb); + skb = nskb; + } else { + kfree_skb(skb); + return NET_XMIT_DROP; + } + } else { + skb = skb_unshare(skb, GFP_ATOMIC); + if (!skb) + return NET_XMIT_DROP; + } ret = lowpan_header(skb, ldev, &dgram_size, &dgram_offset); if (ret < 0) { From 41b2e6eff1da435174fe9c01b23b50f60b718ea1 Mon Sep 17 00:00:00 2001 From: Alexander Aring Date: Mon, 2 Jul 2018 16:32:03 -0400 Subject: [PATCH 907/970] net: mac802154: tx: expand tailroom if necessary commit f9c52831133050c6b82aa8b6831c92da2bbf2a0b upstream. This patch is necessary if case of AF_PACKET or other socket interface which I am aware of it and didn't allocated the necessary room. Reported-by: David Palma Reported-by: Rabi Narayan Sahoo Cc: stable@vger.kernel.org Signed-off-by: Alexander Aring Signed-off-by: Stefan Schmidt Signed-off-by: Greg Kroah-Hartman --- net/mac802154/tx.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c index 7e253455f9dd..bcd1a5e6ebf4 100644 --- a/net/mac802154/tx.c +++ b/net/mac802154/tx.c @@ -63,8 +63,21 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb) int ret; if (!(local->hw.flags & IEEE802154_HW_TX_OMIT_CKSUM)) { - u16 crc = crc_ccitt(0, skb->data, skb->len); + struct sk_buff *nskb; + u16 crc; + if (unlikely(skb_tailroom(skb) < IEEE802154_FCS_LEN)) { + nskb = skb_copy_expand(skb, 0, IEEE802154_FCS_LEN, + GFP_ATOMIC); + if (likely(nskb)) { + consume_skb(skb); + skb = nskb; + } else { + goto err_tx; + } + } + + crc = crc_ccitt(0, skb->data, skb->len); put_unaligned_le16(crc, skb_put(skb, 2)); } From 5c45154990d557575cfa74b0b14b8a16f36af31f Mon Sep 17 00:00:00 2001 From: Chirantan Ekbote Date: Mon, 16 Jul 2018 17:35:29 -0700 Subject: [PATCH 908/970] 9p/net: Fix zero-copy path in the 9p virtio transport commit d28c756caee6e414d9ba367d0b92da24145af2a8 upstream. The zero-copy optimization when reading or writing large chunks of data is quite useful. However, the 9p messages created through the zero-copy write path have an incorrect message size: it should be the size of the header + size of the data being written but instead it's just the size of the header. This only works if the server ignores the size field of the message and otherwise breaks the framing of the protocol. Fix this by re-writing the message size field with the correct value. Tested by running `dd if=/dev/zero of=out bs=4k count=1` inside a virtio-9p mount. Link: http://lkml.kernel.org/r/20180717003529.114368-1-chirantan@chromium.org Signed-off-by: Chirantan Ekbote Reviewed-by: Greg Kurz Tested-by: Greg Kurz Cc: Dylan Reid Cc: Guenter Roeck Cc: stable@vger.kernel.org Signed-off-by: Dominique Martinet Signed-off-by: Greg Kroah-Hartman --- net/9p/trans_virtio.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c index 3aa5a93ad107..202c8ef1c0fa 100644 --- a/net/9p/trans_virtio.c +++ b/net/9p/trans_virtio.c @@ -406,6 +406,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req, p9_debug(P9_DEBUG_TRANS, "virtio request\n"); if (uodata) { + __le32 sz; int n = p9_get_mapped_pages(chan, &out_pages, uodata, outlen, &offs, &need_drop); if (n < 0) @@ -416,6 +417,12 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req, memcpy(&req->tc->sdata[req->tc->size - 4], &v, 4); outlen = n; } + /* The size field of the message must include the length of the + * header and the length of the data. We didn't actually know + * the length of the data until this point so add it in now. + */ + sz = cpu_to_le32(req->tc->size + outlen); + memcpy(&req->tc->sdata[0], &sz, sizeof(sz)); } else if (uidata) { int n = p9_get_mapped_pages(chan, &in_pages, uidata, inlen, &offs, &need_drop); From ae8f22ed6f92c3e1a7d0d4edb2d6dfbea51406b4 Mon Sep 17 00:00:00 2001 From: Bartosz Golaszewski Date: Fri, 10 Aug 2018 11:13:52 +0200 Subject: [PATCH 909/970] spi: davinci: fix a NULL pointer dereference commit 563a53f3906a6b43692498e5b3ae891fac93a4af upstream. On non-OF systems spi->controlled_data may be NULL. This causes a NULL pointer derefence on dm365-evm. Signed-off-by: Bartosz Golaszewski Signed-off-by: Mark Brown Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- drivers/spi/spi-davinci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/spi/spi-davinci.c b/drivers/spi/spi-davinci.c index 0d8f43a17edb..1905d20c229f 100644 --- a/drivers/spi/spi-davinci.c +++ b/drivers/spi/spi-davinci.c @@ -215,7 +215,7 @@ static void davinci_spi_chipselect(struct spi_device *spi, int value) pdata = &dspi->pdata; /* program delay transfers if tx_delay is non zero */ - if (spicfg->wdelay) + if (spicfg && spicfg->wdelay) spidat1 |= SPIDAT1_WDEL; /* From 9f16a87ffe0029afe7c22293965ada8b409ab502 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Fri, 29 Jun 2018 13:33:09 +0200 Subject: [PATCH 910/970] spi: spi-fsl-dspi: Fix imprecise abort on VF500 during probe commit d8ffee2f551a627ffb7b216e2da322cb9a037f77 upstream. Registers of DSPI should not be accessed before enabling its clock. On Toradex Colibri VF50 on Iris carrier board this could be seen during bootup as imprecise abort: Unhandled fault: imprecise external abort (0x1c06) at 0x00000000 Internal error: : 1c06 [#1] ARM Modules linked in: CPU: 0 PID: 1 Comm: swapper Not tainted 4.14.39-dirty #97 Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree) Backtrace: [<804166a8>] (regmap_write) from [<80466b5c>] (dspi_probe+0x1f0/0x8dc) [<8046696c>] (dspi_probe) from [<8040107c>] (platform_drv_probe+0x54/0xb8) [<80401028>] (platform_drv_probe) from [<803ff53c>] (driver_probe_device+0x280/0x2f8) [<803ff2bc>] (driver_probe_device) from [<803ff674>] (__driver_attach+0xc0/0xc4) [<803ff5b4>] (__driver_attach) from [<803fd818>] (bus_for_each_dev+0x70/0xa4) [<803fd7a8>] (bus_for_each_dev) from [<803fee74>] (driver_attach+0x24/0x28) [<803fee50>] (driver_attach) from [<803fe980>] (bus_add_driver+0x1a0/0x218) [<803fe7e0>] (bus_add_driver) from [<803fffe8>] (driver_register+0x80/0x100) [<803fff68>] (driver_register) from [<80400fdc>] (__platform_driver_register+0x48/0x50) [<80400f94>] (__platform_driver_register) from [<8091cf7c>] (fsl_dspi_driver_init+0x1c/0x20) [<8091cf60>] (fsl_dspi_driver_init) from [<8010195c>] (do_one_initcall+0x4c/0x174) [<80101910>] (do_one_initcall) from [<80900e8c>] (kernel_init_freeable+0x144/0x1d8) [<80900d48>] (kernel_init_freeable) from [<805ff6a8>] (kernel_init+0x10/0x114) [<805ff698>] (kernel_init) from [<80107be8>] (ret_from_fork+0x14/0x2c) Cc: Fixes: 5ee67b587a2b ("spi: dspi: clear SPI_SR before enable interrupt") Signed-off-by: Krzysztof Kozlowski Signed-off-by: Mark Brown Signed-off-by: Greg Kroah-Hartman --- drivers/spi/spi-fsl-dspi.c | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c index a67b0ff6a362..db3b6e9151a8 100644 --- a/drivers/spi/spi-fsl-dspi.c +++ b/drivers/spi/spi-fsl-dspi.c @@ -715,21 +715,6 @@ static int dspi_probe(struct platform_device *pdev) return PTR_ERR(dspi->regmap); } - dspi_init(dspi); - dspi->irq = platform_get_irq(pdev, 0); - if (dspi->irq < 0) { - dev_err(&pdev->dev, "can't get platform irq\n"); - ret = dspi->irq; - goto out_master_put; - } - - ret = devm_request_irq(&pdev->dev, dspi->irq, dspi_interrupt, 0, - pdev->name, dspi); - if (ret < 0) { - dev_err(&pdev->dev, "Unable to attach DSPI interrupt\n"); - goto out_master_put; - } - dspi->clk = devm_clk_get(&pdev->dev, "dspi"); if (IS_ERR(dspi->clk)) { ret = PTR_ERR(dspi->clk); @@ -740,6 +725,21 @@ static int dspi_probe(struct platform_device *pdev) if (ret) goto out_master_put; + dspi_init(dspi); + dspi->irq = platform_get_irq(pdev, 0); + if (dspi->irq < 0) { + dev_err(&pdev->dev, "can't get platform irq\n"); + ret = dspi->irq; + goto out_clk_put; + } + + ret = devm_request_irq(&pdev->dev, dspi->irq, dspi_interrupt, 0, + pdev->name, dspi); + if (ret < 0) { + dev_err(&pdev->dev, "Unable to attach DSPI interrupt\n"); + goto out_clk_put; + } + master->max_speed_hz = clk_get_rate(dspi->clk) / dspi->devtype_data->max_clock_factor; From f916daa615e1c0d67fb3b7a65572fbc56c6aaea6 Mon Sep 17 00:00:00 2001 From: Matthew Auld Date: Wed, 2 May 2018 20:50:21 +0100 Subject: [PATCH 911/970] drm/i915/userptr: reject zero user_size commit c11c7bfd213495784b22ef82a69b6489f8d0092f upstream. Operating on a zero sized GEM userptr object will lead to explosions. Fixes: 5cc9ed4b9a7a ("drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl") Testcase: igt/gem_userptr_blits/input-checking Signed-off-by: Matthew Auld Cc: Chris Wilson Reviewed-by: Chris Wilson Signed-off-by: Chris Wilson Link: https://patchwork.freedesktop.org/patch/msgid/20180502195021.30900-1-matthew.auld@intel.com Cc: Loic Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/i915/i915_gem_userptr.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c index c6f780f5abc9..555fd47c1831 100644 --- a/drivers/gpu/drm/i915/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c @@ -778,6 +778,9 @@ i915_gem_userptr_ioctl(struct drm_device *dev, void *data, struct drm_file *file I915_USERPTR_UNSYNCHRONIZED)) return -EINVAL; + if (!args->user_size) + return -EINVAL; + if (offset_in_page(args->user_ptr | args->user_size)) return -EINVAL; From 64a2af0e69a599475ff71b740409191e1954e9b8 Mon Sep 17 00:00:00 2001 From: Daniel Mack Date: Wed, 27 Jun 2018 20:58:45 +0200 Subject: [PATCH 912/970] libertas: fix suspend and resume for SDIO connected cards commit 7444a8092906ed44c09459780c56ba57043e39b1 upstream. Prior to commit 573185cc7e64 ("mmc: core: Invoke sdio func driver's PM callbacks from the sdio bus"), the MMC core used to call into the power management functions of SDIO clients itself and removed the card if the return code was non-zero. IOW, the mmc handled errors gracefully and didn't upchain them to the pm core. Since this change, the mmc core relies on generic power management functions which treat all errors as a reason to cancel the suspend immediately. This causes suspend attempts to fail when the libertas driver is loaded. To fix this, power down the card explicitly in if_sdio_suspend() when we know we're about to lose power and return success. Also set a flag in these cases, and power up the card again in if_sdio_resume(). Fixes: 573185cc7e64 ("mmc: core: Invoke sdio func driver's PM callbacks from the sdio bus") Cc: Signed-off-by: Daniel Mack Reviewed-by: Chris Ball Reviewed-by: Ulf Hansson Signed-off-by: Kalle Valo Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/marvell/libertas/dev.h | 1 + .../net/wireless/marvell/libertas/if_sdio.c | 30 +++++++++++++++---- 2 files changed, 25 insertions(+), 6 deletions(-) diff --git a/drivers/net/wireless/marvell/libertas/dev.h b/drivers/net/wireless/marvell/libertas/dev.h index edf710bc5e77..3de1457ad3a7 100644 --- a/drivers/net/wireless/marvell/libertas/dev.h +++ b/drivers/net/wireless/marvell/libertas/dev.h @@ -103,6 +103,7 @@ struct lbs_private { u8 fw_ready; u8 surpriseremoved; u8 setup_fw_on_resume; + u8 power_up_on_resume; int (*hw_host_to_card) (struct lbs_private *priv, u8 type, u8 *payload, u16 nb); void (*reset_card) (struct lbs_private *priv); int (*power_save) (struct lbs_private *priv); diff --git a/drivers/net/wireless/marvell/libertas/if_sdio.c b/drivers/net/wireless/marvell/libertas/if_sdio.c index 47f4a14c84fe..a0ae8d8763bb 100644 --- a/drivers/net/wireless/marvell/libertas/if_sdio.c +++ b/drivers/net/wireless/marvell/libertas/if_sdio.c @@ -1341,15 +1341,23 @@ static void if_sdio_remove(struct sdio_func *func) static int if_sdio_suspend(struct device *dev) { struct sdio_func *func = dev_to_sdio_func(dev); - int ret; struct if_sdio_card *card = sdio_get_drvdata(func); + struct lbs_private *priv = card->priv; + int ret; mmc_pm_flag_t flags = sdio_get_host_pm_caps(func); + priv->power_up_on_resume = false; /* If we're powered off anyway, just let the mmc layer remove the * card. */ - if (!lbs_iface_active(card->priv)) - return -ENOSYS; + if (!lbs_iface_active(priv)) { + if (priv->fw_ready) { + priv->power_up_on_resume = true; + if_sdio_power_off(card); + } + + return 0; + } dev_info(dev, "%s: suspend: PM flags = 0x%x\n", sdio_func_id(func), flags); @@ -1357,9 +1365,14 @@ static int if_sdio_suspend(struct device *dev) /* If we aren't being asked to wake on anything, we should bail out * and let the SD stack power down the card. */ - if (card->priv->wol_criteria == EHS_REMOVE_WAKEUP) { + if (priv->wol_criteria == EHS_REMOVE_WAKEUP) { dev_info(dev, "Suspend without wake params -- powering down card\n"); - return -ENOSYS; + if (priv->fw_ready) { + priv->power_up_on_resume = true; + if_sdio_power_off(card); + } + + return 0; } if (!(flags & MMC_PM_KEEP_POWER)) { @@ -1372,7 +1385,7 @@ static int if_sdio_suspend(struct device *dev) if (ret) return ret; - ret = lbs_suspend(card->priv); + ret = lbs_suspend(priv); if (ret) return ret; @@ -1387,6 +1400,11 @@ static int if_sdio_resume(struct device *dev) dev_info(dev, "%s: resume: we're back\n", sdio_func_id(func)); + if (card->priv->power_up_on_resume) { + if_sdio_power_on(card); + wait_event(card->pwron_waitq, card->priv->fw_ready); + } + ret = lbs_resume(card->priv); return ret; From 0fdb739af29eccf31d8dae1b7f294c38a5960eb2 Mon Sep 17 00:00:00 2001 From: "Gustavo A. R. Silva" Date: Thu, 26 Jul 2018 12:11:39 -0500 Subject: [PATCH 913/970] mailbox: xgene-slimpro: Fix potential NULL pointer dereference commit 3512a18cbd8d09e22a790540cb9624c3c49827ba upstream. There is a potential execution path in which function platform_get_resource() returns NULL. If this happens, we will end up having a NULL pointer dereference. Fix this by replacing devm_ioremap with devm_ioremap_resource, which has the NULL check and the memory region request. This code was detected with the help of Coccinelle. Cc: stable@vger.kernel.org Fixes: f700e84f417b ("mailbox: Add support for APM X-Gene platform mailbox driver") Signed-off-by: Gustavo A. R. Silva Signed-off-by: Jassi Brar Signed-off-by: Greg Kroah-Hartman --- drivers/mailbox/mailbox-xgene-slimpro.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/mailbox/mailbox-xgene-slimpro.c b/drivers/mailbox/mailbox-xgene-slimpro.c index dd2afbca51c9..26d2f89db4d9 100644 --- a/drivers/mailbox/mailbox-xgene-slimpro.c +++ b/drivers/mailbox/mailbox-xgene-slimpro.c @@ -195,9 +195,9 @@ static int slimpro_mbox_probe(struct platform_device *pdev) platform_set_drvdata(pdev, ctx); regs = platform_get_resource(pdev, IORESOURCE_MEM, 0); - mb_base = devm_ioremap(&pdev->dev, regs->start, resource_size(regs)); - if (!mb_base) - return -ENOMEM; + mb_base = devm_ioremap_resource(&pdev->dev, regs); + if (IS_ERR(mb_base)) + return PTR_ERR(mb_base); /* Setup mailbox links */ for (i = 0; i < MBOX_CNT; i++) { From 1ae3174fb80704dfd8bbd7e6d73cf4fef033c2f4 Mon Sep 17 00:00:00 2001 From: Hari Bathini Date: Tue, 7 Aug 2018 02:12:45 +0530 Subject: [PATCH 914/970] powerpc/fadump: handle crash memory ranges array index overflow commit 1bd6a1c4b80a28d975287630644e6b47d0f977a5 upstream. Crash memory ranges is an array of memory ranges of the crashing kernel to be exported as a dump via /proc/vmcore file. The size of the array is set based on INIT_MEMBLOCK_REGIONS, which works alright in most cases where memblock memory regions count is less than INIT_MEMBLOCK_REGIONS value. But this count can grow beyond INIT_MEMBLOCK_REGIONS value since commit 142b45a72e22 ("memblock: Add array resizing support"). On large memory systems with a few DLPAR operations, the memblock memory regions count could be larger than INIT_MEMBLOCK_REGIONS value. On such systems, registering fadump results in crash or other system failures like below: task: c00007f39a290010 ti: c00000000b738000 task.ti: c00000000b738000 NIP: c000000000047df4 LR: c0000000000f9e58 CTR: c00000000010f180 REGS: c00000000b73b570 TRAP: 0300 Tainted: G L X (4.4.140+) MSR: 8000000000009033 CR: 22004484 XER: 20000000 CFAR: c000000000008500 DAR: 000007a450000000 DSISR: 40000000 SOFTE: 0 ... NIP [c000000000047df4] smp_send_reschedule+0x24/0x80 LR [c0000000000f9e58] resched_curr+0x138/0x160 Call Trace: resched_curr+0x138/0x160 (unreliable) check_preempt_curr+0xc8/0xf0 ttwu_do_wakeup+0x38/0x150 try_to_wake_up+0x224/0x4d0 __wake_up_common+0x94/0x100 ep_poll_callback+0xac/0x1c0 __wake_up_common+0x94/0x100 __wake_up_sync_key+0x70/0xa0 sock_def_readable+0x58/0xa0 unix_stream_sendmsg+0x2dc/0x4c0 sock_sendmsg+0x68/0xa0 ___sys_sendmsg+0x2cc/0x2e0 __sys_sendmsg+0x5c/0xc0 SyS_socketcall+0x36c/0x3f0 system_call+0x3c/0x100 as array index overflow is not checked for while setting up crash memory ranges causing memory corruption. To resolve this issue, dynamically allocate memory for crash memory ranges and resize it incrementally, in units of pagesize, on hitting array size limit. Fixes: 2df173d9e85d ("fadump: Initialize elfcore header and add PT_LOAD program headers.") Cc: stable@vger.kernel.org # v3.4+ Signed-off-by: Hari Bathini Reviewed-by: Mahesh Salgaonkar [mpe: Just use PAGE_SIZE directly, fixup variable placement] Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/include/asm/fadump.h | 3 - arch/powerpc/kernel/fadump.c | 92 ++++++++++++++++++++++++++----- 2 files changed, 78 insertions(+), 17 deletions(-) diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h index 0031806475f0..f93238ad06bd 100644 --- a/arch/powerpc/include/asm/fadump.h +++ b/arch/powerpc/include/asm/fadump.h @@ -190,9 +190,6 @@ struct fadump_crash_info_header { struct cpumask online_mask; }; -/* Crash memory ranges */ -#define INIT_CRASHMEM_RANGES (INIT_MEMBLOCK_REGIONS + 2) - struct fad_crash_memory_ranges { unsigned long long base; unsigned long long size; diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index 93a6eeba3ace..e3acf5c3480e 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -35,6 +35,7 @@ #include #include #include +#include #include #include @@ -48,8 +49,10 @@ static struct fadump_mem_struct fdm; static const struct fadump_mem_struct *fdm_active; static DEFINE_MUTEX(fadump_mutex); -struct fad_crash_memory_ranges crash_memory_ranges[INIT_CRASHMEM_RANGES]; +struct fad_crash_memory_ranges *crash_memory_ranges; +int crash_memory_ranges_size; int crash_mem_ranges; +int max_crash_mem_ranges; /* Scan the Firmware Assisted dump configuration details. */ int __init early_init_dt_scan_fw_dump(unsigned long node, @@ -731,38 +734,88 @@ static int __init process_fadump(const struct fadump_mem_struct *fdm_active) return 0; } -static inline void fadump_add_crash_memory(unsigned long long base, - unsigned long long end) +static void free_crash_memory_ranges(void) +{ + kfree(crash_memory_ranges); + crash_memory_ranges = NULL; + crash_memory_ranges_size = 0; + max_crash_mem_ranges = 0; +} + +/* + * Allocate or reallocate crash memory ranges array in incremental units + * of PAGE_SIZE. + */ +static int allocate_crash_memory_ranges(void) +{ + struct fad_crash_memory_ranges *new_array; + u64 new_size; + + new_size = crash_memory_ranges_size + PAGE_SIZE; + pr_debug("Allocating %llu bytes of memory for crash memory ranges\n", + new_size); + + new_array = krealloc(crash_memory_ranges, new_size, GFP_KERNEL); + if (new_array == NULL) { + pr_err("Insufficient memory for setting up crash memory ranges\n"); + free_crash_memory_ranges(); + return -ENOMEM; + } + + crash_memory_ranges = new_array; + crash_memory_ranges_size = new_size; + max_crash_mem_ranges = (new_size / + sizeof(struct fad_crash_memory_ranges)); + return 0; +} + +static inline int fadump_add_crash_memory(unsigned long long base, + unsigned long long end) { if (base == end) - return; + return 0; + + if (crash_mem_ranges == max_crash_mem_ranges) { + int ret; + + ret = allocate_crash_memory_ranges(); + if (ret) + return ret; + } pr_debug("crash_memory_range[%d] [%#016llx-%#016llx], %#llx bytes\n", crash_mem_ranges, base, end - 1, (end - base)); crash_memory_ranges[crash_mem_ranges].base = base; crash_memory_ranges[crash_mem_ranges].size = end - base; crash_mem_ranges++; + return 0; } -static void fadump_exclude_reserved_area(unsigned long long start, +static int fadump_exclude_reserved_area(unsigned long long start, unsigned long long end) { unsigned long long ra_start, ra_end; + int ret = 0; ra_start = fw_dump.reserve_dump_area_start; ra_end = ra_start + fw_dump.reserve_dump_area_size; if ((ra_start < end) && (ra_end > start)) { if ((start < ra_start) && (end > ra_end)) { - fadump_add_crash_memory(start, ra_start); - fadump_add_crash_memory(ra_end, end); + ret = fadump_add_crash_memory(start, ra_start); + if (ret) + return ret; + + ret = fadump_add_crash_memory(ra_end, end); } else if (start < ra_start) { - fadump_add_crash_memory(start, ra_start); + ret = fadump_add_crash_memory(start, ra_start); } else if (ra_end < end) { - fadump_add_crash_memory(ra_end, end); + ret = fadump_add_crash_memory(ra_end, end); } } else - fadump_add_crash_memory(start, end); + ret = fadump_add_crash_memory(start, end); + + return ret; } static int fadump_init_elfcore_header(char *bufp) @@ -802,10 +855,11 @@ static int fadump_init_elfcore_header(char *bufp) * Traverse through memblock structure and setup crash memory ranges. These * ranges will be used create PT_LOAD program headers in elfcore header. */ -static void fadump_setup_crash_memory_ranges(void) +static int fadump_setup_crash_memory_ranges(void) { struct memblock_region *reg; unsigned long long start, end; + int ret; pr_debug("Setup crash memory ranges.\n"); crash_mem_ranges = 0; @@ -816,7 +870,9 @@ static void fadump_setup_crash_memory_ranges(void) * specified during fadump registration. We need to create a separate * program header for this chunk with the correct offset. */ - fadump_add_crash_memory(RMA_START, fw_dump.boot_memory_size); + ret = fadump_add_crash_memory(RMA_START, fw_dump.boot_memory_size); + if (ret) + return ret; for_each_memblock(memory, reg) { start = (unsigned long long)reg->base; @@ -825,8 +881,12 @@ static void fadump_setup_crash_memory_ranges(void) start = fw_dump.boot_memory_size; /* add this range excluding the reserved dump area. */ - fadump_exclude_reserved_area(start, end); + ret = fadump_exclude_reserved_area(start, end); + if (ret) + return ret; } + + return 0; } /* @@ -950,6 +1010,7 @@ static void register_fadump(void) { unsigned long addr; void *vaddr; + int ret; /* * If no memory is reserved then we can not register for firmware- @@ -958,7 +1019,9 @@ static void register_fadump(void) if (!fw_dump.reserve_dump_area_size) return; - fadump_setup_crash_memory_ranges(); + ret = fadump_setup_crash_memory_ranges(); + if (ret) + return ret; addr = be64_to_cpu(fdm.rmr_region.destination_address) + be64_to_cpu(fdm.rmr_region.source_len); /* Initialize fadump crash info header. */ @@ -1036,6 +1099,7 @@ void fadump_cleanup(void) } else if (fw_dump.dump_registered) { /* Un-register Firmware-assisted dump if it was registered. */ fadump_unregister_dump(&fdm); + free_crash_memory_ranges(); } } From 89bdde28076710535713d7100670e53613555bb2 Mon Sep 17 00:00:00 2001 From: Mahesh Salgaonkar Date: Tue, 7 Aug 2018 19:46:46 +0530 Subject: [PATCH 915/970] powerpc/pseries: Fix endianness while restoring of r3 in MCE handler. commit cd813e1cd7122f2c261dce5b54d1e0c97f80e1a5 upstream. During Machine Check interrupt on pseries platform, register r3 points RTAS extended event log passed by hypervisor. Since hypervisor uses r3 to pass pointer to rtas log, it stores the original r3 value at the start of the memory (first 8 bytes) pointed by r3. Since hypervisor stores this info and rtas log is in BE format, linux should make sure to restore r3 value in correct endian format. Without this patch when MCE handler, after recovery, returns to code that that caused the MCE may end up with Data SLB access interrupt for invalid address followed by kernel panic or hang. Severe Machine check interrupt [Recovered] NIP [d00000000ca301b8]: init_module+0x1b8/0x338 [bork_kernel] Initiator: CPU Error type: SLB [Multihit] Effective address: d00000000ca70000 cpu 0xa: Vector: 380 (Data SLB Access) at [c0000000fc7775b0] pc: c0000000009694c0: vsnprintf+0x80/0x480 lr: c0000000009698e0: vscnprintf+0x20/0x60 sp: c0000000fc777830 msr: 8000000002009033 dar: a803a30c000000d0 current = 0xc00000000bc9ef00 paca = 0xc00000001eca5c00 softe: 3 irq_happened: 0x01 pid = 8860, comm = insmod vscnprintf+0x20/0x60 vprintk_emit+0xb4/0x4b0 vprintk_func+0x5c/0xd0 printk+0x38/0x4c init_module+0x1c0/0x338 [bork_kernel] do_one_initcall+0x54/0x230 do_init_module+0x8c/0x248 load_module+0x12b8/0x15b0 sys_finit_module+0xa8/0x110 system_call+0x58/0x6c --- Exception: c00 (System Call) at 00007fff8bda0644 SP (7fffdfbfe980) is in userspace This patch fixes this issue. Fixes: a08a53ea4c97 ("powerpc/le: Enable RTAS events support") Cc: stable@vger.kernel.org # v3.15+ Reviewed-by: Nicholas Piggin Signed-off-by: Mahesh Salgaonkar Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/platforms/pseries/ras.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c index 904a677208d1..34989ce43147 100644 --- a/arch/powerpc/platforms/pseries/ras.c +++ b/arch/powerpc/platforms/pseries/ras.c @@ -346,7 +346,7 @@ static struct rtas_error_log *fwnmi_get_errinfo(struct pt_regs *regs) } savep = __va(regs->gpr[3]); - regs->gpr[3] = savep[0]; /* restore original r3 */ + regs->gpr[3] = be64_to_cpu(savep[0]); /* restore original r3 */ /* If it isn't an extended log we can use the per cpu 64bit buffer */ h = (struct rtas_error_log *)&savep[1]; From 0eb725c141ace79026c4f8430db302c028eed98d Mon Sep 17 00:00:00 2001 From: Frederick Lawler Date: Thu, 18 Jan 2018 12:55:24 -0600 Subject: [PATCH 916/970] PCI: Add wrappers for dev_printk() commit 7506dc7989933235e6fc23f3d0516bdbf0f7d1a8 upstream. Add PCI-specific dev_printk() wrappers and use them to simplify the code slightly. No functional change intended. Signed-off-by: Frederick Lawler [bhelgaas: squash into one patch] Signed-off-by: Bjorn Helgaas [only take the pci.h portion of this patch, to make backporting stuff easier over time - gregkh] Signed-off-by: Greg Kroah-Hartman --- include/linux/pci.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/include/linux/pci.h b/include/linux/pci.h index 78c9f4a91d94..534cb43e8635 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -2151,4 +2151,16 @@ static inline bool pci_ari_enabled(struct pci_bus *bus) /* provide the legacy pci_dma_* API */ #include +#define pci_printk(level, pdev, fmt, arg...) \ + dev_printk(level, &(pdev)->dev, fmt, ##arg) + +#define pci_emerg(pdev, fmt, arg...) dev_emerg(&(pdev)->dev, fmt, ##arg) +#define pci_alert(pdev, fmt, arg...) dev_alert(&(pdev)->dev, fmt, ##arg) +#define pci_crit(pdev, fmt, arg...) dev_crit(&(pdev)->dev, fmt, ##arg) +#define pci_err(pdev, fmt, arg...) dev_err(&(pdev)->dev, fmt, ##arg) +#define pci_warn(pdev, fmt, arg...) dev_warn(&(pdev)->dev, fmt, ##arg) +#define pci_notice(pdev, fmt, arg...) dev_notice(&(pdev)->dev, fmt, ##arg) +#define pci_info(pdev, fmt, arg...) dev_info(&(pdev)->dev, fmt, ##arg) +#define pci_dbg(pdev, fmt, arg...) dev_dbg(&(pdev)->dev, fmt, ##arg) + #endif /* LINUX_PCI_H */ From f8700e03c86c9d81483339a2d4fd33777a6debd8 Mon Sep 17 00:00:00 2001 From: Benjamin Herrenschmidt Date: Fri, 17 Aug 2018 17:30:39 +1000 Subject: [PATCH 917/970] powerpc/powernv/pci: Work around races in PCI bridge enabling commit db2173198b9513f7add8009f225afa1f1c79bcc6 upstream. The generic code is racy when multiple children of a PCI bridge try to enable it simultaneously. This leads to drivers trying to access a device through a not-yet-enabled bridge, and this EEH errors under various circumstances when using parallel driver probing. There is work going on to fix that properly in the PCI core but it will take some time. x86 gets away with it because (outside of hotplug), the BIOS enables all the bridges at boot time. This patch does the same thing on powernv by enabling all bridges that have child devices at boot time, thus avoiding subsequent races. It's suitable for backporting to stable and distros, while the proper PCI fix will probably be significantly more invasive. Signed-off-by: Benjamin Herrenschmidt Cc: stable@vger.kernel.org Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/platforms/powernv/pci-ioda.c | 37 +++++++++++++++++++++++ 1 file changed, 37 insertions(+) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 9ed90c502005..f52cc6fd4290 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -3124,12 +3124,49 @@ static void pnv_pci_ioda_create_dbgfs(void) #endif /* CONFIG_DEBUG_FS */ } +static void pnv_pci_enable_bridge(struct pci_bus *bus) +{ + struct pci_dev *dev = bus->self; + struct pci_bus *child; + + /* Empty bus ? bail */ + if (list_empty(&bus->devices)) + return; + + /* + * If there's a bridge associated with that bus enable it. This works + * around races in the generic code if the enabling is done during + * parallel probing. This can be removed once those races have been + * fixed. + */ + if (dev) { + int rc = pci_enable_device(dev); + if (rc) + pci_err(dev, "Error enabling bridge (%d)\n", rc); + pci_set_master(dev); + } + + /* Perform the same to child busses */ + list_for_each_entry(child, &bus->children, node) + pnv_pci_enable_bridge(child); +} + +static void pnv_pci_enable_bridges(void) +{ + struct pci_controller *hose; + + list_for_each_entry(hose, &hose_list, list_node) + pnv_pci_enable_bridge(hose->bus); +} + static void pnv_pci_ioda_fixup(void) { pnv_pci_ioda_setup_PEs(); pnv_pci_ioda_setup_iommu_api(); pnv_pci_ioda_create_dbgfs(); + pnv_pci_enable_bridges(); + #ifdef CONFIG_EEH eeh_init(); eeh_addr_cache_build(); From 6f329f2767460f97dd56a54a1511aea84287f5ae Mon Sep 17 00:00:00 2001 From: Vaibhav Jain Date: Wed, 4 Jul 2018 20:58:33 +0530 Subject: [PATCH 918/970] cxl: Fix wrong comparison in cxl_adapter_context_get() commit ef6cb5f1a048fdf91ccee6d63d2bfa293338502d upstream. Function atomic_inc_unless_negative() returns a bool to indicate success/failure. However cxl_adapter_context_get() wrongly compares the return value against '>=0' which will always be true. The patch fixes this comparison to '==0' there by also fixing this compile time warning: drivers/misc/cxl/main.c:290 cxl_adapter_context_get() warn: 'atomic_inc_unless_negative(&adapter->contexts_num)' is unsigned Fixes: 70b565bbdb91 ("cxl: Prevent adapter reset if an active context exists") Cc: stable@vger.kernel.org # v4.9+ Reported-by: Dan Carpenter Signed-off-by: Vaibhav Jain Acked-by: Andrew Donnellan Acked-by: Frederic Barrat Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman --- drivers/misc/cxl/main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/misc/cxl/main.c b/drivers/misc/cxl/main.c index cc1706a92ace..a078d49485d8 100644 --- a/drivers/misc/cxl/main.c +++ b/drivers/misc/cxl/main.c @@ -293,7 +293,7 @@ int cxl_adapter_context_get(struct cxl *adapter) int rc; rc = atomic_inc_unless_negative(&adapter->contexts_num); - return rc >= 0 ? 0 : -EBUSY; + return rc ? 0 : -EBUSY; } void cxl_adapter_context_put(struct cxl *adapter) From 2c3c284b44ac21370c5e8dbe0365ef935953ec3b Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Mon, 2 Jul 2018 14:08:18 -0700 Subject: [PATCH 919/970] ib_srpt: Fix a use-after-free in srpt_close_ch() commit 995250959d22fc341b5424e3343b0ce5df672461 upstream. Avoid that KASAN reports the following: BUG: KASAN: use-after-free in srpt_close_ch+0x4f/0x1b0 [ib_srpt] Read of size 4 at addr ffff880151180cb8 by task check/4681 CPU: 15 PID: 4681 Comm: check Not tainted 4.18.0-rc2-dbg+ #4 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 Call Trace: dump_stack+0xa4/0xf5 print_address_description+0x6f/0x270 kasan_report+0x241/0x360 __asan_load4+0x78/0x80 srpt_close_ch+0x4f/0x1b0 [ib_srpt] srpt_set_enabled+0xf7/0x1e0 [ib_srpt] srpt_tpg_enable_store+0xb8/0x120 [ib_srpt] configfs_write_file+0x14e/0x1d0 [configfs] __vfs_write+0xd2/0x3b0 vfs_write+0x101/0x270 ksys_write+0xab/0x120 __x64_sys_write+0x43/0x50 do_syscall_64+0x77/0x230 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: aaf45bd83eba ("IB/srpt: Detect session shutdown reliably") Signed-off-by: Bart Van Assche Cc: Signed-off-by: Jason Gunthorpe Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/ulp/srpt/ib_srpt.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c index 876f438aa048..fe7c6ec67d98 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c @@ -1701,8 +1701,7 @@ static bool srpt_close_ch(struct srpt_rdma_ch *ch) int ret; if (!srpt_set_ch_state(ch, CH_DRAINING)) { - pr_debug("%s-%d: already closed\n", ch->sess_name, - ch->qp->qp_num); + pr_debug("%s: already closed\n", ch->sess_name); return false; } From e4f531212d7be8f9878768dfdaeaccda090171fe Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Tue, 26 Jun 2018 08:39:36 -0700 Subject: [PATCH 920/970] RDMA/rxe: Set wqe->status correctly if an unexpected response is received commit 61b717d041b1976530f68f8b539b2e3a7dd8e39c upstream. Every function that returns COMPST_ERROR must set wqe->status to another value than IB_WC_SUCCESS before returning COMPST_ERROR. Fix the only code path for which this is not yet the case. Signed-off-by: Bart Van Assche Cc: Reviewed-by: Yuval Shaia Signed-off-by: Jason Gunthorpe Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/sw/rxe/rxe_comp.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c index 6c5e29db88e3..df15b6d7b645 100644 --- a/drivers/infiniband/sw/rxe/rxe_comp.c +++ b/drivers/infiniband/sw/rxe/rxe_comp.c @@ -273,6 +273,7 @@ static inline enum comp_state check_ack(struct rxe_qp *qp, case IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE: if (wqe->wr.opcode != IB_WR_RDMA_READ && wqe->wr.opcode != IB_WR_RDMA_READ_WITH_INV) { + wqe->status = IB_WC_FATAL_ERR; return COMPST_ERROR; } reset_retry_counters(qp); From 684f5d9a1d98eb44b1b253d4e779f4d78598b5cf Mon Sep 17 00:00:00 2001 From: piaojun Date: Wed, 25 Jul 2018 11:13:16 +0800 Subject: [PATCH 921/970] fs/9p/xattr.c: catch the error of p9_client_clunk when setting xattr failed commit 3111784bee81591ea2815011688d28b65df03627 upstream. In my testing, v9fs_fid_xattr_set will return successfully even if the backend ext4 filesystem has no space to store xattr key-value. That will cause inconsistent behavior between front end and back end. The reason is that lsetxattr will be triggered by p9_client_clunk, and unfortunately we did not catch the error. This patch will catch the error to notify upper caller. p9_client_clunk (in 9p) p9_client_rpc(clnt, P9_TCLUNK, "d", fid->fid); v9fs_clunk (in qemu) put_fid free_fid v9fs_xattr_fid_clunk v9fs_co_lsetxattr s->ops->lsetxattr ext4_xattr_user_set (in host ext4 filesystem) Link: http://lkml.kernel.org/r/5B57EACC.2060900@huawei.com Signed-off-by: Jun Piao Cc: Eric Van Hensbergen Cc: Ron Minnich Cc: Latchesar Ionkov Cc: Andrew Morton Cc: stable@vger.kernel.org Signed-off-by: Dominique Martinet Signed-off-by: Greg Kroah-Hartman --- fs/9p/xattr.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/9p/xattr.c b/fs/9p/xattr.c index f329eee6dc93..352abc39e891 100644 --- a/fs/9p/xattr.c +++ b/fs/9p/xattr.c @@ -105,7 +105,7 @@ int v9fs_fid_xattr_set(struct p9_fid *fid, const char *name, { struct kvec kvec = {.iov_base = (void *)value, .iov_len = value_len}; struct iov_iter from; - int retval; + int retval, err; iov_iter_kvec(&from, WRITE | ITER_KVEC, &kvec, 1, value_len); @@ -126,7 +126,9 @@ int v9fs_fid_xattr_set(struct p9_fid *fid, const char *name, retval); else p9_client_write(fid, 0, &from, &retval); - p9_client_clunk(fid); + err = p9_client_clunk(fid); + if (!retval && err) + retval = err; return retval; } From b69ef7c90e84135347f824f7a670b622f7a87989 Mon Sep 17 00:00:00 2001 From: jiangyiwen Date: Fri, 3 Aug 2018 12:11:34 +0800 Subject: [PATCH 922/970] 9p/virtio: fix off-by-one error in sg list bounds check commit 23cba9cbde0bba05d772b335fe5f66aa82b9ad19 upstream. Because the value of limit is VIRTQUEUE_NUM, if index is equal to limit, it will cause sg array out of bounds, so correct the judgement of BUG_ON. Link: http://lkml.kernel.org/r/5B63D5F6.6080109@huawei.com Signed-off-by: Yiwen Jiang Reported-By: Dan Carpenter Acked-by: Jun Piao Cc: stable@vger.kernel.org Signed-off-by: Dominique Martinet Signed-off-by: Greg Kroah-Hartman --- net/9p/trans_virtio.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c index 202c8ef1c0fa..73c700a55f02 100644 --- a/net/9p/trans_virtio.c +++ b/net/9p/trans_virtio.c @@ -189,7 +189,7 @@ static int pack_sg_list(struct scatterlist *sg, int start, s = rest_of_page(data); if (s > count) s = count; - BUG_ON(index > limit); + BUG_ON(index >= limit); /* Make sure we don't terminate early. */ sg_unmark_end(&sg[index]); sg_set_buf(&sg[index++], data, s); @@ -234,6 +234,7 @@ pack_sg_list_p(struct scatterlist *sg, int start, int limit, s = PAGE_SIZE - data_off; if (s > count) s = count; + BUG_ON(index >= limit); /* Make sure we don't terminate early. */ sg_unmark_end(&sg[index]); sg_set_page(&sg[index++], pdata[i++], s, data_off); From c53310d06c6ad74a0bd72883e07fe837ef840111 Mon Sep 17 00:00:00 2001 From: Tomas Bortoli Date: Tue, 10 Jul 2018 00:29:43 +0200 Subject: [PATCH 923/970] net/9p/client.c: version pointer uninitialized commit 7913690dcc5e18e235769fd87c34143072f5dbea upstream. The p9_client_version() does not initialize the version pointer. If the call to p9pdu_readf() returns an error and version has not been allocated in p9pdu_readf(), then the program will jump to the "error" label and will try to free the version pointer. If version is not initialized, free() will be called with uninitialized, garbage data and will provoke a crash. Link: http://lkml.kernel.org/r/20180709222943.19503-1-tomasbortoli@gmail.com Signed-off-by: Tomas Bortoli Reported-by: syzbot+65c6b72f284a39d416b4@syzkaller.appspotmail.com Reviewed-by: Jun Piao Reviewed-by: Yiwen Jiang Cc: Eric Van Hensbergen Cc: Ron Minnich Cc: Latchesar Ionkov Signed-off-by: Andrew Morton Cc: stable@vger.kernel.org Signed-off-by: Dominique Martinet Signed-off-by: Greg Kroah-Hartman --- net/9p/client.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/9p/client.c b/net/9p/client.c index 1fd60190177e..98d299ea52ee 100644 --- a/net/9p/client.c +++ b/net/9p/client.c @@ -931,7 +931,7 @@ static int p9_client_version(struct p9_client *c) { int err = 0; struct p9_req_t *req; - char *version; + char *version = NULL; int msize; p9_debug(P9_DEBUG_9P, ">>> TVERSION msize %d protocol %d\n", From 35c740d17746e5d9c401040663a32d0b669f209d Mon Sep 17 00:00:00 2001 From: Tomas Bortoli Date: Fri, 20 Jul 2018 11:27:30 +0200 Subject: [PATCH 924/970] net/9p/trans_fd.c: fix race-condition by flushing workqueue before the kfree() commit 430ac66eb4c5b5c4eb846b78ebf65747510b30f1 upstream. The patch adds the flush in p9_mux_poll_stop() as it the function used by p9_conn_destroy(), in turn called by p9_fd_close() to stop the async polling associated with the data regarding the connection. Link: http://lkml.kernel.org/r/20180720092730.27104-1-tomasbortoli@gmail.com Signed-off-by: Tomas Bortoli Reported-by: syzbot+39749ed7d9ef6dfb23f6@syzkaller.appspotmail.com To: Eric Van Hensbergen To: Ron Minnich To: Latchesar Ionkov Cc: Yiwen Jiang Cc: stable@vger.kernel.org Signed-off-by: Dominique Martinet Signed-off-by: Greg Kroah-Hartman --- net/9p/trans_fd.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c index 7bc2208b6cc4..f8dafa8fa70b 100644 --- a/net/9p/trans_fd.c +++ b/net/9p/trans_fd.c @@ -181,6 +181,8 @@ static void p9_mux_poll_stop(struct p9_conn *m) spin_lock_irqsave(&p9_poll_lock, flags); list_del_init(&m->poll_pending_link); spin_unlock_irqrestore(&p9_poll_lock, flags); + + flush_work(&p9_poll_work); } /** From 9a3f8fd5bd75b8bc96e435cc973c2cd9e25c6c17 Mon Sep 17 00:00:00 2001 From: Hou Tao Date: Thu, 2 Aug 2018 16:18:24 +0800 Subject: [PATCH 925/970] dm thin: stop no_space_timeout worker when switching to write-mode commit 75294442d896f2767be34f75aca7cc2b0d01301f upstream. Now both check_for_space() and do_no_space_timeout() will read & write pool->pf.error_if_no_space. If these functions run concurrently, as shown in the following case, the default setting of "queue_if_no_space" can get lost. precondition: * error_if_no_space = false (aka "queue_if_no_space") * pool is in Out-of-Data-Space (OODS) mode * no_space_timeout worker has been queued CPU 0: CPU 1: // delete a thin device process_delete_mesg() // check_for_space() invoked by commit() set_pool_mode(pool, PM_WRITE) pool->pf.error_if_no_space = \ pt->requested_pf.error_if_no_space // timeout, pool is still in OODS mode do_no_space_timeout // "queue_if_no_space" config is lost pool->pf.error_if_no_space = true pool->pf.mode = new_mode Fix it by stopping no_space_timeout worker when switching to write mode. Fixes: bcc696fac11f ("dm thin: stay in out-of-data-space mode once no_space_timeout expires") Cc: stable@vger.kernel.org Signed-off-by: Hou Tao Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman --- drivers/md/dm-thin.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c index 0f0374a4ac6e..a952ad890f32 100644 --- a/drivers/md/dm-thin.c +++ b/drivers/md/dm-thin.c @@ -2518,6 +2518,8 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode) case PM_WRITE: if (old_mode != new_mode) notify_of_pool_mode_change(pool, "write"); + if (old_mode == PM_OUT_OF_DATA_SPACE) + cancel_delayed_work_sync(&pool->no_space_timeout); pool->out_of_data_space = false; pool->pf.error_if_no_space = pt->requested_pf.error_if_no_space; dm_pool_metadata_read_write(pool->pmd); From d08b58b5c3ca41fa65a03a7c11398d48bf803079 Mon Sep 17 00:00:00 2001 From: Mike Snitzer Date: Thu, 2 Aug 2018 16:08:52 -0400 Subject: [PATCH 926/970] dm cache metadata: save in-core policy_hint_size to on-disk superblock commit fd2fa95416188a767a63979296fa3e169a9ef5ec upstream. policy_hint_size starts as 0 during __write_initial_superblock(). It isn't until the policy is loaded that policy_hint_size is set in-core (cmd->policy_hint_size). But it never got recorded in the on-disk superblock because __commit_transaction() didn't deal with transfering the in-core cmd->policy_hint_size to the on-disk superblock. The in-core cmd->policy_hint_size gets initialized by metadata_open()'s __begin_transaction_flags() which re-reads all superblock fields. Because the superblock's policy_hint_size was never properly stored, when the cache was created, hints_array_available() would always return false when re-activating a previously created cache. This means __load_mappings() always considered the hints invalid and never made use of the hints (these hints served to optimize). Another detremental side-effect of this oversight is the cache_check utility would fail with: "invalid hint width: 0" Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman --- drivers/md/dm-cache-metadata.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/md/dm-cache-metadata.c b/drivers/md/dm-cache-metadata.c index 6937ca42be8c..a184c9830ca5 100644 --- a/drivers/md/dm-cache-metadata.c +++ b/drivers/md/dm-cache-metadata.c @@ -344,7 +344,7 @@ static int __write_initial_superblock(struct dm_cache_metadata *cmd) disk_super->version = cpu_to_le32(MAX_CACHE_VERSION); memset(disk_super->policy_name, 0, sizeof(disk_super->policy_name)); memset(disk_super->policy_version, 0, sizeof(disk_super->policy_version)); - disk_super->policy_hint_size = 0; + disk_super->policy_hint_size = cpu_to_le32(0); __copy_sm_root(cmd, disk_super); @@ -659,6 +659,7 @@ static int __commit_transaction(struct dm_cache_metadata *cmd, disk_super->policy_version[0] = cpu_to_le32(cmd->policy_version[0]); disk_super->policy_version[1] = cpu_to_le32(cmd->policy_version[1]); disk_super->policy_version[2] = cpu_to_le32(cmd->policy_version[2]); + disk_super->policy_hint_size = cpu_to_le32(cmd->policy_hint_size); disk_super->read_hits = cpu_to_le32(cmd->stats.read_hits); disk_super->read_misses = cpu_to_le32(cmd->stats.read_misses); From e5147bbfc3201df384b8ddd774dd45e322598dc7 Mon Sep 17 00:00:00 2001 From: Tycho Andersen Date: Fri, 6 Jul 2018 10:24:57 -0600 Subject: [PATCH 927/970] uart: fix race between uart_put_char() and uart_shutdown() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit a5ba1d95e46ecaea638ddd7cd144107c783acb5d upstream. We have reports of the following crash: PID: 7 TASK: ffff88085c6d61c0 CPU: 1 COMMAND: "kworker/u25:0" #0 [ffff88085c6db710] machine_kexec at ffffffff81046239 #1 [ffff88085c6db760] crash_kexec at ffffffff810fc248 #2 [ffff88085c6db830] oops_end at ffffffff81008ae7 #3 [ffff88085c6db860] no_context at ffffffff81050b8f #4 [ffff88085c6db8b0] __bad_area_nosemaphore at ffffffff81050d75 #5 [ffff88085c6db900] bad_area_nosemaphore at ffffffff81050e83 #6 [ffff88085c6db910] __do_page_fault at ffffffff8105132e #7 [ffff88085c6db9b0] do_page_fault at ffffffff8105152c #8 [ffff88085c6db9c0] page_fault at ffffffff81a3f122 [exception RIP: uart_put_char+149] RIP: ffffffff814b67b5 RSP: ffff88085c6dba78 RFLAGS: 00010006 RAX: 0000000000000292 RBX: ffffffff827c5120 RCX: 0000000000000081 RDX: 0000000000000000 RSI: 000000000000005f RDI: ffffffff827c5120 RBP: ffff88085c6dba98 R8: 000000000000012c R9: ffffffff822ea320 R10: ffff88085fe4db04 R11: 0000000000000001 R12: ffff881059f9c000 R13: 0000000000000001 R14: 000000000000005f R15: 0000000000000fba ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #9 [ffff88085c6dbaa0] tty_put_char at ffffffff81497544 #10 [ffff88085c6dbac0] do_output_char at ffffffff8149c91c #11 [ffff88085c6dbae0] __process_echoes at ffffffff8149cb8b #12 [ffff88085c6dbb30] commit_echoes at ffffffff8149cdc2 #13 [ffff88085c6dbb60] n_tty_receive_buf_fast at ffffffff8149e49b #14 [ffff88085c6dbbc0] __receive_buf at ffffffff8149ef5a #15 [ffff88085c6dbc20] n_tty_receive_buf_common at ffffffff8149f016 #16 [ffff88085c6dbca0] n_tty_receive_buf2 at ffffffff8149f194 #17 [ffff88085c6dbcb0] flush_to_ldisc at ffffffff814a238a #18 [ffff88085c6dbd50] process_one_work at ffffffff81090be2 #19 [ffff88085c6dbe20] worker_thread at ffffffff81091b4d #20 [ffff88085c6dbeb0] kthread at ffffffff81096384 #21 [ffff88085c6dbf50] ret_from_fork at ffffffff81a3d69f​ after slogging through some dissasembly: ffffffff814b6720 : ffffffff814b6720: 55 push %rbp ffffffff814b6721: 48 89 e5 mov %rsp,%rbp ffffffff814b6724: 48 83 ec 20 sub $0x20,%rsp ffffffff814b6728: 48 89 1c 24 mov %rbx,(%rsp) ffffffff814b672c: 4c 89 64 24 08 mov %r12,0x8(%rsp) ffffffff814b6731: 4c 89 6c 24 10 mov %r13,0x10(%rsp) ffffffff814b6736: 4c 89 74 24 18 mov %r14,0x18(%rsp) ffffffff814b673b: e8 b0 8e 58 00 callq ffffffff81a3f5f0 ffffffff814b6740: 4c 8b a7 88 02 00 00 mov 0x288(%rdi),%r12 ffffffff814b6747: 45 31 ed xor %r13d,%r13d ffffffff814b674a: 41 89 f6 mov %esi,%r14d ffffffff814b674d: 49 83 bc 24 70 01 00 cmpq $0x0,0x170(%r12) ffffffff814b6754: 00 00 ffffffff814b6756: 49 8b 9c 24 80 01 00 mov 0x180(%r12),%rbx ffffffff814b675d: 00 ffffffff814b675e: 74 2f je ffffffff814b678f ffffffff814b6760: 48 89 df mov %rbx,%rdi ffffffff814b6763: e8 a8 67 58 00 callq ffffffff81a3cf10 <_raw_spin_lock_irqsave> ffffffff814b6768: 41 8b 8c 24 78 01 00 mov 0x178(%r12),%ecx ffffffff814b676f: 00 ffffffff814b6770: 89 ca mov %ecx,%edx ffffffff814b6772: f7 d2 not %edx ffffffff814b6774: 41 03 94 24 7c 01 00 add 0x17c(%r12),%edx ffffffff814b677b: 00 ffffffff814b677c: 81 e2 ff 0f 00 00 and $0xfff,%edx ffffffff814b6782: 75 23 jne ffffffff814b67a7 ffffffff814b6784: 48 89 c6 mov %rax,%rsi ffffffff814b6787: 48 89 df mov %rbx,%rdi ffffffff814b678a: e8 e1 64 58 00 callq ffffffff81a3cc70 <_raw_spin_unlock_irqrestore> ffffffff814b678f: 44 89 e8 mov %r13d,%eax ffffffff814b6792: 48 8b 1c 24 mov (%rsp),%rbx ffffffff814b6796: 4c 8b 64 24 08 mov 0x8(%rsp),%r12 ffffffff814b679b: 4c 8b 6c 24 10 mov 0x10(%rsp),%r13 ffffffff814b67a0: 4c 8b 74 24 18 mov 0x18(%rsp),%r14 ffffffff814b67a5: c9 leaveq ffffffff814b67a6: c3 retq ffffffff814b67a7: 49 8b 94 24 70 01 00 mov 0x170(%r12),%rdx ffffffff814b67ae: 00 ffffffff814b67af: 48 63 c9 movslq %ecx,%rcx ffffffff814b67b2: 41 b5 01 mov $0x1,%r13b ffffffff814b67b5: 44 88 34 0a mov %r14b,(%rdx,%rcx,1) ffffffff814b67b9: 41 8b 94 24 78 01 00 mov 0x178(%r12),%edx ffffffff814b67c0: 00 ffffffff814b67c1: 83 c2 01 add $0x1,%edx ffffffff814b67c4: 81 e2 ff 0f 00 00 and $0xfff,%edx ffffffff814b67ca: 41 89 94 24 78 01 00 mov %edx,0x178(%r12) ffffffff814b67d1: 00 ffffffff814b67d2: eb b0 jmp ffffffff814b6784 ffffffff814b67d4: 66 66 66 2e 0f 1f 84 data32 data32 nopw %cs:0x0(%rax,%rax,1) ffffffff814b67db: 00 00 00 00 00 for our build, this is crashing at: circ->buf[circ->head] = c; Looking in uart_port_startup(), it seems that circ->buf (state->xmit.buf) protected by the "per-port mutex", which based on uart_port_check() is state->port.mutex. Indeed, the lock acquired in uart_put_char() is uport->lock, i.e. not the same lock. Anyway, since the lock is not acquired, if uart_shutdown() is called, the last chunk of that function may release state->xmit.buf before its assigned to null, and cause the race above. To fix it, let's lock uport->lock when allocating/deallocating state->xmit.buf in addition to the per-port mutex. v2: switch to locking uport->lock on allocation/deallocation instead of locking the per-port mutex in uart_put_char. Note that since uport->lock is a spin lock, we have to switch the allocation to GFP_ATOMIC. v3: move the allocation outside the lock, so we can switch back to GFP_KERNEL Signed-off-by: Tycho Andersen Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/serial_core.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c index 6cbd46c565f8..53e6db8b0330 100644 --- a/drivers/tty/serial/serial_core.c +++ b/drivers/tty/serial/serial_core.c @@ -172,6 +172,7 @@ static int uart_port_startup(struct tty_struct *tty, struct uart_state *state, { struct uart_port *uport = uart_port_check(state); unsigned long page; + unsigned long flags = 0; int retval = 0; if (uport->type == PORT_UNKNOWN) @@ -186,15 +187,18 @@ static int uart_port_startup(struct tty_struct *tty, struct uart_state *state, * Initialise and allocate the transmit and temporary * buffer. */ - if (!state->xmit.buf) { - /* This is protected by the per port mutex */ - page = get_zeroed_page(GFP_KERNEL); - if (!page) - return -ENOMEM; + page = get_zeroed_page(GFP_KERNEL); + if (!page) + return -ENOMEM; + uart_port_lock(state, flags); + if (!state->xmit.buf) { state->xmit.buf = (unsigned char *) page; uart_circ_clear(&state->xmit); + } else { + free_page(page); } + uart_port_unlock(uport, flags); retval = uport->ops->startup(uport); if (retval == 0) { @@ -253,6 +257,7 @@ static void uart_shutdown(struct tty_struct *tty, struct uart_state *state) { struct uart_port *uport = uart_port_check(state); struct tty_port *port = &state->port; + unsigned long flags = 0; /* * Set the TTY IO error marker @@ -285,10 +290,12 @@ static void uart_shutdown(struct tty_struct *tty, struct uart_state *state) /* * Free the transmit buffer page. */ + uart_port_lock(state, flags); if (state->xmit.buf) { free_page((unsigned long)state->xmit.buf); state->xmit.buf = NULL; } + uart_port_unlock(uport, flags); } /** From 4e834c730df6dc887d1a7c5ed607aaa7aa991fb5 Mon Sep 17 00:00:00 2001 From: Lars-Peter Clausen Date: Mon, 25 Jun 2018 11:03:07 +0300 Subject: [PATCH 928/970] iio: ad9523: Fix displayed phase commit 5a4e33c1c53ae7d4425f7d94e60e4458a37b349e upstream. Fix the displayed phase for the ad9523 driver. Currently the most significant decimal place is dropped and all other digits are shifted one to the left. This is due to a multiplication by 10, which is not necessary, so remove it. Signed-off-by: Lars-Peter Clausen Signed-off-by: Alexandru Ardelean Fixes: cd1678f9632 ("iio: frequency: New driver for AD9523 SPI Low Jitter Clock Generator") Cc: Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman --- drivers/iio/frequency/ad9523.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iio/frequency/ad9523.c b/drivers/iio/frequency/ad9523.c index 99eba524f6dd..cb45fa314a92 100644 --- a/drivers/iio/frequency/ad9523.c +++ b/drivers/iio/frequency/ad9523.c @@ -642,7 +642,7 @@ static int ad9523_read_raw(struct iio_dev *indio_dev, code = (AD9523_CLK_DIST_DIV_PHASE_REV(ret) * 3141592) / AD9523_CLK_DIST_DIV_REV(ret); *val = code / 1000000; - *val2 = (code % 1000000) * 10; + *val2 = code % 1000000; return IIO_VAL_INT_PLUS_MICRO; default: return -EINVAL; From d4e5a9eda9174c3c68d6ef05264d5746bbc29569 Mon Sep 17 00:00:00 2001 From: Lars-Peter Clausen Date: Fri, 27 Jul 2018 09:42:45 +0300 Subject: [PATCH 929/970] iio: ad9523: Fix return value for ad952x_store() commit 9a5094ca29ea9b1da301b31fd377c0c0c4c23034 upstream. A sysfs write callback function needs to either return the number of consumed characters or an error. The ad952x_store() function currently returns 0 if the input value was "0", this will signal that no characters have been consumed and the function will be called repeatedly in a loop indefinitely. Fix this by returning number of supplied characters to indicate that the whole input string has been consumed. Signed-off-by: Lars-Peter Clausen Signed-off-by: Alexandru Ardelean Fixes: cd1678f96329 ("iio: frequency: New driver for AD9523 SPI Low Jitter Clock Generator") Cc: Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman --- drivers/iio/frequency/ad9523.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iio/frequency/ad9523.c b/drivers/iio/frequency/ad9523.c index cb45fa314a92..1642b55f70da 100644 --- a/drivers/iio/frequency/ad9523.c +++ b/drivers/iio/frequency/ad9523.c @@ -508,7 +508,7 @@ static ssize_t ad9523_store(struct device *dev, return ret; if (!state) - return 0; + return len; mutex_lock(&indio_dev->mlock); switch ((u32)this_attr->address) { From 0e0dd1a2298e91acab08bf3667fc6081c6371b04 Mon Sep 17 00:00:00 2001 From: Nadav Amit Date: Tue, 19 Jun 2018 16:00:24 -0700 Subject: [PATCH 930/970] vmw_balloon: fix inflation of 64-bit GFNs commit 09755690c6b7c1eabdc4651eb3b276f8feb1e447 upstream. When balloon batching is not supported by the hypervisor, the guest frame number (GFN) must fit in 32-bit. However, due to a bug, this check was mistakenly ignored. In practice, when total RAM is greater than 16TB, the balloon does not work currently, making this bug unlikely to happen. Fixes: ef0f8f112984 ("VMware balloon: partially inline vmballoon_reserve_page.") Cc: stable@vger.kernel.org Reviewed-by: Xavier Deguillard Signed-off-by: Nadav Amit Signed-off-by: Greg Kroah-Hartman --- drivers/misc/vmw_balloon.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/drivers/misc/vmw_balloon.c b/drivers/misc/vmw_balloon.c index 5e047bfc0cc4..b0b6f99c8f0f 100644 --- a/drivers/misc/vmw_balloon.c +++ b/drivers/misc/vmw_balloon.c @@ -450,7 +450,7 @@ static int vmballoon_send_lock_page(struct vmballoon *b, unsigned long pfn, pfn32 = (u32)pfn; if (pfn32 != pfn) - return -1; + return -EINVAL; STATS_INC(b->stats.lock[false]); @@ -460,7 +460,7 @@ static int vmballoon_send_lock_page(struct vmballoon *b, unsigned long pfn, pr_debug("%s - ppn %lx, hv returns %ld\n", __func__, pfn, status); STATS_INC(b->stats.lock_fail[false]); - return 1; + return -EIO; } static int vmballoon_send_batched_lock(struct vmballoon *b, @@ -597,11 +597,12 @@ static int vmballoon_lock_page(struct vmballoon *b, unsigned int num_pages, locked = vmballoon_send_lock_page(b, page_to_pfn(page), &hv_status, target); - if (locked > 0) { + if (locked) { STATS_INC(b->stats.refused_alloc[false]); - if (hv_status == VMW_BALLOON_ERROR_RESET || - hv_status == VMW_BALLOON_ERROR_PPN_NOTNEEDED) { + if (locked == -EIO && + (hv_status == VMW_BALLOON_ERROR_RESET || + hv_status == VMW_BALLOON_ERROR_PPN_NOTNEEDED)) { vmballoon_free_page(page, false); return -EIO; } @@ -617,7 +618,7 @@ static int vmballoon_lock_page(struct vmballoon *b, unsigned int num_pages, } else { vmballoon_free_page(page, false); } - return -EIO; + return locked; } /* track allocated page */ From fa51177bbc27a63d64af2a3547018ff1033a160f Mon Sep 17 00:00:00 2001 From: Nadav Amit Date: Tue, 19 Jun 2018 16:00:25 -0700 Subject: [PATCH 931/970] vmw_balloon: do not use 2MB without batching commit 5081efd112560d3febb328e627176235b250d59d upstream. If the hypervisor sets 2MB batching is on, while batching is cleared, the balloon code breaks. In this case the legacy mechanism is used with 2MB page. The VM would report a 2MB page is ballooned, and the hypervisor would only take the first 4KB. While the hypervisor should not report such settings, make the code more robust by not enabling 2MB support without batching. Fixes: 365bd7ef7ec8e ("VMware balloon: Support 2m page ballooning.") Cc: stable@vger.kernel.org Reviewed-by: Xavier Deguillard Signed-off-by: Nadav Amit Signed-off-by: Greg Kroah-Hartman --- drivers/misc/vmw_balloon.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/misc/vmw_balloon.c b/drivers/misc/vmw_balloon.c index b0b6f99c8f0f..b6ccd551c00e 100644 --- a/drivers/misc/vmw_balloon.c +++ b/drivers/misc/vmw_balloon.c @@ -341,7 +341,13 @@ static bool vmballoon_send_start(struct vmballoon *b, unsigned long req_caps) success = false; } - if (b->capabilities & VMW_BALLOON_BATCHED_2M_CMDS) + /* + * 2MB pages are only supported with batching. If batching is for some + * reason disabled, do not use 2MB pages, since otherwise the legacy + * mechanism is used with 2MB pages, causing a failure. + */ + if ((b->capabilities & VMW_BALLOON_BATCHED_2M_CMDS) && + (b->capabilities & VMW_BALLOON_BATCHED_CMDS)) b->supported_page_sizes = 2; else b->supported_page_sizes = 1; From 1893974d5dd4578e21ca9132b17f67a83b1d7a3f Mon Sep 17 00:00:00 2001 From: Nadav Amit Date: Tue, 19 Jun 2018 16:00:26 -0700 Subject: [PATCH 932/970] vmw_balloon: VMCI_DOORBELL_SET does not check status commit ce664331b2487a5d244a51cbdd8cb54f866fbe5d upstream. When vmballoon_vmci_init() sets a doorbell using VMCI_DOORBELL_SET, for some reason it does not consider the status and looks at the result. However, the hypervisor does not update the result - it updates the status. This might cause VMCI doorbell not to be enabled, resulting in degraded performance. Fixes: 48e3d668b790 ("VMware balloon: Enable notification via VMCI") Cc: stable@vger.kernel.org Reviewed-by: Xavier Deguillard Signed-off-by: Nadav Amit Signed-off-by: Greg Kroah-Hartman --- drivers/misc/vmw_balloon.c | 35 ++++++++++++++++++----------------- 1 file changed, 18 insertions(+), 17 deletions(-) diff --git a/drivers/misc/vmw_balloon.c b/drivers/misc/vmw_balloon.c index b6ccd551c00e..8e739b29079e 100644 --- a/drivers/misc/vmw_balloon.c +++ b/drivers/misc/vmw_balloon.c @@ -1036,29 +1036,30 @@ static void vmballoon_vmci_cleanup(struct vmballoon *b) */ static int vmballoon_vmci_init(struct vmballoon *b) { - int error = 0; + unsigned long error, dummy; - if ((b->capabilities & VMW_BALLOON_SIGNALLED_WAKEUP_CMD) != 0) { - error = vmci_doorbell_create(&b->vmci_doorbell, - VMCI_FLAG_DELAYED_CB, - VMCI_PRIVILEGE_FLAG_RESTRICTED, - vmballoon_doorbell, b); + if ((b->capabilities & VMW_BALLOON_SIGNALLED_WAKEUP_CMD) == 0) + return 0; - if (error == VMCI_SUCCESS) { - VMWARE_BALLOON_CMD(VMCI_DOORBELL_SET, - b->vmci_doorbell.context, - b->vmci_doorbell.resource, error); - STATS_INC(b->stats.doorbell_set); - } - } + error = vmci_doorbell_create(&b->vmci_doorbell, VMCI_FLAG_DELAYED_CB, + VMCI_PRIVILEGE_FLAG_RESTRICTED, + vmballoon_doorbell, b); - if (error != 0) { - vmballoon_vmci_cleanup(b); + if (error != VMCI_SUCCESS) + goto fail; - return -EIO; - } + error = VMWARE_BALLOON_CMD(VMCI_DOORBELL_SET, b->vmci_doorbell.context, + b->vmci_doorbell.resource, dummy); + + STATS_INC(b->stats.doorbell_set); + + if (error != VMW_BALLOON_SUCCESS) + goto fail; return 0; +fail: + vmballoon_vmci_cleanup(b); + return -EIO; } /* From 604c8018bedad856c1d6477496616190be05db10 Mon Sep 17 00:00:00 2001 From: Nadav Amit Date: Tue, 19 Jun 2018 16:00:27 -0700 Subject: [PATCH 933/970] vmw_balloon: fix VMCI use when balloon built into kernel commit c3cc1b0fc27508da53fe955a3b23d03964410682 upstream. Currently, when all modules, including VMCI and VMware balloon are built into the kernel, the initialization of the balloon happens before the VMCI is probed. As a result, the balloon fails to initialize the VMCI doorbell, which it uses to get asynchronous requests for balloon size changes. The problem can be seen in the logs, in the form of the following message: "vmw_balloon: failed to initialize vmci doorbell" The driver would work correctly but slightly less efficiently, probing for requests periodically. This patch changes the balloon to be initialized using late_initcall() instead of module_init() to address this issue. It does not address a situation in which VMCI is built as a module and the balloon is built into the kernel. Fixes: 48e3d668b790 ("VMware balloon: Enable notification via VMCI") Cc: stable@vger.kernel.org Reviewed-by: Xavier Deguillard Signed-off-by: Nadav Amit Signed-off-by: Greg Kroah-Hartman --- drivers/misc/vmw_balloon.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/misc/vmw_balloon.c b/drivers/misc/vmw_balloon.c index 8e739b29079e..518e2dec2aa2 100644 --- a/drivers/misc/vmw_balloon.c +++ b/drivers/misc/vmw_balloon.c @@ -1297,7 +1297,14 @@ static int __init vmballoon_init(void) return 0; } -module_init(vmballoon_init); + +/* + * Using late_initcall() instead of module_init() allows the balloon to use the + * VMCI doorbell even when the balloon is built into the kernel. Otherwise the + * VMCI is probed only after the balloon is initialized. If the balloon is used + * as a module, late_initcall() is equivalent to module_init(). + */ +late_initcall(vmballoon_init); static void __exit vmballoon_exit(void) { From 4e935c2eaec5182ade1bac3d12b115e2f195a069 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Wed, 4 Jul 2018 11:05:55 +0200 Subject: [PATCH 934/970] rtc: omap: fix potential crash on power off commit 5c8b84f410b3819d14cb1ebf32e4b3714b5a6e0b upstream. Do not set the system power-off callback and omap power-off rtc pointer until we're done setting up our device to avoid leaving stale pointers around after a late probe error. Fixes: 97ea1906b3c2 ("rtc: omap: Support ext_wakeup configuration") Cc: stable # 4.9 Cc: Marcin Niestroj Cc: Tony Lindgren Signed-off-by: Johan Hovold Acked-by: Tony Lindgren Reviewed-by: Marcin Niestroj Signed-off-by: Alexandre Belloni Signed-off-by: Greg Kroah-Hartman --- drivers/rtc/rtc-omap.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/rtc/rtc-omap.c b/drivers/rtc/rtc-omap.c index 51e52446eacb..bd5ca548c265 100644 --- a/drivers/rtc/rtc-omap.c +++ b/drivers/rtc/rtc-omap.c @@ -817,13 +817,6 @@ static int omap_rtc_probe(struct platform_device *pdev) goto err; } - if (rtc->is_pmic_controller) { - if (!pm_power_off) { - omap_rtc_power_off_rtc = rtc; - pm_power_off = omap_rtc_power_off; - } - } - /* Support ext_wakeup pinconf */ rtc_pinctrl_desc.name = dev_name(&pdev->dev); @@ -833,6 +826,13 @@ static int omap_rtc_probe(struct platform_device *pdev) return PTR_ERR(rtc->pctldev); } + if (rtc->is_pmic_controller) { + if (!pm_power_off) { + omap_rtc_power_off_rtc = rtc; + pm_power_off = omap_rtc_power_off; + } + } + return 0; err: From dc697314056cbf0f4b9ff2432c617edbebce1107 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (VMware)" Date: Wed, 1 Aug 2018 15:40:57 -0400 Subject: [PATCH 935/970] tracing: Do not call start/stop() functions when tracing_on does not change commit f143641bfef9a4a60c57af30de26c63057e7e695 upstream. Currently, when one echo's in 1 into tracing_on, the current tracer's "start()" function is executed, even if tracing_on was already one. This can lead to strange side effects. One being that if the hwlat tracer is enabled, and someone does "echo 1 > tracing_on" into tracing_on, the hwlat tracer's start() function is called again which will recreate another kernel thread, and make it unable to remove the old one. Link: http://lkml.kernel.org/r/1533120354-22923-1-git-send-email-erica.bugden@linutronix.de Cc: stable@vger.kernel.org Fixes: 2df8f8a6a897e ("tracing: Fix regression with irqsoff tracer and tracing_on file") Reported-by: Erica Bugden Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Greg Kroah-Hartman --- kernel/trace/trace.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 148c2210a2b8..a47339b156ce 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -6920,7 +6920,9 @@ rb_simple_write(struct file *filp, const char __user *ubuf, if (buffer) { mutex_lock(&trace_types_lock); - if (val) { + if (!!val == tracer_tracing_is_on(tr)) { + val = 0; /* do nothing */ + } else if (val) { tracer_tracing_on(tr); if (tr->current_trace->start) tr->current_trace->start(tr); From 941ca8dbf7748d0998858a6e0623b6b9b9b0de7a Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (VMware)" Date: Thu, 16 Aug 2018 16:08:37 -0400 Subject: [PATCH 936/970] tracing/blktrace: Fix to allow setting same value commit 757d9140072054528b13bbe291583d9823cde195 upstream. Masami Hiramatsu reported: Current trace-enable attribute in sysfs returns an error if user writes the same setting value as current one, e.g. # cat /sys/block/sda/trace/enable 0 # echo 0 > /sys/block/sda/trace/enable bash: echo: write error: Invalid argument # echo 1 > /sys/block/sda/trace/enable # echo 1 > /sys/block/sda/trace/enable bash: echo: write error: Device or resource busy But this is not a preferred behavior, it should ignore if new setting is same as current one. This fixes the problem as below. # cat /sys/block/sda/trace/enable 0 # echo 0 > /sys/block/sda/trace/enable # echo 1 > /sys/block/sda/trace/enable # echo 1 > /sys/block/sda/trace/enable Link: http://lkml.kernel.org/r/20180816103802.08678002@gandalf.local.home Cc: Ingo Molnar Cc: Jens Axboe Cc: linux-block@vger.kernel.org Cc: stable@vger.kernel.org Fixes: cd649b8bb830d ("blktrace: remove sysfs_blk_trace_enable_show/store()") Reported-by: Masami Hiramatsu Tested-by: Masami Hiramatsu Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman --- kernel/trace/blktrace.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c index 4e17d55ba127..bfa8bb3a6e19 100644 --- a/kernel/trace/blktrace.c +++ b/kernel/trace/blktrace.c @@ -1720,6 +1720,10 @@ static ssize_t sysfs_blk_trace_attr_store(struct device *dev, mutex_lock(&bdev->bd_mutex); if (attr == &dev_attr_enable) { + if (!!value == !!q->blk_trace) { + ret = 0; + goto out_unlock_bdev; + } if (value) ret = blk_trace_setup_queue(q, bdev); else From 0ef606320567f081475f57224639efb40bfddd69 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (VMware)" Date: Thu, 9 Aug 2018 15:37:59 -0400 Subject: [PATCH 937/970] uprobes: Use synchronize_rcu() not synchronize_sched() commit 016f8ffc48cb01d1e7701649c728c5d2e737d295 upstream. While debugging another bug, I was looking at all the synchronize*() functions being used in kernel/trace, and noticed that trace_uprobes was using synchronize_sched(), with a comment to synchronize with {u,ret}_probe_trace_func(). When looking at those functions, the data is protected with "rcu_read_lock()" and not with "rcu_read_lock_sched()". This is using the wrong synchronize_*() function. Link: http://lkml.kernel.org/r/20180809160553.469e1e32@gandalf.local.home Cc: stable@vger.kernel.org Fixes: 70ed91c6ec7f8 ("tracing/uprobes: Support ftrace_event_file base multibuffer") Acked-by: Oleg Nesterov Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Greg Kroah-Hartman --- kernel/trace/trace_uprobe.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c index 788262984818..f0ab801a6437 100644 --- a/kernel/trace/trace_uprobe.c +++ b/kernel/trace/trace_uprobe.c @@ -969,7 +969,7 @@ probe_event_disable(struct trace_uprobe *tu, struct trace_event_file *file) list_del_rcu(&link->list); /* synchronize with u{,ret}probe_trace_func */ - synchronize_sched(); + synchronize_rcu(); kfree(link); if (!list_empty(&tu->tp.files)) From 262f38faadc43e844d849b65337c535023ccd388 Mon Sep 17 00:00:00 2001 From: Rafael David Tinoco Date: Fri, 6 Jul 2018 14:28:33 -0300 Subject: [PATCH 938/970] mfd: hi655x: Fix regmap area declared size for hi655x commit 6afebb70ee7a4bde106dc1a875e7ac7997248f84 upstream. Fixes https://bugs.linaro.org/show_bug.cgi?id=3903 LTP Functional tests have caused a bad paging request when triggering the regmap_read_debugfs() logic of the device PMIC Hi6553 (reading regmap/f8000000.pmic/registers file during read_all test): Unable to handle kernel paging request at virtual address ffff0 [ffff00000984e000] pgd=0000000077ffe803, pud=0000000077ffd803,0 Internal error: Oops: 96000007 [#1] SMP ... Hardware name: HiKey Development Board (DT) ... Call trace: regmap_mmio_read8+0x24/0x40 regmap_mmio_read+0x48/0x70 _regmap_bus_reg_read+0x38/0x48 _regmap_read+0x68/0x170 regmap_read+0x50/0x78 regmap_read_debugfs+0x1a0/0x308 regmap_map_read_file+0x48/0x58 full_proxy_read+0x68/0x98 __vfs_read+0x48/0x80 vfs_read+0x94/0x150 SyS_read+0x6c/0xd8 el0_svc_naked+0x30/0x34 Code: aa1e03e0 d503201f f9400280 8b334000 (39400000) Investigations have showed that, when triggered by debugfs read() handler, the mmio regmap logic was reading a bigger (16k) register area than the one mapped by devm_ioremap_resource() during hi655x-pmic probe time (4k). This commit changes hi655x's max register, according to HW specs, to be the same as the one declared in the pmic device in hi6220's dts, fixing the issue. Cc: #v4.9 #v4.14 #v4.16 #v4.17 Signed-off-by: Rafael David Tinoco Signed-off-by: Lee Jones Signed-off-by: Greg Kroah-Hartman --- drivers/mfd/hi655x-pmic.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/mfd/hi655x-pmic.c b/drivers/mfd/hi655x-pmic.c index 0fc62995695b..11347a3e6d40 100644 --- a/drivers/mfd/hi655x-pmic.c +++ b/drivers/mfd/hi655x-pmic.c @@ -49,7 +49,7 @@ static struct regmap_config hi655x_regmap_config = { .reg_bits = 32, .reg_stride = HI655X_STRIDE, .val_bits = 8, - .max_register = HI655X_BUS_ADDR(0xFFF), + .max_register = HI655X_BUS_ADDR(0x400) - HI655X_STRIDE, }; static struct resource pwrkey_resources[] = { From 3ac733eb641d974b6072c9817bd382838139979d Mon Sep 17 00:00:00 2001 From: Tomas Bortoli Date: Fri, 27 Jul 2018 13:05:58 +0200 Subject: [PATCH 939/970] 9p: fix multiple NULL-pointer-dereferences commit 10aa14527f458e9867cf3d2cc6b8cb0f6704448b upstream. Added checks to prevent GPFs from raising. Link: http://lkml.kernel.org/r/20180727110558.5479-1-tomasbortoli@gmail.com Signed-off-by: Tomas Bortoli Reported-by: syzbot+1a262da37d3bead15c39@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Signed-off-by: Dominique Martinet Signed-off-by: Greg Kroah-Hartman --- net/9p/trans_fd.c | 5 ++++- net/9p/trans_rdma.c | 3 +++ net/9p/trans_virtio.c | 3 +++ 3 files changed, 10 insertions(+), 1 deletion(-) diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c index f8dafa8fa70b..2b543532e2f1 100644 --- a/net/9p/trans_fd.c +++ b/net/9p/trans_fd.c @@ -939,7 +939,7 @@ p9_fd_create_tcp(struct p9_client *client, const char *addr, char *args) if (err < 0) return err; - if (valid_ipaddr4(addr) < 0) + if (addr == NULL || valid_ipaddr4(addr) < 0) return -EINVAL; csocket = NULL; @@ -987,6 +987,9 @@ p9_fd_create_unix(struct p9_client *client, const char *addr, char *args) csocket = NULL; + if (addr == NULL) + return -EINVAL; + if (strlen(addr) >= UNIX_PATH_MAX) { pr_err("%s (%d): address too long: %s\n", __func__, task_pid_nr(current), addr); diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c index 553ed4ecb6a0..5a2ad4707463 100644 --- a/net/9p/trans_rdma.c +++ b/net/9p/trans_rdma.c @@ -622,6 +622,9 @@ rdma_create_trans(struct p9_client *client, const char *addr, char *args) struct rdma_conn_param conn_param; struct ib_qp_init_attr qp_attr; + if (addr == NULL) + return -EINVAL; + /* Parse the transport specific mount options */ err = parse_opts(args, &opts); if (err < 0) diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c index 73c700a55f02..da0d3b257459 100644 --- a/net/9p/trans_virtio.c +++ b/net/9p/trans_virtio.c @@ -651,6 +651,9 @@ p9_virtio_create(struct p9_client *client, const char *devname, char *args) int ret = -ENOENT; int found = 0; + if (devname == NULL) + return -EINVAL; + mutex_lock(&virtio_9p_lock); list_for_each_entry(chan, &virtio_chan_list, chan_list) { if (!strncmp(devname, chan->tag, chan->tag_len) && From 18c5d285a9d4a878313116c1e7095ea1c0869ebe Mon Sep 17 00:00:00 2001 From: "zhangyi (F)" Date: Tue, 14 Aug 2018 10:34:42 +0800 Subject: [PATCH 940/970] PM / sleep: wakeup: Fix build error caused by missing SRCU support commit 3df6f61fff49632492490fb6e42646b803a9958a upstream. Commit ea0212f40c6 (power: auto select CONFIG_SRCU) made the code in drivers/base/power/wakeup.c use SRCU instead of RCU, but it forgot to select CONFIG_SRCU in Kconfig, which leads to the following build error if CONFIG_SRCU is not selected somewhere else: drivers/built-in.o: In function `wakeup_source_remove': (.text+0x3c6fc): undefined reference to `synchronize_srcu' drivers/built-in.o: In function `pm_print_active_wakeup_sources': (.text+0x3c7a8): undefined reference to `__srcu_read_lock' drivers/built-in.o: In function `pm_print_active_wakeup_sources': (.text+0x3c84c): undefined reference to `__srcu_read_unlock' drivers/built-in.o: In function `device_wakeup_arm_wake_irqs': (.text+0x3d1d8): undefined reference to `__srcu_read_lock' drivers/built-in.o: In function `device_wakeup_arm_wake_irqs': (.text+0x3d228): undefined reference to `__srcu_read_unlock' drivers/built-in.o: In function `device_wakeup_disarm_wake_irqs': (.text+0x3d24c): undefined reference to `__srcu_read_lock' drivers/built-in.o: In function `device_wakeup_disarm_wake_irqs': (.text+0x3d29c): undefined reference to `__srcu_read_unlock' drivers/built-in.o:(.data+0x4158): undefined reference to `process_srcu' Fix this error by selecting CONFIG_SRCU when PM_SLEEP is enabled. Fixes: ea0212f40c6 (power: auto select CONFIG_SRCU) Cc: 4.2+ # 4.2+ Signed-off-by: zhangyi (F) [ rjw: Minor subject/changelog fixups ] Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman --- kernel/power/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig index e8517b63eb37..dd2b5a4d89a5 100644 --- a/kernel/power/Kconfig +++ b/kernel/power/Kconfig @@ -105,6 +105,7 @@ config PM_SLEEP def_bool y depends on SUSPEND || HIBERNATE_CALLBACKS select PM + select SRCU config PM_SLEEP_SMP def_bool y From 8729412ca3527c82e6daae2ab3c3e7e929c1b5de Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Wed, 22 Aug 2018 16:43:39 +0200 Subject: [PATCH 941/970] KVM: VMX: fixes for vmentry_l1d_flush module parameter commit 0027ff2a75f9dcf0537ac0a65c5840b0e21a4950 upstream. Two bug fixes: 1) missing entries in the l1d_param array; this can cause a host crash if an access attempts to reach the missing entry. Future-proof the get function against any overflows as well. However, the two entries VMENTER_L1D_FLUSH_EPT_DISABLED and VMENTER_L1D_FLUSH_NOT_REQUIRED must not be accepted by the parse function, so disable them there. 2) invalid values must be rejected even if the CPU does not have the bug, so test for them before checking boot_cpu_has(X86_BUG_L1TF) ... and a small refactoring, since the .cmd field is redundant with the index in the array. Reported-by: Bandan Das Cc: stable@vger.kernel.org Fixes: a7b9020b06ec6d7c3f3b0d4ef1a9eba12654f4f7 Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx.c | 26 ++++++++++++++++---------- 1 file changed, 16 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 8e4ac0a91309..8888d894bf39 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -198,12 +198,14 @@ static enum vmx_l1d_flush_state __read_mostly vmentry_l1d_flush_param = VMENTER_ static const struct { const char *option; - enum vmx_l1d_flush_state cmd; + bool for_parse; } vmentry_l1d_param[] = { - {"auto", VMENTER_L1D_FLUSH_AUTO}, - {"never", VMENTER_L1D_FLUSH_NEVER}, - {"cond", VMENTER_L1D_FLUSH_COND}, - {"always", VMENTER_L1D_FLUSH_ALWAYS}, + [VMENTER_L1D_FLUSH_AUTO] = {"auto", true}, + [VMENTER_L1D_FLUSH_NEVER] = {"never", true}, + [VMENTER_L1D_FLUSH_COND] = {"cond", true}, + [VMENTER_L1D_FLUSH_ALWAYS] = {"always", true}, + [VMENTER_L1D_FLUSH_EPT_DISABLED] = {"EPT disabled", false}, + [VMENTER_L1D_FLUSH_NOT_REQUIRED] = {"not required", false}, }; #define L1D_CACHE_ORDER 4 @@ -287,8 +289,9 @@ static int vmentry_l1d_flush_parse(const char *s) if (s) { for (i = 0; i < ARRAY_SIZE(vmentry_l1d_param); i++) { - if (sysfs_streq(s, vmentry_l1d_param[i].option)) - return vmentry_l1d_param[i].cmd; + if (vmentry_l1d_param[i].for_parse && + sysfs_streq(s, vmentry_l1d_param[i].option)) + return i; } } return -EINVAL; @@ -298,13 +301,13 @@ static int vmentry_l1d_flush_set(const char *s, const struct kernel_param *kp) { int l1tf, ret; - if (!boot_cpu_has(X86_BUG_L1TF)) - return 0; - l1tf = vmentry_l1d_flush_parse(s); if (l1tf < 0) return l1tf; + if (!boot_cpu_has(X86_BUG_L1TF)) + return 0; + /* * Has vmx_init() run already? If not then this is the pre init * parameter parsing. In that case just store the value and let @@ -324,6 +327,9 @@ static int vmentry_l1d_flush_set(const char *s, const struct kernel_param *kp) static int vmentry_l1d_flush_get(char *s, const struct kernel_param *kp) { + if (WARN_ON_ONCE(l1tf_vmx_mitigation >= ARRAY_SIZE(vmentry_l1d_param))) + return sprintf(s, "???\n"); + return sprintf(s, "%s\n", vmentry_l1d_param[l1tf_vmx_mitigation].option); } From e996a24db2087ce1b3ad2e41fa2c2b05d1de8652 Mon Sep 17 00:00:00 2001 From: Max Filippov Date: Fri, 10 Aug 2018 20:43:48 -0700 Subject: [PATCH 942/970] xtensa: limit offsets in __loop_cache_{all,page} commit be75de25251f7cf3e399ca1f584716a95510d24a upstream. When building kernel for xtensa cores with big cache lines (e.g. 128 bytes or more) __loop_cache_all and __loop_cache_page may generate assembly instructions with immediate fields that are too big. This results in the following build errors: arch/xtensa/mm/misc.S: Assembler messages: arch/xtensa/mm/misc.S:464: Error: operand 2 of 'diwbi' has invalid value '256' arch/xtensa/mm/misc.S:464: Error: operand 2 of 'diwbi' has invalid value '384' arch/xtensa/kernel/head.S: Assembler messages: arch/xtensa/kernel/head.S:172: Error: operand 2 of 'diu' has invalid value '256' arch/xtensa/kernel/head.S:172: Error: operand 2 of 'diu' has invalid value '384' arch/xtensa/kernel/head.S:176: Error: operand 2 of 'iiu' has invalid value '256' arch/xtensa/kernel/head.S:176: Error: operand 2 of 'iiu' has invalid value '384' arch/xtensa/kernel/head.S:255: Error: operand 2 of 'diwb' has invalid value '256' arch/xtensa/kernel/head.S:255: Error: operand 2 of 'diwb' has invalid value '384' Add parameter max_immed to these macros and use it to limit values of immediate operands. Extract common code of these macros into the new macro __loop_cache_unroll. Cc: stable@vger.kernel.org Signed-off-by: Max Filippov Signed-off-by: Greg Kroah-Hartman --- arch/xtensa/include/asm/cacheasm.h | 65 ++++++++++++++++++------------ 1 file changed, 40 insertions(+), 25 deletions(-) diff --git a/arch/xtensa/include/asm/cacheasm.h b/arch/xtensa/include/asm/cacheasm.h index 2041abb10a23..2c73b4571226 100644 --- a/arch/xtensa/include/asm/cacheasm.h +++ b/arch/xtensa/include/asm/cacheasm.h @@ -31,16 +31,32 @@ * */ - .macro __loop_cache_all ar at insn size line_width + + .macro __loop_cache_unroll ar at insn size line_width max_immed + + .if (1 << (\line_width)) > (\max_immed) + .set _reps, 1 + .elseif (2 << (\line_width)) > (\max_immed) + .set _reps, 2 + .else + .set _reps, 4 + .endif + + __loopi \ar, \at, \size, (_reps << (\line_width)) + .set _index, 0 + .rep _reps + \insn \ar, _index << (\line_width) + .set _index, _index + 1 + .endr + __endla \ar, \at, _reps << (\line_width) + + .endm + + + .macro __loop_cache_all ar at insn size line_width max_immed movi \ar, 0 - - __loopi \ar, \at, \size, (4 << (\line_width)) - \insn \ar, 0 << (\line_width) - \insn \ar, 1 << (\line_width) - \insn \ar, 2 << (\line_width) - \insn \ar, 3 << (\line_width) - __endla \ar, \at, 4 << (\line_width) + __loop_cache_unroll \ar, \at, \insn, \size, \line_width, \max_immed .endm @@ -57,14 +73,9 @@ .endm - .macro __loop_cache_page ar at insn line_width + .macro __loop_cache_page ar at insn line_width max_immed - __loopi \ar, \at, PAGE_SIZE, 4 << (\line_width) - \insn \ar, 0 << (\line_width) - \insn \ar, 1 << (\line_width) - \insn \ar, 2 << (\line_width) - \insn \ar, 3 << (\line_width) - __endla \ar, \at, 4 << (\line_width) + __loop_cache_unroll \ar, \at, \insn, PAGE_SIZE, \line_width, \max_immed .endm @@ -72,7 +83,8 @@ .macro ___unlock_dcache_all ar at #if XCHAL_DCACHE_LINE_LOCKABLE && XCHAL_DCACHE_SIZE - __loop_cache_all \ar \at diu XCHAL_DCACHE_SIZE XCHAL_DCACHE_LINEWIDTH + __loop_cache_all \ar \at diu XCHAL_DCACHE_SIZE \ + XCHAL_DCACHE_LINEWIDTH 240 #endif .endm @@ -81,7 +93,8 @@ .macro ___unlock_icache_all ar at #if XCHAL_ICACHE_LINE_LOCKABLE && XCHAL_ICACHE_SIZE - __loop_cache_all \ar \at iiu XCHAL_ICACHE_SIZE XCHAL_ICACHE_LINEWIDTH + __loop_cache_all \ar \at iiu XCHAL_ICACHE_SIZE \ + XCHAL_ICACHE_LINEWIDTH 240 #endif .endm @@ -90,7 +103,8 @@ .macro ___flush_invalidate_dcache_all ar at #if XCHAL_DCACHE_SIZE - __loop_cache_all \ar \at diwbi XCHAL_DCACHE_SIZE XCHAL_DCACHE_LINEWIDTH + __loop_cache_all \ar \at diwbi XCHAL_DCACHE_SIZE \ + XCHAL_DCACHE_LINEWIDTH 240 #endif .endm @@ -99,7 +113,8 @@ .macro ___flush_dcache_all ar at #if XCHAL_DCACHE_SIZE - __loop_cache_all \ar \at diwb XCHAL_DCACHE_SIZE XCHAL_DCACHE_LINEWIDTH + __loop_cache_all \ar \at diwb XCHAL_DCACHE_SIZE \ + XCHAL_DCACHE_LINEWIDTH 240 #endif .endm @@ -109,7 +124,7 @@ #if XCHAL_DCACHE_SIZE __loop_cache_all \ar \at dii __stringify(DCACHE_WAY_SIZE) \ - XCHAL_DCACHE_LINEWIDTH + XCHAL_DCACHE_LINEWIDTH 1020 #endif .endm @@ -119,7 +134,7 @@ #if XCHAL_ICACHE_SIZE __loop_cache_all \ar \at iii __stringify(ICACHE_WAY_SIZE) \ - XCHAL_ICACHE_LINEWIDTH + XCHAL_ICACHE_LINEWIDTH 1020 #endif .endm @@ -166,7 +181,7 @@ .macro ___flush_invalidate_dcache_page ar as #if XCHAL_DCACHE_SIZE - __loop_cache_page \ar \as dhwbi XCHAL_DCACHE_LINEWIDTH + __loop_cache_page \ar \as dhwbi XCHAL_DCACHE_LINEWIDTH 1020 #endif .endm @@ -175,7 +190,7 @@ .macro ___flush_dcache_page ar as #if XCHAL_DCACHE_SIZE - __loop_cache_page \ar \as dhwb XCHAL_DCACHE_LINEWIDTH + __loop_cache_page \ar \as dhwb XCHAL_DCACHE_LINEWIDTH 1020 #endif .endm @@ -184,7 +199,7 @@ .macro ___invalidate_dcache_page ar as #if XCHAL_DCACHE_SIZE - __loop_cache_page \ar \as dhi XCHAL_DCACHE_LINEWIDTH + __loop_cache_page \ar \as dhi XCHAL_DCACHE_LINEWIDTH 1020 #endif .endm @@ -193,7 +208,7 @@ .macro ___invalidate_icache_page ar as #if XCHAL_ICACHE_SIZE - __loop_cache_page \ar \as ihi XCHAL_ICACHE_LINEWIDTH + __loop_cache_page \ar \as ihi XCHAL_ICACHE_LINEWIDTH 1020 #endif .endm From 7f2163b56e5eab5d63aa68084fecd63052389999 Mon Sep 17 00:00:00 2001 From: Max Filippov Date: Fri, 10 Aug 2018 22:21:22 -0700 Subject: [PATCH 943/970] xtensa: increase ranges in ___invalidate_{i,d}cache_all commit fec3259c9f747c039f90e99570540114c8d81a14 upstream. Cache invalidation macros use cache line size to iterate over invalidated cache lines, assuming that all cache ways are invalidated by single instruction, but xtensa ISA recommends to not assume that for future compatibility: In some implementations all ways at index Addry-1..z are invalidated regardless of the specified way, but for future compatibility this behavior should not be assumed. Iterate over all cache ways in ___invalidate_icache_all and ___invalidate_dcache_all. Cc: stable@vger.kernel.org Signed-off-by: Max Filippov Signed-off-by: Greg Kroah-Hartman --- arch/xtensa/include/asm/cacheasm.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/xtensa/include/asm/cacheasm.h b/arch/xtensa/include/asm/cacheasm.h index 2c73b4571226..34545ecfdd6b 100644 --- a/arch/xtensa/include/asm/cacheasm.h +++ b/arch/xtensa/include/asm/cacheasm.h @@ -123,7 +123,7 @@ .macro ___invalidate_dcache_all ar at #if XCHAL_DCACHE_SIZE - __loop_cache_all \ar \at dii __stringify(DCACHE_WAY_SIZE) \ + __loop_cache_all \ar \at dii XCHAL_DCACHE_SIZE \ XCHAL_DCACHE_LINEWIDTH 1020 #endif @@ -133,7 +133,7 @@ .macro ___invalidate_icache_all ar at #if XCHAL_ICACHE_SIZE - __loop_cache_all \ar \at iii __stringify(ICACHE_WAY_SIZE) \ + __loop_cache_all \ar \at iii XCHAL_ICACHE_SIZE \ XCHAL_ICACHE_LINEWIDTH 1020 #endif From 9ba1a9ea7cb781fa293831ad77b4632b6db90f8c Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Wed, 4 Jul 2018 12:59:58 +0300 Subject: [PATCH 944/970] pnfs/blocklayout: off by one in bl_map_stripe() commit 0914bb965e38a055e9245637aed117efbe976e91 upstream. "dev->nr_children" is the number of children which were parsed successfully in bl_parse_stripe(). It could be all of them and then, in that case, it is equal to v->stripe.volumes_count. Either way, the > should be >= so that we don't go beyond the end of what we're supposed to. Fixes: 5c83746a0cf2 ("pnfs/blocklayout: in-kernel GETDEVICEINFO XDR parsing") Signed-off-by: Dan Carpenter Reviewed-by: Christoph Hellwig Cc: stable@vger.kernel.org # 3.17+ Signed-off-by: Anna Schumaker Signed-off-by: Greg Kroah-Hartman --- fs/nfs/blocklayout/dev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/nfs/blocklayout/dev.c b/fs/nfs/blocklayout/dev.c index a69ef4e9c24c..d6e4191276c0 100644 --- a/fs/nfs/blocklayout/dev.c +++ b/fs/nfs/blocklayout/dev.c @@ -203,7 +203,7 @@ static bool bl_map_stripe(struct pnfs_block_dev *dev, u64 offset, chunk = div_u64(offset, dev->chunk_size); div_u64_rem(chunk, dev->nr_children, &chunk_idx); - if (chunk_idx > dev->nr_children) { + if (chunk_idx >= dev->nr_children) { dprintk("%s: invalid chunk idx %d (%lld/%lld)\n", __func__, chunk_idx, offset, dev->chunk_size); /* error, should not happen */ From b5bc39d4ffb53b0fb335d96d3abf1a4585019c0a Mon Sep 17 00:00:00 2001 From: Bill Baker Date: Tue, 19 Jun 2018 16:24:58 -0500 Subject: [PATCH 945/970] NFSv4 client live hangs after live data migration recovery commit 0f90be132cbf1537d87a6a8b9e80867adac892f6 upstream. After a live data migration event at the NFS server, the client may send I/O requests to the wrong server, causing a live hang due to repeated recovery events. On the wire, this will appear as an I/O request failing with NFS4ERR_BADSESSION, followed by successful CREATE_SESSION, repeatedly. NFS4ERR_BADSSESSION is returned because the session ID being used was issued by the other server and is not valid at the old server. The failure is caused by async worker threads having cached the transport (xprt) in the rpc_task structure. After the migration recovery completes, the task is redispatched and the task resends the request to the wrong server based on the old value still present in tk_xprt. The solution is to recompute the tk_xprt field of the rpc_task structure so that the request goes to the correct server. Signed-off-by: Bill Baker Reviewed-by: Chuck Lever Tested-by: Helen Chao Fixes: fb43d17210ba ("SUNRPC: Use the multipath iterator to assign a ...") Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Anna Schumaker Signed-off-by: Greg Kroah-Hartman --- fs/nfs/nfs4proc.c | 9 ++++++++- include/linux/sunrpc/clnt.h | 1 + net/sunrpc/clnt.c | 28 ++++++++++++++++++++-------- 3 files changed, 29 insertions(+), 9 deletions(-) diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index cf5fdc25289a..e7ca62a86dab 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -541,8 +541,15 @@ nfs4_async_handle_exception(struct rpc_task *task, struct nfs_server *server, ret = -EIO; return ret; out_retry: - if (ret == 0) + if (ret == 0) { exception->retry = 1; + /* + * For NFS4ERR_MOVED, the client transport will need to + * be recomputed after migration recovery has completed. + */ + if (errorcode == -NFS4ERR_MOVED) + rpc_task_release_transport(task); + } return ret; } diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h index 333ad11b3dd9..44161a4c124c 100644 --- a/include/linux/sunrpc/clnt.h +++ b/include/linux/sunrpc/clnt.h @@ -155,6 +155,7 @@ int rpc_switch_client_transport(struct rpc_clnt *, void rpc_shutdown_client(struct rpc_clnt *); void rpc_release_client(struct rpc_clnt *); +void rpc_task_release_transport(struct rpc_task *); void rpc_task_release_client(struct rpc_task *); int rpcb_create_local(struct net *); diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index b2ae4f150ec6..244eac1bd648 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -965,10 +965,20 @@ struct rpc_clnt *rpc_bind_new_program(struct rpc_clnt *old, } EXPORT_SYMBOL_GPL(rpc_bind_new_program); +void rpc_task_release_transport(struct rpc_task *task) +{ + struct rpc_xprt *xprt = task->tk_xprt; + + if (xprt) { + task->tk_xprt = NULL; + xprt_put(xprt); + } +} +EXPORT_SYMBOL_GPL(rpc_task_release_transport); + void rpc_task_release_client(struct rpc_task *task) { struct rpc_clnt *clnt = task->tk_client; - struct rpc_xprt *xprt = task->tk_xprt; if (clnt != NULL) { /* Remove from client task list */ @@ -979,12 +989,14 @@ void rpc_task_release_client(struct rpc_task *task) rpc_release_client(clnt); } + rpc_task_release_transport(task); +} - if (xprt != NULL) { - task->tk_xprt = NULL; - - xprt_put(xprt); - } +static +void rpc_task_set_transport(struct rpc_task *task, struct rpc_clnt *clnt) +{ + if (!task->tk_xprt) + task->tk_xprt = xprt_iter_get_next(&clnt->cl_xpi); } static @@ -992,8 +1004,7 @@ void rpc_task_set_client(struct rpc_task *task, struct rpc_clnt *clnt) { if (clnt != NULL) { - if (task->tk_xprt == NULL) - task->tk_xprt = xprt_iter_get_next(&clnt->cl_xpi); + rpc_task_set_transport(task, clnt); task->tk_client = clnt; atomic_inc(&clnt->cl_count); if (clnt->cl_softrtry) @@ -1550,6 +1561,7 @@ call_start(struct rpc_task *task) task->tk_msg.rpc_proc->p_count++; clnt->cl_stats->rpccnt++; task->tk_action = call_reserve; + rpc_task_set_transport(task, clnt); } /* From ba99ff79ac84a768cc9163897ec6b616bd14858f Mon Sep 17 00:00:00 2001 From: Jon Hunter Date: Tue, 3 Jul 2018 09:59:47 +0100 Subject: [PATCH 946/970] ARM: tegra: Fix Tegra30 Cardhu PCA954x reset commit 6e1811900b6fe6f2b4665dba6bd6ed32c6b98575 upstream. On all versions of Tegra30 Cardhu, the reset signal to the NXP PCA9546 I2C mux is connected to the Tegra GPIO BB0. Currently, this pin on the Tegra is not configured as a GPIO but as a special-function IO (SFIO) that is multiplexing the pin to an I2S controller. On exiting system suspend, I2C commands sent to the PCA9546 are failing because there is no ACK. Although it is not possible to see exactly what is happening to the reset during suspend, by ensuring it is configured as a GPIO and driven high, to de-assert the reset, the failures are no longer seen. Please note that this GPIO is also used to drive the reset signal going to the camera connector on the board. However, given that there is no camera support currently for Cardhu, this should not have any impact. Fixes: 40431d16ff11 ("ARM: tegra: enable PCA9546 on Cardhu") Cc: stable@vger.kernel.org Signed-off-by: Jon Hunter Signed-off-by: Thierry Reding Signed-off-by: Greg Kroah-Hartman --- arch/arm/boot/dts/tegra30-cardhu.dtsi | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm/boot/dts/tegra30-cardhu.dtsi b/arch/arm/boot/dts/tegra30-cardhu.dtsi index f11012bb58cc..cfcf5dcb51a8 100644 --- a/arch/arm/boot/dts/tegra30-cardhu.dtsi +++ b/arch/arm/boot/dts/tegra30-cardhu.dtsi @@ -205,6 +205,7 @@ #address-cells = <1>; #size-cells = <0>; reg = <0x70>; + reset-gpio = <&gpio TEGRA_GPIO(BB, 0) GPIO_ACTIVE_LOW>; }; }; From 40b08cdac9ae0877dd39d2dcfe2f8c7c68c8ce59 Mon Sep 17 00:00:00 2001 From: Yannik Sembritzki Date: Thu, 16 Aug 2018 14:05:10 +0100 Subject: [PATCH 947/970] Replace magic for trusting the secondary keyring with #define commit 817aef260037f33ee0f44c17fe341323d3aebd6d upstream. Replace the use of a magic number that indicates that verify_*_signature() should use the secondary keyring with a symbol. Signed-off-by: Yannik Sembritzki Signed-off-by: David Howells Cc: keyrings@vger.kernel.org Cc: linux-security-module@vger.kernel.org Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- certs/system_keyring.c | 3 ++- crypto/asymmetric_keys/pkcs7_key_type.c | 2 +- include/linux/verification.h | 6 ++++++ 3 files changed, 9 insertions(+), 2 deletions(-) diff --git a/certs/system_keyring.c b/certs/system_keyring.c index 50979d6dcecd..247665054691 100644 --- a/certs/system_keyring.c +++ b/certs/system_keyring.c @@ -14,6 +14,7 @@ #include #include #include +#include #include #include #include @@ -207,7 +208,7 @@ int verify_pkcs7_signature(const void *data, size_t len, if (!trusted_keys) { trusted_keys = builtin_trusted_keys; - } else if (trusted_keys == (void *)1UL) { + } else if (trusted_keys == VERIFY_USE_SECONDARY_KEYRING) { #ifdef CONFIG_SECONDARY_TRUSTED_KEYRING trusted_keys = secondary_trusted_keys; #else diff --git a/crypto/asymmetric_keys/pkcs7_key_type.c b/crypto/asymmetric_keys/pkcs7_key_type.c index 1063b644efcd..b2aa925a84bc 100644 --- a/crypto/asymmetric_keys/pkcs7_key_type.c +++ b/crypto/asymmetric_keys/pkcs7_key_type.c @@ -62,7 +62,7 @@ static int pkcs7_preparse(struct key_preparsed_payload *prep) return verify_pkcs7_signature(NULL, 0, prep->data, prep->datalen, - (void *)1UL, usage, + VERIFY_USE_SECONDARY_KEYRING, usage, pkcs7_view_content, prep); } diff --git a/include/linux/verification.h b/include/linux/verification.h index a10549a6c7cd..cfa4730d607a 100644 --- a/include/linux/verification.h +++ b/include/linux/verification.h @@ -12,6 +12,12 @@ #ifndef _LINUX_VERIFICATION_H #define _LINUX_VERIFICATION_H +/* + * Indicate that both builtin trusted keys and secondary trusted keys + * should be used. + */ +#define VERIFY_USE_SECONDARY_KEYRING ((struct key *)1UL) + /* * The use to which an asymmetric key is being put. */ From 7c439bc2220d624fc7edba4c9bdc1d65b4f47f26 Mon Sep 17 00:00:00 2001 From: Yannik Sembritzki Date: Thu, 16 Aug 2018 14:05:23 +0100 Subject: [PATCH 948/970] Fix kexec forbidding kernels signed with keys in the secondary keyring to boot commit ea93102f32244e3f45c8b26260be77ed0cc1d16c upstream. The split of .system_keyring into .builtin_trusted_keys and .secondary_trusted_keys broke kexec, thereby preventing kernels signed by keys which are now in the secondary keyring from being kexec'd. Fix this by passing VERIFY_USE_SECONDARY_KEYRING to verify_pefile_signature(). Fixes: d3bfe84129f6 ("certs: Add a secondary system keyring that can be added to dynamically") Signed-off-by: Yannik Sembritzki Signed-off-by: David Howells Cc: kexec@lists.infradead.org Cc: keyrings@vger.kernel.org Cc: linux-security-module@vger.kernel.org Cc: stable@kernel.org Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/kexec-bzimage64.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c index 3407b148c240..490f9be3fda2 100644 --- a/arch/x86/kernel/kexec-bzimage64.c +++ b/arch/x86/kernel/kexec-bzimage64.c @@ -529,7 +529,7 @@ static int bzImage64_cleanup(void *loader_data) static int bzImage64_verify_sig(const char *kernel, unsigned long kernel_len) { return verify_pefile_signature(kernel, kernel_len, - NULL, + VERIFY_USE_SECONDARY_KEYRING, VERIFYING_KEXEC_PE_SIGNATURE); } #endif From 04d1d58c274996163c7c7d659336cedbade30761 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Wed, 22 Aug 2018 17:30:14 +0200 Subject: [PATCH 949/970] mm/tlb: Remove tlb_remove_table() non-concurrent condition commit a6f572084fbee8b30f91465f4a085d7a90901c57 upstream. Will noted that only checking mm_users is incorrect; we should also check mm_count in order to cover CPUs that have a lazy reference to this mm (and could do speculative TLB operations). If removing this turns out to be a performance issue, we can re-instate a more complete check, but in tlb_table_flush() eliding the call_rcu_sched(). Fixes: 267239116987 ("mm, powerpc: move the RCU page-table freeing into generic code") Reported-by: Will Deacon Signed-off-by: Peter Zijlstra (Intel) Acked-by: Rik van Riel Acked-by: Will Deacon Cc: Nicholas Piggin Cc: David Miller Cc: Martin Schwidefsky Cc: Michael Ellerman Cc: stable@kernel.org Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- mm/memory.c | 9 --------- 1 file changed, 9 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 0ff735601654..f3fef1df7402 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -373,15 +373,6 @@ void tlb_remove_table(struct mmu_gather *tlb, void *table) { struct mmu_table_batch **batch = &tlb->batch; - /* - * When there's less then two users of this mm there cannot be a - * concurrent page-table walk. - */ - if (atomic_read(&tlb->mm->mm_users) < 2) { - __tlb_remove_table(table); - return; - } - if (*batch == NULL) { *batch = (struct mmu_table_batch *)__get_free_page(GFP_NOWAIT | __GFP_NOWARN); if (*batch == NULL) { From eada1b2246fcc2be26d49108c680ebd199954fb6 Mon Sep 17 00:00:00 2001 From: Jacob Pan Date: Thu, 7 Jun 2018 09:56:59 -0700 Subject: [PATCH 950/970] iommu/vt-d: Add definitions for PFSID commit 0f725561e168485eff7277d683405c05b192f537 upstream. When SRIOV VF device IOTLB is invalidated, we need to provide the PF source ID such that IOMMU hardware can gauge the depth of invalidation queue which is shared among VFs. This is needed when device invalidation throttle (DIT) capability is supported. This patch adds bit definitions for checking and tracking PFSID. Signed-off-by: Jacob Pan Cc: stable@vger.kernel.org Cc: "Ashok Raj" Cc: "Lu Baolu" Signed-off-by: Joerg Roedel Signed-off-by: Greg Kroah-Hartman --- drivers/iommu/intel-iommu.c | 1 + include/linux/intel-iommu.h | 3 +++ 2 files changed, 4 insertions(+) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 1612d3a22d42..7d20f629cc32 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -421,6 +421,7 @@ struct device_domain_info { struct list_head global; /* link to global list */ u8 bus; /* PCI bus number */ u8 devfn; /* PCI devfn number */ + u16 pfsid; /* SRIOV physical function source ID */ u8 pasid_supported:3; u8 pasid_enabled:1; u8 pri_supported:1; diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index 23e129ef6726..0892615ce93d 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -125,6 +125,7 @@ static inline void dmar_writeq(void __iomem *addr, u64 val) * Extended Capability Register */ +#define ecap_dit(e) ((e >> 41) & 0x1) #define ecap_pasid(e) ((e >> 40) & 0x1) #define ecap_pss(e) ((e >> 35) & 0x1f) #define ecap_eafs(e) ((e >> 34) & 0x1) @@ -294,6 +295,7 @@ enum { #define QI_DEV_IOTLB_SID(sid) ((u64)((sid) & 0xffff) << 32) #define QI_DEV_IOTLB_QDEP(qdep) (((qdep) & 0x1f) << 16) #define QI_DEV_IOTLB_ADDR(addr) ((u64)(addr) & VTD_PAGE_MASK) +#define QI_DEV_IOTLB_PFSID(pfsid) (((u64)(pfsid & 0xf) << 12) | ((u64)(pfsid & 0xfff) << 52)) #define QI_DEV_IOTLB_SIZE 1 #define QI_DEV_IOTLB_MAX_INVS 32 @@ -318,6 +320,7 @@ enum { #define QI_DEV_EIOTLB_PASID(p) (((u64)p) << 32) #define QI_DEV_EIOTLB_SID(sid) ((u64)((sid) & 0xffff) << 16) #define QI_DEV_EIOTLB_QDEP(qd) ((u64)((qd) & 0x1f) << 4) +#define QI_DEV_EIOTLB_PFSID(pfsid) (((u64)(pfsid & 0xf) << 12) | ((u64)(pfsid & 0xfff) << 52)) #define QI_DEV_EIOTLB_MAX_INVS 32 #define QI_PGRP_IDX(idx) (((u64)(idx)) << 55) From b68377cb57f248cafc3b118e78adeb60a6d442d5 Mon Sep 17 00:00:00 2001 From: Jacob Pan Date: Thu, 7 Jun 2018 09:57:00 -0700 Subject: [PATCH 951/970] iommu/vt-d: Fix dev iotlb pfsid use commit 1c48db44924298ad0cb5a6386b88017539be8822 upstream. PFSID should be used in the invalidation descriptor for flushing device IOTLBs on SRIOV VFs. Signed-off-by: Jacob Pan Cc: stable@vger.kernel.org Cc: "Ashok Raj" Cc: "Lu Baolu" Signed-off-by: Joerg Roedel Signed-off-by: Greg Kroah-Hartman --- drivers/iommu/dmar.c | 6 +++--- drivers/iommu/intel-iommu.c | 17 ++++++++++++++++- include/linux/intel-iommu.h | 5 ++--- 3 files changed, 21 insertions(+), 7 deletions(-) diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c index 8c53748a769d..63110fbbb410 100644 --- a/drivers/iommu/dmar.c +++ b/drivers/iommu/dmar.c @@ -1328,8 +1328,8 @@ void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr, qi_submit_sync(&desc, iommu); } -void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 qdep, - u64 addr, unsigned mask) +void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid, + u16 qdep, u64 addr, unsigned mask) { struct qi_desc desc; @@ -1344,7 +1344,7 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 qdep, qdep = 0; desc.low = QI_DEV_IOTLB_SID(sid) | QI_DEV_IOTLB_QDEP(qdep) | - QI_DIOTLB_TYPE; + QI_DIOTLB_TYPE | QI_DEV_IOTLB_PFSID(pfsid); qi_submit_sync(&desc, iommu); } diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 7d20f629cc32..2558a381e118 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -1512,6 +1512,20 @@ static void iommu_enable_dev_iotlb(struct device_domain_info *info) return; pdev = to_pci_dev(info->dev); + /* For IOMMU that supports device IOTLB throttling (DIT), we assign + * PFSID to the invalidation desc of a VF such that IOMMU HW can gauge + * queue depth at PF level. If DIT is not set, PFSID will be treated as + * reserved, which should be set to 0. + */ + if (!ecap_dit(info->iommu->ecap)) + info->pfsid = 0; + else { + struct pci_dev *pf_pdev; + + /* pdev will be returned if device is not a vf */ + pf_pdev = pci_physfn(pdev); + info->pfsid = PCI_DEVID(pf_pdev->bus->number, pf_pdev->devfn); + } #ifdef CONFIG_INTEL_IOMMU_SVM /* The PCIe spec, in its wisdom, declares that the behaviour of @@ -1577,7 +1591,8 @@ static void iommu_flush_dev_iotlb(struct dmar_domain *domain, sid = info->bus << 8 | info->devfn; qdep = info->ats_qdep; - qi_flush_dev_iotlb(info->iommu, sid, qdep, addr, mask); + qi_flush_dev_iotlb(info->iommu, sid, info->pfsid, + qdep, addr, mask); } spin_unlock_irqrestore(&device_domain_lock, flags); } diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index 0892615ce93d..e353f6600b0b 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -466,9 +466,8 @@ extern void qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid, u8 fm, u64 type); extern void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr, unsigned int size_order, u64 type); -extern void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 qdep, - u64 addr, unsigned mask); - +extern void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid, + u16 qdep, u64 addr, unsigned mask); extern int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu); extern int dmar_ir_support(void); From d2f96e17ca75f183cfd50ff0b6655fb74e502a1e Mon Sep 17 00:00:00 2001 From: Al Viro Date: Sat, 13 May 2017 21:39:49 -0400 Subject: [PATCH 952/970] osf_getdomainname(): use copy_to_user() commit 9ba3eb5103cf56f0daaf07de4507df76e7813ed7 upstream. Signed-off-by: Al Viro Signed-off-by: Greg Kroah-Hartman --- arch/alpha/kernel/osf_sys.c | 23 +++++++++-------------- 1 file changed, 9 insertions(+), 14 deletions(-) diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c index 4f95577b0180..ddba64b46fe1 100644 --- a/arch/alpha/kernel/osf_sys.c +++ b/arch/alpha/kernel/osf_sys.c @@ -561,25 +561,20 @@ SYSCALL_DEFINE0(getdtablesize) */ SYSCALL_DEFINE2(osf_getdomainname, char __user *, name, int, namelen) { - unsigned len; - int i; + int len, err = 0; + char *kname; - if (!access_ok(VERIFY_WRITE, name, namelen)) - return -EFAULT; - - len = namelen; - if (len > 32) - len = 32; + if (namelen > 32) + namelen = 32; down_read(&uts_sem); - for (i = 0; i < len; ++i) { - __put_user(utsname()->domainname[i], name + i); - if (utsname()->domainname[i] == '\0') - break; - } + kname = utsname()->domainname; + len = strnlen(kname, namelen); + if (copy_to_user(name, kname, min(len + 1, namelen))) + err = -EFAULT; up_read(&uts_sem); - return 0; + return err; } /* From 55463c60b7d56e98936abfd092b4983dd010df50 Mon Sep 17 00:00:00 2001 From: Jann Horn Date: Mon, 25 Jun 2018 18:34:10 +0200 Subject: [PATCH 953/970] sys: don't hold uts_sem while accessing userspace memory commit 42a0cc3478584d4d63f68f2f5af021ddbea771fa upstream. Holding uts_sem as a writer while accessing userspace memory allows a namespace admin to stall all processes that attempt to take uts_sem. Instead, move data through stack buffers and don't access userspace memory while uts_sem is held. Cc: stable@vger.kernel.org Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Jann Horn Signed-off-by: Eric W. Biederman Signed-off-by: Greg Kroah-Hartman --- arch/alpha/kernel/osf_sys.c | 51 ++++++++--------- arch/sparc/kernel/sys_sparc_32.c | 22 +++++--- arch/sparc/kernel/sys_sparc_64.c | 20 ++++--- kernel/sys.c | 95 +++++++++++++++----------------- kernel/utsname_sysctl.c | 45 +++++++++------ 5 files changed, 121 insertions(+), 112 deletions(-) diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c index ddba64b46fe1..6e0d549eb3bb 100644 --- a/arch/alpha/kernel/osf_sys.c +++ b/arch/alpha/kernel/osf_sys.c @@ -526,24 +526,19 @@ SYSCALL_DEFINE4(osf_mount, unsigned long, typenr, const char __user *, path, SYSCALL_DEFINE1(osf_utsname, char __user *, name) { int error; + char tmp[5 * 32]; down_read(&uts_sem); - error = -EFAULT; - if (copy_to_user(name + 0, utsname()->sysname, 32)) - goto out; - if (copy_to_user(name + 32, utsname()->nodename, 32)) - goto out; - if (copy_to_user(name + 64, utsname()->release, 32)) - goto out; - if (copy_to_user(name + 96, utsname()->version, 32)) - goto out; - if (copy_to_user(name + 128, utsname()->machine, 32)) - goto out; + memcpy(tmp + 0 * 32, utsname()->sysname, 32); + memcpy(tmp + 1 * 32, utsname()->nodename, 32); + memcpy(tmp + 2 * 32, utsname()->release, 32); + memcpy(tmp + 3 * 32, utsname()->version, 32); + memcpy(tmp + 4 * 32, utsname()->machine, 32); + up_read(&uts_sem); - error = 0; - out: - up_read(&uts_sem); - return error; + if (copy_to_user(name, tmp, sizeof(tmp))) + return -EFAULT; + return 0; } SYSCALL_DEFINE0(getpagesize) @@ -563,18 +558,21 @@ SYSCALL_DEFINE2(osf_getdomainname, char __user *, name, int, namelen) { int len, err = 0; char *kname; + char tmp[32]; - if (namelen > 32) + if (namelen < 0 || namelen > 32) namelen = 32; down_read(&uts_sem); kname = utsname()->domainname; len = strnlen(kname, namelen); - if (copy_to_user(name, kname, min(len + 1, namelen))) - err = -EFAULT; + len = min(len + 1, namelen); + memcpy(tmp, kname, len); up_read(&uts_sem); - return err; + if (copy_to_user(name, tmp, len)) + return -EFAULT; + return 0; } /* @@ -736,13 +734,14 @@ SYSCALL_DEFINE3(osf_sysinfo, int, command, char __user *, buf, long, count) }; unsigned long offset; const char *res; - long len, err = -EINVAL; + long len; + char tmp[__NEW_UTS_LEN + 1]; offset = command-1; if (offset >= ARRAY_SIZE(sysinfo_table)) { /* Digital UNIX has a few unpublished interfaces here */ printk("sysinfo(%d)", command); - goto out; + return -EINVAL; } down_read(&uts_sem); @@ -750,13 +749,11 @@ SYSCALL_DEFINE3(osf_sysinfo, int, command, char __user *, buf, long, count) len = strlen(res)+1; if ((unsigned long)len > (unsigned long)count) len = count; - if (copy_to_user(buf, res, len)) - err = -EFAULT; - else - err = 0; + memcpy(tmp, res, len); up_read(&uts_sem); - out: - return err; + if (copy_to_user(buf, tmp, len)) + return -EFAULT; + return 0; } SYSCALL_DEFINE5(osf_getsysinfo, unsigned long, op, void __user *, buffer, diff --git a/arch/sparc/kernel/sys_sparc_32.c b/arch/sparc/kernel/sys_sparc_32.c index 646988d4c1a3..740f43b9b541 100644 --- a/arch/sparc/kernel/sys_sparc_32.c +++ b/arch/sparc/kernel/sys_sparc_32.c @@ -201,23 +201,27 @@ SYSCALL_DEFINE5(rt_sigaction, int, sig, asmlinkage long sys_getdomainname(char __user *name, int len) { - int nlen, err; - + int nlen, err; + char tmp[__NEW_UTS_LEN + 1]; + if (len < 0) return -EINVAL; - down_read(&uts_sem); - + down_read(&uts_sem); + nlen = strlen(utsname()->domainname) + 1; err = -EINVAL; if (nlen > len) - goto out; + goto out_unlock; + memcpy(tmp, utsname()->domainname, nlen); - err = -EFAULT; - if (!copy_to_user(name, utsname()->domainname, nlen)) - err = 0; + up_read(&uts_sem); -out: + if (copy_to_user(name, tmp, nlen)) + return -EFAULT; + return 0; + +out_unlock: up_read(&uts_sem); return err; } diff --git a/arch/sparc/kernel/sys_sparc_64.c b/arch/sparc/kernel/sys_sparc_64.c index 02e05e221b94..ebecbc927460 100644 --- a/arch/sparc/kernel/sys_sparc_64.c +++ b/arch/sparc/kernel/sys_sparc_64.c @@ -524,23 +524,27 @@ extern void check_pending(int signum); SYSCALL_DEFINE2(getdomainname, char __user *, name, int, len) { - int nlen, err; + int nlen, err; + char tmp[__NEW_UTS_LEN + 1]; if (len < 0) return -EINVAL; - down_read(&uts_sem); - + down_read(&uts_sem); + nlen = strlen(utsname()->domainname) + 1; err = -EINVAL; if (nlen > len) - goto out; + goto out_unlock; + memcpy(tmp, utsname()->domainname, nlen); - err = -EFAULT; - if (!copy_to_user(name, utsname()->domainname, nlen)) - err = 0; + up_read(&uts_sem); -out: + if (copy_to_user(name, tmp, nlen)) + return -EFAULT; + return 0; + +out_unlock: up_read(&uts_sem); return err; } diff --git a/kernel/sys.c b/kernel/sys.c index b13b530b5e0f..6c4e9b533258 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -1142,18 +1142,19 @@ static int override_release(char __user *release, size_t len) SYSCALL_DEFINE1(newuname, struct new_utsname __user *, name) { - int errno = 0; + struct new_utsname tmp; down_read(&uts_sem); - if (copy_to_user(name, utsname(), sizeof *name)) - errno = -EFAULT; + memcpy(&tmp, utsname(), sizeof(tmp)); up_read(&uts_sem); + if (copy_to_user(name, &tmp, sizeof(tmp))) + return -EFAULT; - if (!errno && override_release(name->release, sizeof(name->release))) - errno = -EFAULT; - if (!errno && override_architecture(name)) - errno = -EFAULT; - return errno; + if (override_release(name->release, sizeof(name->release))) + return -EFAULT; + if (override_architecture(name)) + return -EFAULT; + return 0; } #ifdef __ARCH_WANT_SYS_OLD_UNAME @@ -1162,55 +1163,46 @@ SYSCALL_DEFINE1(newuname, struct new_utsname __user *, name) */ SYSCALL_DEFINE1(uname, struct old_utsname __user *, name) { - int error = 0; + struct old_utsname tmp; if (!name) return -EFAULT; down_read(&uts_sem); - if (copy_to_user(name, utsname(), sizeof(*name))) - error = -EFAULT; + memcpy(&tmp, utsname(), sizeof(tmp)); up_read(&uts_sem); + if (copy_to_user(name, &tmp, sizeof(tmp))) + return -EFAULT; - if (!error && override_release(name->release, sizeof(name->release))) - error = -EFAULT; - if (!error && override_architecture(name)) - error = -EFAULT; - return error; + if (override_release(name->release, sizeof(name->release))) + return -EFAULT; + if (override_architecture(name)) + return -EFAULT; + return 0; } SYSCALL_DEFINE1(olduname, struct oldold_utsname __user *, name) { - int error; + struct oldold_utsname tmp = {}; if (!name) return -EFAULT; - if (!access_ok(VERIFY_WRITE, name, sizeof(struct oldold_utsname))) - return -EFAULT; down_read(&uts_sem); - error = __copy_to_user(&name->sysname, &utsname()->sysname, - __OLD_UTS_LEN); - error |= __put_user(0, name->sysname + __OLD_UTS_LEN); - error |= __copy_to_user(&name->nodename, &utsname()->nodename, - __OLD_UTS_LEN); - error |= __put_user(0, name->nodename + __OLD_UTS_LEN); - error |= __copy_to_user(&name->release, &utsname()->release, - __OLD_UTS_LEN); - error |= __put_user(0, name->release + __OLD_UTS_LEN); - error |= __copy_to_user(&name->version, &utsname()->version, - __OLD_UTS_LEN); - error |= __put_user(0, name->version + __OLD_UTS_LEN); - error |= __copy_to_user(&name->machine, &utsname()->machine, - __OLD_UTS_LEN); - error |= __put_user(0, name->machine + __OLD_UTS_LEN); + memcpy(&tmp.sysname, &utsname()->sysname, __OLD_UTS_LEN); + memcpy(&tmp.nodename, &utsname()->nodename, __OLD_UTS_LEN); + memcpy(&tmp.release, &utsname()->release, __OLD_UTS_LEN); + memcpy(&tmp.version, &utsname()->version, __OLD_UTS_LEN); + memcpy(&tmp.machine, &utsname()->machine, __OLD_UTS_LEN); up_read(&uts_sem); + if (copy_to_user(name, &tmp, sizeof(tmp))) + return -EFAULT; - if (!error && override_architecture(name)) - error = -EFAULT; - if (!error && override_release(name->release, sizeof(name->release))) - error = -EFAULT; - return error ? -EFAULT : 0; + if (override_architecture(name)) + return -EFAULT; + if (override_release(name->release, sizeof(name->release))) + return -EFAULT; + return 0; } #endif @@ -1224,17 +1216,18 @@ SYSCALL_DEFINE2(sethostname, char __user *, name, int, len) if (len < 0 || len > __NEW_UTS_LEN) return -EINVAL; - down_write(&uts_sem); errno = -EFAULT; if (!copy_from_user(tmp, name, len)) { - struct new_utsname *u = utsname(); + struct new_utsname *u; + down_write(&uts_sem); + u = utsname(); memcpy(u->nodename, tmp, len); memset(u->nodename + len, 0, sizeof(u->nodename) - len); errno = 0; uts_proc_notify(UTS_PROC_HOSTNAME); + up_write(&uts_sem); } - up_write(&uts_sem); return errno; } @@ -1242,8 +1235,9 @@ SYSCALL_DEFINE2(sethostname, char __user *, name, int, len) SYSCALL_DEFINE2(gethostname, char __user *, name, int, len) { - int i, errno; + int i; struct new_utsname *u; + char tmp[__NEW_UTS_LEN + 1]; if (len < 0) return -EINVAL; @@ -1252,11 +1246,11 @@ SYSCALL_DEFINE2(gethostname, char __user *, name, int, len) i = 1 + strlen(u->nodename); if (i > len) i = len; - errno = 0; - if (copy_to_user(name, u->nodename, i)) - errno = -EFAULT; + memcpy(tmp, u->nodename, i); up_read(&uts_sem); - return errno; + if (copy_to_user(name, tmp, i)) + return -EFAULT; + return 0; } #endif @@ -1275,17 +1269,18 @@ SYSCALL_DEFINE2(setdomainname, char __user *, name, int, len) if (len < 0 || len > __NEW_UTS_LEN) return -EINVAL; - down_write(&uts_sem); errno = -EFAULT; if (!copy_from_user(tmp, name, len)) { - struct new_utsname *u = utsname(); + struct new_utsname *u; + down_write(&uts_sem); + u = utsname(); memcpy(u->domainname, tmp, len); memset(u->domainname + len, 0, sizeof(u->domainname) - len); errno = 0; uts_proc_notify(UTS_PROC_DOMAINNAME); + up_write(&uts_sem); } - up_write(&uts_sem); return errno; } diff --git a/kernel/utsname_sysctl.c b/kernel/utsname_sysctl.c index c8eac43267e9..d2b3b2973456 100644 --- a/kernel/utsname_sysctl.c +++ b/kernel/utsname_sysctl.c @@ -17,7 +17,7 @@ #ifdef CONFIG_PROC_SYSCTL -static void *get_uts(struct ctl_table *table, int write) +static void *get_uts(struct ctl_table *table) { char *which = table->data; struct uts_namespace *uts_ns; @@ -25,21 +25,9 @@ static void *get_uts(struct ctl_table *table, int write) uts_ns = current->nsproxy->uts_ns; which = (which - (char *)&init_uts_ns) + (char *)uts_ns; - if (!write) - down_read(&uts_sem); - else - down_write(&uts_sem); return which; } -static void put_uts(struct ctl_table *table, int write, void *which) -{ - if (!write) - up_read(&uts_sem); - else - up_write(&uts_sem); -} - /* * Special case of dostring for the UTS structure. This has locks * to observe. Should this be in kernel/sys.c ???? @@ -49,13 +37,34 @@ static int proc_do_uts_string(struct ctl_table *table, int write, { struct ctl_table uts_table; int r; - memcpy(&uts_table, table, sizeof(uts_table)); - uts_table.data = get_uts(table, write); - r = proc_dostring(&uts_table, write, buffer, lenp, ppos); - put_uts(table, write, uts_table.data); + char tmp_data[__NEW_UTS_LEN + 1]; - if (write) + memcpy(&uts_table, table, sizeof(uts_table)); + uts_table.data = tmp_data; + + /* + * Buffer the value in tmp_data so that proc_dostring() can be called + * without holding any locks. + * We also need to read the original value in the write==1 case to + * support partial writes. + */ + down_read(&uts_sem); + memcpy(tmp_data, get_uts(table), sizeof(tmp_data)); + up_read(&uts_sem); + r = proc_dostring(&uts_table, write, buffer, lenp, ppos); + + if (write) { + /* + * Write back the new value. + * Note that, since we dropped uts_sem, the result can + * theoretically be incorrect if there are two parallel writes + * at non-zero offsets to the same sysctl. + */ + down_write(&uts_sem); + memcpy(get_uts(table), tmp_data, sizeof(tmp_data)); + up_write(&uts_sem); proc_sys_poll_notify(table->poll); + } return r; } From a56a15432ac808d63d7c3b24fa81dfa6afbab067 Mon Sep 17 00:00:00 2001 From: Jann Horn Date: Mon, 25 Jun 2018 18:34:19 +0200 Subject: [PATCH 954/970] userns: move user access out of the mutex commit 5820f140edef111a9ea2ef414ab2428b8cb805b1 upstream. The old code would hold the userns_state_mutex indefinitely if memdup_user_nul stalled due to e.g. a userfault region. Prevent that by moving the memdup_user_nul in front of the mutex_lock(). Note: This changes the error precedence of invalid buf/count/*ppos vs map already written / capabilities missing. Fixes: 22d917d80e84 ("userns: Rework the user_namespace adding uid/gid...") Cc: stable@vger.kernel.org Signed-off-by: Jann Horn Acked-by: Christian Brauner Acked-by: Serge Hallyn Signed-off-by: Eric W. Biederman Signed-off-by: Greg Kroah-Hartman --- kernel/user_namespace.c | 24 ++++++++++-------------- 1 file changed, 10 insertions(+), 14 deletions(-) diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index 86b7854fec8e..f789bbba9b8e 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -649,7 +649,16 @@ static ssize_t map_write(struct file *file, const char __user *buf, unsigned idx; struct uid_gid_extent *extent = NULL; char *kbuf = NULL, *pos, *next_line; - ssize_t ret = -EINVAL; + ssize_t ret; + + /* Only allow < page size writes at the beginning of the file */ + if ((*ppos != 0) || (count >= PAGE_SIZE)) + return -EINVAL; + + /* Slurp in the user data */ + kbuf = memdup_user_nul(buf, count); + if (IS_ERR(kbuf)) + return PTR_ERR(kbuf); /* * The userns_state_mutex serializes all writes to any given map. @@ -683,19 +692,6 @@ static ssize_t map_write(struct file *file, const char __user *buf, if (cap_valid(cap_setid) && !file_ns_capable(file, ns, CAP_SYS_ADMIN)) goto out; - /* Only allow < page size writes at the beginning of the file */ - ret = -EINVAL; - if ((*ppos != 0) || (count >= PAGE_SIZE)) - goto out; - - /* Slurp in the user data */ - kbuf = memdup_user_nul(buf, count); - if (IS_ERR(kbuf)) { - ret = PTR_ERR(kbuf); - kbuf = NULL; - goto out; - } - /* Parse the user data */ ret = -EINVAL; pos = kbuf; From 0d1694b185ca2e7d4b0491ea2ce135b391878eea Mon Sep 17 00:00:00 2001 From: Richard Weinberger Date: Tue, 12 Jun 2018 20:49:45 +0200 Subject: [PATCH 955/970] ubifs: Fix memory leak in lprobs self-check commit eef19816ada3abd56d9f20c88794cc2fea83ebb2 upstream. Allocate the buffer after we return early. Otherwise memory is being leaked. Cc: Fixes: 1e51764a3c2a ("UBIFS: add new flash file system") Signed-off-by: Richard Weinberger Signed-off-by: Greg Kroah-Hartman --- fs/ubifs/lprops.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/ubifs/lprops.c b/fs/ubifs/lprops.c index 6c3a1abd0e22..780a436d8c45 100644 --- a/fs/ubifs/lprops.c +++ b/fs/ubifs/lprops.c @@ -1091,10 +1091,6 @@ static int scan_check_cb(struct ubifs_info *c, } } - buf = __vmalloc(c->leb_size, GFP_NOFS, PAGE_KERNEL); - if (!buf) - return -ENOMEM; - /* * After an unclean unmount, empty and freeable LEBs * may contain garbage - do not scan them. @@ -1113,6 +1109,10 @@ static int scan_check_cb(struct ubifs_info *c, return LPT_SCAN_CONTINUE; } + buf = __vmalloc(c->leb_size, GFP_NOFS, PAGE_KERNEL); + if (!buf) + return -ENOMEM; + sleb = ubifs_scan(c, lnum, 0, buf, 0); if (IS_ERR(sleb)) { ret = PTR_ERR(sleb); From 48e1148488b6d88e3d0b803867db249bc5568d42 Mon Sep 17 00:00:00 2001 From: Richard Weinberger Date: Sun, 1 Jul 2018 23:20:50 +0200 Subject: [PATCH 956/970] Revert "UBIFS: Fix potential integer overflow in allocation" commit 08acbdd6fd736b90f8d725da5a0de4de2dd6de62 upstream. This reverts commit 353748a359f1821ee934afc579cf04572406b420. It bypassed the linux-mtd review process and fixes the issue not as it should. Cc: Kees Cook Cc: Silvio Cesare Cc: stable@vger.kernel.org Signed-off-by: Richard Weinberger Signed-off-by: Greg Kroah-Hartman --- fs/ubifs/journal.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/ubifs/journal.c b/fs/ubifs/journal.c index 504658fd0d08..7d764e3b6c79 100644 --- a/fs/ubifs/journal.c +++ b/fs/ubifs/journal.c @@ -1265,7 +1265,7 @@ static int recomp_data_node(const struct ubifs_info *c, int err, len, compr_type, out_len; out_len = le32_to_cpu(dn->size); - buf = kmalloc_array(out_len, WORST_COMPR_FACTOR, GFP_NOFS); + buf = kmalloc(out_len * WORST_COMPR_FACTOR, GFP_NOFS); if (!buf) return -ENOMEM; From 1bc1f0f72992c97d99bb50b231592750199db02a Mon Sep 17 00:00:00 2001 From: Richard Weinberger Date: Sun, 1 Jul 2018 23:20:51 +0200 Subject: [PATCH 957/970] ubifs: Check data node size before truncate commit 95a22d2084d72ea067d8323cc85677dba5d97cae upstream. Check whether the size is within bounds before using it. If the size is not correct, abort and dump the bad data node. Cc: Kees Cook Cc: Silvio Cesare Cc: stable@vger.kernel.org Fixes: 1e51764a3c2ac ("UBIFS: add new flash file system") Reported-by: Silvio Cesare Signed-off-by: Richard Weinberger Reviewed-by: Kees Cook Signed-off-by: Richard Weinberger Signed-off-by: Greg Kroah-Hartman --- fs/ubifs/journal.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/fs/ubifs/journal.c b/fs/ubifs/journal.c index 7d764e3b6c79..5253c5841c7e 100644 --- a/fs/ubifs/journal.c +++ b/fs/ubifs/journal.c @@ -1344,7 +1344,16 @@ int ubifs_jnl_truncate(struct ubifs_info *c, const struct inode *inode, else if (err) goto out_free; else { - if (le32_to_cpu(dn->size) <= dlen) + int dn_len = le32_to_cpu(dn->size); + + if (dn_len <= 0 || dn_len > UBIFS_BLOCK_SIZE) { + ubifs_err(c, "bad data node (block %u, inode %lu)", + blk, inode->i_ino); + ubifs_dump_node(c, dn); + goto out_free; + } + + if (dn_len <= dlen) dlen = 0; /* Nothing to do */ else { int compr_type = le16_to_cpu(dn->compr_type); From 36ac3a0116a5229d2ff4d1b12e7f81408d07ae5e Mon Sep 17 00:00:00 2001 From: Richard Weinberger Date: Tue, 12 Jun 2018 00:52:28 +0200 Subject: [PATCH 958/970] ubifs: Fix synced_i_size calculation for xattr inodes commit 59965593205fa4044850d35ee3557cf0b7edcd14 upstream. In ubifs_jnl_update() we sync parent and child inodes to the flash, in case of xattrs, the parent inode (AKA host inode) has a non-zero data_len. Therefore we need to adjust synced_i_size too. This issue was reported by ubifs self tests unter a xattr related work load. UBIFS error (ubi0:0 pid 1896): dbg_check_synced_i_size: ui_size is 4, synced_i_size is 0, but inode is clean UBIFS error (ubi0:0 pid 1896): dbg_check_synced_i_size: i_ino 65, i_mode 0x81a4, i_size 4 Cc: Fixes: 1e51764a3c2a ("UBIFS: add new flash file system") Signed-off-by: Richard Weinberger Signed-off-by: Greg Kroah-Hartman --- fs/ubifs/journal.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/fs/ubifs/journal.c b/fs/ubifs/journal.c index 5253c5841c7e..f8ce849e90d1 100644 --- a/fs/ubifs/journal.c +++ b/fs/ubifs/journal.c @@ -661,6 +661,11 @@ int ubifs_jnl_update(struct ubifs_info *c, const struct inode *dir, spin_lock(&ui->ui_lock); ui->synced_i_size = ui->ui_size; spin_unlock(&ui->ui_lock); + if (xent) { + spin_lock(&host_ui->ui_lock); + host_ui->synced_i_size = host_ui->ui_size; + spin_unlock(&host_ui->ui_lock); + } mark_inode_clean(c, ui); mark_inode_clean(c, host_ui); return 0; From 8001317c0bdfbdc462033da788c429859f39b1a3 Mon Sep 17 00:00:00 2001 From: Vignesh R Date: Mon, 11 Jun 2018 11:39:56 +0530 Subject: [PATCH 959/970] pwm: tiehrpwm: Fix disabling of output of PWMs commit 38dabd91ff0bde33352ca3cc65ef515599b77a05 upstream. pwm-tiehrpwm driver disables PWM output by putting it in low output state via active AQCSFRC register in ehrpwm_pwm_disable(). But, the AQCSFRC shadow register is not updated. Therefore, when shadow AQCSFRC register is re-enabled in ehrpwm_pwm_enable() (say to enable second PWM output), previous settings are lost as shadow register value is loaded into active register. This results in things like PWMA getting enabled automatically, when PWMB is enabled and vice versa. Fix this by updating AQCSFRC shadow register as well during ehrpwm_pwm_disable(). Fixes: 19891b20e7c2 ("pwm: pwm-tiehrpwm: PWM driver support for EHRPWM") Cc: stable@vger.kernel.org Signed-off-by: Vignesh R Signed-off-by: Thierry Reding Signed-off-by: Greg Kroah-Hartman --- drivers/pwm/pwm-tiehrpwm.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/pwm/pwm-tiehrpwm.c b/drivers/pwm/pwm-tiehrpwm.c index b5c6b0636893..c0e06f0c19d1 100644 --- a/drivers/pwm/pwm-tiehrpwm.c +++ b/drivers/pwm/pwm-tiehrpwm.c @@ -382,6 +382,8 @@ static void ehrpwm_pwm_disable(struct pwm_chip *chip, struct pwm_device *pwm) aqcsfrc_mask = AQCSFRC_CSFA_MASK; } + /* Update shadow register first before modifying active register */ + ehrpwm_modify(pc->mmio_base, AQCSFRC, aqcsfrc_mask, aqcsfrc_val); /* * Changes to immediate action on Action Qualifier. This puts * Action Qualifier control on PWM output from next TBCLK From 3752de782d5b59b8a379d4d51f9fb454beed8f19 Mon Sep 17 00:00:00 2001 From: Mikulas Patocka Date: Wed, 25 Jul 2018 15:41:54 +0200 Subject: [PATCH 960/970] fb: fix lost console when the user unplugs a USB adapter commit 8c5b044299951acd91e830a688dd920477ea1eda upstream. I have a USB display adapter using the udlfb driver and I use it on an ARM board that doesn't have any graphics card. When I plug the adapter in, the console is properly displayed, however when I unplug and re-plug the adapter, the console is not displayed and I can't access it until I reboot the board. The reason is this: When the adapter is unplugged, dlfb_usb_disconnect calls unlink_framebuffer, then it waits until the reference count drops to zero and then it deallocates the framebuffer. However, the console that is attached to the framebuffer device keeps the reference count non-zero, so the framebuffer device is never destroyed. When the USB adapter is plugged again, it creates a new device /dev/fb1 and the console is not attached to it. This patch fixes the bug by unbinding the console from unlink_framebuffer. The code to unbind the console is moved from do_unregister_framebuffer to a function unbind_console. When the console is unbound, the reference count drops to zero and the udlfb driver frees the framebuffer. When the adapter is plugged back, a new framebuffer is created and the console is attached to it. Signed-off-by: Mikulas Patocka Cc: Dave Airlie Cc: Bernie Thompson Cc: Ladislav Michl Cc: stable@vger.kernel.org [b.zolnierkie: preserve old behavior for do_unregister_framebuffer()] Signed-off-by: Bartlomiej Zolnierkiewicz Signed-off-by: Greg Kroah-Hartman --- drivers/video/fbdev/core/fbmem.c | 38 +++++++++++++++++++++++++++----- 1 file changed, 32 insertions(+), 6 deletions(-) diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c index 76c1ad96fb37..74273bc7ca9a 100644 --- a/drivers/video/fbdev/core/fbmem.c +++ b/drivers/video/fbdev/core/fbmem.c @@ -1695,12 +1695,12 @@ static int do_register_framebuffer(struct fb_info *fb_info) return 0; } -static int do_unregister_framebuffer(struct fb_info *fb_info) +static int unbind_console(struct fb_info *fb_info) { struct fb_event event; - int i, ret = 0; + int ret; + int i = fb_info->node; - i = fb_info->node; if (i < 0 || i >= FB_MAX || registered_fb[i] != fb_info) return -EINVAL; @@ -1715,17 +1715,29 @@ static int do_unregister_framebuffer(struct fb_info *fb_info) unlock_fb_info(fb_info); console_unlock(); + return ret; +} + +static int __unlink_framebuffer(struct fb_info *fb_info); + +static int do_unregister_framebuffer(struct fb_info *fb_info) +{ + struct fb_event event; + int ret; + + ret = unbind_console(fb_info); + if (ret) return -EINVAL; pm_vt_switch_unregister(fb_info->dev); - unlink_framebuffer(fb_info); + __unlink_framebuffer(fb_info); if (fb_info->pixmap.addr && (fb_info->pixmap.flags & FB_PIXMAP_DEFAULT)) kfree(fb_info->pixmap.addr); fb_destroy_modelist(&fb_info->modelist); - registered_fb[i] = NULL; + registered_fb[fb_info->node] = NULL; num_registered_fb--; fb_cleanup_device(fb_info); event.info = fb_info; @@ -1738,7 +1750,7 @@ static int do_unregister_framebuffer(struct fb_info *fb_info) return 0; } -int unlink_framebuffer(struct fb_info *fb_info) +static int __unlink_framebuffer(struct fb_info *fb_info) { int i; @@ -1750,6 +1762,20 @@ int unlink_framebuffer(struct fb_info *fb_info) device_destroy(fb_class, MKDEV(FB_MAJOR, i)); fb_info->dev = NULL; } + + return 0; +} + +int unlink_framebuffer(struct fb_info *fb_info) +{ + int ret; + + ret = __unlink_framebuffer(fb_info); + if (ret) + return ret; + + unbind_console(fb_info); + return 0; } EXPORT_SYMBOL(unlink_framebuffer); From a328c4cec0346a390819ce3283aca70307db90ee Mon Sep 17 00:00:00 2001 From: Mikulas Patocka Date: Wed, 25 Jul 2018 15:41:55 +0200 Subject: [PATCH 961/970] udlfb: set optimal write delay commit bb24153a3f13dd0dbc1f8055ad97fe346d598f66 upstream. The default delay 5 jiffies is too much when the kernel is compiled with HZ=100 - it results in jumpy cursor in Xwindow. In order to find out the optimal delay, I benchmarked the driver on 1280x720x30fps video. I found out that with HZ=1000, 10ms is acceptable, but with HZ=250 or HZ=300, we need 4ms, so that the video is played without any frame skips. This patch changes the delay to this value. Signed-off-by: Mikulas Patocka Cc: stable@vger.kernel.org Signed-off-by: Bartlomiej Zolnierkiewicz Signed-off-by: Greg Kroah-Hartman --- include/video/udlfb.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/video/udlfb.h b/include/video/udlfb.h index f9466fa54ba4..2ad9a6d37ff4 100644 --- a/include/video/udlfb.h +++ b/include/video/udlfb.h @@ -87,7 +87,7 @@ struct dlfb_data { #define MIN_RAW_PIX_BYTES 2 #define MIN_RAW_CMD_BYTES (RAW_HEADER_BYTES + MIN_RAW_PIX_BYTES) -#define DL_DEFIO_WRITE_DELAY 5 /* fb_deferred_io.delay in jiffies */ +#define DL_DEFIO_WRITE_DELAY msecs_to_jiffies(HZ <= 300 ? 4 : 10) /* optimal value for 720p video */ #define DL_DEFIO_WRITE_DISABLE (HZ*60) /* "disable" with long delay */ /* remove these once align.h patch is taken into kernel */ From 6fdad64a14f541e6f11f93cec9e6802549dd2821 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Thu, 7 Jun 2018 13:43:48 +0200 Subject: [PATCH 962/970] getxattr: use correct xattr length commit 82c9a927bc5df6e06b72d206d24a9d10cced4eb5 upstream. When running in a container with a user namespace, if you call getxattr with name = "system.posix_acl_access" and size % 8 != 4, then getxattr silently skips the user namespace fixup that it normally does resulting in un-fixed-up data being returned. This is caused by posix_acl_fix_xattr_to_user() being passed the total buffer size and not the actual size of the xattr as returned by vfs_getxattr(). This commit passes the actual length of the xattr as returned by vfs_getxattr() down. A reproducer for the issue is: touch acl_posix setfacl -m user:0:rwx acl_posix and the compile: #define _GNU_SOURCE #include #include #include #include #include #include #include /* Run in user namespace with nsuid 0 mapped to uid != 0 on the host. */ int main(int argc, void **argv) { ssize_t ret1, ret2; char buf1[128], buf2[132]; int fret = EXIT_SUCCESS; char *file; if (argc < 2) { fprintf(stderr, "Please specify a file with " "\"system.posix_acl_access\" permissions set\n"); _exit(EXIT_FAILURE); } file = argv[1]; ret1 = getxattr(file, "system.posix_acl_access", buf1, sizeof(buf1)); if (ret1 < 0) { fprintf(stderr, "%s - Failed to retrieve " "\"system.posix_acl_access\" " "from \"%s\"\n", strerror(errno), file); _exit(EXIT_FAILURE); } ret2 = getxattr(file, "system.posix_acl_access", buf2, sizeof(buf2)); if (ret2 < 0) { fprintf(stderr, "%s - Failed to retrieve " "\"system.posix_acl_access\" " "from \"%s\"\n", strerror(errno), file); _exit(EXIT_FAILURE); } if (ret1 != ret2) { fprintf(stderr, "The value of \"system.posix_acl_" "access\" for file \"%s\" changed " "between two successive calls\n", file); _exit(EXIT_FAILURE); } for (ssize_t i = 0; i < ret2; i++) { if (buf1[i] == buf2[i]) continue; fprintf(stderr, "Unexpected different in byte %zd: " "%02x != %02x\n", i, buf1[i], buf2[i]); fret = EXIT_FAILURE; } if (fret == EXIT_SUCCESS) fprintf(stderr, "Test passed\n"); else fprintf(stderr, "Test failed\n"); _exit(fret); } and run: ./tester acl_posix On a non-fixed up kernel this should return something like: root@c1:/# ./t Unexpected different in byte 16: ffffffa0 != 00 Unexpected different in byte 17: ffffff86 != 00 Unexpected different in byte 18: 01 != 00 and on a fixed kernel: root@c1:~# ./t Test passed Cc: stable@vger.kernel.org Fixes: 2f6f0654ab61 ("userns: Convert vfs posix_acl support to use kuids and kgids") Link: https://bugzilla.kernel.org/show_bug.cgi?id=199945 Reported-by: Colin Watson Signed-off-by: Christian Brauner Acked-by: Serge Hallyn Signed-off-by: Eric W. Biederman Signed-off-by: Greg Kroah-Hartman --- fs/xattr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/xattr.c b/fs/xattr.c index 932b9061a3a2..093998872329 100644 --- a/fs/xattr.c +++ b/fs/xattr.c @@ -540,7 +540,7 @@ getxattr(struct dentry *d, const char __user *name, void __user *value, if (error > 0) { if ((strcmp(kname, XATTR_NAME_POSIX_ACL_ACCESS) == 0) || (strcmp(kname, XATTR_NAME_POSIX_ACL_DEFAULT) == 0)) - posix_acl_fix_xattr_to_user(kvalue, size); + posix_acl_fix_xattr_to_user(kvalue, error); if (size && copy_to_user(value, kvalue, error)) error = -EFAULT; } else if (error == -ERANGE && size >= XATTR_SIZE_MAX) { From 05a085c7c501b7bec772eafb421d19be15b8e7e1 Mon Sep 17 00:00:00 2001 From: Vishal Verma Date: Fri, 10 Aug 2018 13:23:15 -0600 Subject: [PATCH 963/970] libnvdimm: fix ars_status output length calculation commit 286e87718103acdf85f4ed323a37e4839a8a7c05 upstream. Commit efda1b5d87cb ("acpi, nfit, libnvdimm: fix / harden ars_status output length handling") Introduced additional hardening for ambiguity in the ACPI spec for ars_status output sizing. However, it had a couple of cases mixed up. Where it should have been checking for (and returning) "out_field[1] - 4" it was using "out_field[1] - 8" and vice versa. This caused a four byte discrepancy in the buffer size passed on to the command handler, and in some cases, this caused memory corruption like: ./daxdev-errors.sh: line 76: 24104 Aborted (core dumped) ./daxdev-errors $busdev $region malloc(): memory corruption Program received signal SIGABRT, Aborted. [...] #5 0x00007ffff7865a2e in calloc () from /lib64/libc.so.6 #6 0x00007ffff7bc2970 in ndctl_bus_cmd_new_ars_status (ars_cap=ars_cap@entry=0x6153b0) at ars.c:136 #7 0x0000000000401644 in check_ars_status (check=0x7fffffffdeb0, bus=0x604c20) at daxdev-errors.c:144 #8 test_daxdev_clear_error (region_name=, bus_name=) at daxdev-errors.c:332 Cc: Cc: Dave Jiang Cc: Keith Busch Cc: Lukasz Dorau Cc: Dan Williams Fixes: efda1b5d87cb ("acpi, nfit, libnvdimm: fix / harden ars_status output length handling") Signed-off-by: Vishal Verma Reviewed-by: Keith Busch Signed-of-by: Dave Jiang Signed-off-by: Greg Kroah-Hartman --- drivers/nvdimm/bus.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c index c1a65ce31243..de6d3b749c60 100644 --- a/drivers/nvdimm/bus.c +++ b/drivers/nvdimm/bus.c @@ -748,9 +748,9 @@ u32 nd_cmd_out_size(struct nvdimm *nvdimm, int cmd, * overshoots the remainder by 4 bytes, assume it was * including 'status'. */ - if (out_field[1] - 8 == remainder) + if (out_field[1] - 4 == remainder) return remainder; - return out_field[1] - 4; + return out_field[1] - 8; } else if (cmd == ND_CMD_CALL) { struct nd_cmd_pkg *pkg = (struct nd_cmd_pkg *) in_field; From 6c6d17485d2daeb6c455a65d05d21cb285ec6699 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (VMware)" Date: Wed, 5 Sep 2018 16:29:49 -0400 Subject: [PATCH 964/970] printk/tracing: Do not trace printk_nmi_enter() commit d1c392c9e2a301f38998a353f467f76414e38725 upstream. I hit the following splat in my tests: ------------[ cut here ]------------ IRQs not enabled as expected WARNING: CPU: 3 PID: 0 at kernel/time/tick-sched.c:982 tick_nohz_idle_enter+0x44/0x8c Modules linked in: ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables ipv6 CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.0-rc2-test+ #2 Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014 EIP: tick_nohz_idle_enter+0x44/0x8c Code: ec 05 00 00 00 75 26 83 b8 c0 05 00 00 00 75 1d 80 3d d0 36 3e c1 00 75 14 68 94 63 12 c1 c6 05 d0 36 3e c1 01 e8 04 ee f8 ff <0f> 0b 58 fa bb a0 e5 66 c1 e8 25 0f 04 00 64 03 1d 28 31 52 c1 8b EAX: 0000001c EBX: f26e7f8c ECX: 00000006 EDX: 00000007 ESI: f26dd1c0 EDI: 00000000 EBP: f26e7f40 ESP: f26e7f38 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010296 CR0: 80050033 CR2: 0813c6b0 CR3: 2f342000 CR4: 001406f0 Call Trace: do_idle+0x33/0x202 cpu_startup_entry+0x61/0x63 start_secondary+0x18e/0x1ed startup_32_smp+0x164/0x168 irq event stamp: 18773830 hardirqs last enabled at (18773829): [] trace_hardirqs_on_thunk+0xc/0x10 hardirqs last disabled at (18773830): [] trace_hardirqs_off_thunk+0xc/0x10 softirqs last enabled at (18773824): [] __do_softirq+0x25f/0x2bf softirqs last disabled at (18773767): [] call_on_stack+0x45/0x4b ---[ end trace b7c64aa79e17954a ]--- After a bit of debugging, I found what was happening. This would trigger when performing "perf" with a high NMI interrupt rate, while enabling and disabling function tracer. Ftrace uses breakpoints to convert the nops at the start of functions to calls to the function trampolines. The breakpoint traps disable interrupts and this makes calls into lockdep via the trace_hardirqs_off_thunk in the entry.S code. What happens is the following: do_idle { [interrupts enabled] [interrupts disabled] TRACE_IRQS_OFF [lockdep says irqs off] [...] TRACE_IRQS_IRET test if pt_regs say return to interrupts enabled [yes] TRACE_IRQS_ON [lockdep says irqs are on] nmi_enter() { printk_nmi_enter() [traced by ftrace] [ hit ftrace breakpoint ] TRACE_IRQS_OFF [lockdep says irqs off] [...] TRACE_IRQS_IRET [return from breakpoint] test if pt_regs say interrupts enabled [no] [iret back to interrupt] [iret back to code] tick_nohz_idle_enter() { lockdep_assert_irqs_enabled() [lockdep say no!] Although interrupts are indeed enabled, lockdep thinks it is not, and since we now do asserts via lockdep, it gives a false warning. The issue here is that printk_nmi_enter() is called before lockdep_off(), which disables lockdep (for this reason) in NMIs. By simply not allowing ftrace to see printk_nmi_enter() (via notrace annotation) we keep lockdep from getting confused. Cc: stable@vger.kernel.org Fixes: 42a0bb3f71383 ("printk/nmi: generic solution for safe printk in NMI") Acked-by: Sergey Senozhatsky Acked-by: Petr Mladek Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Greg Kroah-Hartman --- kernel/printk/nmi.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/printk/nmi.c b/kernel/printk/nmi.c index 5fa65aa904d3..2c3e7f024c15 100644 --- a/kernel/printk/nmi.c +++ b/kernel/printk/nmi.c @@ -260,12 +260,12 @@ void __init printk_nmi_init(void) printk_nmi_flush(); } -void printk_nmi_enter(void) +void notrace printk_nmi_enter(void) { this_cpu_write(printk_func, vprintk_nmi); } -void printk_nmi_exit(void) +void notrace printk_nmi_exit(void) { this_cpu_write(printk_func, vprintk_default); } From 3ddf06cdf05aab898a10cc1c1d28f2ecc43a6f34 Mon Sep 17 00:00:00 2001 From: Shan Hai Date: Thu, 23 Aug 2018 02:02:56 +0800 Subject: [PATCH 965/970] bcache: release dc->writeback_lock properly in bch_writeback_thread() commit 3943b040f11ed0cc6d4585fd286a623ca8634547 upstream. The writeback thread would exit with a lock held when the cache device is detached via sysfs interface, fix it by releasing the held lock before exiting the while-loop. Fixes: fadd94e05c02 (bcache: quit dc->writeback_thread when BCACHE_DEV_DETACHING is set) Signed-off-by: Shan Hai Signed-off-by: Coly Li Tested-by: Shenghui Wang Cc: stable@vger.kernel.org #4.17+ Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman --- drivers/md/bcache/writeback.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c index bb7aa31c2a08..cdf388d40274 100644 --- a/drivers/md/bcache/writeback.c +++ b/drivers/md/bcache/writeback.c @@ -456,8 +456,10 @@ static int bch_writeback_thread(void *arg) * data on cache. BCACHE_DEV_DETACHING flag is set in * bch_cached_dev_detach(). */ - if (test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags)) + if (test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags)) { + up_write(&dc->writeback_lock); break; + } } up_write(&dc->writeback_lock); From e0ec112e7eb158de714f37a444a4eb861f839c10 Mon Sep 17 00:00:00 2001 From: Adrian Hunter Date: Tue, 14 Aug 2018 11:46:08 +0300 Subject: [PATCH 966/970] perf auxtrace: Fix queue resize commit 99cbbe56eb8bede625f410ab62ba34673ffa7d21 upstream. When the number of queues grows beyond 32, the array of queues is resized but not all members were being copied. Fix by also copying 'tid', 'cpu' and 'set'. Signed-off-by: Adrian Hunter Cc: Jiri Olsa Cc: stable@vger.kernel.org Fixes: e502789302a6e ("perf auxtrace: Add helpers for queuing AUX area tracing data") Link: http://lkml.kernel.org/r/20180814084608.6563-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Greg Kroah-Hartman --- tools/perf/util/auxtrace.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c index 78bd632f144d..29d015e2d900 100644 --- a/tools/perf/util/auxtrace.c +++ b/tools/perf/util/auxtrace.c @@ -195,6 +195,9 @@ static int auxtrace_queues__grow(struct auxtrace_queues *queues, for (i = 0; i < queues->nr_queues; i++) { list_splice_tail(&queues->queue_array[i].head, &queue_array[i].head); + queue_array[i].tid = queues->queue_array[i].tid; + queue_array[i].cpu = queues->queue_array[i].cpu; + queue_array[i].set = queues->queue_array[i].set; queue_array[i].priv = queues->queue_array[i].priv; } From 4a219e41a9dbf14a17d0056e7a1a148196117ba5 Mon Sep 17 00:00:00 2001 From: Ondrej Mosnacek Date: Wed, 22 Aug 2018 08:26:31 +0200 Subject: [PATCH 967/970] crypto: vmx - Fix sleep-in-atomic bugs commit 0522236d4f9c5ab2e79889cb020d1acbe5da416e upstream. This patch fixes sleep-in-atomic bugs in AES-CBC and AES-XTS VMX implementations. The problem is that the blkcipher_* functions should not be called in atomic context. The bugs can be reproduced via the AF_ALG interface by trying to encrypt/decrypt sufficiently large buffers (at least 64 KiB) using the VMX implementations of 'cbc(aes)' or 'xts(aes)'. Such operations then trigger BUG in crypto_yield(): [ 891.863680] BUG: sleeping function called from invalid context at include/crypto/algapi.h:424 [ 891.864622] in_atomic(): 1, irqs_disabled(): 0, pid: 12347, name: kcapi-enc [ 891.864739] 1 lock held by kcapi-enc/12347: [ 891.864811] #0: 00000000f5d42c46 (sk_lock-AF_ALG){+.+.}, at: skcipher_recvmsg+0x50/0x530 [ 891.865076] CPU: 5 PID: 12347 Comm: kcapi-enc Not tainted 4.19.0-0.rc0.git3.1.fc30.ppc64le #1 [ 891.865251] Call Trace: [ 891.865340] [c0000003387578c0] [c000000000d67ea4] dump_stack+0xe8/0x164 (unreliable) [ 891.865511] [c000000338757910] [c000000000172a58] ___might_sleep+0x2f8/0x310 [ 891.865679] [c000000338757990] [c0000000006bff74] blkcipher_walk_done+0x374/0x4a0 [ 891.865825] [c0000003387579e0] [d000000007e73e70] p8_aes_cbc_encrypt+0x1c8/0x260 [vmx_crypto] [ 891.865993] [c000000338757ad0] [c0000000006c0ee0] skcipher_encrypt_blkcipher+0x60/0x80 [ 891.866128] [c000000338757b10] [c0000000006ec504] skcipher_recvmsg+0x424/0x530 [ 891.866283] [c000000338757bd0] [c000000000b00654] sock_recvmsg+0x74/0xa0 [ 891.866403] [c000000338757c10] [c000000000b00f64] ___sys_recvmsg+0xf4/0x2f0 [ 891.866515] [c000000338757d90] [c000000000b02bb8] __sys_recvmsg+0x68/0xe0 [ 891.866631] [c000000338757e30] [c00000000000bbe4] system_call+0x5c/0x70 Fixes: 8c755ace357c ("crypto: vmx - Adding CBC routines for VMX module") Fixes: c07f5d3da643 ("crypto: vmx - Adding support for XTS") Cc: stable@vger.kernel.org Signed-off-by: Ondrej Mosnacek Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- drivers/crypto/vmx/aes_cbc.c | 30 ++++++++++++++---------------- drivers/crypto/vmx/aes_xts.c | 21 ++++++++++++++------- 2 files changed, 28 insertions(+), 23 deletions(-) diff --git a/drivers/crypto/vmx/aes_cbc.c b/drivers/crypto/vmx/aes_cbc.c index 46131701c378..92e116365272 100644 --- a/drivers/crypto/vmx/aes_cbc.c +++ b/drivers/crypto/vmx/aes_cbc.c @@ -111,24 +111,23 @@ static int p8_aes_cbc_encrypt(struct blkcipher_desc *desc, ret = crypto_blkcipher_encrypt(&fallback_desc, dst, src, nbytes); } else { - preempt_disable(); - pagefault_disable(); - enable_kernel_vsx(); - blkcipher_walk_init(&walk, dst, src, nbytes); ret = blkcipher_walk_virt(desc, &walk); while ((nbytes = walk.nbytes)) { + preempt_disable(); + pagefault_disable(); + enable_kernel_vsx(); aes_p8_cbc_encrypt(walk.src.virt.addr, walk.dst.virt.addr, nbytes & AES_BLOCK_MASK, &ctx->enc_key, walk.iv, 1); + disable_kernel_vsx(); + pagefault_enable(); + preempt_enable(); + nbytes &= AES_BLOCK_SIZE - 1; ret = blkcipher_walk_done(desc, &walk, nbytes); } - - disable_kernel_vsx(); - pagefault_enable(); - preempt_enable(); } return ret; @@ -152,24 +151,23 @@ static int p8_aes_cbc_decrypt(struct blkcipher_desc *desc, ret = crypto_blkcipher_decrypt(&fallback_desc, dst, src, nbytes); } else { - preempt_disable(); - pagefault_disable(); - enable_kernel_vsx(); - blkcipher_walk_init(&walk, dst, src, nbytes); ret = blkcipher_walk_virt(desc, &walk); while ((nbytes = walk.nbytes)) { + preempt_disable(); + pagefault_disable(); + enable_kernel_vsx(); aes_p8_cbc_encrypt(walk.src.virt.addr, walk.dst.virt.addr, nbytes & AES_BLOCK_MASK, &ctx->dec_key, walk.iv, 0); + disable_kernel_vsx(); + pagefault_enable(); + preempt_enable(); + nbytes &= AES_BLOCK_SIZE - 1; ret = blkcipher_walk_done(desc, &walk, nbytes); } - - disable_kernel_vsx(); - pagefault_enable(); - preempt_enable(); } return ret; diff --git a/drivers/crypto/vmx/aes_xts.c b/drivers/crypto/vmx/aes_xts.c index 24353ec336c5..52e7ae05ae1f 100644 --- a/drivers/crypto/vmx/aes_xts.c +++ b/drivers/crypto/vmx/aes_xts.c @@ -123,32 +123,39 @@ static int p8_aes_xts_crypt(struct blkcipher_desc *desc, ret = enc ? crypto_blkcipher_encrypt(&fallback_desc, dst, src, nbytes) : crypto_blkcipher_decrypt(&fallback_desc, dst, src, nbytes); } else { + blkcipher_walk_init(&walk, dst, src, nbytes); + + ret = blkcipher_walk_virt(desc, &walk); + preempt_disable(); pagefault_disable(); enable_kernel_vsx(); - blkcipher_walk_init(&walk, dst, src, nbytes); - - ret = blkcipher_walk_virt(desc, &walk); iv = walk.iv; memset(tweak, 0, AES_BLOCK_SIZE); aes_p8_encrypt(iv, tweak, &ctx->tweak_key); + disable_kernel_vsx(); + pagefault_enable(); + preempt_enable(); + while ((nbytes = walk.nbytes)) { + preempt_disable(); + pagefault_disable(); + enable_kernel_vsx(); if (enc) aes_p8_xts_encrypt(walk.src.virt.addr, walk.dst.virt.addr, nbytes & AES_BLOCK_MASK, &ctx->enc_key, NULL, tweak); else aes_p8_xts_decrypt(walk.src.virt.addr, walk.dst.virt.addr, nbytes & AES_BLOCK_MASK, &ctx->dec_key, NULL, tweak); + disable_kernel_vsx(); + pagefault_enable(); + preempt_enable(); nbytes &= AES_BLOCK_SIZE - 1; ret = blkcipher_walk_done(desc, &walk, nbytes); } - - disable_kernel_vsx(); - pagefault_enable(); - preempt_enable(); } return ret; } From ac617410962fcdd6b4675865a4469a8a54abb584 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Horia=20Geant=C4=83?= Date: Mon, 6 Aug 2018 15:29:09 +0300 Subject: [PATCH 968/970] crypto: caam/jr - fix descriptor DMA unmapping MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit cc98963dbaaea93d17608641b8d6942a5327fc31 upstream. Descriptor address needs to be swapped to CPU endianness before being DMA unmapped. Cc: # 4.8+ Fixes: 261ea058f016 ("crypto: caam - handle core endianness != caam endianness") Reported-by: Laurentiu Tudor Signed-off-by: Horia Geantă Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- drivers/crypto/caam/jr.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/crypto/caam/jr.c b/drivers/crypto/caam/jr.c index 9e7f28122bb7..6d475a2e298c 100644 --- a/drivers/crypto/caam/jr.c +++ b/drivers/crypto/caam/jr.c @@ -189,7 +189,8 @@ static void caam_jr_dequeue(unsigned long devarg) BUG_ON(CIRC_CNT(head, tail + i, JOBR_DEPTH) <= 0); /* Unmap just-run descriptor so we can post-process */ - dma_unmap_single(dev, jrp->outring[hw_idx].desc, + dma_unmap_single(dev, + caam_dma_to_cpu(jrp->outring[hw_idx].desc), jrp->entinfo[sw_idx].desc_size, DMA_TO_DEVICE); From 0515258e4776912b53a7772b208fc130837380fb Mon Sep 17 00:00:00 2001 From: Jeremy Cline Date: Tue, 31 Jul 2018 01:37:31 +0000 Subject: [PATCH 969/970] fs/quota: Fix spectre gadget in do_quotactl commit 7b6924d94a60c6b8c1279ca003e8744e6cd9e8b1 upstream. 'type' is user-controlled, so sanitize it after the bounds check to avoid using it in speculative execution. This covers the following potential gadgets detected with the help of smatch: * fs/ext4/super.c:5741 ext4_quota_read() warn: potential spectre issue 'sb_dqopt(sb)->files' [r] * fs/ext4/super.c:5778 ext4_quota_write() warn: potential spectre issue 'sb_dqopt(sb)->files' [r] * fs/f2fs/super.c:1552 f2fs_quota_read() warn: potential spectre issue 'sb_dqopt(sb)->files' [r] * fs/f2fs/super.c:1608 f2fs_quota_write() warn: potential spectre issue 'sb_dqopt(sb)->files' [r] * fs/quota/dquot.c:412 mark_info_dirty() warn: potential spectre issue 'sb_dqopt(sb)->info' [w] * fs/quota/dquot.c:933 dqinit_needed() warn: potential spectre issue 'dquots' [r] * fs/quota/dquot.c:2112 dquot_commit_info() warn: potential spectre issue 'dqopt->ops' [r] * fs/quota/dquot.c:2362 vfs_load_quota_inode() warn: potential spectre issue 'dqopt->files' [w] (local cap) * fs/quota/dquot.c:2369 vfs_load_quota_inode() warn: potential spectre issue 'dqopt->ops' [w] (local cap) * fs/quota/dquot.c:2370 vfs_load_quota_inode() warn: potential spectre issue 'dqopt->info' [w] (local cap) * fs/quota/quota.c:110 quota_getfmt() warn: potential spectre issue 'sb_dqopt(sb)->info' [r] * fs/quota/quota_v2.c:84 v2_check_quota_file() warn: potential spectre issue 'quota_magics' [w] * fs/quota/quota_v2.c:85 v2_check_quota_file() warn: potential spectre issue 'quota_versions' [w] * fs/quota/quota_v2.c:96 v2_read_file_info() warn: potential spectre issue 'dqopt->info' [r] * fs/quota/quota_v2.c:172 v2_write_file_info() warn: potential spectre issue 'dqopt->info' [r] Additionally, a quick inspection indicates there are array accesses with 'type' in quota_on() and quota_off() functions which are also addressed by this. Cc: Josh Poimboeuf Cc: stable@vger.kernel.org Signed-off-by: Jeremy Cline Signed-off-by: Jan Kara Signed-off-by: Greg Kroah-Hartman --- fs/quota/quota.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/quota/quota.c b/fs/quota/quota.c index 2d445425aad7..a2329f7ec638 100644 --- a/fs/quota/quota.c +++ b/fs/quota/quota.c @@ -17,6 +17,7 @@ #include #include #include +#include static int check_quotactl_permission(struct super_block *sb, int type, int cmd, qid_t id) @@ -706,6 +707,7 @@ static int do_quotactl(struct super_block *sb, int type, int cmd, qid_t id, if (type >= (XQM_COMMAND(cmd) ? XQM_MAXQUOTAS : MAXQUOTAS)) return -EINVAL; + type = array_index_nospec(type, MAXQUOTAS); /* * Quota not supported on this fs? Check this before s_quota_types * since they needn't be set if quota is not supported at all. From 66f5a871e5987c8f4bff333b66c361a53cdcd350 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Sun, 9 Sep 2018 20:01:26 +0200 Subject: [PATCH 970/970] Linux 4.9.126 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index aef09ca7a924..b26481fef3f0 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ VERSION = 4 PATCHLEVEL = 9 -SUBLEVEL = 125 +SUBLEVEL = 126 EXTRAVERSION = NAME = Roaring Lionus