pcieport "error corrected"

mimosa
Salix Warrior
Posts: 3215
Joined: 25. May 2010, 17:02

pcieport "error corrected"

Post by mimosa » 18. Jan 2022, 10:19

Sometimes during boot I see a lot of messages like this:

Nov 15 15:49:52 x99 kernel: pcieport 0000:00:03.0: AER: Multiple Corrected error received: id=0018
Nov 15 15:49:52 x99 kernel: pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0018(Receiver ID)
Nov 15 15:49:52 x99 kernel: pcieport 0000:00:03.0: device [8086:6f08] error status/mask=00000040/00002000
Nov 15 15:49:52 x99 kernel: pcieport 0000:00:03.0: [ 6] Bad TLP
That's copied and pasted from the internet (it didn't happen this boot). In my case the relevant device is this:
https://pci-ids.ucw.cz/read/PC/8086/a338
When it does happen, I can see from dmesg that the messages keep coming, many each second.
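
To put a number on "many each second", the reports can be counted per second of kernel uptime. This is only a sketch: the function name `aer_per_second` is made up for illustration, and it assumes dmesg-style lines that begin with a `[seconds.microseconds]` timestamp.

```shell
# Count AER "error received" reports per whole second of kernel uptime.
# Reads dmesg-style lines ("[  140.906665] pcieport ...") on stdin.
# aer_per_second is a hypothetical helper name, not an existing tool.
aer_per_second() {
    awk '/AER: .*error received/ {
             ts = $0
             sub(/^\[ */, "", ts)      # drop the leading "[" and its padding
             sub(/\..*/, "", ts)       # keep only the whole seconds
             count[ts]++
         }
         END { for (s in count) print s, count[s] }' | sort -n
}

# Usage (may need root to read the kernel log):
#   dmesg | aer_per_second
```

Each output line is a second of uptime followed by the number of reports seen in that second.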

The two suggestions I've seen for this are to pass 'pci=noaer' as a kernel boot option, just to switch the messages off (the reasoning being there is no actual fault); or pci=nommconf, see the first answer here:
https://unix.stackexchange.com/question ... er-bad-tlp
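
Whichever option is tried, it is worth confirming after the reboot that the kernel really received it. A minimal sketch, assuming the options are passed as `pci=...` on the kernel command line (possibly comma-joined, as in `pci=noaer,nommconf`); `has_pci_option` is a made-up helper name, and how to add the option depends on the bootloader in use, which is not shown here.

```shell
# Succeed if the given pci= sub-option (e.g. noaer) is present on a kernel
# command line read from stdin. Comma-joined forms are split first.
# has_pci_option is a hypothetical helper name.
has_pci_option() {
    tr ' ' '\n' | grep '^pci=' | sed 's/^pci=//' | tr ',' '\n' | grep -qxF -- "$1"
}

# Usage:
#   has_pci_option noaer < /proc/cmdline && echo "pci=noaer is active"
```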

But I've no idea whether there might be a hardware fault, something wrong with firmware, or something else.

This started within the past few days, so could be related to the latest kernel update.

Code: Select all

root[mimosa]# lspci
00:00.0 Host bridge: Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers (rev 07)
00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-S GT2 [UHD Graphics 630]
00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
00:12.0 Signal processing controller: Intel Corporation Cannon Lake PCH Thermal Controller (rev 10)
00:14.0 USB controller: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller (rev 10)
00:14.2 RAM memory: Intel Corporation Cannon Lake PCH Shared SRAM (rev 10)
00:15.0 Serial bus controller: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #0 (rev 10)
00:15.1 Serial bus controller: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #1 (rev 10)
00:16.0 Communication controller: Intel Corporation Cannon Lake PCH HECI Controller (rev 10)
00:16.3 Serial controller: Intel Corporation Cannon Lake PCH Active Management Technology - SOL (rev 10)
00:17.0 SATA controller: Intel Corporation Cannon Lake PCH SATA AHCI Controller (rev 10)
00:1c.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #1 (rev f0)
00:1c.5 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #6 (rev f0)
00:1c.7 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #8 (rev f0)
00:1d.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #9 (rev f0)
00:1e.0 Communication controller: Intel Corporation Cannon Lake PCH Serial IO UART Host Controller (rev 10)
00:1e.1 Communication controller: Intel Corporation Device a329 (rev 10)
00:1f.0 ISA bridge: Intel Corporation Cannon Point-LP LPC Controller (rev 10)
00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10)
00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
00:1f.5 Serial bus controller: Intel Corporation Cannon Lake PCH SPI Controller (rev 10)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (7) I219-LM (rev 10)
01:00.0 USB controller: ASMedia Technology Inc. ASM2142 USB 3.1 Host Controller
02:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
03:00.0 Network controller: Intel Corporation Wireless 8260 (rev 3a)
04:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
root[mimosa]# inxi
CPU: 6-core Intel Core i5-9500E (-MCP-) speed/min/max: 4114/800/4200 MHz
Kernel: 5.15.14 x86_64 Up: 46m Mem: 1687.1/15838.8 MiB (10.7%)
Storage: 2.05 TiB (13.8% used) Procs: 208 Shell: Bash inxi: 3.3.11
root[mimosa]# uname -a
Linux darkstar.slack.org 5.15.14 #1 SMP PREEMPT Wed Jan 12 14:51:47 CST 2022 x86_64 Intel(R) Core(TM) i5-9500E CPU @ 3.00GHz GenuineIntel GNU/Linux

gapan
Salix Wizard
Posts: 5851
Joined: 6. Jun 2009, 17:40

Re: pcieport "error corrected"

Post by gapan » 18. Jan 2022, 18:51

That's probably because the chipset is not 100% supported by Linux yet.

My sister bought a laptop with the same chipset (or maybe similar) lately and I haven't been able to make the internal sound work yet. Good thing she only uses bluetooth audio anyway, that works. :roll:

mimosa
Salix Warrior
Posts: 3215
Joined: 25. May 2010, 17:02

Re: pcieport "error corrected"

Post by mimosa » 18. Jan 2022, 19:03

I also had trouble with the sound, but found a workaround, which may or may not be applicable ... one part of it was to set sound in BIOS to work only from either the front or the back audio socket. The rest is Linux-specific, I can give more info if it would be useful.

Sounds as though I can just ignore this, then. However, it doesn't occur every time, and may only have started recently, so it could be a kernel regression.

galmei
Posts: 132
Joined: 1. Jun 2018, 21:54

Re: pcieport "error corrected"

Post by galmei » 19. Jan 2022, 20:45

mimosa wrote:
18. Jan 2022, 10:19
Sometimes during boot I see a lot of messages like this:

Nov 15 15:49:52 x99 kernel: pcieport 0000:00:03.0: AER: Multiple Corrected error received: id=0018
Nov 15 15:49:52 x99 kernel: pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0018(Receiver ID)
Nov 15 15:49:52 x99 kernel: pcieport 0000:00:03.0: device [8086:6f08] error status/mask=00000040/00002000
Nov 15 15:49:52 x99 kernel: pcieport 0000:00:03.0: [ 6] Bad TLP
But I've no idea whether there might be a hardware fault, something wrong with firmware, or something else.
You can try to turn off the cause of this misbehavior. To do this, you can configure the PCIE16_3 slot in the BIOS to statically use the x8 mode instead of the automatic mode. The PCIE16_3 slot shares bandwidth with M.2/U.2 devices on x99 boards.

If this can be done, it can avoid the performance loss that this misbehavior brings.

The bug has been known for 4 years. Since you are already using a recent kernel and still see it, I don't expect a future kernel to change this (if a fix is possible at all).
mimosa wrote:
18. Jan 2022, 10:19
This started within the past few days, so could be related to the latest kernel update.
You can check whether the previous kernel could also report the problem by comparing the configuration entry 'CONFIG_PCIEAER=' in boot/config of the previous kernel with that of the kernel used now. In the current one you should find 'CONFIG_PCIEAER=y'.
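
That comparison could be sketched like this. The helper name `show_pcieaer` and the file names in the usage line are placeholders, since the exact config file names under /boot are not given in this thread. A kernel built without the option would normally carry the line `# CONFIG_PCIEAER is not set` instead.

```shell
# Print the CONFIG_PCIEAER line from each kernel config file given.
# Matches both "CONFIG_PCIEAER=y" and "# CONFIG_PCIEAER is not set",
# but not related options such as CONFIG_PCIEAER_INJECT.
# show_pcieaer is a hypothetical helper name.
show_pcieaer() {
    for cfg in "$@"; do
        printf '%s: ' "$cfg"
        grep -E '^(# )?CONFIG_PCIEAER[= ]' "$cfg" || echo "no CONFIG_PCIEAER entry"
    done
}

# Usage (placeholder file names; substitute the real ones):
#   show_pcieaer /boot/config-OLD /boot/config-NEW
```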

mimosa
Salix Warrior
Posts: 3215
Joined: 25. May 2010, 17:02

Re: pcieport "error corrected"

Post by mimosa » 20. Jan 2022, 08:33

Thanks galmei, I'll look into it.

One thing that puzzles me though about this is it seems to be intermittent. Most boots don't produce the messages.

galmei
Posts: 132
Joined: 1. Jun 2018, 21:54

Re: pcieport "error corrected"

Post by galmei » 20. Jan 2022, 17:06

mimosa wrote:
20. Jan 2022, 08:33
Thanks galmei, I'll look into it.

One thing that puzzles me though about this is it seems to be intermittent. Most boots don't produce the messages.
I am no longer sure that I understood the problem correctly. I did notice that you copied the messages from the web and not from your system, but I was probably influenced by the information in the quote. The link https://pci-ids.ucw.cz/read/PC/8086/a338 refers to the description of another PCI Express port (#1), which I had not taken into account. How do you know that this is the correct PCI Express port?
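
One way to settle the port question from the log itself is to pull out the bus address and the [vendor:device] pair that the kernel prints in its "device [...] error status/mask" lines. A small sketch, with `aer_devices` as a made-up helper name; it assumes the message layout shown in this thread.

```shell
# List each distinct port address together with the vendor:device ID
# the kernel reported for it. Reads dmesg or syslog lines on stdin.
# aer_devices is a hypothetical helper name.
aer_devices() {
    sed -n 's/.*pcieport \([0-9a-f:.]*\): *device \[\([0-9a-f]*:[0-9a-f]*\)\].*/\1 \2/p' | sort -u
}

# Usage:
#   dmesg | aer_devices
# The address can then be cross-checked against lspci, for example:
#   lspci -nn -s 00:1c.0
```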
mimosa wrote:
20. Jan 2022, 08:33
One thing that puzzles me though about this is it seems to be intermittent. Most boots don't produce the messages.
From the error messages on the web, one can see that the behaviour starts anywhere from a few minutes to several hours after start-up. Are you sure that the event only occurs at start-up? If it can occur at start-up, it could also recur later. If the previous kernel version could have reported the errors too (because AER was not turned off there), then the current occurrence could also indicate a hardware or environmental change. It would be sporadic because the environmental conditions change all the time: temperature of the board, air temperature, humidity, or your personal electrostatic charge. (How long are the sparks you draw when you discharge at the keyboard? :) ) I don't really believe that myself, but if something suddenly changes, then something has changed or was changed before.

Would you please still show the actual messages from your system? On the basis of these, one could reconsider.

By the way, this is not a Linux problem. In my search on the web, I was able to find that the operating system from Redmond is also affected. This increases the likelihood of a hardware problem.

mimosa
Salix Warrior
Posts: 3215
Joined: 25. May 2010, 17:02

Re: pcieport "error corrected"

Post by mimosa » 20. Jan 2022, 17:28

8086/a338 is the correct info, which I jotted down on a piece of paper. I haven't seen the actual message lately, to be able to copy and paste here.

In my case, it always occurs during startup, and then continues till I reboot. I can see this with dmesg.

On most boots, it doesn't occur.

I only noticed it about a week ago, around the time there had been a kernel update. But this could be coincidence.

mimosa
Salix Warrior
Posts: 3215
Joined: 25. May 2010, 17:02

Re: pcieport "error corrected"

Post by mimosa » 23. Jan 2022, 15:54

Here is the actual message - first time it's done it in a fair few days. For that reason among others, I'm not that worried about this, but for completeness' sake:

root[mimosa]# dmesg | tail
[  140.883787] pcieport 0000:00:1c.0:   device [8086:a338] error status/mask=00000001/00002000
[  140.883788] pcieport 0000:00:1c.0:    [ 0] RxErr                 
[  140.906665] pcieport 0000:00:1c.0: AER: Corrected error received: 0000:00:1c.0
[  140.906672] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[  140.906673] pcieport 0000:00:1c.0:   device [8086:a338] error status/mask=00000001/00002000
[  140.906675] pcieport 0000:00:1c.0:    [ 0] RxErr                 
[  140.977960] pcieport 0000:00:1c.0: AER: Corrected error received: 0000:00:1c.0
[  140.977968] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[  140.977969] pcieport 0000:00:1c.0:   device [8086:a338] error status/mask=00000001/00002000
[  140.977970] pcieport 0000:00:1c.0:    [ 0] RxErr  
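
The "error status/mask" numbers in these lines are bit fields, and the bracketed number on the following line is the index of the bit that is set. Here the status 00000001 has only bit 0 set (RxErr, a receiver error), the pasted x99 example had 00000040, which is bit 6 (Bad TLP), and the mask 00002000 is bit 13. A rough decoder, assuming the standard PCIe AER Correctable Error Status bit layout; `decode_aer_correctable` is a made-up name, and the short labels follow what recent kernels print (older kernels spell some of them differently, e.g. "Bad TLP").

```shell
# Decode a PCIe AER correctable-error status (or mask) value, given as
# hex without a 0x prefix, into bit index and short name.
# decode_aer_correctable is a hypothetical helper name.
decode_aer_correctable() {
    val=$((0x$1))
    for entry in 0:RxErr 6:BadTLP 7:BadDLLP 8:Rollover 12:Timeout 13:NonFatalErr; do
        bit=${entry%%:*}      # part before the colon: bit position
        name=${entry#*:}      # part after the colon: short label
        if [ $(( (val >> bit) & 1 )) -eq 1 ]; then
            echo "[$bit] $name"
        fi
    done
}

# Usage:
#   decode_aer_correctable 00000001
#   decode_aer_correctable 00002000
```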
