\$\begingroup\$

I found this question on Stack Overflow, and the answers say:

I don't believe so -- from a software viewpoint, PCI-E is quite well disguised to look like (fast) PCI.

As far as I know, nearly the only reasonable way to do this is with specialized hardware -- specifically a logic analyzer with a PCI-E bus probe. I've used an Agilent analyzer with a FuturePlus probe, and can recommend the combination with only a couple reservations: first, it's not cheap. Second, it can be a bit of a jump for somebody accustomed purely to software.

The only way to debug the actual protocol items, which are called Transaction Layer Packets (TLPs) and Data Link Layer Packets (DLLPs), is to use a hardware PCI Express protocol analyzer. Very few are sold, so the prices are high. Lots of engineering goes into capturing data at gigabit speeds and presenting it in an easy-to-decipher form. LeCroy's cheapest unit starts from $16,000. The lowest-priced PCI Express protocol analyzer on the market is from ITIC ($7,995). This includes the protocol analyzer, a x4 lane slot probe, cables and software.

But why? We can capture Ethernet packets, Wi-Fi packets, and USB packets using tools like Wireshark. What makes it difficult or impossible to capture PCIe packets in software? Why do we need hardware equipment?

\$\endgroup\$
  • \$\begingroup\$ In addition to the answers below, PCI and PCIe use capacitive inputs and outputs of 1 nF, plus 22 ohm terminations. This stuff runs at GHz speeds and is isochronous, not expecting to be captured. \$\endgroup\$
    – Sparky256
    Commented Mar 9 at 19:22
  • \$\begingroup\$ I'm voting to close this question because it belongs on a software stack. \$\endgroup\$ Commented Mar 10 at 13:41

3 Answers

\$\begingroup\$

We can capture Ethernet packets, Wi-Fi packets, USB packets

With Ethernet/Wi-Fi, the hardware handles the low-level tasks, but the CPU processes each packet directly. The CPU builds packets and then tells the network hardware to send them. When packets are received, the CPU gets them explicitly as packets. So, naturally, software can access and capture raw packets.
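To illustrate the point that software on this path really does see raw frames, here is a minimal sketch (the frame bytes are hypothetical) that parses the 14-byte Ethernet II header a NIC hands to the host, the same bytes a tool like Wireshark would display:

```python
import struct

def parse_ethernet_header(frame: bytes):
    """Parse the 14-byte Ethernet II header: dst MAC, src MAC, EtherType."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    fmt_mac = lambda b: ":".join(f"{x:02x}" for x in b)
    return fmt_mac(dst), fmt_mac(src), ethertype

# A hypothetical frame: broadcast destination, EtherType 0x0800 (IPv4),
# followed by a dummy 20-byte payload.
frame = bytes.fromhex("ffffffffffff" "020000000001" "0800") + b"\x45" + b"\x00" * 19
dst, src, ethertype = parse_ethernet_header(frame)
```

Capture tools do essentially this, at scale, on bytes handed up by the kernel's packet-capture interface.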

With USB it's a bit more complicated. The CPU does not handle individual packets; instead, it handles transfers. It may tell the hardware: send this chunk of data in bulk mode to an endpoint on the device. The hardware is then in charge of slicing it into packets and handling the low-level details. The CPU never sees individual packets, but it manages transfers, which is a lot less work. So in the USB case, software cannot capture raw packets, but it can capture transfers between host and device. Functionally, it's not much different.
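The transfer level is roughly as low as host software normally sees. For example, a control transfer starts with an 8-byte SETUP block; a sketch of decoding one (field layout per the USB specification, example request values chosen for illustration):

```python
import struct

def parse_setup_packet(data: bytes):
    """Decode the 8-byte SETUP stage of a USB control transfer -- the
    granularity host software works at. The individual token/data/handshake
    packets underneath are handled entirely by the host controller."""
    bmRequestType, bRequest, wValue, wIndex, wLength = struct.unpack("<BBHHH", data)
    return {
        "direction": "IN" if bmRequestType & 0x80 else "OUT",
        "bRequest": bRequest,
        "wValue": wValue,
        "wIndex": wIndex,
        "wLength": wLength,
    }

# GET_DESCRIPTOR(Device), requesting 18 bytes -- a standard enumeration request.
setup = struct.pack("<BBHHH", 0x80, 0x06, 0x0100, 0x0000, 18)
req = parse_setup_packet(setup)
```

This is what software-side USB capture (e.g. usbmon on Linux) shows you: transfers and their setup/data stages, not the on-the-wire packets.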

With PCI Express, the CPU does not handle transfers at all. PCIe is memory-mapped, so when a participant on the bus wants to transfer data, at the logical level it performs memory accesses. At the physical level, the hardware translates these memory accesses into PCIe packets and transactions. When the GPU wants to read data from main memory, there is no intervention from the CPU at all. The CPU has no idea it's happening, which is the whole point: CPU cycles are not wasted on low-level processes. Even when the CPU uses PCIe to give commands to a peripheral, for example an SSD, it does so via memory-mapped accesses. To the CPU, it looks like it's writing to a RAM address. The hardware handles everything else: the PCIe protocol, packetization, etc. So in the PCIe case, by default, software can't capture anything at all. To do so, you'd need packet capture to be implemented by the PCIe hardware itself and presented to the CPU through an appropriate interface.
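A minimal sketch of what this looks like from the software side, simulating a device BAR with an anonymous mapping (on a real Linux system you would instead mmap the device's `resource0` file under `/sys/bus/pci/devices/`, which typically requires root and a device that permits it):

```python
import mmap
import struct

# Simulate a 4 KiB PCIe BAR with an anonymous mapping. On real hardware the
# mapped object would be /sys/bus/pci/devices/<bdf>/resource0 instead.
bar = mmap.mmap(-1, 4096)

# To software these are ordinary stores and loads. On a real BAR, the
# hardware would turn them into PCIe memory-write/read TLPs, with the CPU
# never seeing a packet of any kind.
struct.pack_into("<I", bar, 0x10, 0xDEADBEEF)   # "register" write at offset 0x10
value = struct.unpack_from("<I", bar, 0x10)[0]  # "register" read back
```

Nothing in this code path ever exposes a TLP to software, which is exactly why there is nothing for a Wireshark-style tool to hook.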

\$\endgroup\$
\$\begingroup\$

Why do we need hardware equipment?

It depends on whether the PCIe Root Complex provides a mechanism to capture PCIe packets and store them in memory for analysis by software.

The HiSilicon PCIe Tune and Trace (PTT) device is one example of a PCIe Root Complex device that has Linux kernel support for tracing. E.g., from the Trace section of its kernel documentation:

PTT trace is designed for dumping the TLP headers to the memory, which can be used to analyze the transactions and usage condition of the PCIe Link. You can choose to filter the traced headers by either Requester ID, or those downstream of a set of Root Ports on the same core of the PTT device. It's also supported to trace the headers of certain type and of certain direction.
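On a system with such a device, driving it through perf looks something like the following. This is a sketch based on the kernel's hisi-ptt documentation; the PMU device name and the filter/type/direction/format values are illustrative and depend on your machine, so check sysfs and the kernel docs for the real names:

```shell
# List PTT PMU devices; names like hisi_ptt<sicl_id>_<core_id> are
# platform-specific.
ls /sys/devices | grep hisi_ptt

# Trace TLP headers for 5 seconds. The event name and the filter/type/
# direction/format values shown here are examples only -- consult
# Documentation/trace/hisi-ptt.rst for the values valid on your system.
perf record -e hisi_ptt0_2/filter=0x80001,type=1,direction=1,format=1/ -- sleep 5
```

The captured TLP headers then land in the perf.data file for offline analysis in software, which is exactly the capture-to-memory mechanism the answer describes.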

From a search, I haven't yet found any Intel or AMD x86-64 processors that are documented as providing a built-in trace mechanism to capture PCIe packets.

\$\endgroup\$
\$\begingroup\$

The short answer is that effectively the entire PCI/PCIe protocol stack is implemented in hardware (aside from high-level enumeration and management), while large portions of the protocol stack for Ethernet and USB are implemented in software.

For Ethernet and USB, the hardware only implements a few layers of the stack. These protocols are designed to do as much in software as possible, trading performance and cost for flexibility. For both Ethernet and USB, the hardware basically gives the software the raw packet data, with the hardware only handling serialization and a few other low-level functions like checksums and segmentation. In this case software MUST see everything, so capturing all of the traffic is a relatively straightforward thing to do.

For PCIe, the whole point of the protocol is to provide a high performance and low latency method for connecting the CPU and system memory to peripherals. One of the key features of PCIe is direct memory access, where peripheral components can directly read or write system memory without any involvement from the CPU. To make this work, the entire protocol has to be implemented in hardware such that read/write requests from peripheral components can be directly sent to the memory controller. Because of this, the CPU has zero visibility into what is going on over the bus. Similarly, when the CPU accesses a peripheral device, it does so by performing normal load and store operations against specific regions of address space, and these load and store operations get translated into PCIe TLPs entirely in hardware. At no point is software even aware of the fact that TLPs exist, so there is no way by default for software to capture anything.
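The TLPs produced by that hardware translation have a well-defined binary layout. As a rough sketch of what a protocol analyzer decodes, here is a toy encoder/decoder for just the first header DWord of a TLP (Fmt, Type, and Length fields per the published PCIe header format; every other field is left zero, so this is illustrative, not a full TLP implementation):

```python
def encode_tlp_dw0(fmt: int, type_: int, length: int) -> int:
    """Build the first header DWord of a TLP: Fmt in bits [31:29],
    Type in bits [28:24], Length (payload DWords) in bits [9:0]."""
    return (fmt & 0x7) << 29 | (type_ & 0x1F) << 24 | (length & 0x3FF)

def decode_tlp_dw0(dw0: int) -> dict:
    """Recover the Fmt/Type/Length fields from the first header DWord."""
    return {"fmt": dw0 >> 29 & 0x7, "type": dw0 >> 24 & 0x1F, "length": dw0 & 0x3FF}

# A 32-bit-address Memory Write with one DWord of payload:
# Fmt=0b010 (3-DW header, with data), Type=0b00000 (MWr).
dw0 = encode_tlp_dw0(0b010, 0b00000, 1)
fields = decode_tlp_dw0(dw0)
```

A CPU store to a mapped BAR becomes, in hardware, a header like this plus the payload; software never constructs or observes these words, which is the visibility gap being described.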

So, to capture PCIe traffic, you need some sort of explicit capture capability in hardware that's separate from the actual implementation of the protocol. This could be a separate piece of hardware, like a PCIe protocol analyzer. Certain PCIe components could also have debug features that capture traffic, but there is no standard mechanism for doing this, since it's not a required part of the protocol, nor even an optional feature covered in the specification.

Other protocols can have similar characteristics. For example, RoCE (RDMA over Converged Ethernet) operates entirely in hardware. Software can set up RDMA target buffers, and other hosts can access those buffers over the network without any involvement from the local CPU, with the RDMA implementation on the NIC translating incoming RDMA read/write requests into PCIe read/write operations. But it's relatively common for an RDMA-capable NIC to be able to capture the on-the-wire traffic for diagnostic purposes, although this generally reduces performance. And since it's Ethernet, it would also be possible to capture the traffic in the network via port mirroring and the like.

\$\endgroup\$
