
If you connect an NVMe SSD to a PCIe x4 slot, it will have a higher maximum bandwidth than if you connect it to an x1 or x2 slot. So logically it will transfer large files faster.

But if you never need high throughput, for example because you only work with small files, will there be any difference in performance between an x4 slot and an x1 slot?

Above all, I wonder whether the latency will be exactly the same. When the host sends commands to the drive they tend to be small, so they should not be affected by the maximum bandwidth. But maybe with 4 lanes those communications are sent in parallel instead of being queued on one lane?

Or does it not matter at all because the SSD controller will only process them sequentially?


2 Answers


Requesting a small file will still initiate a transfer of a full block from your NVMe drive. That is significant data in terms of PCIe packet sizes, and yes, the time until that transfer finishes does drop with the number of lanes.

Where it doesn't matter is the latency a device interrupt takes to reach your CPU, because that isn't payload data distributed across lanes.

When the host sends any commands to the drive they tend to be small in size,

[[citation needed]], first of all. NVMe commands can be, and in a many-small-storage-requests scenario will be, quite complex, because it makes sense to combine multiple commands rather than pay the much worse "ping-pong" latency of waiting for the first request to be served before sending the second (compare NCQ).

A PCIe gen 3 transaction layer packet (gen 3 being what you'd typically find) carries some 22 to 30 B of header and checksums. Getting even an empty-payload packet out still has considerable latency on the CPU side, so you'd avoid small packets.
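As a rough illustration (the ~8 GT/s per-lane rate with 128b/130b encoding and the 64 B example payload are assumed, typical gen 3 figures, not something stated above), the time such a small packet actually spends on the wire is tiny at any lane count, which is why lane count barely affects command latency:

```python
# Illustrative sketch only: assumes PCIe gen 3 at 8 GT/s per lane with
# 128b/130b encoding, i.e. roughly 985 MB/s of raw per-lane bandwidth.
# The 22-30 B of TLP overhead comes from the answer above (26 B used here).

PER_LANE_BYTES_PER_S = 8e9 / 8 * (128 / 130)   # ≈ 985e6 B/s per gen 3 lane

def tlp_wire_time_us(payload_bytes, lanes, overhead_bytes=26):
    """Time a single TLP spends on the wire, ignoring all protocol latencies."""
    total_bytes = payload_bytes + overhead_bytes
    return total_bytes / (PER_LANE_BYTES_PER_S * lanes) * 1e6

# A small command-sized packet (64 B payload assumed):
print(tlp_wire_time_us(64, lanes=1))   # ≈ 0.09 µs
print(tlp_wire_time_us(64, lanes=4))   # ≈ 0.02 µs
```

Either way the wire time is a tiny fraction of a microsecond, dwarfed by the CPU-side cost of getting the packet out in the first place.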

Or does it not matter at all because the SSD controller will only process them sequentially?

No. That's the main technical reason we use NVMe rather than ATAPI-over-PCIe: the NVMe device and the host can queue requests and commands and serve them in whatever order is most advantageous.
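A minimal sketch of why that queueing matters, using assumed, idealised numbers (not measurements): compare issuing requests one at a time ("ping-pong") with queueing them all up front so the drive can overlap its internal access latency with transfers already in flight.

```python
# Idealised model, assumed figures: ~100 µs access latency, ~4 µs per 4 KB
# transfer at x1 (see the other answer for where those values come from).

def pingpong_total_us(n_requests, access_latency_us, transfer_us):
    # Each request waits until the previous one has completed entirely.
    return n_requests * (access_latency_us + transfer_us)

def queued_total_us(n_requests, access_latency_us, transfer_us):
    # Requests are queued up front; the drive overlaps its internal access
    # latency with transfers already in flight (best case).
    return access_latency_us + n_requests * transfer_us

# 32 random 4 KB reads:
print(pingpong_total_us(32, 100, 4))   # 3328 µs
print(queued_total_us(32, 100, 4))     # 228 µs
```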

  • Is it possible to make an estimate of how much slower it will perform with 1 lane vs 4 lanes (provided you operate below 1 GB/s) to read, say, 1 MB of data? If I understand correctly, most of the time is lost waiting for the transfer to start, not in the actual transfer itself, so it will not be 4 times slower. What would be a (very rough) estimate of the reduction?
    – Maestro
    Commented Nov 2, 2023 at 21:24
  • For an amount of data as large as 1 MB, which we assume not to be limited by CPU interrupt latency (e.g. because it's done purely through DMA): roughly one fourth of the time, to a reasonably good approximation. It looks different when you ask for a single 4 KB block. Commented Nov 2, 2023 at 21:25

But if you never need high throughput, for example because you only work with small files, will there be any difference in performance between an x4 slot and an x1 slot?

Short answer: not really.

Above all, I wonder whether the latency will be exactly the same. When the host sends commands to the drive they tend to be small, so they should not be affected by the maximum bandwidth. But maybe with 4 lanes those communications are sent in parallel instead of being queued on one lane?

Each transfer takes 4x longer with 1/4 the bandwidth, so if you know the size of the transfer and Google for random-read IOPS numbers for your device, you can calculate this exactly. Using typical values of 4 KB random accesses and 1 GB/s per lane gives an idea, though: on a typical drive it takes about 100 µs to start the transfer, then 4E3 / 1E9 = 4 µs to complete it.
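Putting that estimate into a small sketch (the 100 µs access latency and 1 GB/s per-lane figure are the assumed typical values above, not properties of any specific drive):

```python
# Total time ≈ fixed access latency + transfer time over the link.
# Assumed typical values: 100 µs access latency, 1 GB/s per PCIe lane.

def total_time_us(size_bytes, lanes, access_latency_us=100.0,
                  per_lane_bytes_per_s=1e9):
    transfer_us = size_bytes / (per_lane_bytes_per_s * lanes) * 1e6
    return access_latency_us + transfer_us

print(total_time_us(4096, lanes=1))        # ≈ 104 µs
print(total_time_us(4096, lanes=4))        # ≈ 101 µs
print(total_time_us(1_000_000, lanes=1))   # ≈ 1100 µs
print(total_time_us(1_000_000, lanes=4))   # ≈ 350 µs
```

For small reads the fixed access latency dominates and the lane count barely matters; as the transfer size grows, the link bandwidth becomes the larger share of the total, as discussed in the comments.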

  • Sorry, accidentally edited your answer instead of mine. Rolled that back. Commented Nov 2, 2023 at 20:58
  • @user1850479 So it will take 104 µs using 1 lane and 101 µs using 4 lanes, since the actual data transfer is such a small part of the total delay? In that case it makes sense that both perform (almost) the same.
    – Maestro
    Commented Nov 2, 2023 at 21:11
  • @Maestro Yes, so about a 3% difference in this case. But look up the numbers for your specific device and then calculate for your target file size. As the file size gets bigger, the transfer time becomes a proportionally larger fraction of the total latency. Similarly, if you buy a faster SSD, you get slightly faster access times as well. Commented Nov 2, 2023 at 21:33
