Thunderbolt 4 uses 4 lanes of PCIe 3.0
I'll admit this is a little confusing... in this case, "4 lanes of PCIe 3.0" is an indication of bandwidth, with no real association to the physical implementation. It does not mean there are four physical PCIe lanes in the cable; each Thunderbolt 4 lane is simply that much faster than a PCIe 3.0 lane.
Wikipedia does a good job of summarising this in the Thunderbolt 3 section:
It allows up to 4 lanes of PCI Express 3.0 (32.4 Gbit/s) for general-purpose data transfer, and 4 lanes of DisplayPort 1.4 HBR3 (32.40 Gbit/s before 8/10 encoding removal, and 25.92 Gbit/s after) for video,[78] but the maximum combined data rate cannot exceed 40 Gbit/s
... to clarify, the bandwidth is shared between PCIe and DisplayPort, with some flexibility on how much bandwidth is allocated to each interface - with a hard cap at 40 Gbit/s.
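As a rough sketch of that arithmetic, using only the figures from the quote plus the 8.1 Gbit/s per-lane HBR3 line rate: the two maximums add up to more than the cap, which is why they can't both be reached at once. The helper `pcie_budget` is hypothetical, and it assumes the DisplayPort payload rate (after 8b/10b removal) is what counts against the 40 Gbit/s cap; the real allocation is handled by the Thunderbolt controller.

```python
# Figures in Gbit/s, taken from the quote above rather than derived independently.
LINK_CAP = 40.0        # maximum combined data rate
PCIE_MAX = 32.4        # "up to 4 lanes of PCI Express 3.0"
DP_LANES = 4
HBR3_PER_LANE = 8.1    # raw DisplayPort HBR3 line rate per lane

dp_raw = DP_LANES * HBR3_PER_LANE   # before 8b/10b removal (about 32.4)
dp_payload = dp_raw * 8 / 10        # after 8b/10b removal (about 25.92)


def pcie_budget(dp_in_use: float) -> float:
    """Hypothetical helper: PCIe bandwidth left once DisplayPort has its share.

    Assumes DisplayPort payload counts directly against the link cap.
    """
    return min(PCIE_MAX, LINK_CAP - dp_in_use)


if __name__ == "__main__":
    print("both maximums together:", round(PCIE_MAX + dp_payload, 2))
    print("PCIe with no display:  ", round(pcie_budget(0.0), 2))
    print("PCIe with full HBR3 x4:", round(pcie_budget(dp_payload), 2))
```

Under those assumptions, a display using all four HBR3 lanes would leave roughly 14 Gbit/s for PCIe, while with no display attached PCIe can use its full allocation.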
each lane should contain 3 twisted pairs, two for communication (TX and RX) and one for reference clock
This isn't true - for PCIe, the reference clock is not per-lane, it's for the whole link. As already noted, it's also possible to omit the clock, with the two endpoints having entirely independent reference clocks. A "PCIe lane" is two pairs (Tx and Rx). The reference clock is just there to help the endpoints cope with the natural drift that occurs between two independent clocks - it's not used to clock data directly.
With other protocols (including Thunderbolt), the clock can be "recovered" due to the line encoding that is used... fundamentally X-bit words are encoded into Y-bit symbols on the wire (where X < Y), and this encoding ensures there is a transition or edge regularly enough that the clock can be inferred from the signal.
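To make the idea concrete, here is a minimal sketch using the 4B/5B code from FDDI / 100BASE-TX (X = 4, Y = 5). This is purely an illustration of the principle; it is not the encoding Thunderbolt or PCIe actually use. The function names and the high-nibble-first ordering are my own choices for the example. With 4B/5B, no symbol sequence contains more than three consecutive zeros, so when it is paired with NRZI signalling (where a 1 is sent as a transition) the receiver always sees an edge within a few bit times.

```python
# 4B/5B code table: index is the 4-bit data word, value is the 5-bit symbol.
FOUR_B_FIVE_B = [
    "11110", "01001", "10100", "10101",
    "01010", "01011", "01110", "01111",
    "10010", "10011", "10110", "10111",
    "11010", "11011", "11100", "11101",
]


def encode(data: bytes) -> str:
    """Encode bytes as a bit string of 5-bit symbols.

    High nibble first is an arbitrary choice made for this sketch.
    """
    return "".join(
        FOUR_B_FIVE_B[b >> 4] + FOUR_B_FIVE_B[b & 0x0F] for b in data
    )


def longest_zero_run(bits: str) -> int:
    """Length of the longest run of consecutive '0' characters."""
    return max(len(run) for run in bits.split("1"))


if __name__ == "__main__":
    raw = bytes(8)  # 64 zero bits: no transitions at all if sent unencoded
    coded = encode(raw)
    print("raw bits:  ", 8 * len(raw), "longest zero run:", 8 * len(raw))
    print("coded bits:", len(coded), "longest zero run:", longest_zero_run(coded))
```

The cost is the overhead: 64 data bits become 80 bits on the wire. Newer encodings such as 128b/130b keep the same X < Y idea with much less overhead.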