TL;DR: Yes, but No
- Yes,
  - the BCD correction circuit is part of the default data path between ALU and A-register and
  - it does need some time to perform.
- No, it's not relevant, because for one
  - the correction circuit needs only a few nanoseconds, but more importantly
  - the circuit is only active during PHI1.
PHI1 active is the internal cycle, while PHI2 is when memory gets accessed (*1) - and memory access is where the timing bottleneck of 6502 systems resides. A 1 MHz 6502 already needs 2 MHz memory (450 ns access time or better). The internal phase (PHI1) is extremely relaxed.
Is there any documentation or other evidence to show that BCD support was indeed the limiting factor on the 6502 clock speed in the 70s and early 80s,
Not that I ever heard of - even more so as its structure and circuitry are quite openly documented.
i.e. that it was sometimes what made the difference between a particular 6502 being able to run at 2 MHz or not?
You're aware that the 6502 existed in different speed grades?
- 6502 for 1 MHz
- 6502A for 2 MHz
- 6502B for 3 MHz
- 6502C for 4 MHz (*2)
As usual, they are either selected variants (early on) or simply a different marking. For all of them the relevant timing path is memory access.
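To put rough numbers on that, here is a minimal sketch of how long each phase lasts per speed grade. It assumes a symmetric clock where PHI1 and PHI2 each get exactly half the cycle, which is a simplification (real datasheet timings add setup and hold margins). The function name `phase_ns` is made up for illustration.

```python
def phase_ns(clock_mhz: float) -> float:
    """Length of one clock phase in ns, assuming PHI1 and PHI2 split the cycle evenly."""
    cycle_ns = 1000.0 / clock_mhz  # one full cycle in nanoseconds
    return cycle_ns / 2.0

# Speed grades as listed above (part name -> rated clock in MHz).
GRADES = {"6502": 1, "6502A": 2, "6502B": 3, "6502C": 4}

for part, mhz in GRADES.items():
    print(f"{part:6s} {mhz} MHz -> {phase_ns(mhz):6.1f} ns per phase")
```

At 1 MHz this gives 500 ns for the PHI2 phase, which is the window the 450 ns memory figure has to fit into; every faster grade shrinks that window in proportion.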
If so, could you increase the clock speed just by deciding not to use BCD in your application?
No. Not unless one varies the clock source according to instruction use. As explained, BCD correction is done during PHI1, where there is plenty of time to spend a few nanoseconds. The tight path is PHI2, where memory access is done (*3).
Or would you need to redesign the ALU to streamline the logic?
The 6502's circuitry is already quite optimized, according to Ken Shirriff's compact description (scroll down to "Comparison to the 6502").
*1 - The 6502 essentially works double clocked, with alternating cycles used for internal operation and memory access. This is handled by creating two non-overlapping clocks, one for each cycle, providing four edges for operation.
*2 - 6502C, not 65C02. It's still an NMOS CPU.
*3 - Then again, this can already be done without looking at instructions. PHI2 depends on memory speed, while PHI1 depends on the more relaxed internal operation speed. It was a known hack to operate a 6502 on a 3:5 clock, so 1.25 MHz for a 1 MHz CPU. This also shows why it wasn't used much: no real gain, but a much more complex clock circuit - for a CPU where one of its major advantages was not needing external clock circuitry.
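The arithmetic behind that 3:5 clock, as a small sketch. It assumes the ratio means PHI1:PHI2 and that PHI2 keeps the 500 ns it has with a plain 1 MHz clock; that reading of the ratio and the function name `effective_mhz` are assumptions for illustration.

```python
def effective_mhz(phi1_ns: float, phi2_ns: float) -> float:
    """Clock frequency in MHz for one cycle made of a PHI1 and a PHI2 phase."""
    return 1000.0 / (phi1_ns + phi2_ns)

PHI2_NS = 500.0            # memory phase, unchanged from a plain 1 MHz clock
PHI1_NS = PHI2_NS * 3 / 5  # internal phase cut to 3/5 of PHI2 (assumed reading of "3:5")

print(effective_mhz(PHI2_NS, PHI2_NS))  # symmetric clock: 1.0
print(effective_mhz(PHI1_NS, PHI2_NS))  # asymmetric clock: 1.25
```

So shortening PHI1 from 500 ns to 300 ns buys 25% at best, while the memory still sees the same 500 ns window as before.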