
I just came across this amazing 1976 article by Woz. In it he describes a relatively complete floating-point system for the 6502 with a 32-bit format (similar to earlier MS code).

I understand the code, mostly, but I am curious about its performance. I know that a lot of it runs through FMUL and thus one would expect that newer designs using unwound loops and/or self-modifying code would improve on that.

But given the constraints of the time, mostly memory and a desire to be read-only (for some machines anyway), has this code been greatly improved upon?

I have poked about a bit for benchmarks comparing this code to MS's version, but have not found anything applicable. The Rugg/Feldman benchmarks would do it, but the only numbers I see are for MS BASIC vs. Integer BASIC, so Woz's FP code is not being run in either case.

  • 6
    The code includes logic to work around a bug in the ROR instruction of the very first 6502 microprocessors, and would probably be more than twice as fast without that.
    – supercat
    Commented Jun 24, 2019 at 18:39
  • Wow, that's interesting. I was aware of this bug, but had not seen an estimate on the difference in time! Commented Jun 24, 2019 at 18:40
  • 6
    Addresses $1F66 to $1F75 could be replaced by 12 bytes worth of LSR and ROR instructions with total execution time of 30 cycles. Instead, they run six iterations of a loop with an execution time of 25 cycles/iteration. That function probably represents the majority of the execution time of a floating-point multiply.
    – supercat
    Commented Jun 24, 2019 at 19:36
  • 2
    … or how precise/repeatable/feature filled? The Rankin/Wozniak code doesn't handle NaNs or Infs and has no transcendental functions, so it's of limited general use today. It's also single precision, so cumulative rounding errors will build up quickly.
    – scruss
    Commented Jun 26, 2019 at 10:12
  • 1
    Good points scruss. Admittedly, I was mostly interested in pure performance against the other implementations like the one in MS. Commented Jun 26, 2019 at 14:19
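The cycle estimate in supercat's second comment can be worked through with a few lines of arithmetic. This is only a sketch using the figures quoted in that comment (six iterations at 25 cycles each, versus 30 cycles for the unrolled LSR/ROR sequence); the function and constant names are made up for illustration, and the result applies to that one routine, not to a whole floating-point multiply.

```python
# Figures taken from the comment above; names are hypothetical.
LOOP_ITERATIONS = 6          # iterations of the loop at $1F66-$1F75
CYCLES_PER_ITERATION = 25    # quoted cost of one iteration
UNROLLED_CYCLES = 30         # quoted cost of 12 bytes of LSR/ROR


def loop_cycles(iterations: int, cycles_per_iteration: int) -> int:
    """Total cycles spent in a loop with a fixed per-iteration cost."""
    return iterations * cycles_per_iteration


looped = loop_cycles(LOOP_ITERATIONS, CYCLES_PER_ITERATION)
speedup = looped / UNROLLED_CYCLES

print(f"looped: {looped} cycles, unrolled: {UNROLLED_CYCLES} cycles")
print(f"speedup for this routine: {speedup:.1f}x")
```

How much that helps a full multiply depends on how often the routine is called per multiply, which the comment does not state.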

1 Answer


Steve Wozniak wrote most of his software to be compact rather than fast, reflecting the constraints of affordable memory hardware of his time. That often resulted in contortions that made it run considerably slower than a speed-optimised implementation, such as the extensive reuse of the FMUL subroutine mentioned.

Home micros sold before 1980 typically came with 8K or 16K of ROM in total, which had to support all the features of both BASIC and native user programs. The early Apple machines were no exception.

FP routines written for a less space-constrained machine, such as the BBC Micro which often had 48K of ROM from the factory (16K MOS, 16K BASIC, 16K DFS), could be considerably faster due to the use of more specialised routines and more speed-optimised coding techniques that took up more space. The BBC Master capitalised on a 128K "MegaROM" (named because 128KB = 1 megabit) and the more capable 65C02 to further accelerate the FP and graphics routines.

It's hard to directly compare Woz FP and BBC Micro FP because they operate to different precisions - 4 and 5 bytes respectively - so the BBC Micro has to do more work to complete its calculations. Nevertheless, a Mandelbrot-based benchmark on different BASICs ported to a common (relatively fast) machine shows that the BBC implementation was still faster:

ehBasic: 172 seconds
Applesoft: 161 seconds
BBC Basic 1-3: 124 seconds
BBC Basic 4: 96 seconds

In the above table, ehBasic is effectively an expanded version of MS BASIC implementing a 5-byte FP format. BBC Basic 4 is the Master version using 65C02 instructions.
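To make the table easier to read, the timings can be turned into speed ratios. The sketch below uses only the four figures listed above and takes Applesoft as the baseline, which is an arbitrary choice on my part; the variable names are illustrative.

```python
# Mandelbrot benchmark timings from the table above, in seconds.
timings = {
    "ehBasic": 172,
    "Applesoft": 161,
    "BBC Basic 1-3": 124,
    "BBC Basic 4": 96,
}

BASELINE = "Applesoft"  # arbitrary reference point for the ratios

# A ratio above 1.0 means faster than the baseline, below 1.0 means slower.
relative_speed = {
    name: timings[BASELINE] / seconds for name, seconds in timings.items()
}

for name, ratio in relative_speed.items():
    print(f"{name:14s} {timings[name]:4d} s  {ratio:.2f}x")
```

On these numbers BBC Basic 1-3 comes out roughly 1.3 times as fast as Applesoft, and BBC Basic 4 roughly 1.7 times as fast.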

  • Erm... the Question is not about Applesoft - and especially not about the BBC - but Woz' FP routines. So none of the text of this 'answer' nor the benchmarks are related to the question in any way.
    – Raffzahn
    Commented Jun 29, 2019 at 0:39
  • 1
    As far as I'm aware, Woz' FP routines were used in Applesoft BASIC. The question effectively asked for a comparison with other FP implementations.
    – Chromatix
    Commented Jun 29, 2019 at 0:44
  • 3
    No, Applesoft uses Microsoft's FP routines (1+4 byte format), while Woz' implementation is based on a 32-bit (1+3 byte) representation. They were only used with Integer BASIC and assembly routines. And while the question is about comparison, it's useless to write mostly about a different BASIC when there is no relation and no comparison at all.
    – Raffzahn
    Commented Jun 29, 2019 at 0:51
  • 1
    @Chromatix Could you do that comparison, then? As it stands, this answer doesn't really answer the question.
    – wizzwizz4
    Commented Jun 29, 2019 at 5:43
  • 1
    I don't really have time today, but I'll see what I can organise.
    – Chromatix
    Commented Jun 29, 2019 at 5:46
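For readers unsure what a "1+3 byte" representation means in practice, here is a minimal decoding sketch. It rests on assumptions that are not stated anywhere in this thread: that the first byte is an exponent biased by $80, and that the remaining three bytes are a big-endian two's-complement mantissa with the binary point immediately below the bit next to the sign. The function name is hypothetical, and this is not a transcription of the Rankin/Wozniak code.

```python
def decode_1_plus_3(raw: bytes) -> float:
    """Decode a 4-byte value laid out as 1 exponent byte + 3 mantissa bytes.

    Assumed layout (not confirmed by this thread):
      byte 0     exponent, stored with a bias of 0x80
      bytes 1-3  two's-complement mantissa, big-endian, scaled so that
                 0x400000 represents 1.0
    """
    if len(raw) != 4:
        raise ValueError("expected exactly 4 bytes")
    exponent = raw[0] - 0x80
    mantissa = int.from_bytes(raw[1:4], "big", signed=True)
    return (mantissa / (1 << 22)) * 2.0 ** exponent


# Sample encodings under the assumed layout.
print(decode_1_plus_3(bytes([0x80, 0x40, 0x00, 0x00])))
print(decode_1_plus_3(bytes([0x83, 0x50, 0x00, 0x00])))
print(decode_1_plus_3(bytes([0x83, 0xB0, 0x00, 0x00])))
```

Under that layout a 1+4 byte format simply carries one more mantissa byte, which is why the two families of routines do different amounts of work per operation.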
