https://en.wikipedia.org/wiki/Transistor_count provides some transistor count figures for early microprocessors, e.g. 8008 and 6502 both listed at around 3500.

The PDP-8 was a 12-bit computer with a deliberately simple instruction set. At first glance, 12 is greater than 8, so the PDP-8 might be expected to be larger than the early 8-bit microprocessors, but the address sizes were 12 and 16 bits respectively. (Why did the PDP-8 not distinguish between data and address widths like the microprocessors? Conjecture: for early microprocessors, pins and board traces were expensive, but early minicomputers were made of discrete components which means they were effectively made of wires, so saving a few wires on the data bus wouldn't help.) It's not clear from the nominal numbers alone how the PDP-8 should be expected to compare with early microprocessors.

https://www.pdp8.net/straight8/functional_restore.shtml says "The backplanes contain 230 cards, approximately 10,148 diodes, 1409 transistors, 5615 resistors, and 1674 capacitor."

1409 is a surprisingly small number. For example, it's less than half of 3500. That leads one to ask how it managed to be so much simpler than e.g. the 6502, what tricks the 6502 designers were missing.

However, there are also the other components to take into account. Is there some sense in which the diodes, resistors and capacitors did things that would be done by transistors in a microprocessor?

What would be a reasonable assessment for the transistor count of the PDP-8 by the same metric that the 6502 has 3510 transistors? Or vice versa?

    There appear to have been a bunch of them, per gordonbell.azurewebsites.net/tcmwebpage/timeline/… — e.g. the PDP8/S is a cost-reduced model with the same instruction set but which uses serial bit-by-bit arithmetic, no doubt saving on transistors, being "one-fifteenth of a PDP-8 at one-half the cost" (impliedly meaning one-fifteenth as fast). Following up on the PDP-8/S I see a claim of 519 logic gates, which could easily mean 1409 transistors. Leaving as a comment because I'm very fuzzy on how logic gates were constructed then. Or now.
    – Tommy
    Commented Mar 1, 2018 at 19:24
    The 8008 and 6502 use an 8-bit data bus because they support bytes as a data type, and they use the data bus twice (i.e. taking longer) if the memory transfer operation is 16 bits. The PDP-8 did not support bytes as a data type, just words. So there was no reason to differentiate data and address widths: both used the same 12-bit words.
    – Erik Eidt
    Commented Mar 1, 2018 at 20:02
    @ErikEidt: BTW, at least the later Omnibus models (starting with the PDP-8/E) had separate busses for address and data. So there would have been no reason to have identical width for address and data; other DEC PDPs didn't do this, either. And in fact, there is the option to expand the address bus by another 3 bits to 15 bits using data and code "fields" (KM8 "Memory Extension and Time Share Option"), though I don't think any existing PDP ever used all 8 of them.
    – dirkt
    Commented Mar 3, 2018 at 19:36
    @dirkt, you're absolutely right, the later models would have had a larger address bus, eh?
    – Erik Eidt
    Commented Mar 3, 2018 at 20:47

The PDP-8 was a 12-bit computer with a deliberately simple instruction set.

That's part of your answer right there: the 6502 was in many ways more complex than the PDP-8. The 6502 has 56 machine instructions, but the PDP-8 has only 24.¹

The base configuration of the original PDP-8 simply offered less power in its half rack of space than the 6502 does in its little 40-pin DIP package.

Here's some of what was missing in the PDP-8 relative to the 6502:

  • Subtraction: The 6502 had both an add with carry instruction (ADC) and a subtract with borrow instruction (SBC). Luxury!

    The only arithmetic instruction in the PDP-8 is a two's complement add instruction (TAD). To subtract two integers, you had to do one of several tricks:

    1. For arbitrary operands, you could complement the subtrahend and add 1 to it using a "complement and increment accumulator" instruction (CIA) then TAD it to the minuend, which is what a two's complement subtraction is.²

    2. If the subtrahend is a constant known at assembly time, some PDP-8 assemblers could negate it for you and store it as an in-page constant in the generated machine code, then reference that from the generated TAD instruction. "Add the negative value stored in page offset 42 to the accumulator." Basically, it would shift the cost of the CIA instruction from run time to assembly time.³

      Those without assemblers that could do this automatically might take the time to manually work out the arithmetic to save the cost of a CIA instruction. I've seen many PDP-8 assembly programs with this sort of magic constant embedded, often with no explanation for why that particular value was used!

    3. Sometimes you'd get lucky and the two's complement of the constant you needed to subtract would happen to be equal to one of the instructions in the same page. That's right, PDP-8 programs would sometimes reuse executable machine code as data in other parts of the code!

    4. If both operands were constants known at assembly time, some PDP-8 assemblers could do the subtraction for you and just store the difference in the machine code as a constant.
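The first two subtraction tricks can be sketched in a few lines of Python. This is only an illustration of 12-bit two's complement arithmetic, not PDP-8 code: the function names are made up for the example, and the LINK bit is ignored for simplicity.

```python
MASK = 0o7777  # 12-bit word mask (the width of the PDP-8 accumulator)

def cia(ac):
    """Complement and increment: two's complement negate within 12 bits."""
    return ((ac ^ MASK) + 1) & MASK

def tad(ac, operand):
    """Two's complement add, ignoring the LINK bit for simplicity."""
    return (ac + operand) & MASK

def subtract(minuend, subtrahend):
    """minuend - subtrahend using only CIA and TAD, as in trick 1."""
    ac = cia(subtrahend)      # AC = -subtrahend
    return tad(ac, minuend)   # AC = minuend - subtrahend

print(oct(subtract(0o100, 0o42)))  # 0o36
print(oct(cia(0o42)))              # 0o7736, the constant an assembler would store for trick 2
```

The second line of output is the kind of "magic constant" you find in old listings: 7736 octal is simply -42 octal in 12 bits.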

  • Non-destructive store: The PDP-8 deposit and clear accumulator instruction (DCA) was a double-edged sword.

    On the plus side, it gave the programmer something like the 6502's store accumulator in memory instruction (STA) without requiring an equivalent to the 6502's load accumulator from memory instruction (LDA): because DCA clears the accumulator after storing it in memory, you could just TAD a new value from memory to the just-zeroed accumulator, giving the same effect as LDA without requiring more transistors in the CPU to achieve the effect. (n + 0 = n) The dual use of TAD instructions for "two's complement add" and "load zeroed accumulator with new value" makes these instructions much more common in PDP-8 assembly code than you'd guess from the acronym.⁴

    The downside is that if you wanted to store a copy of the accumulator somewhere but keep working with it in the accumulator, you had to TAD it back into the accumulator from memory after storing it in that same memory cell! This pathological case runs about 3× slower on a PDP-8 than the equivalent code on a 6502. (We'll see this happen in the "exclusive or" program below!)

    When evaluating this design choice, consider the counter-pressures on the PDP-8's design that forced it:

    • The PDP-8's designers wanted 12-bit words.

    • They wanted every instruction to be just one word, because although variable-length instructions lift a lot of limits on the design, they also require more transistors to decode.

    • Given the choice to have fixed-length words, the designers of the PDP-8 then had to decide how many bits to dedicate to the instruction's operation code and how many to the operand. They chose to dedicate ¼ of the bits to the op-code, 3 bits, giving the official 8 "true" instructions.¹

    • The PDP-8 designers wanted memory reference instructions (MRI) like TAD to be able to make zero-page, current-page, zero-page-indirect, and current-page-indirect memory references, which requires 2 bits in the instruction word (an indirect bit followed by a page bit): 00, 01, 10, and 11, respectively.

    • That leaves 7 bits for the address in an MRI, which gives the PDP-8 page size of 2⁷ = 128 words.

    All of which means that if the designers of the PDP-8 series had wanted to have a "load accumulator" instruction like that in the 6502, they'd either have to reassign one of the address bits in the instruction, shrinking the page size to 64; or increase the word size, requiring a wider CPU and thus more transistors; or allow variable-width instructions, requiring more transistors to decode the more complicated instructions.
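To make the bit budget concrete, here is a minimal Python sketch that splits a 12-bit memory reference instruction into its fields. It assumes the usual layout of 3 op-code bits, an indirect bit, a page bit, and a 7-bit offset, in that order from the most significant end. The names `MRI`, `decode`, and `operand_address` are hypothetical, chosen for the example.

```python
from typing import NamedTuple

class MRI(NamedTuple):
    opcode: int         # 3 bits: which of the 8 "true" instructions
    indirect: bool      # 1 bit: go through the addressed word as a pointer
    current_page: bool  # 1 bit: False = page zero, True = the instruction's own page
    offset: int         # 7 bits: word within the 128-word page

def decode(word):
    """Split a 12-bit instruction word into its four fields."""
    return MRI(
        opcode=(word >> 9) & 0o7,
        indirect=bool(word & 0o400),
        current_page=bool(word & 0o200),
        offset=word & 0o177,
    )

def operand_address(word, pc):
    """Address named by the instruction before any indirection.

    pc is the address of the instruction itself.
    """
    m = decode(word)
    page_base = (pc & 0o7600) if m.current_page else 0
    return page_base | m.offset

# A TAD (op-code 1) of offset 52 octal in the current page, located at address 0430.
print(decode(0o1252))
print(oct(operand_address(0o1252, 0o0430)))  # 0o452
```

Every field is fully used: there is no spare bit to give to a ninth op-code without shrinking one of the others.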

  • Registers: The 6502 has many similarities to the PDP-8 register setup. Both CPUs have only one general-purpose register called the accumulator,⁵ and both have a program counter, for example.

    There are also many differences, some of which are responsible for the lower transistor count in the PDP-8 CPU:

    • The 6502 has two dedicated index registers, X and Y. The PDP-8 sets aside 8 of its zero page memory locations (octal 010 through 017) as auto-increment registers instead. Every time you made an indirect reference through one of those special core memory locations, the processor would increment it before using it as the address.

      It's a roughly neutral tradeoff in terms of speed: where a 6502 program would need explicit INX and INY instructions to do indexed RAM lookups, the equivalent PDP-8 code could leave those instructions out but would need to do indirect core memory accesses through zero page, which is slower than indexed RAM access. A 6502 program that says "load the X register with Z, load AC indirectly via X, then increment X and jump" is equivalent to "store index X in zero page core location 010, load AC indirectly via 010, then jump" in a PDP-8.

    • About half the bits in the 6502 status register have no equivalent in the PDP-8. This is no accident: bits were nearly free when it came to microprocessors, but when you're constructing registers from discrete transistors, diodes, and resistors, each bit costs real money.

    • ...which is why two of the precious few registers in the PDP-8 are only 3 bits long: Instruction Field (IF) and Data Field (DF).

      Coupled with the 12-bit native word size, IF and DF allow the PDP-8 to address up to 32 kWords of core memory, half as many locations as the 64 k the 6502 can directly address. IF and DF are basically a form of bank switching, hard-limited to 8 banks in the PDP-8. (2³ = 8.)

      More or less the same technique allows 6502-based computers to escape the 64 kB addressing limit of the processor. I had 640 kiB of RAM in my Apple //c: the 128 kiB of on-board RAM plus a 512 kiB Applied Engineering Z-RAM Ultra. Few programs made good use of that extra memory, but then, there were plenty of PDP-8 programs that were confined to a single 4 kWord core memory field, too.⁶
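The field arithmetic is easy to check with a short Python sketch. The function name `extended_address` is invented for the example; it just glues a 3-bit field number onto a 12-bit address.

```python
WORD_BITS = 12   # native address width
FIELD_BITS = 3   # width of the IF and DF registers

def extended_address(field, address):
    """Combine a 3-bit field number with a 12-bit address into a 15-bit one."""
    assert 0 <= field < (1 << FIELD_BITS)
    assert 0 <= address < (1 << WORD_BITS)
    return (field << WORD_BITS) | address

total_words = 1 << (FIELD_BITS + WORD_BITS)
print(total_words)                       # 32768, i.e. 32 kWords
print(oct(extended_address(3, 0o1234)))  # 0o31234
```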

  • Exclusive OR: The 6502 provides this directly via the EOR instruction, whereas a PDP-8 required seven instructions to provide that operation:

    DCA TMP     / save AC to a temporary location, zeroing AC
    TAD TMP     / pull value back into AC
    AND M       / bitwise AND value from core memory location M
    CIA         / two's complement negate AC
    CLL RAL     / clear LINK bit and rotate AC left == 2 * AC
    TAD TMP     / add TMP and M values to AC
    TAD M

    (Program source: Douglas W. Jones' PDP-8 Programmer's Reference Manual. There's a less efficient alternative in the DECUS PDP-8 Cookbook, volume 1 on page 23.)
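The listing relies on the identity A XOR M = A + M - 2 × (A AND M). A Python transcription, one line per PDP-8 instruction, makes it easy to convince yourself that it holds in 12-bit arithmetic. This is a sketch of the arithmetic only: `xor_pdp8` is a made-up name, and the bit that RAL shifts into LINK is simply dropped.

```python
MASK = 0o7777  # 12-bit word mask

def xor_pdp8(ac, m):
    """Follow the seven-instruction sequence step by step."""
    tmp = ac                        # DCA TMP, then TAD TMP restores AC
    ac = ac & m                     # AND M
    ac = ((ac ^ MASK) + 1) & MASK   # CIA: AC = -(A AND M)
    ac = (ac << 1) & MASK           # CLL RAL: AC = -2 * (A AND M), top bit goes to LINK
    ac = (ac + tmp) & MASK          # TAD TMP
    ac = (ac + m) & MASK            # TAD M
    return ac

print(oct(xor_pdp8(0o7070, 0o1234)), oct(0o7070 ^ 0o1234))
```

Both values printed should match, since the program computes the same thing as Python's `^` operator on 12-bit values.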

    That's a great illustration of the cost of the tiny PDP-8 instruction set: it's Turing complete, so technically there's nothing we can't do with the instruction set, given enough CPU time and memory. But that's the trick: we don't have infinite CPU time or infinite memory.

  • Stack: The 6502 has a built-in hardware 256 byte stack — and a stack pointer register to go with it — a feature entirely missing from the PDP-8 architecture.

    You might therefore wonder how (or whether!) a PDP-8 program could have subroutines. Yes, subroutines did exist on the PDP-8, but they worked entirely differently from those on the 6502 and more modern processors. Instead of pushing the program counter and any registers that need saving onto the stack, the PDP-8's "jump to subroutine" instruction (JMS) stores the current program counter in the core memory location referenced by the jump, then sets the PC to the next memory location and begins executing from there. There is no "return" instruction in the PDP-8: the called subroutine just makes an indirect JMP through its first core memory location, which is set to 0 by convention when the program first loads into core. (More detail here.)

    For simple call graphs, the PDP-8's JMS instruction is roughly equivalent to the 6502's more powerful JSR instruction. Because there is no stack, the PDP-8 doesn't need an equivalent to the 6502's "return from subroutine" (RTS) instruction: it can press the plain JMP instruction into service to get that ability.

    All of this means that recursion and reentrancy could not be done on the PDP-8 through the normal calling convention. An assembly language programmer that wanted to do recursion needed to write extra code to maintain a software call stack. It was a mixed bag for those writing in high-level languages: the FORTRAN IV system for OS/8 didn't allow recursion, but the DEC FOCAL-8 implementation did!
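A toy Python model shows why the JMS convention breaks down for recursion. It models only the return-address bookkeeping described above; the addresses and the names `jms` and `jmp_indirect` are illustrative.

```python
memory = [0] * 4096  # one 4 kWord field of core

def jms(pc, target):
    """JMS executed at address pc: save the return address in the
    subroutine's first word, then continue at the word after it."""
    memory[target] = (pc + 1) & 0o7777
    return (target + 1) & 0o7777

def jmp_indirect(target):
    """JMP I target: the subroutine's 'return', reading the saved address back."""
    return memory[target]

SUB = 0o400
pc = jms(0o200, SUB)         # main program calls SUB from address 0200
first_return = memory[SUB]   # 0201, where the main program should resume
pc = jms(0o410, SUB)         # SUB calls itself from 0410 before returning
print(oct(first_return), oct(jmp_indirect(SUB)))  # 0o201 0o411
```

After the nested call, the only saved return address is 0411. The way back to 0201 has been overwritten, which is why a recursive routine needs its own software stack.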

So far, I've explained away 5 of the "missing" instructions in the PDP-8 relative to the 6502. (INX, INY, LDA, EOR, and RTS.) I could keep going, but this answer is already plenty long. Suffice it to say, I'm confident I could come up with ways to replace all of the other 56 - 24 - 5 = 27 "missing" instructions.

The 6502 takes at least two cycles to do anything, so at 1 MHz, it can execute up to 500000 instructions per second, or 500 kIPS. Core memory based PDP-8s can do purely in-CPU operations in one memory cycle time, but anything involving core memory takes at least 2 core memory cycles due to the need for core memory to be re-magnetized after each read. This means that the PDP-8/I's 1.5 µs core memory cycle time equates to 333 kIPS for the fastest sort of operations involving core memory.
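The arithmetic behind those figures, as a small Python sketch (the helper name `kips` is made up for the example):

```python
def kips(cycle_time_us, cycles_per_instruction):
    """Thousands of instructions per second, given a cycle time in microseconds."""
    return 1000.0 / (cycle_time_us * cycles_per_instruction)

print(round(kips(1.0, 2)))  # 6502 at 1 MHz, two-cycle instructions: 500
print(round(kips(1.5, 2)))  # PDP-8/I, two 1.5 us core cycles: 333
print(round(kips(1.5, 1)))  # PDP-8/I, one core cycle (in-CPU operations): 667
```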

That being said, when it comes to the question of overall performance and value delivered, what actually matters is how the systems were configured:

  • Most computers in the PDP-8 line either had the Extended Arithmetic Element (EAE) option available or had it integrated from the factory, giving it capabilities not in the 6502, such as integer multiply and divide instructions. Add in the Floating Point Processor (FPP) option and you now get hardware floating-point arithmetic. I'm not aware of a direct equivalent for the 6502; even if there were math coprocessors for the 6502, the EAE and FPP were commonly installed options in the PDP-8 world, being required for some programs.⁷

    Adding the EAE to a PDP-8 could easily swamp the 6502's IPS rate advantage for some applications. The KE8/I EAE can complete an integer multiply or divide in two core memory cycles, whereas it could take dozens of instructions each taking 2+ cycles on a 6502 to get similar functionality.

    The advantage can swing the other way, though. A program making heavy use of a recursive call stack would likely run faster on a 6502 due to the explicit hardware support for it, whereas the PDP-8 equivalent would require extra instructions and core memory references to emulate the facility.

  • For the PDP-8, multiple serial terminals — either Teletype model 33 ASRs or glass TTYs — were common, whereas color bitmapped output was common on 6502 based microcomputers and rare on the PDP-8. Which is more valuable? It depends on your application. For games, the 6502 micros would win hands down over a PDP-8. But a mid-sized PDP-8 system could reasonably provide 8 terminals of BASIC for a small computer lab, whereas you'd need 8 individual micros to give the same level of service. (Such a thing was in fact sold as the DEC EduSystem 20.)

  • Terminals commonly used with the PDP-8 were inherently 80-column devices. On 6502-based microcomputers, 80-column output was usually an extra-cost add-on, where available at all. Where it was absent, you were usually left with 40 columns or fewer, because the machine had to be usable with a standard low-resolution television as the monitor.⁸

  • A PDP-8 would usually have more high-speed storage than a 6502-based microcomputer. PDP-8s contemporaneous with microcomputers also used floppy disks, and instead of audio cassette tapes, they'd use the much more capable DECtape system. Hard drives were common with PDP-8s, but they were very rare with 6502 microcomputers, even into the mid-1980s.

I give these examples because once you add all of the I/O cards, coprocessors, and such involved in all of this, the actual number of transistors in use within a large PDP-8 system might well exceed the number of transistors inside a well-rounded 6502 microcomputer. The PDP-8 would cost more, of course, but it would probably also be delivering more measurable value.

Why did the PDP-8 not distinguish between data and address widths like the microprocessors?

Because it made the instruction set simpler, which simplifies the CPU needed to interpret those instructions, which reduces the number of transistors you need to implement the CPU.

The cost of this was a complicated memory model that makes the segmented memory architecture of the 8086 look simple.

for early microprocessors, pins and board traces were expensive, but early minicomputers were made of discrete components which means they were effectively made of wires, so saving a few wires on the data bus wouldn't help

No. While both IC-based and wire-wrapped CPUs are expensive to design and prototype, PCBs and ICs are cheap to manufacture in volume, while wire-wrapped CPUs — the style used for the earliest PDP-8s — are expensive to produce, requiring either expensive hand-wrapping or really expensive programmable wire-wrapping machines.

The compensating virtue of wire-wrapped CPU backplanes is that a skilled technician can rewire one at need, whereas to fix a design error in a mid-1970's style microcomputer, you often had to re-spin the IC or the PCB involved, which was expensive. Once you got the design working, a microcomputer tended to stay more static than a wire-wrapped computer did. It was not uncommon for field technicians to apply fixes to machines by reworking the backplane. It was common for one color of wire to be used for the stock design from the factory, another color for factory modifications, and another for field repairs and modifications.

This led to jargon like "blue wire" when speaking of local hackery. Q: "Why is there an extra switch on the 8/e console in your lab, Dr. Mbogo?" A: "That's a blue-wire mod to zero out the ADC without resetting the bus."

Digressions and Footnotes

  1. Officially, the PDP-8 has only 8 true instructions, but I prefer to consider the 17 portable microcoded OPR instructions as separate instructions, since many of them map directly to distinct instructions in more modern instruction sets, including that of the 6502. The fact that you can combine select subsets of those 17 instructions into a single OPR instruction is an optimization, not a useful distinction in our comparison here. That count leaves out BSW and the Group 3 OPR instructions, because they're not portable between PDP-8 models.

    I choose not to consider the various IOT sub-instructions separately, since that would drag all of the PDP-8 peripherals into the discussion. We're interested in comparing CPUs only here. I made an exception for the EAE and FPP peripherals above since although they were accessed with IOT instructions, we'd consider them CPU coprocessors today.

  2. The CIA instruction is one of those microcoded instructions: it's actually an OPR instruction with the complement accumulator (CMA) and increment accumulator (IAC) bits set.

    "CIA" is just the common mnemonic assigned to this combined instruction: PDP-8 assemblers predefined several of these as a convenience, so you didn't have to list the individual operations:

    CMA IAC     / two's complement negate AC in one instruction
    CIA         / same thing, in most PDP-8 assemblers

    Because the set of possible combinations of OPR bits was rather large, though, the particular operation you wanted might not have a predefined mnemonic, so PDP-8 assemblers let you create custom instructions by binding a mnemonic to a 12-bit value:

    TCN=    CMA IAC

    Nothing stopped you from assigning a nonstandard name to any instruction, so you could use this in a program if the "CIA" acronym bothers you: two's complement negate.

    Some programmers would also use this feature for primitive macros:

    NL0002=  CLA CLL CML RTL

    That loads the constant 2 into the accumulator in a single instruction: clear accumulator, clear link, complement link, rotate link left by 2 bits. Now you can use "NL0002" as single instruction opcode to load the constant 2, avoiding the need for a separate CLA instruction and in-page constant.
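To see how combined OPR instructions behave, here is a rough Python model of the group 1 operations. It assumes the usual group 1 bit assignments and event ordering (clears first, then complements, then increment, then rotates); check both against a PDP-8 reference before relying on them. The function name `opr_group1` is made up for the example.

```python
# Group 1 OPR bits within the low 9 bits of the instruction word (assumed values).
CLA, CLL, CMA, CML = 0o200, 0o100, 0o040, 0o020
RAR, RAL, TWICE, IAC = 0o010, 0o004, 0o002, 0o001

def opr_group1(word, ac=0, link=0):
    """Apply one group 1 OPR word to (AC, LINK) and return the new pair."""
    if word & CLA:
        ac = 0
    if word & CLL:
        link = 0
    if word & CMA:
        ac ^= 0o7777
    if word & CML:
        link ^= 1
    combined = (link << 12) | ac             # LINK and AC as one 13-bit value
    if word & IAC:
        combined = (combined + 1) & 0o17777  # a carry out of AC flips LINK
    for _ in range(2 if word & TWICE else 1):
        if word & RAL:
            combined = ((combined << 1) | (combined >> 12)) & 0o17777
        elif word & RAR:
            combined = (combined >> 1) | ((combined & 1) << 12)
    return combined & 0o7777, combined >> 12

print(opr_group1(0o7326, ac=0o1234))  # CLA CLL CML RTL -> (2, 0)
print(opr_group1(0o7041, ac=5))       # CMA IAC, i.e. CIA -> (4091, 0), which is 7773 octal
```

Under these assumptions, "NL0002" assembles to 7326 octal and "CIA" to 7041 octal, each a single 12-bit word.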

  3. That saves about 3 µs and one of the 128 locations per core memory page, both of which can be significant savings.

    The core memory location was especially precious, given not only how few of them there are, but also the cost of spanning pages. For instance, it's cheaper to jump to an in-page location than to one in a neighboring page, since that required an indirect jump via an in-page constant, called a "link" in PDP-8 assembler jargon. (Not to be confused with the processor's 1-bit LINK register!) The indirect jump not only cost a core memory location, it's slower to execute than an in-page jump.

  4. When not preceded by a DCA, a TAD instruction used for "load accumulator" would be preceded by a clear accumulator (CLA) instruction of some sort. A lot of effort in optimizing PDP-8 code goes into arranging instructions in a way that results in a CLA OPR instruction happening just before a TAD needs the accumulator to be zeroed to work properly. This is why there is a CLA instruction in OPR groups 1, 2 and 3: if your TAD instruction happened to be preceded by a group 2 OPR instruction such as SZL, you might be able to set the group 2 CLA bit to clear the accumulator as a side effect of the SZL rather than have the first instruction of the jump target be a separate "CLA" instruction, which would normally be in group 1.

    Keep in mind that this was all before we had modern notions of good software engineering practices. Side effects and actions separated from their causes by considerable distance in the code were good practice at this time: you almost certainly had less core memory than you wanted, so saving an instruction here and there was important.

  5. The PDP-8/e and later integrated parts of the EAE option so they also had the MQ register, but that wasn't fully general-purpose. Code that didn't need the EAE could use it as a faster scratch-pad than core memory, but that was of limited utility since few PDP-8 instructions work with MQ directly.

  6. The popular TSS/8 operating system for the PDP-8 presented a virtual 4 kWord PDP-8 to each logged in user. With some clever swapping code, it could support up to 24 users on a 32k machine.

    Other examples were the several multi-user FOCAL systems, where a common scheme was to run the interpreter in the first 4k of core and swap individual user programs or user code libraries in and out of available 4k fields above that. One such system (LIBRA FOCAL-8) ran in only 8k of core and could support up to 7 users!

  7. The OS/8 FORTRAN IV system is notable in this regard. Its loader was smart enough to detect the presence of the EAE and FPP options and switch from slow software emulations of them to direct hardware access. You wrote your program the same either way. Doubtless some sites started off without the EAE and FPP, then later added them to make their programs run faster.

  8. Once video terminals started to displace paper teletypes, a common PDP-8 application was word processing, where 80 columns of text maps nicely to the width of office paper. The DECmate series was basically a DEC VT52 terminal with an embedded PDP-8 compatible IC CPU and a dual floppy drive, loaded with a variant of OS/8, and usually sold as a word processor.

    Excellent answer. Thanks. Some things worth noting: 1) The simplification for a processor using the same size address and data bus is massive. Rather than having to have separate data paths for these, they can share the same path, removing a huge amount of redundant circuitry. 8-bit CPUs couldn't afford to do that, because 256 bytes of memory is too small for any real application. 4096 12-bit words (which can be used to store 2 characters each, if you don't need lower case) is just about useful enough to get by in many cases.
    – Jules
    Commented Nov 6, 2018 at 23:41
    2) A lot of applications can be implemented using 12-bit arithmetic. A lot fewer can be implemented using 8-bit arithmetic. For any of those applications, a PDP-8 is obviously at an advantage, as a 6502 will need to use multiple instructions to perform a 16-bit calculation while the PDP-8 can just use one 12-bit one.
    – Jules
    Commented Nov 6, 2018 at 23:49

The classic PDP-8 made extensive use of what were called DCD gates. These diode-capacitor-diode gates performed the logical AND function between two inputs, one a pulse and one a level signal. So, for example, each flip-flop in the PDP-8 accumulator was made with two transistors plus an array of DCD gates on the set and reset inputs that served to multiplex the myriad inputs to the accumulator.

The extensive use of DCD gates cut way down on the transistor count compared to machines implemented with pure diode transistor logic, where every AND was likely to involve a transistor, and they certainly cut the transistor count compared with MOSFET logic.

DCD gates look bizarre to a modern designer because we rarely use pulse logic today, but in the early 1960s, when transistors cost over a dollar each, and when most engineers were still thinking in terms of tubes, it made good sense to use cheap components like capacitors and diodes when you could get away with it.

You can see more about the classic PDP-8 here: http://www.cs.uiowa.edu/~jones/pdp8/UI-8/

Take a look at the schematic for the PDP-8 accumulator. You'll see that each bit of the accumulator (just two transistors) is surrounded by 15 DCD gates (7 that can reset the flipflop and 8 that can set it). Each DCD gate consists of 4 diodes, a capacitor and some resistors. When the level input is high, a positive pulse on the pulse input causes an over-high noise spike on the output, coupled through the capacitor. When the level input is low, a positive pulse on the pulse input is coupled through the capacitor to a positive pulse on the output. The official DEC explanation of DCD gates is in the Logic Handbook, 1968 (and other editions).
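As a back-of-the-envelope check, the figures above can be multiplied out for the whole 12-bit accumulator. This Python sketch counts only the flip-flop transistors and the DCD gate diodes mentioned here; any other parts on the accumulator cards are left out, so treat the totals as a lower bound.

```python
BITS = 12
TRANSISTORS_PER_FLIPFLOP = 2
DCD_GATES_PER_BIT = 7 + 8      # 7 reset gates plus 8 set gates
DIODES_PER_DCD_GATE = 4

acc_transistors = BITS * TRANSISTORS_PER_FLIPFLOP
acc_gates = BITS * DCD_GATES_PER_BIT
acc_diodes = acc_gates * DIODES_PER_DCD_GATE
print(acc_transistors, acc_gates, acc_diodes)  # 24 180 720

# Whole-machine ratio from the pdp8.net parts count quoted in the question.
print(round(10148 / 1409, 1))                  # 7.2 diodes per transistor
```

On these numbers the accumulator alone runs at 30 diodes per transistor, well above the machine-wide average of about 7.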


The PDP-8 processor was implemented in diode-transistor logic, whereas most early microprocessors were implemented in some variety of MOS logic, most commonly NMOS. This accounts for some of the discrepancy you note: a typical gate in MOS logic would use 3 transistors (the Wikipedia article shows a resistor and two transistors, but on an actual integrated circuit the resistor is most commonly swapped for a transistor with its gate attached to a constant source, because that's easier to implement in MOS-type IC production processes), while in diode-transistor logic, there's only one transistor + several diodes.

That said, the PDP-8 processor was very simple. It had a single bit ALU that was used repeatedly to perform single operations, which would have saved a significant number of gates. It also has fewer registers than even the notoriously register-sparse 6502. Due to using a serial ALU, the larger word length would have only really had much effect on the size of the registers, and not made the rest of the processor any more complex.

It is therefore unsurprising that the gate count is lower than the 6502 (approximately 500 vs 1000 respectively).


It looks like the Wikipedia article I was basing some of the above on is misleading: only the PDP-8/S has a single bit ALU (but then, that is where the 500-gate figure came from too). Other sources confirm that the original PDP-8 has a full 12-bit ALU, so likely has many more gates than 500... the transistor count suggests it could have up to 1400 gates, although I imagine some of those transistors are used for buffers rather than gates, so I'd estimate it probably has a similar number of gates to the 6502.
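The rule of thumb used in this answer (about 3 transistors per NMOS gate, 1 per diode-transistor gate) can be turned into a rough estimate. This Python sketch is only that rule applied to the two published transistor counts. It ignores buffers, flip-flops, and pass transistors, so the results are upper bounds at best.

```python
def gates_from_transistors(transistors, transistors_per_gate):
    """Crude gate estimate: total transistors divided by transistors per gate."""
    return transistors // transistors_per_gate

print(gates_from_transistors(3510, 3))  # 6502, NMOS: 1170
print(gates_from_transistors(1409, 1))  # PDP-8, diode-transistor logic: 1409
```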

    Wasn't it only the PDP-8/S that used the serial ALU?
    – scruss
    Commented Mar 1, 2018 at 21:58
    @scruss - I'm not certain. The Wikipedia article implies that the original PDP-8 had a serial ALU too, and the improvement on the PDP-8/S was to use a serial data path for interconnection between modules, whereas the original PDP-8 used parallel connections, presumably only serializing the data on entry into the ALU. I'm finding it hard to confirm that this is correct, though.
    – Jules
    Commented Mar 1, 2018 at 22:08
    @scruss: All variants of the PDP-8 except the PDP-8/S used a 12-bit parallel ALU. See e.g. gordonbell.azurewebsites.net/tcmwebpage/timeline/… "A cost performance tradeoff took place in the PDP-8 (parallel-by-word arithmetic) and PDP-8/S (serial-by-bit arithmetic) implementations."
    Commented Mar 2, 2018 at 3:29
    The bias to use as many passive components and as few actives as possible might have been a holdover from vacuum tube times...
    Commented Mar 8, 2018 at 11:24
    @rackandboneman I suspect the preference for diodes over transistors was simply a cost consideration. Discrete transistors were much more expensive than diodes. (Even today it's still true, albeit with a smaller price gap.)
    – RETRAC
    Commented Dec 19, 2018 at 20:44

Why did the PDP-8 not distinguish between data and address widths like the microprocessors?

You have your question the wrong way around. Up until the advent of microprocessors, it was quite normal for addresses and data to be the same size. When they weren't the same size, the address was normally smaller than the data. So the PDP-11 has 16 bit addresses and 16 bit data (even though it was byte addressed). The DEC 10 had 36 bit data and 18 bit word addresses. The IBM System/360 had 32 bit data and 24 bit byte addresses.

The reason for this is that, by today's standards, memory was expensive. A typical computer used magnetic core memory, which consists of little magnetic doughnuts threaded on a lattice of very fine wires. The manufacture of this stuff was quite intricate, and it was almost always done by hand. IIRC they sometimes used lace makers to weave the wire lattice. So, for Digital, there was no point in having more than 256 k-words in its DEC 10 because nobody would have been able to afford the memory. Even the 16 megabytes theoretically addressable on the System/360 was unimaginable back in the day.

Once they figured out how to do large scale integrated circuits, everything changed. Memory became much cheaper, relatively speaking, so did processors. However, if you wanted to make a computer that an ordinary person could afford, you still had to keep costs down, hence the 8-bit microprocessor. Having only 8 bits of data kept the transistor count down and made the packaging and external data paths much cheaper to manufacture. Unfortunately, 8 bits is not a practical size for a memory address which is why they went to 16 bits for addresses.

    @Wilson - it answers a side question asked along with the primary question.
    – Jules
    Commented Mar 5, 2018 at 13:15
    @Wilson It answers the bit I quoted at the top.
    – JeremyP
    Commented Mar 5, 2018 at 14:56

