The stack circuitry of the Intel 8087 floating point chip, reverse-engineered

100 points • by elpocko • yesterday at 6:16 PM • 47 comments

Comments

Sometime in the 80s, I implemented the core of the Mandelbrot Set calculation using assembly on an 8087. As the article mentions, the compilers did math very inefficiently on this stack architecture. For example, if you multiplied two numbers together and then added a third, they would push the first two numbers, multiply, pop the result, push the result back onto the stack (perhaps clearing the stack? after 40 years I don't remember), push the third number, add, pop the result. For the Mandelbrot loop this was even worse, as it never kept the results of the loop. My assembly kept all the intermediate results on the stack for a 100x speed up.

Running this code, the 8087 emitted a high-pitched whine. I could tell when my code was broken and it had gone into an infinite loop by the sound. Which was convenient because, of course, there was no debugger.

Thanks for bringing back this memory.
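
A minimal sketch of the multiply-then-add case described above, assuming a toy 8-deep stack machine that only counts memory transfers. The class and function names (`ToyStackFPU`, `naive_multiply_add`, `stack_resident_multiply_add`) are hypothetical, the method comments name only rough x87 analogues, and nothing here models real 8087 timing, so it does not reproduce the 100x figure. It only shows where the redundant store and reload come from.

```python
class ToyStackFPU:
    """Toy 8-deep operand stack that counts memory transfers (not cycle-accurate)."""

    DEPTH = 8  # the 8087 has eight stack registers

    def __init__(self, memory):
        self.memory = dict(memory)
        self.stack = []
        self.mem_ops = 0

    def load(self, name):
        """Push a value from memory (roughly FLD)."""
        if len(self.stack) >= self.DEPTH:
            raise OverflowError("register stack full")
        self.stack.append(self.memory[name])
        self.mem_ops += 1

    def store_pop(self, name):
        """Pop the top of stack into memory (roughly FSTP)."""
        self.memory[name] = self.stack.pop()
        self.mem_ops += 1

    def mul(self):
        """Replace the top two entries with their product (roughly FMULP)."""
        rhs, lhs = self.stack.pop(), self.stack.pop()
        self.stack.append(lhs * rhs)

    def add(self):
        """Replace the top two entries with their sum (roughly FADDP)."""
        rhs, lhs = self.stack.pop(), self.stack.pop()
        self.stack.append(lhs + rhs)


def naive_multiply_add(a, b, c):
    """a * b + c the way the comment describes the compiler output."""
    fpu = ToyStackFPU({"a": a, "b": b, "c": c})
    fpu.load("a")
    fpu.load("b")
    fpu.mul()
    fpu.store_pop("tmp")   # spill the product to memory...
    fpu.load("tmp")        # ...and immediately reload it
    fpu.load("c")
    fpu.add()
    fpu.store_pop("result")
    return fpu.memory["result"], fpu.mem_ops


def stack_resident_multiply_add(a, b, c):
    """Same computation, keeping the product on the register stack."""
    fpu = ToyStackFPU({"a": a, "b": b, "c": c})
    fpu.load("a")
    fpu.load("b")
    fpu.mul()
    fpu.load("c")
    fpu.add()
    fpu.store_pop("result")
    return fpu.memory["result"], fpu.mem_ops


print(naive_multiply_add(2.0, 3.0, 4.0))           # (10.0, 6)
print(stack_resident_multiply_add(2.0, 3.0, 4.0))  # (10.0, 4)
```

In a loop such as the Mandelbrot iteration, the same spill-and-reload pattern would repeat on every pass, which is the overhead the hand-written assembly avoided.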

mbf1 • today at 2:48 AM

There were a couple of interesting points about the market for 8087 chips -- Intel designed the motherboard for the IBM PC, and they included an 8088 slot and a slot for either an 8087 or 8089. IBM didn't populate the slot for the coprocessor chip as it would compete with their mainframes, but Intel went around marketing the chips to research labs. One of them ended up with Stephen Fried, who founded Microway in 1981 to create software for the 8087 and sell the chips, and the company is still in business after 44 years of chasing high performance computing. That's how I first got started with computing - a Microway Number Smasher (TM) card in an IBM PC.

The 80287 (AKA 287) and 80387 (AKA 387) floating point microprocessors started to pick up some competition from Weitek 1167 and 4167 chips and Inmos Transputer chips, so Intel integrated the FPU into the CPU with the 80486 processor (I question whether this was a monopoly move on Intel's part). This was also the first time that Intel made multiple versions of a CPU - there was a 486DX and a 486SX (colloquially referred to as the "sucks" model at the time) which disabled the FPU.

The 486 was also interesting because it was the first Intel x86 series chip to be able to operate at a multiple of the base frequency with the release of the DX2, DX3, and DX4 variants which allowed for different clock rates of 50MHz, 66MHz, 75MHz, and 100MHz based on the 25MHz and 33MHz base clock rates. I had a DX2-66MHz for a while and a DX4-100. The magic of these higher clock rates came from the introduction of the cache memory. The 486 was the first Intel x86 CPU with an on-chip cache.

Even though Intel had superseded the 8087/287/387 floating point coprocessor by including the latest version in the 80486, they introduced the 80860 (AKA i860) which was a VLIW RISC-based 64-bit FPU that was significantly faster, and also was the first microprocessor to exceed 1 million transistors.

The history of the FPU dedicated for special purpose applications is that it eventually became superseded by the GPU. Some of the first powerful GPUs from companies like Silicon Graphics utilized a number of i860 chips on a card in a very similar structure to more modern GPUs. You can think of each of the 12x i860 chips on an SGI Onyx / RealityEngine2 like a Streaming Multiprocessor node in an NVIDIA GPU.

Obviously, modern computers run at significantly faster clock speeds with significantly more cache and many kinds of cache, but it's good to look at the history of where these devices started to appreciate where we are now.

userbinator • today at 6:52 AM

> I don't know what the GRX field is.

The field of the instruction that selects the stack offset.
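
A minimal sketch of stack-relative register selection, assuming the standard x87 convention: a 3-bit top-of-stack pointer, pushes that decrement it, and ST(i) resolving to physical register (TOP + i) mod 8. The class name `RegisterStack` and its methods are hypothetical, and tag bits and stack-fault detection are left out.

```python
class RegisterStack:
    """Eight physical registers addressed relative to a top-of-stack pointer."""

    SIZE = 8

    def __init__(self):
        self.regs = [0.0] * self.SIZE
        self.top = 0  # 3-bit pointer, wraps modulo 8

    def physical(self, offset):
        """Map a stack offset i, as in ST(i), to a physical register index."""
        return (self.top + offset) % self.SIZE

    def push(self, value):
        """Decrement TOP, then write the new top of stack."""
        self.top = (self.top - 1) % self.SIZE
        self.regs[self.top] = value

    def pop(self):
        """Read the top of stack, then increment TOP."""
        value = self.regs[self.top]
        self.top = (self.top + 1) % self.SIZE
        return value

    def st(self, offset):
        """Read ST(offset) without changing the stack."""
        return self.regs[self.physical(offset)]


stack = RegisterStack()
stack.push(1.0)
stack.push(2.0)
print(stack.top, stack.st(0), stack.st(1))  # 6 2.0 1.0
```

The same offset in an instruction therefore names a different physical register after every push or pop, which is why the hardware needs an adder between the instruction field and the register file.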

kens • yesterday at 6:32 PM

Author here for your 8087 questions...

tigranbs • yesterday at 8:08 PM

The 2-bit-per-transistor ROM using four transistor sizes is wild. Were there other chips from this era experimenting with semi-analog storage, or was the 8087 unusually aggressive here?

em3rgent0rdr • yesterday at 8:10 PM

Looking at the complexity and area of hardware floating point, I often wonder why we don't see more unified combined integer+floating point units, like done in the R4200 [1], which reused most of the integer datapath while just adding a smaller extra 12-bit datapath for the exponent.

[1] https://en.wikipedia.org/wiki/R4200

garaetjjte • yesterday at 11:09 PM

Story of how Intel-derived proposal was standardized as IEEE754: https://people.eecs.berkeley.edu/~wkahan/ieee754status/754st...

hyperman1 • yesterday at 9:20 PM

I didn't expect the microcode to be at the center of the chip. I'd expect it on the side and only talking to the microcode engine, making more room for data traffic between chip halves. Also, the microcode is huge.

CaliforniaKarl • yesterday at 9:19 PM

I wonder, if C used Reverse-Polish notation for math operations, would compilers have been able to target the 8087 better than they did?
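
A minimal sketch of evaluating Reverse-Polish input on a depth-limited stack, to illustrate the question rather than answer it. The function name `eval_rpn` and the `depth_limit` parameter are hypothetical; the limit of 8 mirrors the number of 8087 stack registers. Under these assumptions, RPN maps directly onto push and operate steps, but an expression can still need more than eight live operands, so the spilling problem would not go away by itself.

```python
import operator

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_rpn(tokens, depth_limit=8):
    """Evaluate RPN tokens; return (value, deepest stack use)."""
    stack = []
    max_depth = 0
    for tok in tokens:
        if tok in OPS:
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(OPS[tok](lhs, rhs))
        else:
            if len(stack) >= depth_limit:
                raise OverflowError("expression needs more than "
                                    f"{depth_limit} stack entries")
            stack.append(float(tok))
        max_depth = max(max_depth, len(stack))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0], max_depth


# a * b + c needs only two entries
print(eval_rpn("2 3 * 4 +".split()))  # (10.0, 2)

# Summing 1..9 left to right also needs only two entries
print(eval_rpn("1 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 +".split()))  # (45.0, 2)

# The same sum written right-nested pushes nine operands first and overflows
try:
    eval_rpn("1 2 3 4 5 6 7 8 9 + + + + + + + +".split())
except OverflowError as exc:
    print("overflow:", exc)
```

The two sums are mathematically the same expression, so the order of evaluation, not the notation, decides whether the stack fits.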

librasteve • yesterday at 10:39 PM

Looks like a log multiply-adder ... maybe a 5 clock cycle? Also, on the microcode ... them FP divide algorithms are pretty intense.

Would be cool to hear a real designer compare to the Weitek 1064.

lisbbb • today at 5:14 AM

I made my Dad buy me a 387 math coprocessor when I was in college because I was taking math and physics courses but I bet none of the software I used ever even accessed that chip. It was more about the empty socket on the mobo looking out of place.

leeter • yesterday at 9:46 PM

I remember failing an interview with the optimization team of a large fruit trademarked computer maker because I couldn't explain why the x87 stack was a bad design. TBF they were looking for someone with a masters, not someone just graduating with a BS. But, now I know... honestly, I'm still not 100% sure what they were looking for in an answer. I assume something about register renaming, memory, and cycle efficiency.

burnt-resistor • yesterday at 8:27 PM

Very cool.

It's all about that 80-bit/82-bit floating point format with the explicit mantissa bit just to be extra different. ;) Not only is it a 1:15:1:63, it's (2(tag)):1:15:1:63, whereas binary64 is 1:11:0:52. (sign:exponent [biased]:explicit leading mantissa bit stored?:mantissa remaining)
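
A minimal sketch of the two layouts named above, assuming the usual exponent biases (1023 for binary64, 16383 for the 80-bit format). The function names are hypothetical, only normal finite values are handled, and the two tag bits are not modeled because they live alongside the register rather than in the 80-bit value.

```python
import math
import struct


def decode_binary64(x):
    """Split a Python float into binary64 fields (1:11:0:52)."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return sign, exponent, fraction


def to_extended_fields(x):
    """Widen a normal binary64 value to 80-bit fields (1:15:1:63)."""
    sign, exponent, fraction = decode_binary64(x)
    if exponent == 0 or exponent == 0x7FF:
        raise ValueError("only normal finite values are handled here")
    ext_exponent = exponent - 1023 + 16383  # re-bias the exponent
    integer_bit = 1                         # stored explicitly in this format
    ext_fraction = fraction << 11           # pad 52 fraction bits out to 63
    return sign, ext_exponent, integer_bit, ext_fraction


def extended_to_float(sign, exponent, integer_bit, fraction):
    """Turn 80-bit fields back into a Python float (may lose precision)."""
    significand = (integer_bit << 63) | fraction
    value = math.ldexp(significand, exponent - 16383 - 63)
    return -value if sign else value


print(to_extended_fields(1.0))                           # (0, 16383, 1, 0)
print(extended_to_float(*to_extended_fields(-0.15625)))  # -0.15625
```

The explicit integer bit is the visible difference: binary64 implies the leading 1 for normal values, while the 80-bit format spends a stored bit on it.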

Other pre-P5 ISA idiosyncrasies: Only the 8087 has FDISI/FNDISI, FENI/FNENI. Only the plain 287 has a functional FSETPM. Most everything else looks like a 387 ISA-wise, more or less until MMX arrived. That's all I know.

I'm curious what the CX-83D87 and Weiteks look like.

Keep up the good work!

PS: Perhaps sometime in the (near) future we might get almost 1:1 silicon "OCR" transcription of die scans to FPGA RTL with bugs and all?

ForOldHack • yesterday at 7:56 PM

This is cool, but the renormalization and (programmable and bidirectional) barrel shifter are of much more interest.

I had a 10 MHz XT, and ran an 8087-8 at a bit higher clock rate. I used it both for Lotus 1-2-3 and Turbo Pascal-87. It made Turbo Pascal significantly faster.
