A floating-point unit (FPU) is the part of a processor specifically designed to perform operations on floating-point numbers. All FPUs implement at least addition, subtraction, and multiplication. The fused multiply–add operation, which computes a × b + c with a single rounding, is increasingly implemented as well and is very useful for speeding up certain algorithms. Most processors also implement division and square root in hardware, but these operations can also be implemented efficiently in software, as on Itanium.
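The effect of fusing the multiplication and the addition can be illustrated in software. The following is only a sketch: the helper `fused_multiply_add` is a hypothetical name, and it emulates the single rounding with exact rational arithmetic rather than calling a hardware instruction. It assumes finite inputs.

```python
from fractions import Fraction

def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulate a*b + c with a single rounding (finite inputs only)."""
    # Fractions represent each double exactly, so no rounding happens here.
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    # Converting back to float is the one and only rounding step.
    return float(exact)

a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

# Separate multiply and add: the product 1 - 2**-60 is rounded to 1.0
# before the addition, so the small term is lost.
unfused = a * b + c

# Fused: the exact product is kept until the final rounding.
fused = fused_multiply_add(a, b, c)

print(unfused)  # 0.0
print(fused)    # -2**-60, about -8.67e-19
```

Besides saving an instruction, the fused form keeps information that the intermediate rounding of the product would otherwise discard.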
Some systems can also perform more complex calculations, such as exponentials or trigonometric functions (such as cosines), although the results are not very accurate for some ranges of input values. Most often, the processor uses microcode to carry out these operations. Microprogramming is slower than hardwired logic, but it is much more economical, dissipates less energy and, above all, is less complex. The Pentium, for example, had a microprogrammed floating-point unit. Its famous FDIV bug was caused by missing entries in a lookup table used by the division algorithm.
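One commonly cited way such inaccuracy can arise is argument reduction with a value of π that has too few digits. The sketch below is an illustration under that assumption and does not model any particular FPU; the function names and the 40-digit constant `PI_HIGH` are hypothetical choices for the example.

```python
import math
from fractions import Fraction

# pi to 40 significant digits, well beyond double precision.
PI_HIGH = Fraction("3.141592653589793238462643383279502884197")

def sin_near_pi_naive(x: float) -> float:
    """Use sin(x) = -sin(x - pi) with pi rounded to a double."""
    return -math.sin(x - math.pi)

def sin_near_pi_careful(x: float) -> float:
    """Same identity, but subtract a higher-precision pi exactly."""
    reduced = float(Fraction(x) - PI_HIGH)
    return -math.sin(reduced)

x = math.pi  # the double closest to pi, slightly below the true value

naive = sin_near_pi_naive(x)      # the reduction cancels to exactly zero
careful = sin_near_pi_careful(x)  # about 1.22e-16, the distance to true pi

print(naive, careful)
```

For this input the naive reduction returns zero, whereas the true sine is small but nonzero, so the relative error of the naive result is total even though the absolute error is tiny.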
Floating-point operations such as addition and multiplication are typically pipelined, but more complicated operations, such as division, may not be. Some systems even have one or more circuits dedicated to floating-point division.
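Division is often computed iteratively, with each step depending on the result of the previous one, which makes it harder to pipeline than a single addition or multiplication. As a minimal sketch of this idea, the following uses Newton–Raphson refinement of a reciprocal, built only from multiplications and subtractions. The function names, the number of steps and the restriction to finite positive divisors are assumptions of this example, and the result is not guaranteed to be correctly rounded.

```python
import math

def reciprocal_newton(d: float, steps: int = 5) -> float:
    """Approximate 1/d for finite d > 0 by Newton-Raphson iteration."""
    # Split d into mantissa * 2**exponent with 0.5 <= mantissa < 1.
    mantissa, exponent = math.frexp(d)
    # Linear initial estimate of 1/mantissa; the constants are
    # 48/17 and 32/17 written out as literals.
    x = 2.823529411764706 - 1.8823529411764706 * mantissa
    for _ in range(steps):
        # Each step roughly doubles the number of correct bits,
        # and cannot start before the previous one has finished.
        x = x * (2.0 - mantissa * x)
    # Undo the scaling: 1/d = (1/mantissa) * 2**(-exponent).
    return math.ldexp(x, -exponent)

def divide(n: float, d: float) -> float:
    """Compute n/d as n times the approximated reciprocal of d."""
    return n * reciprocal_newton(d)

print(divide(1.0, 3.0))
print(divide(355.0, 113.0))
```

The same kind of iteration is a typical basis for software division on processors that lack a hardware divide instruction.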
---
Because of the technological limitations of the time, it was normal in personal computers until the mid-1990s for the FPU to be entirely separate from the processor. This separate FPU, known as a coprocessor, was placed in a dedicated socket on the computer’s motherboard.

The presence of an FPU allows a significant increase in performance for floating-point-intensive calculations. Coprocessors also offered wider registers: even alongside 16- and 32-bit CPUs, the FPU often had 64-bit, 80-bit or even 128-bit registers. This made it possible to perform simple calculations with higher accuracy and to cover a wider range of values. Since the FPU is internally still just a digital computing unit, further, more sophisticated methods are needed to achieve a real speed-up. Most FPUs provide operations for basic arithmetic (with higher precision than the CPU), logarithms, roots, powers and trigonometric functions, and some also provide functions for computing with matrices.
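The benefit of a wider format can be seen by comparing 32-bit and 64-bit floating-point values. The sketch below is only an illustration: the helper `to_float32` is a hypothetical name, and it uses the `struct` module to round a Python float (a 64-bit double) to the nearest 32-bit value. The 80-bit and 128-bit formats mentioned above are not shown, since they are not directly available in standard Python.

```python
import struct

def to_float32(x: float) -> float:
    """Round a double to the nearest 32-bit float, returned as a double."""
    return struct.unpack("f", struct.pack("f", x))[0]

big = to_float32(2.0**24)        # 16777216.0, exactly representable

# In the 32-bit format, 2**24 + 1 is not representable,
# so the addition has no visible effect after rounding.
narrow = to_float32(big + 1.0)

# The 64-bit format has enough significand bits to hold the result.
wide = big + 1.0

print(narrow)  # 16777216.0
print(wide)    # 16777217.0
```

A narrower format silently drops the increment here, while the wider one keeps it, which is the kind of accuracy gain that wider FPU registers provide.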