Hyper Pipelined Technology
In order to deliver the highest clock rates, the Pentium 4 features a pipeline
twice as deep as the one in the Pentium III (10 stages). The original Pentium
processor, which was based on the P5 architecture, featured a total of 5 stages.
Intel doubled that number with the P6 architecture, which features a total of
10 stages on the Pentium Pro and Pentium III. Intel doubled that number again
with its latest architecture (NetBurst); the Pentium 4 features a total of 20
pipeline stages. This 20-stage pipeline is what Intel calls Hyper Pipelined
Technology.
Advanced Dynamic Execution
Intel describes Advanced Dynamic Execution as an out-of-order, speculative
execution engine. This engine keeps the execution units busy by providing a
large window of instructions from which the execution units can choose. The
large out-of-order instruction window allows the processor to avoid stalls
that might occur while instructions are waiting for dependencies to resolve.
Intel's previous P6 architecture featured a small window of 42 instructions,
whereas the NetBurst architecture can have up to 126 instructions in this
window (in flight).
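
To illustrate why a larger instruction window helps, here is a minimal toy
model in Python. It is only a sketch under simplifying assumptions (one
instruction issued per cycle, made-up latencies, a hypothetical four-instruction
program) and does not describe how the Pentium 4 scheduler is actually built.
The function name cycles_to_finish and the program format are invented for
this example.

```python
def cycles_to_finish(program, window_size):
    """Toy scheduler: issue at most one ready instruction per cycle,
    choosing only among the oldest `window_size` un-issued instructions.

    program is a list of (name, dependencies, latency) tuples in program order.
    Returns the cycle at which the last result becomes available.
    """
    done_at = {}            # name -> cycle when its result is available
    pending = list(program)
    cycle = 0
    while pending:
        window = pending[:window_size]
        for instr in window:
            name, deps, latency = instr
            # An instruction is ready once every dependency has completed.
            if all(d in done_at and done_at[d] <= cycle for d in deps):
                done_at[name] = cycle + latency
                pending.remove(instr)
                break       # only one issue slot per cycle in this toy model
        cycle += 1
    return max(done_at.values())


# Hypothetical program: a slow load, an add that needs it, and two
# independent multiplies that could run while the load is outstanding.
program = [
    ("load", [], 4),
    ("add", ["load"], 1),
    ("mul1", [], 1),
    ("mul2", [], 1),
]

in_order = cycles_to_finish(program, window_size=1)
out_of_order = cycles_to_finish(program, window_size=4)
print(in_order, out_of_order)
```

With a window of one instruction the add blocks everything behind it until
the load completes; with a wider window the independent multiplies fill the
gap, so the same program finishes in fewer cycles.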
This technology also features improved branch prediction. The Pentium 4 is
estimated to reduce branch mispredictions by around 33% compared to the P6
architecture's branch prediction. This is achieved by implementing a 4K branch
target buffer that stores more detail on the history of past branches, as well
as a more advanced branch prediction algorithm.
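
Intel's actual prediction algorithm is not described here, so the following
Python sketch uses a textbook two-bit saturating counter scheme purely to
show how stored branch history turns into predictions. The class name, the
4096-entry table size, and the branch trace are assumptions made for the
example, not details of the Pentium 4.

```python
class TwoBitPredictor:
    """Textbook two-bit saturating counter predictor (not Intel's design)."""

    def __init__(self, entries=4096):
        self.entries = entries
        # Counter values 0-1 predict "not taken", 2-3 predict "taken".
        self.counters = [1] * entries

    def predict(self, address):
        return self.counters[address % self.entries] >= 2

    def update(self, address, taken):
        i = address % self.entries
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(predictor, trace):
    """Run a list of (address, taken) pairs through the predictor."""
    misses = 0
    for address, taken in trace:
        if predictor.predict(address) != taken:
            misses += 1
        predictor.update(address, taken)
    return misses


# Hypothetical trace: a loop branch taken 9 times, then falling through,
# with the whole loop executed 3 times.
trace = ([(0x400, True)] * 9 + [(0x400, False)]) * 3
misses = count_mispredictions(TwoBitPredictor(), trace)
print(misses, "mispredictions out of", len(trace))
```

The counter mispredicts once while warming up and once at each loop exit;
the remaining branches are predicted correctly because the history survives
a single fall-through.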
Rapid Execution Engine
The new architecture allows the Pentium 4 to run its Arithmetic Logic Units
(ALUs) at twice the frequency of the processor core itself. This means that
the ALUs on a Pentium 4 running at 2.2GHz operate at 4.4GHz, with a latency
that is half the duration of the core clock. This translates directly into
higher throughput and reduced execution latency.
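
The arithmetic behind those figures can be checked with a few lines of
Python. This is just a worked calculation; the function names are made up
for the example and the 2x multiplier is the one stated in the text.

```python
def alu_clock_ghz(core_ghz, multiplier=2):
    """ALU frequency when the ALUs run at a multiple of the core clock."""
    return core_ghz * multiplier


def cycle_time_ns(clock_ghz):
    """Duration of one clock cycle in nanoseconds (1 / frequency in GHz)."""
    return 1.0 / clock_ghz


core = 2.2
alu = alu_clock_ghz(core)
print(f"ALU clock: {alu:.1f} GHz")
print(f"Core cycle: {cycle_time_ns(core):.3f} ns")
print(f"ALU cycle:  {cycle_time_ns(alu):.3f} ns")
```

For a 2.2GHz core this gives a 4.4GHz ALU clock, and an ALU cycle of roughly
0.227ns against a core cycle of roughly 0.455ns, which is the "half the
duration" mentioned above.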
400MHz Front Side Bus
One of the most talked-about features of the Pentium 4 is its 400MHz bus. The
Pentium III processor's 133MHz bus, which is 64 bits wide, is capable of
delivering 1.06GB/s of data. The architecture of the Pentium 4 is somewhat
different. The Pentium 4's bus is clocked at only 100MHz and is also 64 bits
wide; what differs here is that the 100MHz clock is quad pumped and is capable
of achieving a whopping
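
Peak bus bandwidth follows from clock rate, transfers per clock, and bus
width. The Python sketch below works through that calculation for the two
buses described above; the function name is invented for the example, and
the results are theoretical peaks rather than measured throughput.

```python
def bus_bandwidth_gb_per_s(clock_mhz, width_bits, transfers_per_clock=1):
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = width_bits // 8
    return clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9


# Pentium III: 133MHz, 64-bit, one transfer per clock.
p3 = bus_bandwidth_gb_per_s(133, 64)

# Pentium 4: 100MHz, 64-bit, quad pumped (four transfers per clock).
p4 = bus_bandwidth_gb_per_s(100, 64, transfers_per_clock=4)

print(f"Pentium III: {p3:.2f} GB/s")
print(f"Pentium 4:   {p4:.2f} GB/s")
```

Four transfers per 100MHz clock is where the "400MHz" label comes from, and
it works out to about three times the peak bandwidth of the Pentium III bus.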
Advanced Transfer Cache
Intel's Pentium 4 features 8KB of L1 data cache. This is half the size of what
the Pentium III features. This may seem a bit confusing at first, but smaller
caches have lower latencies. The reduction was made in order to decrease the
latency of the L1 cache; this should result in an improved transfer rate, but
at the same time the small size (8KB) might not be enough for some specific
tasks. This is where the L2 cache comes in. The Pentium 4 Willamette, like
the Pentium III (Coppermine), features 256KB of on-die cache on a 256-bit bus,
and the Pentium 4 Northwood features a total of 512KB of L2
Execution Trace Cache
This technology caches decoded x86 instructions (micro-ops), thus removing
the latency associated with the instruction decoder from the execution loop.
The Execution Trace Cache stores the micro-ops in the path of program execution
flow, where the results of branches in the code are integrated into the same
Execution Trace Cache is another handy technique Intel implemented in its
new architecture to ease the penalty of mispredicted branch instructions.
On older Intel processors, based on previous architectures, if a branch
instruction was mispredicted, the processor needed to start the process
from the beginning. The NetBurst architecture allows the processor to go
directly to the Execution Trace Cache to retrieve the micro-op and then send
it through the execution pipeline without having to restart the process from the
Streaming SIMD Extensions
The Pentium 4 architecture features 144 new instructions capable of delivering
128-bit SIMD integer arithmetic operations and 128-bit SIMD double-precision