The Display of Percent CPU in top

The single most important piece of information processed by top is the
measure of a process’s percentage cpu utilization, known as percent
cpu. Although top is perfectly capable of displaying and sorting on a
variety of information, by default it sorts by percent cpu. The
reason for this is that most people use top to find out what the cpu
is doing, or more specifically which process is hogging the cpu.
Percent cpu readily reveals this information.


But how exactly is this measured? Since the early days of BSD, the
percent cpu utilization of each process has been calculated by the kernel
and saved along with other process information in what’s called the
proc structure. The kernel maintains an array of proc structures to
track its processes. The ps command and, in some cases, top simply
display this value as a true representation of process
utilization. Unfortunately, that isn’t the whole truth.

You see, the only reason the kernel cares about this number is so that
it can provide equitable access to the cpu. If a process is using too
much cpu and there are other processes waiting to run, then the kernel
will want to ensure that each process gets a fair shot at running. So
it calculates a figure that takes recent history into account: a
decaying average of the true cpu utilization for a process. In most
BSD-derived systems, for example, the average is decayed so that 95%
of a process’s utilization from 60 seconds ago is no longer counted
(but 5% is). If a process begins to accumulate a high average
percentage then it has been sustaining such high use for at least 30
seconds, and probably longer. A process with a high average is likely
a cpu hog and should be given lower priority when being scheduled for
the cpu. In Unix, the decaying average is used when making scheduling
decisions (see footnote).

You can see how this number behaves over time. Start an infinite loop
on an otherwise idle system (my favorite
is “while true; do true; done”). Then watch the %cpu figure in the
output of ps. At first it will be quite low. In fact, it will take
about 30 seconds before the percentage reaches 50. It won’t get over
90% until about 55 seconds, and it won’t reach 99% or 100% until a
full minute after you start the loop. This is because it is still
giving weight to the idleness of the process from a full minute in the
past.
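
The decay rate quoted above (only 5% of a 60-second-old contribution
still counts) can be sketched as a simple exponential filter. This is
an idealized model rather than any particular kernel's code: the
once-per-second update and the names DECAY_PER_SECOND and
decayed_average are assumptions made for illustration, and the timings
seen on a real system depend on how its kernel and ps sample and report
the figure.

```python
# Assumption: the average is updated once per second, and a contribution
# keeps 5% of its weight after 60 seconds, as the text describes.
DECAY_PER_SECOND = 0.05 ** (1 / 60)   # roughly 0.951


def decayed_average(previous, sample, decay=DECAY_PER_SECOND):
    """Blend the previous average with the newest one-second sample."""
    return previous * decay + sample * (1.0 - decay)


# A process that was averaging 100% and then sits idle for 60 seconds.
average = 100.0
for _ in range(60):
    average = decayed_average(average, 0.0)

print(f"{average:.1f}%")   # 5% of the old figure is still counted
```

Each update keeps most of the old average and mixes in a little of the
newest sample, which is why the figure lags behind what the process is
doing at the moment.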

The decaying average is helpful to the kernel, and it may be helpful
to someone who wants to get a single snapshot of the system with ps.
But top is intended to provide information about what is happening
with the system right now, not a minute ago. I think it is misleading
for top to show this decaying average as percent cpu, as it doesn’t
always provide an accurate picture of the moment. Some ports of top
now calculate this percentage from other information rather than
simply parroting what the kernel has done. This provides a better
picture of what happened over the past 5 seconds, while ignoring
the history that the kernel scheduler uses.

The calculation is simple. A process’s total accumulated cpu time is also
tracked in the proc structure, and is shown in the column labeled
time. To calculate percent cpu, top will sample and remember every
process’s cpu time. At the next update, it will again sample cpu
time, calculate the difference from the last reading, divide by the
elapsed time between the samples, then multiply by 100 to convert to a
percentage (multiprocessor systems complicate this, and we will get to
that case next). This provides an accurate measurement of cpu
utilization between updates, but it is not a decaying average. If you
are on a system where top performs this calculation, then start your
infinite loop again and run top. You will see that within two updates
the process is at or near 100% (assuming your system is otherwise idle
and that there is only one processor).

There is one downside to this method: top has to take two samples
before it can display percent cpu. You may notice on such systems that
it takes about a second for top to display its first screen of data.
This delay is entirely due to the need to record two samples before
calculating percent cpu.
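
The sampling calculation described above can be sketched in a few
lines. This is only an illustration of the arithmetic, not code taken
from top: the function name percent_cpu, the process ids, and the cpu
times are all made up, and the sketch assumes a single processor.

```python
def percent_cpu(prev_cpu_time, curr_cpu_time, elapsed):
    """Percent cpu used between two samples; all arguments in seconds."""
    if elapsed <= 0:
        return 0.0
    return (curr_cpu_time - prev_cpu_time) / elapsed * 100.0


# Hypothetical samples taken 5 seconds apart: pid -> accumulated cpu seconds.
first = {101: 12.0, 102: 3.5}
second = {101: 16.9, 102: 3.5, 103: 0.2}

for pid, cpu_time in sorted(second.items()):
    # A pid missing from the first sample started during the interval, so
    # this sketch treats its earlier cpu time as zero.
    usage = percent_cpu(first.get(pid, 0.0), cpu_time, 5.0)
    print(pid, f"{usage:.1f}%")
```

Nothing can be computed until a second sample exists, which is the
reason for the short delay before the first screen appears.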

There are cases where top will only show 50%, 25%, or even a lower
figure for a looping process on an otherwise idle system. If you are
on a system that has more than one cpu, top will probably display
cpu utilization as a percentage of the total available cpu. So if you
have two processors, a single-threaded process can use only one of
them at a time and will show at most 50%. Likewise, on a 4-processor
system your infinite loop will show only 25%. However, a process with
multiple threads
will be able to utilize more than one processor simultaneously. Top
will reflect such utilization with higher percentages. Think of it as
a percentage of all available cpu cycles across all processors.
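
The multiprocessor figures above follow from dividing by the capacity
of every processor rather than just one. A minimal sketch, assuming
the processor count is known; the name percent_of_total is hypothetical
and not part of top:

```python
def percent_of_total(cpu_seconds_used, elapsed, ncpus):
    """Cpu used in an interval as a percentage of all processors' capacity."""
    capacity = elapsed * ncpus   # cpu seconds available across all processors
    return cpu_seconds_used / capacity * 100.0


# One busy thread over a 5 second interval uses 5 cpu seconds.
print(percent_of_total(5.0, 5.0, 1))    # 100.0
print(percent_of_total(5.0, 5.0, 2))    # 50.0
print(percent_of_total(5.0, 5.0, 4))    # 25.0

# Two busy threads in one process use 10 cpu seconds in the same interval.
print(percent_of_total(10.0, 5.0, 4))   # 50.0
```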

While continuing development for top, I plan to implement direct
calculation of percent cpu in as many ports of top as I can, as I
believe it provides a better picture of system activity. I am
interested in feedback from the user community on this matter. What
do you think?

Weighted CPU

While we are on the subject of percent cpu, we must mention something
called weighted cpu. This is a bit of an embarrassment to me, but I
feel that some sort of clarification is long past due. Top was
originally written for BSD 4.1, which tracked percent cpu pretty much
as described here. However, the scheduler would apply an additional
calculation to that number before using it to influence scheduling.
The result was known as weighted cpu percent, and was also reflected
in the source for ps. I dutifully followed suit and, in that early
version, displayed raw cpu and weighted cpu in two columns
side-by-side. As Unix grew, mutated, and developed, scheduling
algorithms were changed and methods altered. Percent cpu is an easy
measure to understand and it has significance to you and me, but the
intent and meaning of weighted cpu became lost. Most Unix systems no
longer use this calculation, preferring improved scheduling
algorithms that do not need it. Nonetheless, when well-intentioned
individuals ported top to these systems, they copied the code
for calculating weighted cpu. I think it is fair to say that at this
point the number no longer has any meaning on any modern Unix
system. Even the BSD version of ps calculates and displays
something called weighted cpu, but that number really has no relevance
to scheduling or anything else that goes on in the kernel. Right now I
consider weighted cpu nothing more than a waste of valuable screen
space. In future versions of top you may no longer see that number.

Footnote

In actuality, BSD systems (starting with version 4.3) no longer use a
traditional decaying average in their scheduling algorithms, and they
only track the number to keep ps happy. The BSD scheduler instead
uses a formula that decays cpu utilization by a factor of the load
average.
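
As a rough illustration of a load-dependent decay, the filter usually
described for the 4.3BSD scheduler multiplies the scheduler's cpu
estimate once per second by 2*load / (2*load + 1). That formula is not
spelled out in this article, so treat the sketch below as an
assumption; the function names are invented, and the real scheduler
also adds a term for the process's nice value, which is left out here.

```python
def load_decay(load_average):
    """Fraction of the cpu estimate kept after one decay step."""
    return (2.0 * load_average) / (2.0 * load_average + 1.0)


def decay_estimate(cpu_estimate, load_average):
    """Apply one decay step to a process's cpu estimate."""
    return cpu_estimate * load_decay(load_average)


# With a load average of 1, two thirds of the estimate survives each step.
estimate = 100.0
for _ in range(5):
    estimate = decay_estimate(estimate, 1.0)

print(f"{estimate:.1f}")   # most of the original estimate has been forgotten
```

The higher the load, the closer the factor is to 1, so on a busy system
past cpu use is remembered for longer.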