The measured effect of I/O on application startup
I did some experiments with the small tool I wrote in the previous post in order to gather startup data about Firefox. It turns out it can't flush directories and other metadata from the page cache, which unfortunately makes it useless for what I'm most interested in.
So I gathered various startup measurements for Firefox, showing how much the page cache (and thus I/O) influences startup. The data in this post are mean times and 95% confidence intervals for 50 startups with an existing but fresh profile, in a single-processor GNU/Linux x86-64 virtual machine (using kvm) with 1GB RAM and a 10GB raw hard drive partition over USB, running, unless stated otherwise, on an i7 870 (up to 3.6GHz with Intel Turbo Boost). The operating system itself is an up-to-date Debian Squeeze running the default GNOME environment.
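As a rough illustration of how a figure such as `3382.0 ± 0.51%` can be derived from raw timings, here is a minimal Python sketch. It assumes a normal approximation (z = 1.96) and reports the half-width of the interval relative to the mean; the exact method used for the numbers in this post is not stated, and `mean_with_ci` is a hypothetical helper name.

```python
import math
import statistics


def mean_with_ci(samples, z=1.96):
    """Return (mean, half-width of the confidence interval in % of the mean).

    Assumption: normal approximation with z = 1.96 for a 95% interval.
    """
    mean = statistics.mean(samples)
    # Standard error of the mean, from the sample standard deviation.
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, 100.0 * z * sem / mean


if __name__ == "__main__":
    # Made-up timings in ms, for illustration only.
    mean, pct = mean_with_ci([100, 102, 98, 101, 99])
    print(f"{mean:.2f} ± {pct:.2f}%")
```

With only 50 samples, a Student's t quantile would give a slightly wider interval than the normal approximation used here.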
Firefox startup time is measured as the difference between the time in ms right before starting Firefox and the time in ms as returned by JavaScript in a `data:text/html` page used as the home page.
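The idea can be sketched as follows. This is not the actual harness used for the measurements, only an assumption of what such a home page could look like: the launcher records the current time in ms, embeds it in a `data:text/html` URL, and the page's script displays the elapsed time once it runs. The names `now_ms` and `build_home_page` are hypothetical.

```python
import time
from urllib.parse import quote


def now_ms():
    """Wall-clock time in milliseconds since the epoch, like Date.now() in JavaScript."""
    return int(time.time() * 1000)


def build_home_page(start_ms):
    """Build a data: URL whose script writes the elapsed time since start_ms."""
    html = (
        "<script>"
        f"document.write(Date.now() - {start_ms});"
        "</script>"
    )
    # Percent-encode the markup so it is a valid URL payload.
    return "data:text/html," + quote(html)


if __name__ == "__main__":
    # The resulting URL would be set as the profile's home page before launch.
    print(build_home_page(now_ms()))
```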
Startup vs. page cache
Cache state | Average startup time (ms) |
---|---|
Entirely cold cache (`drop_caches`) | 5887.62 ± 0.88% |
Cold cache after boot | 3382.0 ± 0.51% |
Selectively cold cache (see below) | 2911.46 ± 0.48% |
Hot cache (everything previously in memory) | 250.74 ± 0.18% |
The selectively cold cache case makes use of the `flush` program from the previous post and a systemtap script that collects the list of files read during startup. This script will be described in a separate post.
As you can see, profiling startup after `echo 3 > /proc/sys/vm/drop_caches` takes significantly more time than under the normal conditions users would experience, because all the system libraries that would normally be in the page cache have been flushed. This biases the view one can have of actual startup performance. Mozilla build bots were running, until recently, a `ts_cold` startup test that, as I understand it, had this bias (which is part of why it was stopped).
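For scripted benchmark runs, the entirely cold cache case can be reproduced from Python with a small sketch like the one below. It needs root to write to the real file; the `path` parameter is only there (as an assumption for testing) so the function can be pointed at an ordinary file, and `drop_caches` is a hypothetical helper name.

```python
import os


def drop_caches(path="/proc/sys/vm/drop_caches", mode=3):
    """Ask the kernel to drop clean caches.

    mode 1 drops the page cache, 2 drops dentries and inodes, 3 drops both.
    """
    # Flush dirty pages first: only clean pages can be dropped.
    os.sync()
    with open(path, "w") as f:
        f.write(f"{mode}\n")
```

Calling `drop_caches()` before each run gives the worst case measured above, not the conditions a user would normally see after boot.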
The hot cache value is also interesting because it shows that the vast majority of cold startup time (over 90% in all three cold cases) is due to hard disk I/O (and no, there is no difference in the number of page faults).
I/O vs CPU
Interestingly, testing on a less beefy machine (Core 2 Duo 2.2GHz) with the same USB disk and kvm setup shows something not entirely intuitive:
Cache state | Average (ms) |
---|---|
Entirely cold cache | 6748.42 ± 1.01% |
Cold cache after boot | 3973.64 ± 0.53% |
Selectively cold cache | 3445.7 ± 0.43% |
Hot cache | 570.58 ± 0.70% |
I, for one, would have expected I/O bound startups to only be slower by around 320ms, which is roughly the hot cache startup difference, or, in other words, the CPU bound startup difference. But I figured I was forgetting an important factor.
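To make the gap concrete, this short sketch subtracts the i7 means from the Core 2 Duo means for each cache state, using the figures from the two tables above. The dictionary keys and the `slowdown` helper are just names chosen for the illustration.

```python
# Mean startup times in ms, copied from the two tables above.
I7 = {
    "entirely cold": 5887.62,
    "cold after boot": 3382.0,
    "selectively cold": 2911.46,
    "hot": 250.74,
}
CORE2 = {
    "entirely cold": 6748.42,
    "cold after boot": 3973.64,
    "selectively cold": 3445.7,
    "hot": 570.58,
}


def slowdown(fast, slow):
    """Per-state difference in ms between the slower and the faster machine."""
    return {state: slow[state] - fast[state] for state in fast}


if __name__ == "__main__":
    for state, delta in slowdown(I7, CORE2).items():
        print(f"{state}: {delta:.2f} ms")
```

The hot cache difference comes out at about 320ms, while the three cold cases differ by roughly 530 to 860ms, which is the discrepancy discussed here.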
I/O vs. CPU scaling
Modern processors do frequency scaling, which allows the processor to run slowly when underused, and faster when used, thus saving power. It was first used on laptop processors to reduce power drawn from the battery, allowing batteries to last longer, and is now used on desktop processors to reduce power consumption. It unfortunately has a drawback, in that it introduces some latency when the scaling kicks in.
A not so nice side effect of frequency scaling is that when a process is waiting for I/O, the CPU is underused, so it usually drops to its slowest frequency. When the I/O ends and the process runs again, the CPU has to ramp back up to full speed. This means every I/O can induce CPU scaling latency on top of the latency from, e.g., disk seeks. And it actually has much more impact than I would have thought.
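On Linux, the scaling policy is exposed through the cpufreq sysfs interface, so it can be inspected or changed from a script. The sketch below is one possible way to do that, not necessarily how the measurements in this post were set up: it reads and writes `scaling_governor` for a given CPU, where the `performance` governor keeps the CPU at its highest frequency. Writing requires root, and the `root` parameter is an assumption added so the helpers can be tried against a fake directory tree.

```python
from pathlib import Path

CPUFREQ_ROOT = Path("/sys/devices/system/cpu")


def governor_file(cpu=0, root=CPUFREQ_ROOT):
    """Path of the scaling_governor file for the given CPU number."""
    return Path(root) / f"cpu{cpu}" / "cpufreq" / "scaling_governor"


def read_governor(cpu=0, root=CPUFREQ_ROOT):
    """Return the name of the current governor, e.g. 'ondemand'."""
    return governor_file(cpu, root).read_text().strip()


def set_governor(governor, cpu=0, root=CPUFREQ_ROOT):
    """Select a governor; 'performance' pins the CPU to its top frequency."""
    governor_file(cpu, root).write_text(governor + "\n")
```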
Here are the results on the same Core 2 Duo, with frequency scaling disabled and the CPU forced to its top speed:
Cache state | Average (ms) |
---|---|
Entirely cold cache | 5824.1 ± 1.13% |
Cold cache after boot | 3441.8 ± 0.43% |
Selectively cold cache | 3025.72 ± 0.29% |
Hot cache | 576.88 ± 0.98% |
(I would have liked to do the same on the i7, but Intel Turbo Boost complicates things and I would have needed to get two new sets of data.)
Update: I actually found a way to force one core to its maximum frequency and run the kvm processes on it, giving the following results:
Cache state | Average (ms) |
---|---|
Entirely cold cache | 5395.94 ± 0.83% |
Cold cache after boot | 3087.47 ± 0.31% |
Selectively cold cache | 2673.64 ± 0.21% |
Hot cache | 258.52 ± 0.35% |
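The post does not detail how the kvm processes were tied to that core. One way to get a similar effect on Linux, shown here purely as an assumption, is to set the CPU affinity in the child process before it execs, so that every thread it later creates inherits it. `run_pinned` is a hypothetical helper, and the command run in the example is a stand-in for the real kvm invocation.

```python
import os
import subprocess
import sys


def run_pinned(cmd, core):
    """Run cmd restricted to a single CPU core (Linux only)."""

    def pin():
        # Runs in the child between fork and exec; pid 0 means "this process".
        os.sched_setaffinity(0, {core})

    return subprocess.run(
        cmd, preexec_fn=pin, capture_output=True, text=True, check=True
    )


if __name__ == "__main__":
    # Pick a core this process is allowed to run on, then show what the child sees.
    core = min(os.sched_getaffinity(0))
    child = [sys.executable, "-c", "import os; print(sorted(os.sched_getaffinity(0)))"]
    print(run_pinned(child, core).stdout.strip())
```

The same restriction can be applied from a shell with the `taskset` utility.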
I haven't gathered enough data to have accurate figures, but it also seems that forcing the CPU frequency to a fraction of the fastest supported frequency gives the intuitive result, where the difference between I/O bound startup times is the same as the difference between hot cache startup times. As such, I/O bound startup improvements would be best measured as an improvement in the difference between cold and hot cache startup times, i.e. `(cold2 - hot2) - (cold1 - hot1)`, at a fixed CPU frequency.
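Written out as code, the metric looks like the sketch below. The helper names are hypothetical and the numbers in the example are made up for illustration; they are not measurements from this post.

```python
def io_bound_ms(cold_ms, hot_ms):
    """Part of a cold startup attributed to I/O: cold minus hot, in ms."""
    return cold_ms - hot_ms


def io_improvement(cold1, hot1, cold2, hot2):
    """(cold2 - hot2) - (cold1 - hot1); negative means less time spent on I/O."""
    return io_bound_ms(cold2, hot2) - io_bound_ms(cold1, hot1)


if __name__ == "__main__":
    # Hypothetical before/after means in ms, both taken at the same fixed CPU frequency.
    print(io_improvement(cold1=3400.0, hot1=250.0, cold2=3100.0, hot2=240.0))
```

Subtracting the hot cache time on both sides keeps a purely CPU-side change from being counted as an I/O improvement.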
Startup vs. desktop environment
We saw above that the amount of system libraries in the page cache directly influences application startup times. And not all GNU/Linux systems are created equal. While the above times were obtained under a GNOME environment, other desktop environments don't use the same base libraries, which can require Firefox to load more of them at cold startup. The most used environment besides GNOME is KDE, and here is what cold startup looks like under KDE:
Desktop environment | Average startup time (ms) |
---|---|
GNOME cold cache | 3382.0 ± 0.51% |
KDE cold cache | 4031.9 ± 0.48% |
It's significantly slower (about 650ms, or 19%), yet not as slow as the entirely cold cache case. This is because KDE doesn't use (and thus doesn't pre-load) some of the GNOME core libraries, while still sharing others, such as libc (obviously) or dbus.
2011-01-03 16:59:17+0900
2011-01-03 17:46:06+0900
An interesting read, thanks for the investigation. I hope some kind of reliable startup performance tool comes from this.
2011-01-03 22:34:24+0900
In your KVM setup, did you take into account that the host system caches the disk image, even if you drop caches in the guest?
2011-01-03 23:10:19+0900
Anonymous: Yes, obviously. I used the cache=none option for drives.
2011-01-11 21:05:39+0900
Please keep up this work. It is fantastic to see startup time finally getting the well-overdue attention it has needed since … well since the first firefox build and the days when any criticism sent to Mozilla about Firefox startup perf was ignorantly (it now seems) backhanded towards the critic with excuses like “well IE is massively pre-loaded, use our Netscape 4 era pre-loader if you want to”.
From that point we are now at the stage where I see Taras is looking for new developer(s) to work on startup alone!