Archive for the 'p.d.o' Category

Prepare yourself for the upcoming changes on the mozilla.debian.net repository

With the upcoming changes in the beta and aurora channels (6.0 is about to reach beta, and 7.0 to reach aurora), the mozilla.debian.net repository is going to adapt and drop versioned archives in favor of channel archives. The channel archives for beta and aurora already existed; what is new is the "release" channel. Currently, that channel contains Iceweasel 5.0, but as soon as 6.0 is released, that is what the "release" channel will contain.

To summarize, if you added lines containing iceweasel-x.0, where x is 4, 5, or 6, to your /etc/apt/sources.list, you need to update them to the corresponding channel (don't forget 4.0 is dead; you should use "release" instead).

The iceweasel-5.0 and iceweasel-6.0 archives still exist at the moment but will be dropped as soon as the new aurora and beta releases are ready, which should be real soon now (only waiting for actual upstream releases).

On a somewhat related note, Iceweasel 5.0 should (finally) enter Debian unstable on the 15th of July, at which point the latest 6.0 beta will also be uploaded to Debian experimental. It is still unclear how long it will take for Iceweasel 5.0 to reach Debian testing/wheezy, because of all the reverse dependencies, but when that happens, we'll also be able to push it to backports.debian.org.

2011-07-07 12:25:21+0900

firefox | 24 Comments »

Iceweasel 5.0 in experimental

I just pushed Iceweasel 5.0 to Debian experimental. Why not unstable, some will ask? Well, because we still need to give some time after a first notice before breaking plenty of packages (Thanks to Julien Cristau for the MBF, by the way).

I also discontinued the Iceweasel 4.0 backport for Squeeze, as Iceweasel 4.0 won't be receiving security updates. Speaking of security updates, 3.6.18 was also made available on mozilla.debian.net for Wheezy, Squeeze and Lenny. However, I still have to backport the necessary patches to 3.5 in Squeeze and 3.0 in Lenny. My real-life schedule wasn't compatible with the security release schedule, so I fell behind on the security backports.

In the coming weeks, there will also be some additional changes to the mozilla.debian.net repository, but I'll give more details when that happens.

2011-06-22 02:41:30+0900

firefox | 16 Comments »

Iceweasel 5.0b2

... would have been released today if mozilla.debian.net was responding. But it's moving to a new server.

2011-05-21 08:17:50+0900

firefox | 5 Comments »

Debian Squeeze + btrfs = FAIL

Executive summary: Don't use btrfs on Debian Squeeze.
Longer summary: Don't use btrfs RAID with the kernel Debian Squeeze comes with.

About six months ago, I set up a new server to handle this web site, mail, and various other things. The system and most services (including web and mail) were set to use an MD RAID 1 array across two small partitions on two separate disks, and the remaining space was set up as three different btrfs file systems:

  • One btrfs RAID 0 for shared data I wouldn't mind having offline while fixing issues on one disk
  • One btrfs RAID 1 for shared data I would mind having offline while fixing issues on one disk
  • One last btrfs RAID 0 for entirely throwable things such as build chroots

Three days ago, this happened:

May 10 10:18:04 goemon kernel: [3545898.548311] ata4: hard resetting link
May 10 10:18:04 goemon kernel: [3545898.867556] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
May 10 10:18:04 goemon kernel: [3545898.874973] ata4.00: configured for UDMA/33

followed by other ATA related messages, then, garbage such as:

May 10 10:18:07 goemon kernel: [3545901.28123] sd3000 d]SneKy:AotdCmad[urn][ecitr
May 10 10:18:07 goemon kernel: 4[550.821 ecio es aawt es ecitr i e)
May 10 10:18:07 goemon kernel: 6[550.824     20 00 00 00 00 00 00 00 <>3491225     16 44 <>3491216]s ::::[d]Ad es:N diinlsneifrain<>3491216]s ::::[d]C:Ra(0:2 00 03 80 06 0<>3491217]edrqet / ro,dvsb etr2272
May 10 10:18:07 goemon kernel: 3[550.837 ad:sb:rshdln etr2252
May 10 10:18:07 goemon kernel: 6[551214]s ::::[d]Rsl:hsbt=I_Kdiebt=RVRSNE<>3491215]s ::::[d]SneKy:AotdCmad[urn][ecitr
May 10 10:18:07 goemon kernel: 4[550.833 ecitrsnedt ihsnedsrpos(nhx:<>3491216]    7 b0 00 00 c0 a8 00 00 0

Then later on:

May 10 12:01:18 goemon kernel: [3552089.226147] lost page write due to I/O error on sdb4
May 10 12:01:18 goemon kernel: [3552089.226312] lost page write due to I/O error on sdb4
May 10 12:10:14 goemon kernel: [3552624.625669] btrfs no csum found for inode 23642 start 0
May 10 12:10:14 goemon kernel: [3552624.625783] btrfs no csum found for inode 23642 start 4096
May 10 12:10:14 goemon kernel: [3552624.625884] btrfs no csum found for inode 23642 start 8192

etc. and more garbage.

At that point, I wanted to shut down the server, check the hardware, and reboot. The shutdown wouldn't complete: btrfs just froze on the sync happening during the shutdown phase, so I had to power off the hard way. Nothing seemed really problematic on the hardware end, and after a reboot, both disks were working properly.

The MD RAID would resynchronize, and the btrfs filesystems would be automatically mounted. It would work for a while, until messages like these showed up in the logs, with more garbage as above in between:

May 10 14:41:18 goemon kernel: [ 1253.455545] __ratelimit: 35363 callbacks suppressed
May 10 14:45:04 goemon kernel: [ 1478.717749] parent transid verify failed on 358190825472 wanted 42547 found 42525
May 10 14:45:04 goemon kernel: [ 1478.717936] parent transid verify failed on 358316642304 wanted 42547 found 42515
May 10 14:45:04 goemon kernel: [ 1478.717939] parent transid verify failed on 358190825472 wanted 42547 found 42525
May 10 14:45:04 goemon kernel: [ 1478.718128] parent transid verify failed on 358316642304 wanted 42547 found 42515
May 10 14:45:04 goemon kernel: [ 1478.718131] parent transid verify failed on 358190825472 wanted 42547 found 42525

Then there would be kernel btrfs processes going on and on, sucking CPU and I/O, doing whatever it was they were doing. At such moments, most reads from one of the btrfs volumes would either take very long or freeze, and unmounting would only freeze. At that point, considering that the advantages of btrfs (in my case, mostly snapshots) were outweighed by such issues (this wasn't my first btrfs fuck up, but by far the most dreadful) and by the fact that btrfs is just so slow compared to other filesystems, I decided I didn't care to try saving these filesystems from their agonizing death, and that I'd just go with ext4 on MD RAID instead. I also didn't want to try again with a more recent kernel, with the possibility of going through similar pain.

Fortunately, I had backups of most of the data (the only problem being the time required to restore that amount of data), but for the few remaining things which, by force of bad timing, I didn't have a backup of, I needed to somehow get them back from these btrfs volumes. So I created new file systems to replace the btrfs volumes I could directly throw away, and started recovering data from backups. At the same time, I tried to copy a big disk image from the remaining btrfs volume. Somehow, this worked, with the system load varying between 20 and 60, a lot of garbage in the logs, and other services deeply impacted as well. But when I tried to copy the remaining files I wanted to recover, things got worse, so I had to initiate a shutdown and power cycle again.

Since apparently the kernel wasn't going to be very helpful, the next step was to just get other things working, and get the data back some other way. What I did was to use a virtual machine to get the data off the remaining btrfs volume. The kernel could become unusable all it wanted to, I could just hard reboot without impacting the other services.

In the virtual machine, things got "interesting". I did try various things I'd seen on the linux-btrfs list, but nothing really did anything at all except spew some more parent transid messages. I should mention that the remaining btrfs volume was a RAID 0. To mount those, you'd mount one of the constituent disks like this:

$ mount /dev/sdb /mnt

Except that it would complain that it couldn't find a valid whatever (I don't remember the exact term, and I threw the VM away already), so it wouldn't mount the volume. But when mounting the other constituent disk, it would just work. Well, that's kind of understandable, but what is not is that on the next boot (I had to reboot a lot, see below), it would error out on the disk that worked previously, and work on the disk that was failing before.

So, here is how things went:

  • I would boot the VM and mount the volume,
  • launch an rsync of the data to recover, which I'd send onto the host system,
  • observe, from the host system, what was going on I/O wise,
  • at some point (usually after something like 10 to 50 files rsync'ed), after throwing a bunch of parent transid error messages, the VM would just stop doing any kind of I/O (even if left alone for several minutes), at which point I'd hard shutdown the VM and start over.

Ain't that fun?

The good thing is that in the end, despite the pain, I recovered all that needed to be recovered. I'm in the process of recreating my build chroots from scratch, but that's not exactly difficult. It would just have taken a lot more time to recover them the same way, 50 files at a time.

Side note: yes, I did try newer versions of btrfsck; yes, I did try newer kernels. No, nothing worked to make these btrfs volumes viable. No, I don't have an image of these completely fucked up volumes.

2011-05-13 12:13:32+0900

p.d.o | 15 Comments »

Installing Iceweasel 5.0a2 on Debian GNU/Linux

  • Go to the Debian Mozilla Team page.
  • Select the Debian version you are running, "Iceweasel" and the version, "5.0".
  • Follow the instructions.
  • Profit.

Only amd64 and i386 packages are available. Note that there is another Iceweasel "version" available there: "aurora". Currently, this is the same as "5.0", but once Firefox 5.0 reaches the beta stage, "aurora" will be 6.0a2. Please feel free to use "aurora" if you want to keep using these pre-beta builds.

2011-05-02 19:00:42+0900

firefox | 20 Comments »

Coming soon

2011-04-23 10:31:59+0900

firefox | 20 Comments »

Avoiding dependencies upon recent libstdc++

Mozilla has been distributing Firefox builds for GNU/Linux systems for a while, and 4.0 should even bring official builds for x86-64 (finally, some would say). The buildbot configuration for these builds uses gcc 4.3.3 to compile the Firefox source code. Because of the C++ part of gcc, this can sometimes have side effects when using the C++ STL.

Historically, the Mozilla code base hasn't made great use of the STL, most probably because 10+ years back, portability and/or compiler support wasn't very good. More recently, with the borrowing of code from the Chromium project, this changed. While the borrowed code for out-of-process plugins support didn't have an impact on libstdc++ usage, the recent addition of ANGLE did. This manifests itself in the symbol versions the builds require.

These are the symbol versions required from libstdc++.so.6 on 3.6 (as given by objdump -p):

  • CXXABI_1.3
  • GLIBCXX_3.4

And on 4.0:

  • CXXABI_1.3
  • GLIBCXX_3.4
  • GLIBCXX_3.4.9

This means Firefox 4.0 builds from Mozilla need the GLIBCXX_3.4.9 symbol version, which was introduced with gcc 4.2. As a consequence, Firefox 4.0 builds don't work on systems with a libstdc++ older than that, while 3.6 builds do. It so happens that the system libstdc++ on the buildbots themselves is that old, which is why we set LD_LIBRARY_PATH to the appropriate location during tests. This shouldn't, however, be a big problem for users.
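
As a rough illustration of how such a list can be gathered, here is a minimal sketch that pulls the version names required from a given library out of text shaped like the "Version References:" part of objdump -p output. The helper name required_versions is made up for this example, and the parsing assumes each entry is a line of four whitespace-separated fields ending with the version name.

```cpp
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper (not part of any tool): collect the version names
// listed under "required from <library>:" in objdump -p style output.
std::set<std::string> required_versions(const std::string& objdump_output,
                                        const std::string& library) {
    std::set<std::string> versions;
    std::istringstream lines(objdump_output);
    std::string line;
    bool in_section = false;
    const std::string header = "required from " + library + ":";
    while (std::getline(lines, line)) {
        if (line.find("required from ") != std::string::npos) {
            // Any "required from" header either opens or closes our section.
            in_section = line.find(header) != std::string::npos;
            continue;
        }
        if (!in_section)
            continue;
        // Assumed entry shape: "<hash> <flags> <index> <version name>".
        std::istringstream fields(line);
        std::string field, last;
        int count = 0;
        while (fields >> field) {
            last = field;
            ++count;
        }
        if (count == 4)
            versions.insert(last);
        else
            in_section = false;  // a blank or unrelated line ends the section
    }
    return versions;
}
```

Fed the objdump -p output of a 4.0 build and "libstdc++.so.6" as the library, a helper like this would be expected to return the three versions listed above.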

Newer gcc, new problems

As part of making Firefox faster, we're planning to switch to gcc 4.5, to benefit from better (as in working) profile guided optimization, and other compiler improvements. We actually attempted to switch to gcc 4.5 twice during the 4.0 development cycle. But various problems made us go back to gcc 4.3.3, the main one being the use of even newer libstdc++ symbols:

  • CXXABI_1.3
  • GLIBCXX_3.4
  • GLIBCXX_3.4.5
  • GLIBCXX_3.4.9
  • GLIBCXX_3.4.14

GLIBCXX_3.4.14 was added in gcc 4.5, making the build require a very recent libstdc++ installed on users' systems. As this wouldn't work for Mozilla builds, we attempted to build with -static-libstdc++. This option makes the resulting binary effectively contain libstdc++ itself, which means it doesn't require a system one. This is the usual solution for builds such as Mozilla's, which need to work properly on very different systems.

The downside of -static-libstdc++ is that it makes the libxul.so binary larger (about 1MB larger). It looks like the linker doesn't try to eliminate the code from libstdc++ that isn't actually used. Taras has been fighting to try to get libstdc++ in a shape that would allow the linker to remove that code that is effectively dead weight for Firefox.

Why do we need these symbols?

The number of symbols required with the GLIBCXX_3.4.14 version is actually very low:

  • std::_List_node_base::_M_hook(std::_List_node_base*)
  • std::_List_node_base::_M_unhook()

With the addition of the following on debug builds only:

  • std::string::_S_construct_aux_2(unsigned int, char, std::allocator<char> const&)
  • std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >::_S_construct_aux_2(unsigned int, wchar_t, std::allocator<wchar_t> const&)

The number of symbols required with the GLIBCXX_3.4.9 version is even lower:

  • std::istream& std::istream::_M_extract<double>(double&)

It however varies depending on the compiler version. I have seen other builds also require std::ostream& std::ostream::_M_insert<double>(double).

All these are actually internal implementation details of libstdc++. We never call these functions directly. I'm going to show two small examples triggering some of these requirements (which actually generalize to all of them).

The case of templates

#include <iostream>
int main() {
    unsigned int i;
    std::cin >> i;
    return i;
}

This example, when built, requires std::istream& std::istream::_M_extract<unsigned int>(unsigned int&), but we are effectively calling std::istream& operator>>(unsigned int&). It is defined in /usr/include/c++/4.5/istream as:

template<typename _CharT, typename _Traits>
class basic_istream : virtual public basic_ios<_CharT, _Traits> {
    basic_istream<_CharT, _Traits>& operator>>(unsigned int& __n) {
        return _M_extract(__n);
    }
}

And _M_extract is defined in /usr/include/c++/4.5/bits/istream.tcc as:

template<typename _CharT, typename _Traits> template<typename _ValueT>
        basic_istream<_CharT, _Traits>&
        basic_istream<_CharT, _Traits>::_M_extract(_ValueT& __v) {
            (...)
        }

And later on in the same file:

extern template istream& istream::_M_extract(unsigned int&);

What this all means is that libstdc++ actually provides an implementation of an instance of the template for the istream (a.k.a. basic_istream<char>) class, with an unsigned int & parameter (and some more implementations). So, when building the example program, gcc decides, instead of instantiating the template, to use the libstdc++ function.

This extern declaration, however, is guarded by a #if _GLIBCXX_EXTERN_TEMPLATE, so if we build with -D_GLIBCXX_EXTERN_TEMPLATE=0, we actually get gcc to instantiate the template, thus getting rid of the GLIBCXX_3.4.9 dependency. The downside is that this doesn't work so well with bigger code bases, because other things are hidden behind #if _GLIBCXX_EXTERN_TEMPLATE.

There is however another (obvious) way to force the template instantiation: instantiating it explicitly. So adding template std::istream& std::istream::_M_extract(unsigned int&); to our code is just enough to get rid of the GLIBCXX_3.4.9 dependency. Other template cases can obviously be worked around the same way.
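
To make the mechanism concrete without depending on libstdc++ internals, here is a minimal sketch using a made-up class template (Reader, extract and use_reader are hypothetical names, not libstdc++ ones). The extern template line plays the role of the declaration found in istream.tcc, and the explicit instantiation after it is the equivalent of the line added to our own code.

```cpp
// Hypothetical stand-in for a library class template such as basic_istream.
template <typename CharT>
struct Reader {
    // Member template, in the spirit of basic_istream::_M_extract.
    template <typename ValueT>
    ValueT extract(ValueT v);
};

template <typename CharT>
template <typename ValueT>
ValueT Reader<CharT>::extract(ValueT v) {
    return v + 1;  // placeholder body, only here so there is code to emit
}

// What the library header does: declare that this instance is provided
// elsewhere, which asks the compiler not to instantiate it implicitly.
extern template unsigned int Reader<char>::extract(unsigned int);

// What we add to our own code: an explicit instantiation definition, so the
// code for this instance is emitted in our own object file.
template unsigned int Reader<char>::extract(unsigned int);

// A caller; with the definition above, it no longer relies on an external
// copy of Reader<char>::extract<unsigned int>.
unsigned int use_reader(unsigned int v) {
    Reader<char> r;
    return r.extract(v);
}
```

In this sketch the explicit instantiation has to come after the extern template declaration when both are in the same translation unit.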

The case of renamed implementations

#include <list>
int main() {
    std::list<int> l;
    l.push_back(42);
    return 0;
}

Here, we get a dependency on std::_List_node_base::_M_hook(std::_List_node_base*) but we are effectively calling std::list<int>::push_back(const int&). It is defined in /usr/include/c++/bits/stl_list.h as:

template<typename _Tp, typename _Alloc = std::allocator<_Tp> >
class list : protected _List_base<_Tp, _Alloc> {
    void push_back(const value_type& __x) {
        this->_M_insert(end(), __x);
    }
}

_M_insert is defined in the same file:

template<typename ... _Args>
void _M_insert(iterator __position, _Args&&... __args) {
    _List_node<_Tp>* __tmp = _M_create_node(std::forward<_Args>(__args)...);
    __tmp->_M_hook(__position._M_node);
}

Finally, _M_hook is defined as follows:

struct _List_node_base {
    void _M_hook(_List_node_base * const __position) throw ();
}

In gcc 4.4, however, push_back has the same definition, and while _M_insert is defined similarly, it calls __tmp->hook instead of __tmp->_M_hook. Interestingly, gcc 4.5's libstdc++ exports symbols for both std::_List_node_base::_M_hook and std::_List_node_base::hook, and the code for both methods is the same.

Considering the above, a work-around for this kind of dependency is to define the newer function in our code, and make it call the old function. In our case here, this would look like:

namespace std {
    struct _List_node_base {
        void hook(_List_node_base * const __position) throw ();
        void _M_hook(_List_node_base * const __position) throw ();
    };
    void _List_node_base::_M_hook(_List_node_base * const __position) throw () {
        hook(__position);
    }
}

... which you need to put in a separate source file, not including <list>.

All in all, with a small hack, we are able to build Firefox with gcc 4.5 without requiring libstdc++ 4.5. Now, another reason to switch to gcc 4.5 was to use better optimization flags, but it turns out it makes the binaries 6MB bigger. But that's another story.

2011-03-14 13:21:04+0900

p.d.o, p.m.o | 3 Comments »

Tohoku Pacific Offshore Earthquake

I wonder if everyone is all right...

2011-03-11 10:10:15+0900

p.d.o | No Comments »

A good reason to keep patched source in $VCS

There are a lot of different workflows to maintain Debian packages under a Version Control System. Some people prefer to only keep the debian directory, some the whole source. And in the latter category, some prefer the source tree to be patched with Debian changes, while others prefer to keep it unpatched and exclusively use debian/patches.

It turns out that keeping only the debian directory, or keeping the source unpatched, doesn't work so well in one specific case that any package may hit some day; and that day, you realize how wrong you were not to track the entire patched source. That happened to me recently, though instead of actually going forward and switching to tracking the patched source, I cheated and simply ditched the patch, because I didn't strictly need it.

In all fairness, this is not only a case against not tracking patched source, but also a case of the 3.0 (quilt) source format being cumbersome.

In my specific case, I cherry-picked an upstream patch modifying and adding some test cases related to a cherry-picked fix. One of the added test cases was a UTF-16 file. UTF-16 files can't be diff'ed or patch'ed except in the git patch format, but quilt neither uses nor supports that. The workaround for this limitation of the 3.0 (quilt) format is to include the plain modified file in the Debian tarball, and add its path to debian/source/include-binaries.

On the VCS side of things, it means you have to modify the file in the source directory, and fill debian/source/include-binaries accordingly. Wait. Modify the file in the source directory? But the other files aren't! They're tracked by patches!

So here you are, with all of your modifications exclusively in debian/patches... except one.

2011-03-06 10:27:33+0900

debian | 5 Comments »

More graphs for the Debian Bug Tracking System

I have been maintaining Debian Bug Tracking System graphs for a few years, now, though not very actively. They initially were available on people.debian.org/~glandium/bts/, but there have been some recent changes.

A while ago, I started experimenting with brand new graphs on merkel.debian.org/~glandium/bts/, and when merkel was announced to be dying a few months ago, I got in touch with the QA team to know what to do with them, and it was decided we'd put them on qa.debian.org. I unfortunately didn't follow up much on this and only recently actually worked on the migration, which took place 2 weeks ago.

The result is that the graphs have officially moved to qa.debian.org/data/bts/graphs/, and links on the Package Tracking System have been updated accordingly. There is now also an additional graph tracking all open bugs in the BTS, across all packages.

Today, I added a new feature that consolidates data for multiple arbitrary packages in a single graph. Such graphs can be generated with the following URL scheme (please don't abuse it):

https://qa.debian.org/data/bts/graphs/multi/name0,name1,name2,etc.png
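
For scripted use, the scheme boils down to joining package names with commas and appending .png. A minimal sketch, where the helper name multi_graph_url is made up for this example and a non-empty list of valid package names is assumed:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: build a multi-package graph URL following the scheme
// above. Assumes `packages` is non-empty and holds plain package names.
std::string multi_graph_url(const std::vector<std::string>& packages) {
    std::string url = "https://qa.debian.org/data/bts/graphs/multi/";
    for (std::size_t i = 0; i < packages.size(); ++i) {
        if (i != 0)
            url += ',';  // names are separated by commas, with no spaces
        url += packages[i];
    }
    return url + ".png";
}
```

No escaping is done here, which seems reasonable for Debian package names but is an assumption of this sketch.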

As an example, here is a graph for all the bugs on the packages I (co-)maintain:

https://qa.debian.org/data/bts/graphs/multi/dehydra,diggler,iceape,iceweasel,libxml2,libxslt,livehttpheaders,mozilla-dom-inspector,nspr,nss,pyxpcom,venkman,vmfs-tools,webkit,xulrunner,zfs-fuse.png

And here are the bugs affecting Mozilla-related packages:

https://qa.debian.org/data/bts/graphs/multi/iceape,icedove,iceowl,iceweasel,nspr,nss,xulrunner.png

I guess the next step is to allow per-maintainer consolidation through URLs such as

https://qa.debian.org/data/bts/graphs/by-maint/address.png

Update: per-maintainer consolidation has been added.

(Hidden message here: please help triaging these bugs)

2011-03-05 14:21:31+0900

debian | 6 Comments »