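Last week we reported that native ZFS for Linux would soon be released: a port of Sun's ZFS file-system to Linux as a CDDL-licensed kernel module, developed by the Lawrence Livermore National Laboratory. At the time of that article a ZFS module for FUSE (Filesystem in Userspace) was already available, and because it falls outside the scope of the Linux kernel's GPL it can be used legally, though it brings no performance benefit. Over the past weekend, discussion arose in the forums and elsewhere about how reliable ZFS-FUSE is and how much of an impact going through FUSE has on real hardware. We tested ZFS-FUSE, both the latest stable release and the latest Git code, and compared this alternative ZFS port for Linux against the native EXT4 and Btrfs file-systems.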
For those not familiar with the GPL-licensed Linux FUSE module, it is a kernel module that has lived in the mainline kernel since the Linux 2.6.14 release. It allows non-privileged users to create their own file-systems in user-space, with the FUSE module providing the bridge to the Linux kernel. FUSE is also available for the BSD, OpenSolaris, and Mac OS X operating systems. Because FUSE file-systems live in user-space, they do not need to comply with the GNU GPL, since only the FUSE module itself is loaded into the Linux kernel, but there is an overhead associated with this approach. Besides ZFS-FUSE, there are dozens of other FUSE file-systems including ClamFS, httpFS, ChunkFS, vmware-mount, and GnomeVFS2 FUSE. The most recent release of ZFS-FUSE is version 0.6.9, which is based upon Zpool version 23 (much better than the Zpool 18 being used by LLNL/KQ Infotech at this time, with post-18 revisions adding features like de-duplication support) and supports NFS sharing, the PowerPC architecture, a multi-threaded ioctl handler, and other improvements. ZFS-FUSE 0.7.0 is the release presently under development and is expected in early October. For our testing of ZFS-FUSE, we used both the latest stable 0.6.9 release and a 0.7.0 Git snapshot of the official repository as of 2010-08-28.
KQ Infotech, the company working on a native ZFS module based upon the code of the Lawrence Livermore National Laboratory, has referred to FUSE as "crap." Others have said that the "performance sucks" and that "it's probably nowhere near as robust as a native kernel implementation." There have also been some strong proponents of using FUSE: "In addition, they make some SERIOUS claims against the viability of a fuse-based file-system that are, quite frankly, FALSE. Yes, the zfs-fuse file-system can be slow... on OLD KERNELS. The limitations that these problems are created by have been solved. zfs-fuse, when correctly configured, gives near-platter performance levels! And going through fuse solves the majority of the licensing issues. It's a win-win! And so you have this person coming on the forum here, making crazy claims, not providing any substance, and expect everyone to be amazed? All they're doing is trying to build up hype... for something that is going to tank. Big time." These heated and polarized views on ZFS-FUSE have led us to benchmark it while we wait to be able to test the native ZFS Linux implementation.
The hardware setup for our ZFS-FUSE benchmarking included an Intel Core i7 920 CPU overclocked to 3.60GHz, an ASRock X58 SuperComputer motherboard, 3GB of DDR3 system memory, a 60GB OCZ Vertex 2 SSD, and an ATI Radeon HD 4670 graphics card. For this ZFS-FUSE benchmarking we performed a clean installation of Ubuntu 10.04.1 LTS (x86_64) and installed the latest Linux kernel Git code for Linux 2.6.36 as of 2010-08-26. We set the Phoronix Test Suite to install and run the tests from a second partition on the OCZ Vertex 2 SSD that was tested with EXT4, Btrfs, ZFS-FUSE 0.6.9, and ZFS-FUSE 0.7.0 (Git 2010-08-28). Following that, we installed OpenSolaris b134 to test its native ZFS implementation on this OCZ solid-state drive. The Phoronix Test Suite facilitated all testing.
Starting with the Apache server benchmark, using ZFS-FUSE causes a dramatic performance hit compared to EXT4 and Btrfs. The number of requests that could be sustained per second when using ZFS-FUSE on Ubuntu Linux was 42% lower than when using EXT4, which is the default file-system of Ubuntu Lucid Lynx.
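To set this figure beside the "times faster" numbers quoted for the later tests, the 42% drop can be converted into a speedup factor. The snippet below is only an illustrative sketch: the function name is hypothetical, and the one input taken from the results is the 42% figure.

```python
def speedup_from_percent_lower(percent_lower):
    """Return how many times faster the baseline is, given that the
    other result is `percent_lower` percent below the baseline."""
    if not 0 <= percent_lower < 100:
        raise ValueError("percent_lower must be in [0, 100)")
    remaining = 1.0 - percent_lower / 100.0  # fraction of baseline throughput left
    return 1.0 / remaining


# ZFS-FUSE sustained 42% fewer Apache requests per second than EXT4.
ext4_vs_zfs_fuse = speedup_from_percent_lower(42)
print(f"EXT4 handled about {ext4_vs_zfs_fuse:.2f}x the requests of ZFS-FUSE")
```

In other words, a result that is 42% lower corresponds to EXT4 sustaining roughly 1.7 times as many requests per second, which is a much smaller gap than the multi-fold differences seen in the database tests.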
The EXT4 performance under the PostgreSQL workload was over six times faster than that of the ZFS file-system running in Linux user-space. The latest ZFS-FUSE Git code for the upcoming ZFS-FUSE 0.7 release did improve performance by 6% over ZFS-FUSE 0.6.9, but this obviously made no dent in EXT4's lead. Interestingly, ZFS-FUSE was actually faster than Btrfs, though PostgreSQL server performance is known to be notoriously bad on Btrfs right now. We were able to run this test profile successfully under OpenSolaris b134 on ZFS, where performance was 61% faster than the ZFS-FUSE Git code on Linux.
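Because these PostgreSQL figures are each quoted against a different baseline, it can help to chain them onto a single scale. The sketch below normalizes to ZFS-FUSE 0.6.9 = 1.0 using only the two percentages stated above; the variable names are illustrative, and EXT4 is left out because the text gives only a lower bound ("over six times") without naming the ZFS-FUSE version it was compared against.

```python
# Relative PostgreSQL results, normalized so that ZFS-FUSE 0.6.9 = 1.0.
ZFS_FUSE_069 = 1.0
zfs_fuse_git = ZFS_FUSE_069 * 1.06     # Git code was 6% faster than 0.6.9
opensolaris_zfs = zfs_fuse_git * 1.61  # OpenSolaris ZFS was 61% faster than the Git code

relative = {
    "ZFS-FUSE 0.6.9": ZFS_FUSE_069,
    "ZFS-FUSE 0.7.0 Git": zfs_fuse_git,
    "ZFS on OpenSolaris b134": opensolaris_zfs,
}
for name, score in relative.items():
    print(f"{name}: {score:.2f}x")
```

On this scale native ZFS on OpenSolaris comes out at about 1.7 times ZFS-FUSE 0.6.9, still far short of a lead of more than six times.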
In the PostgreSQL disk test EXT4 was over six times faster than ZFS-FUSE, but with the PostMark e-mail server disk benchmark its lead was even greater, at nine times. ZFS on OpenSolaris b134 was about two times faster than ZFS-FUSE 0.6.9, or more precisely just under two times. Although ZFS was designed around OpenSolaris, it still ran at only 31% of the speed of Btrfs and just 22% of the speed of EXT4.
Btrfs struggles with the SQLite benchmark (similar to the PostgreSQL pgbench test), but the widely used EXT4 file-system was almost twice as fast as ZFS-FUSE on Linux.
EXT4, Btrfs, and ZFS on OpenSolaris all ran well and close to each other while ZFS-FUSE -- both the 0.6.9 stable and 0.7.0 development snapshot -- lagged well behind.
Btrfs was three times faster here than ZFS-FUSE, and EXT4 was more than four and a third times faster. The native ZFS performance on OpenSolaris was right on par with Btrfs on the Linux 2.6.36 kernel.
ZFS, regardless of whether it is ZFS-FUSE or ZFS on OpenSolaris, really is slaughtered when it comes to the multi-threaded random writes. Btrfs was 24.47x faster than ZFS-FUSE, which handled random writes at only 5MB/s.
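The ratio and the ZFS-FUSE rate together imply an approximate Btrfs random-write rate. The sketch below works it out as a range rather than a single number, on the assumption (ours, not stated in the results) that the 5MB/s figure was rounded to the nearest whole MB/s; the function name is hypothetical.

```python
def implied_throughput_range(ratio, rounded_mb_s):
    """Bounds on the faster file-system's throughput, assuming the slower
    figure was rounded to the nearest whole MB/s (an assumption)."""
    low = ratio * (rounded_mb_s - 0.5)
    high = ratio * (rounded_mb_s + 0.5)
    return low, high


# Btrfs was 24.47x faster than ZFS-FUSE, which wrote at about 5MB/s.
low, high = implied_throughput_range(24.47, 5)
print(f"Implied Btrfs random-write rate: roughly {low:.0f} to {high:.0f} MB/s")
```

The wide range is a reminder that a heavily rounded base figure limits how precisely the faster result can be reconstructed from the ratio alone.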
When carrying out an 8GB write test with a 64KB block size in IOzone, EXT4 and ZFS were 1.64~1.67x faster than ZFS-FUSE. [Translator's note: "ZFS" here may be a slip for "Btrfs".]
In the final test, an 8GB read using IOzone, the disk read performance results for ZFS-FUSE actually look close to those of EXT4 and Btrfs, but that is because the ZFS module is caching some of the reads.
As some Phoronix Premium members have requested CPU usage results during file-system tests, there are some included in this article.
As shown above, CPU utilization on the Intel Core i7 was actually lower when ZFS-FUSE on Linux was in use, or when OpenSolaris was running its native ZFS implementation, than it was with the native Linux file-systems.
While the CPU load was lower when using ZFS-FUSE during the PostMark benchmark, the difference was only a few percent. When running Apache, however, CPU usage with ZFS-FUSE spiked dramatically, to almost three times that of the native Linux file-systems that were tested.
There is certainly a performance penalty incurred when using ZFS-FUSE on Linux, which may or may not be justified depending upon whether you take advantage of the other features found in the Oracle/Sun ZFS file-system. Even with the very latest Linux 2.6.36 kernel and both the latest stable and unstable releases of the ZFS-FUSE code-base, a significant drop in performance can be measured when running ZFS in user-space. It will be interesting, though, to see how the native ZFS Linux module from either the Lawrence Livermore National Laboratory or KQ Infotech performs compared to ZFS-FUSE. Even the native ZFS module on OpenSolaris was no match for Linux with EXT4/Btrfs, so our next set of native ZFS Linux results should be rather interesting. It could be argued that those OpenSolaris-compatible test profiles just so happened to be bottlenecked elsewhere in the Solaris Nevada kernel, but even ZFS on FreeBSD is slow when compared to the EXT4 and Btrfs file-systems. These next ZFS on Linux results will be available in September if the native ZFS beta code arrives prior to the annual Oktoberfest outing. Stay tuned for this year's Phoronix gathering details.