A study of the "no symbol version for" error when loading kernel modules

2014-09-18 yaoqigui
Category: LINUX

A few days ago a colleague asked me: if one module needs to call a function in another module, is any special handling required? All I knew at the time was that the called function has to be exported with EXPORT_SYMBOL(). Since I had never actually tried this with real modules, I put a test together on the spot, both to show him and to verify it for myself. As soon as I ran the experiment, a problem showed up. The build succeeded, although with a warning:

    WARNING: "exported_function_2" [/home/tekkaman/development/research/Linux_module/caller/caller.ko] undefined!

But after the exporting modules had been loaded, loading the calling module failed:

    # insmod exporter_1.ko
    Hello, Tekkaman Ninja !
    exported_function_1 is online!
    # insmod exporter_2.ko
    Hello, Tekkaman Ninja !
    exported_function_2 is online!
    # insmod caller.ko
    caller: no symbol version for exported_function_2
    caller: Unknown symbol exported_function_2 (err -22)
    caller: no symbol version for exported_function_1
    caller: Unknown symbol exported_function_1 (err -22)
    insmod: error inserting 'caller.ko': -1 Invalid parameters

Here is my test case; you may want to look at the code first:

 Linux_module_func_export.rar
   (adjust the kernel source directory and the cross-compiler definitions to match your own setup)
   
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
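After running into the problem, I googled for a solution.

In the forum thread I found, the reply at post #15 is the correct fix. It is reproduced here:

    This is a bug in Linux kernel 2.6.26 and later (for a detailed description, see bug id 12446),
    and this bug will not be fixed.
    The fix is to put mod_a's Module.symvers into mod_b's current directory and then build mod_b; the symbol information is linked in automatically.
    Alternatively, use KBUILD_EXTRA_SYMBOLS in mod_b's makefile to point at mod_a's Module.symvers, e.g.:
    KBUILD_EXTRA_SYMBOLS=/mod_a/Module.symvers
    When mod_b is built, Module.symvers is searched for in:
    1, kernel source path, e.g. /usr/src/kernels/linux-2.6.28.10
    2, the path given by M= in the makefile, which is equivalent to the value of KBUILD_EXTMOD
    3, the value of KBUILD_EXTRA_SYMBOLS

Post #16 gets to the heart of the problem:

    Post #15 analyses it thoroughly.
    Put simply: when b is built it does not know the checksum (CRC) of a's symbol, so the checksum check naturally fails when b is loaded.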
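There is also another article online for reference; its solution is the same:

This method does solve the problem. In the test program above, just add KBUILD_EXTRA_SYMBOLS to the caller's Makefile (the line is already there: uncomment it and adjust the path, which must be an absolute path) and rebuild the caller module.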
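To make the role of Module.symvers more concrete, here is a minimal user-space sketch in C that reads the kind of record the file holds and looks up a symbol's CRC. It assumes the tab-separated layout used by kernels of that era (CRC, symbol name, exporting module, then an export type that the sketch ignores). The struct, the function names and any sample lines are hypothetical and are not part of the kernel or of kbuild.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One record of a Module.symvers file (assumed layout: CRC, symbol, module). */
struct symvers_entry {
    unsigned long crc;
    char symbol[64];
    char module[256];
};

/*
 * Parse one whitespace-separated line such as
 *   0x12345678<TAB>exported_function_1<TAB>/path/to/exporter_1<TAB>EXPORT_SYMBOL
 * Returns 1 if the three leading fields were read, 0 otherwise.
 */
int parse_symvers_line(const char *line, struct symvers_entry *out)
{
    assert(line != NULL && out != NULL);
    return sscanf(line, "%lx %63s %255s",
                  &out->crc, out->symbol, out->module) == 3;
}

/*
 * Search n lines for `symbol`. On success store its CRC in *crc and return 1.
 * Returning 0 models the situation described above: the importing module
 * is built without knowing the exporter's CRC for that symbol.
 */
int lookup_crc(const char *const *lines, size_t n, const char *symbol,
               unsigned long *crc)
{
    struct symvers_entry e;
    size_t i;

    assert(lines != NULL && symbol != NULL && crc != NULL);
    for (i = 0; i < n; i++) {
        if (parse_symvers_line(lines[i], &e) && strcmp(e.symbol, symbol) == 0) {
            *crc = e.crc;
            return 1;
        }
    }
    return 0;
}
```

When the caller is built without the exporters' Module.symvers, the equivalent of lookup_crc() fails for exported_function_1 and exported_function_2, which is what the "undefined!" warnings at build time are reporting.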

The experiment:

    root@dm816x-evm:/# insmod exporter_1.ko
    Hello, Tekkaman Ninja !
    exported_function_1 is online!
    root@dm816x-evm:/# insmod exporter_2.ko
    Hello, Tekkaman Ninja !
    exported_function_2 is online!
    root@dm816x-evm:/# insmod caller.ko
    Hello, Tekkaman Ninja !
    Now call exporters's function!
    I'm exported_function_1 !(in /home/tekkaman/development/research/Linux_module/exporter_1/exporter_1.c)
    I'm exported_function_2 !(in /home/tekkaman/development/research/Linux_module/exporter_2/exporter_2.c)

Getting the problem fixed is one thing, but we should still understand where it comes from and what exactly is going on.

  Note: what follows touches on many of the mechanisms behind kernel module loading. If you need background, see 《深入Linux内核构架》 (Professional Linux Kernel Architecture), Chapter 7, "Modules". I strongly recommend reading it.

I found the cause on the web and in the Linux source code:

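(1) Information from the web
    Because of the break-in not long ago, the kernel.org site was taken offline for rework, and since then the bugzilla has been unreachable, so the bug report mentioned above cannot be looked up there. Fortunately, searching for bug ID 12446 turned up a copy that someone had pasted into their own blog, plus a page with the bug details. I have tidied them up a little for easier reading: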

    Summary: Unable to insmod module. Unknwon symbol found
    Product: Drivers
    Version: 2.5
    KernelVersion: 2.6.28
    Platform: All
    OS/Version: Linux
    Tree: Mainline
    Status: NEW
    Severity: blocking
    Priority: P1
    Component: Network
    AssignedTo: jgarzik at pobox.com
    ReportedBy: amit at netxen.com
    Latest working kernel version:2.6.28
    Earliest failing kernel version:2.6.26
    Distribution:
    Hardware Environment:
    Software Environment:
    Problem Description:
    When you use driver dependent on other driver. It should load cleanely, if
    other is loaded. But in kernel 2.6.26 onwards its give error, unknown symbol
    found.
    Actually find_symbol able to find symbol but check_version unable to match crc.
    Because crc is not know to other driver.
    Steps to reproduce:
    1) create two device drivers in two separate directories. let say hello and bye
    2) Export Symbol from hello module and try to use that in bye module.
    3) Compile both the driver separately.
    4) insmod hello.ko
    3) While insmod(ing) bye.ko, gives error "Unknown Symbol found"

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Bug 12446 - Unable to insmod module. Unknwon symbol found

    ------- Comment #1 From Roland Kletzing 2009-01-17 02:44:19 -------
    what about using modprobe instead ? (as this looks in modules.dep to find the dependend modules and load them before)

    ------- Comment #2 From sucheta 2009-01-18 21:40:43 -------
    Hi Roland, using modprobe doesn't work. The entries in modules.dep are added. Still it ends up showing the same prints - "no symbol version for " and "Unknown symbol ". And modprobe fails.

    ------- Comment #3 From amit jain 2009-01-19 00:20:32 -------
    Hi Roland, Thanks a lot for replying. Below I have written details of experiments we did and our understanding.
    Problem: insmod failure for externally compiled module :- Experiments:
    (1) Compiling 2 modules a.ko and b.ko ( dependent on a.ko ) together :- Works
    (2) Copying Module.symvers from module "a" dir to the module "b" dir, before compiling b.ko :- Works.
    (3) Modprobe after appending following lines in /lib/modules/modules.dep
    /lib/modules/2.6.27.7-smp/kernel/drivers/net/a.ko
    /lib/modules/2.6.27.7-smp/kernel/drivers/net/b.ko: /lib/modules/2.6.27.7-smp/kernel/drivers/net/a.ko :- Fails
    (4) After compiling b.ko, just modifying b.mod.c file to include the undefined symbol in its version table doesn't work (didn't expect to work ).
    (5) export_objs (doesn't work):
    In Makefile of a.ko:
    export_objs := a.ko /
    export-objs := a.ko /
    exportobjs := a.ko.
    (6) Adding "#define EXPORT_SYMTAB" in a.c file (doesn't work).
    From above experiments, we found that In .mod.c file it maintains __versions array, which contains export symbol name and its crc. We see symbols of modules which are compiled with kernel. No symbols of externally compiled modules.
    The call trace is load_module -> simplify_symbols -> resolve_symbol -> find_symbol and check_version ( if find_symbol succeeds ).
    check_version behavior comparison in 2.6.26 and earlier version kernels :-
    In earlier versions of kernel also, symbol couldn't be found in its version table. Still, check_version used to return 1 (success) and the dependent module could be insmod(ed) successfully.
    However, in kernel 2.6.26 onwards, behavior has changed. check_version on not finding the reqd. symbol in its version table returns 0 (fail) and the dependent external module can't be inserted anymore.
    Waiting eagerly for your reply. Thanks in advance.

    ------- Comment #4 From amit jain 2009-01-28 21:04:34 -------
    Any updates ?

    ------- Comment #5 From Alan 2009-03-19 10:19:16 -------
    No but this is not a support facility, just a bug tracker and as a problem only you've reported and nobody else has duplicated its a very very low priority, especially as its only out of tree code seeing it

    ------- Comment #6 From amit jain 2009-03-19 21:04:12 -------
    It should not be very very low priotity, Its easy to reproduce. Its should be blocker bug, because anybody how will try to compile module dependent on other module, will fail. Strange, Why didn't nobody reply on it. I am really stuck, because of this problem.

    ------- Comment #7 From Tejun Heo 2009-03-19 21:48:40 -------
    cc'ing Rusty Russell.

    ------- Comment #8 From Roland Kletzing 2009-03-20 12:11:23 -------
    hello amit, i think alan is right. this seems a very specific, personal problem. anyway, if you think it`s a general kernel problem, please post an as simple as possible repro-case (i.e. some sourcecode) for your problem, so chances will raise that someone will look at your problem. i`d recommend bringing this up on kernel related mailing lists

    ------- Comment #9 From Rusty Russell 2009-03-22 23:00:51 -------
    This is true; a5dd69707424a35d2d2cc094e870f595ad61e916 changed this.
    The argument is that modversions is supposed to version symbols used in a module, and this doesn't really work for out-of-tree modules (unless you copy Module.symvers across, as done above). Otherwise module B doesn't know the version of symbol A; we changed such a missing version to fail; the user *did* want us to check versions. Currently you have to modprobe --force-modversion to load such a module. I think the new behavior is correct.

    ------- Comment #10 From Qihua Dai 2009-06-04 05:28:02 -------
    I also met the same problem on kernel 2.6.28.10. "modprobe --force-modversion" does work for me, it will report "invalid module format" Besides the solution of copy Module.symvers across, below solution can workaround it:
    In the makefile of "b", define KBUILD_EXTRA_SYMBOLS which points to Module.symvers in "a" Dir so that "b" will find Module.symvers of "a" e.g. KBUILD_EXTRA_SYMBOLS=/mod_a/Module.symvers During building "b", the search path of Module.symvers will be.
    Please correct me if my understanding is incorrect.
    1, kernel source path, e.g. /usr/src/kernels/linux-2.6.28.10
    2, the value of $(M) defined in makefile, which is the same as the value of KBUILD_EXTMOD
    3, the value of KBUILD_EXTRA_SYMBOLS
    I think it's a generic kernel problem. It's better to fix it. Below attached a simple case which is from below link:

    ------- Comment #11 From Qihua Dai 2009-06-04 05:31:40 -------
    A typo, it should be ""modprobe --force-modversion" does NOT work for me, it will report "invalid module format""

    ------- Comment #12 From Rusty Russell 2009-06-04 06:54:15 -------
    You can't modprobe --force if CONFIG_MODULE_FORCE_LOAD isn't set. That was added in 2.6.26, and is off by default. Hope that clarifies, Rusty.
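From the information above we can largely pin down what happened: a commit in the mainline kernel tree changed the symbol version check that runs when a module is loaded, so a module that uses a symbol whose CRC was not known at build time no longer passes the check_version function.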

(2) Information from the kernel source
Following the hint above, I located the commit in the mainline kernel tree:

    commit a5dd69707424a35d2d2cc094e870f595ad61e916
    Author: Rusty Russell
    Date: Fri May 9 16:24:21 2008 +1000

        module: be more picky about allowing missing module versions

        We allow missing __versions sections, because modprobe --force strips
        it. It makes less sense to allow sections where there's no version
        for a specific symbol the module uses, so disallow that.

        Signed-off-by: Rusty Russell
        Signed-off-by: Linus Torvalds

The change went in between 2.6.26-rc1 and 2.6.26-rc2:

    :~/linux$ git tag --contains a5dd69707
    v2.6.26
    v2.6.26-rc2
    v2.6.26-rc3
    v2.6.26-rc4
    v2.6.26-rc5
    v2.6.26-rc6
    v2.6.26-rc7
    v2.6.26-rc8
    v2.6.26-rc9
    v2.6.27
    v2.6.27-rc1
    ......

The commit message makes the intent clear: the kernel deliberately refuses to load a module that has a __versions section but lacks version information for some of the symbols it uses.
The diff of the change is:

    :~/linux$ git diff a5dd69707^..a5dd69707
    diff --git a/kernel/module.c b/kernel/module.c
    index 8e4528c..2584c0e 100644
    --- a/kernel/module.c
    +++ b/kernel/module.c
    @@ -917,6 +917,10 @@ static int check_version(Elf_Shdr *sechdrs,
         if (!crc)
             return 1;
     
    +    /* No versions at all? modprobe --force does this. */
    +    if (versindex == 0)
    +        return try_to_force_load(mod, symname) == 0;
    +
         versions = (void *) sechdrs[versindex].sh_addr;
         num_versions = sechdrs[versindex].sh_size
             / sizeof(struct modversion_info);
    @@ -932,8 +936,9 @@ static int check_version(Elf_Shdr *sechdrs,
             goto bad_version;
         }
     
    -    if (!try_to_force_load(mod, symname))
    -        return 1;
    +    printk(KERN_WARNING "%s: no symbol version for %s\n",
    +           mod->name, symname);
    +    return 0;
     
     bad_version:
         printk("%s: disagrees about version of symbol %s\n",
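What the change does:
(1) The first hunk allows a module with no __versions section at all to load, provided try_to_force_load() succeeds.
(2) The second hunk refuses to load a module that has a __versions section but lacks a version CRC for some symbol it uses.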
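The two hunks can be modelled in user space. The following is only a sketch of the control flow visible in the diff, not the kernel's code: struct modversion, check_version_model() and its force_load and pre_commit flags are names made up for this illustration. force_load stands in for try_to_force_load() returning 0 (that is, CONFIG_MODULE_FORCE_LOAD enabled), and pre_commit selects the behaviour of the parent of commit a5dd6970.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct modversion_info. */
struct modversion {
    unsigned long crc;
    const char *name;
};

/*
 * versions/num: the importing module's version table; num == 0 models a
 *               module with no __versions section at all.
 * crc:          CRC supplied by the exporter, or NULL if it supplied none.
 * Returns 1 if the symbol may be used, 0 if loading must fail.
 */
int check_version_model(const struct modversion *versions, size_t num,
                        const char *symname, const unsigned long *crc,
                        int force_load, int pre_commit)
{
    size_t i;

    assert(symname != NULL);

    /* The exporter supplied no CRC: there is nothing to compare. */
    if (crc == NULL)
        return 1;

    /* No version table at all: allowed only when forced loading works. */
    if (num == 0)
        return force_load;

    for (i = 0; i < num; i++) {
        if (strcmp(versions[i].name, symname) != 0)
            continue;
        if (versions[i].crc == *crc)
            return 1;
        fprintf(stderr, "model: disagrees about version of symbol %s\n", symname);
        return 0;
    }

    /* The table exists but has no entry for symname. */
    if (pre_commit) {
        if (force_load)
            return 1;   /* removed lines: try_to_force_load() succeeded */
        /* otherwise the old code fell through to the bad_version label */
        fprintf(stderr, "model: disagrees about version of symbol %s\n", symname);
        return 0;
    }

    /* After the commit a missing entry always fails. */
    fprintf(stderr, "model: no symbol version for %s\n", symname);
    return 0;
}
```

With a table that lacks exported_function_1, the model returns 0 after the commit whatever force_load is, which matches the insmod failure seen at the start of this article.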
Both hunks of the diff revolve around one function (kernel/module.c):

    static int try_to_force_load(struct module *mod, const char *reason)
    {
    #ifdef CONFIG_MODULE_FORCE_LOAD
        if (!test_taint(TAINT_FORCED_MODULE))
            printk(KERN_WARNING "%s: %s: kernel tainted.\n",
                   mod->name, reason);
        add_taint_module(mod, TAINT_FORCED_MODULE);
        return 0;
    #else
        return -ENOEXEC;
    #endif
    }
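Clearly, this function:
always returns 0 if CONFIG_MODULE_FORCE_LOAD is configured, and
always returns an error (non-zero) if CONFIG_MODULE_FORCE_LOAD is not configured.
So if the module is not rebuilt with the missing CRCs, a necessary condition for getting past the "no symbol version for" error is to enable CONFIG_MODULE_FORCE_LOAD:

    [*] Enable loadable module support --->
        --- Enable loadable module support
        [*] Forced module loading

However, it seems that even with CONFIG_MODULE_FORCE_LOAD enabled, using the --force option still ends in a generic error and the module still cannot be loaded. I will look into this later.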

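(3) How can the old check be restored?
Knowing the cause, we can restore the previous behaviour by modifying the source ourselves (purely as an experiment; I agree with the kernel's current behaviour). Only the following change is needed:

    diff --git a/kernel/module.c b/kernel/module.c
    index d190664..4e1e39a 100644
    --- a/kernel/module.c
    +++ b/kernel/module.c
    @@ -1001,6 +1001,9 @@ static int check_version(Elf_Shdr *sechdrs,
             goto bad_version;
         }
     
    +    if (!try_to_force_load(mod, symname))
    +        return 1;
    +
         printk(KERN_WARNING "%s: no symbol version for %s\n",
                mod->name, symname);
         return 0;

In effect, this patch puts back the old mechanism that allowed a module to load even when some of its symbols have no version CRC. Because it goes through try_to_force_load(), it only takes effect when CONFIG_MODULE_FORCE_LOAD is enabled.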
~~~~~~~~~~~~~~~~~~~~~~
Now let's run the experiment once more:
(1) Rebuild the caller module with KBUILD_EXTRA_SYMBOLS commented out in its Makefile:

    tekkaman@tekkaman-desktop:~/development/research/Linux_module/caller$ make
    ARCH=arm CROSS_COMPILE=arm-none-linux-gnueabi- make -C /media/6a55c5a3-f467-4b31-a56a-73b57c5cd2a2/development/linux-omap3 M=/home/tekkaman/development/research/Linux_module/caller modules
    make[1]: Entering directory `/media/6a55c5a3-f467-4b31-a56a-73b57c5cd2a2/development/linux-omap3'
    CC [M] /home/tekkaman/development/research/Linux_module/caller/caller.o
    Building modules, stage 2.
    MODPOST 1 modules
    WARNING: "exported_function_2" [/home/tekkaman/development/research/Linux_module/caller/caller.ko] undefined!
    WARNING: "exported_function_1" [/home/tekkaman/development/research/Linux_module/caller/caller.ko] undefined!
    CC /home/tekkaman/development/research/Linux_module/caller/caller.mod.o
    LD [M] /home/tekkaman/development/research/Linux_module/caller/caller.ko
    make[1]: Leaving directory `/media/6a55c5a3-f467-4b31-a56a-73b57c5cd2a2/development/linux-omap3'
(2) Boot the modified kernel (CONFIG_MODULE_FORCE_LOAD enabled and the patch above applied) and load the modules:

    root@dm816x-evm:/# insmod exporter_1.ko
    Hello, Tekkaman Ninja !
    exported_function_1 is online!
    root@dm816x-evm:/# insmod exporter_2.ko
    Hello, Tekkaman Ninja !
    exported_function_2 is online!
    root@dm816x-evm:/# insmod caller.ko
    caller: exported_function_2: kernel tainted.
    Disabling lock debugging due to kernel taint
    Hello, Tekkaman Ninja !
    Now call exporters's function!
    I'm exported_function_1 !(in /home/tekkaman/development/research/Linux_module/exporter_1/exporter_1.c)
    I'm exported_function_2 !(in /home/tekkaman/development/research/Linux_module/exporter_2/exporter_2.c)
    root@dm816x-evm:/# dmesg
    ......
    Hello, Tekkaman Ninja !
    exported_function_1 is online!
    Hello, Tekkaman Ninja !
    exported_function_2 is online!
    caller: exported_function_2: kernel tainted.
    Disabling lock debugging due to kernel taint
    Hello, Tekkaman Ninja !
    Now call exporters's function!
    I'm exported_function_1 !(in /home/tekkaman/development/research/Linux_module/exporter_1/exporter_1.c)
    I'm exported_function_2 !(in /home/tekkaman/development/research/Linux_module/exporter_2/exporter_2.c)
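The module did load, but dmesg shows a message saying that the kernel has been "tainted".

But why do we see a message for only one function when two functions "tainted" the kernel? If you trace try_to_force_load carefully, you will find that a flag is set the first time the kernel is tainted. When the function is entered a second time it sees the flag and does not print the taint message again. The kernel is already tainted, so one more time makes no difference.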
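The print-once behaviour is easy to reproduce outside the kernel. The sketch below is a user-space model of the CONFIG_MODULE_FORCE_LOAD branch of try_to_force_load(); the bit value, the static variables and the helper taint_messages_printed() are assumptions made for this illustration and do not exist in the kernel.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical bit value; the kernel defines its own taint flags. */
#define TAINT_FORCED_MODULE_BIT 1u

static unsigned int tainted_mask;   /* stands in for the kernel-wide taint state */
static int messages_printed;        /* how many "kernel tainted" lines were printed */

/* Model of try_to_force_load() when forced loading is enabled. */
int try_to_force_load_model(const char *modname, const char *reason)
{
    assert(modname != NULL && reason != NULL);

    /* Only a call that finds the flag still clear prints the message. */
    if (!(tainted_mask & TAINT_FORCED_MODULE_BIT)) {
        printf("%s: %s: kernel tainted.\n", modname, reason);
        messages_printed++;
    }
    tainted_mask |= TAINT_FORCED_MODULE_BIT;   /* later calls see the flag */
    return 0;
}

/* Helper for inspecting the model from a test or a small main(). */
int taint_messages_printed(void)
{
    return messages_printed;
}
```

Calling try_to_force_load_model("caller", "exported_function_2") and then try_to_force_load_model("caller", "exported_function_1") prints a single line, for exported_function_2 only, just like the dmesg output above.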