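Introduction
The product team reported that on a virtual machine configured with 8 GB of memory, a version upgrade failed because the version file did not pass data verification.
With 4 GB of memory the problem did not occur. The host is a Kunpeng server with a Kunpeng-920 processor.
With the same VM configuration, guests running other Ubuntu or CentOS system images never showed the problem.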
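Investigation
After a failed upgrade on the VM, I computed the MD5 of the version file: it came out different on every run.
Right after dropping the page cache (drop_caches), the MD5 was correct; every later computation was wrong, and the value kept changing.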
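A side note on the drop_caches step: writing 3 to /proc/sys/vm/drop_caches drops cached pages for the whole system. A narrower alternative (not what was used here) is to ask the kernel to evict only the version file's pages with posix_fadvise(POSIX_FADV_DONTNEED). The sketch below assumes a Linux host and uses a hypothetical helper name; the advice is best-effort, and pages that are dirty or mapped may stay in the cache.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* for posix_fadvise() */
#endif
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: drop the cached pages of one file so the next
 * read() reloads them from storage. Clean, unmapped pages (such as
 * corrupted-but-never-dirtied ones) are evicted.
 * Returns 0 on success, -1 on failure.
 */
static int drop_file_cache(const char *path)
{
    int fd = open(path, O_RDONLY);
    int rc;

    if (fd < 0)
        return -1;
    /* offset 0, len 0 means "to the end of the file" */
    rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    close(fd);
    return rc == 0 ? 0 : -1;
}
```

For example, calling drop_file_cache("/mnt/flash/ipe") before recomputing the MD5 should give the correct value again if the copy on flash is intact.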
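Each time the MD5 went wrong, I exported several copies of the version file with cp and compared them with a known-good file.
There were always two places, at fixed offsets, where a few bytes differed. It did not look like single-bit flips, and the changed values followed no pattern.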
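The copy-and-compare step is easy to automate. Below is a minimal sketch (the struct and function names are made up for illustration) that reports each run of differing bytes between a good copy and a suspect copy already loaded into memory.

```c
#include <stddef.h>

/* A run of consecutive differing bytes: [off, off + len). */
struct diff_range {
    size_t off;
    size_t len;
};

/*
 * Compare a good buffer with a suspect one of the same length n.
 * Stores up to max runs in out and returns the total number of runs
 * found (which may exceed max; only the first max are stored).
 */
static size_t find_diff_ranges(const unsigned char *good,
                               const unsigned char *bad, size_t n,
                               struct diff_range *out, size_t max)
{
    size_t runs = 0;
    size_t i = 0;

    while (i < n) {
        if (good[i] == bad[i]) {
            i++;
            continue;
        }
        size_t start = i;
        while (i < n && good[i] != bad[i])
            i++;
        if (runs < max) {
            out[runs].off = start;
            out[runs].len = i - start;
        }
        runs++;
    }
    return runs;
}
```

Taking each reported offset modulo 4096 gives the offset inside the 4 KiB page, which is what to look for later in the page dumps.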
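The version file on flash was unchanged, because recomputing the MD5 after clearing the cache gave the correct value again.
drop_caches only discards clean pages, so the corrupted pages were never marked dirty: the bytes were changed behind the page cache's back rather than through a normal file write. It is also unlikely that the same page was allocated to two different processes at once.
So I did not suspect the pagecache mechanism or the memory allocator.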
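Testing showed, somewhat by chance, that the problem was related to telnet connections. Without telnet configured, running md5 ipe on the serial console always gave the same value;
merely opening and closing one telnet session could change the md5 ipe value.
With a telnet session open but no commands typed, the MD5 stayed stable for a few dozen seconds and then changed, which suggested a link to network keepalive traffic.
The kernel in the VM image does not support hardware breakpoints, and a KASAN-enabled build did not report anything either.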
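Finding the modified pages
First, find out where exactly the data in the page cache was being changed.
I wrote a test module that walks all of the file's pages in the page cache and computes the MD5 of each page's data,
then writes the pfn, virtual address, physical address and MD5 to the trace buffer.
Each time the file's MD5 changed, I ran insmod and then rmmod on the module, and compared the resulting records.
The MD5 of two pages kept changing; the MD5 of every other page stayed the same.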
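Comparing trace records by hand gets tedious. A hypothetical helper like the one below pulls the pfn and md5 fields out of two trace lines and reports whether the same page changed between runs. It assumes only that each line contains the "pfn <n>" and "md5 <32 hex digits>" fields written by the module's trace_printk(); the trace prefix in front of them is ignored.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one trace line such as
 *   "... print_page_md5: 5: pfn 2238378, va 0x..., pa 0x..., md5 <32 hex>"
 * Extracts only the pfn and md5 fields. Returns 1 on success, 0 otherwise.
 */
static int parse_md5_record(const char *line, unsigned long *pfn, char md5[33])
{
    const char *p = strstr(line, "pfn ");
    const char *m = strstr(line, "md5 ");

    if (!p || !m)
        return 0;
    if (sscanf(p, "pfn %lu", pfn) != 1)
        return 0;
    if (sscanf(m, "md5 %32s", md5) != 1)
        return 0;
    return strlen(md5) == 32;
}

/* Return 1 if both records describe the same pfn but with different MD5s. */
static int page_changed(const char *before, const char *after)
{
    unsigned long pfn_a, pfn_b;
    char md5_a[33], md5_b[33];

    if (!parse_md5_record(before, &pfn_a, md5_a) ||
        !parse_md5_record(after, &pfn_b, md5_b))
        return 0;
    return pfn_a == pfn_b && strcmp(md5_a, md5_b) != 0;
}
```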

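Dumping page data
I wrote another test module that prints a page's data given its pfn, and compared several dumps of these two pages.
Sure enough, one or two bytes at fixed offsets within these two pages kept changing.
In the captured dump, the page is almost entirely zero; only the two consecutive bytes at offset 0x6a6 are non-zero.
That is because the system had just booted with only one telnet session open, so there was little in the page cache.
This made telnet the prime suspect.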

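Write-protecting the pages
I had previously seen writes to read-only pages trigger a permission fault.
Since the affected pfns were fixed, the idea was to use the same mechanism to catch the writer in the act.
I wrote a test module that finds the PTE mapping a given pfn in the kernel linear map and makes it read-only.
Kernel code on arm64 reaches such a page through its linear-map address, so once telnet commands generate network traffic, any write into these two pagecache pages should fault.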
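On this arm64 5.4 kernel (4 KiB pages, 48-bit virtual addresses) the linear map starts at PAGE_OFFSET = 0xffff000000000000, and a linear-map address is (PA - PHYS_OFFSET) | PAGE_OFFSET. The pfn/address pairs in the log below imply PHYS_OFFSET = 0x40000000 for this VM, the usual RAM base of a QEMU virt machine; that constant is an inference for this boot, not a value read from the kernel. A userspace sketch of the arithmetic, including the reverse mapping from a faulting address back to a pfn:

```c
#include <stdint.h>

/*
 * Assumed constants for this VM boot (inferred from the log, not read
 * from the kernel): 4 KiB pages, arm64 5.4 48-bit VA layout with the
 * linear map at PAGE_OFFSET, RAM starting at PA 0x40000000.
 */
#define PAGE_SHIFT_4K  12
#define PAGE_OFFSET_48 0xffff000000000000ULL
#define PHYS_OFFSET_VM 0x40000000ULL

/* pfn -> physical address -> kernel linear-map virtual address */
static uint64_t pfn_to_linear_va(uint64_t pfn)
{
    uint64_t pa = pfn << PAGE_SHIFT_4K;

    return (pa - PHYS_OFFSET_VM) | PAGE_OFFSET_48;
}

/* linear-map virtual address -> pfn; va & 0xfff is the in-page offset */
static uint64_t linear_va_to_pfn(uint64_t va)
{
    uint64_t pa = (va & ~PAGE_OFFSET_48) + PHYS_OFFSET_VM;

    return pa >> PAGE_SHIFT_4K;
}
```

With these constants, pfn 2238333 maps to 0xffff0001e277d000, the page address the module prints for it.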
I loaded the module twice, with insmod pgrdonly.ko pfn=2238333 and insmod pgrdonly.ko pfn=2238378 (unloading it in between); the output was:
<3>[ 765.828961] [pg_init 152] pc, init_mm addr 0xffff800028df1ae8
<3>[ 765.829906] [set_page_rdonly 104] pc, pfn 2238333
<3>[ 765.830630] [set_page_rdonly 111] pc, page 0xfffffc00094c5710
<3>[ 765.831526] [set_page_rdonly 114] pc, page address 0xffff0001e277d000
<3>[ 765.832539] [get_pte 42] pc, init_mm 0xffff800028df1ae8, page addr 0xffff0001e277d000
<3>[ 765.833791] [ffff0001e277d000] pgd=000000023fff8003
<3>[ 765.834489] pud=000000023f044003
<3>[ 765.834991] pmd=000000023ef30003
<3>[ 765.835457] pte=006800022277d713
<3>[ 765.835993] [set_page_rdonly 117] pc, ptep 0xffff0001fef30be8
<3>[ 765.836878] [set_pte_rdonly 84] pc, pte value 0x6800022277d713 before set rdonly
<3>[ 765.838016] [set_pte_rdonly 86] pc, pte value 0xe000022277d793 construct wrprotect
<3>[ 765.839166] [set_pte_rdonly 88] pc, pte value 0xe000022277d793 after set
<3>[ 781.939330] Exiting set page readlony module
<3>[ 784.383368] [pg_init 152] pc, init_mm addr 0xffff800028df1ae8
<3>[ 784.384348] [set_page_rdonly 104] pc, pfn 2238378
<3>[ 784.385135] [set_page_rdonly 111] pc, page 0xfffffc00094c6520
<3>[ 784.386075] [set_page_rdonly 114] pc, page address 0xffff0001e27aa000
<3>[ 784.387110] [get_pte 42] pc, init_mm 0xffff800028df1ae8, page addr 0xffff0001e27aa000
<3>[ 784.388377] [ffff0001e27aa000] pgd=000000023fff8003
<3>[ 784.389155] pud=000000023f044003
<3>[ 784.389657] pmd=000000023ef30003
<3>[ 784.390169] pte=00680002227aa713
<3>[ 784.390676] [set_page_rdonly 117] pc, ptep 0xffff0001fef30d50
<3>[ 784.391588] [set_pte_rdonly 84] pc, pte value 0x680002227aa713 before set rdonly
<3>[ 784.392805] [set_pte_rdonly 86] pc, pte value 0xe00002227aa793 construct wrprotect
<3>[ 784.394074] [set_pte_rdonly 88] pc, pte value 0xe00002227aa793 after set
<3>[ 787.310472] Exiting set page readlony module
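The before/after PTE values in this log can be decoded by hand. In 5.4, arm64's pte_wrprotect() sets the software PTE_DIRTY bit (55) if the entry is hardware-dirty (DBM set, PTE_RDONLY clear), then clears PTE_WRITE (the DBM bit, 51) and sets PTE_RDONLY (AP[2], bit 7). The userspace model below borrows the kernel's bit names and reproduces both transitions in the log, 0x006800022277d713 → 0x00e000022277d793 and 0x00680002227aa713 → 0x00e00002227aa793. The write check only covers the unambiguous case (RDONLY set, DBM clear), because with hardware DBM enabled an entry with both bits set is writable-clean rather than read-only.

```c
#include <stdint.h>

/* arm64 stage-1 page descriptor bits, named as in Linux 5.4 headers. */
#define PTE_VALID     (1ULL << 0)
#define PTE_RDONLY    (1ULL << 7)   /* AP[2]: read-only when set */
#define PTE_AF        (1ULL << 10)  /* access flag */
#define PTE_DBM       (1ULL << 51)  /* hw dirty bit management, Linux PTE_WRITE */
#define PTE_PXN       (1ULL << 53)
#define PTE_UXN       (1ULL << 54)
#define PTE_DIRTY     (1ULL << 55)  /* software dirty bit */
#define PTE_ADDR_MASK 0x0000fffffffff000ULL  /* output address, bits [47:12] */

/* 1 if a kernel (EL1) write through this entry must take a permission fault. */
static int pte_blocks_kernel_write(uint64_t pte)
{
    if (!(pte & PTE_VALID))
        return 1;
    return (pte & PTE_RDONLY) && !(pte & PTE_DBM);
}

/* Userspace model of pte_wrprotect() as described above. */
static uint64_t model_wrprotect(uint64_t pte)
{
    if ((pte & PTE_DBM) && !(pte & PTE_RDONLY))
        pte |= PTE_DIRTY;   /* remember hardware-dirty state */
    pte &= ~PTE_DBM;        /* no more hardware write permission */
    pte |= PTE_RDONLY;      /* AP[2] = 1 */
    return pte;
}
```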
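Then I ran a few arbitrary commands over telnet, and the system immediately hit a permission fault.
The faulting address 0xffff0001e27aa6a6 lies inside the page with pfn = 2238378,
and its in-page offset 0x6a6 matches the offset seen in the page-dump comparison. The call trace shows the wan module's protocol stack code (ip_newid(), called from tcp_output()) writing out of bounds. The faulting instruction 79000001 decodes to strh w1, [x0]: a 2-byte store of x1 = 0xf07e to x0 = 0xffff0001e27aa6a6, which matches the two consecutive bytes that kept changing.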
<1>[ 796.021377] Unable to handle kernel write to read-only memory at virtual address ffff0001e27aa6a6
<1>[ 796.022942] Mem abort info:
<1>[ 796.023373] ESR = 0x9600004f
<1>[ 796.023912] EC = 0x25: DABT (current EL), IL = 32 bits
<1>[ 796.024764] SET = 0, FnV = 0
<1>[ 796.025232] EA = 0, S1PTW = 0
<1>[ 796.025737] Data abort info:
<1>[ 796.026185] ISV = 0, ISS = 0x0000004f
<1>[ 796.026812] CM = 0, WnR = 1
<1>[ 796.027278] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000040a66000
<1>[ 796.028376] [ffff0001e27aa6a6] pgd=000000023fff8003, pud=000000023f044003, pmd=000000023ef30003, pte=00e00002227aa793
<0>[ 796.030145] Internal error: Oops: 9600004f [#1] SMP
<4>[ 796.030936] Modules linked in: wan(PO) vsr(O) [last unloaded: pgrdonly]
<4>[ 796.032370] CPU: 1 PID: 1993 Comm: kdrvfwdd1 Tainted: P O 5.4.90 #1
<4>[ 796.033595] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
<4>[ 796.034707] pstate: 60400005 (nZCv daif +PAN -UAO)
<4>[ 796.037144] pc : ip_newid+0x4c/0x70 [wan]
<4>[ 796.039317] lr : ip_newid+0x1c/0x70 [wan]
<4>[ 796.040064] sp : ffff00019f6ce420
<4>[ 796.040677] x29: ffff00019f6ce420 x28: 0000000000000000
<4>[ 796.041619] x27: ffff000194cf43c8 x26: ffff800028963348
<4>[ 796.042483] x25: ffff800008beae2c x24: ffff0001c0bb82d0
<4>[ 796.043351] x23: ffff0001c00ef8f0 x22: ffff800028ea0858
<4>[ 796.044209] x21: ffff000194238000 x20: ffff000194cf4380
<4>[ 796.045114] x19: 0000000000000048 x18: ffff800028a04b18
<4>[ 796.045984] x17: 0000000000000001 x16: ffff800028770420
<4>[ 796.046886] x15: 0000000010000000 x14: ffff800029973000
<4>[ 796.047754] x13: 0000000000000001 x12: 00000000ffffffff
<4>[ 796.048645] x11: ffff800041f01000 x10: 0000000000000a00
<4>[ 796.049515] x9 : ffff80002819c458 x8 : ffff00019f6ce4d0
<4>[ 796.050367] x7 : 0000000000000000 x6 : 0000000000000001
<4>[ 796.051256] x5 : 0000000000000000 x4 : ffff000194238000
<4>[ 796.052143] x3 : ffff800028d66000 x2 : 0000000000000005
<4>[ 796.053014] x1 : 000000000000f07e x0 : ffff0001e27aa6a6
<4>[ 796.053891] Call trace:
<4>[ 796.055566] ip_newid+0x4c/0x70 [wan]
<4>[ 796.057519] tcp_output+0x26c4/0x2d00 [wan]
<4>[ 796.059676] tcp_do_segment+0xfd0/0x43e0 [wan]
<4>[ 796.061788] tcp_input+0x277c/0x2ba0 [wan]
<4>[ 796.063818] ip_protoinput+0xa8/0xd0 [wan]
...... // some business-specific stack frames omitted
<4>[ 796.099035] kthread+0x12c/0x130
<4>[ 796.099546] ret_from_fork+0x10/0x18
<0>[ 796.100167] Code: 79400000 0b000020 12003c01 f9400fe0 (79000001)
<4>[ 796.101154] task_switch hooks:
[1]kdb>
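The "Mem abort info" lines are a decoding of ESR = 0x9600004f. The sketch below extracts the same fields, with positions taken from the Arm architecture's ESR_ELx data-abort syndrome layout; the struct and function names are illustrative. DFSC 0x0f is a permission fault at level 3, i.e. at the PTE level, which is exactly the protection the pgrdonly module installed.

```c
#include <stdint.h>

/* Selected fields of an arm64 ESR_ELx value for a data abort. */
struct dabt_info {
    unsigned ec;    /* exception class, bits [31:26]; 0x25 = DABT, current EL */
    unsigned il;    /* instruction length, bit 25: 1 = 32-bit instruction */
    unsigned isv;   /* ISS bit 24: access size/register syndrome valid */
    unsigned wnr;   /* ISS bit 6: 1 = write, 0 = read */
    unsigned dfsc;  /* ISS bits [5:0]: data fault status code */
};

static struct dabt_info decode_esr(uint32_t esr)
{
    struct dabt_info d;

    d.ec = (esr >> 26) & 0x3f;
    d.il = (esr >> 25) & 0x1;
    d.isv = (esr >> 24) & 0x1;
    d.wnr = (esr >> 6) & 0x1;
    d.dfsc = esr & 0x3f;
    return d;
}

/* DFSC 0b0011LL is a permission fault at translation level LL. */
static int is_permission_fault(unsigned dfsc, unsigned *level)
{
    if ((dfsc & 0x3c) != 0x0c)
        return 0;
    *level = dfsc & 0x3;
    return 1;
}
```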
Module code
md5sum pagecache pages
#include <linux/module.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/err.h>
#include <linux/fs.h>
#include <linux/pagemap.h>
#include <linux/highmem.h>
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/sched.h>
#include <crypto/hash.h>

#define MD5_DIGEST_LENGTH 16

/*
 * Compute the MD5 of data[0..len) and store it in digest as a
 * NUL-terminated hex string. Returns 0 on success.
 */
static int calculate_md5(const void *data, unsigned int len,
			 char digest[MD5_DIGEST_LENGTH * 2 + 1])
{
	struct crypto_shash *tfm;
	struct shash_desc *desc;
	u8 hash[MD5_DIGEST_LENGTH];
	int ret;

	tfm = crypto_alloc_shash("md5", 0, 0);
	if (IS_ERR(tfm)) {
		pr_err("Failed to allocate transform for MD5\n");
		return PTR_ERR(tfm);
	}
	desc = kmalloc(sizeof(*desc) + crypto_shash_descsize(tfm), GFP_KERNEL);
	if (!desc) {
		pr_err("Failed to allocate shash descriptor for MD5\n");
		crypto_free_shash(tfm);
		return -ENOMEM;
	}
	desc->tfm = tfm;
	/* init + update + final in a single call */
	ret = crypto_shash_digest(desc, data, len, hash);
	if (ret)
		pr_err("Failed to compute MD5 hash: %d\n", ret);
	else
		*bin2hex(digest, hash, MD5_DIGEST_LENGTH) = '\0';
	kfree(desc);
	crypto_free_shash(tfm);
	return ret;
}

/*
 * Walk every page index of the file and, for each page present in the
 * page cache, trace its pfn, kernel virtual address, physical address
 * and MD5. Pages that are not cached are skipped instead of ending the walk.
 */
static int print_page_md5(const char *filename)
{
	struct file *file;
	struct address_space *mapping;
	struct page *page;
	pgoff_t index, nr_pages;
	char digest[MD5_DIGEST_LENGTH * 2 + 1] = {0}; /* MD5 as a hex string */

	file = filp_open(filename, O_RDONLY, 0);
	if (IS_ERR(file)) {
		pr_err("Failed to open file: %s\n", filename);
		return PTR_ERR(file);
	}
	mapping = file_inode(file)->i_mapping;
	nr_pages = DIV_ROUND_UP(i_size_read(file_inode(file)), PAGE_SIZE);

	for (index = 0; index < nr_pages; index++) {
		void *data;

		page = find_get_page(mapping, index);	/* takes a reference */
		if (!page)
			continue;
		data = kmap(page);	/* cannot fail for a pagecache page */
		if (!calculate_md5(data, PAGE_SIZE, digest))
			trace_printk("%lu: pfn %lu, va 0x%lx, pa 0x%llx, md5 %s\n",
				     index, page_to_pfn(page), (unsigned long)data,
				     (unsigned long long)page_to_phys(page), digest);
		kunmap(page);
		put_page(page);		/* drop the find_get_page() reference */
		cond_resched();
	}
	filp_close(file, NULL);
	return 0;
}

static int __init pg_init(void)
{
	const char *filename = "/mnt/flash/ipe";

	pr_err("[%s %d] pc, filename %s\n", __func__, __LINE__, filename);
	/* all work is done at load time; results go to the trace buffer */
	print_page_md5(filename);
	return 0;
}

static void __exit pg_exit(void)
{
	pr_info("Exiting pagecache_md5 module\n");
}

module_init(pg_init);
module_exit(pg_exit);
/* required: crypto_alloc_shash() is a GPL-only export */
MODULE_LICENSE("GPL");
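A userspace counterpart to this module, sketched below, reads the file page by page and fingerprints each 4 KiB chunk. A read() of a cached file is served from the page cache, so it sees the same corrupted bytes, although it cannot report pfns. To keep the sketch short it swaps MD5 for 64-bit FNV-1a, which is enough to spot which page changed but is not a cryptographic hash; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define FP_PAGE_SIZE 4096

/* 64-bit FNV-1a: a cheap stand-in for MD5 when only change detection matters. */
static uint64_t fnv1a64(const unsigned char *p, size_t n)
{
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV offset basis */

    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;            /* FNV prime */
    }
    return h;
}

/*
 * Fingerprint a file one 4 KiB page at a time. Stores up to max values
 * in out and returns the number of pages read (which may exceed max),
 * or -1 on error.
 */
static long fingerprint_file(const char *path, uint64_t *out, size_t max)
{
    unsigned char buf[FP_PAGE_SIZE];
    FILE *f = fopen(path, "rb");
    long pages = 0;
    size_t n;

    if (!f)
        return -1;
    /* fread() returns full pages until the final, possibly short, one */
    while ((n = fread(buf, 1, sizeof(buf), f)) > 0) {
        if ((size_t)pages < max)
            out[pages] = fnv1a64(buf, n);
        pages++;
    }
    if (ferror(f))
        pages = -1;
    fclose(f);
    return pages;
}
```

Running it twice and comparing the arrays shows which page indexes changed, the same information the trace records give, minus the pfns.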
dump page content
#include <linux/module.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/highmem.h>
#include <linux/string.h>
#include <linux/printk.h>

/* Snapshot of the page, so the hex dump is not torn by a concurrent write. */
static unsigned char cont[PAGE_SIZE];

/* Page frame number to dump, passed as pfn=<n> on insmod. */
static unsigned long pfn;
module_param(pfn, ulong, S_IRUGO);
MODULE_PARM_DESC(pfn, "page frame number to dump");

static void print_page(unsigned long pfn)
{
	struct page *page;
	unsigned char *data;

	if (!pfn_valid(pfn)) {
		pr_err("pfn %lu is not a valid page frame\n", pfn);
		return;
	}
	/*
	 * pfn_to_page() does not take a reference, so there must be no
	 * put_page() here: dropping a reference we never took would
	 * corrupt the page's refcount.
	 */
	page = pfn_to_page(pfn);
	data = kmap(page);
	memcpy(cont, data, PAGE_SIZE);
	pr_err("pfn %lu, va 0x%lx, pa 0x%llx\n", pfn, (unsigned long)data,
	       (unsigned long long)page_to_phys(page));
	pr_err("------------------------------------------\n");
	print_hex_dump(KERN_ERR, "", DUMP_PREFIX_OFFSET, 16, 1, cont, PAGE_SIZE, true);
	pr_err("------------------------------------------\n");
	kunmap(page);
}

static int __init pc_init(void)
{
	pr_err("[%s %d] pc, hello print page content pfn %lu\n", __func__, __LINE__, pfn);
	print_page(pfn);
	return 0;
}

static void __exit pc_exit(void)
{
	pr_info("Exiting print page content module\n");
}

module_init(pc_init);
module_exit(pc_exit);
MODULE_LICENSE("GPL");
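To compare several dumps from this module mechanically, each row printed by print_hex_dump() with DUMP_PREFIX_OFFSET, 16 bytes per row and group size 1 can be parsed back into an offset and its bytes. The parser below is a sketch that assumes exactly that layout (an 8-digit hex offset and a colon, then space-separated byte pairs, then the ASCII column) and skips any dmesg prefix such as "<3>[ 801.1] " in front of it; the function name is hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Parse one print_hex_dump() row, e.g.
 *   "<3>[  801.1] 000006a0: 00 00 00 00 00 00 7e f0 ...  ......~........."
 * Stores the row offset and up to 16 bytes; returns the number of bytes
 * parsed, or 0 if the line is not a hex-dump row.
 */
static int parse_hexdump_row(const char *line, unsigned long *off,
                             unsigned char bytes[16])
{
    const char *p = line;
    int count = 0;

    /* find "<8 hex digits>:" which marks the start of the row */
    for (; *p; p++) {
        int k = 0;

        while (k < 8 && isxdigit((unsigned char)p[k]))
            k++;
        if (k == 8 && p[8] == ':')
            break;
    }
    if (!*p)
        return 0;

    *off = strtoul(p, NULL, 16);   /* stops at the ':' */
    p += 9;                        /* skip "xxxxxxxx:" */

    /* each byte is " xx"; two spaces mean the ASCII column has started */
    while (count < 16) {
        if (p[0] != ' ' || !isxdigit((unsigned char)p[1]) ||
            !isxdigit((unsigned char)p[2]) ||
            (p[3] != ' ' && p[3] != '\0'))
            break;
        char hex[3] = { p[1], p[2], '\0' };
        bytes[count++] = (unsigned char)strtoul(hex, NULL, 16);
        p += 3;
    }
    return count;
}
```

Feeding two dumps of the same pfn through this parser and comparing rows with equal offsets isolates the bytes that changed.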
set page read-only
#include <linux/module.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/string.h>
#include <linux/mm.h>
#include <linux/gfp.h>
#include <linux/kallsyms.h>
#include <asm/pgtable.h>
#include <asm/tlbflush.h>

static unsigned long pfn;
module_param(pfn, ulong, S_IRUGO);
MODULE_PARM_DESC(pfn, "pfn whose kernel linear-map PTE is made read-only");

/* init_mm is not exported, so it is looked up through kallsyms (5.4 kernel). */
static struct mm_struct *pmm;

/*
 * Walk the kernel page tables for a linear-map address and return its
 * PTE slot, or NULL if the address is not mapped at PTE granularity.
 */
static pte_t *get_pte(unsigned long addr)
{
	pgd_t *pgdp, pgd;
	pud_t *pudp, pud;
	pmd_t *pmdp, pmd;
	pte_t *ptep, pte;

	pr_err("[%s %d] pc, init_mm 0x%lx, page addr 0x%lx\n",
	       __func__, __LINE__, (unsigned long)pmm, addr);
	pgdp = pgd_offset(pmm, addr);
	pgd = READ_ONCE(*pgdp);
	pr_err("[%016lx] pgd=%016llx\n", addr, pgd_val(pgd));
	if (pgd_none(pgd) || pgd_bad(pgd)) {
		pr_err("[%s %d] pc, pgd invalid\n", __func__, __LINE__);
		return NULL;
	}
	pudp = pud_offset(pgdp, addr);
	pud = READ_ONCE(*pudp);
	pr_err("pud=%016llx\n", pud_val(pud));
	if (pud_none(pud) || pud_bad(pud)) {
		pr_err("[%s %d] pc, pud invalid\n", __func__, __LINE__);
		return NULL;
	}
	pmdp = pmd_offset(pudp, addr);
	pmd = READ_ONCE(*pmdp);
	pr_err("pmd=%016llx\n", pmd_val(pmd));
	/* on arm64 pmd_bad() is also true for a 2 MiB block mapping: no PTE */
	if (pmd_none(pmd) || pmd_bad(pmd)) {
		pr_err("[%s %d] pc, pmd invalid\n", __func__, __LINE__);
		return NULL;
	}
	/* kernel page tables: pte_offset_kernel(), no pte_unmap() needed */
	ptep = pte_offset_kernel(pmdp, addr);
	pte = READ_ONCE(*ptep);
	if (!pte_valid(pte)) {
		pr_err("[%s %d] pc, pte invalid\n", __func__, __LINE__);
		return NULL;
	}
	pr_err("pte=%016llx\n", pte_val(pte));
	return ptep;
}

static void set_pte_rdonly(pte_t *ptep, unsigned long address)
{
	pte_t pte = READ_ONCE(*ptep);

	pr_err("[%s %d] pc, pte value 0x%lx before set rdonly\n", __func__, __LINE__, (unsigned long)pte_val(pte));
	/* clears PTE_WRITE (DBM) and sets PTE_RDONLY */
	pte = pte_wrprotect(pte);
	pr_err("[%s %d] pc, pte value 0x%lx construct wrprotect\n", __func__, __LINE__, (unsigned long)pte_val(pte));
	set_pte_at(pmm, address, ptep, pte);
	pr_err("[%s %d] pc, pte value 0x%lx after set\n", __func__, __LINE__, (unsigned long)pte_val(READ_ONCE(*ptep)));
	/* drop any stale writable TLB entries on all CPUs */
	flush_tlb_all();
}

/* Self-test helper: writing to a write-protected page should oops. */
static void __maybe_unused write_rdonly_page(unsigned long addr)
{
	memset((void *)addr, 0x0, PAGE_SIZE);
}

static void set_page_rdonly(unsigned long pfn)
{
	struct page *page;
	unsigned long address;
	pte_t *ptep;

	pr_err("[%s %d] pc, pfn %lu\n", __func__, __LINE__, pfn);
	/* pfn_to_page() never returns NULL; validate the pfn instead */
	if (!pfn_valid(pfn)) {
		pr_err("[%s %d] pc, get pfn %lu page failure\n", __func__, __LINE__, pfn);
		return;
	}
	page = pfn_to_page(pfn);
	pr_err("[%s %d] pc, page 0x%lx\n", __func__, __LINE__, (unsigned long)page);
	/* the page's address in the kernel linear map */
	address = (unsigned long)page_address(page);
	pr_err("[%s %d] pc, page address 0x%lx\n", __func__, __LINE__, address);
	ptep = get_pte(address);
	pr_err("[%s %d] pc, ptep 0x%lx\n", __func__, __LINE__, (unsigned long)ptep);
	if (ptep)
		set_pte_rdonly(ptep, address);
}

/* Self-test helper: write-protect a freshly allocated page (never freed). */
static void __maybe_unused set_alloc_page_rdonly(void)
{
	struct page *page;
	unsigned long address;
	pte_t *ptep;

	page = alloc_page(GFP_KERNEL);
	if (!page) {
		pr_err("[%s %d] pc, alloc_page failure\n", __func__, __LINE__);
		return;
	}
	pr_err("[%s %d] pc, page 0x%lx\n", __func__, __LINE__, (unsigned long)page);
	address = (unsigned long)page_address(page);
	pr_err("[%s %d] pc, page address 0x%lx\n", __func__, __LINE__, address);
	ptep = get_pte(address);
	pr_err("[%s %d] pc, ptep 0x%lx\n", __func__, __LINE__, (unsigned long)ptep);
	if (ptep)
		set_pte_rdonly(ptep, address);
}

static int __init pg_init(void)
{
	pmm = (struct mm_struct *)kallsyms_lookup_name("init_mm");
	if (!pmm) {
		pr_err("[%s %d] pc, find init_mm failure\n", __func__, __LINE__);
		return -ENOENT;
	}
	pr_err("[%s %d] pc, init_mm addr 0x%lx\n", __func__, __LINE__, (unsigned long)pmm);
	set_page_rdonly(pfn);
	/* set_alloc_page_rdonly(); */
	return 0;
}

/*
 * The PTE is not restored on unload, so the page stays read-only after
 * rmmod; the oops above was raised after pgrdonly had been unloaded.
 */
static void __exit pg_exit(void)
{
	pr_err("Exiting set page readlony module\n");
}

module_init(pg_init);
module_exit(pg_exit);
/* required: kallsyms_lookup_name() is a GPL-only export in 5.4 */
MODULE_LICENSE("GPL");
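The ptep values printed by get_pte() can be checked by hand. With 4 KiB pages and 48-bit VAs, each of the four levels indexes a 512-entry table with 9 address bits, and a table-type pmd holds the physical address of the PTE table in bits [47:12]. The sketch below assumes PHYS_OFFSET = 0x40000000 and PAGE_OFFSET = 0xffff000000000000, inferred from the pfn/address pairs in the log rather than read from the kernel; the function names are illustrative. With pmd=000000023ef30003, the page address 0xffff0001e277d000 gives ptep 0xffff0001fef30be8, and 0xffff0001e27aa000 gives 0xffff0001fef30d50, both as in the log.

```c
#include <stdint.h>

/* Assumed for this VM: 4 KiB pages, 48-bit VA, arm64 5.4 layout. */
#define VM_PAGE_OFFSET  0xffff000000000000ULL
#define VM_PHYS_OFFSET  0x40000000ULL
#define TABLE_ADDR_MASK 0x0000fffffffff000ULL   /* next-level table PA */

/* Index into each translation table level for a 48-bit VA. */
static unsigned pgd_index48(uint64_t va) { return (va >> 39) & 0x1ff; }
static unsigned pud_index48(uint64_t va) { return (va >> 30) & 0x1ff; }
static unsigned pmd_index48(uint64_t va) { return (va >> 21) & 0x1ff; }
static unsigned pte_index48(uint64_t va) { return (va >> 12) & 0x1ff; }

/*
 * Given a table-type pmd entry, compute the linear-map address of the
 * PTE slot for va (what pte_offset_kernel() returns in the kernel).
 */
static uint64_t pte_slot_va(uint64_t pmd_val, uint64_t va)
{
    uint64_t table_pa = pmd_val & TABLE_ADDR_MASK;
    uint64_t table_va = (table_pa - VM_PHYS_OFFSET) | VM_PAGE_OFFSET;

    return table_va + (uint64_t)pte_index48(va) * 8;   /* 8-byte descriptors */
}
```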