From bob.picco@hp.com Mon Feb 7 14:16:54 2005
Date: Mon, 7 Feb 2005 14:46:41 -0500
From: Bob Picco
To: Dave Hansen
Cc: lhms-devel@lists.sourceforge.net
Subject: [PATCH 1/3] IA64
Message-ID: <20050207194641.GJ17600@localhost.localdomain>

This is SPARSEMEM specific.

diff -ruNp -X /home/picco/losl/dontdiff linux-2.6.11-rc2-mm2-mhp1-orig/arch/ia64/Kconfig linux-2.6.11-rc2-mm2-mhp1/arch/ia64/Kconfig
--- linux-2.6.11-rc2-mm2-mhp1-orig/arch/ia64/Kconfig	2005-02-03 20:23:56.000000000 -0500
+++ linux-2.6.11-rc2-mm2-mhp1/arch/ia64/Kconfig	2005-02-04 09:52:11.000000000 -0500
@@ -54,8 +54,6 @@ config IA64_GENERIC
 	bool "generic"
 	select NUMA
 	select ACPI_NUMA
-	select VIRTUAL_MEM_MAP
-	select DISCONTIGMEM
 	help
 	  This selects the system type of your hardware. A "generic" kernel
 	  will run on any supported IA-64 system. However, if you configure
@@ -185,6 +183,7 @@ config NUMA

 config VIRTUAL_MEM_MAP
 	bool "Virtual mem map"
+	depends on !SPARSEMEM
 	default y if !IA64_HP_SIM
 	help
 	  Say Y to compile the kernel with support for a virtual mem map.
@@ -197,15 +196,6 @@ config HOLES_IN_ZONE
 	bool
 	default y if VIRTUAL_MEM_MAP

-config DISCONTIGMEM
-	bool "Discontiguous memory support"
-	depends on (IA64_DIG || IA64_SGI_SN2 || IA64_GENERIC || IA64_HP_ZX1 || IA64_HP_ZX1_SWIOTLB) && NUMA && VIRTUAL_MEM_MAP
-	default y if (IA64_SGI_SN2 || IA64_GENERIC) && NUMA
-	help
-	  Say Y to support efficient handling of discontiguous physical memory,
-	  for architectures which are either NUMA (Non-Uniform Memory Access)
-	  or have huge holes in the physical address space for other reasons.
-	  See for more.

 config IA64_CYCLONE
 	bool "Cyclone (EXA) Time Source support"
@@ -226,8 +216,10 @@ config IA64_SGI_SN_SIM
 	  simulator (Medusa) then say Y, otherwise say N.

 config FORCE_MAX_ZONEORDER
-	int
-	default "18"
+	int "MAX_ORDER (11 - 20)" if !HUGETLB_PAGE
+	range 11 20 if !HUGETLB_PAGE
+	default "18" if HUGETLB_PAGE
+	default "11"

 config SMP
 	bool "Symmetric multi-processing support"
@@ -269,7 +261,32 @@ config HOTPLUG_CPU
 	  can be controlled through /sys/devices/system/cpu/cpu#.
 	  Say N if you want to disable CPU hotplug.

-source mm/Kconfig
+config SECTION_BITS
+	int
+	depends on SPARSEMEM
+	range 28 32 if !HUGETLB_PAGE
+	default "32" if HUGETLB_PAGE
+	default "28"
+	help
+	  Size of memory section in bits.
+
+config PHYSICAL_MEMORY_BITS
+	int
+	depends on SPARSEMEM
+	range 44 50
+	default 44
+	help
+	  Maximum physical memory address bits.
+
+config ARCH_SPARSEMEM_DEFAULT
+	bool
+	depends on NUMA
+
+config ARCH_DISCONTIGMEM_DISABLE
+	bool
+	depends !NUMA
+
+source "mm/Kconfig"

 config PREEMPT
 	bool "Preemptible Kernel"
diff -ruNp -X /home/picco/losl/dontdiff linux-2.6.11-rc2-mm2-mhp1-orig/arch/ia64/mm/contig.c linux-2.6.11-rc2-mm2-mhp1/arch/ia64/mm/contig.c
--- linux-2.6.11-rc2-mm2-mhp1-orig/arch/ia64/mm/contig.c	2005-02-03 20:23:56.000000000 -0500
+++ linux-2.6.11-rc2-mm2-mhp1/arch/ia64/mm/contig.c	2005-02-04 09:52:11.000000000 -0500
@@ -283,7 +283,7 @@ paging_init (void)
 		vmem_map = (struct page *) vmalloc_end;
 		efi_memmap_walk(create_mem_map_page_table, NULL);

-		NODE_DATA(0)->node_mem_map = vmem_map;
+		mem_map = NODE_DATA(0)->node_mem_map = vmem_map;
 		free_area_init_node(0, &contig_page_data, zones_size,
 				    0, zholes_size);

diff -ruNp -X /home/picco/losl/dontdiff linux-2.6.11-rc2-mm2-mhp1-orig/arch/ia64/mm/discontig.c linux-2.6.11-rc2-mm2-mhp1/arch/ia64/mm/discontig.c
--- linux-2.6.11-rc2-mm2-mhp1-orig/arch/ia64/mm/discontig.c	2005-02-03 20:01:47.000000000 -0500
+++ linux-2.6.11-rc2-mm2-mhp1/arch/ia64/mm/discontig.c	2005-02-04 09:52:11.000000000 -0500
@@ -475,6 +475,21 @@ static void __init initialize_pernode_da
 	}
 }

+#ifdef CONFIG_SPARSEMEM
+static int __init register_sparse_mem(unsigned long start, unsigned long end,
+	void *arg)
+{
+	int nid;
+
+	start = __pa(start) >> PAGE_SHIFT;
+	end = __pa(end) >> PAGE_SHIFT;
+	nid = early_pfn_to_nid(start);
+	(void) memory_present(nid, start, end);
+
+	return 0;
+}
+#endif
+
 /**
  * find_memory - walk the EFI memory map and setup the bootmem allocator
  *
@@ -499,6 +514,9 @@ void __init find_memory(void)
 	reassign_cpu_only_nodes();

 	/* These actually end up getting called by call_pernode_memory() */
+#ifdef CONFIG_SPARSEMEM
+	efi_memmap_walk(register_sparse_mem, (void *) 0);
+#endif
 	efi_memmap_walk(filter_rsvd_memory, build_node_maps);
 	efi_memmap_walk(filter_rsvd_memory, find_pernode_space);

@@ -580,14 +598,17 @@ void show_mem(void)
 		int shared = 0, cached = 0, reserved = 0;
 		printk("Node ID: %d\n", pgdat->node_id);
 		for(i = 0; i < pgdat->node_spanned_pages; i++) {
-			if (!ia64_pfn_valid(pgdat->node_start_pfn+i))
+			struct page *page;
+			if (pfn_valid(pgdat->node_start_pfn+i))
+				page = pfn_to_page(pgdat->node_start_pfn+i);
+			else
 				continue;
-			if (PageReserved(pgdat->node_mem_map+i))
+			if (PageReserved(page))
 				reserved++;
-			else if (PageSwapCache(pgdat->node_mem_map+i))
+			else if (PageSwapCache(page))
 				cached++;
-			else if (page_count(pgdat->node_mem_map+i))
-				shared += page_count(pgdat->node_mem_map+i)-1;
+			else if (page_count(page))
+				shared += page_count(page)-1;
 		}
 		total_present += present;
 		total_reserved += reserved;
@@ -702,6 +723,10 @@ void __init paging_init(void)
 	for_each_online_node(node)
 		mem_data[node].min_pfn = ~0UL;

+#ifdef CONFIG_SPARSEMEM
+	sparse_init();
+#endif
+
 	efi_memmap_walk(filter_rsvd_memory, count_node_pages);

 	for_each_online_node(node) {
@@ -737,6 +762,9 @@ void __init paging_init(void)
 				mem_data[node].num_dma_physpages);
 		}

+		pfn_offset = mem_data[node].min_pfn;
+
+#ifndef CONFIG_SPARSEMEM
 		if (node == 0) {
 			vmalloc_end -=
 				PAGE_ALIGN(max_low_pfn * sizeof(struct page));
@@ -746,9 +774,10 @@ void __init paging_init(void)
 			printk("Virtual mem_map starts at 0x%p\n", vmem_map);
 		}

-		pfn_offset = mem_data[node].min_pfn;
-
 		NODE_DATA(node)->node_mem_map = vmem_map + pfn_offset;
+#endif
+
+
 		free_area_init_node(node, NODE_DATA(node), zones_size,
 				    pfn_offset, zholes_size);
 	}
diff -ruNp -X /home/picco/losl/dontdiff linux-2.6.11-rc2-mm2-mhp1-orig/arch/ia64/mm/init.c linux-2.6.11-rc2-mm2-mhp1/arch/ia64/mm/init.c
--- linux-2.6.11-rc2-mm2-mhp1-orig/arch/ia64/mm/init.c	2005-02-03 20:01:47.000000000 -0500
+++ linux-2.6.11-rc2-mm2-mhp1/arch/ia64/mm/init.c	2005-02-04 09:52:11.000000000 -0500
@@ -560,7 +560,7 @@ mem_init (void)
 	platform_dma_init();
 #endif

-#ifndef CONFIG_DISCONTIGMEM
+#if !defined(CONFIG_DISCONTIGMEM) && !defined(CONFIG_SPARSEMEM)
 	if (!mem_map)
 		BUG();
 	max_mapnr = max_low_pfn;
diff -ruNp -X /home/picco/losl/dontdiff linux-2.6.11-rc2-mm2-mhp1-orig/arch/ia64/mm/Makefile linux-2.6.11-rc2-mm2-mhp1/arch/ia64/mm/Makefile
--- linux-2.6.11-rc2-mm2-mhp1-orig/arch/ia64/mm/Makefile	2004-12-24 16:34:30.000000000 -0500
+++ linux-2.6.11-rc2-mm2-mhp1/arch/ia64/mm/Makefile	2005-02-04 09:52:11.000000000 -0500
@@ -7,6 +7,5 @@ obj-y := init.o fault.o tlb.o extable.o
 obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
 obj-$(CONFIG_NUMA) += numa.o
 obj-$(CONFIG_DISCONTIGMEM) += discontig.o
-ifndef CONFIG_DISCONTIGMEM
-obj-y += contig.o
-endif
+obj-$(CONFIG_SPARSEMEM) += discontig.o
+obj-$(CONFIG_FLATMEM) += contig.o
diff -ruNp -X /home/picco/losl/dontdiff linux-2.6.11-rc2-mm2-mhp1-orig/arch/ia64/mm/numa.c linux-2.6.11-rc2-mm2-mhp1/arch/ia64/mm/numa.c
--- linux-2.6.11-rc2-mm2-mhp1-orig/arch/ia64/mm/numa.c	2005-02-03 20:01:47.000000000 -0500
+++ linux-2.6.11-rc2-mm2-mhp1/arch/ia64/mm/numa.c	2005-02-04 09:52:11.000000000 -0500
@@ -47,3 +47,26 @@ paddr_to_nid(unsigned long paddr)

 	return (i < num_node_memblks) ? node_memblk[i].nid : (num_node_memblks ? -1 : 0);
 }
+
+#if defined(CONFIG_SPARSEMEM) && defined(CONFIG_NUMA)
+/*
+ * Because of holes evaluate on section limits.
+ */
+int early_pfn_to_nid(unsigned long pfn)
+{
+	int i, section = pfn >> PFN_SECTION_SHIFT, ssec, esec;
+
+	for (i = 0; i < num_node_memblks; i++) {
+		ssec = node_memblk[i].start_paddr >> PA_SECTION_SHIFT;
+		esec = (node_memblk[i].start_paddr + node_memblk[i].size +
+			((1L << PA_SECTION_SHIFT) - 1)) >> PA_SECTION_SHIFT;
+		if (section >= ssec && section < esec)
+			break;
+	}
+
+	if (i == num_node_memblks)
+		return 0;
+	else
+		return node_memblk[i].nid;
+}
+#endif
diff -ruNp -X /home/picco/losl/dontdiff linux-2.6.11-rc2-mm2-mhp1-orig/include/asm-ia64/meminit.h linux-2.6.11-rc2-mm2-mhp1/include/asm-ia64/meminit.h
--- linux-2.6.11-rc2-mm2-mhp1-orig/include/asm-ia64/meminit.h	2004-12-24 16:34:30.000000000 -0500
+++ linux-2.6.11-rc2-mm2-mhp1/include/asm-ia64/meminit.h	2005-02-04 09:52:11.000000000 -0500
@@ -41,7 +41,7 @@ extern int filter_rsvd_memory (unsigned
 #define GRANULEROUNDUP(n) (((n)+IA64_GRANULE_SIZE-1) & ~(IA64_GRANULE_SIZE-1))
 #define ORDERROUNDDOWN(n) ((n) & ~((PAGE_SIZE< #include
-#ifdef CONFIG_DISCONTIGMEM
+#ifdef CONFIG_NUMA

 #ifdef CONFIG_IA64_DIG /* DIG systems are small */
 # define MAX_PHYSNODE_ID 8
@@ -25,8 +25,36 @@
 # define NR_NODE_MEMBLKS (MAX_NUMNODES * 4)
 #endif

-#else /* CONFIG_DISCONTIGMEM */
+#else /* CONFIG_NUMA */
 # define NR_NODE_MEMBLKS (MAX_NUMNODES * 4)
-#endif /* CONFIG_DISCONTIGMEM */
+#endif /* CONFIG_NUMA */
+
+#ifdef CONFIG_SPARSEMEM
+ /*
+  * SECTION_SIZE_BITS 2^N: how big each section will be
+  * MAX_PHYSADDR_BITS 2^N: how much physical address space we have
+  * MAX_PHYSMEM_BITS 2^N: how much memory we can have in that space
+  */
+
+#define SECTION_SIZE_BITS CONFIG_SECTION_BITS
+
+/*
+ * If FORCE_MAX_ORDER is used, then check and possibly enforce the boundary
+ * condition on SECTION_SIZE_BITS's magnitude.
+ */
+#ifdef CONFIG_FORCE_MAX_ZONEORDER
+#if ((CONFIG_FORCE_MAX_ZONEORDER+PAGE_SHIFT) > SECTION_SIZE_BITS)
+#undef SECTION_SIZE_BITS
+#define SECTION_SIZE_BITS (CONFIG_FORCE_MAX_ZONEORDER+PAGE_SHIFT)
+#endif
+#endif
+
+#define MAX_PHYSADDR_BITS CONFIG_PHYSICAL_MEMORY_BITS
+#define MAX_PHYSMEM_BITS CONFIG_PHYSICAL_MEMORY_BITS
+
+/* until we think of something better */
+#define page_is_ram(pfn) 1
+
+#endif /* CONFIG_SPARSEMEM */

 #endif /* _ASM_IA64_MMZONE_H */
diff -ruNp -X /home/picco/losl/dontdiff linux-2.6.11-rc2-mm2-mhp1-orig/include/asm-ia64/nodedata.h linux-2.6.11-rc2-mm2-mhp1/include/asm-ia64/nodedata.h
--- linux-2.6.11-rc2-mm2-mhp1-orig/include/asm-ia64/nodedata.h	2005-02-03 20:01:57.000000000 -0500
+++ linux-2.6.11-rc2-mm2-mhp1/include/asm-ia64/nodedata.h	2005-02-04 09:52:11.000000000 -0500
@@ -17,7 +17,7 @@
 #include
 #include

-#ifdef CONFIG_DISCONTIGMEM
+#if defined(CONFIG_DISCONTIGMEM) || defined(CONFIG_SPARSEMEM)

 /*
  * Node Data. One of these structures is located on each node of a NUMA system.
@@ -47,6 +47,6 @@ struct ia64_node_data {
  */
 #define NODE_DATA(nid) (local_node_data->pg_data_ptrs[nid])

-#endif /* CONFIG_DISCONTIGMEM */
+#endif /* CONFIG_DISCONTIGMEM || CONFIG_SPARSEMEM */

 #endif /* _ASM_IA64_NODEDATA_H */
diff -ruNp -X /home/picco/losl/dontdiff linux-2.6.11-rc2-mm2-mhp1-orig/include/asm-ia64/page.h linux-2.6.11-rc2-mm2-mhp1/include/asm-ia64/page.h
--- linux-2.6.11-rc2-mm2-mhp1-orig/include/asm-ia64/page.h	2005-02-03 20:02:13.000000000 -0500
+++ linux-2.6.11-rc2-mm2-mhp1/include/asm-ia64/page.h	2005-02-04 09:52:11.000000000 -0500
@@ -87,17 +87,17 @@ do { \

 #define virt_addr_valid(kaddr) pfn_valid(__pa(kaddr) >> PAGE_SHIFT)

-#ifdef CONFIG_VIRTUAL_MEM_MAP
+#ifdef CONFIG_VIRTUAL_MEM_MAP
 extern int ia64_pfn_valid (unsigned long pfn);
-#else
+#elif CONFIG_FLATMEM
 # define ia64_pfn_valid(pfn) 1
 #endif

-#ifndef CONFIG_DISCONTIGMEM
+#ifdef CONFIG_FLATMEM
 # define pfn_valid(pfn) (((pfn) < max_mapnr) && ia64_pfn_valid(pfn))
 # define page_to_pfn(page) ((unsigned long) (page - mem_map))
 # define pfn_to_page(pfn) (mem_map + (pfn))
-#else
+#elif CONFIG_DISCONTIGMEM
 extern struct page *vmem_map;
 extern unsigned long max_low_pfn;
 # define pfn_valid(pfn) (((pfn) < max_low_pfn) && ia64_pfn_valid(pfn))
@@ -105,6 +105,10 @@ extern unsigned long max_low_pfn;
 # define pfn_to_page(pfn) (vmem_map + (pfn))
 #endif

+#if defined(CONFIG_NUMA) && defined(CONFIG_SPARSEMEM)
+extern int early_pfn_to_nid(unsigned long pfn);
+#endif
+
 #define page_to_phys(page) (page_to_pfn(page) << PAGE_SHIFT)
 #define virt_to_page(kaddr) pfn_to_page(__pa(kaddr) >> PAGE_SHIFT)

@@ -123,8 +127,11 @@ typedef union ia64_va {
  * expressed in this way to ensure they result in a single "dep"
  * instruction.
  */
-#define __pa(x) ({ia64_va _v; _v.l = (long) (x); _v.f.reg = 0; _v.l;})
-#define __va(x) ({ia64_va _v; _v.l = (long) (x); _v.f.reg = -1; _v.p;})
+#define __boot_pa(x) ({ia64_va _v; _v.l = (long) (x); _v.f.reg = 0; _v.l;})
+#define __boot_va(x) ({ia64_va _v; _v.l = (long) (x); _v.f.reg = -1; _v.p;})
+#define __pa(x) __boot_pa(x)
+#define __va(x) __boot_va(x)
+#define pfn_to_kaddr(pfn) __va((pfn) << PAGE_SHIFT)

 #define REGION_NUMBER(x) ({ia64_va _v; _v.l = (long) (x); _v.f.reg;})
 #define REGION_OFFSET(x) ({ia64_va _v; _v.l = (long) (x); _v.f.off;})