Class 12
CS 170
May 13 2020

On the board
------------

1. Last time

2. Segmentation

3. Paging
   --Intro
   --key data structure: page table
   --Segmentation vs. paging

4. Virtual memory on x86

---------------------------------------------------------------------------

1. Last time

    virtual memory intro

    today:
        segmentation introduction
        paging introduction

2. Segmentation

    segmentation means: memory addresses are treated like offsets into a
    contiguous region.

    consider a 14-bit address:
        the first two bits give the segment number (this is the first
        hex digit)
        the next 12 bits (the next three hex digits) give the offset

         seg    base     limit    rw
        -----------------------------------
          0    0x4000   0x46ff    10
          1    0x0000   0x04ff    11
          2    0x3000   0x3fff    11

    the above table results in the mapping below. convince yourself of
    this!!!!!!

        virtual                  physical
        -------                  --------
        [0x0000, 0x0700)  -->    [0x4000, 0x4700)
        [0x1000, 0x1500)  -->    [0x0000, 0x0500)
        [0x2000, 0x3000)  -->    [0x3000, 0x4000)
        [0x3000, 0x4000)  -->    not mapped

    where is
        0x0240?    [0x4240]
        0x1108     [0x0108]
        0x265c     [0x365c]
        0x3002     [illegal: there is no segment 3]
        0x1600     [illegal: offset is past segment 1's limit]

    This allows sharing: how?

    Disadvantages:
        --program may need to know about segments (not in the example
          above, but it happens on the x86; see below)
        --contiguous bytes required
        --fragmentation

    External vs. internal fragmentation

3. Paging

 A. Intro

    --Basic concept: divide all of memory (physical and virtual) into
      *fixed-size* chunks.

        --these chunks are called *PAGES*.
        --they have a size called the PAGE SIZE.
          (different hardware architectures specify different sizes)
        --in the traditional x86 (and in our labs), the PAGE SIZE will
          be 4096 B = 4KB = 2^{12} bytes

    --Warm-up:
        --how many pages are there on a 32-bit architecture?
        --2^{32} bytes / (2^{12} bytes/page) = 2^{20} pages

    --Each process has a separate mapping
        --And each page is separately mapped

    --we will allow the OS to gain control on certain operations
        --Read-only pages trap to the OS on write
        --Invalid pages trap to the OS on read or write
        --OS can change the mapping and resume the application

      (Harder to do this kind of thing with segments because the mapping
      is more coarse-grained.)

    --it is proper and fitting to talk about pages having **NUMBERS**.
        --page 0:  [0, 4095]
        --page 1:  [4096, 8191]
        --page 2:  [8192, 12287]
        --page 3:  [12288, 16383]
          .....
        --page 2^{20}-1:  [2^{32} - 4096, 2^{32} - 1]

    --unfortunately, it is also proper and fitting to talk about _both_
      virtual and physical pages having numbers.
        --sometimes we will try to be clear with terms like:
            vpn
            ppn

 B. Key data structure: page table

    --conceptual model: (assuming 32-bit addresses and 4KB pages)
      there is in the sky a 2^{20}-entry array that maps each virtual
      page number to a *physical* page number:

          table[20-bit virtual page number] = 20-bit physical page number

      (a runnable sketch of this conceptual model appears at the end of
      this section)

      EXAMPLE: if the OS wants a program to be able to use address
      0x00402000 to refer to physical address 0x00003000, then the OS
      conceptually adds an entry:

          table[0x00402] = 0x00003

      (this maps virtual page number 1026 to physical page number 3.)
      in decimal: table[1026] = 3

      below, we will see how this is actually implemented

      NOTE: the top 20 bits are doing the indirection. the bottom 12
      bits just determine where on the page the access takes place.
        --the bottom bits are sometimes called the offset.

    --so now all we have to do is create this mapping

    --why is this hard? why not just create the mapping?

        --answer: then you need, per process, roughly 4MB
          (2^{20} entries * 32 bits per entry)

        --we deal with this shortly

        --key idea: represent the page table as a tree that is sparse
          (i.e., many of the child nodes are never filled in)
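    To make the conceptual model concrete, here is a minimal, runnable
    sketch of the flat 2^{20}-entry table. This is the "in the sky"
    version, not how the hardware or our labs actually store the
    mapping; the names flat_table and translate_flat are made up for
    illustration.

        #include <stdint.h>
        #include <stdio.h>

        #define PAGE_SHIFT 12               /* 4KB pages: 2^12 bytes */
        #define NPAGES     (1u << 20)       /* 2^20 virtual pages    */

        static uint32_t flat_table[NPAGES]; /* table[vpn] = ppn      */

        static uint32_t translate_flat(uint32_t va)
        {
            uint32_t vpn    = va >> PAGE_SHIFT;  /* top 20 bits: page number   */
            uint32_t offset = va & 0xfffu;       /* bottom 12 bits: offset     */
            uint32_t ppn    = flat_table[vpn];   /* the indirection            */
            return (ppn << PAGE_SHIFT) | offset; /* same offset within page    */
        }

        int main(void)
        {
            /* the example above: table[0x00402] = 0x00003 */
            flat_table[0x00402] = 0x00003;
            printf("0x00402000 -> 0x%08x\n",
                   (unsigned) translate_flat(0x00402000u));  /* 0x00003000 */
            printf("0x00402ab0 -> 0x%08x\n",
                   (unsigned) translate_flat(0x00402ab0u));  /* 0x00003ab0 */
            return 0;
        }

    Note that sizeof(flat_table) is already 4MB (2^{20} entries * 4
    bytes), which is exactly the per-process cost that motivates the
    sparse tree.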
 C. Segmentation vs. paging

    --paging:
        + eliminates external fragmentation
        + not much internal fragmentation
        + easier to allocate, free, swap, etc.
        - data structures are larger
        - more complex
        + overall: more flexible.
          (intuition: the mapping is more fine-grained, which means more
          OS control over it)
          (in more detail, instead of mapping a large range into a large
          range, we independently control the mapping for every 4KB.)

    --segmentation:
        - vulnerable to two kinds of fragmentation
        - hard to handle growth or shrinkage of a segment
        + smaller data structures
        + simpler overall
        - overall: less flexible

    --Segmentation is old-school and these days mostly an annoyance
      (but it cannot be turned off on the x86!)

    --however, it comes in handy every now and then
        --thread-local memory
        --sandboxing (advanced topic)
        --also makes it easy to share memory among processes: just use
          the same segment registers
          (sharing requires a bit more work if paging is in effect)

4. Case study: virtual memory on x86

    * Has segmentation and paging.

      Cannot turn off segmentation (even though we usually want to).
      Instead, set things up so that segmentation has no effect.

      Question: how?

      (Answer: by setting its mapping to be the identity function. Make
      the base 0 and the limit the maximum.)

    * We will focus on paging

      best overview: the Intel manual
      http://www.cs.nyu.edu/~mwalfish/classes/15sp/ref/i386/s05_02.htm

      see handout from last time

      two-level mapping structure.......

    * a VA is 32 bits:

        31 ................................... 0

    * and it gets divided as follows:

          dir ent       table ent      offset
        31 ....... 22  21 ...... 12  11 ....... 0

      --%cr3 holds the address of the page directory.

      --the top 10 bits (the first two nibbles plus the first half of
        the third nibble) select an entry in the page directory; this
        entry points to a **page table**

      --the next 10 bits select an entry in the page table, which holds
        a physical page number

      --so there are 1024 entries in the page directory

      --how big is an entry in the page directory? 4 bytes

      --entry in the page directory and in the page table:

          [ base address     | bunch of bits | U/S R/W P ]
           31..............12

        why 20 bits?
            [answer: there are 2^20 4KB pages in the system]

        is that base address a physical address, a linear address, a
        virtual address, what?
            [answer: it is a physical address. the hardware needs to be
            able to follow the page table structure on its own.]

        the "bunch of bits" includes:
            dirty          (set by hardware)
            accessed       (set by hardware)
            cache disabled (set by OS)
            write through  (set by OS)

        what do the U/S and R/W bits do?
          --are these for the kernel, the hardware, what?
          --who is setting them? what is the point?
            (the OS sets them to indicate protection; the hardware
            enforces them)

        what happens if U/S and R/W differ in the pgdir and the page
        table?
            [the processor does something deterministic; look it up in
            the references]

    * EXAMPLES

      Approach: examine an address and divide it up. Get used to doing
      this. We will work a few examples in class.

      Basic question: what does the OS put in the data structures that
      are visible to the CPU's MMU to enable different mappings?

      What if the OS wants to map a process's virtual address
      0x00402[000] to physical address 0x00003[000] and make it
      accessible to user level but read-only?

             PGDIR                          PGTABLE

            .......                  <20 bits>    <12 bits>
                                      .......
                                     | 0x00003 | U=1,W=0,P=1 |   [entry 2]
                                     |         |             |   [entry 1]
       .....[entry 1] ---->          |_________|_____________|   [entry 0]
                                      .......

      Now what if the OS wants to map that process's virtual address
      0x00403[000] to physical address 0x80000[000] [this is physical
      address 2GB] and make it accessible to user level and make it
      read/write?

      (a code sketch of what the OS writes for both of these mappings
      follows below)
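    Here is a minimal sketch (not the labs' actual code) of what the OS
    conceptually writes for these two mappings. The arrays pgdir and
    pgtable stand in for the contents of the two 4KB pages, and
    PGTABLE_PA is a made-up physical address for the page that holds
    the page table; a real kernel would allocate real page frames and
    load the page directory's physical address into %cr3.

        #include <stdint.h>
        #include <stdio.h>

        #define PTE_P  0x001u          /* present                   */
        #define PTE_W  0x002u          /* writable (R/W bit)        */
        #define PTE_U  0x004u          /* user accessible (U/S bit) */

        #define PDX(la)  (((la) >> 22) & 0x3ffu)   /* page directory index */
        #define PTX(la)  (((la) >> 12) & 0x3ffu)   /* page table index     */

        #define PGTABLE_PA 0x00001000u /* assumed phys addr of the page
                                          holding the page table         */

        static uint32_t pgdir[1024];   /* contents of the page directory page */
        static uint32_t pgtable[1024]; /* contents of the page table page     */

        int main(void)
        {
            uint32_t la1 = 0x00402000u, pa1 = 0x00003000u;  /* user, read-only  */
            uint32_t la2 = 0x00403000u, pa2 = 0x80000000u;  /* user, read/write */

            /* both addresses share the same top 10 bits, so one page
               table serves both mappings */
            pgdir[PDX(la1)]   = PGTABLE_PA | PTE_U | PTE_W | PTE_P;

            pgtable[PTX(la1)] = pa1 | PTE_U | PTE_P;          /* U=1, W=0, P=1 */
            pgtable[PTX(la2)] = pa2 | PTE_U | PTE_W | PTE_P;  /* U=1, W=1, P=1 */

            printf("pgdir index %u; pgtable indexes %u and %u\n",
                   (unsigned) PDX(la1), (unsigned) PTX(la1), (unsigned) PTX(la2));
            return 0;
        }

    One common convention, used here, is to set U and W in the
    directory entry and let the per-page PTE carry the real permission,
    so the directory-level bits are not the restricting factor.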
    * Helpful reminders:

      --each entry in the page *directory* corresponds to 4MB of virtual
        address space ("corresponds to" means "selects the second-level
        page table that actually governs the mapping").

      --each entry in the page *table* corresponds to 4KB of virtual
        address space

      --so how much virtual memory is each page *table* responsible for
        translating? 4KB? 4MB? something else?

      --each page directory and each page table itself consumes 4KB of
        physical memory, i.e., each one of these fits on a page

    ------------------------------------------------------------------

    putting it all together....here is how the x86's MMU translates a
    linear address to a physical address:

    ("linear address" is a synonym for "virtual address" in our context.
    the reason for the additional term is that on the x86, the
    segmentation mapping goes from virtual to linear.)

    [not discussing in class but make sure you understand what is
    written below.]

    uint translate (uint la, bool user, bool write)
    {
        uint pde;   /* page directory entry */
        uint pte;   /* page table entry     */

        pde = read_mem (%CR3 + 4*(la >> 22));
        access (pde, user, write);    /* see function below */

        pte = read_mem ( (pde & 0xfffff000) + 4*((la >> 12) & 0x3ff) );
        access (pte, user, write);

        return (pte & 0xfffff000) + (la & 0xfff);
    }

    // check protection. pxe is a pte or pde.
    // user is true if CPL==3.
    // write is true if the attempted access was a write.
    // PG_P, PG_U, PG_W refer to the bits in the entry above.
    void access (uint pxe, bool user, bool write)
    {
        if (!(pxe & PG_P))
            => page fault -- page not present
        if (!(pxe & PG_U) && user)
            => page fault -- no access for user
        if (write && !(pxe & PG_W)) {
            if (user)
                => page fault -- not writable
            if (%CR0 & CR0_WP)
                => page fault -- not writable
        }
    }

    --------------------------------------------------------------------

    * Alternatives

      --Other configurations are possible (both on the x86 and on other
        hardware architectures)

      --There are some tradeoffs:

        --between large and small page sizes:
            --large page sizes mean more wasted memory inside pages
              (internal fragmentation)
            --small page sizes mean lots of page table entries (which
              may or may not get used)

        --between many levels of mapping and few:
            --more levels of mapping means less space spent on page
              structures when the address space is sparse (which address
              spaces nearly always are), but more work for the hardware
              to walk the page tables
            --fewer levels of mapping is the other way around: larger
              page tables must be allocated (which costs more space),
              but the hardware has fewer levels to walk

      --Example: can get 4MB pages on the x86 (a page directory entry
        can point directly to a single 4MB page)
            + page tables are smaller (or absent)
            - more wasted memory
        to enable this, turn on PSE mode and set bit 7 (PS) in a page
        directory entry: that entry then maps a 4MB page, with no page
        table. (a sketch of this translation path follows below.)
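      To see concretely what the PS bit changes, here is a small,
      self-contained sketch that models physical memory as a C array.
      phys_mem, cr3, and translate_pse are made-up names for this
      illustration, and the permission checks from access() above are
      omitted for brevity: when a directory entry has bit 7 (PS) set,
      the top 10 bits of that entry give the 4MB frame and the bottom
      22 bits of the linear address are the offset, with no
      second-level walk.

          #include <stdint.h>
          #include <stdio.h>

          #define PG_P   0x001u
          #define PG_PS  0x080u   /* "page size" bit (bit 7) in a PDE */

          static uint32_t phys_mem[1 << 20]; /* fake physical memory: 4MB   */
          static uint32_t cr3;               /* stands in for %cr3: phys
                                                addr of the page directory  */

          static uint32_t read_mem(uint32_t pa) { return phys_mem[pa / 4]; }

          static uint32_t translate_pse(uint32_t la)
          {
              uint32_t pde = read_mem(cr3 + 4 * (la >> 22));
              if (pde & PG_PS)
                  /* 4MB page: PDE bits 31..22 give the frame;
                     LA bits 21..0 are the offset */
                  return (pde & 0xffc00000u) + (la & 0x3fffffu);

              /* otherwise, the usual two-level walk, as in translate() above */
              uint32_t pte = read_mem((pde & 0xfffff000u) + 4 * ((la >> 12) & 0x3ffu));
              return (pte & 0xfffff000u) + (la & 0xfffu);
          }

          int main(void)
          {
              cr3 = 0;  /* put the page directory at fake physical address 0 */

              /* PDE 1 maps linear [0x00400000, 0x00800000) as one 4MB
                 page starting at physical 8MB */
              phys_mem[1] = 0x00800000u | PG_PS | PG_P;

              printf("0x00402abc -> 0x%08x\n",
                     (unsigned) translate_pse(0x00402abcu));  /* 0x00802abc */
              return 0;
          }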