Class 12
CS 170
May 13 2020

On the board
------------

1. Last time

2. Segmentation

3. Paging
   --Intro
   --key data structure: page table
   --Segmentation vs. paging

4. Virtual memory on x86

---------------------------------------------------------------------------

1. Last time

    virtual memory intro

    today:
        segmentation introduction
        paging introduction

2. Segmentation

    segmentation means: memory addresses are treated like offsets into a
    contiguous region.

    consider a 14-bit address:
        the first two bits give the segment number (this is the first
        hex digit)
        the next 12 bits (the next three hex digits) give the offset

         seg    base     limit    rw
        -----------------------------------
          0    0x4000   0x46ff    10
          1    0x0000   0x04ff    11
          2    0x3000   0x3fff    11

    the above table results in the mapping below. convince yourself of
    this!!!!!!

        virtual                  physical
        -------                  --------
        [0x0000, 0x0700)  -->    [0x4000, 0x4700)
        [0x1000, 0x1500)  -->    [0x0000, 0x0500)
        [0x2000, 0x3000)  -->    [0x3000, 0x4000)
        [0x3000, 0x4000)  -->    not mapped

    where is
        0x0240?    [0x4240]
        0x1108     [0x0108]
        0x265c     [0x365c]
        0x3002     [illegal: there is no segment 3]
        0x1600     [illegal: offset is past segment 1's limit]

    This allows sharing: how?

    Disadvantages:
        --program may need to know about segments (not in the example
          above, but it happens on the x86; see below)
        --contiguous bytes required
        --fragmentation

    External vs. internal fragmentation

3. Paging

 A. Intro

    --Basic concept: divide all of memory (physical and virtual) into
      *fixed-size* chunks.

        --these chunks are called *PAGES*.
        --they have a size called the PAGE SIZE.
          (different hardware architectures specify different sizes)
        --in the traditional x86 (and in our labs), the PAGE SIZE will
          be 4096 B = 4KB = 2^{12} bytes

    --Warm-up:
        --how many pages are there on a 32-bit architecture?
        --2^{32} bytes / (2^{12} bytes/page) = 2^{20} pages

    --Each process has a separate mapping
        --And each page is separately mapped

    --we will allow the OS to gain control on certain operations
        --Read-only pages trap to the OS on write
        --Invalid pages trap to the OS on read or write
        --OS can change the mapping and resume the application

      (Harder to do this kind of thing with segments because the mapping
      is more coarse-grained.)

    --it is proper and fitting to talk about pages having **NUMBERS**.
        --page 0:  [0, 4095]
        --page 1:  [4096, 8191]
        --page 2:  [8192, 12287]
        --page 3:  [12288, 16383]
          .....
        --page 2^{20}-1:  [2^{32} - 4096, 2^{32} - 1]

    --unfortunately, it is also proper and fitting to talk about _both_
      virtual and physical pages having numbers.
        --sometimes we will try to be clear with terms like:
            vpn
            ppn

 B. Key data structure: page table

    --conceptual model: (assuming 32-bit addresses and 4KB pages)
      there is in the sky a 2^{20}-entry array that maps each virtual
      page number to a *physical* page number:

          table[20-bit virtual page number] = 20-bit physical page number

      (a runnable sketch of this conceptual model appears at the end of
      this section)

      EXAMPLE: if the OS wants a program to be able to use address
      0x00402000 to refer to physical address 0x00003000, then the OS
      conceptually adds an entry:

          table[0x00402] = 0x00003

      (this maps virtual page number 1026 to physical page number 3.)
      in decimal: table[1026] = 3

      below, we will see how this is actually implemented

      NOTE: the top 20 bits are doing the indirection. the bottom 12
      bits just determine where on the page the access takes place.
        --the bottom bits are sometimes called the offset.

    --so now all we have to do is create this mapping

    --why is this hard? why not just create the mapping?

        --answer: then you need, per process, roughly 4MB
          (2^{20} entries * 32 bits per entry)

        --we deal with this shortly

        --key idea: represent the page table as a tree that is sparse
          (i.e., many of the child nodes are never filled in)
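    To make the conceptual model concrete, here is a minimal, runnable
    sketch of the flat 2^{20}-entry table. This is the "in the sky"
    version, not how the hardware or our labs actually store the
    mapping; the names flat_table and translate_flat are made up for
    illustration.

        #include <stdint.h>
        #include <stdio.h>

        #define PAGE_SHIFT 12               /* 4KB pages: 2^12 bytes */
        #define NPAGES     (1u << 20)       /* 2^20 virtual pages    */

        static uint32_t flat_table[NPAGES]; /* table[vpn] = ppn      */

        static uint32_t translate_flat(uint32_t va)
        {
            uint32_t vpn    = va >> PAGE_SHIFT;  /* top 20 bits: page number   */
            uint32_t offset = va & 0xfffu;       /* bottom 12 bits: offset     */
            uint32_t ppn    = flat_table[vpn];   /* the indirection            */
            return (ppn << PAGE_SHIFT) | offset; /* same offset within page    */
        }

        int main(void)
        {
            /* the example above: table[0x00402] = 0x00003 */
            flat_table[0x00402] = 0x00003;
            printf("0x00402000 -> 0x%08x\n",
                   (unsigned) translate_flat(0x00402000u));  /* 0x00003000 */
            printf("0x00402ab0 -> 0x%08x\n",
                   (unsigned) translate_flat(0x00402ab0u));  /* 0x00003ab0 */
            return 0;
        }

    Note that sizeof(flat_table) is already 4MB (2^{20} entries * 4
    bytes), which is exactly the per-process cost that motivates the
    sparse tree.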
 C. Segmentation vs. paging

    --paging:
        + eliminates external fragmentation
        + not much internal fragmentation
        + easier to allocate, free, swap, etc.
        - data structures are larger
        - more complex
        + overall: more flexible.
          (intuition: the mapping is more fine-grained, which means more
          OS control over it)
          (in more detail, instead of mapping a large range into a large
          range, we independently control the mapping for every 4KB.)

    --segmentation:
        - vulnerable to two kinds of fragmentation
        - hard to handle growth or shrinkage of a segment
        + smaller data structures
        + simpler overall
        - overall: less flexible

    --Segmentation is old-school and these days mostly an annoyance
      (but it cannot be turned off on the x86!)

    --however, it comes in handy every now and then
        --thread-local memory
        --sandboxing (advanced topic)
        --also makes it easy to share memory among processes: just use
          the same segment registers
          (sharing requires a bit more work if paging is in effect)

4. Case study: virtual memory on x86

    * Has segmentation and paging.

      Cannot turn off segmentation (even though we usually want to).
      Instead, set things up so that segmentation has no effect.

      Question: how?

      (Answer: by setting its mapping to be the identity function. Make
      the base 0 and the limit the maximum.)

    * We will focus on paging

      best overview: the Intel manual
      http://www.cs.nyu.edu/~mwalfish/classes/15sp/ref/i386/s05_02.htm

      see handout from last time

      two-level mapping structure.......

    * a VA is 32 bits:

        31 ................................... 0

    * and it gets divided as follows:

          dir ent       table ent      offset
        31 ....... 22  21 ...... 12  11 ....... 0

      --%cr3 holds the address of the page directory.

      --the top 10 bits (the first two nibbles plus the first half of
        the third nibble) select an entry in the page directory; this
        entry points to a **page table**

      --the next 10 bits select an entry in the page table, which holds
        a physical page number

      --so there are 1024 entries in the page directory

      --how big is an entry in the page directory? 4 bytes

      --entry in the page directory and in the page table:

          [ base address     | bunch of bits | U/S R/W P ]
           31..............12

        why 20 bits?
            [answer: there are 2^20 4KB pages in the system]

        is that base address a physical address, a linear address, a
        virtual address, what?
            [answer: it is a physical address. the hardware needs to be
            able to follow the page table structure on its own.]

        the "bunch of bits" includes:
            dirty          (set by hardware)
            accessed       (set by hardware)
            cache disabled (set by OS)
            write through  (set by OS)

        what do the U/S and R/W bits do?
          --are these for the kernel, the hardware, what?
          --who is setting them? what is the point?
            (the OS sets them to indicate protection; the hardware
            enforces them)

        what happens if U/S and R/W differ in the pgdir and the page
        table?
            [the processor does something deterministic; look it up in
            the references]

    * EXAMPLES

      Approach: examine an address and divide it up. Get used to doing
      this. We will work a few examples in class.

      Basic question: what does the OS put in the data structures that
      are visible to the CPU's MMU to enable different mappings?

      What if the OS wants to map a process's virtual address
      0x00402[000] to physical address 0x00003[000] and make it
      accessible to user level but read-only?

             PGDIR                          PGTABLE

            .......                  <20 bits>    <12 bits>
                                      .......
                                     | 0x00003 | U=1,W=0,P=1 |   [entry 2]
                                     |         |             |   [entry 1]
       .....[entry 1] ---->          |_________|_____________|   [entry 0]
                                      .......

      Now what if the OS wants to map that process's virtual address
      0x00403[000] to physical address 0x80000[000] [this is physical
      address 2GB] and make it accessible to user level and make it
      read/write?

      (a code sketch of what the OS writes for both of these mappings
      follows below)
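    Here is a minimal sketch (not the labs' actual code) of what the OS
    conceptually writes for these two mappings. The arrays pgdir and
    pgtable stand in for the contents of the two 4KB pages, and
    PGTABLE_PA is a made-up physical address for the page that holds
    the page table; a real kernel would allocate real page frames and
    load the page directory's physical address into %cr3.

        #include <stdint.h>
        #include <stdio.h>

        #define PTE_P  0x001u          /* present                   */
        #define PTE_W  0x002u          /* writable (R/W bit)        */
        #define PTE_U  0x004u          /* user accessible (U/S bit) */

        #define PDX(la)  (((la) >> 22) & 0x3ffu)   /* page directory index */
        #define PTX(la)  (((la) >> 12) & 0x3ffu)   /* page table index     */

        #define PGTABLE_PA 0x00001000u /* assumed phys addr of the page
                                          holding the page table         */

        static uint32_t pgdir[1024];   /* contents of the page directory page */
        static uint32_t pgtable[1024]; /* contents of the page table page     */

        int main(void)
        {
            uint32_t la1 = 0x00402000u, pa1 = 0x00003000u;  /* user, read-only  */
            uint32_t la2 = 0x00403000u, pa2 = 0x80000000u;  /* user, read/write */

            /* both addresses share the same top 10 bits, so one page
               table serves both mappings */
            pgdir[PDX(la1)]   = PGTABLE_PA | PTE_U | PTE_W | PTE_P;

            pgtable[PTX(la1)] = pa1 | PTE_U | PTE_P;          /* U=1, W=0, P=1 */
            pgtable[PTX(la2)] = pa2 | PTE_U | PTE_W | PTE_P;  /* U=1, W=1, P=1 */

            printf("pgdir index %u; pgtable indexes %u and %u\n",
                   (unsigned) PDX(la1), (unsigned) PTX(la1), (unsigned) PTX(la2));
            return 0;
        }

    One common convention, used here, is to set U and W in the
    directory entry and let the per-page PTE carry the real permission,
    so the directory-level bits are not the restricting factor.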
    * Helpful reminders:

      --each entry in the page *directory* corresponds to 4MB of virtual
        address space ("corresponds to" means "selects the second-level
        page table that actually governs the mapping").

      --each entry in the page *table* corresponds to 4KB of virtual
        address space

      --so how much virtual memory is each page *table* responsible for
        translating? 4KB? 4MB? something else?

      --each page directory and each page table itself consumes 4KB of
        physical memory, i.e., each one of these fits on a page

    ------------------------------------------------------------------

    putting it all together....here is how the x86's MMU translates a
    linear address to a physical address:

    ("linear address" is a synonym for "virtual address" in our context.
    the reason for the additional term is that on the x86, the
    segmentation mapping goes from virtual to linear.)

    [not discussing in class but make sure you understand what is
    written below.]

    uint translate (uint la, bool user, bool write)
    {
        uint pde;   /* page directory entry */
        uint pte;   /* page table entry     */

        pde = read_mem (%CR3 + 4*(la >> 22));
        access (pde, user, write);    /* see function below */

        pte = read_mem ( (pde & 0xfffff000) + 4*((la >> 12) & 0x3ff) );
        access (pte, user, write);

        return (pte & 0xfffff000) + (la & 0xfff);
    }

    // check protection. pxe is a pte or pde.
    // user is true if CPL==3.
    // write is true if the attempted access was a write.
    // PG_P, PG_U, PG_W refer to the bits in the entry above.
    void access (uint pxe, bool user, bool write)
    {
        if (!(pxe & PG_P))
            => page fault -- page not present
        if (!(pxe & PG_U) && user)
            => page fault -- no access for user
        if (write && !(pxe & PG_W)) {
            if (user)
                => page fault -- not writable
            if (%CR0 & CR0_WP)
                => page fault -- not writable
        }
    }

    --------------------------------------------------------------------

    * Alternatives

      --Other configurations are possible (both on the x86 and on other
        hardware architectures)

      --There are some tradeoffs:

        --between large and small page sizes:
            --large page sizes mean more wasted memory inside pages
              (internal fragmentation)
            --small page sizes mean lots of page table entries (which
              may or may not get used)

        --between many levels of mapping and few:
            --more levels of mapping means less space spent on page
              structures when the address space is sparse (which address
              spaces nearly always are), but more work for the hardware
              to walk the page tables
            --fewer levels of mapping is the other way around: larger
              page tables must be allocated (which costs more space),
              but the hardware has fewer levels to walk

      --Example: can get 4MB pages on the x86 (a page directory entry
        can point directly to a single 4MB page)
            + page tables are smaller (or absent)
            - more wasted memory
        to enable this, turn on PSE mode and set bit 7 (PS) in a page
        directory entry: that entry then maps a 4MB page, with no page
        table. (a sketch of this translation path follows below.)
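      To see concretely what the PS bit changes, here is a small,
      self-contained sketch that models physical memory as a C array.
      phys_mem, cr3, and translate_pse are made-up names for this
      illustration, and the permission checks from access() above are
      omitted for brevity: when a directory entry has bit 7 (PS) set,
      the top 10 bits of that entry give the 4MB frame and the bottom
      22 bits of the linear address are the offset, with no
      second-level walk.

          #include <stdint.h>
          #include <stdio.h>

          #define PG_P   0x001u
          #define PG_PS  0x080u   /* "page size" bit (bit 7) in a PDE */

          static uint32_t phys_mem[1 << 20]; /* fake physical memory: 4MB   */
          static uint32_t cr3;               /* stands in for %cr3: phys
                                                addr of the page directory  */

          static uint32_t read_mem(uint32_t pa) { return phys_mem[pa / 4]; }

          static uint32_t translate_pse(uint32_t la)
          {
              uint32_t pde = read_mem(cr3 + 4 * (la >> 22));
              if (pde & PG_PS)
                  /* 4MB page: PDE bits 31..22 give the frame;
                     LA bits 21..0 are the offset */
                  return (pde & 0xffc00000u) + (la & 0x3fffffu);

              /* otherwise, the usual two-level walk, as in translate() above */
              uint32_t pte = read_mem((pde & 0xfffff000u) + 4 * ((la >> 12) & 0x3ffu));
              return (pte & 0xfffff000u) + (la & 0xfffu);
          }

          int main(void)
          {
              cr3 = 0;  /* put the page directory at fake physical address 0 */

              /* PDE 1 maps linear [0x00400000, 0x00800000) as one 4MB
                 page starting at physical 8MB */
              phys_mem[1] = 0x00800000u | PG_PS | PG_P;

              printf("0x00402abc -> 0x%08x\n",
                     (unsigned) translate_pse(0x00402abcu));  /* 0x00802abc */
              return 0;
          }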