Class 14
CS 170
25 Nov 2020

On the board
------------

1. Last time

2. Page replacement policies

3. Thrashing

---------------------------------------------------------------------------

1. Last time

    --finished memory unit
    --discussed uses of page faults, costs of page faults
    --started discussing cache eviction policies, regarding RAM as a
      cache of the disk

2. Page replacement policies

    --evaluating replacement algorithms

        input
            --reference string: sequence of page accesses
            --cache (e.g., physical memory) size
        output
            --number of cache evictions (e.g., number of swaps)

    --examples......

    --time goes left to right.
    --cache hit = h

    ------------------------------------
    FIFO

    phys_slot  A  B  C  A  B  D  A  D  B  C  B
    S1         A        h     D     h     C
    S2            B        h     A
    S3               C                 B     h

    7 swaps, 4 hits

    ------------------------------------
    OPTIMAL

    phys_slot  A  B  C  A  B  D  A  D  B  C  B
    S1         A        h        h        C
    S2            B        h           h     h
    S3               C        D     h

    5 swaps, 6 hits

    ------------------------------------

    * LRU: throw out the least recently used (this is often a good
      idea, but it depends on the future looking like the past. what
      if we chuck a page from our cache and then we're just about to
      use it?)

    LRU

    phys_slot  A  B  C  A  B  D  A  D  B  C  B
    S1         A        h        h        C
    S2            B        h           h     h
    S3               C        D     h

    5 swaps, 6 hits

    --LRU looks awesome!

    --but what if our reference string were ABCDABCDABCD?

    phys_slot  A  B  C  D  A  B  C  D  A  B  C  D
    S1         A           D           C           B
    S2            B           A           D           C
    S3               C           B           A           D

    12 swaps, 0 hits. BUMMER.

    --same thing happens with FIFO.

    --what about OPT? [not nearly as much of a bummer.]

    --other weirdness: Belady's anomaly: what happens if you add
      memory under a FIFO policy?

    phys_slot  A  B  C  D  A  B  E  A  B  C  D  E
    S1         A           D        E              h
    S2            B           A        h     C
    S3               C           B        h     D

    9 swaps, 3 hits. not great. let's add some slots. maybe we can do
    better

    phys_slot  A  B  C  D  A  B  E  A  B  C  D  E
    S1         A           h     E           D
    S2            B           h     A           E
    S3               C                 B
    S4                  D                 C

    10 swaps, 2 hits. this is worse.

    --do these anomalies always happen?

        --answer: no. with policies like LRU, contents of memory with
          X pages is a subset of contents with X+1 pages

    --all things considered, LRU is pretty good. let's try to
      implement it......

    --implementing LRU

        --reasonable to do in application programs like Web servers
          that cache pages (or dedicated Web caches). [use queue to
          track least recently accessed and use hash map to implement
          the (k,v) lookup]

        --in OS, LRU itself does not sound great. would be doubling
          memory traffic (after every reference, have to move some
          structure to the head of some list)

        --and in hardware, it's way too much work to timestamp each
          reference and keep the list ordered (remember that the TLB
          would also have to participate in any such scheme)

        --how can we approximate LRU?

    --another algorithm:

        * CLOCK

        --arrange the slots in a circle. hand sweeps around, clearing
          a bit. the bit is set when the page is accessed. just evict
          a page if the hand points to it when the bit is clear.

        --approximates LRU ... because we're evicting pages that
          haven't been used in a while....though of course we may not
          be evicting the *least* recently used one (why not?)

    --can generalize CLOCK:

        * NTH CHANCE

        --don't throw a page out until the hand has swept by N times.

        --OS keeps counter per page: # sweeps

        --On page fault, OS looks at the page pointed to by the hand,
          and checks that page's use bit

            1 --> clear use bit and clear counter
            0 --> increment counter
                  if counter < N, keep going
                  if counter = N, replace the page: it hasn't been
                  used in a while

        --How to pick N?
            Large N --> better approximation to LRU
            Small N --> more efficient; otherwise going around the
                        circle a lot (might need to keep going around
                        and around until some page's counter reaches N)
            [a sketch of this in C is just below]
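        To make Nth chance concrete, here is a minimal sketch in C.
        It is illustrative only: the frame table, field names, and the
        constants NFRAMES and NCHANCES are invented for this sketch,
        not taken from any particular kernel, and we assume the use
        bit is set (by hardware, or simulated) on each access.

            #include <stdbool.h>
            #include <stddef.h>

            #define NFRAMES  1024  /* hypothetical # of physical frames */
            #define NCHANCES 2     /* the "N"; N = 1 is plain CLOCK */

            struct frame {
                bool use_bit;  /* set on each access to the page */
                int  sweeps;   /* # sweeps past with use_bit clear */
            };

            static struct frame frames[NFRAMES];
            static size_t hand;  /* frame the clock hand points to */

            /* Pick the frame to evict, per Nth chance. */
            size_t
            choose_victim(void)
            {
                for (;;) {
                    struct frame *f = &frames[hand];
                    size_t cur = hand;
                    hand = (hand + 1) % NFRAMES;  /* hand always advances */

                    if (f->use_bit) {
                        /* 1 --> clear use bit and clear counter */
                        f->use_bit = false;
                        f->sweeps = 0;
                    } else if (++f->sweeps >= NCHANCES) {
                        /* counter reached N: hasn't been used in a
                           while, so evict */
                        f->sweeps = 0;  /* fresh start for new page */
                        return cur;
                    }
                    /* else: counter < N, keep going */
                }
            }

        Note that choose_victim() may loop around the circle several
        times before returning; that is exactly the efficiency cost of
        a large N mentioned above.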
        --modification:

            --dirty pages are more expensive to evict (why?)
            --so give dirty pages an extra chance before replacing

            common approach (supposedly on Solaris but I don't know):
                --clean pages use N = 1
                --dirty pages use N = 2 (but initiate write back when
                  N = 1, i.e., try to get the page clean at N = 1)

    --Summary:

        --optimal is known as OPT or MIN (textbook asserts but doesn't
          prove optimality)
        --LRU is usually a good approximation to optimal
        --Implementing LRU in hardware or at the OS/hardware interface
          is a pain
        --So implement CLOCK or NTH CHANCE ... decent approximations
          to LRU, which is in turn a good approximation to OPT
          *assuming that the past is a good predictor of the future*
          (this assumption does not always hold!)

    Fairness

    --if the OS needs to swap a page out, does it consider all pages
      in one pool or only those of the process that caused the page
      fault?

    --what is the trade-off between local and global policies?

        --global: more flexible but less fair
        --local: less flexible but fairer

3. Thrashing

    [The points below apply to any caching system, but for the sake of
    concreteness, let's assume that we're talking about page
    replacement in particular.]

    What is thrashing?

        Processes require more memory than the system has

        Specifically, each time a page is brought in, another page,
        whose contents will soon be referenced, is thrown out

        Example:

            --one program touches 50 pages (each equally likely); only
              have 40 physical page frames

            --If we have enough physical pages, 100ns/ref

            --If we have too few physical pages, assume every 5th
              reference leads to a page fault

            --4 refs x 100ns and 1 page fault x 10ms for disk I/O

            --this gets us 5 refs per (10ms + 400ns) = 2ms/ref
              = 20,000x slowdown!!!

            --What we wanted: virtual memory the size of disk with
              access time the speed of physical memory

            --What we have here: memory with access time roughly that
              of disk (2 ms/mem_ref compared to 10 ms/disk_access)

    As stated earlier, this concept is much larger than OSes: need to
    pay attention to the slow case if it's really slow and common
    enough to matter.

    Reasons/cases:

        --process doesn't reuse memory (or has no temporal locality)

        --process reuses memory, but the memory that is absorbing most
          of the accesses doesn't fit

        --individually, all processes fit, but it's too much for the
          system

    what do we do?

        --well, in the first two cases above, there's nothing you can
          do, other than restructuring your computation or buying more
          memory (e.g., expensive hardware that keeps an entire
          customer database in RAM)

        --in the third case, can and must shed load. how? two
          approaches:

            a. working set
            b. page fault frequency

        a. working set

            --only run a set of processes s.t. the union of their
              working sets fits in memory

            --definition of working set (short version): the pages a
              process has touched over some trailing window of time

        b. page fault frequency

            --track the metric (# page faults/instructions executed)

            --if that metric rises above a threshold, and there is not
              enough memory on the system, swap out the process
              [a C sketch appears at the end of these notes]

    moral of the story: if the workload is not cache-friendly, the
    policy is irrelevant.

        --> in that case, need to restructure the computation, do less
            work, or buy more hardware
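    Returning to approach (b) above, here is a minimal sketch of a
    page-fault-frequency check in C. The struct fields, the function
    name, and the threshold value are all invented for illustration;
    a real kernel's bookkeeping would look different.

        #include <stdbool.h>
        #include <stdint.h>

        /* Hypothetical per-process counters, reset each window. */
        struct proc_stats {
            uint64_t page_faults;   /* faults in the current window */
            uint64_t instructions;  /* instructions retired in it */
        };

        #define PFF_THRESHOLD 1e-4  /* e.g., > 1 fault / 10,000 instrs */

        /* Called periodically; true means "swap this process out"
           (only if the system is also short on memory, per above). */
        bool
        should_swap_out(struct proc_stats *s, bool memory_is_scarce)
        {
            if (s->instructions == 0)
                return false;

            double pff = (double)s->page_faults
                         / (double)s->instructions;

            /* start a fresh window for the next sample */
            s->page_faults = 0;
            s->instructions = 0;

            return memory_is_scarce && pff > PFF_THRESHOLD;
        }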