Class 14
CS 170
27 May 2020

On the board
------------

1. Last time

2. Page replacement policies continued

3. Thrashing

4. Disks

---------------------------------------------------------------------------

1. Last time

--finished memory unit

--discussed uses of page faults, costs of page faults

--started discussing cache eviction policies, regarding RAM as a cache
of the disk

2. Page replacement policies (continued)

--evaluating replacement algorithms

    input
        --reference string: sequence of page accesses
        --cache (e.g., physical memory) size
    output
        --number of cache evictions (e.g., number of swaps)

--examples follow (a simulator sketch that reproduces these counts
appears at the end of this section)
--time goes left to right
--cache hit = h

    ------------------------------------

    FIFO

    phys_slot   A  B  C  A  B  D  A  D  B  C  B

    S1          A        h        D        h  C
    S2             B        h        A
    S3                C                 B     h

    7 swaps, 4 hits

    ------------------------------------

    OPTIMAL

    phys_slot   A  B  C  A  B  D  A  D  B  C  B

    S1          A        h        h        C
    S2             B        h           h     h
    S3                C        D     h

    5 swaps, 6 hits

    ------------------------------------

* LRU: throw out the least recently used (this is often a good idea,
  but it depends on the future looking like the past. what if we chuck
  a page from our cache and then we're about to use it?)

    LRU

    phys_slot   A  B  C  A  B  D  A  D  B  C  B

    S1          A        h        h        C
    S2             B        h           h     h
    S3                C        D     h

    5 swaps, 6 hits

    --LRU looks awesome!

    --but what if our reference string were ABCDABCDABCD?

    phys_slot   A  B  C  D  A  B  C  D  A  B  C  D

    S1          A        D        C        B
    S2             B        A        D        C
    S3                C        B        A        D

    12 swaps, 0 hits. BUMMER.

    --same thing happens with FIFO.

    --what about OPT? [not as much of a bummer at all.]

--other weirdness: Belady's anomaly: what happens if you add memory
under a FIFO policy?

    phys_slot   A  B  C  D  A  B  E  A  B  C  D  E

    S1          A        D        E              h
    S2             B        A        h     C
    S3                C        B        h     D

    9 swaps, 3 hits. not great. let's add some slots. maybe we can do
    better:

    phys_slot   A  B  C  D  A  B  E  A  B  C  D  E

    S1          A           h     E           D
    S2             B           h     A           E
    S3                C                 B
    S4                   D                 C

    10 swaps, 2 hits. this is worse.

--do these anomalies always happen?

    --answer: no. with policies like LRU, the contents of memory with X
    pages is a subset of the contents with X+1 pages

--all things considered, LRU is pretty good. let's try to implement
it......

--implementing LRU

    --reasonable to do in application programs like Web servers that
    cache pages (or dedicated Web caches). [use a queue to track the
    least recently accessed item, and a hash map to implement the (k,v)
    lookup]

    --in the OS, LRU itself does not sound great: it would double
    memory traffic (after every reference, we'd have to move some
    structure to the head of some list)

    --and in hardware, it's way too much work to timestamp each
    reference and keep the list ordered (remember that the TLB may also
    be implementing these solutions)

    --how can we approximate LRU?

--another algorithm:

    * CLOCK

        --arrange the slots in a circle. a hand sweeps around, clearing
        a use bit. the bit is set when the page is accessed. evict a
        page if the hand points to it when the bit is clear.

        --approximates LRU ... because we're evicting pages that
        haven't been used in a while ... though of course we may not be
        evicting the *least* recently used one (why not?)

--can generalize CLOCK:

    * NTH CHANCE

        --don't throw a page out until the hand has swept by N times

        --OS keeps a counter per page: # sweeps

        --on a page fault, the OS looks at the page pointed to by the
        hand, and checks that page's use bit

            1 --> clear use bit and clear counter
            0 --> increment counter
                    if counter < N, keep going
                    if counter = N, replace the page: it hasn't been
                    used in a while

        --how to pick N?

            Large N --> better approximation to LRU
            Small N --> more efficient; otherwise the hand goes around
            the circle a lot (it might need to keep going around and
            around until some page's counter reaches N)

        --modification:

            --dirty pages are more expensive to evict (why?)

            --so give dirty pages an extra chance before replacing

            common approach (supposedly on Solaris, but I don't know):

                --clean pages use N = 1
                --dirty pages use N = 2 (but initiate write back when
                N = 1, i.e., try to get the page clean at N = 1)
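[Aside: a minimal sketch of the Nth Chance sweep in Python, written for
these notes; CLOCK is the N = 1 case. The Page class and its field
names are hypothetical, for illustration only, not from any real
kernel.]

    # Sketch of the Nth Chance sweep (CLOCK is the N = 1 case).
    # Page and its fields are hypothetical, for illustration.

    class Page:
        def __init__(self):
            self.use = False      # set by "hardware" on each access
            self.dirty = False    # set by "hardware" on each write
            self.count = 0        # sweeps survived since last use

    def evict(pages, hand, n_clean=1, n_dirty=2):
        """Sweep until a victim is found; return (victim slot, new hand)."""
        while True:
            pg = pages[hand]
            if pg.use:
                pg.use = False        # used recently: give it another chance
                pg.count = 0
            else:
                pg.count += 1
                if pg.dirty and pg.count == n_clean:
                    pass              # initiate write-back here, so the
                                      # page is clean by the next sweep
                if pg.count >= (n_dirty if pg.dirty else n_clean):
                    return hand, (hand + 1) % len(pages)
            hand = (hand + 1) % len(pages)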
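[Aside: to check the hand-worked tables above, here is a minimal
reference-string simulator in Python (a sketch written for these notes,
not lecture code); it reproduces the swap/hit counts for FIFO, LRU, and
OPT, including Belady's anomaly.]

    # Replay a reference string against nslots cache slots and count
    # swaps (pages brought in) and hits.

    def simulate(policy, refs, nslots):
        slots = []                    # pages currently cached
        swaps = hits = 0
        for i, page in enumerate(refs):
            if page in slots:
                hits += 1
                if policy == "LRU":   # move to most-recently-used end
                    slots.remove(page)
                    slots.append(page)
                continue
            swaps += 1                # page fault: bring the page in
            if len(slots) == nslots:  # cache full: pick a victim
                if policy == "OPT":
                    # evict the page whose next use is farthest away
                    future = refs[i + 1:]
                    victim = max(slots, key=lambda p: future.index(p)
                                 if p in future else len(future))
                else:                 # FIFO and LRU both evict slots[0]:
                    victim = slots[0] # oldest arrival / least recently used
                slots.remove(victim)
            slots.append(page)
        return swaps, hits

    for policy in ("FIFO", "LRU", "OPT"):
        print(policy, simulate(policy, "ABCABDADBCB", 3))
        # FIFO (7, 4)   LRU (5, 6)   OPT (5, 6)

    print(simulate("FIFO", "ABCDABEABCDE", 3))   # (9, 3)
    print(simulate("FIFO", "ABCDABEABCDE", 4))   # (10, 2): Belady!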
--Summary:

    --optimal is known as OPT or MIN (the textbook asserts but doesn't
    prove optimality)

    --LRU is usually a good approximation to optimal

    --implementing LRU in hardware or at the OS/hardware interface is a
    pain

    --so implement CLOCK or NTH CHANCE ... decent approximations to
    LRU, which is in turn a good approximation to OPT *assuming that
    the past is a good predictor of the future* (this assumption does
    not always hold!)

Fairness

--if the OS needs to swap a page out, does it consider all pages in one
pool, or only those of the process that caused the page fault?

--what is the trade-off between local and global policies?

    --global: more flexible but less fair

    --local: less flexible but fairer

3. Thrashing

[The points below apply to any caching system, but for the sake of
concreteness, let's assume that we're talking about page replacement in
particular.]

What is thrashing?

    Processes require more memory than the system has

    Specifically, each time a page is brought in, another page, whose
    contents will soon be referenced, is thrown out

    Example:

        --one program touches 50 pages (each equally likely); we only
        have 40 physical page frames

        --if we have enough physical pages: 100 ns/ref

        --if we have too few physical pages, assume every 5th reference
        leads to a page fault

        --4 refs x 100 ns, plus 1 page fault x 10 ms for disk I/O

        --this gets us 5 refs per (10 ms + 400 ns), or about 2 ms/ref:
        a 20,000x slowdown!!! (the arithmetic is worked as a sketch at
        the end of this section)

    --what we wanted: virtual memory the size of the disk with access
    time the speed of physical memory

    --what we have here: memory with access time roughly that of the
    disk (2 ms/mem_ref compared to 10 ms/disk_access)

As stated earlier, this concept is much larger than OSes: need to pay
attention to the slow case if it's really slow and common enough to
matter.

Reasons/cases:

    --process doesn't reuse memory (or has no temporal locality)

    --process reuses memory, but the memory that is absorbing most of
    the accesses doesn't fit

    --individually, all processes fit, but together they are too much
    for the system

What do we do?

    --well, for the first two cases above, there's nothing you can do,
    other than restructuring your computation or buying memory (e.g.,
    expensive hardware that keeps the entire customer database in RAM)

    --in the third case, we can and must shed load. how? two
    approaches:

        a. working set
        b. page fault frequency

    a. working set

        --only run a set of processes s.t. the union of their working
        sets fits in memory

        --definition of working set (short version): the pages a
        process has touched over some trailing window of time (a small
        sketch of this definition also appears at the end of this
        section)

    b. page fault frequency

        --track the metric (# page faults / instructions executed)

        --if that metric rises above a threshold, and there is not
        enough memory on the system, swap out the process

Moral of the story: if the workload is not cache-friendly, the policy
is irrelevant

    --> in that case, need to restructure the computation, do less
    work, or buy more hardware
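[Aside: the arithmetic from the thrashing example above, as a tiny
Python sketch; the numbers are the example's assumptions, not
measurements.]

    # Effective access time when every 5th reference faults,
    # using the example's numbers.
    mem_ref = 100e-9      # 100 ns per in-memory reference
    disk_io = 10e-3       # 10 ms per page fault (disk I/O)

    effective = (4 * mem_ref + disk_io) / 5   # 5 refs: 4 hits + 1 fault
    print(effective)              # ~2.0e-3 s, i.e., ~2 ms per reference
    print(effective / mem_ref)    # ~20,000x slowdown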
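[Aside: the working-set definition above as a minimal Python sketch;
the trace of (time, page) pairs and the window size are hypothetical
inputs chosen for illustration.]

    # Working set = distinct pages touched in a trailing window of time.
    # `trace` is a hypothetical list of (time, page) reference pairs.

    def working_set(trace, now, window):
        return {page for (t, page) in trace if now - window < t <= now}

    trace = [(1, "A"), (2, "B"), (3, "A"), (9, "C")]
    print(working_set(trace, now=9, window=7))   # {'A', 'C'}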
4. Disks

Disks are *the* bottleneck in many systems (although this becomes less
and less true every year, as solid state drives, or SSDs, become
cheaper and cheaper)

[Reference: "An Introduction to Disk Drive Modeling", by Chris Ruemmler
and John Wilkes. IEEE Computer, Vol. 27, No. 3, 1994, pp. 17-28.]

What is a disk?

    --stack of magnetic platters

        --rotate together on a central spindle @ 3,600-15,000 RPM

        --drive speed drifts slowly over time

        --can't predict rotational position after 100-200 revolutions

            -------------
            |  platter  |
            -------------
                  |
            -------------
            |  platter  |
            -------------
                  |
            -------------
            |  platter  |
            -------------
                  |

    --disk arm assembly

        --arms rotate around a pivot, all move together

        --pivot offers some resistance to linear shocks

        --arms contain disk heads -- one for each recording surface

        --heads read and write data to platters

[Interlude: why are we studying this? disks are still widely in use
everywhere, and will be for some time. Google, Facebook, etc. all still
pack their data centers full of cheap, old disks. Also, for them, disk
failure is the common case, not the random/weird case (they have so
many disks that it only makes sense that disks would be failing
relatively often), so they can't cram their datacenters with expensive
SSDs. As a second point, it's technical literacy: many filesystems were
designed with the disk in mind (sequential access is significantly
faster than random access). You have to know how these things work as a
computer scientist and as a programmer.]

To be continued...