Class 10 
CS 170
6 May 2020

On the board
------------

1. Last time
2. I/O architecture
3. CPU/device interaction
   --Mechanics of communication
   --Polling vs interrupts
   --DMA vs. programmed I/O
4. Software architecture: device drivers

---------------------------------------------------------------------------

1. Last time

    --scheduling

    --today: I/O: kernel/device, and user/kernel

2. I/O architecture (high-level)

    general:
    [draw picture: CPU, Mem, Crossbar]

    Intel's Z270: 
    [draw picture: book's Figure 36.2]

    devices:
    [draw picture: book's Figure 36.3]

        lots of details.
        fun to play with.
        registers that do different things when read vs. written.

3. CPU/device interaction (can think of this as kernel/device
interaction)

    A. Mechanics of communication

        (a) explicit I/O instructions
            
            outb, inb, outw, inw

            examples:

            (i) WeensyOS boot.c. see handout
            
                focus on readsect(), waitdisk()

                compare to Figures 36.5 and 36.6 in the book

                the code on the handout is the bootloader, which is 
                reading the WeensyOS kernel from disk to memory.

            (ii) reading keyboard input. see handout

                console_read_digit();
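
            a minimal sketch of the polling pattern behind waitdisk().
            this is NOT the handout code: sim_inb() is a hypothetical
            stand-in for the x86 `inb` instruction (which needs kernel
            privilege), and the simulated device is assumed to report
            "busy" for three reads before becoming ready. the bit
            values follow the usual PC disk status convention
            (0x80 = busy, 0x40 = ready).

```c
#include <stdint.h>

#define SIM_STATUS_PORT 0x1F7   /* status port of the primary disk controller on PCs */
#define STATUS_BUSY     0x80
#define STATUS_READY    0x40

/* Assumption for the simulation: the device stays busy for 3 status reads. */
static int sim_busy_reads_left = 3;

/* Hypothetical stand-in for `inb`: returns the byte the device would
 * place on the I/O bus for this port. */
static uint8_t sim_inb(uint16_t port) {
    if (port == SIM_STATUS_PORT) {
        if (sim_busy_reads_left > 0) {
            sim_busy_reads_left--;
            return STATUS_BUSY;
        }
        return STATUS_READY;
    }
    return 0xFF;   /* nothing attached at this port */
}

/* Busy-wait until the device reports "ready and not busy".
 * Returns how many times the status port was read. */
static int wait_until_ready(void) {
    int polls = 1;
    while ((sim_inb(SIM_STATUS_PORT) & (STATUS_BUSY | STATUS_READY))
           != STATUS_READY) {
        polls++;
    }
    return polls;
}
```

            the CPU does nothing useful while it spins in the loop;
            that cost is what section 3B is about.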

        (b) memory-mapped I/O

            physical address space is mostly ordinary RAM

	    low-memory addresses (640K-1MB) actually refer to other
	    things. 

	    You as a programmer read/write from these addresses using
	    loads and stores. But they aren't "real" loads and stores to
	    memory. They turn into other things: read device registers,
	    send instructions, read/write device memory, etc.

                --interface is the same as interface to memory
                (load/store)

                --but does not behave like memory 

		    + Reads and writes can have "side effects"

		    + Read results can change due to external events 

	    Example: writing to VGA or CGA memory makes things appear on
	    the screen.

	    See handout

                console_putc()

                (this is called by console_printf().)
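
            a rough sketch of what a console_putc()-style routine
            does, under assumptions: on a PC the text console lives
            at physical address 0xB8000, and each cell is 16 bits
            (low byte = character, high byte = color). to keep the
            sketch runnable outside a kernel, an ordinary array
            (fake_console) stands in for that memory; the name
            console_put_sketch is hypothetical, not the handout's.

```c
#include <stdint.h>

#define CONSOLE_COLUMNS 80
#define CONSOLE_ROWS    25

/* Stand-in for device memory. In a kernel this would instead be
 * (volatile uint16_t *) 0xB8000, and stores would show up on screen. */
static uint16_t fake_console[CONSOLE_COLUMNS * CONSOLE_ROWS];

/* Store one character cell at position `pos` and return the next position.
 * `volatile` tells the compiler not to drop or reorder the store, since
 * the store itself is the point (it has a side effect on the device). */
static int console_put_sketch(volatile uint16_t *console, int pos,
                              char c, uint8_t color) {
    if (pos < 0 || pos >= CONSOLE_COLUMNS * CONSOLE_ROWS) {
        return pos;   /* off screen: ignore */
    }
    console[pos] = (uint16_t) ((color << 8) | (uint8_t) c);
    return pos + 1;
}
```

            note that the "I/O" here is just an ordinary store
            instruction; only the address makes it special.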
           
            Some notes about memory-mapped I/O

                avoid confusion: this is not the same thing as
                virtual memory. this is talking about the *physical*
                address.

                    --> is this an abstraction that the OS provides to
                    others or an abstraction that the hardware is
                    providing to the OS?  [the latter]

    B. Polling vs. interrupts

        So far, in our examples, the CPU has been busy waiting. This is
        fine for these examples, but higher bandwidth devices (disks,
        network cards, etc.) need different techniques.

        Polling: check back periodically 

            kernel...
            
            - ... sent a packet? Periodically ask the card whether the
              buffer is free.

            - ... waiting for a packet? Periodically ask whether there
              is data.

            - ... did disk I/O? Periodically ask whether the disk is done.

            Disadvantages: wasted CPU cycles and higher latency

        Interrupts: The device interrupts the CPU when its status
        changes (for example, data is ready, or data is fully written).

            (The interrupt controller itself is initialized with I/O
            instructions; if you're curious, see the function
            interrupt_controller_init() in WeensyOS's x86.c.)

            This is what most general-purpose OSes do. There is a
            disadvantage, however. This could come up if you need to
            build a high-performance system.

            Namely: if the interrupt rate is high, then the computer can
            spend a lot of time handling interrupts (interrupts are
            expensive because they generate a context switch, and the
            interrupt handler runs at high priority).

                --> in the worst case, you can get *receive livelock*,
                where you spend 100% of the time in the interrupt
                handler but no useful work gets done.
 
        This tradeoff comes up everywhere....

        How to design systems given these tradeoffs? Start with
        interrupts. If you notice that your system is slowing down
        because of livelock, then switch to polling. If polling is chewing
        up too many cycles, then move towards an adaptive switching
        between interrupts and polling. (But of course, never optimize
        until you actually know what the problem is.) A classic
        reference on this subject is the paper 
            "Eliminating Receive Livelock in an Interrupt-driven
            Kernel", by Mogul and Ramakrishnan, 1996.
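
        One way to picture "adaptive switching" is a small policy
        function with two thresholds. This is an illustrative sketch
        only, not the mechanism from the paper: the names (rx_state,
        rx_update_mode) and the watermark values are hypothetical.
        The gap between the two watermarks gives hysteresis, so the
        mode does not flap when load hovers near one threshold.

```c
enum rx_mode { RX_INTERRUPT, RX_POLL };

struct rx_state {
    enum rx_mode mode;
    int hi_watermark;   /* packets per tick above which we start polling */
    int lo_watermark;   /* packets per tick below which we go back to interrupts */
};

/* Called once per timer tick with the number of packets seen during
 * that tick. Updates and returns the receive mode. */
static enum rx_mode rx_update_mode(struct rx_state *s, int packets_this_tick) {
    if (s->mode == RX_INTERRUPT && packets_this_tick > s->hi_watermark) {
        s->mode = RX_POLL;        /* heavy load: stop taking interrupts */
    } else if (s->mode == RX_POLL && packets_this_tick < s->lo_watermark) {
        s->mode = RX_INTERRUPT;   /* light load: polling would waste cycles */
    }
    return s->mode;
}
```

        In a real driver, entering RX_POLL would also mean masking
        the device's receive interrupt, and leaving it would mean
        unmasking it; both steps are omitted here.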
        
        We have just seen two approaches to synchronizing with
        hardware:

            polling
            interrupts

    C. DMA vs. programmed I/O

        Programmed I/O: what we have been seeing so far: CPU writes data
        directly to device, and reads data directly from device.

	DMA: better way for large and frequent transfers.

	    CPU (really, device driver programmer) places some buffers
	    in main memory.

	    Tells device where the buffers are 

	    Then "pokes" the device by writing to register

            Then device uses *DMA* (direct memory access) to read or
            write the buffers.

            The CPU can poll to see if the DMA completed (or the device
            can interrupt the CPU when done).

            [rough picture:
	       buffer descriptor list
	       <metadata> --> [  buf ]
	       <metadata> --> [  buf ]
	       ....
            ]
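
            the picture above, as a runnable sketch. everything here
            is an assumption made for illustration: the descriptor
            layout (struct dma_desc), the DESC_DONE flag, and the
            function names. the "device" is simulated by an ordinary
            function that copies into the buffer; real hardware would
            do that copy itself, using physical addresses, with no
            CPU involvement. completion is detected by polling (an
            interrupt would work too).

```c
#include <stdint.h>
#include <string.h>

#define DESC_DONE 0x1u
#define RING_SIZE 4

struct dma_desc {
    uint8_t *buf;     /* on real hardware: a physical (bus) address */
    uint32_t len;     /* capacity of buf, in bytes */
    uint32_t used;    /* bytes the device actually wrote */
    uint32_t flags;   /* device sets DESC_DONE when the buffer is filled */
};

/* Driver side: carve `n` buffers of `buf_len` bytes out of `pool`
 * and describe them to the device. */
static void driver_setup_ring(struct dma_desc *ring, uint8_t *pool,
                              int n, uint32_t buf_len) {
    for (int i = 0; i < n; i++) {
        ring[i].buf = pool + (size_t) i * buf_len;
        ring[i].len = buf_len;
        ring[i].used = 0;
        ring[i].flags = 0;
    }
}

/* Simulated device: copy incoming data into the first free descriptor
 * that is large enough. Returns the index used, or -1 if none fits. */
static int sim_device_dma_write(struct dma_desc *ring, int n,
                                const void *data, uint32_t len) {
    for (int i = 0; i < n; i++) {
        if (!(ring[i].flags & DESC_DONE) && ring[i].len >= len) {
            memcpy(ring[i].buf, data, len);
            ring[i].used = len;
            ring[i].flags |= DESC_DONE;
            return i;
        }
    }
    return -1;
}

/* Driver side: poll for a completed descriptor. Returns its index,
 * or -1 if the device has not finished anything yet. */
static int driver_poll_done(const struct dma_desc *ring, int n) {
    for (int i = 0; i < n; i++) {
        if (ring[i].flags & DESC_DONE) {
            return i;
        }
    }
    return -1;
}
```

            the CPU touches only the small descriptors; the bulk data
            moves without it.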

        This makes a lot of sense. Instead of having the CPU
        constantly dealing with a small amount of data at a time, the
        device can simply write the results of its operation straight
        into memory.

        NOTE: book couples DMA to interrupts, but things don't have to
        work like that. You could have all four possibilities in
        {DMA, programmed I/O} x {polling, interrupts}. 
        
            For example, (DMA, polling) would mean requesting a DMA
            and then later polling to see if the DMA is complete.


4. Software architecture: device drivers

    The examples on the handout are simple device drivers.

    Device drivers in general solve a software engineering problem ...

        [draw a picture]

        expose a well-defined interface to the kernel, so that the
        kernel can call comparatively simple read/write calls or
        whatever.

        For example, reset, ioctl, output, read, write,
        handle_interrupt()

        this abstracts away nasty hardware details so that the kernel
        doesn't have to understand them.

        When you write a driver, you are implementing this interface,
        and also calling functions that the kernel itself exposes.
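
        A common way to express such an interface in C is a struct
        of function pointers. The sketch below assumes a made-up
        interface (struct device_ops) and a made-up trivial device
        (a 16-byte in-memory buffer); it does not correspond to any
        particular kernel's driver API.

```c
#include <stddef.h>
#include <string.h>

/* The interface the "kernel" sees. Every driver fills in one of these. */
struct device_ops {
    int  (*reset)(void *dev);
    long (*read)(void *dev, void *buf, size_t n);
    long (*write)(void *dev, const void *buf, size_t n);
};

/* A hypothetical device: just a small buffer in memory. */
struct membuf_dev {
    char data[16];
    size_t len;
};

static int membuf_reset(void *dev) {
    struct membuf_dev *d = dev;
    d->len = 0;
    return 0;
}

/* Store up to 16 bytes, replacing earlier contents; return bytes stored. */
static long membuf_write(void *dev, const void *buf, size_t n) {
    struct membuf_dev *d = dev;
    if (n > sizeof d->data) {
        n = sizeof d->data;
    }
    memcpy(d->data, buf, n);
    d->len = n;
    return (long) n;
}

/* Copy out up to n stored bytes; return bytes copied. */
static long membuf_read(void *dev, void *buf, size_t n) {
    struct membuf_dev *d = dev;
    if (n > d->len) {
        n = d->len;
    }
    memcpy(buf, d->data, n);
    return (long) n;
}

static const struct device_ops membuf_ops = {
    membuf_reset, membuf_read, membuf_write
};

/* "Kernel" code: knows only the ops table, not the device behind it. */
static long kernel_copy_through(const struct device_ops *ops, void *dev,
                                const char *msg, char *out, size_t outsz) {
    ops->reset(dev);
    ops->write(dev, msg, strlen(msg));
    return ops->read(dev, out, outsz);
}
```

        Swapping in a different device means supplying a different
        ops table; kernel_copy_through() does not change.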

    ... but device drivers also *create* software engineering problems.
    Fundamental issues:

        Each device driver is per-OS and per-device (often can't reuse
        the "hard parts")

        They are often written by the device manufacturer (core
        competence of device manufacturers is hardware development, not
        software development).

        Under conventional kernel architectures, bugs in device drivers
        -- and there are many, many of them -- bring down the entire
        machine.

    So we have to worry about potentially sketchy drivers ...

    ... but we also have to worry about potentially sketchy devices.

        a buggy network card can scribble all over memory 
        (solution: use IOMMU; advanced topic)

        plug in your USB stick: claims to be a keyboard; starts issuing
        commands. (IOMMU doesn't help you with this one.)

        plug in a USB stick: if it's carrying a virus (aka malware),
        your computer can now be infected. (Iranian nuclear facilities
        are thought to have been attacked this way. Unfortunately for
        us, the same attacks could work against our power plants, etc.)