Node removal / failure
Simple detection of pointer corruption and node failure
Recovery from node failure and data corruption
- Mark node pointer with invalid tag
- Use next closest sibling of failed node
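The two recovery steps above can be sketched as follows. This is a minimal illustration, not the system's actual data structure: the `Pointer` and `RoutingEntry` names, the numeric `distance` field, and the method names are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pointer:
    node_id: str
    distance: float        # hypothetical closeness metric; smaller means closer
    invalid: bool = False  # the "invalid tag" set when the node is seen to fail


@dataclass
class RoutingEntry:
    pointers: List[Pointer]  # the primary node plus its sibling (backup) nodes

    def mark_invalid(self, node_id: str) -> None:
        """Tag the pointer to a failed node instead of deleting it."""
        for p in self.pointers:
            if p.node_id == node_id:
                p.invalid = True

    def next_hop(self) -> Optional[str]:
        """Return the closest pointer that is not tagged invalid, if any."""
        for p in sorted(self.pointers, key=lambda p: p.distance):
            if not p.invalid:
                return p.node_id
        return None


entry = RoutingEntry([Pointer("A", 1.0), Pointer("B", 2.0), Pointer("C", 3.0)])
entry.mark_invalid("A")   # A is detected as failed
hop = entry.next_hop()    # routing falls back to the next closest sibling
```

Keeping the tagged pointer in place, rather than removing it, is what allows the entry to be restored cheaply if the node comes back.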
Invalid pointers have a second-chance time-to-live
- Failures expected to recover within finite timeframe
- Entries marked invalid with countdown timer
- Each request has some chance of being forwarded to the invalid node, to check whether recovery has completed
- Referrer tracks traffic to failed node and assigns each packet a “validation” probability
- Restarted node notifies referrers to remove invalid tag
- Nodes which fail to recover within timer period must reinsert as new nodes
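One way to picture the second-chance mechanism is the sketch below. Everything named here is an assumption for illustration: the `ProbingReferrer` class, the tick-based countdown, a single fixed `probe_probability` (the slides do not say how the probability is derived from tracked traffic), and the choice to promote the backup when the timer expires.

```python
import random
from typing import Optional


class ProbingReferrer:
    """Holds one primary pointer and one backup sibling (hypothetical model)."""

    def __init__(self, primary: str, backup: str, ttl: int = 5,
                 probe_probability: float = 0.1,
                 rng: Optional[random.Random] = None):
        self.primary = primary
        self.backup = backup
        self.ttl = ttl                              # countdown length in ticks
        self.probe_probability = probe_probability  # "validation" probability
        self.rng = rng or random.Random()
        self.remaining: Optional[int] = None        # None: primary is valid

    def mark_failed(self) -> None:
        """Tag the primary as invalid and start its countdown timer."""
        self.remaining = self.ttl

    def route(self) -> str:
        """Pick the next hop; occasionally probe the invalid primary."""
        if self.remaining is None:
            return self.primary
        if self.rng.random() < self.probe_probability:
            return self.primary   # validation attempt against the failed node
        return self.backup

    def tick(self) -> None:
        """Advance the timer; on expiry, drop the invalid pointer for good."""
        if self.remaining is None:
            return
        self.remaining -= 1
        if self.remaining <= 0:
            # The old primary must now reinsert itself as a new node.
            self.primary = self.backup
            self.remaining = None

    def on_recovered(self, node_id: str) -> None:
        """A restarted node notifies its referrer to remove the invalid tag."""
        if self.remaining is not None and node_id == self.primary:
            self.remaining = None
```

A request chosen as a probe is lost if the node is still down, so a real system would presumably retry it through the backup; that detail is left out of this sketch.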
Node removals = intentional exits from system
- Actively announce removal to referrers; invalidation skipped
- Referrers maintain backups by requesting another sibling pointer
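The intentional-exit path can be sketched in the same style. The `LeavingNode` and `SiblingList` classes and the `find_replacement` callback are hypothetical; the slides do not specify how a referrer locates a replacement sibling, so the lookup is passed in as a function.

```python
from typing import Callable, List, Optional


class SiblingList:
    """A referrer's pointers to one node and its siblings, closest first."""

    def __init__(self, siblings: List[str]):
        self.siblings = list(siblings)

    def on_leave(self, node_id: str,
                 find_replacement: Callable[[List[str]], Optional[str]]) -> None:
        # The exit was announced, so the pointer is dropped outright:
        # no invalid tag and no countdown timer.
        if node_id in self.siblings:
            self.siblings.remove(node_id)
        # Restore the backup count by requesting another sibling pointer.
        replacement = find_replacement(self.siblings)
        if replacement is not None and replacement not in self.siblings:
            self.siblings.append(replacement)


class LeavingNode:
    def __init__(self, node_id: str):
        self.node_id = node_id
        self.referrers: List[SiblingList] = []  # nodes that point at this one

    def leave(self, find_replacement: Callable[[List[str]], Optional[str]]) -> None:
        """Actively announce the removal to every referrer, then detach."""
        for ref in self.referrers:
            ref.on_leave(self.node_id, find_replacement)
        self.referrers.clear()


ref = SiblingList(["A", "B"])
node = LeavingNode("A")
node.referrers.append(ref)
node.leave(lambda current: "C")  # stand-in lookup that always offers node C
```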