Data management systems and technologies have historically played a pivotal role in computing environments that involve large volumes of data and information. In fact, database management systems (DBMS) are critical components of most data-intensive application infrastructures. Furthermore, the underlying technologies, in terms of both language and query models and system architectures, have reached a level of maturity that enables their use as plug-and-play components without detailed knowledge of their internals.
During the past decade, however, the entire area of data management, especially as it pertains to large-scale data arising from Internet and Web-based applications, has been at a crossroads. The main question in this debate is the effectiveness of the old DBMS paradigms: declarative query languages, independence of the logical and physical data models, and the computational framework based on the transaction concept. Several of the large Internet companies, such as Google, Yahoo, and Amazon, have put forth competing solutions both for building data-intensive scalable applications over the Internet/Web and for large-scale data analytics.
In the past couple of years, however, as cloud computing has become a pervasive technology and much of our data is being stored and managed in the cloud, many of the proponents of these new data management technologies have had a change of heart. Part of this shift stems from recent events in which datacenter failures resulted in loss of user data. As a result, new architectures are emerging that rely on traditional DBMS abstractions, albeit with a new twist. I urge you all to peruse the following presentation: Google's Multi-datacenter Architecture: Spanner. During this quarter, we will begin a joint exploration to gain a deeper understanding of this debate and to participate in it. In particular, the following topics will be covered:
The detailed lecture organization for the course appears below.
Pre-requisites: CMPSC 170.
Required Textbook: Transactional Information Systems by Gerhard Weikum and Gottfried Vossen
Instructor: Divy Agrawal, agrawal AT cs.ucsb.edu
Office hours: Tuesday and Thursday, 11:00 AM - 12:00 noon, 3117 Harold Frank Hall, and by appointment.
Teaching Assistant:
Date | Topic | Related Reading | Comments |
Tu: 1/08/2013 | No Class: Please review enclosed online resources | Data Management Challenges | Jeff Dean, Google |
| | Scalable Data Management Blog | James Hamilton, Amazon |
| | Data in the Cloud | Raghu Ramakrishnan, Microsoft (formerly Yahoo!) |
Th: 1/10/2013 | Data Management Issues in Data-intensive Computing | Part I: Overview; Part II: The Transaction Model | Historical Overview, Motivation, and the Transaction Model |
Tu: 1/15/2013 | Data Management for Enterprise Applications | Lecture #2: Database Computation Model | Database Correctness |
Th: 1/17/2013 | Data Management for Enterprise Applications | Lecture #3: Equivalence of Executions | Correctness models for Transaction Execution; Homework #1 Assigned |
Tu: 1/22/2013 | Data Management for Enterprise Applications | Lecture #6: Transaction Correctness | Conflict Serializability and Serialization Graph |
Th: 1/24/2013 | Data Management for Enterprise Applications | Lecture #7: Concurrency Control Protocols | Two-phase Locking; Homework #2 Assigned |
Tu: 1/29/2013 | Data Management for Enterprise Applications | Lecture #8: Non-locking Protocols | Timestamp Ordering & Optimistic Protocols |
Th: 1/31/2013 | Data Management for Enterprise Applications | Lecture #9: Recovery Protocols | Database Recovery from Crash Failures; Homework #2 Due |
Tu: 2/05/2013 | Data Management for Enterprise Applications | Lecture #9: Recovery Protocols | Database Recovery from Crash Failures; Homework #2 Due |
Th: 2/07/2013 | Data Management for Enterprise Applications | Lecture #10: Distributed Recovery | Data Distribution & Data Replication |
Tu: 2/12/2013 | Scalable Data Management in the Cloud | Lecture #4: Data in the Cloud | Key-value Stores |
Th: 2/14/2013 | Scalable Data Management in the Cloud | Lecture #5: Scalable Data in the Cloud | Google Stack: BigTable, Google GFS, & Chubby |
Tu: 2/19/2013 | Data Management for Internet Applications | Powerpoint Slides | Yahoo's PNUTS & Amazon's Dynamo |
Th: 2/21/2013 | Data Management for Internet Applications | Powerpoint Slides | Data Fission: ElasTraS and Data Fusion: G-Store |
Tu: 2/26/2013 | Data Management for Internet Applications | Powerpoint Slides | Microsoft SQL Azure, Google Megastore, and Relational Cloud |
Th: 2/28/2013 | Large-scale Data Analysis in the Enterprise Context | Data Warehousing | Data Warehousing Fundamentals |
Tu: 3/05/2013 | Large-scale Data Analysis in the Enterprise Context | OLAP & Data Cube | Online Analytical Processing and the Data Cube Model |
Th: 3/07/2013 | Large-scale Data Analysis in the Internet Context | MapReduce | The MapReduce Paradigm |
Tu: 3/12/2013 | Multi-Datacenter Issues | Hatem & Faisal | Megastore, Spanner, MDCC, Replicated Transactions, & Message Futures |
Th: 3/14/2013 | Industry Presentation | Dipti Borkar, Director | Couchbase |