Course Syllabus

CMPSC 274: Advanced Topics on Databases

Data Management Issues for Data-intensive Computing

WINTER 2013: TuTh 9:00 - 10:30, PHELPS 1401

Class Website: http://www.cs.ucsb.edu/~agrawal/Winter2013/cmpsc274

Course Description

Data management systems and technologies have historically played a pivotal role in computing environments that involve large volumes of data and information. In fact, database management systems (DBMSs) are critical components of the infrastructure of most data-intensive applications. Furthermore, the underlying technologies, in terms of both language and query models and system architecture, have reached a level of maturity that allows a DBMS to be used as a plug-and-play component without detailed knowledge of its internals.

During the past decade, however, the entire area of data management, especially as it pertains to large-scale data arising from Internet and Web-based applications, has been at a crossroads. The central question in this debate is the effectiveness of the traditional DBMS paradigms: declarative query languages, the independence of the logical and physical data models, and a computational framework based on the transaction concept. Several large Internet companies, such as Google, Yahoo, and Amazon, have put forth competing solutions both for building scalable data-intensive applications over the Internet and Web and for large-scale data analytics.

In the past couple of years, however, as cloud computing has become pervasive and much of our data is stored and managed in the cloud, many proponents of these new data management technologies are having a change of heart. This shift has been driven in part by recent events in which datacenter failures resulted in the loss of user data. As a result, new architectures are emerging that rely on traditional DBMS abstractions, albeit with a new twist. I urge you all to peruse the following presentation: Google's Multi-datacenter Architecture: Spanner.

During this quarter, we will jointly explore these issues to gain the deeper understanding needed to participate in this debate. The topics to be covered and the detailed lecture organization for the course appear below.

Prerequisites: CMPSC 170.

Required Textbook: Transactional Information Systems by Gerhard Weikum and Gottfried Vossen

Instructor: Divy Agrawal, agrawal AT cs.ucsb.edu

Office hours: Tuesday and Thursday, 11:00 AM - 12:00 noon, 3117 Harold Frank Hall, and by appointment.

Teaching Assistant:

Grading:

Reference Books

Reference Papers and Articles

CMPSC 274 Course Outline (approximate):
Date | Topic | Related Reading | Comments
Tu: 1/08/2013 | No Class: Please review the enclosed online resources | Data Management Challenges (Jeff Dean, Google); Scalable Data Management Blog (James Hamilton, Amazon); Data in the Cloud (Raghu Ramakrishnan, Microsoft, formerly Yahoo!) |
Th: 1/10/2013 | Data Management Issues in Data-intensive Computing | Part I: Overview; Part II: The Transaction Model | Historical Overview, Motivation, and the Transaction Model
Tu: 1/15/2013 | Data Management for Enterprise Applications | Lecture #2: Database Computation Model | Database Correctness
Th: 1/17/2013 | Data Management for Enterprise Applications | Lecture #3: Equivalence of Executions | Correctness Models for Transaction Execution; Homework #1 Assigned
Tu: 1/22/2013 | Data Management for Enterprise Applications | Lecture #6: Transaction Correctness | Conflict Serializability and the Serialization Graph
Th: 1/24/2013 | Data Management for Enterprise Applications | Lecture #7: Concurrency Control Protocols | Two-phase Locking; Homework #2 Assigned
Tu: 1/29/2013 | Data Management for Enterprise Applications | Lecture #8: Non-locking Protocols | Timestamp Ordering & Optimistic Protocols
Th: 1/31/2013 | Data Management for Enterprise Applications | Lecture #9: Recovery Protocols | Database Recovery from Crash Failures; Homework #2 Due
Tu: 2/05/2013 | Data Management for Enterprise Applications | Lecture #9: Recovery Protocols | Database Recovery from Crash Failures; Homework #2 Due
Th: 2/07/2013 | Data Management for Enterprise Applications | Lecture #10: Distributed Recovery | Data Distribution & Data Replication
Tu: 2/12/2013 | Scalable Data Management in the Cloud | Lecture #4: Data in the Cloud | Key-value Stores
Th: 2/14/2013 | Scalable Data Management in the Cloud | Lecture #5: Scalable Data in the Cloud | Google Stack: Bigtable, GFS, & Chubby
Tu: 2/19/2013 | Data Management for Internet Applications | PowerPoint Slides | Yahoo's PNUTS & Amazon's Dynamo
Th: 2/21/2013 | Data Management for Internet Applications | PowerPoint Slides | Data Fission: ElasTraS; Data Fusion: G-Store
Tu: 2/26/2013 | Data Management for Internet Applications | PowerPoint Slides | Microsoft SQL Azure, Google Megastore, and Relational Cloud
Th: 2/28/2013 | Large-scale Data Analysis in the Enterprise Context | Data Warehousing | Data Warehousing Fundamentals
Tu: 3/05/2013 | Large-scale Data Analysis in the Enterprise Context | OLAP & Data Cube | Online Analytical Processing and the Data Cube Model
Th: 3/07/2013 | Large-scale Data Analysis in the Internet Context | MapReduce | The MapReduce Paradigm
Tu: 3/12/2013 | Multi-Datacenter Issues | Hatem & Faisal | Megastore, Spanner, MDCC, Replicated Transactions, & Message Futures
Th: 3/14/2013 | Industry Presentation | Dipti Borkar, Director, Couchbase |