Computer Science Seminar
Where’d My Photos Go? Challenges in Preserving Digital Data for the Long Term
When: Thursday, December 12, 2013
Where: PGH 232
Time: 11:00 AM
Speaker: Prof. Ethan L. Miller, University of California, Santa Cruz
Host: Prof. Edgar Gabriel
Society’s memory was once stored on analog media, which survive benign neglect relatively well. Today, however, our cultural heritage is increasingly generated and stored digitally, and this poses problems for long-term preservation. Digital media require nearly constant maintenance to guard against threats ranging from device failure and format obsolescence to security breaches and insufficient funding. Preserving our digital heritage in the face of these challenges, and thereby avoiding a "digital dark age," is perhaps one of the most critical problems facing computer systems researchers.
This talk will provide an overview of the problem of archival storage, relating current practices in the digital realm to long-established techniques for analog media. It will then discuss the trade-offs that must be made over the lifetime of the stored data, which far exceeds the service life of any single component in the system. It will also present approaches to designing long-term preservation systems that leverage new technologies such as network-attached disk and flash memory to decrease cost and improve long-term security. The talk will also cover our research on using modeling and Monte Carlo simulation to gauge how design trade-offs and rare events affect the likelihood that data will survive for the desired lifetime, given an initial funding level. We are also exploring the effects of long-term trends, such as storage cost and density, power cost, and even changes in interest rates, on data survivability. By combining new technologies with an understanding of how system costs may vary over the long term, we hope to provide techniques that ensure the vast quantity of data we are now generating will remain available to future generations.
Bio:
Ethan L. Miller is a Professor of Computer Science at the University of California, Santa Cruz, where he is the Director of the NSF I/UCRC Center for Research in Storage Systems (CRSS) and Associate Director of the Storage Systems Research Center (SSRC). He received his ScB from Brown in 1987 and his PhD from UC Berkeley in 1995, and has been on the UC Santa Cruz faculty since 2000. He has written over 120 papers covering topics such as archival storage, file systems for high-end computing, metadata and information retrieval, file system performance, secure file systems, and distributed systems. He was a member of the team that developed Ceph, a scalable, high-performance distributed file system for scientific computing that is now being adopted by several high-end computing organizations. His work on reliability and security for scalable and distributed storage is also widely recognized, as is his work on secure, efficient long-term archival storage and scalable metadata systems.
His current research projects, which are funded by the National Science Foundation, Department of Energy, and industry support for the CRSS and SSRC, include long-term archival storage systems, scalable metadata and indexing structures, high performance petabyte-scale storage systems, and file systems for non-volatile memory technologies. Prof. Miller's broader interests include file systems, parallel and distributed systems, operating systems, and computer security. In addition to research and teaching in storage systems and operating systems, Prof. Miller has worked with industry to help move research results into commercial use at companies such as Symantec, NetApp, and EMC; he has been working with Pure Storage since 2009 on designing purpose-built enterprise block storage from commodity flash drives.