Log-Structured File System

By Zhaoyu Luo

Introduction

Principle: collect large amounts of new data in a file cache in main memory, then write the data to disk in a single large I/O that can use all of the disk bandwidth
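A minimal sketch of this idea in Python (the `SegmentBuffer` class and its thresholds are invented for illustration, not from the paper): dirty blocks pile up in memory and are flushed to the log as one large sequential write.

```python
# Sketch: buffer writes in memory, flush one segment at a time, sequentially.
import os

BLOCK_SIZE = 4096
SEGMENT_SIZE = 512 * 1024  # flush threshold: one segment's worth of data


class SegmentBuffer:
    def __init__(self, disk_path):
        self.disk = open(disk_path, "ab")  # append-only: the log
        self.pending = []                  # dirty blocks waiting in memory

    def write_block(self, data: bytes):
        assert len(data) == BLOCK_SIZE
        self.pending.append(data)
        if len(self.pending) * BLOCK_SIZE >= SEGMENT_SIZE:
            self.flush()

    def flush(self):
        # One large sequential I/O instead of many small random ones.
        self.disk.write(b"".join(self.pending))
        self.disk.flush()
        os.fsync(self.disk.fileno())
        self.pending.clear()
```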

Design for file systems of the 1990s

Observations in the 1990s:

  • Memory sizes were growing
  • There was a large and growing gap between random I/O performance and sequential I/O performance
  • Existing file systems performed poorly on many common workloads: creating a new file touches the inode, inode bitmap, directory data block, directory inode, new data block, and data bitmap
  • File systems were not RAID-aware: a logical write to a single block can cause 4 physical I/Os (read old data, read old parity, write new data, write new parity; a toy illustration follows)
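To see where the four I/Os come from, here is a toy illustration (my addition, assuming RAID-4/5 XOR parity): updating one data block requires reading the old data and old parity, then writing the new data and new parity.

```python
# Toy illustration of the RAID-4/5 small-write problem with XOR parity.
# Updating one data block costs four physical I/Os:
#   1. read the old data block
#   2. read the old parity block
#   3. write the new data block
#   4. write the new parity block
# The new parity is computed without touching the other disks:
#   new_parity = old_parity XOR old_data XOR new_data

def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

old_data   = bytes([0b1010])
new_data   = bytes([0b0110])
other_data = bytes([0b1100])                    # block on another disk, never read
old_parity = xor_blocks(old_data, other_data)   # parity of the stripe

new_parity = xor_blocks(xor_blocks(old_parity, old_data), new_data)

# Check: parity recomputed from scratch matches the incremental update.
assert new_parity == xor_blocks(new_data, other_data)
```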

Log-structured file systems

  • segment == a large, contiguous chunk of the log; LFS buffers updates in memory and writes a whole segment to disk at once
  • How much to buffer: the amount of data D held in memory satisfies D = F / (1 - F) * Rpeak * Tseek, where F is the target fraction of peak bandwidth, Rpeak is the disk's peak transfer rate, and Tseek is the positioning time paid once per segment write; this follows from setting the effective rate D / (Tseek + D / Rpeak) equal to F * Rpeak (worked example below)
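Plugging in illustrative numbers, e.g. 90% of peak bandwidth on a 100 MB/s disk with 10 ms positioning time:

```python
# Worked example of the buffering formula (values are illustrative):
#   D = F / (1 - F) * R_peak * T_seek
F = 0.9                   # want 90% of peak bandwidth
R_peak = 100 * 1024**2    # 100 MB/s peak transfer rate
T_seek = 0.010            # 10 ms positioning time per segment write

D = F / (1 - F) * R_peak * T_seek
print(f"buffer {D / 1024**2:.1f} MB per segment")  # -> buffer 9.0 MB per segment
```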

Segment cleaning

  • read M existing segments, compact their live data into N new segments (N < M), and write the N segments to fresh locations; the M old segments are then freed, which avoids fragmentation
  • prefer cleaning cold segments (rarely modified) over hot (frequently modified) ones, but this alone is not the best approach (see the cost-benefit sketch below)
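The Sprite LFS paper refines this with a cost-benefit policy: clean the segment maximizing (1 - u) * age / (1 + u), where u is the fraction of the segment still live. A minimal sketch:

```python
# Cost-benefit cleaning policy from the Sprite LFS paper: prefer segments
# with low utilization u (little live data to copy out) and high age
# (cold data, unlikely to be overwritten soon).
#   benefit/cost = (1 - u) * age / (1 + u)
# Reading the segment costs 1, writing back the live fraction costs u,
# and (1 - u) of the segment's space is reclaimed.
from dataclasses import dataclass

@dataclass
class Segment:
    seg_id: int
    utilization: float  # fraction of blocks still live, in [0, 1]
    age: float          # time since the youngest block was written

def cost_benefit(seg: Segment) -> float:
    return (1 - seg.utilization) * seg.age / (1 + seg.utilization)

def pick_victims(segments, n):
    """Pick the n most profitable segments to clean."""
    return sorted(segments, key=cost_benefit, reverse=True)[:n]

segs = [Segment(0, 0.90, 100), Segment(1, 0.30, 5), Segment(2, 0.50, 80)]
print([s.seg_id for s in pick_victims(segs, 2)])  # cold, emptier segments win
```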

Crash recovery

On crash recovery, LFS uses the most recent checkpoint that is consistent. The checkpoint region is updated with the following procedure:

  1. write a header containing a timestamp
  2. write the body of the checkpoint region
  3. write one last block, also containing a timestamp

If a crash interrupts this sequence, the two timestamps will not match, so LFS detects the torn checkpoint; it keeps two checkpoint regions and alternates between them, falling back to the other one on failure.
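A minimal sketch of the two-timestamp protocol, with an invented JSON on-disk format purely for illustration:

```python
# Sketch: a timestamp is written in the header and repeated in the final
# block. If a crash interrupts the update, the two timestamps differ and
# recovery falls back to the other (older) checkpoint region.
import json, time

def write_checkpoint(path, body: dict):
    ts = time.time()
    record = {"header_ts": ts, "body": body, "trailer_ts": ts}
    with open(path, "w") as f:
        json.dump(record, f)       # header, body, trailer written in order

def load_valid_checkpoint(paths):
    """Return the newest checkpoint whose two timestamps agree."""
    valid = []
    for p in paths:
        try:
            with open(p) as f:
                rec = json.load(f)
            if rec["header_ts"] == rec["trailer_ts"]:
                valid.append(rec)
        except (OSError, ValueError):
            continue               # missing or torn checkpoint: skip it
    return max(valid, key=lambda r: r["header_ts"], default=None)
```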

Checkpoints

  • The checkpoint region summarizes the state of the log: it records the addresses of the inode map and segment usage table blocks, plus a pointer to the last segment written
  • LFS writes the checkpoint region periodically (roughly every 30s), so a crash can lose the last few seconds of updates

Roll-forward

  • start from the last checkpoint region, find the end of the log, then read through the subsequent segments looking for valid updates, applying as many as possible
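A rough sketch of roll-forward with invented data structures: verify each later segment's summary and apply its updates until a torn segment is hit.

```python
# Sketch: scan segments written after the checkpoint; apply every update
# whose segment summary checks out, stop at the first invalid segment.
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def roll_forward(checkpoint, log_segments):
    """checkpoint: {'last_segment': index, 'state': dict}.
    log_segments: ordered list of {'summary': cksum, 'updates': dict}."""
    state = dict(checkpoint["state"])
    for seg in log_segments[checkpoint["last_segment"] + 1:]:
        blob = repr(sorted(seg["updates"].items())).encode()
        if seg["summary"] != checksum(blob):
            break                      # torn or unwritten segment: stop here
        state.update(seg["updates"])   # recover updates made after checkpoint
    return state

seg_updates = {"inode_7": 0x8000}
blob = repr(sorted(seg_updates.items())).encode()
log = [{"summary": checksum(blob), "updates": seg_updates}]
print(roll_forward({"last_segment": -1, "state": {}}, log))
```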

Reviews

  • OSTEP-LFS
  • the inode map solves the recursive update problem: directories store stable inode numbers rather than disk addresses, so when an inode moves only the imap entry changes and the update never ripples up the directory tree (the problem arises in any copy-on-write FS without such indirection); see the imap sketch after this list
  • conventional file systems keep file data in place and overwrite blocks in place, while LFS always writes to unused parts of the disk (copy-on-write)
  • on SSDs, many flash-based devices cannot rewrite part of a block: each block must go through an erase cycle before it can be rewritten. This fits copy-on-write, which avoids in-place rewrites entirely
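The imap sketch referenced above (structures invented for illustration): directories hold stable inode numbers, so when an inode is rewritten elsewhere in the log only the imap entry changes.

```python
# Sketch: two-level translation. Directories map names to inode *numbers*;
# the imap maps inode numbers to the inode's current disk address. Moving
# an inode updates one imap entry, not every directory referencing it.
imap = {7: 0x1000}                  # inode number -> disk address of inode
directory = {"notes.txt": 7}        # name -> inode number (stable!)

def lookup(name):
    return imap[directory[name]]

addr_before = lookup("notes.txt")
imap[7] = 0x8000                    # inode rewritten by a new log write
addr_after = lookup("notes.txt")    # directory block untouched
print(hex(addr_before), hex(addr_after))
```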

Disadvantages

  • old copies of data are scattered throughout the disk, so continuous cleaning is needed to reclaim their space, and cleaning cost is the main overhead

Notes

  • Why can LFS lose data?
    • LFS deliberately avoids paying for a transaction on every write
    • It is not built for high-reliability settings: losing the last few seconds of updates is acceptable, i.e. it offers only loose reliability
  • Name mangling