
Flash Memory Fragmentation – Myths and Facts

by Anton Tomov on Jun 01


Flash Memory Fragmentation and Performance

A quick Google search turns up many "expert" suggestions that defragmenting solid state drives makes no difference in performance while increasing the wear of flash cards and possibly even killing them.

 

Myth: Flash cards, unlike hard drives, have no moving parts, so defragmentation is useless.

 

In theory, since a flash drive has no moving parts, its access time does not depend on where the data is stored, which is supposedly why flash drives do not need defragmenting.

 

The fact is that flash memory is physically organized in blocks (or pages) of data, usually 128 KB or 256 KB in size. Matters are made worse by the fact that in order to change even a single byte, the entire page first has to be erased and then written back with its contents. For a 128 KB page, the time needed to change one byte of information is calculated the following way:

 

T = R + E + W

 

T is the total time, R is the time needed to read the entire flash page containing the byte we wish to change, E is the time required to erase the page, and W is the time it takes for the data to be written back to the empty page. Not only did we have to read 128 KB in order to change a single byte, but we also had to erase the entire block (which is very slow) and then write the 128 KB back again.
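As a back-of-the-envelope illustration, here is a minimal Python sketch of that calculation. The page size and all timing constants are assumptions chosen only to show the order of magnitude involved, not measurements of any particular card.

    # Rough estimate of the time needed to change one byte inside a 128 KB
    # flash page via a read-erase-write cycle. All timings are assumed,
    # illustrative values, not specifications of a real card.
    PAGE_SIZE = 128 * 1024          # bytes per flash page (assumed)
    READ_US_PER_KB = 25             # time to read 1 KB, in microseconds (assumed)
    WRITE_US_PER_KB = 200           # time to program 1 KB, in microseconds (assumed)
    ERASE_US = 2000                 # time to erase one page, in microseconds (assumed)

    def time_to_change_one_byte():
        kb = PAGE_SIZE // 1024
        r = kb * READ_US_PER_KB     # R: read the whole page
        e = ERASE_US                # E: erase the page
        w = kb * WRITE_US_PER_KB    # W: write the whole page back
        return r + e + w            # T = R + E + W

    print("T = %.1f ms to rewrite one byte" % (time_to_change_one_byte() / 1000.0))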

 

To complicate matters further, there are additional layers between the flash card controller and the file system that cache the pages being read and written. The cache exists to improve performance: it is a simple trade-off between read/write speed and some RAM being used to hold the pages. Performance-wise, it is most effective to read or write entire flash pages, so when the operating system instructs the controller to read a particular sector on the card, the cache normally retrieves the entire page and stores it internally. What this means is that information stored in a contiguous manner is far more likely to be found in the cache than non-contiguous information.
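The effect of contiguity on such a cache can be seen with a toy model. The following Python sketch assumes a 128 KB page size, 1 KB clusters and a four-page LRU cache; all of these numbers are illustrative assumptions, not properties of any real card or driver.

    # Toy model of a page-granular read cache: accessing any cluster loads
    # the whole 128 KB flash page that contains it.
    from collections import OrderedDict

    PAGE_SIZE, CLUSTER_SIZE, CACHE_PAGES = 128 * 1024, 1024, 4   # assumed sizes

    def hit_rate(cluster_sequence):
        cache, hits = OrderedDict(), 0
        for c in cluster_sequence:
            page = c * CLUSTER_SIZE // PAGE_SIZE
            if page in cache:
                hits += 1
                cache.move_to_end(page)          # simple LRU bookkeeping
            else:
                cache[page] = True
                if len(cache) > CACHE_PAGES:
                    cache.popitem(last=False)    # evict least recently used page
        return hits / len(cluster_sequence)

    contiguous = list(range(258))                # clusters laid out back to back
    scattered = list(range(0, 258 * 200, 200))   # same count, ~200 KB apart on the card
    print("contiguous hit rate: %.0f%%" % (100 * hit_rate(contiguous)))
    print("scattered  hit rate: %.0f%%" % (100 * hit_rate(scattered)))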

 

Suppose we have a file that is 263,892 bytes large and is fragmented. On a FAT32 file system using 1 KB clusters the file will occupy 258 clusters. In the worst-case scenario those clusters will be dispersed across 258 different flash pages. If the file is contiguous, all 258 clusters will fit inside three flash pages. Caching the fragmented file will be impossible, as it will not fit in the cache (258 pages would require about 32 MB of RAM to cache), while the defragmented file fits in just 384 KB.
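The same arithmetic, spelled out as a small Python sketch (1 KB clusters and 128 KB pages, as above):

    # Worst-case vs. best-case flash footprint of the 263,892-byte file
    # from the example above (1 KB FAT32 clusters, 128 KB flash pages).
    import math

    FILE_SIZE = 263892
    CLUSTER_SIZE = 1024
    PAGE_SIZE = 128 * 1024

    clusters = math.ceil(FILE_SIZE / CLUSTER_SIZE)                     # 258 clusters
    worst_case_pages = clusters                                        # each cluster on its own page
    best_case_pages = math.ceil(clusters * CLUSTER_SIZE / PAGE_SIZE)   # contiguous layout

    print("clusters:             ", clusters)
    print("fragmented cache need:", worst_case_pages * PAGE_SIZE // 1024, "KB")  # ~32 MB
    print("contiguous cache need:", best_case_pages * PAGE_SIZE // 1024, "KB")   # 384 KB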

 

Finally, the FAT file system stores folders the same way files are stored: in cluster chains. A large folder that is fragmented and not cached carries a huge performance penalty for standard file operations such as listing the files in that folder, renaming a file, or even deleting one.
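The penalty comes from the way a FAT directory has to be walked cluster by cluster, with every next cluster looked up in the file allocation table. A rough sketch of that traversal, using a made-up miniature FAT (the real on-disk layout is more involved):

    # Walking a FAT cluster chain, as a driver must do to list a directory.
    # Each cluster in the chain may live on a different flash page, so a
    # fragmented directory forces many separate page reads.
    END_OF_CHAIN = 0x0FFFFFFF

    def chain(fat, first_cluster):
        """Yield every cluster number in the chain starting at first_cluster."""
        c = first_cluster
        while c != END_OF_CHAIN:
            yield c
            c = fat[c]                     # the next cluster is looked up in the FAT

    # Cluster 5 -> 9 -> 2 -> end: a three-cluster directory scattered over the card.
    fat = [0] * 16
    fat[5], fat[9], fat[2] = 9, 2, END_OF_CHAIN
    print("directory occupies clusters:", list(chain(fat, 5)))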

 

Conclusion: Fragmentation has a serious impact on flash card performance especially during write operations and when the file system is heavily fragmented across many different flash pages.

 

Myth: Defragmentation shortens flash memory life span.

 

The truth is that flash memory has a limited number of write cycles. There are two types of flash chips in use, NOR and NAND. Flash cards using NOR chips have a life span of approximately 300,000 write cycles, while those using NAND chips can withstand about 1,000,000 write cycles. NAND historically had fewer write cycles, but has been catching up as the technology improves. All fast CF cards are based on NAND flash (NOR flash is slower but has other advantages, such as processors being able to boot from it directly).

 

Just because a flash chip has a given write-cycle rating does not mean that the chip will self-destruct as soon as that threshold is reached. It means that a flash chip rated at 1 million erase/write cycles will have only 0.02 percent of its blocks turn bad by the time that write threshold is reached for a block.

 

The flash card controller keeps track of how many times it has written to each flash block and tries to even out the wear across blocks. This is called wear leveling. When a given block has been written beyond a certain threshold, the controller will (in the background, so as to avoid any performance decrease) swap the data in that block with the data in a block that has exhibited "read-only-like" behavior.
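A highly simplified sketch of that idea follows. Real controllers do this in firmware with per-block erase counters and mapping tables; the threshold, data structures and values below are assumptions made purely for illustration.

    # Very simplified static wear leveling: when a block's erase count climbs
    # well above the average, swap its (hot) data with a block whose data has
    # hardly changed, so future rewrites land on the less-worn block.
    SWAP_THRESHOLD = 1.5    # swap when a block has 50% more erases than average (assumed)

    def maybe_wear_level(erase_counts, data):
        average = sum(erase_counts) / len(erase_counts)
        hot = max(range(len(erase_counts)), key=erase_counts.__getitem__)
        cold = min(range(len(erase_counts)), key=erase_counts.__getitem__)
        if erase_counts[hot] > SWAP_THRESHOLD * average:
            # Move the rarely changing ("read-only-like") data onto the worn
            # block and the frequently rewritten data onto the fresher block.
            data[hot], data[cold] = data[cold], data[hot]
            return hot, cold
        return None

    erase_counts = [120, 30, 25, 28]          # block 0 is much hotter than the rest
    data = ["logfile", "photo1", "photo2", "photo3"]
    print(maybe_wear_level(erase_counts, data), data)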

 

Simple math shows that even if you write to ALL sectors of a flash card about 10 times a day (that is 20 GB of data written daily for an average 2 GB flash card!), it will take about 82 years before blocks even reach their write threshold (300,000 cycles ÷ 10 writes per day ≈ 30,000 days ≈ 82 years), let alone before 0.02% of them turn into bad blocks.
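The calculation, spelled out (the 300,000-cycle figure is the conservative NOR-class rating quoted above):

    # Endurance estimate from the paragraph above: every sector of a 2 GB
    # card overwritten 10 times a day against a 300,000-cycle rating.
    WRITE_CYCLES = 300000           # conservative endurance rating per block
    WRITES_PER_DAY = 10             # whole-card overwrites per day
    CARD_SIZE_GB = 2

    print("data written per day: %d GB" % (CARD_SIZE_GB * WRITES_PER_DAY))
    print("years to reach the rating: %.0f" % (WRITE_CYCLES / WRITES_PER_DAY / 365.25))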

 

Since defragmentation involves moving the non-contiguous clusters belonging to a file chain into contiguous cluster space, the process does indeed wear the storage card. However, modern flash cards have a great deal of tolerance and will last through many years of heavy writing before sectors start going “bad”. Built-in wear leveling further minimizes card wear. Also, once a file chain becomes contiguous (defragmented), it won’t be moved again during the next defragmentation.

 

Conclusion: Defragmentation does increase the number of flash media write cycles, but with the life span and wear leveling of modern flash cards this is not a real problem, as it takes decades for sectors to start going bad.

 

Myth: Backing up the data of a flash card, formatting and then restoring it again will produce a file system free of fragmentation.

 

There are many opinions that, instead of using defragmentation software, one can defragment a storage card by performing a full backup of its contents, doing a fresh format, and then restoring the files from the backup onto the card. The idea is that the operating system will lay the files out contiguously across the card and will therefore produce a fragmentation-free file system. In practice this approach has several problems:

 

  • Folders will become heavily fragmented. While the files are likely to end up contiguous, the same does not apply to folders. The reason is that folders are initially created small (usually one cluster only) and are later extended by the OS, which adds new clusters to their chains as files are added. During the restore process the folders are created small and then grow, and fragment, as the files from the backup are placed inside them. A folder that is one cluster long on a FAT32 file system with 1 KB clusters can hold only about 5 to 6 files before it runs out of free directory entries (one directory entry is 32 bytes, and a long file name usually occupies more than 5 directory entries; see the sketch after this list).
  • The entire process is very slow. It will take hours to copy a large flash card filled with multimedia back and forth.
  • There are specific optimizations carried out by defragmentation software that the backup/restore approach cannot perform, such as flash block alignment or sorting of directory entries.
  • All chances of data recovery are lost. If there were any deleted files that could have been restored, or file system errors that could have been fixed, they will no longer be recoverable after the backup/restore procedure.
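The directory-entry arithmetic behind the first bullet, as a small sketch. The figure of five long-name entries per file is an assumption about typical long file names; the 32-byte entry size and 1 KB cluster come from the text above.

    # How many files fit in a one-cluster FAT32 directory before it has to
    # grow (and fragment).
    CLUSTER_SIZE = 1024
    DIR_ENTRY_SIZE = 32             # bytes per directory entry
    ENTRIES_PER_NAME = 5 + 1        # ~5 long-name entries + 1 short entry (assumed)

    entries_per_cluster = CLUSTER_SIZE // DIR_ENTRY_SIZE      # 32 entries
    usable = entries_per_cluster - 2                           # minus "." and ".." of a subfolder
    print("files per 1 KB directory cluster:", usable // ENTRIES_PER_NAME)   # about 5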

 

Conclusion: Using the backup/restore approach will not achieve the same results as using a well designed file system defragmenter.

 

Data reliability on fragmented vs. non-fragmented file systems

 

Should you ever need to recover lost data or deleted files from a storage card, you should be aware that the more contiguous the data on the card was, the higher the chances of recovering it.

 

Data recovery software works by analyzing the existing data on a storage device and trying to reconstruct deleted or otherwise damaged files. When a file is deleted, all information about the clusters it occupied is lost. Undelete software will probably be able to locate the beginning of the file, but without knowing the exact clusters it will not succeed in recovering the data if those clusters were not contiguous. The same goes for unformat and zero-assumption recovery software that attempts to rescue data from media with a damaged file system. Disk checking software such as ScanDisk will also perform better and more reliably if the file system is not fragmented. That is the reason all FAT data recovery applications rely heavily on the file system being contiguous.
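A toy model of why that is: after deletion the directory entry still records the file's first cluster and size, but not the chain of clusters, so an undelete tool can only assume the clusters were consecutive. The model below is a deliberately simplified sketch, not how any particular recovery product works.

    # Toy model of FAT undelete: recovery reads consecutive clusters starting
    # at the file's recorded first cluster, which only yields the right data
    # if the file was stored contiguously.
    import math

    CLUSTER_SIZE = 1024

    def undelete(clusters, first_cluster, file_size):
        """Recover a deleted file by reading consecutive clusters (toy model)."""
        count = math.ceil(file_size / CLUSTER_SIZE)
        data = b"".join(clusters[first_cluster:first_cluster + count])
        return data[:file_size]

    clusters = [bytes([i]) * CLUSTER_SIZE for i in range(16)]   # fake card contents
    print(undelete(clusters, first_cluster=3, file_size=2500)[:4])
    # A contiguous file comes back intact; a fragmented one would come back
    # with foreign clusters mixed into its data.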

 

Conclusion: Keeping a file system free of fragmentation significantly increases the chances of data recovery.

 
