Solid State Drive (SSD) Unexpected Power Loss Concerns and Solutions
What happens to data in transit to the solid state drive (SSD) during an unexpected power interruption is overlooked by many industrial OEM host system designers. Limiting the system's exposure to data loss should be high on the list of design priorities.
Unexpected power loss occurs when the industrial computer or embedded system experiences a power outage, surge, spike, sag or brownout. It can also be caused by manually removing the SSD while the system is powered on.
Power loss does not cause issues while the SSD is idle or performing a read operation, but if a write operation is in progress, there is potential for data loss or worse. Power loss during a write is also known as a write abort, since the write operation is aborted prior to completion.
The main failure symptoms seen after a write abort occurrence are file system corruptions and internal device data corruption.
File system corruption occurs when the operating system is unable to update the file system records before power is lost. Since the storage device does not know which file system is in use, it cannot prevent this type of corruption.
The good news is that file system corruption is typically not fatal, since most operating systems perform a file system repair operation on the next power up. Alternatively, a user can run a command or utility to perform the repair operation.
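A file system repair restores the file system's own records, not necessarily the contents of a file that was being rewritten when power failed. As an illustration only, a host application can narrow its own write abort window by writing new data to a temporary file, forcing it to the drive, and then renaming it over the old file. The sketch below assumes Python on the host, and the function name atomic_write is hypothetical. It is a general host-side technique, not something specific to Cactus products.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace the file at `path` so readers see either the old or the new contents."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the final rename
    # stays on one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Ask the operating system to push the data out to the drive.
            os.fsync(tmp.fileno())
        # Swap the new file into place in a single rename step.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If power is lost before the rename, the old file is still in place. On POSIX systems, syncing the containing directory as well makes the rename itself durable. That step is omitted here to keep the sketch short.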
Internal device data corruption poses a more severe problem. In the worst case, the entire flash drive can be made unusable. This happens when the SSD's internal metadata is corrupted, which makes a low level format necessary and results in the loss of all data on the drive.
Two of the principal methods Cactus uses to mitigate data loss during write abort are a patented safe power loss protection algorithm and minimization of external DRAM cache.
The algorithm limits the time unwritten data is left susceptible to loss. In the event of sudden power loss, the controller is reset and the NAND flash is write protected. A log file stores the last entries prior to shutdown, so the drive can recover to the last known good state. In the worst case, the small amount of data in transit is lost, but the overall data system is not corrupted.
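The details of the patented algorithm are not described here, so the following is only a generic sketch of how log-based recovery to a last known good state can work. It assumes a hypothetical record layout (a 4-byte length, a 4-byte CRC32, then the payload), and the helper names and log entries are made up for illustration. On recovery, the log is replayed until the first incomplete or corrupt record, which is treated as the point where power was lost.

```python
import struct
import zlib

# Hypothetical record header: payload length and CRC32, both little-endian.
HEADER = struct.Struct("<II")


def append_record(log: bytearray, payload: bytes) -> None:
    """Append one length- and CRC-framed record to the log."""
    log += HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def recover(log: bytes) -> list[bytes]:
    """Return every record up to the first incomplete or corrupt one."""
    records = []
    offset = 0
    while offset + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, offset)
        start = offset + HEADER.size
        payload = log[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # torn write: stop at the last known good record
        records.append(payload)
        offset = start + length
    return records


log = bytearray()
append_record(log, b"map block 7 -> page 112")
append_record(log, b"map block 9 -> page 113")
good_length = len(log)
append_record(log, b"map block 3 -> page 114")

# Simulate power loss partway through the third record.
torn = bytes(log[:good_length + 10])
recovered = recover(torn)  # only the first two records survive
```

The third entry is discarded because it never finished writing, which mirrors the worst case above: the data in transit is lost, while everything logged before it remains usable.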
Limiting external DRAM cache in industrial grade SSDs prevents a large amount of in-transit data from residing in volatile DRAM. Many high performance SSDs rely on DRAM cache to increase performance, so this is a trade-off of performance for reliability.
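The effect of cache size on exposure can be shown with a toy model. The sketch below is not a model of any real SSD controller. It assumes a simple write-back cache that flushes to flash only when full, and the class name, capacities and write counts are arbitrary values chosen for illustration.

```python
class WriteBackCache:
    """Toy model: writes sit in volatile cache until flushed to flash."""

    def __init__(self, capacity_sectors: int) -> None:
        self.capacity = capacity_sectors
        self.pending = []  # volatile contents, lost on power failure
        self.flash = []    # non-volatile contents

    def write(self, sector: bytes) -> None:
        self.pending.append(sector)
        if len(self.pending) >= self.capacity:
            self.flush()

    def flush(self) -> None:
        self.flash.extend(self.pending)
        self.pending.clear()

    def power_loss(self) -> int:
        """Drop everything still in volatile cache and return the sectors lost."""
        lost = len(self.pending)
        self.pending.clear()
        return lost


def sectors_lost(capacity: int, writes_before_loss: int) -> int:
    """Write a number of sectors, then cut power and count what was lost."""
    cache = WriteBackCache(capacity)
    for i in range(writes_before_loss):
        cache.write(i.to_bytes(4, "little"))
    return cache.power_loss()


small = sectors_lost(capacity=4, writes_before_loss=1003)
large = sectors_lost(capacity=1024, writes_before_loss=1003)
```

In this model the small cache loses at most 3 sectors, while the large cache, which has not yet filled and flushed, loses all 1003. The amount of data at risk is bounded by the cache size, which is the reliability side of the trade-off.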