Solid State Drive (SSD) Unexpected Power Loss Concerns and Solutions

What happens to data in transit to the solid state drive (SSD) during an unexpected power interruption is often overlooked by Industrial OEM host system designers. Limiting the system's exposure to data loss should be high on the list of design priorities.

Unexpected power loss occurs when the industrial computer or embedded system experiences a power outage, surge, spike, sag or brownout. It can also occur when the SSD is removed from a system that is still powered on.

Power loss during an idle or read operation will not cause problems, but if a write operation is in progress, some data may be lost, or worse. Power loss during a write is also known as Write Abort, since the write operation is aborted before it completes.

The main failure symptoms seen after a write abort are file system corruption and internal device data corruption.

File system corruption occurs because the operating system cannot finish updating the file system records before power is lost. Since the storage device does not know which file system is in use, it cannot prevent this type of corruption.

The good news is that file system corruption is typically not fatal, since most operating systems perform a file system repair operation on the next power-up. Alternatively, a user can run a command or utility to perform the repair.

Internal device data corruption poses a more severe problem. In the worst case, the entire flash drive can be rendered unusable. This happens when the SSD's internal metadata is corrupted, which makes a low-level format necessary and results in the loss of all data on the drive.

Since internal data corruption can be fatal to the drive, it is the focus of Cactus firmware and hardware efforts to minimize and prevent data loss in these events. Extensive testing is performed to ensure our Industrial Grade Flash Storage Products are the most robust in the market.

Two of the principal methods Cactus uses to mitigate data loss during write abort are a patented safe power loss protection algorithm and minimization of external DRAM cache.

The algorithm limits the time that unwritten data is left susceptible to loss. In the event of sudden power loss, the controller is reset and the NAND flash is write protected. A log file stores the last entries made before shutdown, which allows the drive to recover to the last known good state. In the worst case, the small amount of data in transit is lost, but the rest of the data on the drive is not corrupted.
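The general idea of recovering to a last known good state can be illustrated with a generic journaling technique. The sketch below is not the patented Cactus algorithm, whose details are not described here; it assumes a hypothetical flash translation layer that keeps a durable checkpoint of its logical-to-physical mapping plus an append-only log of updates, each marked committed only once its data page is fully programmed. On recovery, replay stops at the first uncommitted entry, so the in-transit write is discarded instead of leaving the mapping half-updated. All class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LogEntry:
    lba: int         # logical block address being remapped (hypothetical)
    new_page: int    # physical NAND page that now holds the data
    committed: bool  # True only once the data page is fully programmed


@dataclass
class FlashTranslationLayer:
    # Last checkpointed logical-to-physical mapping, assumed to be durable.
    checkpoint: dict[int, int] = field(default_factory=dict)
    # Append-only log of mapping updates written since the checkpoint.
    log: list[LogEntry] = field(default_factory=list)

    def recover(self) -> dict[int, int]:
        """Rebuild the mapping from the checkpoint plus committed log entries.

        Replay stops at the first uncommitted entry: that write was in
        transit when power failed, so it is dropped and the mapping returns
        to the last known good state.
        """
        mapping = dict(self.checkpoint)  # copy, so the checkpoint is untouched
        for entry in self.log:
            if not entry.committed:
                break
            mapping[entry.lba] = entry.new_page
        return mapping


# Power is lost while the write to LBA 2 is still in progress.
ftl = FlashTranslationLayer(
    checkpoint={0: 100, 1: 101},
    log=[
        LogEntry(lba=1, new_page=205, committed=True),
        LogEntry(lba=2, new_page=206, committed=False),
    ],
)
print(ftl.recover())  # {0: 100, 1: 205}
```

Only the update to LBA 2 is lost; LBA 1 reflects its completed write and LBA 0 is unaffected, which mirrors the worst case described above.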

Limiting external DRAM cache in Industrial Grade SSDs prevents a large amount of in-transit data from residing in volatile DRAM. Many high-performance SSDs rely on DRAM cache to increase performance, so this is a trade-off of some performance for greater reliability.
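To make the trade-off concrete, here is a toy model assuming a hypothetical volatile write-back cache that holds acknowledged pages in DRAM and flushes them to NAND only when it fills. A larger cache batches writes into fewer flushes (a rough stand-in for the performance benefit), but more acknowledged pages sit in volatile memory when power fails. The class, the page counts and the flush policy are illustrative assumptions, not a description of any real SSD controller.

```python
class WriteBackCache:
    """Hypothetical volatile write cache that flushes to NAND when full."""

    def __init__(self, capacity_pages: int) -> None:
        self.capacity_pages = capacity_pages
        self.pending: list[int] = []  # acknowledged pages still only in DRAM
        self.flush_count = 0          # number of batched flushes to NAND

    def write(self, page: int) -> None:
        self.pending.append(page)
        if len(self.pending) >= self.capacity_pages:
            self.flush()

    def flush(self) -> None:
        # Model programming all pending pages to NAND in one batch.
        self.pending.clear()
        self.flush_count += 1

    def power_loss(self) -> int:
        """Discard everything still in volatile DRAM; return pages lost."""
        lost = len(self.pending)
        self.pending.clear()
        return lost


def simulate(capacity_pages: int, pages_written: int) -> tuple[int, int]:
    """Write pages, then cut power. Returns (flushes, pages_lost)."""
    cache = WriteBackCache(capacity_pages)
    for page in range(pages_written):
        cache.write(page)
    return cache.flush_count, cache.power_loss()


for capacity in (1, 8, 64):
    flushes, lost = simulate(capacity, pages_written=100)
    print(f"cache={capacity:>2} pages  flushes={flushes:>3}  lost={lost}")
```

With 100 pages written, a 1-page cache needs 100 flushes and loses nothing, an 8-page cache needs 12 flushes and loses 4 pages, and a 64-page cache needs a single flush but loses 36 pages. In this model the worst-case loss is one page less than the cache capacity.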

A Write Abort White Paper is available for a more in-depth explanation of this issue. In addition, feel free to contact a Cactus Expert for more information.

Steve Larrivee has over 30 years' experience in the data storage market, including 5 years at Seagate Technology and 10 years at SanDisk. He joined Cactus Technologies Limited as an equity partner and co-founded Cactus USA in 2007 with partner Tom Aguillon. Learn more about Steve on LinkedIn.