Write Abort - Avoid this Silent Killer of Embedded & Industrial OEM Systems
Write Abort is a condition caused in SSDs by unexpected power loss. Depending on how much damage is done when power is removed, it can be fatal to the SSD in an embedded or industrial system.
The write abort condition occurs when power is lost while the SSD controller is writing. This is not limited to times when the host is writing to the SSD: many controller functions running behind the scenes also write to the NAND and internal storage.
The symptoms of this condition are damage to, loss of, or corruption of the translation tables, firmware, metadata and the data itself.
Properly designed Industrial Grade products, such as the Industrial Grade flash storage from Cactus Technologies, are the solution to the Write Abort issue.
Download this Write Abort White Paper to understand more.
Let’s take a look at each of these symptoms in more detail.
Translation Tables - Broken:
Many people do not realize how much is going on inside an SSD when data is written to it. When the host system writes to the SSD, it is really writing to the controller, which then decides what to do with the data.
A key function of the controller is wear leveling, which makes the most efficient use of the limited number of write/erase cycles of NAND flash. This matters because newer consumer NAND memory can have as few as 300 endurance cycles (program/erase cycles per block) before becoming unreliable.
For reference, Industrial Grade flash storage has SLC NAND memory with up to 100,000 endurance cycles.
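To put those endurance figures in perspective, here is a back-of-the-envelope sketch. It assumes perfect wear leveling and no write amplification, so it gives an upper bound rather than a real lifetime; the 8 GB capacity and the function name `ideal_lifetime_writes_gb` are hypothetical, chosen only for illustration.

```python
def ideal_lifetime_writes_gb(capacity_gb: float, pe_cycles: int) -> float:
    """Upper bound on total data written before wear-out.

    Assumes perfect wear leveling (every block is worn equally) and no write
    amplification; real drives fall short of this figure.
    """
    return capacity_gb * pe_cycles


CAPACITY_GB = 8.0  # hypothetical drive capacity, for illustration only

for label, cycles in [("Consumer NAND, 300 cycles", 300),
                      ("SLC NAND, 100,000 cycles", 100_000)]:
    total = ideal_lifetime_writes_gb(CAPACITY_GB, cycles)
    print(f"{label}: about {total:,.0f} GB written")
```

Without wear leveling the picture is far worse: a single location rewritten over and over would exhaust its one block after roughly its rated cycle count, no matter how large the drive is.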
To implement wear leveling, the controller uses a translation table that links each location the host system writes to (the logical address) with the different physical location where the controller actually stored the data, chosen because it had more endurance cycles available. This “levels” the “wear” more evenly across the NAND.
If one or more of these logical-to-physical links are broken, the SSD no longer knows where the data is when the host system requests it. Because the SSD typically contains the operating system, applications and other data, this alone can crash the host system. Worse yet, the SSD can become an unrecoverable “brick.”
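The mechanism can be sketched as a toy model. This is a hypothetical illustration, not how any particular controller (Cactus Technologies’ included) works: `ToyFTL` is a made-up name, it maps each logical address to a whole physical block, ignores pages and garbage collection, and keeps its table in memory.

```python
from typing import Dict, Set


class ToyFTL:
    """Hypothetical, highly simplified translation layer: each logical address
    maps to one whole physical block, and every write goes to a fresh block."""

    def __init__(self, num_blocks: int):
        self.erase_counts = [0] * num_blocks          # wear per physical block
        self.free: Set[int] = set(range(num_blocks))  # blocks with no live data
        self.table: Dict[int, int] = {}               # logical -> physical
        self.nand: Dict[int, bytes] = {}              # physical -> stored data

    def write(self, logical: int, data: bytes) -> int:
        # Wear leveling: choose the free block with the fewest erase cycles
        # (ties broken by block number). Raises ValueError if none are free.
        target = min(self.free, key=lambda b: (self.erase_counts[b], b))
        self.free.remove(target)
        self.erase_counts[target] += 1   # charge one program/erase cycle
        self.nand[target] = data
        old = self.table.get(logical)
        self.table[logical] = target     # record the new logical-to-physical link
        if old is not None:              # the stale copy's block becomes reusable
            del self.nand[old]
            self.free.add(old)
        return target

    def read(self, logical: int) -> bytes:
        # Follow the translation table to find where the data actually lives.
        return self.nand[self.table[logical]]


ftl = ToyFTL(num_blocks=4)
for i in range(8):                       # rewrite the same logical address 8 times
    ftl.write(0, f"v{i}".encode())
print(ftl.read(0), ftl.erase_counts)     # b'v7' [2, 2, 2, 2]
```

Rewriting one logical address eight times spreads the wear across all four blocks, two erase cycles each, instead of eight cycles on a single block.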
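A hypothetical sketch of why one interrupted update can break more than the link being changed. It assumes, for illustration only, that the controller stores several table entries together in one NAND page and updates that page in place; `update_table_page` and `PowerLoss` are made-up names, and real controllers use their own, typically more robust, update schemes.

```python
from typing import Dict, Optional


class PowerLoss(Exception):
    """Simulates power being removed at a chosen point in an update."""


def update_table_page(page: Dict[int, int], logical: int, physical: int,
                      cut_power_after: Optional[str] = None) -> None:
    """Hypothetical in-place update of one NAND page of translation-table entries.

    NAND cannot be overwritten in place, so the page is erased and then
    reprogrammed from a copy held in the controller's volatile RAM.
    """
    ram_copy = dict(page)          # lives only in RAM; lost if power is cut
    ram_copy[logical] = physical   # the one link being changed
    page.clear()                   # step 1: erase the page in NAND
    if cut_power_after == "erase":
        raise PowerLoss("power removed between erase and program")
    page.update(ram_copy)          # step 2: program the updated entries


healthy = {0: 7, 1: 3, 2: 9}
update_table_page(healthy, logical=1, physical=5)
print(healthy)                     # {0: 7, 1: 5, 2: 9}

aborted = {0: 7, 1: 3, 2: 9}
try:
    update_table_page(aborted, logical=1, physical=5, cut_power_after="erase")
except PowerLoss:
    pass
print(aborted)                     # {} (every link in the page is gone)
```

Power was lost while changing only logical address 1, yet the links for addresses 0 and 2 disappeared too, because they shared the erased page.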
Firmware - Corruption:
Firmware can be thought of as the operating system of the SSD. As with your computer’s operating system, if it gets corrupted, it must be repaired or perhaps even reinstalled from scratch.
Firmware for the SSD is typically stored in an area of the NAND flash that only the controller can access. If this firmware is overwritten or changed for any reason (in this article, Write Abort), there is a problem. Corrupted firmware can cause an SSD to behave erratically or stop working entirely.
The problem with Write Abort is that it occurs when power is removed unexpectedly. During a normal power down, the host’s operating system gracefully closes all programs and files before powering off.
In the case of Write Abort, controller background functions may be in progress, and there is no way of knowing exactly what state the control lines are in at the moment power is lost.
Metadata and Data - Loss:
Metadata is closely related to the translation tables; in fact, the translation tables are part of the metadata. Metadata also includes other information the controller stores to help manage the SSD’s functions.
Other key items include defect maps, attributes of the stored information, and the number of endurance cycles used by each NAND block. Loss of any of this information can spell “game over” for an SSD.
The final item is the user data stored on the SSD.
The entire purpose of an SSD is to store and retrieve data reliably. If it cannot write and then read back the same information, the SSD is not performing its primary function, and there is not much more to say about that.
Cactus Technologies’ Industrial Grade flash storage devices are designed and built to withstand many cycles of unexpected power loss without failure.
In fact, we’ve tested our 900S Series products, which include mSATA, 2.5” SSD and CFast, through more than 30,000 cycles of unexpected power loss during writes.
Using a sophisticated test system that writes to the SSD and then removes power in the middle of those writes, we were able to mimic the worst-case scenario for SSDs.
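The test procedure can be sketched as a simple loop: write, remove power at a random point, restore power, then verify. This is a hypothetical illustration, not Cactus Technologies’ actual test system. `SimulatedSSD` is an idealized in-memory model rather than real hardware, and the pass criterion in `power_loss_test` (an interrupted write may leave old or new data in its own sectors, but nothing else may change) is an assumption chosen for the example.

```python
import random
from typing import List, Optional


class SimulatedSSD:
    """Hypothetical, idealized stand-in for a real drive. A multi-sector write
    stops part-way through if power is 'removed' at sector index cut_at."""

    def __init__(self, sectors: int):
        self.sectors: List[int] = [0] * sectors

    def write(self, start: int, values: List[int], cut_at: Optional[int]) -> bool:
        for i, value in enumerate(values):
            if cut_at is not None and i == cut_at:
                return False              # power lost before this sector
            self.sectors[start + i] = value
        return True                       # write completed


def power_loss_test(ssd: SimulatedSSD, cycles: int, burst: int = 4,
                    seed: int = 0) -> int:
    """Write, cut power at a random point, 'power up', verify.
    Returns the number of sectors that failed verification."""
    rng = random.Random(seed)
    expected = list(ssd.sectors)          # host's view of the drive contents
    failures = 0
    for cycle in range(1, cycles + 1):
        start = rng.randrange(len(expected) - burst + 1)
        cut = rng.randrange(burst + 1)    # cut == burst: the write finishes
        completed = ssd.write(start, [cycle] * burst,
                              None if cut == burst else cut)
        # After power-up: a finished write must be fully present; an
        # interrupted one may leave old or new data in its own sectors;
        # every other sector must be untouched.
        for lba, actual in enumerate(ssd.sectors):
            if start <= lba < start + burst:
                allowed = {cycle} if completed else {expected[lba], cycle}
            else:
                allowed = {expected[lba]}
            if actual not in allowed:
                failures += 1
            expected[lba] = actual        # resync before the next cycle
    return failures


print(power_loss_test(SimulatedSSD(sectors=32), cycles=30_000))  # 0
```

Against this idealized model the failure count is always 0. A real harness would switch power to actual hardware (for example, through a relay or programmable supply) and read the drive back after every power-up; damaged translation tables or metadata would show up as sectors outside the interrupted write that no longer match.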
If you are having an unexplained issue with your embedded OEM or industrial system design and suspect Write Abort or another SSD-related issue, please reach out to us for assistance.