Embedded File System and power-off

I am working on an embedded application, without any OS, that needs a file system. I've discussed this many times with the people on the project, and some agree with me that the system must perform a proper shutdown whenever there is a power failure, or else the file system might go crazy.

Some people say it doesn't matter if you simply power off the system and let nature run its course, but I think that's one of the worst things to do, especially if you know it will cause problems and probably shorten your product's life span.

In the last paragraph I just assumed that it is a problem, but my question remains:

Does a power down have any effect on the file system?

Hallo answered 22/1, 2013 at 13:49 Comment(1)
No note on what hardware you are using to store files: NAND/NOR flash, SPI flash, SCSI disk, IDE disk, a RAM disk? If you are using a RAM disk, then a power down will affect the file system. Some hardware may pipeline accesses, and even journalling filesystems will have issues. You need to support power down at the device level first. For instance, the Linux MTD/UBI drivers, which write to flash, do support this. - Tensity

For a non-journalling filesystem, an unexpected power-off can corrupt data, including the directory structure. This happens if there's unsaved data in the cache, or if the FS is in the middle of a multi-block update and the interruption happens when only some of the blocks have been written.

Journalling mostly addresses this problem: if there's an interruption in the middle, a recovery routine or check-and-repair operation done by the FS (usually implicitly) brings the filesystem back to a consistent state. However, this state is not always the latest one: if there was some data in the memory cache, it can be lost even with journalling. Journalling saves you from corruption of the filesystem, but it doesn't do magic.

Write-through mode (no write caching) reduces the possibility of data loss but doesn't solve the problem completely, as the journal itself works as a cache (for a very short time).

So, unfortunately, backups or data duplication are the main ways to prevent data loss.
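
To make the journalling argument concrete, here is a toy redo journal over a simulated block device held in RAM. It is only a sketch, not how any particular filesystem implements its journal: all names (`disk`, `journal_t`, `format_disk`, `journaled_update`, `recover`) are hypothetical, and the `crash_at` parameter stands in for pulling the power cord at different moments. What matters is the ordering: the new contents go to the journal first, then a commit mark, then the in-place writes, and finally the journal entry is retired.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 16
#define NUM_BLOCKS 4

/* Simulated non-volatile storage (hypothetical): data blocks plus a
 * one-entry journal. */
char disk[NUM_BLOCKS][BLOCK_SIZE];

typedef struct {
    bool committed;               /* commit mark, written after the data */
    int  index[2];                /* target block numbers */
    char data[2][BLOCK_SIZE];     /* new contents of those blocks */
} journal_t;

journal_t journal;

/* Reset the simulated medium to a known "old" state. */
void format_disk(void)
{
    memset(disk, 0, sizeof disk);
    memset(&journal, 0, sizeof journal);
    snprintf(disk[0], BLOCK_SIZE, "%s", "old-A");
    snprintf(disk[1], BLOCK_SIZE, "%s", "old-B");
}

/* Update blocks a and b as one unit. `crash_at` simulates losing power:
 * 0 = no crash, 1 = after the journal data but before the commit mark,
 * 2 = right after the commit mark, 3 = after only block a was updated. */
void journaled_update(int a, const char *new_a, int b, const char *new_b,
                      int crash_at)
{
    assert(a >= 0 && a < NUM_BLOCKS && b >= 0 && b < NUM_BLOCKS);

    /* Step 1: write the new contents to the journal, not in place. */
    journal.index[0] = a;
    journal.index[1] = b;
    snprintf(journal.data[0], BLOCK_SIZE, "%s", new_a);
    snprintf(journal.data[1], BLOCK_SIZE, "%s", new_b);
    if (crash_at == 1) return;

    /* Step 2: the commit mark is the point where the update "happens". */
    journal.committed = true;
    if (crash_at == 2) return;

    /* Step 3: copy the journalled blocks into place (checkpoint). */
    memcpy(disk[a], journal.data[0], BLOCK_SIZE);
    if (crash_at == 3) return;
    memcpy(disk[b], journal.data[1], BLOCK_SIZE);

    /* Step 4: retire the journal entry. */
    journal.committed = false;
}

/* Boot-time recovery: replay a committed entry, ignore an uncommitted one. */
void recover(void)
{
    if (journal.committed) {
        memcpy(disk[journal.index[0]], journal.data[0], BLOCK_SIZE);
        memcpy(disk[journal.index[1]], journal.data[1], BLOCK_SIZE);
        journal.committed = false;
    }
}
```

A crash at point 1 leaves both blocks old; a crash at point 2 or 3 leaves both blocks new once `recover()` runs at boot. The two blocks never end up half-updated, but an update that had not reached its commit mark is simply lost, which is the "doesn't do magic" caveat. On real hardware this ordering only holds if the storage device honours flush commands, which not every device does.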

Gadabout answered 22/1, 2013 at 14:19 Comment(5)
Mayevski 'EldoS ... I am really trying to catch the problem before the product is finished, so I liked your suggestion to use a journaling FS, but my initial solution was to add a battery/supercap to alert the processor that it's being powered off and give it enough time to do whatever it needs to do. - Hallo
@Hallo As power failure is one (but not the only) possible threat, the battery doesn't remove the need for journalling, backup and data duplication. Other threats are software failure, hardware malfunction (a disk controller error, for example), a viking with a hammer smashing your device, etc. - Burns
@EugeneMayevski'EldoSCorp the OP was not asking about data protection, but about the need for a shutdown procedure. - Available
@Available power-down is only a special case of the system being turned off, so your comment is not applicable. - Burns
@EugeneMayevski'EldoSCorp I agree that powering down is one way to turn off a system, but this still has nothing to do with data protection (i.e. backup/duplication). - Available

Here is a list of various techniques to help an embedded system tolerate a power failure. These may not be practical for your particular application.

  1. Use a journaling file system. It can tolerate incomplete writes due to power failure, OS crash, etc. Most modern filesystems are journaled, but do your homework to confirm.

  2. Unless your application needs the write performance, disable all write caching. Check your disk drivers for caching options. Under Linux/Unix, consider mounting the filesystem in sync mode.

  3. Unless it must be writable, make it read-only. Try to keep your application executables and operating system files on their own partition(s), with write protections in place (e.g. mount read-only in Linux). Your read/write data should be on its own partition. Even if your application data gets corrupted, your system should still be able to boot (albeit with a fail-safe default configuration).

    3a. For data that is only written once (e.g. Configuration Settings), try to keep it mounted as read-only most of the time. If there is a settings change, mount it as R/W temporarily, update the data, and then unmount/remount it as read-only.

    3b. Use a technique similar to 3a to handle application/OS updates in the field.

    3c. If it is impractical for you to mount the FS as read-only, at least consider opening individual files as read-only (e.g. fp=fopen("configuration.ini", "r")).

  4. If possible, use separate devices for your storage. Keeping things in separate partitions provides some protection, but there are still edge cases where a partition table may become corrupt and render the entire drive unreadable. Using physically separate devices further isolates against one corrupt device bringing down the whole system. In a perfect world, you would have at least 4 separate devices:

    4a. Boot Loader

    4b. Operating System & Application Code

    4c. Configuration Settings

    4d. Application Data

  5. Know the characteristics of your storage devices, and control the brand/model/revision of devices used. Some hard disks ignore cache flush commands from the OS. We had cases where some models of CompactFlash cards would corrupt themselves during a power failure, but the "industrial" models did not have this problem. Of course, this information was not published in any datasheet, and had to be gathered by experimental testing. We developed a list of approved CF cards, and kept inventory of those cards. We periodically had to update this list as older cards became obsolete, or when the manufacturer released a new revision.

  6. Put your temporary files in a RAM Disk. If you keep those writes off-disk, you eliminate them as a potential source of corruption. You also reduce flash wear and tear.

  7. Develop automated corruption detection and recovery methods. None of the above techniques will help you if the application simply hangs because of a missing config file. You need to be able to recover as gracefully as possible:

    7a. Your system should maintain at least two copies of its configuration settings, a "primary" and a "backup". If the primary fails for some reason, switch to the backup. You should also consider mechanisms for making backups whenever the configuration is changed, or after a configuration has been declared "good" by the user (testing vs production mode).

    7b. Did your Application Data partition fail to mount? Automatically run chkdsk/fsck.

    7c. Did chkdsk/fsck fail to fix the problem? Automatically re-format the partition and get it back to a known state.

    7d. Do you have a Boot Loader or other method to restore the OS and application after a failure?

    7e. Make sure your system will beep, flash an LED, or do something else to indicate to the user what happened.

  8. Power Failures should be part of your system qualification testing. The only way you will be sure you have a robust system is to test it. Yank the power cord from the system and document what happens. Try yanking the power at multiple points in the system operation (during runtime, while booting, mid configuration, etc). Repeat each test multiple times.

  9. If you cannot mitigate all power failure problems, incorporate a battery or supercapacitor into the system. Keep in mind that you will need a background process in your OS to initiate a graceful shutdown when power gets low. Also, batteries will require periodic testing and replacement as they age.
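
Point 7a, combined with the fail-safe default configuration from point 3, could look roughly like this, assuming each copy of the settings is stored as a fixed-size record protected by a CRC-32. The record layout, the field names, `config_crc32`, `config_seal`, `config_valid`, `load_config` and the default values are all hypothetical; real code would read the two copies from separate files, partitions or flash sectors instead of receiving them as pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One stored copy of the settings (hypothetical fields). All members are
 * uint32_t so that, in practice, there are no padding bytes inside the
 * CRC-protected region. */
typedef struct {
    uint32_t baud_rate;
    uint32_t device_id;
    uint32_t crc;        /* CRC-32 of all bytes before this field */
} config_t;

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, as used by Ethernet/zip). */
uint32_t config_crc32(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ ((crc & 1u) ? 0xEDB88320u : 0u);
    }
    return ~crc;
}

/* Call before writing a copy out to storage. */
void config_seal(config_t *c)
{
    c->crc = config_crc32(c, offsetof(config_t, crc));
}

/* A missing copy (NULL) or a CRC mismatch both count as invalid. */
int config_valid(const config_t *c)
{
    return c != NULL && c->crc == config_crc32(c, offsetof(config_t, crc));
}

/* Returns 0 = primary used, 1 = backup used, 2 = fail-safe defaults used. */
int load_config(const config_t *primary, const config_t *backup, config_t *out)
{
    assert(out != NULL);

    if (config_valid(primary)) {
        *out = *primary;
        return 0;
    }
    if (config_valid(backup)) {
        *out = *backup;
        return 1;
    }

    /* Both copies corrupt or missing: boot with hypothetical defaults
     * instead of hanging. */
    memset(out, 0, sizeof *out);
    out->baud_rate = 9600u;
    out->device_id = 1u;
    config_seal(out);
    return 2;
}
```

When saving, write and flush the primary completely before touching the backup, so that a power cut during the save can damage at most one of the two copies. The return code is a natural place to trigger the beep or LED from 7e.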

Burkhardt answered 22/1, 2013 at 13:49 Comment(1)
I found that UBI and UBIFS are very robust for NAND flash. If the file is important, the user application must call fsync(). Some of these suggestions may increase reliability, but will decrease flash lifetime. - Tensity

This is an addition to msemack's answer; unfortunately my reputation is too low to post it as a comment, so it is a separate answer.

Does a power down have any effect on the file system?

Yes, if proper measures aren't put in place to prevent corruption. See the previous answers for file system options that help mitigate it. However, if ATA flush/sleep isn't properly implemented on your device, you may run into the scenario we did: the device was corrupted beyond the file system, and fdisk/format would not recover it.

Instead, an ATA security erase was required to recover the device once corruption had occurred. To avoid this, we issued an ATA sleep command prior to power loss. This required 400 ms of hold-up time to cover the 160 ms the ATA sleep took, leaving some headroom for degradation of the capacitors over the life of the product.
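
A hold-up requirement like this can be turned into a rough capacitor size. Assuming the load draws a roughly constant current I while the supply rail may sag by ΔV before the device browns out, the capacitance needed for a hold-up time t is C = I·t/ΔV. The function names and the 0.5 A / 1.0 V figures in the usage note are hypothetical placeholders; only the 400 ms hold-up and the 160 ms ATA sleep come from this answer.

```c
#include <assert.h>

/* Capacitance (farads) needed to supply `current_a` amps for `holdup_s`
 * seconds while the rail drops by at most `droop_v` volts.
 * Constant-current approximation: C = I * t / dV. */
double holdup_capacitance(double current_a, double holdup_s, double droop_v)
{
    assert(droop_v > 0.0);
    return current_a * holdup_s / droop_v;
}

/* Ratio of the designed hold-up time to the time the shutdown actually needs. */
double holdup_margin(double holdup_s, double needed_s)
{
    assert(needed_s > 0.0);
    return holdup_s / needed_s;
}
```

With a hypothetical 0.5 A load and 1.0 V of allowable droop, `holdup_capacitance(0.5, 0.4, 1.0)` gives 0.2 F, and `holdup_margin(0.400, 0.160)` gives the 2.5x margin left for capacitor ageing. If a DC/DC converter sits between the capacitor and the load, an energy budget of ½·C·(V1² − V2²) ≥ P·t (ignoring converter losses) is the better estimate.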

Notes from our scenario:

  • fdisk/format failed to repair/recover the drive.
  • Our power-safe file system's check-disk utility reported that the device had bad blocks, but there really weren't any.
  • flush/sync returned success quickly, and were most likely not implemented.
  • Once corrupted, dd could not read the device beyond the first partition boundary and returned I/O errors after that point.
  • hdparm was used to issue the ATA security erase, the only method of recovery for some corruption scenarios.
Funest answered 15/10, 2015 at 1:20 Comment(1)
@Funest Thank you very much for your answer. The previous comment may be correct in saying that you did not provide an answer, but I am very grateful you took the time to write it all down. You gave me more reasons to be careful when choosing/implementing a file system. Again, thank you very much. - Hallo

It totally depends on the file system you are using, and on whether it is acceptable to lose some data at power-off, based on your project requirements.

One could imagine using a file system that is secured against unexpected power-off and is able to recover from a partial write sequence. So on the application side, if you don't have critical data that absolutely needs to be written before shutting down, there is no need for a specific power-off detection procedure.

Now if you want a more specific answer for your project you will have to give more information on the file system you are using and your project requirements.

Edit: As you have critical application data to save before power-off, I think you have answered the question yourself. The only way to secure against unexpected power-off is to have brown-out detection that alerts your embedded device, coupled with some hardware circuitry that keeps delivering enough power to the device to perform the shutdown procedure.
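
On a bare-metal system, this brown-out approach is often split in two: the power-fail interrupt only sets a flag, and the main loop notices it, stops starting new writes, and flushes/closes the open files while the hold-up circuit keeps the board alive. The sketch below simulates the interrupt with a plain function call so it can run on a host; `power_fail_isr`, `app_write_record`, `main_loop_step` and `fs_flush_and_close_all` are hypothetical names, the last one standing in for whatever flush/close calls your file system provides.

```c
#include <assert.h>
#include <stdbool.h>

/* Set by the power-fail interrupt. Assumes a single-byte store is atomic on
 * the target MCU; volatile stops the compiler caching it in a register. */
volatile bool power_failing = false;

int  writes_done = 0;   /* bookkeeping for this host-side simulation */
bool fs_closed   = false;

/* Hypothetical hook: flush caches and close every open file. */
void fs_flush_and_close_all(void)
{
    fs_closed = true;
}

/* Hypothetical application write; refused once shutdown has started. */
bool app_write_record(void)
{
    if (power_failing)
        return false;
    writes_done++;
    return true;
}

/* Brown-out ISR (simulated here by a plain call): only record the event. */
void power_fail_isr(void)
{
    power_failing = true;
}

/* One pass of the super-loop. Returns false once the file system is closed
 * and it is safe for the power to die. */
bool main_loop_step(void)
{
    if (power_failing) {
        fs_flush_and_close_all();
        return false;   /* real code would idle here until power is gone */
    }
    app_write_record();
    return true;
}
```

Doing the flush in the main loop rather than inside the ISR lets a file-system call that was already in progress finish first, instead of the file system being re-entered halfway through an update. The hold-up circuit then has to cover the worst-case duration of one write plus the flush/close.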

Available answered 22/1, 2013 at 13:58 Comment(2)
We are still in the process of choosing the FS, but it looks like it's going to be a FAT FS. Our data requirement is that it cannot lose data, or write incomplete data and pass it off as complete. - Hallo
I edited my answer based on your previous comment, since you can't lose data. - Available

The FAT file system is particularly prone to corruption if a write is in progress or a file is open at shutdown, specifically if there is a buffered operation that has not been flushed. On one project I worked on, the solution was to run a file system integrity check and repair (essentially chkdsk/ScanDisk) on start-up. This strategy did not prevent data loss, but it did prevent the file system from becoming unusable.

A number of vendors provide journalling add-on components for FAT to counter exactly this problem. These include Segger, Quadros and Micrium for example.

Either way, your system should generally adopt an open-write-close approach to file access, or open-write-flush if you need to keep the file open.
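
Using standard C stdio plus POSIX `fsync` as a stand-in for whatever API your file system exposes, the two patterns look roughly like this; `append_record` and `log_record_open` are hypothetical names. `fflush` only pushes the C library's buffer down to the OS or driver; on a POSIX system `fsync` is what asks for the data to reach the medium (as the UBIFS comment above points out), and a bare-metal library such as FatFs has its own equivalent, `f_sync`. Whether the device then honours the flush is a separate question, as the ATA answer above shows.

```c
#define _POSIX_C_SOURCE 200809L   /* for fileno() and fsync() */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Open-write-close: the file is only open for the duration of one record. */
int append_record(const char *path, const char *record)
{
    FILE *fp = fopen(path, "a");
    if (fp == NULL)
        return -1;

    size_t len = strlen(record);
    int rc = (fwrite(record, 1, len, fp) == len) ? 0 : -1;

    /* Push stdio's buffer to the OS, then ask the OS to commit it. */
    if (fflush(fp) != 0 || fsync(fileno(fp)) != 0)
        rc = -1;

    if (fclose(fp) != 0)
        rc = -1;
    return rc;
}

/* Open-write-flush: the file stays open, but every record is flushed. */
int log_record_open(FILE *fp, const char *record)
{
    if (fputs(record, fp) == EOF || fflush(fp) != 0)
        return -1;
    return fsync(fileno(fp)) == 0 ? 0 : -1;
}
```

The open-write-close variant costs a directory update per record but leaves nothing open at power-off; the open-write-flush variant is cheaper per record but still leaves the file open, so a cut between records should cost at most the record being written.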

Guimond answered 22/1, 2013 at 15:49 Comment(2)
Agreed, avoid FAT if possible. I am very happy so far with ext4 (with journaling enabled) as an alternative. - Evaluate
@ChrisDesjardins: If using removable media such as SD or USB, FAT may still be the best choice for compatibility with other systems; however, one issue is that long-file-name support in FAT is subject to licensing from Microsoft. - Guimond
