How not to securely erase a NVME drive (2022)

(peterbabic.dev)

54 points | by transpute 4 days ago

14 comments

  • yrro 1 hour ago
    > So I connected it to the computer with the USB to NVME M.2 converter

    > blkdiscard: /dev/nvme0n1: BLKSECDISCARD ioctl failed: Operation not supported

    I've got a USB-to-NVME adapter that exposes the NVME namespaces as SCSI disks. `blkdiscard` did not work with these by default, however it worked fine after I changed the `provisioning_mode` attribute of the disk.

    This can be done by identifying the SCSI device ID of the disk (lsscsi) and then changing it with:

        # echo unmap > /sys/class/scsi_disk/a:b:c:d/provisioning_mode
    
    `lsblk -D` will show which block devices support the discard operation; run it before and after changing provisioning_mode to see the difference.
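
    As a rough sketch of the step above (not a tested tool), the sysfs path can be derived from the `[H:C:T:L]` tuple that `lsscsi` prints. The function names and the `sysfs_root` parameter are made up for illustration; only the `/sys/class/scsi_disk/.../provisioning_mode` path comes from the command above. Writing the attribute needs root.

```python
from pathlib import Path


def provisioning_mode_path(hctl: str, sysfs_root: str = "/sys") -> Path:
    """Build the sysfs path for a SCSI disk given its H:C:T:L tuple.

    `hctl` may be written as "6:0:0:0" or as "[6:0:0:0]" (the way lsscsi prints it).
    """
    parts = hctl.strip().strip("[]").split(":")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected host:channel:target:lun, got {hctl!r}")
    return Path(sysfs_root) / "class" / "scsi_disk" / ":".join(parts) / "provisioning_mode"


def set_unmap(hctl: str, sysfs_root: str = "/sys") -> str:
    """Switch the disk to 'unmap' provisioning and return the previous mode."""
    path = provisioning_mode_path(hctl, sysfs_root)
    previous = path.read_text().strip()
    path.write_text("unmap\n")
    return previous
```

    Calling `set_unmap("[6:0:0:0]")` as root would be the equivalent of the `echo unmap > ...` line; the tuple here is only an example value.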

    This is absolutely not to be used as an alternative to a real 'sanitize' operation sent directly to the NVME controller. If you actually need to securely erase your data, and the drive doesn't support the sanitize operation, then you should physically shred the drive and demand a refund from the retailer (goods as sold are not fit for purpose).

    Overall, I've found dealing with nvme a frustrating experience. In theory it's nice to have NVME controller firmware be responsible for executing commands from the host (sanitize! change LBA size! underprovision by 30%!) but in practice, it's complete hit or miss whether controllers support a command or will reject it, or maybe they claim to support it but it doesn't work because the controller firmware is buggy shit.

    I would like to have raw NAND devices and have the kernel be in charge of everything, but sadly that wouldn't work for Windows so we're stuck in proprietary firmware hell.

  • digiown 7 hours ago
    And this is why you always encrypt the drive with software. All of these methods seem to put a lot of faith in the drive controller doing what it claims to do, which you can never be all that sure about. Even Microsoft-backed Bitlocker would help here.
    • SoftTalker 6 hours ago
      For SATA SSDs I've used the hdparm secure erase and then verified that dd | hexdump is all zeros. That was good enough for me.
      • aidenn0 5 hours ago
        Depending on your threat model, your check is insufficient, since dd |hexdump will be all zeros even if you just trim all the blocks for a drive that is trim-to-zero.

        Securely erasing flash drives with a threat model of "someone will dump the raw data of the chips" is only fully solvable for self-encrypting drives where you can replace the key. Even if you can issue a block-erase for every single block of the device, block erase doesn't always succeed on NAND.
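
        For reference, the `dd | hexdump` check amounts to something like the sketch below. `is_all_zero` is a hypothetical helper name, and the device path in the usage note is a placeholder. It only inspects what the controller chooses to return through the logical block interface, so a passing result says nothing about the raw NAND.

```python
import io


def is_all_zero(stream, chunk_size: int = 1 << 20) -> bool:
    """Return True if every byte readable from `stream` is zero.

    `stream` is any binary file object, e.g. open("/dev/sdX", "rb").
    An empty stream counts as all-zero.
    """
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return True
        # bytes.count(0) counts zero bytes; any mismatch means non-zero data.
        if chunk.count(0) != len(chunk):
            return False


if __name__ == "__main__":
    # Small in-memory demonstration instead of a real block device.
    print(is_all_zero(io.BytesIO(bytes(4096))))            # all zero bytes
    print(is_all_zero(io.BytesIO(bytes(4095) + b"\x01")))  # one non-zero byte
```

        On a real system this would be run as root against the whole device, e.g. `with open("/dev/sdX", "rb") as dev: is_all_zero(dev)`.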

      • Joel_Mckay 5 hours ago
        With shingled writes on SATA HDDs and sector replacement on SSDs, a drive can't be cleaned that way.

        Tools like dban stopped working once firmware sector re-mapping chips on modern storage became common. If you see the reported spare replacement count drop on your older s.m.a.r.t reports, then partial sectors may no longer be accessible from the OS without vendor-specific forensic software. =3

        https://sourceforge.net/projects/dban/

    • fulafel 5 hours ago
      Bitlocker can rely on the SSD encryption, so careful there too.
      • wongogue 4 hours ago
        It has been software encryption for many years now.
        • blackcatsec 4 hours ago
          By default, yes. But it is possible to enable Bitlocker to use a SED directly.
    • Joel_Mckay 7 hours ago
      Indeed, LUKS + F2FS for /home with an external key file imported into initrd solves a lot of issues.

      Primarily, when an SSD slowly fails the sector replacement allotment has already bled data into read-only areas of the drive. As a user, there is no way to reliably scrub that data.

      If the drive suddenly bricks, the warranty service will often not return the original hardware... and just the password protection on an embedded LUKS key is not great.

      There are effective disposal methods:

      1. shred the chips

      2. incinerate the chips

      Wiping/Trim sometimes doesn't even work if the Flash chips are malfunctioning. =3

      • bruce343434 1 hour ago
        What's wrong with the LUKS password protection?
      • tokyobreakfast 4 hours ago
        > an external key file imported into initrd

        This is exceptionally poor advice. This is why TPM exists. Unfortunately adoption is low with the Linux crowd because they still believe the misinformation from 20 years ago.

        • yrro 1 hour ago
          I've lost faith that Linux distros will ever fix the problem where some PCR changes and the TPM refuses to unseal the key... the user is left with a recovery passphrase prompt & no way to verify whether they have been attacked by the 'evil maid', or whether it was just because of a kernel or kernel command line or initrd or initrd module change, etc.
        • Joel_Mckay 3 hours ago
          It is common to remote mount JBOD over initrd drop-bear ssh using sector level strip signature checking, predicted s.m.a.r.t power-cycle-count/hours/serial, proc structure, and an ephemeral key. SElinux is also quite robust in access permission handling.

          TPM collocates a physical key on the same host, incurs its own set of trade-offs with failures or physical access in dormancy, and requires trusting yet another vendor supply chain. There are always better options, but since the Intel Management Engine can access TPM... such solutions incur new problems. Privilege escalation through TPM Sniffing is also rather trivial these days.

          Have a great day. =3

          • dist-epoch 54 minutes ago
            People stopped using dedicated TPM about 10 years ago exactly because it's trivial to sniff it.

            Nowadays you use the fTPM built inside the CPU. And if you don't trust the CPU maker, well, you have bigger problems.

    • yearolinuxdsktp 5 hours ago
      100%. If you’re not encrypting your drive, along with a strong password, you’re fucking around.

      Physical destruction as the only sure way? When your hardware is stolen, good luck physically destroying it.

  • SoftTalker 6 hours ago
    Smash it with a hammer and move on. I'd never buy a used storage device anyway, no telling what malware it might contain.
    • yearolinuxdsktp 5 hours ago
      Do you mean malware in the firmware that sticks around after you format the drive?
  • wtallis 6 hours ago
    It's very common for both NVMe and SATA drives that they'll be locked/frozen during boot and thus will not honor a secure erase command until the drive has been power-cycled, which can usually be accomplished with the system-level sleep/wake cycle. I'm not sure what useful purpose this is meant to serve other than possibly making it hard for malware to instantly and irretrievably wipe your storage.
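
    For SATA drives, the frozen state shows up in the Security section of `hdparm -I` output. A minimal sketch for checking it, assuming the usual layout where the section contains either a `frozen` line or a `not frozen` line (the helper name is made up, and NVMe drives are not covered by this):

```python
def parse_frozen_state(hdparm_identify_output: str):
    """Return True if frozen, False if not frozen, None if no such line is found.

    Expects the text printed by `hdparm -I /dev/sdX`. The Security section is
    assumed to list the state as a line ending in 'frozen', prefixed by 'not'
    when the drive is not frozen.
    """
    for line in hdparm_identify_output.splitlines():
        tokens = line.split()
        if tokens and tokens[-1] == "frozen":
            return tokens[0] != "not"
    return None
```

    A `True` result after boot and `False` after a suspend/resume cycle would match the behaviour described above.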
    • aggieNick02 2 hours ago
      So many NVMe/SATA drives are locked/frozen during boot, and it turns out this is because the drives are actually behaving incorrectly when "security operations" are blocked on the drive. When "security operations" are blocked, you should not be able to set a password on the drive, but should be able to format it. So that's bug 1.

      Most modern motherboards, on boot, will block "security operations" on a drive where the security password is set to the default (because it hasn't been manually set by the end-user). They do this to prevent malware from being able to set a password on a drive that hasn't had its password set. (Malware could set the password and I believe configure the drive to effectively brick it.)

      But many (probably most) motherboards fail to correctly block "security operations" on a suspend/resume. This is bug 2, and makes suspend/resume often an effective workaround for a drive with bug 1, as well as a theoretical opportunity for malware to easily inflict damage on all drives that support "security operations".

      So one generally ends up stuck and unable to securely erase their drive when it has bug 1 and is installed on a motherboard without bug 2. In this case, you have to hope your motherboard has a feature in its BIOS to, on next boot, not block security operations. Otherwise you're stuck and need to find another motherboard if you want to sanitize your drive, or hope that a firmware update for your drive resolves bug 1.

      The full details are in this comment on a Github issue from 2016: https://github.com/linux-nvme/nvme-cli/issues/84#issuecommen... . It was one of the most rewarding bugs I've had the fortune to get to the bottom of. We were extra motivated to fully understand it when we moved to a new SSD benchmarking test system that turned out to not have bug 2: https://pcpartpicker.com/forums/topic/460000-an-ssd-that-can...

    • transpute 4 hours ago

        locked/frozen during boot
        until the drive has been power-cycled
      
      There's a fixed time window for accepting secure erase, after power cycle?
      • yrro 1 hour ago
        I suppose arguably the kernel, or at least some component of the OS, should be freezing/locking drives as they come online. The firmware doing so as one-off operation during boot is a workaround for the lack of this being done by the OS.
  • IAmLiterallyAB 5 hours ago
    To maximize device performance when wiping a drive to use for something else, I use nvme format with --ses=1.

    Which in theory should free all of the blocks on the flash.

    Really hard to find good documentation on this stuff. Doesn't help that 95% of internet articles just say "overwrite with zeroes" which is useless advice

    • yrro 1 hour ago
      What's the difference between this and sanitize? Should we be doing both?

      [edit] sanitize runs on the controller level while format works on the namespace level. So I suppose formatting won't touch any pages not allocated to a namespace.

      I wish there was _any_ way to find out which NVME controllers supported which operation before you buy them!
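
      Once you have the drive in hand, the controller at least reports what it claims to support in the SANICAP field of Identify Controller (`nvme id-ctrl` prints it as `sanicap`). A small sketch that decodes the low three bits; the function name and dictionary keys are made up here, and whether the firmware actually honours what it advertises is a separate question:

```python
def decode_sanicap(sanicap: int) -> dict:
    """Decode the sanitize capability bits of an NVMe controller.

    Bit 0: crypto erase, bit 1: block erase, bit 2: overwrite.
    Higher bits are ignored in this sketch.
    """
    return {
        "crypto_erase": bool(sanicap & 0x1),
        "block_erase": bool(sanicap & 0x2),
        "overwrite": bool(sanicap & 0x4),
    }
```

      For example, `decode_sanicap(0x3)` reports crypto erase and block erase as supported, and a value of 0 means no sanitize operation is advertised at all.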

    • jeffbee 5 hours ago
      Anything that works at the logical block interface will not usefully wipe the device. SES 1 will physically hit every erase block on the device with 20V to blow it away. This happens suspiciously quickly (< 60 seconds typically) but that's just because flash is great.
      • wolvoleo 3 hours ago
        Doesn't that harm the flash? The OP seems to use this before using it again but such a high voltage seems rather destructive
  • IAmLiterallyAB 5 hours ago
    As far as I know, there is NO way to securely erase a USB flash drive (barring some undocumented vendor specific commands that may exist).
    • rationalist 5 hours ago
      Overwrite every single bit with innocuous files?
      • IAmLiterallyAB 5 hours ago
        That doesn't work on any* NAND flash device, be it a flash drive, NVME, SATA, whatever.

        The block device you see is an abstraction provided by the SSD controller. In reality, the flash capacity is larger. Pages are swapped out for wear leveling. If a block goes bad, it'll be taken out of commission, and your data may hide in there.

        All of this happens on the SSD controller. The kernel doesn't know. You have no way to directly erase or modify specific blocks.

        *Okay, there are raw NAND flash chips without controllers, but that is not what you're working with when you have an SSD or flash drive. If you do have a raw flash chip, you can more directly control flash contents.
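
        To make the remapping point concrete, here is a toy model of a flash translation layer. It is a deliberately simplified sketch, not how any real controller works: `ToyFTL`, its page counts, and its garbage-collection policy are all invented for illustration. It shows that overwriting every logical page can still leave old data in physical pages the host cannot address.

```python
class ToyFTL:
    """Toy flash translation layer with more physical pages than logical ones."""

    def __init__(self, logical_pages: int, physical_pages: int):
        if physical_pages <= logical_pages:
            raise ValueError("need spare (overprovisioned) physical pages")
        self.logical_pages = logical_pages
        self.physical = [None] * physical_pages   # raw page contents
        self.mapping = {}                         # logical page -> physical page
        self.free = list(range(physical_pages))   # erased pages ready for writes
        self.stale = []                           # old copies not yet erased

    def write(self, lpn: int, data: bytes) -> None:
        """Write a logical page; flash is never overwritten in place."""
        if not 0 <= lpn < self.logical_pages:
            raise IndexError("logical page out of range")
        if not self.free:
            self._garbage_collect()
        ppn = self.free.pop(0)
        self.physical[ppn] = data
        old = self.mapping.get(lpn)
        if old is not None:
            self.stale.append(old)                # old data stays until erased
        self.mapping[lpn] = ppn

    def read(self, lpn: int):
        """What the host sees through the block interface."""
        ppn = self.mapping.get(lpn)
        return None if ppn is None else self.physical[ppn]

    def _garbage_collect(self) -> None:
        """Erase stale pages and return them to the free pool."""
        for ppn in self.stale:
            self.physical[ppn] = None
            self.free.append(ppn)
        self.stale.clear()

    def raw_dump(self) -> list:
        """What someone reading the chips directly would see."""
        return list(self.physical)


if __name__ == "__main__":
    ftl = ToyFTL(logical_pages=4, physical_pages=6)
    for lpn in range(4):
        ftl.write(lpn, b"secret")
    for lpn in range(4):
        ftl.write(lpn, b"\x00")                   # "overwrite with zeroes"
    print([ftl.read(lpn) for lpn in range(4)])    # host view
    print(ftl.raw_dump())                         # chip-level view
```

        In this run the host reads back only zero pages, while the raw dump still contains two pages holding `b"secret"`, because the toy controller had no reason to erase them yet.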

      • Gigachad 5 hours ago
        This is what `shred` and other secure wipes do. There is some concern over data stored in sections which the firmware has swapped out and made inaccessible. But if this is a concern to you, then you should be using full disk encryption anyway which makes all of this a non issue.
    • jeffbee 4 hours ago
      This is broadly true of cheap thumb drives, but not true of all USB flash drives. The larger ones generally do support secure erase. E.g. the Crucial X6. I don't know if these use secret vendor commands, or if they use the standard SCSI "sanitize" command.
  • tokyobreakfast 5 hours ago
    None of these methods are reliable nor should they be trusted.

    Every organization with good security hygiene requires physical destruction of SSDs. Full stop, end of negotiation, into the shredder it goes.

    Not that it matters much, with the prices of SSDs skyrocketing people are moving back to mechanical disks.

    • yearolinuxdsktp 5 hours ago
      Every organization with good security hygiene requires strong-password-protected disk encryption, because when your stuff is stolen from your Tesla at lunch time in broad daylight, no shredder policy will save you, full stop.
  • NegativeK 6 hours ago
    I had a drawn out conversation with a friend about erasing NVME drives in a way that met compliance needs. The procedure they were given was to install Windows, with Bitlocker, twice with no effort to retain the key.

    But that doesn't even overwrite the visible drive space; you can do a simple PoC to demonstrate that Windows won't get to all the mapped blocks. And that still hasn't gotten to the overprovisioned blocks and wear leveling issues that the article references.

    You could use the BIOS or whatever CLI tool to tell the drive to chuck its encryption key, but are you sure that tool meets whatever compliance requirements you're beholden to? Are you sure the drive firmware does?

    So they went with paying a company to shred the drives. All of them. It's disgustingly wasteful.

    • protocolture 6 hours ago
      Used to do recycling. Before secure erase was widespread there used to be cheapish 16 and 32GB SSDs for embedded devices, but a few of them made it into the thin/zero client space and a few white-labelled low-end PCs. They were actually twice the size: basically two 16s in a single 16 chassis. And what you would get is that the two drives were sort of in sync; I think it was a failover mechanism to deal with shitty drive quality. If drive A failed it would just connect to drive B instead and the user might not know about the failure. But the second drive would not necessarily be wiped, depending on how you wiped the first one. A few people retrieved data from the second disk under lab conditions, after wiping the first, so we had a report come through that we couldn't certify these disks as erased until they demonstrated compliance with secure erase. So we shredded probably a few thousand of them.

      I heard of similar issues with early nvme drives.

    • wtallis 6 hours ago
      If compliance is the goal, just use FIPS certified self-encrypting drives and trust them to wipe their encryption keys when instructed to do so. At that point, any failure is clearly the vendor's fault, not your own.
  • Neywiny 6 hours ago
    Gotta love breaking EFI changes. I don't know how many times my work laptop would do that and I couldn't boot anymore, only to remember some stressful time later that Linux would only boot with some of the settings flipped from their defaults. At least I never had to reinstall anything.
  • Luker88 3 hours ago
    Best practice is always to encrypt with LUKS, and then just shred the header before selling.

    blkdiscard is just a TRIM command, the data remains there.

    A few years ago (2020?) I also learned that ssd firmware can be buggy when I bricked multiple really expensive enterprise ssd (samsung?)....by running trim. lol.

    • dist-epoch 51 minutes ago
      But how do you shred the header? The drive could have written the new shredded one to a new physical location (wear-leveling).
  • e40 3 days ago
    That was way longer than I expected. Wow.
  • russfink 7 hours ago
    sedutil-cli --yesIwantToEraseALLmydata $PSID /dev/sda1 or something like that.
    • theandrewbailey 7 hours ago
      Tip: Get a barcode scanner. The PSID is usually encoded in a bar/matrix code on the drive's label, next to the plaintext PSID.
  • unnouinceput 1 hour ago
    WTF is wrong with just copy/paste the same big unimportant file (like a movie) until you run out of space and then just delete them all? So much unnecessary hurdle with different utilities and commands that made my head spin. What do you think those utilities do in the background anyway if not filling the free space with junk, just like what a copy/paste does?
    • yrro 51 minutes ago
      That won't overwrite pages not allocated to a namespace (which can happen due to wear levelling/underprovisioning, or because the controller has decided to stop using that page because it's unhealthy).

      Flash looks like a simple array of blocks, but under the hood there is a controller that allocates writes to different pages. You need to tell the controller to erase all pages if you want to guarantee data destruction.

  • buckle8017 6 hours ago
    Smash it with a hammer.

    If you insist on erasing the data, overwrite the entire contents of the drive twice with random data.

    Doing it twice will blow away any cached data as well (probably).