Another Samsung SSD has died

  • xfer42
    CGN/CGSSA Contributor
    CGN Contributor
    • Sep 2007
    • 709

    Another Samsung SSD has died

    Looks like I've killed my 5th Samsung SSD.
    1. 850 500GB. I didn't bother with warranty. It could have been zapped. It's not recognized anymore.
    2. 870 1TB, failing LBA (Google "870 failing LBA"; there was a bad batch)
    3. 870 1TB, same failing LBA
    4. 970 1TB M.2 NVMe. After almost 5 years, it failed. Samsung said they can't replace it, so they are sending a refund for ~$290.

    This is my 5th. It could be the same as 2 & 3, or one of the replacements.
    This drive is plugged into a SAS2 expander backplane (12 bay) and sits idle with unused KVM images. I noticed a SMART error on the console when I went to set up a new Minecraft server container for the kids. I need to figure out if I can update the firmware in Linux. There's about 150GB in use, and about 1.12TB written.


    Code:
    [root@localhost bedrock]# date
    Sun Apr 23 10:58:39 MDT 2023
    [root@localhost bedrock]# smartctl /dev/sdi -iH
    smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1062.9.1.el7.x86_64] (local build)
    Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF INFORMATION SECTION ===
    Device Model:     Samsung SSD 870 EVO 1TB
    Serial Number:    S625NJ0R260255W
    LU WWN Device Id: 5 002538 f31238bfa
    Firmware Version: SVT01B6Q
    User Capacity:    1,000,204,886,016 bytes [1.00 TB]
    Sector Size:      512 bytes logical/physical
    Rotation Rate:    Solid State Device
    Form Factor:      2.5 inches
    Device is:        Not in smartctl database [for details use: -P showall]
    ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
    SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
    Local Time is:    Sun Apr 23 10:58:41 2023 MDT
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled
    
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: FAILED!
    Drive failure expected in less than 24 hours. SAVE ALL DATA.
    Failed Attributes:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      5 Reallocated_Sector_Ct   0x0033   008   008   010    Pre-fail  Always   FAILING_NOW 1073
    179 Used_Rsvd_Blk_Cnt_Tot   0x0013   008   008   010    Pre-fail  Always   FAILING_NOW 1073
    183 Runtime_Bad_Block       0x0013   008   008   010    Pre-fail  Always   FAILING_NOW 1073
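For anyone reading the table above: smartctl reports an attribute as failing when its normalized VALUE is at or below THRESH. Here is a minimal Python sketch of that check, assuming the column layout shown in this output. The function name `failing_attributes` and the pasted `SAMPLE` rows are only for illustration and are not part of smartctl.

```python
# Rows copied from the smartctl attribute table in this post.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   008   008   010    Pre-fail  Always   FAILING_NOW 1073
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       16516
177 Wear_Leveling_Count     0x0013   099   099   000    Pre-fail  Always       -       2
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   008   008   010    Pre-fail  Always   FAILING_NOW 1073
183 Runtime_Bad_Block       0x0013   008   008   010    Pre-fail  Always   FAILING_NOW 1073
"""


def failing_attributes(table_text):
    """Return (id, name, value, thresh) for rows whose VALUE <= THRESH.

    Assumes whitespace-separated columns in the order:
    ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    failing = []
    for line in table_text.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and anything that is not an attribute row.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id, name = int(fields[0]), fields[1]
        value, thresh = int(fields[3]), int(fields[5])
        # A threshold of 0 means the attribute can never trip.
        if thresh > 0 and value <= thresh:
            failing.append((attr_id, name, value, thresh))
    return failing


for attr_id, name, value, thresh in failing_attributes(SAMPLE):
    print(f"{attr_id:>3} {name}: value {value} <= threshold {thresh}")
```

On the rows above this picks out attributes 5, 179, and 183, the same three that smartctl marks FAILING_NOW.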
    Full SMART info
    Code:
    SMART Attributes Data Structure revision number: 1
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      5 Reallocated_Sector_Ct   0x0033   008   008   010    Pre-fail  Always   FAILING_NOW 1073
      9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       16516
     12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       4
    177 Wear_Leveling_Count     0x0013   099   099   000    Pre-fail  Always       -       2
    179 Used_Rsvd_Blk_Cnt_Tot   0x0013   008   008   010    Pre-fail  Always   FAILING_NOW 1073
    181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
    182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
    183 Runtime_Bad_Block       0x0013   008   008   010    Pre-fail  Always   FAILING_NOW 1073
    187 Reported_Uncorrect      0x0032   096   096   000    Old_age   Always       -       35381
    190 Airflow_Temperature_Cel 0x0032   075   064   000    Old_age   Always       -       25
    195 Hardware_ECC_Recovered  0x001a   199   199   000    Old_age   Always       -       35381
    199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
    235 Unknown_Attribute       0x0012   099   099   000    Old_age   Always       -       2
    241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       2415845200
    
    SMART Error Log Version: 1
    ATA Error Count: 35381 (device log contains only the most recent five errors)
            CR = Command Register [HEX]
            FR = Features Register [HEX]
            SC = Sector Count Register [HEX]
            SN = Sector Number Register [HEX]
            CL = Cylinder Low Register [HEX]
            CH = Cylinder High Register [HEX]
            DH = Device/Head Register [HEX]
            DC = Device Command Register [HEX]
            ER = Error register [HEX]
            ST = Status register [HEX]
    Powered_Up_Time is measured from power on, and printed as
    DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
    SS=sec, and sss=millisec. It "wraps" after 49.710 days.
    
    Error 35381 occurred at disk power-on lifetime: 15767 hours (656 days + 23 hours)
      When the command that caused the error occurred, the device was active or idle.
    
      After command completion occurred, registers were:
      ER ST SC SN CL CH DH
      -- -- -- -- -- -- --
      00 51 01 10 00 00 00  Error:  at LBA = 0x00000010 = 16
    
      Commands leading to the command that caused the error were:
      CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
      -- -- -- -- -- -- -- --  ----------------  --------------------
      60 08 00 40 42 05 00 00  31d+21:32:30.695  READ FPDMA QUEUED
      e5 00 00 00 00 00 00 00  31d+21:32:30.695  CHECK POWER MODE
      2f 00 01 10 00 00 00 00  31d+21:32:30.695  READ LOG EXT
      60 08 00 40 42 05 00 00  31d+21:32:30.695  READ FPDMA QUEUED
      e5 00 00 00 00 00 00 00  31d+21:32:30.695  CHECK POWER MODE
    
    Error 35380 occurred at disk power-on lifetime: 15767 hours (656 days + 23 hours)
      When the command that caused the error occurred, the device was active or idle.
    
      After command completion occurred, registers were:
      ER ST SC SN CL CH DH
      -- -- -- -- -- -- --
      00 51 01 10 00 00 00  Error:  at LBA = 0x00000010 = 16
    
      Commands leading to the command that caused the error were:
      CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
      -- -- -- -- -- -- -- --  ----------------  --------------------
      60 08 00 40 42 05 00 00  31d+21:32:29.913  READ FPDMA QUEUED
      e5 00 00 00 00 00 00 00  31d+21:32:29.913  CHECK POWER MODE
      2f 00 01 10 00 00 00 00  31d+21:32:29.913  READ LOG EXT
      60 08 00 40 42 05 00 00  31d+21:32:29.913  READ FPDMA QUEUED
      e5 00 00 00 00 00 00 00  31d+21:32:29.913  CHECK POWER MODE
    
    Error 35379 occurred at disk power-on lifetime: 15767 hours (656 days + 23 hours)
      When the command that caused the error occurred, the device was active or idle.
    
      After command completion occurred, registers were:
      ER ST SC SN CL CH DH
      -- -- -- -- -- -- --
      00 51 01 10 00 00 00  Error:  at LBA = 0x00000010 = 16
    
      Commands leading to the command that caused the error were:
      CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
      -- -- -- -- -- -- -- --  ----------------  --------------------
      60 08 00 40 42 05 00 00  31d+21:32:29.280  READ FPDMA QUEUED
      e5 00 00 00 00 00 00 00  31d+21:32:29.280  CHECK POWER MODE
      2f 00 01 10 00 00 00 00  31d+21:32:29.280  READ LOG EXT
      60 08 00 40 42 05 00 00  31d+21:32:29.280  READ FPDMA QUEUED
      e5 00 00 00 00 00 00 00  31d+21:32:29.280  CHECK POWER MODE
    
    Error 35378 occurred at disk power-on lifetime: 15767 hours (656 days + 23 hours)
      When the command that caused the error occurred, the device was active or idle.
    
      After command completion occurred, registers were:
      ER ST SC SN CL CH DH
      -- -- -- -- -- -- --
      00 51 01 10 00 00 00  Error:  at LBA = 0x00000010 = 16
    
      Commands leading to the command that caused the error were:
      CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
      -- -- -- -- -- -- -- --  ----------------  --------------------
      60 08 00 40 42 05 00 00  31d+21:32:28.534  READ FPDMA QUEUED
      e5 00 00 00 00 00 00 00  31d+21:32:28.534  CHECK POWER MODE
      2f 00 01 10 00 00 00 00  31d+21:32:28.534  READ LOG EXT
      60 08 00 40 42 05 00 00  31d+21:32:28.534  READ FPDMA QUEUED
      e5 00 00 00 00 00 00 00  31d+21:32:28.534  CHECK POWER MODE
    
    Error 35377 occurred at disk power-on lifetime: 15767 hours (656 days + 23 hours)
      When the command that caused the error occurred, the device was active or idle.
    
      After command completion occurred, registers were:
      ER ST SC SN CL CH DH
      -- -- -- -- -- -- --
      00 51 01 10 00 00 00  Error:  at LBA = 0x00000010 = 16
    
      Commands leading to the command that caused the error were:
      CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
      -- -- -- -- -- -- -- --  ----------------  --------------------
      60 08 00 40 42 05 00 00  31d+21:32:27.915  READ FPDMA QUEUED
      e5 00 00 00 00 00 00 00  31d+21:32:27.915  CHECK POWER MODE
      2f 00 01 10 00 00 00 00  31d+21:32:27.915  READ LOG EXT
      60 08 00 40 42 05 00 00  31d+21:32:27.915  READ FPDMA QUEUED
      e5 00 00 00 00 00 00 00  31d+21:32:27.915  CHECK POWER MODE
    
    SMART Self-test log structure revision number 1
    No self-tests have been logged.  [To run self-tests, use: smartctl -t]
    
    SMART Selective self-test log data structure revision number 1
     SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
        1        0        0  Not_testing
        2        0        0  Not_testing
        3        0        0  Not_testing
        4        0        0  Not_testing
        5        0        0  Not_testing
      256        0    65535  Read_scanning was never started
    Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.
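The "about 1.12TB written" figure can be cross-checked against attribute 241 (Total_LBAs_Written) in the output above. This is a rough sketch, assuming the raw value counts 512-byte logical sectors (the sector size smartctl reports for this drive); that unit is an assumption and may differ on other models. The helper name `lbas_to_bytes` is made up for the example.

```python
SECTOR_BYTES = 512  # logical sector size from the smartctl -i output (assumed unit for attribute 241)
TOTAL_LBAS_WRITTEN = 2415845200  # raw value of attribute 241 in this post


def lbas_to_bytes(lba_count, sector_bytes=SECTOR_BYTES):
    """Convert a count of logical blocks into bytes."""
    return lba_count * sector_bytes


total = lbas_to_bytes(TOTAL_LBAS_WRITTEN)
print(f"{total / 1e12:.2f} TB (decimal)")   # 1.24 TB
print(f"{total / 2**40:.2f} TiB (binary)")  # 1.12 TiB
```

Under that assumption the drive has taken roughly 1.24 TB (1.12 TiB) of writes, which lines up with the 1.12 figure if it was read in binary units.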
    Last edited by xfer42; 04-23-2023, 9:35 PM.
  • #2
    C.G.
    Calguns Addict
    • Oct 2005
    • 8199

    Bad luck, I've got a 960 and a much older 860 that have been running flawlessly.

    • #3
      jorgi23
      Member
      • Jan 2015
      • 476

      Sounds like your motherboard has an issue?

      • #4
        xfer42
        CGN/CGSSA Contributor
        CGN Contributor
        • Sep 2007
        • 709

        Originally posted by jorgi23
        Sounds like your motherboard has an issue?
        One of the 870s died on an ASUS ROG MAXIMUS VII HERO Z97.
        The 970 NVMe and an 870 (SATA) died on an ASUS ROG STRIX X299-3.
        This last one died on a Supermicro dual Xeon mobo with a 12 bay SAS2 expander.
        I don't know when the 850 failed.

        • #5
          jorgi23
          Member
          • Jan 2015
          • 476

          Originally posted by xfer42
          One of the 870s died on an ASUS ROG MAXIMUS VII HERO Z97.
          The 970 NVMe and an 870 (SATA) died on an ASUS ROG STRIX X299-3.
          This last one died on a Supermicro dual Xeon mobo with a 12 bay SAS2 expander.
          I don't know when the 850 failed.
          I've been working with SSDs and never had an issue. Don't buy from eBay, which I don't think you do. Is there a label on the SSD that has to be removed before install?


          • #6
            xfer42
            CGN/CGSSA Contributor
            CGN Contributor
            • Sep 2007
            • 709

            Originally posted by C.G.
            Bad luck, I've got a 960 and a much older 860 that have been running flawlessly.
            Yeah. I should diversify. Or stop buying the cheaper EVO drives. I've bought 12 Samsung SSDs from Amazon, and 11 of them are EVOs. All 5 that have failed on me so far are EVOs. 3 of the 870 EVOs were made around Feb 2021, and I see some references to possible bad flash around that time.


            Originally posted by jorgi23
            I've been working with SSDs and never had an issue. Don't buy from eBay, which I don't think you do. Is there a label on the SSD that has to be removed before install?
            I've never removed the label on the M.2 NVMe. On the Samsungs, that's where the serial number is. They ask for a picture of that label when you contact them about warranty. The SMART info for the 970 (the M.2 NVMe) did show temperature info, but did not indicate that it overheated. It went into low-level read-only mode. The other drives were SATA and complained about failing LBA. This last one is SATA on a SAS2 controller.
