March 2018

Mitigating Nvidia resume from hibernation failures

I've seen, and posted, countless threads about hibernate cycles (suspend to disk) sporadically failing to resume, starting with mutter 3.26.1. I've spent months experimenting with possible fixes, including replacing my Samsung RAM (2133MHz) with newer Kingston DDR4 modules (2400MHz, 2x8GB = 16GB).
The issues ranged from Xid 56 errors and idle display engine errors to corrupted TTY screens that only appear after resuming from suspend, among countless other things. I suspect most of these only happen when booting in legacy mode (UEFI disabled), which would explain why not everyone is able to reproduce them.
This is a desktop with the integrated Intel HD 520 graphics disabled in the BIOS settings, so the system automatically uses the PCIe GTX 1050 Ti nvidia card, a low-power card that draws no more than 75 W from the PCIe connector. Always make sure your PSU can deliver at least 150 W more than your system needs (for example, 650 W if you need 500 W, or 450 W if you need 300 W).
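The PSU rule of thumb is simple arithmetic: add 150 W to the system's estimated peak draw. Here is a minimal sketch as two shell functions. The names psu_min_watts and psu_has_headroom are hypothetical. The peak draw is something you have to estimate yourself, for example from component specs.

```shell
#!/bin/bash
# Hypothetical helpers for the "at least 150 W of headroom" rule of thumb.
# All values are in watts.

HEADROOM_W=150

# Print the minimum PSU rating for an estimated peak system draw ($1).
psu_min_watts() {
    echo $(( $1 + HEADROOM_W ))
}

# Return 0 if a PSU rated at $1 leaves at least HEADROOM_W above a draw of $2.
psu_has_headroom() {
    [ "$1" -ge "$(psu_min_watts "$2")" ]
}
```

For example, psu_min_watts 500 prints 650, matching the example above. Likewise, psu_has_headroom 550 500 fails because 550 W is below that minimum.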

I have so far managed to mitigate the resume failures with the following steps:

  1. Enable the deprecated legacy persistence mode (not nvidia-persistenced). I did this with a cron job:
    /etc/cron.d/nvidia-persistence
    @reboot root /usr/bin/nvidia-smi -pm 1

    Cronie (crond) starts after udev (or whatever else loads the nvidia modules on your system) has loaded the nvidia drivers, so the setting takes effect at boot.
  2. Set a static high-resolution TTY mode by editing grub.cfg manually to something similar to the following. Keep in mind that running grub-mkconfig regenerates this file and overwrites manual edits.

    /boot/grub/grub.cfg

    set timeout=5
    menuentry 'Arch Linux' {
        set root='hd0,1'
        set gfxpayload=1920x1080
        linux /vmlinuz-linux-lts cryptdevice=/dev/sda3:root resume=/dev/mapper/swap root=/dev/mapper/root ro modprobe.blacklist=nouveau acpi_osi=! acpi_osi=Linux
        initrd /intel-ucode.img /initramfs-linux-lts.img
    }

  3. Add a systemd sleep hook that turns persistence mode off before suspending and back on after resuming. It does this only if the nvidia driver is loaded and Xorg is running (otherwise bad things can happen). Note the 5-second sleep: don't log out of your X session during those 5 seconds, before persistence mode is re-enabled. On my 1TB 7200RPM mechanical disk with a 32GB swap partition, 5 seconds is enough. systemd only runs executable files from this directory, so make the hook executable (chmod +x).
    /usr/lib/systemd/system-sleep/suspend-nvidia
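    #!/bin/bash
    # systemd-sleep calls this with $1 = pre (before sleeping) or post (after waking)
    # and $2 = the sleep type (e.g. suspend or hibernate).
    case "$1" in
    pre)
        # Only touch persistence mode if Xorg is running and the nvidia module is loaded
        if pgrep -x Xorg > /dev/null && lsmod | grep -q '^nvidia '; then
            /usr/bin/nvidia-smi -pm 0
        fi
        ;;
    post)
        if pgrep -x Xorg > /dev/null && lsmod | grep -q '^nvidia '; then
            # Wait a few seconds before re-enabling persistence mode
            /usr/bin/sleep 5 && /usr/bin/nvidia-smi -pm 1
        fi
        # Pull everything back out of swap, then re-enable it
        /usr/bin/swapoff -a && /usr/bin/swapon -a
        ;;
    esac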

    The swapoff/swapon pair at the end of the post branch is deliberate. In my experience, the Linux kernel is not very smart about power management. After successive hibernate operations it can leave things in swap that should not be there. With enough free RAM to hold everything currently swapped out, it is safe to flush swap and re-enable it after resuming, which makes things more robust.
  4. If for some reason you need to exit Xorg before hibernating, make sure the nvidia drivers (and their kernel threads) are unloaded first:
    /usr/bin/nvidia-smi -pm 0
    /usr/bin/modprobe -r nvidia-uvm
    /usr/bin/modprobe -r nvidia-drm

    Then after resuming from hibernate run the following before starting Xorg.
    /usr/bin/modprobe nvidia-uvm
    /usr/bin/modprobe nvidia-drm
    /usr/bin/nvidia-smi -pm 1
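The guard that the hook in step 3 checks before touching persistence mode can be factored into small functions and matched a little more strictly. This is a sketch, not part of the setup described above. The function names are hypothetical. It assumes that reading /proc/modules directly (the file lsmod formats) is an acceptable replacement for lsmod | grep nvidia. Matching the exact name nvidia checks for the core driver, rather than any module whose name merely contains "nvidia".

```shell
#!/bin/bash
# Hypothetical helpers for the guard in the suspend-nvidia sleep hook.
# Assumption: /proc/modules lists one module per line, starting with the
# module name followed by a space (this is what lsmod reads).

# Return 0 if the core "nvidia" module is loaded.
# The optional argument points at a different modules file (for testing).
nvidia_core_loaded() {
    local modules_file=${1:-/proc/modules}
    # Exact name match: a line starting with "nvidia_drm " does not count.
    # A missing or unreadable file is treated as "not loaded".
    grep -q '^nvidia ' "$modules_file" 2>/dev/null
}

# Return 0 if a process named exactly "Xorg" is running.
xorg_running() {
    pgrep -x Xorg > /dev/null
}

# Return 0 only when both conditions hold, i.e. when it should be
# safe to toggle persistence mode.
safe_to_toggle_persistence() {
    xorg_running && nvidia_core_loaded "$@"
}
```

If these functions are defined at the top of the hook, both if conditions reduce to: if safe_to_toggle_persistence; then ... fi.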

Some notes:

  1. The current Linux LTS kernel (4.14) disables the OOM killer when entering the hibernation PM cycle and re-enables it after resuming from hibernate.
  2. If resume fails when more than 50% of your RAM is in use (for example, 8 of 16GB), try closing some programs, such as web browsers, before hibernating. I've seen bug reports on kernel.org from Intel and AMD users complaining about this as well. I doubt it is entirely an nvidia problem, unless all three vendors' drivers are broken in the same way.
  3. Disable modesetting if you don't use Wayland, as it *can* cause Clutter to fail when loading additional mutter/gnome-shell instances.
    /etc/modprobe.d/nvidia-drm.conf
    # setting it to 1 breaks user switching in Gnome
    options nvidia-drm modeset=0
  4. I don't use CUDA, but people who do may want to load nvidia-uvm at boot. I load it regardless.
    /etc/modules-load.d/nvidia-uvm.conf
    nvidia-uvm
  5. If the power button only triggers hibernate/suspend once until Linux is rebooted, try the following package.
    Install it and add the button module to /etc/suspend-modules.conf.
    /etc/suspend-modules.conf
    button
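To check that the modeset=0 option from note 3 actually took effect, you can read the parameter the kernel exposes for the loaded module. This sketch makes three assumptions. First, the driver publishes the parameter as /sys/module/nvidia_drm/parameters/modeset. Second, the value is Y or N, which is how boolean module parameters appear in sysfs. Third, on some driver versions that file is readable by root only, so the function reports an unreadable file rather than guessing. The function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical check for whether nvidia-drm kernel modesetting is active.
# Assumption: the loaded module exposes its boolean "modeset" parameter
# in sysfs as Y or N. The optional argument is an alternative path (for testing).
nvidia_drm_modeset_enabled() {
    local param_file=${1:-/sys/module/nvidia_drm/parameters/modeset}
    # No file at all: nvidia_drm is not loaded, so it is not modesetting.
    [ -e "$param_file" ] || return 1
    # The file may be root-only; report that instead of treating it as "N".
    if [ ! -r "$param_file" ]; then
        echo "cannot read $param_file (try running as root)" >&2
        return 2
    fi
    [ "$(cat "$param_file")" = "Y" ]
}
```

As root, if nvidia_drm_modeset_enabled; then echo "modesetting is on"; fi prints nothing when the modprobe.d option was applied.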
