06-03-2014, 09:59 AM #6 Soadyheid
Have you installed a recent ProLiant Support Pack (PSP)?

After searching through different logs I found hplog entries that reported a PCI bus error. Type hplog -v to get a listing of ASRs:
0016 Critical 14:29 03/11/2009 14:29 03/11/2009 0001
LOG: PCI Bus Error

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.
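For the hplog listing quoted above, a small filter can pull out only the Critical entries. This is a minimal sketch, assuming each entry is a line whose second field is the severity, followed by a "LOG:" line, as in the sample. The function name critical_entries is hypothetical and not part of any HP tool.

```shell
#!/bin/sh
# Print each Critical entry from `hplog -v`-style text on stdin,
# together with the "LOG:" line that directly follows it.
# Assumption: entries look like the quoted sample
# ("<id> Critical <time> <date> ..." and then a "LOG: ..." line).
critical_entries() {
    awk '
        $2 == "Critical" { print; want_log = 1; next }
        want_log && /^[[:space:]]*LOG:/ { print; want_log = 0; next }
        { want_log = 0 }
    '
}
```

It would be used as `hplog -v | critical_entries`. If the real output layout differs from the sample, the field test on `$2` needs adjusting.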
This will help the support colleagues figure out what went wrong.

Doing
Code:
echo "A" > /dev/watchdog
with the watchdog service off (kernel module hpwdt.ko blacklisted), as well as
Code:
echo "A" | socat - UNIX-CONNECT:/var/run/watchdog-mux.sock
with the service activated, will reboot the server now.
They are both HP DL380 Gen9s.

Brad Figg (brad-figg) on 2015-03-18
Changed in linux (Ubuntu Utopic): status: In Progress → Fix Committed
Changed in linux (Ubuntu Trusty): status: In Progress → Fix Committed
Changed in linux (Ubuntu

Without the module the server reboots.
proxmox-ve: 4.0-16 (running kernel: 4.2.2-1-pve)
pve-manager: 4.0-50 (running version: 4.0-50/d3a6b7e5)
pve-kernel-4.2.2-1-pve: 4.2.2-16
lvm2: 2.02.116-pve1
corosync-pve: 2.3.5-1
libqb0: 0.17.2-1
pve-cluster: 4.0-23
qemu-server: 4.0-31
pve-firmware: 1.1-7
libpve-common-perl: 4.0-32
libpve-access-control: 4.0-9
libpve-storage-perl: 4.0-27
pve-libspice-server1:

Code:
edit: /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="nmi_watchdog=0"
#update-grub
#reboot
#20 aderumier, Nov 20, 2015

A kernel panic in the hpwdt.ko module, which is the HP iLO2+ watchdog, sounds more like a bug in the firmware/module; we do nothing special in the watchdog-mux besides accessing the

Code:
echo "A" > /dev/watchdog
This should reset the machine after a bit.
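After applying the nmi_watchdog=0 GRUB change quoted above and rebooting, it may be worth confirming that the running kernel actually received the parameter. The following is a minimal sketch; the function name nmi_watchdog_disabled is hypothetical, and it only inspects the kernel command line, not the watchdog hardware.

```shell
#!/bin/sh
# Succeeds if the given kernel command line contains nmi_watchdog=0.
# With no argument it reads the running kernel's /proc/cmdline.
nmi_watchdog_disabled() {
    cmdline=${1-$(cat /proc/cmdline)}
    # Pad with spaces so the parameter is matched as a whole word.
    case " $cmdline " in
        *" nmi_watchdog=0 "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Typical use would be `nmi_watchdog_disabled && echo "nmi_watchdog=0 is active"`. On kernels that expose it, /proc/sys/kernel/nmi_watchdog can serve as a second check.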
However, I found that the cause is my VM and the large amount of RAM I have assigned to it. Since then I have monitored the hardware from Onboard Administrator and there is nothing strange.

The watchdog-mux service is using this:
Main PID: 1439 (watchdog-mux)
CGroup: /system.slice/watchdog-mux.service
└─1439 /usr/sbin/watchdog-mux
Oct 21 09:25:10 pmx72 watchdog-mux: Watchdog driver 'HP iLO2+ HW Watchdog Timer', version 0

We were told by HP that all Gen8 (and more recent generation) servers do support X2APIC, but they still "ask" the OS to opt out of X2APIC (not to use X2APIC).
I find it hard to believe this could be a hardware issue if there are so many of us seeing the issue.

Pushed patches to [email protected] for review.
I would think this issue is for Canonical to investigate.

We still cannot resolve the issue, and it occurs with DL380p Gen8 8-core and 12-core models. The HBA on the riser card fails. Only this HBA:
81Q: QLogic PCI to Fibre Channel Host Adapter for HPAK344A:
Host Device Name: vmhba3
BIOS version: 3.13
FCODE version: N/A
EFI version: 6.23
Flash FW version: 5.09.00
Is there any resolution?
early_idt_handlers+0x120/0x120 [ 5494.343686] [
Doesn't sound quite like the same issue. So I believe that works now and it is not a problem.

In the process of solving this and other bugs we have discovered that the intel_idle module did not use ACPI tables (a way for the firmware to tell the OS what are the
Duplicate of bug #1417580

This issue occurs when your server runs out of memory and has a heavy I/O load at the same time.

Rafael David Tinoco (inaddy) wrote on 2015-03-18: #6
Sorry, there is a misunderstanding regarding the case and this bug.
Thank you for this post, and the help.

Workaround:
# echo "blacklist hpwdt" >> /etc/modprobe.d/blacklist-hp.conf
# update-initramfs -k all -u
# update-grub
# reboot

Andy Whitcroft (apw) wrote on 2015-03-17: #3
Put together a generic solution which blacklists all

As described in /etc/modprobe.d/blacklist-watchdog.conf:
"""
# Watchdog drivers should not be loaded automatically, but only if a
# watchdog daemon is installed.
"""
We should blacklist module "hpwdt" by default for
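To check that the hpwdt blacklist workaround above is actually in place, something like the following could be used. It is a minimal sketch: the function name module_blacklisted is hypothetical, and it only looks for a plain "blacklist <module>" line in *.conf files; it does not interpret other modprobe.d directives.

```shell
#!/bin/sh
# Succeeds if any *.conf file in the directory blacklists the module.
# Usage: module_blacklisted <module> [dir]   (dir defaults to /etc/modprobe.d)
module_blacklisted() {
    module=$1
    dir=${2:-/etc/modprobe.d}
    # -q: quiet, -s: ignore missing files, -E: extended regex
    grep -Eqs "^[[:space:]]*blacklist[[:space:]]+${module}[[:space:]]*\$" "$dir"/*.conf
}
```

After the reboot, `module_blacklisted hpwdt` combined with `lsmod | grep hpwdt` (which should print nothing) gives a quick confirmation that the module is both blacklisted and not loaded.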
When the watchdog fires, I expect that we should give the end user good diagnostic information so they know it was a watchdog timeout.

We are getting feedback from the community that these options are enough to keep the ProLiant server family from having kernel panics, and they might be released as a "public recommendation".

Code:
lsmod|grep hpwdt

My configuration: 2 HP ProLiant servers + 1 other machine with Proxmox 4.

Any advice which could help, or is anyone else having a problem like this?
#1 mensinck, Oct 19, 2015

Hi all.
System Firmware will log additional details in a separate IML entry if possible
Caution POST Message 03/13/2013 16:43 03/13/2013 16:43 1
POST Error: 1792-Slot X Drive Array - Valid Data Found in

HP memtest'd, nothing. We got a new MOBO and all was good!

This happens at random, but mostly when we use live migration.

Issue
A few HP Gen8 and Gen9 systems are crashing due to NMI.