I have been having some problems with server crashes. On two occasions I was able to have personnel at the co-location facility, where my server lives, look at the console immediately after a crash.
The kernel version running was 2.0.32 w/ SMP support on a dual Pentium Pro box.
When the server would crash, a message would be continuously displayed on the console (but not in the syslog):
Aiee: scheduling in interrupt: 0012BBD1
A search of the sources found that this condition was tested for in /usr/src/linux/sched.c on line 396 and the message printed on line 497.
It would appear that an interrupt was encountered during the schedule() operation. This would be a bad thing. (It's not nice to re-enter the scheduler via an interrupt.)
Since the address being printed is, presumably, the return address after the schedule call, and is consistent, I am assuming that the scheduler is being re-entered while servicing some sort of interrupt from within the same ISR.
First, are my assumptions even close to reality?
Secondly, is this a "known" issue with the 2.0.32 kernel? I understand there have been some changes in the kernel SMP code between 2.0.32 and 2.0.33, so I am wondering if upgrading the kernel will fix this.
Thirdly, does this indicate some sort of hardware failure? If so, how can I trace this back to the device in question?
Finally, I am open to suggestions for other ideas and/or options here.
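In the meantime, one thing I can try is resolving the printed address against the System.map that was built with the running kernel, to see which function the return address falls in. Below is a minimal sketch of that lookup, assuming the usual "address type symbol" line format. The symbol names and addresses in SAMPLE_MAP are made up for illustration and are not from my kernel; parse_system_map and lookup are just helper names I picked.

```python
import bisect


def parse_system_map(text):
    """Parse System.map-style lines ("address type symbol") into a sorted list."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def lookup(symbols, address):
    """Return (symbol, offset) for the last symbol at or below address, or None."""
    addresses = [addr for addr, _ in symbols]
    index = bisect.bisect_right(addresses, address) - 1
    if index < 0:
        return None  # address lies below the first known symbol
    start, name = symbols[index]
    return name, address - start


# Hypothetical entries for illustration only; real ones would come from the
# System.map that matches the kernel that crashed.
SAMPLE_MAP = """\
0012b000 T example_function_a
0012bb00 T example_function_b
0012c000 T example_function_c
"""

if __name__ == "__main__":
    symbols = parse_system_map(SAMPLE_MAP)
    # 0012BBD1 is the address from the console message.
    print(lookup(symbols, 0x0012BBD1))
```

With the real System.map in place of SAMPLE_MAP, the symbol returned should at least point at the code that called schedule(), which might narrow down which driver is involved.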
As always, any help is appreciated. Most suggestions taken seriously :)
Thanks in advance,