[Prism54-devel] [Bug 60] New: 2.6 modprobe oops: request_irq/interrupt race

bugzilla-daemon@mcgrof.com bugzilla-daemon@mcgrof.com
Wed, 25 Feb 2004 08:45:48 +0000 (UTC)


http://prism54.org/cgi-bin/bugzilla/show_bug.cgi?id=60

           Summary: 2.6 modprobe oops: request_irq/interrupt race
           Product: prim54
           Version: 1.0.2.2
          Platform: ia32
        OS/Version: Other
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Kernel patches
        AssignedTo: prism54-devel@prism54.org
        ReportedBy: vda@port.imtp.ilyichevsk.odessa.ua


I've reported this on the mailing list before.
Back then it happened to me only after I flashed a 'newer' BIOS,
so I simply reverted the BIOS.

Now I am able to trigger this on another box
under specific conditions: VESA VGA framebuffer console,
800x600, 256 colors. In a text mode console, or even in a
VESA framebuffer of higher resolution, it does not happen.

But I am pretty confident it's a driver bug, because
all stack traces show basically the same thing happening:
an interrupt from the card striking us right inside

        rvalue = request_irq(pdev->irq, &islpci_interrupt,
                             SA_SHIRQ, ndev->name, priv);

(in islpci_hotplug.c)
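
To make the suspected ordering problem concrete, here is a
minimal userspace sketch. It is a model only, not driver code:
every name in it (model_dev, model_request_irq, probe_racy,
probe_ordered) is hypothetical, and it assumes that the handler
depends on driver state that is not yet set up when
request_irq() is called, which the traces suggest but do not
prove.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only. All names are hypothetical and are not
 * the prism54 or kernel API. */
struct model_dev {
        bool irq_pending;   /* device is asserting its interrupt line */
        bool irq_enabled;   /* device-side interrupt enable bit */
        bool rx_ready;      /* driver receive state is initialised */
        int handled_ok;     /* interrupts serviced with valid state */
        int handled_early;  /* interrupts serviced before rx_ready */
};

typedef void (*model_handler)(struct model_dev *);

/* Stand-in for the interrupt handler: records whether it ran
 * before or after the receive state was set up. */
static void model_interrupt(struct model_dev *dev)
{
        if (dev->rx_ready)
                dev->handled_ok++;
        else
                dev->handled_early++;
        dev->irq_pending = false;
}

/* Stand-in for request_irq(): once the handler is installed, a
 * pending and unmasked interrupt is delivered before we return. */
static int model_request_irq(struct model_dev *dev, model_handler handler)
{
        if (dev->irq_pending && dev->irq_enabled)
                handler(dev);
        return 0;
}

/* Assumed racy ordering: the handler is registered while the
 * device can still interrupt and state is not ready yet. */
static void probe_racy(struct model_dev *dev)
{
        model_request_irq(dev, model_interrupt);
        dev->rx_ready = true;
}

/* One possible safe ordering: quiesce the device, finish setup,
 * register the handler, and only then unmask the device. */
static void probe_ordered(struct model_dev *dev)
{
        dev->irq_enabled = false;
        dev->rx_ready = true;
        model_request_irq(dev, model_interrupt);
        dev->irq_enabled = true;
        if (dev->irq_pending)   /* pending interrupt fires after unmask */
                model_interrupt(dev);
}
```

With a device that already has an interrupt pending,
probe_racy() services it before rx_ready is set, while
probe_ordered() services it only after setup is complete.
Whether the real driver needs exactly this ordering is
something only the driver code can confirm.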

I did not write down all stack traces I saw,
but all of them contained this call sequence:

prism54_probe+n
request_irq+n
islpci_interrupt+0   (a parameter on stack)
setup_irq+n
common_interrupt+n
do_IRQ+n
handle_IRQ_event+33/60
islpci_interrupt+293/510

I thought about testing 2.6.3 with a newer snapshot
(I have it already compiled). But since this is a race,
I might stop seeing it not because it's fixed,
but only because the timing has subtly changed. :(

For completeness, here is an excerpt from my first mail
with trace copied from screen by hand:

driver_attach+n
bus_add_driver+n
driver_attach+n
bus_match+n
pci_device_probe+n
__pci_device_probe+n
pci_device_probe_static+n
islpci_interrupt+0  (+0: seems like a parameter on stack, not a ret addr)
prism54_probe+n
request_irq+n
islpci_interrupt+0
setup_irq+n
common_interrupt+n
do_IRQ+n  (an interrupt struck us?)
handle_IRQ_event+33/60
islpci_interrupt+293/510
islpci_eth_receive+2f6/4b0
netif_rx+a4/190
netif_rx+a4/190   (the second one comes from printk("(from %p)", NET_CALLER(skb));
see below)
Code: 0f 0b ..... (that's a BUG)

Here's how it happened:

islpci_eth.c
============
int
islpci_eth_receive(islpci_private *priv)
{
...
        /* the device has written an Ethernet frame in the data area
         * of the sk_buff without updating the structure, do it now */
        index = priv->free_data_rx % ISL38XX_CB_RX_QSIZE;
        size = le16_to_cpu(control_block->rx_data_low[index].size);
        skb = priv->data_low_rx[index];
...
        if (discard)
                dev_kfree_skb(skb);
        else
                netif_rx(skb);  <=====================

dev.c
=====
int netif_rx(struct sk_buff *skb)
{
....
drop:
        __get_cpu_var(netdev_rx_stat).dropped++;
        local_irq_restore(flags);

        kfree_skb(skb);  <=======================
        return NET_RX_DROP;
}

skbuff.c
========
void __kfree_skb(struct sk_buff *skb)
{
        if (skb->list) {
                printk(KERN_WARNING "Warning: kfree_skb passed an skb still "
                       "on a list (from %p).\n", NET_CALLER(skb));
                BUG(); <===============
        }



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.