[Prism54-devel] handle mgmt timeouts by resetting hardware

Wed Aug 4 17:28:13 UTC 2004

On Wednesday 04 August 2004 04:04, Luis R. Rodriguez wrote:
> Denis,
>
> this was a lot of work you did but I am not sure yet if the principal
> cause of mgmt timeouts is locking screwups. I'll have to check
> myself how it is possible that a recursive call is being made here, but
> last I checked the main problem was due to us not doing things the
> firmware likes, i.e.: remove the oid that causes a timeout on the commit
> list and the timeout is gone. This leads me to believe we just have to
> figure out a correct way to commit oids.
>
> I know in MLME_MANUAL mode you first set the MLME mode, then mode. Then
> set all you want (except essid I think), and to finalize you set
> mode again.
>
> I also know setting essid "unlatches" the setting of all oids. By this I
> mean that setting some oids doesn't necessarily "commit" until some event
> occurs -- that is, they're "latched". Setting essid is one way to
> unlatch them all.
>
> Also, since I've been working with WPA client support I've noticed that
> timeouts are more frequent in extended mode. Because of this I've been
> forced to dive into mlme_transaction lock hell and review it. So far it
> seems it all magically should work. Also, because of what I state above
> regarding the "unlatching" of the oids, I'm looking into an alternative
> method of "committing" by just setting essid instead of going through the
> whole commit_list's.

Luis, I am not familiar with this oid business at all.
I just see a place in the code which is not finished:
the current code just ignores a mgmt timeout. Only
a /* TODO */ is there. But we do hit that codepath,
and frequently the card is dead after that.

We need to react to that timeout somehow.
My experiments show that doing a reset
restores operation to normal.
Unfortunately not always, but most of the time it does.
So, why not do it?
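
To make the idea concrete, here is a rough userspace sketch (not the
actual patch). Every name in it (sim_card, sim_mgmt_transaction,
sim_reset, sim_mgmt_with_recovery) is made up for illustration, and it
assumes for simplicity that a reset always revives the card, which the
real hardware does not guarantee:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver's private state. */
struct sim_card {
	bool responsive;	/* does the firmware answer mgmt frames? */
	int resets;		/* number of resets issued so far */
};

/* Pretend to run one mgmt transaction: 0 on success, -ETIMEDOUT otherwise. */
static int sim_mgmt_transaction(struct sim_card *card)
{
	return card->responsive ? 0 : -ETIMEDOUT;
}

/* Pretend hardware reset. ASSUMPTION: the reset always revives the card. */
static void sim_reset(struct sim_card *card)
{
	card->resets++;
	card->responsive = true;
	printf("sim: reset finished\n");
}

/*
 * Instead of ignoring the timeout (the TODO case), react to it:
 * log it, reset the card, and still report the failure so the
 * caller can decide whether to retry.
 */
static int sim_mgmt_with_recovery(struct sim_card *card)
{
	int err = sim_mgmt_transaction(card);

	if (err == -ETIMEDOUT) {
		printf("sim: mgmt timeout, resetting card\n");
		sim_reset(card);
	}
	return err;
}
```

The first call on a dead card returns -ETIMEDOUT and resets it; the
next call then succeeds without a further reset.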

As to doing mgmt business right and avoiding
timeouts altogether, I don't know enough to be useful.
Sorry.

> And as for positive feedback for the patch:
>
> 1. IMO we should accept this patch only if we're certain that the cause of
> the timeouts is a recursive call and not a firmware bug.

A recursive call was possible before, although
I don't think it was actually happening.
But anyway, we must protect against that,
and my patch does that.
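
For illustration only, one common way to guard against that kind of
re-entry is a "reset in progress" flag. The sketch below is userspace
C11, not the code from the patch: all names (guard_card,
guard_handle_timeout, guard_reset) are hypothetical, and it assumes
the recursion would come from a mgmt timeout firing while the reset
itself is running. In the kernel the same idea would more likely use
test_and_set_bit() on a flags word.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical card state for this sketch only. */
struct guard_card {
	atomic_flag reset_in_progress;	/* set while a reset is running */
	bool timeout_during_reset;	/* simulate a timeout inside reset */
	int resets_done;
	int resets_skipped;
};

static void guard_card_init(struct guard_card *card, bool timeout_during_reset)
{
	atomic_flag_clear(&card->reset_in_progress);
	card->timeout_during_reset = timeout_during_reset;
	card->resets_done = 0;
	card->resets_skipped = 0;
}

static void guard_reset(struct guard_card *card);

/* Timeout handler: reacts to a mgmt timeout by resetting the card. */
static void guard_handle_timeout(struct guard_card *card)
{
	guard_reset(card);
}

static void guard_reset(struct guard_card *card)
{
	/* If a reset is already running, do not start a nested one. */
	if (atomic_flag_test_and_set(&card->reset_in_progress)) {
		card->resets_skipped++;
		return;
	}

	/* ASSUMPTION: mgmt traffic sent during reset may itself time out. */
	if (card->timeout_during_reset)
		guard_handle_timeout(card);

	card->resets_done++;
	atomic_flag_clear(&card->reset_in_progress);
}
```

With timeout_during_reset set, one call to guard_handle_timeout()
performs exactly one reset and skips the nested attempt.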

> 2. This patch should be re-done since, should it be accepted, we have to
> send it to netdev and policy is to split patches up into separate concrete
> pieces of work. i.e:
>
> 	patch 1: printk changes
> 	patch 2: space changes
> 	patch 3: remove NULL setting
> 	patch 4: timer work

Will do it.

> 	etc
>
> I realize this is a pain in the ass. It is, but it's kernel policy.
>
> 3. I don't think many people will like to add printk's for successful
> cases in routines unless we're debugging. Again, a good way to do this

I disagree. A reset is not a frequent thing at all; a few extra
printks there won't flood your syslog. And while our driver
is not yet mature, they will help in bug hunting.

I added "success" prints when I saw logs where I couldn't
figure out whether the reset had finished or had got stuck
on a semaphore or something.
--
vda