I feel you may be adding your critical sections at too high of a layer (either in the code, or the data structure) if it is severely affecting performance. Look up sharded locks, and totally order them if you must acquire 2 or more at once.
You may also want to implement reader/writer locks if your load has many more reads than writes.
Unfortunately, nobody really teaches you these things in a really clear way, and plenty of engineers don't fully understand it either.
I didn't give near enough details for you to speculate like that. what you said applies to some very different queues but not mine. (What I have is currenly lock free, though this is my third redesign with different locking strategies for each.)