Searching the upper cache lines in your binary search tree (sorted vector) for your target is unlikely to yield results. Instead you want to use the extra data in the line to shorten the search, which leads you to a B-Tree or B+tree.
For 4 byte keys and 4 byte child pointers (or indexes in to an array) your inner nodes would have 7 keys, 8 child pointers and 1 next pointer, completely filling a 64 byte cache-line and your tree depth for 1 million entries would go down from ~20 to ~7, the top few levels of which are likely to remain cache resident.
With some thought, it's possible to use SIMD on B-tree nodes to speed up the search within the node, but it's all very data dependent.
Binary searching a sorted array is isomorphic to a sorted binary tree with implicit child pointers.
It seems to me like there should be a sort order that stores the items as a fully-dense left-shifted binary tree from top-to-bottom (e.g. like the implicit heap in an in-place heap sort, but a binary search tree instead of a hea). Is there a name for this? Does it show any performance wins in practice?