A little over 15 years ago you could index the web with a small cluster. I remember people doing it with Cassandra or Elasticsearch. I'm sure you'd need a much bigger cluster today, but outside video and images I imagine it's still doable even for a small organization, especially if you're filtering out content farms. Plus, there are many organizations interested in having access to an index, and I'm pretty sure more than a few are currently running their own index and selling it to analytics firms.
An index is one thing; great search over it is another.
A competitive, general-purpose web search engine with its own full index is _brutally_ hard and expensive.
This is the reason there are only a few world-class engines, like Russia's Yandex and China's Baidu (not to mention the obvious names like Google).