> Building a comparable one from scratch is like building a parallel national railroad.
Not to be pedantic, but I do have a noob question or two:
1. One is building the index, which is a lot harder without a Google offering its own API to boot. If other tech companies really wanted to break this monopoly, why can't they just do it, like they did for base-model LLM training with the infamous "Pile" dataset? The upshot of offering this index as a public good is that it would break not just Google's own monopoly but also others like Android, bringing a breath of fresh air to a myriad of UX areas (mobile devices, browsers, maps, security). So why don't they just do this already?
2. The other question is about "control", which the DoJ has provided guidance for but not yet enforced. IANAL, but why can't a state's attorney general enforce this?
Building an index is easy. Building a fresh index is extremely hard.
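To make the "easy" half concrete, here is a minimal sketch of an inverted index over a handful of documents, assuming a naive lowercase tokenizer and boolean AND queries (`tokenize`, `build_index` and `search` are hypothetical names for illustration). Everything that makes a *fresh* index hard, such as recrawl scheduling, change detection and doing this over billions of pages, is deliberately absent.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and split on runs of non-alphanumeric characters (deliberately naive).
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # Map each term to the set of document ids that contain it.
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    # Return ids of documents containing every query term (boolean AND).
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so intersecting does not mutate the index.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    docs = {
        "a": "Google built its index",
        "b": "A fresh index is hard",
        "c": "Ranking is harder",
    }
    index = build_index(docs)
    print(search(index, "fresh index"))
```

Note that the whole index is rebuilt from `docs` in one pass; keeping it current as pages change is exactly the part this sketch does not attempt.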
Ranking an index is hard. It's not just BM25 or cosine similarity. How do you prioritize certain domains over others? How do you rank homepages, which typically have no real content in them, for navigational queries?
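For reference, a minimal sketch of Okapi BM25 scoring, assuming the common defaults `k1=1.2`, `b=0.75` and the non-negative IDF variant `ln((N - n + 0.5) / (n + 0.5) + 1)` used by Lucene. The function names and toy documents are hypothetical. The point is what the formula *lacks*: it sees only term counts and document length, so nothing in it knows which domain to trust or that a query is navigational.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on runs of non-alphanumeric characters (deliberately naive).
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def bm25_scores(docs: dict[str, str], query: str,
                k1: float = 1.2, b: float = 0.75) -> dict[str, float]:
    # Score every document against the query with Okapi BM25.
    if not docs:
        return {}
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avgdl = sum(len(tokens) for tokens in tokenized.values()) / n_docs
    # Document frequency: in how many documents each term appears.
    df: Counter[str] = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))
    query_terms = set(tokenize(query))
    scores: dict[str, float] = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        dl = len(tokens)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Length normalization: b controls how much long documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * dl / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return scores


def rank(docs: dict[str, str], query: str) -> list[str]:
    # Document ids ordered from highest to lowest BM25 score.
    scores = bm25_scores(docs, query)
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    docs = {
        "home": "Acme",
        "review": "Acme review, Acme pricing, Acme alternatives",
        "other": "Weather forecast today",
    }
    print(rank(docs, "acme"))
```

With these toy inputs the one-word `home` document edges out `review` purely through length normalization, not because anything recognizes it as a homepage; production engines presumably layer many other signals on top of a lexical score like this.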
Changing the behavior of 90% of the non-Chinese internet means unraveling 25 years and billions of dollars spent on ensuring Google is the default and sometimes only option.
Historically, it takes a significant technological counter-position or an antitrust breakup for a behemoth like Google to lose its footing. Unfortunately for us, Google is currently competing well against the only true technological threat to its existence to appear in decades: AI.
> If other tech companies really wanted to break this monopoly, why can't they just do it
Google is a verb, nobody can compete with that level of mindshare.
Scraping is hard. Very good scraping is even harder. And today, being a scraping business is veeery difficult; there are some "open"/public indices, but none of them ever took off.
A huge amount of the web is only crawlable with a googlebot user-agent and specific source IPs.
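As an illustration of the user-agent half of this, here is a minimal sketch using Python's standard `urllib.robotparser`, assuming a hypothetical site whose robots.txt admits only Googlebot. `ExampleBot` is a made-up crawler name. robots.txt is advisory; the source-IP allowlisting mentioned above happens at the server or CDN and is not modeled here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may crawl everything, every other agent is shut out.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    parser = make_parser(ROBOTS_TXT)
    # "ExampleBot" is a hypothetical newcomer crawler.
    for agent in ("Googlebot", "ExampleBot"):
        allowed = parser.can_fetch(agent, "https://example.com/article")
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

A polite new crawler that honors these rules sees none of the site, while Googlebot sees all of it; that asymmetry, repeated across many sites, is what the comment is pointing at.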
I don’t think it’s comparable to today’s AI race.
Google has a monopoly, an entrenched customer base, and stable revenue from a proven business model. Anyone trying to compete would have to pour massive money into infrastructure and then fight Google for users. In that game, Google already won.
The current AI landscape is different. Multiple players are competing in an emerging field with an uncertain business model. We’re still in the phase of building better products, where companies started from more similar footing and aren’t primarily battling for customers yet. In that context, investing heavily in the core technology can still make financial sense. A better comparison might be the early days of car makers, or the web browser wars before the market settled.
> If other tech companies really wanted to break this monopoly, why can't they just do it
Companies would rather sue than try and compete by investing their own money.
Apple had a chance to break Google's search monopoly, but they chose to take billions from them instead.
Microsoft had a chance (well, another chance, after they gave up IE's lead) to break Google's browser monopoly, but they decided to use Chromium for free instead.
Ultimately, all these decisions come down to what's more profitable, not what's in the best interests of the public. We have learned this lesson x1000000. Stop relying on corporations to uphold freedoms (software or otherwise), because that simply isn't going to happen.
> 1. One is building the index, which is a lot harder without a Google offering its own API to boot. If other tech companies really wanted to break this monopoly, why can't they just do it?
FTA:
> Context matters: Google built its index by crawling the open web before robots.txt was a widespread norm, often over publishers’ objections. Today, publishers “consent” to Google’s crawling because the alternative - being invisible on a platform with 90% market share - is economically unacceptable. Google now enforces ToS and robots.txt against others from a position of monopoly power it accumulated without those constraints. The rules Google enforces today are not the rules it played by when building its dominance.