
It’s easy to make a decent small search engine: nobody is working to game your algorithm. It's vastly harder to do this as you scale to Google's size and popularity.



I agree but for a different reason.

Making a search engine that is good is actually a reasonable task as shown here on HN. It doesn't seem to take very many people, either. Google has set the bar really low.

By contrast, making a search engine that is monetizable and good seems to be an impossible task.


Building a basic search engine is relatively easy. Building one that rivals Google is extremely difficult, and not just because they're so big and convincing people to switch is hard. It's much easier to have good results when you know that the websites you're indexing don't care about you at all. Once you get popular enough to rival Google everyone and their mother will be trying to game you and that changes the problem significantly.

The original implementation of the Google search engine would get obliterated today, though I guess you have to start somewhere.


I often wonder this too, but... I think making "another Google" is not the best example.

I think it's hard to argue (although some people clearly did) that making a search engine is easy. You need to index all of the Internet, maybe cache it too, rank stuff, filter out spam, work across different languages, do something about images, parse a lot of (broken) HTML, maybe run some JS to figure out what a dynamic page should show, then serve it up super quickly... Google is faster than pinging a local (!) NFS server at my work.
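To make the "index, rank, serve" part concrete, here is a minimal sketch of the easy 1% of that list: an in-memory inverted index with a plain TF-IDF score. The corpus, the function names (`tokenize`, `build_index`, `search`), and the scoring formula are all assumptions for illustration. None of the hard parts mentioned above (crawling, spam filtering, broken HTML, JS rendering, multiple languages, latency) are handled.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term count in that doc}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index


def search(index, num_docs, query):
    """Rank documents by a simple TF-IDF sum over the query terms."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        # Rarer terms get a higher weight (assumed smoothing: log(1 + N/df)).
        idf = math.log(1 + num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical toy corpus, not real data.
docs = {
    "a": "Search engines index the web",
    "b": "Cooking pasta at home",
    "c": "How web search ranking works: search, rank, serve",
}
index = build_index(docs)
results = search(index, len(docs), "web search")
```

This works on three documents precisely because nobody is trying to game it. Any scoring this naive falls apart once page authors start stuffing keywords.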

But then I remember a comparison on HN about how booking.com is O(50) people and Airbnb is O(2000) total. Or I remember the total headcounts at Twitter, an app that lets you post very short messages.

Or LinkedIn, a vast company with a very subpar UX, basically an inferior clone of a dozen other social networking sites, only kept afloat by its network/monopoly effect. I mean, searching for people by name barely works! You're better off using Google to look for people on LI than its own search box!

No doubt it's just my cognitive limitation, like I can't visualize the vastness of the Cosmos, but this is a close second.


That niche engine looks like a problem many orders of magnitude simpler than building competitive modern search.

Good luck creating a distributed search engine that produces results that come even close in quality to Google's. Search engines are hard. There is a reason Google dominates, and it's not just inertia.

If there was any niche too small for Google search to recognize, Google search wouldn't be Google search.

Building a better search engine than Google is doable. Maybe it is not even very expensive: something on the order of tens or hundreds of millions might be enough.

The only problem is: nobody would care, people will still use Google because it's all they know. Some people even ignore the fact that underneath they use an operating system and a browser.


OK, but, for comparison, scaling a search engine is hard in US, and it's not an insight worth sharing, it's completely obvious to anyone who looks at the market for 5 minutes.

I can tell you don't have much experience working on search engines. Search engines are all about the tail. Even if you had algos and an index just as good as Google's, you likely wouldn't have any chance of taking market share because you would fail in, for example, local queries, which constitute 10-12% of all queries. Someone entering a USPS tracking number in the search box won't see the delivery time. Someone else entering a flight number won't see its status. A kid entering 100+200 won't see the answer. And so on. All these things matter. When comparing things, people don't see what works, they see what doesn't work. Also, the things you call "easy", for example deep links, are typically significant multi-year, large-team efforts with a large number of open questions and very active academic research.
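As a rough illustration of what one of those tail features involves, here is a minimal sketch of an "instant answer" dispatcher that intercepts a query before it reaches regular web search. The patterns are deliberately simplified assumptions (real flight numbers and tracking numbers have messier formats), and `instant_answer` is a hypothetical name, not any engine's actual API.

```python
import re

# Assumed, simplified shapes: "100+200" style integer arithmetic, and a
# two-letter airline code followed by up to four digits.
ARITH = re.compile(r"^\s*(\d+)\s*([+\-*])\s*(\d+)\s*$")
FLIGHT = re.compile(r"^\s*([A-Z]{2})\s?(\d{1,4})\s*$")


def instant_answer(query):
    """Return (answer_type, payload) for recognized queries, else None."""
    m = ARITH.match(query)
    if m:
        a, op, b = int(m.group(1)), m.group(2), int(m.group(3))
        value = {"+": a + b, "-": a - b, "*": a * b}[op]
        return ("calculator", str(value))

    m = FLIGHT.match(query.upper())
    if m:
        # A real system would now call a flight-status provider; this
        # sketch only normalizes the flight number.
        return ("flight_status", m.group(1) + m.group(2))

    # Not recognized: fall through to regular web search.
    return None
```

Recognizing the query is the cheap part. Each branch still needs a data provider, freshness guarantees, and handling for every format variation, which is where the multi-year efforts go.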

Or building a search engine as good as Google is hard.

What is lacking are easy ways to make vertical specific search engines. You are never going to make a better valuing algorithm than Google for general purpose search. A system designed by domain experts should be able to beat Google most of the time within their own tiny niche.

There is a free market in search engines. There's not much competition going on.

Why is this?

I get that it's not easy to build a good search engine, but on the surface it doesn't seem to be that hard a technical problem to solve either. Is it simply that the R&D required to build something competitive is too high for most companies?


It's not like other search engines can't step up their game and build the same functionality.

You don't have to build a search engine that is better than Google's - you may only need to build one that is roughly in the running and keep it around until such time as you have to divest from Google.

And working for years on a project that may never see the light of day can be hard.


Ultimately I think it’s because the internet is just too big! Google were only able to do it because they had the right algorithm early in the age of the internet. They were then able to grow with it to achieve the scale required. Starting from scratch now on a general internet search engine would be close to impossible without 10s, if not 100s, of billions in investment. And you would need that to build the index before even beginning to be competitive. No one is making bets that big on search, especially when the online advertising industry (which is the only way to fund it currently) is in danger of massive regulation.

I think there is massive opportunity for domain-specific search engines though. Imagine a search engine specifically designed for software engineers and developers, or one for academic research (not just papers but all online scientific content, news and discussion), or one targeting the arts. I think it’s these verticals that could be incredible.

You then potentially move towards building “meta” search engines (if you are older than about 35 you will remember these) that work out what you are searching for and use a domain-specific engine.

Edit:

Just to add to this, people who say that “decentralised” search engines are the only way to compete with Google are not completely wrong, it’s just that it’s not about protocols and distributed indexes. It’s about a community of smaller search engines working within specific domains and collaborating (commercially) on meta search engines, prompting people to search on each other's engines if it would be better for that search.

We almost need an “Open Search Co-Op” which smaller search engines can join to share technology and refer users to each other.
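The "work out what you are searching for and hand off to a domain engine" step could look something like the sketch below. Everything here is an assumption for illustration: the engine names, the keyword sets, and the keyword-overlap heuristic itself. A real router would presumably use a trained query classifier rather than hand-picked words.

```python
# Hypothetical member engines and the vocabulary each one claims.
ENGINES = {
    "software": {"python", "rust", "compiler", "api", "stacktrace"},
    "academic": {"paper", "citation", "arxiv", "study", "journal"},
    "arts": {"painting", "sculpture", "gallery", "composer"},
}


def route(query, engines, default="general"):
    """Pick the engine whose keyword set overlaps the query the most.

    Falls back to `default` when no engine's keywords appear at all.
    """
    words = set(query.lower().split())
    best, best_overlap = default, 0
    for name, keywords in engines.items():
        overlap = len(words & keywords)
        if overlap > best_overlap:
            best, best_overlap = name, overlap
    return best
```

For example, `route("python api rate limit", ENGINES)` picks the software engine, while a query with no known keywords falls back to the general one. The co-op part (who gets referred traffic, and on what commercial terms) is the piece no code sketch covers.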


What if we didn't try to replicate Google? Smaller and niche search engines would probably work better in this new world of vast information.

While I agree that it's very difficult to compete on generic search, it's certainly not hard to compete in niche domains. And you don't need anything like 15 billion pages for those.