search engine

Search engines work by building a database of the pages that exist on the web (which they do by sending out “spiders” to “crawl” pages, following all the links they find to other pages), analysing those pages in some way, and using an algorithm to produce what they think are the most relevant results for any search term you type in.
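The crawl-then-index process described above can be sketched in a few lines of Python. This is only a toy illustration, not how any real search engine is built: the “web” here is a hypothetical in-memory dictionary standing in for fetched HTML, and ranking is reduced to simple word matching.

```python
from collections import defaultdict, deque

# A toy "web": URL -> (page text, outgoing links).
# Purely hypothetical data standing in for real fetched pages.
WEB = {
    "a.example": ("search engines crawl the web", ["b.example"]),
    "b.example": ("spiders follow links to new pages", ["c.example", "a.example"]),
    "c.example": ("an index maps words to pages", []),
}

def crawl(start):
    """Breadth-first crawl: follow links, visiting each page once."""
    seen, queue, pages = set(), deque([start]), {}
    while queue:
        url = queue.popleft()
        if url in seen or url not in WEB:
            continue
        seen.add(url)
        text, links = WEB[url]
        pages[url] = text
        queue.extend(links)   # "spiders" follow links to more pages
    return pages

def build_index(pages):
    """Inverted index: word -> set of pages containing that word."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the pages containing every word of the query."""
    sets = [index.get(word, set()) for word in query.split()]
    return set.intersection(*sets) if sets else set()

index = build_index(crawl("a.example"))
results = search(index, "pages")   # pages whose text mentions "pages"
```

A real engine replaces each step with something vastly more elaborate (politeness rules and deduplication while crawling, link analysis and hundreds of ranking signals instead of bare word matching), but the crawl → index → query pipeline is the same shape.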

For English-language sites, the two biggest search engines, which have the resources to do their own crawling, are Google and Microsoft’s Bing. These seem to be the best engines for most languages that aren’t Russian, Chinese, or Korean, too. Most other search engines show you the results from one of those two, sometimes with their own tweaks or additions.

The search engine that I usually use is DuckDuckGo, because I appreciate its focus on privacy (I don’t really want to help Google build a detailed profile on me so they can profit from selling “targeted advertising space” for ads that I religiously block anyway). DuckDuckGo gets its results from Bing. When DuckDuckGo isn’t producing the kinds of results I’m looking for, I add the !sp bang to search the same term on StartPage instead. StartPage uses Google’s results, so I get to see what they are without Google actually seeing that it’s me who wants results for that term. StartPage is also privacy-focused, but they do show non-targeted ads above their search results. The other reason they’re only my runner-up pick is that they don’t put your search term in the <title> field of their search result pages, meaning that if you search for a bunch of different things and keep all the tabs open, all those tabs have identical names, so you can’t tell them apart without clicking on them.

Search engines are big business. For web browsers that can’t “freeload” off a larger, profitable company, selling the right to be the browser’s default search engine is generally their single largest source of revenue. For the search engines themselves, selling advertising space (“sponsored links”) can also be highly profitable, at least so long as the search engine itself gets enough traffic.

By far the most hegemonic search engine in the world today, with almost 92% market share as of January 2022, is Google. Indeed, English speakers have coined the verb “to google” meaning “to do a web search” (it’s simply taken for granted you wouldn’t be using any other search engine). Google started out as a “mere” search engine company and rapidly took over the market with search results that were way better than any rival (like Lycos or AltaVista) could produce at the time. They soon married their search engine business with an advertising business, which was so explosively profitable that they were able to grow into the tech giant with a finger in every pie that they are today.

Google is often criticised for its use of search data (along with other sources of data) to build detailed profiles on internet users which they can use to offer “targeted advertising services” to would-be advertisers. Distaste for this business model has fuelled the growth of a number of more privacy-oriented rivals, including DuckDuckGo and StartPage, which I mentioned above.

Another criticism that people often make of the biggest search engines is that their search results seem to be getting worse and worse over time. Some specific ways in which they’re perceived to be getting worse include:

  • Many of the top-ranking results (not even including ads) for many types of searches are basically “content farm” garbage stuffed full of affiliate links and keywords, often written so incoherently it’s like a not-very-good AI has slapped it together. Sometimes I search things and half the first page of results are blog posts that have all copy-pasted the same AI-regurgitated drivel word for word.
  • Their algorithms are increasingly not taking your search term at its word, but trying to be “smart” and second-guess what you “really meant”, showing you popular webpages that didn’t really match your query ahead of less-popular pages that actually did match your query. And of course, who makes webpages popular in the first place? Their algorithms! So even if this approach had any merit in the first place, the spiralling nature of it makes results more and more garbage over time.
  • A strong recency bias; Google outright refuses to show webpages over ten years old, for example. There are a ton of web searches where information over ten years old is perfectly germane or even desirable (e.g. if you’re looking for info about an event that occurred over ten years ago).

There are some search engines that aim to take entirely new approaches to indexing and serving search results. Many of these are quite deliberately not trying to be all-purpose search engines that can do everything for everyone, but are purposely limited in scope. For example, there are Wiby, Marginalia Search and Search My Site, all of which focus on websites which could be described as part of the small web.

A couple of new, in-development search engines whose sales pitches for themselves seem to basically be “more privacy, better algorithms” are Andi (in a public alpha) and Kagi (in a private beta). Both of these plan to offer paid subscriptions; Andi at least also seems to plan to offer a free service as well.
