
How Google displays their own search results (tables or divs) is irrelevant to how their crawler interprets everyone else's pages.



Because Google won't display pages in its search results that show one view to the crawler but another to the person who clicks the link.

100% agree.

I have a content website with some tables that I compiled by carefully scouring forums, taking my own measurements, and collecting new info people emailed me.

Well, Google just displays my tables in the search results, so no one needs to visit my site. This may be great for a user, but it's a real disincentive to create content if Google is just going to scrape it and display it.

Last time I looked, my site ranked #1 for the searches above, but a user would have to scroll through three screens' worth of whatever else is on a search results page now to get to my link.


Google crawls the web; they don't scrape it. There's a big difference.

Also note that Google doesn't index everything they crawl.

There is no point in linking to Google search results, because they are personalized: everyone sees different pages.

I actually think the crawlers are more sophisticated than that. Recently I got an email from Google saying the text on my website was too small when the site was browsed at a low resolution, which is why they chose to lower my rank on the results page. Clearly they run fairly advanced tests on the accessibility of the text, not only on the content.

Google is extracting more and more information from the sites in its search results and displaying that information above the actual results. This negatively impacts the sites the information is harvested from, since users just accept whatever Google throws at them and don't actually visit the site (which is a hit to revenue and engagement).

I completely understand Google's motivation, yet I fully support the sites' complaints.


This is what Google has been trying to do, right? As they keep scraping more and more information out of the sites they index and providing thumbnail answers on the results page, there's no reason to click on anything.

If I google for the weather in Chicago, I'm not going to click on a link to weather.com for it, because the seven day forecast is right there.

If I google for information about a person, Google scrapes Wikipedia and puts that right in the results page sidebar.

Flight information, bus schedules, anything that looks like a calculation, and a whole bevy of other things: Google preempts the actual results and shows the answer inline at the top of the search page. Why would you click through to anything?


Mostly because Google serves the page from their own servers and starts loading it before you even click a search result. It's great for Google search and bad for everyone else.

Because Google will ban the page if the crawled content is not the same as the content displayed when you click their link.

Google has been serving those mangled redirect URLs in search results for years, but only to logged-in users, even if you turn off all search history settings.

They apparently use the data to measure the strength of their ranked results, but that doesn't explain the logged-in vs. logged-out disparity.


The big issue is that Google has, over time, moved from crawling the web to scraping it.

One big thing Google has done recently is extract what you searched for from a website and show it inline.

E.g. when I search for a recipe, I get the result right in the Google search results and don’t need to visit the site. https://i.k8r.eu/aILGEg
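
Mechanically, those inline recipe cards are typically fed by schema.org Recipe structured data that the site itself embeds in its pages. A minimal sketch of generating that markup, with made-up recipe values (the field names come from the real schema.org/Recipe vocabulary Google documents for recipe rich results):

    import json

    # Hypothetical recipe; the field names are from the schema.org/Recipe
    # vocabulary used for recipe rich results.
    recipe = {
        "@context": "https://schema.org",
        "@type": "Recipe",
        "name": "Simple Pancakes",
        "recipeIngredient": ["1 cup flour", "1 egg", "1 cup milk"],
        "recipeInstructions": [
            {"@type": "HowToStep", "text": "Whisk everything together."},
            {"@type": "HowToStep", "text": "Fry until golden."},
        ],
    }

    # Sites embed this block in <head>; the crawler parses it, and Google
    # can surface the fields directly on the results page.
    print('<script type="application/ld+json">'
          + json.dumps(recipe, indent=2)
          + "</script>")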

Google extracts reviews from Yelp, and I can read them without visiting Yelp.

They’re doing this for everything, and AMP furthers it.

Google is trying to keep the user on Google for as long as possible, showing snippets instead of letting users leave.

As a result, many of these sites lose out on ad impressions and money.

Google offers no way to disable this feature short of unlisting your site entirely.
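
For completeness: Google's documentation does describe per-page snippet controls (the nosnippet and max-snippet robots directives, or the equivalent X-Robots-Tag response header), though how much of the inline extraction they suppress in practice is debated. A minimal sketch of the documented mechanism, assuming a hypothetical Flask app:

    from flask import Flask, make_response

    app = Flask(__name__)

    @app.route("/tables")
    def tables():
        resp = make_response("<html>...the hard-won data...</html>")
        # Documented robots directive asking Google not to show a text
        # snippet for this URL (a robots meta tag in <head> works too).
        resp.headers["X-Robots-Tag"] = "nosnippet"
        return resp

    if __name__ == "__main__":
        app.run()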


It used to be against Google's guidelines to serve the crawler different content from what users would see. I'm not sure if that's still the case, but I don't think it's in Google's interest to surface a result that the user can't actually access.

The internet is about hyperlinks, not hierarchical views; this move by Google feels very disingenuous.

I don't think they do any such thing; if anything, they are rotating IPs/user agents to avoid being limited or blocked.

Google requires sites to send the crawler the same content as someone clicking a link on a Google results page would see, so even if some sites get creative covering it up with blurred boxes and similar dark patterns, the data is there in the markup.
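
A rough way for a site owner to sanity-check that their own stack isn't accidentally cloaking is to fetch the same URL with a browser user agent and with Googlebot's published user-agent string and compare the responses. A sketch (the URL is hypothetical, and note that deliberate cloakers key on crawler IP ranges, which this won't catch):

    import requests

    URL = "https://example.com/some-page"  # hypothetical page to test

    UAS = {
        "browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "googlebot": ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                      "+http://www.google.com/bot.html)"),
    }

    # Fetch the page once per user agent and compare the markup.
    bodies = {
        name: requests.get(URL, headers={"User-Agent": ua}, timeout=10).text
        for name, ua in UAS.items()
    }

    if bodies["browser"] == bodies["googlebot"]:
        print("Same markup served to both user agents.")
    else:
        print("Markup differs: possible (accidental) cloaking.")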


I'm pretty sure Google is smart enough to recognize the main content of a page, and ignore things like widgets and navigation. That's Search Engine 101.
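
The textbook version of that trick is boilerplate removal by text density: score each block by how much plain text it carries versus link text, and keep the winner. A toy sketch of the idea (Google's actual pipeline is unpublished; this assumes the beautifulsoup4 package):

    from bs4 import BeautifulSoup

    def main_content(html: str) -> str:
        soup = BeautifulSoup(html, "html.parser")
        # Drop elements that are almost never main content.
        for tag in soup(["script", "style", "nav", "header", "footer", "aside"]):
            tag.decompose()

        def score(el):
            text = el.get_text(" ", strip=True)
            links = " ".join(a.get_text(" ", strip=True) for a in el.find_all("a"))
            # Reward raw text, penalize link-heavy blocks (menus, widgets).
            return len(text) - 2 * len(links)

        candidates = soup.find_all(["article", "main", "section", "div"])
        best = max(candidates, key=score, default=soup)
        return best.get_text(" ", strip=True)

    html = """<html><body>
    <nav>Home | About | Contact</nav>
    <div><p>The actual article text lives here...</p></div>
    <footer>Copyright</footer>
    </body></html>"""
    print(main_content(html))  # -> "The actual article text lives here..."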

That's a false dichotomy. As the GP pointed out, Google started with a symbiotic relationship with websites. Scraping the content and using it to help people find stuff (and profiting handsomely!) is not the problem.

The problem is all the other behavior the GP highlighted: pushing organic results below the fold, instant answers completely cutting off traffic to the sites actually providing that content, and so on.

Google is now using its dominant position in the search engine market to effectively bundle search with these other things that negatively impact publishers into an "all or nothing" deal.

"Don't want us to scrape and show your data? You have nice traffic to your page, it'd be a shame if something were to happen to it..."


Because Google claims to care about the wider web, not just its search product. This breaks the web.

Are you sure? I thought they were generally OK with it as long as it was the same information the user sees, just presented differently to help Google index it better.
