I have a content website with some tables that I compiled by carefully scouring forums, taking my own measurements, and collecting new info people emailed me.
Well, Google just displays my tables in the search results, so no one needs to visit my site. This may be great for a user, but it's a real disincentive to create content if Google is just going to scrape it and display it.
Last time I looked, my site ranked #1 for the searches above, but a user would have to scroll through three screens' worth of whatever else is on a search results page now to get to my link.
I actually think the crawlers are more sophisticated than that. Recently I got an email from Google saying the text on my website was too small when the site was viewed at a low resolution, which is why they chose to lower my rank on the results page. Clearly they have pretty advanced tests for the accessibility of the text, not just the content.
Google is extracting more and more information from the sites in the search results and displaying that information above the actual results. This hurts the sites the information is harvested from, since users just accept whatever Google throws at them and never actually visit the site (a hit to both revenue and engagement).
I completely understand Google's motivation, yet fully support the sites' complaints.
This is what Google has been trying to do, right? As they scrape more and more information out of the sites they index and provide thumbnail answers right on the results page, there's no reason to click on anything.
If I google the weather in Chicago, I'm not going to click through to weather.com for it, because the seven-day forecast is right there.
If I google for information about a person, Google scrapes Wikipedia and puts that right in the results page sidebar.
Flight information, bus schedules, anything that looks like a calculation, a whole bevy of other things: Google preempts the actual results and shows them inline at the top of the search page. Why would you click through to anything?
Mostly because Google serves the page from their own servers and starts loading it before you even click a search result. It's great for Google search and bad for everyone else.
It used to be against the guidelines to serve different content to Google than what users would see. Not sure if that's still the case, but I don't think it's in Google's interest to show a result that the user can't actually access.
I don't think they do any such thing; if anything, they're rotating IPs/user agents to avoid being rate-limited or blocked.
Google requires sites to send the crawler the same content that someone clicking a link on a Google results page would see, so even if some sites get creative about covering it up with blurred boxes and similar dark patterns, the data is still there in the markup.
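For illustration, here's a minimal sketch of how easily that "hidden" data comes out of the markup (Python with requests and BeautifulSoup; the URL and the table's class name are made up):

    import requests
    from bs4 import BeautifulSoup

    # Hypothetical page that "hides" a table behind a CSS blur overlay.
    # CSS only affects rendering; the text is still in the HTML source.
    html = requests.get("https://example.com/specs-table").text
    soup = BeautifulSoup(html, "html.parser")

    # Pull every cell out of the supposedly obscured table.
    for row in soup.select("table.specs tr"):
        print([cell.get_text(strip=True) for cell in row.find_all(["th", "td"])])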
I'm pretty sure Google is smart enough to recognize the main content of a page, and ignore things like widgets and navigation. That's Search Engine 101.
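The core idea isn't even exotic; a crude version of the heuristic fits in a few lines (a sketch only, real engines obviously use far more signals than this):

    from bs4 import BeautifulSoup

    # Tags that are almost always chrome rather than content.
    BOILERPLATE_TAGS = ["nav", "header", "footer", "aside", "script", "style"]

    def extract_main_text(html: str) -> str:
        """Naive main-content extraction: drop the obvious chrome, keep the rest."""
        soup = BeautifulSoup(html, "html.parser")
        for tag in soup(BOILERPLATE_TAGS):
            tag.decompose()  # remove navigation, widgets, and scripts in place
        # Prefer semantic containers when the page provides them.
        main = soup.find("main") or soup.find("article") or soup.body or soup
        return main.get_text(" ", strip=True)

Production extractors layer text-density and link-density heuristics on top of this, but the principle is the same.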
That's a false dichotomy. As the GP pointed out, Google started out in a symbiotic relationship with websites. Scraping the content and using it to help people find stuff (and profiting handsomely!) is not the problem.
The problem is all the other behavior the GP highlighted: pushing organic results below the fold, instant answers completely cutting off traffic to the sites that actually provide the content, and so on.
Google is now using its dominant position in the search engine market to effectively bundle search with these other things that negatively impact publishers into an "all or nothing" deal.
"Don't want us to scrape and show your data? You have nice traffic to your page, it'd be a shame if something were to happen to it..."
Are you sure? I thought they were generally OK with it as long as it was the same information the user sees, just presented differently to help Google index it better.
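That's the structured-data case: the page repeats the facts it already shows the user in a machine-readable schema.org block. A hypothetical example, sketched in Python (the recipe values are invented):

    import json

    # Hypothetical schema.org structured data: the same facts the visible
    # page shows, duplicated in a machine-readable form for the crawler.
    recipe = {
        "@context": "https://schema.org",
        "@type": "Recipe",
        "name": "Basic Pancakes",
        "recipeYield": "8 pancakes",
        "cookTime": "PT15M",  # ISO 8601 duration: 15 minutes
    }

    # On a real page, this JSON would sit inside a
    # <script type="application/ld+json"> tag in the HTML.
    print(json.dumps(recipe, indent=2))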