I wouldn't be surprised if a lot of the pageviews are bots/crawlers. They don't hover over links.


It's not uncommon for lightly trafficked sites to have 90% or more of their resources taken up by bots of various kinds. Remember, Google isn't the only search engine, and there are many crawlers that aren't search engines at all.
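
A rough way to check this for your own site is to scan the access log and count requests whose User-Agent looks bot-like. This is only a sketch: the log path, the combined log format, and the keyword list are all assumptions, and plenty of bots send browser-like UAs that a check like this won't catch.

    # Rough sketch: estimate the share of requests that self-identify as bots
    # via User-Agent substrings. Log path and keyword list are illustrative.
    import re

    BOT_HINTS = ("bot", "crawl", "spider", "slurp")  # hypothetical keyword list
    # In the combined log format, the User-Agent is the last quoted field.
    ua_pattern = re.compile(r'"[^"]*" "(?P<ua>[^"]*)"$')

    total = bots = 0
    with open("access.log") as log:  # assumed log location
        for line in log:
            match = ua_pattern.search(line)
            if not match:
                continue
            total += 1
            if any(hint in match.group("ua").lower() for hint in BOT_HINTS):
                bots += 1

    if total:
        print(f"{bots}/{total} requests ({100 * bots / total:.0f}%) look like bots")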

I have no insider knowledge, but I'm going to guess that they even crawl using actual Chrome browsers once in a while... and penalize deviations from what you serve to the GoogleBot... and also factor in response times for both.

I don't think any single person can predict/know what's actually going on at this point.


Probably it's just a search bot like Googlebot.

I think it's more likely that they have hidden links and references to these URLs scattered across their "normal" pages. A human would never click on them, but any bot that's just blindly following links would stumble across them, which alerts Yelp that it's likely a bot accessing the site.
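
Nothing public confirms exactly what Yelp does, but the general honeypot-link idea is easy to sketch. Assuming a Flask app (the route name here is made up): a link no human should ever see is planted in normal pages, and anything that requests it gets flagged.

    # Minimal honeypot-link sketch (an assumed mechanism, not Yelp's actual one).
    # A link invisible to humans is embedded in normal pages; any client that
    # requests it is presumed to be a bot blindly following hrefs.
    from flask import Flask, request

    app = Flask(__name__)
    suspected_bots = set()  # in practice, persistent storage

    HIDDEN_LINK = '<a href="/trap-9f3a" style="display:none">.</a>'

    @app.route("/")
    def index():
        # The trap link is present in the markup but invisible to humans.
        return f"<html><body>Normal page content.{HIDDEN_LINK}</body></html>"

    @app.route("/trap-9f3a")
    def trap():
        suspected_bots.add(request.remote_addr)  # flag the requester
        return "", 204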

Maybe the page content is cloaked, so that Googlebot sees completely different text from what human visitors see.
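
One way to spot-check for that kind of cloaking is to fetch the same URL with a Googlebot UA and a browser UA and compare. The URL below is a placeholder, and a real cloaker may key on Google's IP ranges rather than the UA string, so identical responses prove nothing.

    # Spot-check for user-agent cloaking: fetch the same URL as Googlebot and
    # as an ordinary browser, then compare the responses.
    import requests

    GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                    "+http://www.google.com/bot.html)")
    BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"  # example browser UA

    def fetch(url, ua):
        return requests.get(url, headers={"User-Agent": ua}, timeout=10).text

    url = "https://example.com/some-article"  # placeholder URL
    as_bot, as_human = fetch(url, GOOGLEBOT_UA), fetch(url, BROWSER_UA)
    if as_bot != as_human:
        print(f"responses differ: {len(as_bot)} vs {len(as_human)} chars")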

These don't seem like crawlers, but rather proxies requesting information on behalf of a user (perhaps passing along that user's browser in the UA string). Those are "Google bots", but they aren't the Googlebot.
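
Google does document a way to tell the real Googlebot from anything else that merely claims the name: reverse-DNS the client IP, check that the hostname ends in googlebot.com or google.com, then forward-resolve it and confirm it maps back to the same IP. A sketch using only the standard library:

    # Google's documented check for the real Googlebot: reverse DNS, hostname
    # suffix check, then a forward lookup to confirm the round trip.
    import socket

    def is_real_googlebot(ip: str) -> bool:
        try:
            host, _, _ = socket.gethostbyaddr(ip)              # reverse DNS
            if not host.endswith((".googlebot.com", ".google.com")):
                return False
            return ip in socket.gethostbyname_ex(host)[2]      # forward-confirm
        except (socket.herror, socket.gaierror):
            return False

    print(is_real_googlebot("66.249.66.1"))  # address in a published Googlebot range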

Maybe a different user agent or something? I'm sure the companies writing these articles want them to have good SEO and therefore allow all sorts of bots to crawl them, so that they show up on search engines.

So it looks like Google grabbed about 40 links before giving up? I wonder what a good "score" is. At first guess, less is better, but too few runs the risk of throwing out potentially good pages, while too many means the bot is just wasting effort. That score of 40 could also vary with parallel conditions, assuming many bot instances are sharing a task pool. Be sure to post the Microsoft results if/when they crawl you.

I'm curious as to whether you considered if this would have any effect on PageRank for content you host?

This type of behavior - using JavaScript to present the user with links different from the ones Googlebot found - seems like the type of thing they frown on.


They are known to crawl using human-like user agents instead of the typical Googlebot one, precisely to counter this (weak) attempt at gaming the system. I'm surprised WSJ is surprised by the outcome here.

Yeah, just like Googlebot crawls the web putting an enormous load on people's servers (it can pull hundreds of thousands of pages per day from a single server), while Google itself is very sensitive to automated searches and will ban your IP in a heartbeat. They sure don't want to be crawled by anyone.

I would think most bots don't run JavaScript and therefore don't register in GA. I think a lot of people are bored and do click the articles on the New page. That said, even clicks from the front page are often not from actual readers (as you can tell by many of the comments).

I'm 99% sure I've encountered a Googlebot crawling pages with the UA of a regular browser, presumably for exactly this purpose.

There are site viruses that serve up normal content to normal users, and so go undetected, but serve up spam like that only to GoogleBot, since they're designed to affect PageRank etc. of specific sites.

That was likely the case.


Or the site is bad. Some sites show some content to Googlebot, and other content to mere mortals.

I'm pretty sure that's just to see if the site is serving different content to mobile vs desktop. I think they also sometimes hit pages with no mention of googlebot.

> These scam sites load megabytes of junk, load slowly, have text interspersed with ads and modals that render right on top of them

Only if you're not googlebot. The crawler sees a much nicer site.


Yeah, MSNbot isn't very friendly, so they somewhat bring it on themselves. Both it and Yahoo's bot are also much worse than Google at predicting likely page updates: on one of my sites that has a blank robots.txt, MSNbot accounts for more than 10x as many hits as GoogleBot, yet Google still manages to keep its index just as up to date.

Some people have had success with crawl-delay, though there are occasional reports of MSNbot not honoring that either.
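
For what it's worth, a well-behaved crawler can read Crawl-delay itself with Python's standard library (3.6+); whether a given bot then honors it is the open question. The site and bot name below are placeholders.

    # Reading Crawl-delay and access rules via urllib.robotparser.
    import time
    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser("https://example.com/robots.txt")
    rp.read()

    delay = rp.crawl_delay("mybot") or 1.0   # fall back to 1s if unspecified
    for path in ("/", "/about"):             # hypothetical pages to fetch
        if rp.can_fetch("mybot", f"https://example.com{path}"):
            print(f"fetching {path}, then sleeping {delay}s")
            time.sleep(delay)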


I doubt it. Google's crawling bots, like most search engine bots, follow the published standard for bot access rules (robots.txt).