
Various bits of the lowercase semantic web (and its uppercase Semantic Web cousin) do show through in Google results - mostly as Google Rich Snippets, e.g. review metadata. These mostly require human intervention to enable on a site-by-site basis (sometimes by the site owner, sometimes by humans at Google itself).
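(The review metadata behind those snippets is typically schema.org structured data embedded in the page. A rough sketch of what a site might emit - the product, rating, and author names below are invented - using Python just to build the JSON-LD:)

    import json

    # Invented review data - in practice this comes from the site's CMS or database.
    review = {
        "@context": "https://schema.org",
        "@type": "Review",
        "itemReviewed": {"@type": "Product", "name": "Example Widget"},
        "reviewRating": {"@type": "Rating", "ratingValue": "4", "bestRating": "5"},
        "author": {"@type": "Person", "name": "A. Reviewer"},
    }

    # The page template would embed this <script> block for crawlers to pick up.
    print('<script type="application/ld+json">')
    print(json.dumps(review, indent=2))
    print('</script>')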

The deeper problem with the semantic web is that it's metadata, which tends not to be visible to visitors. Because of that, it's easy to forget, overlook, or leave incomplete, and it lets non-whitehat SEOers do their over-optimisation invisibly rather than in full view of the human audience.

I wish it weren't true, but Cory Doctorow's portmanteau "metacrap" is unfortunately still accurate ( http://www.well.com/~doctorow/metacrap.htm ). We have made several small metadata improvements over the years, but little progress in substantially overturning Doctorow's original concerns.

Human beings are not great sources of accurate and up-to-date metadata, so we rely on scripts and services to fill in the blanks we leave (publish times being recorded by WordPress, for example).
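(A toy illustration of that sort of gap-filling - this isn't WordPress code, just the general shape of a CMS stamping metadata the author left blank:)

    from datetime import datetime, timezone

    def save_post(post: dict) -> dict:
        """Toy CMS hook: fill in metadata the author didn't supply."""
        # Record the publish time automatically if the human left it blank.
        post.setdefault("published_at", datetime.now(timezone.utc).isoformat())
        # Fall back to the first line of the body as a title.
        if not post.get("title"):
            first_line = (post.get("body") or "Untitled").splitlines()[0]
            post["title"] = first_line[:80]
        return post

    print(save_post({"body": "Hello world\nSome prose..."}))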

Converting human content into metadata ends up being an automated, hands-off affair, which means those automated tools need to be able to parse and extract information and meaning from human prose - very much what Google has been doing since its inception.
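(To make the fragility concrete, here's a deliberately naive sketch of pulling a date out of prose - real extractors are vastly more sophisticated, but they inherit the same basic ambiguity:)

    import re

    # Deliberately naive: matches "12 March 2014"-style dates in English prose.
    DATE_RE = re.compile(
        r"\b\d{1,2}\s+(?:January|February|March|April|May|June|July|"
        r"August|September|October|November|December)\s+\d{4}\b"
    )

    def guess_publish_date(prose: str):
        """Return the first date-looking phrase, or None - with no idea whether
        it's the publish date, an event date, or part of a quotation."""
        m = DATE_RE.search(prose)
        return m.group(0) if m else None

    print(guess_publish_date("Posted on 12 March 2014, about the events of 3 June 1989."))
    # -> "12 March 2014" (correct here, but only because of word order)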

It's flawed because it relies on automated interpretation of prose, but there isn't a viable, non-flawed way of getting the same information without imposing academic-like constraints on the Web.




Thanks for the Doctorow metacrap link, excellent reference.
