Hacker News

Wow, this feels like something out of the mid-2000s.

The problem with the semantic web is that the incentives are mostly wrong. People make websites to be viewed by people. Doing the semantic stuff is work that can be difficult to get right. If it doesn't bring you visitors, why would most people bother?

Sure, you can get fancy tools that use the data (let's ignore the scaling issues many of them have), but fancy tools separate the data from its context, further reducing incentives for many content creators. If they ever did take off, we would have massive spam and quality problems, because the data would now be separated from the website and all its visual indicators of how spammy it is, which is perfect for dark SEO and other spammers/phishers.

For that matter, just look at metadata on the web in general and what a mess that is. <meta name="description"> (or keywords): spammers took over and nobody uses them anymore. <link rel="next">: I think old Opera is the only thing to ever do anything with that metadata.
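A minimal sketch of why those tags failed: anything a consumer can extract, an author can stuff. This uses Python's stdlib `html.parser` on an invented spammy page (the page content and class name are illustrative, not from any real site):

```python
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    """Collect <meta name=...> and <link rel=...> values from a page."""
    def __init__(self):
        super().__init__()
        self.meta = {}
        self.links = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and "name" in a:
            self.meta[a["name"]] = a.get("content", "")
        elif tag == "link" and "rel" in a:
            self.links[a["rel"]] = a.get("href", "")

# Hypothetical page: nothing stops the author from stuffing the tags.
page = """<head>
<meta name="description" content="cheap watches CHEAP watches buy now">
<meta name="keywords" content="watches,rolex,free,discount">
<link rel="next" href="/page2.html">
</head>"""

p = MetaExtractor()
p.feed(page)
# The consumer gets the declared metadata but has no way to judge
# whether it honestly describes the page.
print(p.meta["keywords"])
print(p.links["next"])
```

The extraction is trivial; the trust problem is not, which is why search engines stopped listening.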

The only metadata systems that have ever worked are the ones where the site author gets something out of it: e.g. Technorati tags, <link> to RSS feeds, Facebook OpenGraph, various Google things, etc. Or, on the other side, where metadata is the site's whole reason for being, like https://wikidata.org and maybe some GLAM projects. Everyone making arbitrary metadata out of the goodness of their heart, and having it be of consistent quality and meaning, is a pipe dream.

Not to mention the negative incentive of obliterating the walled garden, which, as much as it sucks, is something the corporate overlords like a lot.




I think the semantic web never worked because of SEO spam. The closest it got to adoption in any form was the keywords meta tag. We know how that ended up.

You can debate syntax forever but the semantic web will never rise without the proper incentives. Not only is there no incentive for industry to participate in it, there's in fact an anti-incentive to do so.

Say you've built a weather app/website. Being a good citizen, you publish "weatherevent" objects. Now anybody can consume this feed, remix it, aggregate it, run some AI on it, build new visualizations, whatever. A great thing for the world.

That's not how the world works. Your app is now obsolete. Anybody, typically somebody with more resources than you, will simply take that data and out-compete you, in ways fair or unfair (gaming rankings). You may conclude that this is good at the macro level, but surely the app owner disagrees at the micro level.

Say you're one of those foodies, writing recipes online with the typical irrelevant life story attached. The reason they do this is to gain relevance in Google (which is easily misled by lots of fluffy text), which creates traffic, which monetizes through ads.

Asking these foodies to write semantic recipe objects instead destroys the entire model. Somebody will build an app to scrape the recipes, and that seals the fate of the foodie. No monetization, so they'll stop producing the data.
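To see how little stands between "published semantic recipe" and "scraped recipe", here is a sketch consuming an invented JSON-LD blob of the kind a blog might embed in a `<script type="application/ld+json">` tag (the field names follow schema.org's Recipe type; the recipe itself is made up):

```python
import json

# Hypothetical structured data as a food blog might publish it.
jsonld = """{
  "@context": "https://schema.org",
  "@type": "Recipe",
  "name": "Weeknight Carbonara",
  "recipeIngredient": ["200 g spaghetti", "2 eggs", "50 g pecorino"],
  "recipeInstructions": ["Boil pasta.", "Whisk eggs and cheese.",
                         "Combine off the heat."]
}"""

recipe = json.loads(jsonld)
# The scraper keeps only the recipe. The life story and the ads around
# it -- the things that funded the page -- never load.
print(recipe["name"])
for step in recipe["recipeInstructions"]:
    print("-", step)
```

One `json.loads` call replaces the entire page visit, which is exactly the economics the comment describes.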

In commercial settings, the idea that data has zero value and is therefore to be freely and openly shared is incredibly naive. You can't expect any entity to actively work against their own self-interest, even less so when it's existential.

As the author describes, even in the academic world, supposedly free of commercial pressure, there's no incentive, or even an anti-incentive. People would rather publish lots of papers. Doing things properly means fewer papers, so it amounts to a punishment.

Like I said, incentives. The incentive for contributing to the semantic web is far below zero.


The semantic web is not a technical problem, it's an incentive problem.

RSS can be considered a primitive separation of data and UI, yet was killed everywhere. When you hand over your data to the world, you lose all control of it. Monetization becomes impossible and you leave the door wide open for any competitor to destroy you.

That pretty much limits the idea to the "common goods" like Wikipedia and perhaps the academic world.

Even something as silly as a semantic recipe for cooking is controversial. Somebody built a recipe-scraping app and got a massive backlash from food bloggers. Their ad-infested 7000-word lectures intermixed with a recipe are their business model.

Unfortunately, we have very little common-good data that is free from personal or commercial interests. You can think up a million formats and databases, but it won't take off without the right incentives.


The financial incentives have become stronger for building walled gardens than a semantically open web. The semantic data has been more useful to the giants that monetize it than to the millions of small publishers who are supposed to abide by the rules and maintain it. The issue is even bigger if you are listing valuable goods - from products to jobs to real estate/rental listings - as part of your marketplace or business. Aggregators like Google can scrape and circumvent you by taking away your users earlier in the acquisition chain, so why bother giving them your product graph?

There were never any incentives aligned to make the Semantic Web work. For it to work, page authors had to go through the trouble of making their own content compatible, but it didn't really buy them anything in return, and for big content producers it removed their main revenue source -- ads. As a result, the places that had lots of information never bothered, small users never bothered, and the SW petered out.

In some ways it's similar to what we see in certain news sites. They only link to other pages on their own site even though it would be trivial to link out to the original information. Sites will even host public documents in in-line PDF readers in order not to link back to completely publicly available government sites; scientific and engineering advancements are often vaguely and imprecisely talked about with no link to the original paper or announcement from the source. By living on ad revenue, these sites want to roach-motel you into their hypertext jail, and the result is that information gets twisted and misreported as it propagates and is repeated elsewhere. News sites will even reference each other without linking back to the other site's original content.

Ads killed the Semantic Web.


At an old job, I knew some very idealistic folks who kept pushing semantic web business. "Let's do that everywhere!" As an exercise, I would have them open a browser, visit various sites, and then look at the source. "Go on, check to see if it validates," I would say with an anticipatory grin. Whether hand-crafted HTML or generated by any number of frameworks, many sites can barely manage to close their tags, asking for semantic references is a "just won't happen in practice" thing.

I have also seen a great deal of consultant money, programmer time, sys-admin sweat, and the like focused on these toweringly-designed, completely-unused triple stores, layer upon layer of hot technologies (ever-moving, construction on the tower never ceased) fused together to create a resource-intense monstrosity that, at the end of the day, barely got used. But hey, let's look at that jazz semantic web example one more time.

The most painful part is that I understand the urge to build a gleaming repository for information, where the cool URIs never change; SPARQLing pinnacles, ready to broadcast the Library of Alexandria, glimmer; and the serene manifold of abstract information lies RESTful ... but I have come to understand that the web of today is an endlessly bulldozed mudscape where Someone Very Important has to have that URL top-level yesterday (never mind that they will forget about it tomorrow), of shoddy materials and wildly varying workmanship, and where nobody is listening to your eager endpoints because the commercials are just too loud. I too once labored for information architecture, to have the correct thing in the obvious place, with accurate links and current knowledge, to provide visitors with the knowledge they desired ... but PR preempted all of it to push yet more nice photographs in yet another place: the Web as a technology for distributing images that would once live on glossy pamphlets.

The vision is lovely, but we who have always lived in the castle have walked alone.


You've linked to a PNG file and said "see! semantic web!"

I tried to dig into it, looking for some data and to see what you are talking about, and I finally find a piece of RDF, real semantic web stuff: http://dbtune.org:3030/sparql/?query=describe%20%3Chttp://db...

Um, ok. Now what? This is a short XML file containing links, half of which are dead. The biggest problems with the SW are that no one agreed on the labels, inputs, and outputs, and that there are no mechanisms for data preservation or trust.

How have those been solved now?

(edit) I'm not hating on the idea, btw. It just doesn't seem to be a technological problem. It's a social one. The second you find a way to get people to structure their data for fun & profit, the SW will blossom. And then it will be spammed. And then someone will find a way to index it and filter out the spam, and by then it will be something good, but quite different from what was intended.

I am genuinely curious to know what has changed in the last few years that academics now take SW for granted.


The Semantic Web is dead. It was never really something. Even the regular decentralized Web has mostly died. It was replaced by mega corporations' walled gardens. 95% of content is in there; in Facebook, Google, Twitter, Instagram, Youtube, Netflix, Reddit, Twitch, Medium and a few small others. The rest is a skeleton barely being touched. The Semantic Web isn't even necessary or useful anymore, even if its technology were good.

I'm not sure why you were downvoted. "Semantic Web" was the first thing that came to my mind after reading the first couple paragraphs of the article. I thought he was going to head in that direction as well. I was sorely disappointed!

There are surely diminishing returns for doing increasingly sophisticated things with the contents of HTML tags to parse and understand webpages, using inbound links to rank them, etc.

Cory Doctorow's essay, "Metacrap," does a great job of listing the reasons a Semantic Web-style metadata attempt will always fail when left to the "public" to implement. One thing the old human-run Yahoo! and the Open Directory Project did get right was the quality of results, but since updates were made at the speed of human editors, they proved pretty much impossible to keep current.

Perhaps there is some neat way to use everyone's browsing histories to create a semantic link between content on the web. But that will never happen because of (extremely valid) privacy concerns.

Well, shame on the author for writing such a myopic rant piece containing no new ideas or proposals.


> The semantic web inexplicably dies

"Inexplicably" except for the fact that it doesn't actually work - outside purely charitable and socially-oriented efforts (e.g. Wikidata), there's no incentive for web sites to provide machine-readable, "semantic" information, and quite a bit of incentive not to. And Google didn't "create" the world's largest ad platform, either - they bought it (DoubleClick) and merged it with their technically-superior advertising offer.


The trouble with the "semantic web" is that the cost is paid by those creating content, and the benefit is obtained by those running web scrapers.

Tichy is correct. The semantic web is essentially a big knowledge-management effort that's doomed to failure (like most KM efforts) because Joe Random web page maker really doesn't see any incentive to spend a couple hours of his time marking up his page about futons for the SW. It's a waste of his time.

Even if, by some miracle, he did do that, there's no way somebody else could make much more use of that information beyond what they already can through other, lower-effort facilities like search. It requires powerful, web-scale reasoning engines to even make sense of it, none of which we have today.

Supposing even 5 years from now, some startup is founded to do the reasoning, say Cuil 2.0, what do I get for it? Better page retrieval? Is it 2% better? Is it 30% better?

It had better be 30% better, because 2% better isn't worth the hundreds of billions of aggregate dollars it would take for everybody to go through and mark up all their pages to make them SW-ready. And since nobody really knows -- there isn't a useful reasoning engine I'm aware of that does anything in the web space you can't do with search, and most of the ones I've seen give results far worse than search; Wolfram Alpha is probably the closest to a half-working semantic reasoning system you can find, and it's not like it's taken the world by storm -- nobody will even bother. A classic catch-22.

The SW is just a repackaging of old, broken AI ideas. We already know systems built on essentially the same principles don't do much for the average person; the SW is just a doubling down on the same bad ideas -- "it didn't work before because it wasn't big enough!"


Due to the shitty academic job market, I was forced to take a job, part of which could be described as "semantic web evangelist". If you've read some incredibly optimistic article about the semantic web and how it solves all of life's problems, there's a non-negligible chance that I wrote it.

The semantic web is a colossal waste of time and money. It does more harm than good -- unless you're a savvy PI (probably in a European country) who has used it to lasso a big fat grant from clueless committees that think in terms of buzzwords.

Every successful project linked in any way to the semantic web would have been successful without it.

The only original thing to come out of the semantic web, RDF, causes more harm than good. Anyone who has had to encode sequences, or any other kind of nontrivial structured data, in RDF -- which is only possible with incredibly asinine "blank node" trickery -- either knows this or is deceiving themselves.
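To make the blank-node complaint concrete, here is a sketch (plain Python tuples standing in for triples; the `ex:` names are invented) of what RDF's `rdf:first`/`rdf:rest` cons-cell encoding requires just to state an ordered three-element list:

```python
# The rdf: vocabulary URI is real; everything prefixed ex: is made up.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def as_rdf_list(subject, predicate, items):
    """Expand an ordered list into (s, p, o) triples, chaining
    anonymous blank nodes (_:b0, _:b1, ...) via rdf:first / rdf:rest."""
    triples = []
    nodes = [f"_:b{i}" for i in range(len(items))]
    triples.append((subject, predicate, nodes[0] if items else RDF + "nil"))
    for i, item in enumerate(items):
        triples.append((nodes[i], RDF + "first", item))
        rest = nodes[i + 1] if i + 1 < len(items) else RDF + "nil"
        triples.append((nodes[i], RDF + "rest", rest))
    return triples

triples = as_rdf_list("ex:playlist", "ex:tracks", ["ex:a", "ex:b", "ex:c"])
# Seven triples and three blank nodes to say "the tracks are a, b, c".
print(len(triples))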

Semantic reasoners are either limited to EL ontologies, which defeats the whole purpose of OWL's expressiveness; or else take dozens of gigabytes of RAM to reason over any ontology more complicated than "the pizza ontology". Utterly, thoroughly irrelevant in today's VPS-based world.

The semantic web "community" is about 75% made up of invalid URLs, because all these academic geniuses are too busy theorizing about ontologies to realize that all the links into their department webpage are going to stop working when their one-year postdoc expires. Various URL-standardizing services try to address this; unfortunately, the number of these services is approximately 1 per semantic web researcher, and in the long run they're no more stable than the department webpages.

The semantic web is a horrific waste of (mostly EU) grant funds that could be used on far more worthy research. The world would be a better place if the whole idea had never been invented.


I was a big believer in the semantic web for years, but there is a load of things wrong with it from conceptual problems to practical ones.

For starters, the Semantic Web requires an enormous amount of labor to make things work at all. You need humans marking up stuff, often with no advantage other than the "greater good". In fact, you do see semantic content where it makes sense today: look at any successful website's header and you'll see a fairly large variety of semantic content, things that Google and social media platforms use to make the page more discoverable.

This problem is compounded by the fact that ML and NLP solved many of the practical problems the semantic web was supposed to solve. Google basically works like a vast question-answering system. If you want to find pictures of "frogs with hats on", you don't need semantic metadata.

A much larger problem is that the real vision of the semantic web reeked of the classic "solution in search of a problem". The magic of the semantic web wasn't the metadata; RDF was just the beginning.

RDF is literally a more verbose implementation of Prolog's predicates. The real goal was to build reasoning engines on top of RDF, essentially a Prolog-like reasoner that could answer queries. A big warning sign for me was that the majority of people doing "Semantic Web" work at the time didn't even know the basics of how existing knowledge representation and reasoning systems, like Prolog, worked. They were inventing a semantic future with no sense that this problem had been worked on in another form for decades.
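The RDF-as-Prolog point can be sketched in a few lines: a triple (s, p, o) is essentially the fact p(s, o), and "reasoning" means applying rules to derive new facts. Below is a toy forward-chaining step over invented family facts, with the single hard-coded rule grandparent(X, Z) :- parent(X, Y), parent(Y, Z) -- the kind of inference OWL reasoners aim to do at far greater generality (and cost):

```python
# Triples as Prolog-style facts: (predicate, subject, object).
facts = {
    ("parent", "alice", "bob"),
    ("parent", "bob", "carol"),
}

def close(facts):
    """Forward-chain the rule grandparent(X,Z) :- parent(X,Y), parent(Y,Z)
    until no new facts are derived (a fixed point)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        new = {("grandparent", x, z)
               for (p1, x, y1) in derived if p1 == "parent"
               for (p2, y2, z) in derived if p2 == "parent" and y1 == y2}
        if not new <= derived:
            derived |= new
            changed = True
    return derived

print(("grandparent", "alice", "carol") in close(facts))  # True
```

Prolog, Datalog, and production-rule systems had been doing exactly this for decades before the SW re-derived it under new names.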

OWL, the standard to be used for the reasoning part of the semantic web, was computationally intractable in its highest-level description of the reasoning process. If you start with a computationally intractable abstraction as your formal specification, then you are starting very far from praxis.

For this reason it was hard to really do anything with the semantic web. Virtually nobody built weekend "semantic web demos" because there wasn't really anything you could do with it that you couldn't do more easily with a simple database and some basic business logic... or just write in Prolog.

A few companies did use semantic RDF databases, but you quickly realize these offered no value over a traditional relational database, and today we have real graph databases in abundance, so any advantage you would get from processing boatloads of XML as a graph can be replicated without the markup overhead. And that's not even considering the work on graph representations coming out of deep learning.

The semantic web didn't work because it was half a pipe dream, and not even a very interesting one at that.


Yeah, semantic web really hacked the brains of academic-facing bureaucrats. It fell into this giant gap between what administrators don't know about business and what they don't know about technology... a gap big enough to shove every utopian idea about "an effortlessly integrated, data driven society" into.

There's no such thing as the "right" way to represent any given data stream, just ways that are more or less suitable to specific tasks and interests. That's why HTML failed as a descriptive language (and became a fine-grained formatting language), and it's why the semantic web was DOA.


The idea is flawed.

Semantic Web expects people to do extra work, with no benefit. Create a benefit for those people, and you'll see it get done.


The failure of the semantic Web is that it's repeatedly being built by and for technologists rather than to meet a real need of real end users. It's technologists in a vacuum building approaches that don't actually solve problems that millions of people have. So long as they keep doing that, it will perpetually fail.

Freebase, as a prominent example, was pointless for the average person. There was no reason for it to exist in terms of doing something for millions of people.

Wikipedia, Quora, Stack Exchange, etc. are what people want to consume. Until the semantic Web leads to a dramatic improvement on those types of end user products, it's not going to matter.


Not going to happen. The reasons for the Semantic Web never taking off were never technical. Websites already spend a lot of money on technical SEO and would happily add all sorts of metadata if only it helped them rank better. Of course, many sites' metadata would blatantly "lie", and hence the likes of Google would never trust it.

Re exposing an entire database of static content: again, reality gets in the way. Websites want to keep control over how they present their data. Not to mention that many news sites segregate their content into public and paywalled. Making raw content available as a structured and queryable database may work for the likes of Wikipedia or arxiv.org, but it's not likely to be adopted by commercial sites.


I don't see how else they would monetize it, to be honest.

It's one of the main reasons the "semantic web" didn't really catch on back in the day. Those who are in a position to gather massive amounts of data won't share it all with the competition (namely Google and Facebook).

