
I'm quite sceptical that "a queryable ontology. With a rich and expressive grammar" would be so obviously easy to use and great that it threatened Google, rather than Google being a frontend for said ontology (that you still need to crawl first to use!). And indeed, what remains of semantic-web-like data is massively pushed by Google today, because it makes it easier for them to provide results based on it - and if you want to convince someone to add it to a website in a commercial setting, "it helps Google understand our site" is the primary argument that sticks (even though it also helps others parse sites).

Which IMHO points to the main problem: publishing semantic data is work, and it had no clear value proposition you could sell businesses on, and it ran out of steam before someone made a convincing one. Niches that see the need for such data publishing are still willing to use this stuff or alternatives, but for the general majority of publishers the value isn't there, or it is actively seen as a negative.

It also didn't help that IMHO the focus was too much on what was theoretically possible, and not on making it actually easy to use, which made even more devs ignore it or build alternatives because the entry hurdle is steep. Plenty of APIs could build on semantic web tech, but they don't because a custom REST API is typically just easier to do and thus more familiar. (Despite semantic tech having the groundwork for lots of what's seen as new-ish trends like API generators/machine-readable API docs/...)




I think the people really pushing for the Semantic Web kind of gave up. You hardly ever hear that term anymore.

I guess the value proposition of "You can add a whole bunch of complexity to your webpage that won't affect what people see so robots can scrape your page easier" didn't really resonate with developers. Also, the proposals I saw were much too granular and focused on people writing scientific papers on the web. It wasn't a good mesh for the "garbage" web, which is like 99% of everything.


I think it was probably just too much extra work to expect wide adoption of the standards. It could never be strictly enforced by the time it was laid out, and the lack of enforcement meant it was a classic chicken-and-egg problem where you couldn't expect people to spend a ton of effort setting something up when no one else was doing it.

It turns out though, that improvements in NLP mean there is still hope for a slightly different incarnation of the Semantic Web. There are a lot of companies out there now, like Diffbot [1] or Google [2], that are providing semantic searching of web data with structured results. It's not exactly the same as the concept of the Semantic Web, which would've required content publishers to tag their own data, but it does move towards the Semantic Web goal of making web data processable by machines.

[1] https://www.diffbot.com/

[2] https://developers.google.com/knowledge-graph/


What happened to the Semantic Web was Machine Learning and NoSQL databases. Even if the Semantic Web had been a good idea, it took a lot of work to get any benefits. Machine learning produced Big Wins For Free, or at least Comparatively Cheap. And it produced them from big piles of unformatted data requiring no standards meetings or agreements beforehand.

I felt (and said at the time) that the Semantic Web wasn't sufficient to achieve its own goals. The language they chose wasn't powerful enough to express sufficient semantics to enable the kind of data integration and integrity that they wanted. The result was ontologies that still required a lot of negotiation before you could start working -- and then provided little benefit.

So the semi-structured world instead picked NoSQL databases, which promptly became full of impossible crap, but at least you could Move Fast And Break Things. And people took all that crap and ML'd it to get something -- what, exactly, is unclear, but it was a thing.

I'll note that I pursued ontologies with a more rigorous standard, and I couldn't get any traction, either. The up-front expense was too high, and I never managed to convey the story about how much good it would do you in the five-to-ten year time horizon. Nobody wanted to hear that. I still think it was a better approach than the Semantic Web, but in the end people chose flexibility over interoperability.

Semantic Web failed because it was neither Semantic nor Web. It wasn't smart enough to be Semantic, nor agile enough to be Web.


Wasn’t much of this covered by the whole Semantic Web effort? They built a ton of stuff for self-describing data (RDF) that federated over multiple hosts by default (SPARQL), provided meaningful names (URIs everywhere) and multiple tools for giving all this semantic meaning (OWL, RDFS).

It stalled for many many reasons, including:

- XML is kind of awful

- Over-general solution without clear advantages for a specific use case

- Mis-aligned incentives

- Easy-to-abuse open query capabilities

- Hard to use for both data publishers and consumers

I think that JSON Schema with incremental enhancement via JSON-LD is a more promising tech stack for another try. That would let you take advantage of the massive investments in the current API ecosystem while carrying forward the best parts of the old Semantic Web Effort.
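To make "incremental enhancement" a bit more concrete, here is a minimal sketch in Python (stdlib only). The payload, the `enhance` helper and the choice of schema.org terms are all made up for illustration, not taken from any real API. The point is only that adding an `@context` key turns an ordinary JSON response into JSON-LD while leaving every existing field where current clients expect it.

```python
import json

# A plain JSON API response (hypothetical payload, for illustration only).
plain = {"name": "Ada Lovelace", "homepage": "https://example.org/ada"}

# Hypothetical context mapping the existing keys to schema.org terms.
context = {
    "name": "https://schema.org/name",
    "homepage": {"@id": "https://schema.org/url", "@type": "@id"},
}

def enhance(doc, ctx):
    """Return a copy of doc with an @context added; existing keys are untouched."""
    return {"@context": ctx, **doc}

enhanced = enhance(plain, context)

# Clients that ignore @-prefixed keys still see exactly the same fields.
legacy_view = {k: v for k, v in enhanced.items() if not k.startswith("@")}

print(json.dumps(enhanced, indent=2))
```

A JSON-LD-aware consumer can use the context to resolve `name` and `homepage` to shared identifiers; everyone else can keep treating the response as plain JSON.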

Of course the incentives are still hard to sort out. Commercial entities WANT lock-in. They will Embrace/Extend/Extinguish anything they can because they need a moat to make money. Honestly anti-trust regulations might be needed.


It's a pity the semantic web never took off. It might have greatly reduced the need for sophisticated centralised search-engines.

DISCLAIMER: Contains some self-promotion

> Imagine hovering over a UI element and seeing who implemented it and when, what project it was part of, why the project was initiated, and what kpis and goals it contributes to.

That's exactly what we are building at Field 33[0] with a package manager for ontologies (Plow[1]) as an underpinning to get a good level of flexibility/reusability/collaboration on all the concepts that go into your graph.

------

> Why isn’t semantic web more popular inside companies?

As part of building Field 33 we obviously also asked ourselves that question. My rough hypothesis would be that ~10 years ago semantic tech didn't provide tangible enough benefits, and since then got left behind in the dust by non-semantic tech.

That caused a tech chasm that widened and widened, where the non-semantic side became a lot more accessible with quasi-standards (REST) and new methods of querying data for frontend usage (GraphQL), while the status quo of the semantic web space is still SPARQL (a query language full of footguns). Same thing goes for triple stores (the prevalent databases in the space), which are going through roughly the same advancements as RDBMSs, just at a much slower pace.

It also doesn't help that most work being done in the space comes from academia rather than companies that utilize it in production scenarios.

There is quite a nice curated list of problems/papercuts about the semantic web/RDF space[2].

Overall, despite the current status quo, I'm quite optimistic that the space can have a revival.

[0]: https://field33.com

[1]: https://plow.pm

[2]: https://github.com/w3c/EasierRDF


I think the semantic web never worked because of SEO spam. The closest it got to adoption in any form was the keywords meta tag. We know how that ended up.

I was a big believer in the semantic web for years, but there is a load of things wrong with it from conceptual problems to practical ones.

For starters, the Semantic Web requires an enormous amount of labor to make things work at all. You need humans marking up stuff, often with no advantage other than the "greater good". In fact you do see semantic content where it makes sense today. Look at any successful website's header and you'll see a pretty large variety of semantic content, things that Google and social media platforms use to make the page more discoverable.
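For anyone who hasn't looked at that header content: one common form is a JSON-LD block inside a `<script type="application/ld+json">` tag (Open Graph meta tags are another). Below is a rough sketch of pulling such blocks out with nothing but Python's stdlib. The page and the `JsonLdExtractor` class are invented for illustration, and a real crawler would need far more error handling.

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script bodies arrive here as raw text.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.items.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False

# Made-up page, for illustration only.
PAGE = """<html><head>
<title>Frogs with hats</title>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article", "headline": "Frogs with hats"}
</script>
</head><body><p>Ribbit.</p></body></html>"""

extractor = JsonLdExtractor()
extractor.feed(PAGE)
print(extractor.items)
```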

This problem is compounded by the fact that ML and NLP solved many of the practical problems that the semantic web was supposed to. Google basically works like a vast question answering system. If you want to find pictures of "frogs with hats on" you don't need semantic metadata.

A much larger problem is that the real vision of the semantic web reeked of the classic "solution in search of a problem". The magic of semantic web wasn't the metadata; RDF was just the beginning.

RDF is literally a more verbose implementation of Prolog's predicates. The real goal was to build reasoning engines on top of RDF, essentially a Prolog-like reasoner that could answer queries. A big warning sign for me was that the majority of people doing "Semantic Web" work at the time didn't even know the basics of how existing knowledge representation and reasoning systems, like Prolog, worked. They were inventing a Semantic future without any sense that this problem had been worked on in another form for decades.
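To illustrate the triples-as-predicates point, here is a toy sketch in Python. It is not RDF, OWL or Prolog: the names are simplified strings rather than IRIs, and the two hard-coded rules are only loosely modelled on how subclass relations are usually treated. It just shows that a statement like `("kermit", "type", "frog")` has the same shape as a Prolog fact such as `type(kermit, frog)`, and that a reasoner is a loop that keeps deriving new facts until nothing changes.

```python
# Facts as (subject, predicate, object) triples.
facts = {
    ("frog", "subClassOf", "amphibian"),
    ("amphibian", "subClassOf", "animal"),
    ("kermit", "type", "frog"),
}

def infer(triples):
    """Forward-chain two rules until no new triples appear."""
    known = set(triples)
    while True:
        new = set()
        for (a, p1, b) in known:
            for (c, p2, d) in known:
                if b != c or p2 != "subClassOf":
                    continue
                if p1 == "subClassOf":
                    # Rule 1: subClassOf is transitive.
                    new.add((a, "subClassOf", d))
                elif p1 == "type":
                    # Rule 2: an instance of a class is an instance of its superclasses.
                    new.add((a, "type", d))
        if new <= known:
            return known
        known |= new

closure = infer(facts)
print(("kermit", "type", "animal") in closure)
```

The nested loop is fine for three facts and hopeless at web scale, which is a small-scale hint at why tractability of the reasoning layer matters.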

OWL, which was the standard to be used for the reasoning part of the semantic web, was computationally intractable in its highest-level description of the reasoning process. If you start with a computationally intractable abstraction as your formal specification, then you are starting very far from praxis.

For this reason it was hard to really do anything with the semantic web. Virtually nobody built weekend "semantic web demos" because there wasn't really anything you could do with it that you couldn't do easier with a simple database and some basic business logic... or just write in Prolog.

A few companies did use semantic RDF databases, but you quickly realize these offered no value over just building a traditional relational database, and today we have real graph databases in abundance, so any advantage you would get from processing boatloads of XML as a graph can be replicated without the markup overhead. And that's not even considering the work in graph representation coming out of deep learning.

Semantic web didn't work because it was half pipe dream, and not even a very interesting one at that.


To me, there are two big things missing in this discussion of the Semantic Web:

1. Developers. Historically Semantic Web was a lot of RDF & SPARQL, which are both imo fairly hostile to developers. There were some decent libraries, but often written in a very oldschool style that made it difficult to even load or use, & with frankly pitiful documentation/tests. A lot of the databases/tooling was paid/proprietary.

The development story is looking much better. Oddball RDF & SPARQL are joined by much more mainstream-dev friendly tools: Microdata, which is pretty simple marked-up HTML, & JSON-LD, which looks & works like JSON, with a little extra "context" sprinkled in at the top. Libraries are much improved & modernized & mainstream-dev compliant. Datastores like Apache Jena are far more used & there's a lot of ActivityPub & related json-ld-centric data-stores & systems being created & experimented with.

2. Users. The article talks about primary use cases for semantic web, and they are all huge massive industries, not people. We needed semantic web because it would help search. We needed semantic web because it would help social. We needed semantic web because it would help e-commerce (& look, an article from yesterday about just that![1]).

What's missing is end users. I don't mind that super-large data systems can do interesting things with semantic web. But to me, the purpose was always to enrich the information we users see online with our eyes with powerful & consistent data that our own machines can help use. Our navigator should be helping us, showing us what digital matter we are seeing on the page, rather than letting the page exist as one enormous standalone artifact implicitly composed of arbitrary text & images. There's meaning there, there are things that we are working with, & semantic web gives us a common operating system for talking about things, & managing them.

Users are still somewhat missing from semantic web. Folks like ActivityPub are doing a wonderful & interesting job using Semantic Web to build common distributed platforms for social, where we can talk about digital matter like Shares and Photos and Favorites in a common way. For now, the semantic web tech remains under the hood, something abstract powering a client that abstracts over the semantic meaning to generate just another anonymous web page, filled with articles and photos and listens and viewings & other social entities, but presented through the veneer of the application, not as discrete social objects unto themselves.

I think we're only just starting to explore how to open the Semantic Web up, how to represent semantic data entities & data stores, in a way that will let users interact directly with digital objects, rather than needing the artifice & instrumentation of the application. But this is pretty deep conjecture. What I think is clearer to say is that the end-user has, until very recently, not seen or understood how semantic web technology might be helping them; it's been a tool for businesses & big data.

I look forward to the interesting era of Semantic Web, the era now breaking upon us, when we get to explore how having structured meaningful data can be good for individuals, persons, for personal computing, for small & medium data, & especially, for us to begin to communicate with each other over better structured data. And I think JSON-LD, ActivityPub, & the semantic web are, by far, the most promising & straightforward way to explore these virtues of structured communication.

By contrast, the article's talk about "what's next" is yet more academic projects, machine learning, & trying to represent more things (like actions, which is something absolutely core to what ActivityPub does: represent activities[2]!).

[1] https://news.ycombinator.com/item?id=24557027

[2] https://www.w3.org/TR/activitystreams-vocabulary/
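As a small sketch of what one of those social objects looks like on the wire, here is a "Like" activity built as a plain Python dict, using the ActivityStreams context URL and the `Like` and `Image` types from the vocabulary linked in [2]. The `make_activity` helper and the example.org identifiers are invented for illustration.

```python
import json

# The ActivityStreams 2.0 JSON-LD context.
AS_CONTEXT = "https://www.w3.org/ns/activitystreams"

def make_activity(kind, actor_id, object_id, object_type):
    """Build a minimal activity: who did what to which object (hypothetical helper)."""
    return {
        "@context": AS_CONTEXT,
        "type": kind,
        "actor": actor_id,
        "object": {"id": object_id, "type": object_type},
    }

# Made-up identifiers, for illustration only.
like = make_activity(
    "Like",
    "https://example.org/users/alice",
    "https://example.org/photos/1",
    "Image",
)

print(json.dumps(like, indent=2))
```

The interesting part is that the photo and the act of liking it are both addressable things in their own right, rather than rows hidden behind one application's UI.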


For those more familiar with it, didn't this decade-old standard die along with the other efforts of the Semantic Web?

This. The Semantic Web produced some interesting technologies that lost their cool factor and dissolved into the background (SPARQL, graph DBs... etc.). Ontology everywhere simply never caught on because it wasn't worth it.

This co-opting of the term by blockchain-backed everything is for something actively dangerous, as opposed to simply too little bang for too much buck.


Hardly. The promise is still there, but there are barriers in place to get there.

One of the most useful aspects of the semantic web is how it enhances the search for information. Some web citizens have become conditioned to see Google as the pinnacle of what we can achieve through search, but we can do a lot better. Let's use an example to illustrate this. Imagine a presidential election is taking place and you want to understand the positions of the candidates on topics that matter to you. Let's say foreign policy is something you are interested in, including their proclivity for war. By allowing for searching on a richer set of metadata you can more easily access the information about the positions of these candidates, without the distortions of Google's PageRank algorithms. Think of it like treating the information of the web as a database you can query more directly. That's the main promise of the semantic web.
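A toy version of "the web as a database you can query", sketched in Python. The statements, the predicate names and the `match` helper are all invented for illustration; a real deployment would use published RDF data and a query language such as SPARQL rather than a list in memory.

```python
# Made-up statements, for illustration only.
triples = [
    ("candidate_a", "supports", "military_intervention"),
    ("candidate_a", "supports", "trade_tariffs"),
    ("candidate_b", "opposes", "military_intervention"),
    ("military_intervention", "topic", "foreign_policy"),
    ("trade_tariffs", "topic", "economy"),
]

def match(pattern, data):
    """Yield the triples that fit pattern; None acts as a wildcard."""
    for triple in data:
        if all(p is None or p == t for p, t in zip(pattern, triple)):
            yield triple

# Step 1: which issues are filed under foreign policy?
issues = {s for s, _, _ in match((None, "topic", "foreign_policy"), triples)}

# Step 2: which statements take a stance on one of those issues?
stances = [t for t in match((None, None, None), triples) if t[2] in issues]

print(stances)
```

The two steps together are a join over the data, which is the kind of question a keyword search can only approximate.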


Wasn’t this what the Semantic Web was supposed to enable?

You just described the semantic web. It’s beautiful. Now why did it fail?

Exactly.

A refinement on your second point is that the groups who would have benefited the most from semantic web were the googles of the world, but they were also the ones who needed it the least, because they were well ahead of everybody else at building the NLP to extract structure from the existing www. In fact the existence of semantic web would have eroded their key advantage. So the ones in a position to encourage this and make it happen didn’t want it at all. So it was always DOA.


The semantic web ideas failed because the concept required open data integration. Companies don’t like freely exposing their data without branding or proprietary controls.

I personally think the semantic web was a solution in search of a problem. People weren't concerned about ontology and mappings. They were concerned with making the web more approachable. I think, therefore, money was invested into Mobile Apps / JS frameworks instead of tools for semantic web.

If the "semantic web" worked, they wouldn't have had to buy a startup. You don't need a startup to read kindly-provided, accurate, well-categorized RDF tuples out of a web page.

One of my favorite ironies of the "semantic web" is almost every story cited as a success story by its advocates is in fact a failure story, in that it involves somebody having to do something that wouldn't have been necessary if the semantic web actually existed.

By the way, LLMs completely finish killing off the semantic web. The semantic web will never take the form of the entire world providing nice public RDF snippets. You "just" feed the real-world goo into an LLM (or its sequels) and extract what you need. It turns out that it was literally easier to solve the language comprehension problem than to implement the semantic web. (Even if you don't consider the language comprehension problem solved yet, we are well on track to having it solved before the semantic web is implemented.)


"I don't think a "universal semantic" was ever a design goal of the semantic web."

And I'm pretty sure it was the whole point. Nobody would ever have written as many reams of marketing material if the pitch was "Hey, someday, you'll be able to reach out to the web, and with specialized software for your particular domain you can access specialized web sites with specialized tags that give you access to specialized data sets that can be fed to your specialized artificial intelligence engines!"

Because that pitch is basically a "yawn, yeah, duuuuuh", and dozens of examples could have been produced even ten years ago. The whole point was to have this interconnected web of everything linking to everything, and that's what's not possible.
