What Happened to the Semantic Web?
Remember the Semantic Web? Probably not. It was all the rage about two decades ago. But nowadays, mentions of it are rare.
After starting with a bold vision for a new Web (see below), the Semantic Web gradually lost steam and faded into the background. Still, there was never a real reckoning about why that happened… until now, when the Semantic Web vision has to contend with the disruptive and flashy Generative AI technology.
In this post, I examine why the Semantic Web, despite losing its initial shine, remains an essential piece of the Web puzzle, even as it coexists with the new Generative AI, and why Data Science professionals should be aware of it too.
But first, let's take a look at the man behind the creation of the Semantic Web.
The Man Behind the Web
Sir Tim Berners-Lee (TimBL) proposed the concept of the Semantic Web (SW) in 1999 while he was working at the World Wide Web Consortium (W3C). The idea was to create a web of data that machines could process, allowing them to understand the meaning of the information and make connections between different pieces of data.
We must understand that the SW was a second act for Tim Berners-Lee. By 1999 he was already a worldwide (no pun intended) celebrity as the creator of the World Wide Web years earlier.
It was 1990 when, as a visiting scholar at CERN in Switzerland, he proposed the Web to his boss, Mike Sendall, who found the proposal "vague but exciting." TimBL said he invented almost nothing, because the hyperlinks, the internet protocols, and many other elements were already available. Still, nobody had taken the step of putting them all together.
As TimBL put it,
I just had to take the hypertext idea and connect it to the TCP and DNS ideas, and – ta-da! – the World Wide Web.
A small step for man…
The first web page ever built is still available at this address: no fancy formatting, just big text at the top stating "home of the first website."
Now you have an idea of who TimBL is… and why he was taken very seriously when his second proposal, the Semantic Web, was released almost at the turn of the century.
The Vision of the Semantic Web
The SW was seen as the next step in the evolution of the internet, which had until then been focused on linking documents together. With the SW, the focus would shift to connecting data, creating a more intelligent and interconnected web.
The vision of the SW was to enable machines to understand the meaning of information on the Web and to use that understanding to provide more intelligent and personalized services to users. By the turn of the century, it was clear that the Web was not only for human consumption, yet the way information was stored in web pages did nothing to help automate its use. According to TimBL, the traditional web was becoming a bottleneck for the automation of Web processing.
That vision was compelling: unlike the traditional Web, where pages are designed to present information to humans, self-describing web pages would make it possible to automate the processing of their content. So it made a lot of sense to thoroughly overhaul the internet as we knew it. And exactly as he did for the Web, TimBL called for starting the new revolution.
I was there back then to witness how this movement was taking off, and not just as an observer: several of my graduate students were working on Semantic Web topics.
Technologies for making the SW a reality
But the Semantic Web was not only an idea: a set of technologies and standards was developed to make it a reality, presented in the following points (a small code sketch follows the list):
- Resource Description Framework (RDF): A data model for representing information on the Web. RDF represents facts as subject-predicate-object triples, using URIs to identify resources and predicates to describe the relationships between them.
- Web Ontology Language (OWL): An expressive language for describing the meaning of RDF data. OWL can be used to define complex relationships between resources and to define classes of resources organized in a hierarchy.
- SPARQL: A query language for RDF data. SPARQL can be used to extract information from RDF data stores.
- Semantic Web services: A type of web service that uses semantic markup to describe the service's capabilities and the data that it consumes and produces.
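To make these pieces more concrete, here is a minimal sketch of RDF and SPARQL in action. I'm using Python's rdflib library purely for illustration (any RDF toolkit would do), and the example.org names are made up for the demo:

```python
# Minimal sketch: store a few RDF triples and query them with SPARQL.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/")   # made-up namespace for this demo

g = Graph()
# RDF: facts are subject-predicate-object triples, identified by URIs.
g.add((EX.TimBL, RDF.type, EX.Person))
g.add((EX.TimBL, EX.invented, EX.WorldWideWeb))
g.add((EX.TimBL, EX.name, Literal("Tim Berners-Lee")))

# SPARQL: a query language for extracting information from those triples.
query = """
    PREFIX ex: <http://example.org/>
    SELECT ?name WHERE {
        ?person a ex:Person ;
                ex:invented ex:WorldWideWeb ;
                ex:name ?name .
    }
"""
for row in g.query(query):
    print(row.name)   # -> Tim Berners-Lee
```

Even in this tiny example the pattern is visible: data lives as triples, and querying is pattern matching over those triples.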
Several significant SW developments were undertaken, such as DBpedia, a sort of Wikipedia made of structured information following the standards listed above. It contains more than 228 million entities, which looked like a lot back when we were not used to talking about billions, like the number of parameters in Large Language Models (see below).
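DBpedia exposes a public SPARQL endpoint, so anyone can try it out. Here is a small sketch using the SPARQLWrapper Python client; the specific query (fetching the English abstract of the "Semantic Web" entry) is just an example:

```python
# Query DBpedia's public SPARQL endpoint for the English abstract of "Semantic Web".
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT ?abstract WHERE {
        dbr:Semantic_Web dbo:abstract ?abstract .
        FILTER (lang(?abstract) = "en")
    }
""")

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["abstract"]["value"][:200])   # first 200 characters
```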
DBpedia is similar to Google's "Knowledge Graph," which was (and is) used to answer search queries. Google's Knowledge Graph is a network of concepts and names expressed as a collection of RDF triples. I can't give you much more information about it, as it's mostly Google's internal information. I'm sure the Knowledge Graph has been instrumental in giving Google an advantage over its competitors.
For instance, I'm sure you have noticed that Google sometimes retrieves pages where your keywords don't appear but that are somehow related to your query. This is because of "Semantic Search." This kind of search goes beyond simple keyword matching by incorporating terms from the Knowledge Graph related to your query. Semantic Search considers synonyms and other concept relations to enrich the search.
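To give a feel for the idea (and only the idea; Google's actual machinery is far more sophisticated and not public), here is a toy sketch of keyword expansion over a tiny, hand-made concept graph:

```python
# Toy sketch of Semantic Search: expand the user's keywords with related
# concepts from a (hand-made) knowledge graph before matching documents.
RELATED = {
    "car":    {"automobile", "vehicle"},
    "doctor": {"physician", "medic"},
}

DOCS = {
    "doc1": "Used automobile dealership with certified vehicles",
    "doc2": "Fresh fruit and vegetables delivered daily",
}

def semantic_search(query: str) -> list[str]:
    terms = set(query.lower().split())
    for term in list(terms):
        terms |= RELATED.get(term, set())      # enrich with related concepts
    return [doc_id for doc_id, text in DOCS.items()
            if terms & set(text.lower().split())]

print(semantic_search("car"))   # finds doc1 even though "car" never appears in it
```

The graph and documents are made up, but the mechanism is the one described above: the query is enriched with semantically related terms before any matching happens.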
Even without fully understanding each technology in detail, we can see that, from the technical point of view, virtually all the pieces were there. So, why was the "ta-da!" missing this time around? Why did the SW take so long to materialize?
The Rise of Generative AI
The SW rested on the assumption that machine-readable formats were needed to get a grasp on the meaning of the information stored on the Web. TimBL assumed that human-readable information was not usable by machines.
And it wasn't for a long time.
It wasn't until November 2022 that ChatGPT, trained on millions of scraped Web documents, could answer questions posed by humans, albeit not very reliably.
Then things changed fast and dramatically. The impressive conversational capabilities of ChatGPT and other chatbots showed that machines could "understand" the information stored on the Web. Further, they could answer questions and suggest actions to humans.
Who would have thought that? Not me, not Tim Berners-Lee.
Generative AI quickly overshadowed the SW. Its capabilities entranced developers, researchers, and corporations alike. The focus shifted from creating a structured SW to harnessing the power of AI to generate content.
Will there be a comeback?
Once the central assumption of the SW (that machines couldn't understand Web pages) was proven wrong, its future became uncertain. It seemed to many that the SW would become obsolete and go the way of the Dodo.
Not so fast.
First, specific SW technologies like RDF, SPARQL, and Semantic Web Services will be around for a long time because they are part of the existing infrastructure. In fact, they are part of the Amazon Web Services offering.
Besides, I think Knowledge Graphs, whether at Google or elsewhere, will complement Generative AI solutions.
Let me give you a specific example.
Since the spring of this year, I have been discussing with colleagues how Bing uses the Knowledge Graph to answer questions (yes, Microsoft also has its own version of the Knowledge Graph). Then I saw a technical diagram of Bing's architecture, and it became clear how they do it.
Here it goes:
When a query is presented to Bing, it is used to retrieve the relevant parts of the Knowledge Graph. The result is, roughly, a set of statements that complement the original user query. This "enriched" query is then used as a prompt to ChatGPT or whatever Large Language Model sits behind the chatbot. That's it.
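In code, the flow could look something like the sketch below. To be clear, this is my own reconstruction of the idea from the diagram: the toy graph and the names (retrieve_facts, build_prompt, call_llm) are illustrative inventions, not Bing's actual components.

```python
# Sketch of Knowledge-Graph-enriched prompting: retrieve relevant facts,
# prepend them to the user's question, and send the result to an LLM.
TOY_KNOWLEDGE_GRAPH = [
    ("Tim Berners-Lee", "invented", "the World Wide Web"),
    ("Tim Berners-Lee", "proposed", "the Semantic Web"),
    ("the World Wide Web", "was created at", "CERN"),
]

def retrieve_facts(query: str) -> list[str]:
    """Pull the triples whose subject or object is mentioned in the query."""
    return [f"{s} {p} {o}." for s, p, o in TOY_KNOWLEDGE_GRAPH
            if s.lower() in query.lower() or o.lower() in query.lower()]

def build_prompt(query: str) -> str:
    """Build the 'enriched' query: reliable facts first, then the question."""
    facts = "\n".join(retrieve_facts(query))
    return (f"Use only the following facts to answer.\n"
            f"Facts:\n{facts}\n\nQuestion: {query}\nAnswer:")

print(build_prompt("Who invented the World Wide Web?"))
# The enriched prompt would then go to the language model, e.g.:
# answer = call_llm(build_prompt(user_query))   # hypothetical LLM call
```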
The information coming from the Knowledge Graph has a major advantage over whatever the chatbot would produce on its own: it is curated, reliable information rather than the made-up facts, hallucinations, or other false information that the LLM alone could produce.
There is another tantalizing possibility that nobody is talking about: AI could be used to annotate regular web pages and make them self-descriptive using the SW standards.
You know, one of the hurdles that slowed the SW down was the cost of adopting it for each specific website. The SW never became a priority for many companies, so they avoided channeling resources into overhauling their pages to meet the SW standards.
But with AI, it could be inexpensive to transform vanilla web pages into SW ones. In fact, one task at which chatbots are particularly good is following formats: I have verified this myself. So AI could remove the main obstacle to SW adoption, assuming there is still interest in it.
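A rough sketch of what that could look like: ask an LLM to emit Semantic Web style markup (here, schema.org JSON-LD) for an ordinary page. The call_llm function below stands in for whatever chatbot API one would use, and the prompt and page are purely illustrative:

```python
# Sketch: use an LLM to annotate a plain HTML page with schema.org JSON-LD.
ANNOTATION_PROMPT = """Read the HTML below and produce JSON-LD annotations
using the schema.org vocabulary. Output only valid JSON-LD.

HTML:
{html}
"""

def annotate_page(html: str, call_llm) -> str:
    """Return machine-readable JSON-LD describing a 'vanilla' web page."""
    return call_llm(ANNOTATION_PROMPT.format(html=html))  # hypothetical LLM call

page = "<h1>Acme Corp</h1><p>Hardware store in Springfield. Call 555-0100.</p>"
# annotate_page(page, call_llm) might return something like:
# {"@context": "https://schema.org", "@type": "LocalBusiness",
#  "name": "Acme Corp", "telephone": "555-0100",
#  "address": {"@type": "PostalAddress", "addressLocality": "Springfield"}}
```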
Closing thoughts
I think Knowledge Graphs and many of the technologies associated with the SW will remain useful in the end.
The irony is that Generative AI, which at first seemed to be the technology that would put the last nail in the coffin of the SW, could end up forming a kind of symbiosis with it and making it relevant again. They are already cooperating in applications like the new Web search with Generative AI and could continue to cooperate in the future.
So, to answer the question "Will there be a comeback of the SW?": a resounding "yes," but not as TimBL originally intended it. Not as a wholesale overhaul of the Web, but mainly as support for other systems, which is less glamorous indeed, but equally useful.
Let's face it: the original TimBL vision for the Semantic Web will never, ever become a reality. It was (or seemed to be) a brilliant idea, and personally, I was convinced enough to jump on that train. But technology doesn't favor brilliant ideas; it favors convenient ones.
So, let's end with the new Semantic Web vision:
"Long life to the dowdy, dull, modest, real Semantic Web, which will play a supporting role to the bright, fancy, alluring Artificial Intelligence."