How Do We Know if a Text Is AI-generated?
In the fascinating and rapidly advancing realm of Artificial Intelligence, one of the most exciting developments has been AI text generation. AI models such as GPT-3, Bloom, BERT, AlexaTM, and other large language models can produce remarkably human-like text. This is both exciting and concerning at the same time: such technological advances allow us to be creative in ways we couldn't before, but they also open the door to deception. And the better these models get, the harder it will be to distinguish human-written text from AI-generated text.
Since the release of ChatGPT, people all over the globe have been testing the limits of such AI models, using them to gain knowledge but also, in the case of some students, to solve homework and exams, which raises ethical concerns about the technology. And although these models have become sophisticated enough to mimic human writing styles and maintain context over multiple passages, they are still imperfect and make mistakes, even if only minor ones.
That raises an important question, one I get asked quite often by my friends and family members (I have been asked it many times since ChatGPT was released…),
How can we know if a text is human-written or AI-generated?
This question is not new to the research world; detecting AI-generated text is known there as "deepfake text detection." Today, there are different tools you can use to detect whether a text is human-written or AI-generated, such as the GPT-2 Output Detector released by OpenAI. But how do such tools work?
Different approaches are currently used to detect AI-generated text, and as the models that generate these texts get more advanced, new detection techniques are being researched and implemented.
This article will explore five statistical approaches that can be used to detect AI-generated text.
Let's get right to it…
1. N-gram Analysis:
An N-gram is a sequence of N words or tokens from a given text sample. The "N" in N-gram is how many words are in the N-gram. For example:
- New York (2-gram).
- The Three Musketeers (3-gram).
- The group met regularly (4-gram).
Analyzing the frequency of different N-grams in a text makes it possible to identify patterns. For example, among the three N-gram examples above, the first is the most common and the third is the least common. By tracking different N-grams, we can determine which ones are more or less common in AI-generated text than in human-written text; for instance, an AI might use specific phrases or word combinations more frequently than a human writer would. We can learn the relationship between the N-gram frequencies used by AI vs. humans by training a model on data generated by both.
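As a minimal sketch of that idea, we can compare which bigrams dominate two samples. The two short strings below are made-up stand-ins for a human-written and an AI-generated corpus; in practice you would use large labeled datasets.
import nltk

# Hypothetical stand-ins for a human-written and an AI-generated corpus
human_sample = "I grabbed a coffee, missed the bus, and laughed about it later."
ai_sample = ("It is important to note that coffee is popular. "
             "It is important to note that buses run on schedules.")

def top_bigrams(sample, k=3):
    # Count 2-grams over a lowercased token stream and keep the k most frequent
    bigrams = nltk.ngrams(sample.lower().split(), 2)
    return nltk.FreqDist(bigrams).most_common(k)

print(top_bigrams(human_sample))
print(top_bigrams(ai_sample))  # repeated phrasing like ("important", "to") stands out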
2. Perplexity:
If you look up the word perplexed in an English dictionary, it will be defined as surprised or shocked; in the context of AI, and NLP in particular, perplexity measures how confidently a language model predicts a text. Formally, it is the exponential of the average negative log-probability the model assigns to each word, which quantifies how "surprised" the model is by the text. AI-generated text tends to yield lower perplexity: the more predictable a text is to the model, the more likely it was machine-generated, whereas human writing tends to surprise the model more. Perplexity is also fast to calculate, which gives it an advantage over other approaches.
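As a more realistic sketch (assuming the Hugging Face transformers and torch packages, which are not used elsewhere in this article), here is how one might score a text's perplexity under GPT-2; lower scores mean the model finds the text more predictable:
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def gpt2_perplexity(text):
    # With labels == input_ids, the model returns the average
    # cross-entropy loss over the sequence
    encodings = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(input_ids=encodings.input_ids, labels=encodings.input_ids)
    # Perplexity is the exponential of the average negative log-likelihood
    return torch.exp(outputs.loss).item()

print(gpt2_perplexity("The quick brown fox jumps over the lazy dog."))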
3. Burstiness:
In NLP, Slava Katz defines burstiness as the phenomenon where certain words appear in "bursts" within a document or a set of documents: once a word is used in a document, it is likely to be used again soon. AI-generated texts exhibit different burstiness patterns than texts written by humans, since the models lack the cognitive processes that lead human writers to vary their word choices with synonyms.
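As a rough, illustrative sketch of this idea (a simplified measure of my own choosing, not Katz's formal model), we can look at the token gaps between repeated occurrences of a word; short, clustered gaps suggest bursty usage:
import numpy as np

def word_gaps(text, word):
    # Distances (in tokens) between consecutive occurrences of `word`;
    # small, irregular gaps suggest bursty usage
    tokens = text.lower().split()
    positions = [i for i, t in enumerate(tokens) if t == word]
    return np.diff(positions)

sample = "the cat sat on the mat because the cat liked the mat"
print(word_gaps(sample, "cat"))  # [7]: "cat" reappears 7 tokens later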
4. Stylometry:
Stylometry is the study of linguistic style, and it can be used to identify authors or, in this case, the source of a text (human vs. AI). Everyone uses language differently: some prefer short sentences, while others prefer long, connected ones. People differ in how they use semicolons and em-dashes (and other distinctive punctuation); some rely on the passive voice more than the active one, or use more complex vocabulary. An AI-generated text might exhibit inconsistent stylistic features, even when writing about the same topic more than once, and since an AI doesn't have a personal style of its own, these stylistic signals can be used to detect whether a text was written by an AI.
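Here is a minimal sketch of what a small stylometric feature vector might look like; the particular features chosen (sentence length, semicolon and comma rates) are illustrative assumptions, not a standard feature set:
import re

def style_features(text):
    # A few simple, illustrative stylistic signals
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return {
        "avg_sentence_length": sum(lengths) / len(lengths),
        "semicolons_per_sentence": text.count(";") / len(sentences),
        "commas_per_word": text.count(",") / len(text.split()),
    }

print(style_features("I came; I saw. I conquered, quickly!"))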
5. Consistency and Coherence Analysis:
Following up on stylometry: since AI models don't have a style of their own, the text they generate sometimes lacks consistency and long-term coherence. For example, an AI might contradict itself or abruptly change topic and style in the middle of a text, producing a flow of ideas that is harder to follow.
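As a simple sketch of how coherence might be quantified (using scikit-learn's TF-IDF vectors as a stand-in for the far more sophisticated discourse models used in practice), we can measure how lexically similar each sentence is to the one that follows it:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def adjacent_similarity(sentences):
    # Average cosine similarity between consecutive sentences;
    # low values hint at abrupt topic shifts
    tfidf = TfidfVectorizer().fit_transform(sentences)
    sims = [cosine_similarity(tfidf[i], tfidf[i + 1])[0, 0]
            for i in range(len(sentences) - 1)]
    return sum(sims) / len(sims)

sentences = [
    "The cat sat on the mat.",
    "The cat then chased a mouse.",
    "Quarterly revenue exceeded all forecasts.",
]
print(adjacent_similarity(sentences))  # dragged down by the abrupt topic change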
To tie things together, we can use Python to implement a very simplistic version of each of the five approaches and see how they work:
import nltk
from collections import Counter
from textblob import TextBlob
import numpy as np

nltk.download('punkt', quiet=True)  # TextBlob's sentence tokenizer relies on this corpus

text = "This is some sample text to detect. This is a text written by a human? or is it?!"
# 1. N-gram Analysis
def ngram_analysis(text, n=2):
    # Count how often each n-gram (sequence of n words) appears in the text
    n_grams = nltk.ngrams(text.split(), n)
    freq_dist = nltk.FreqDist(n_grams)
    print(freq_dist.most_common())

ngram_analysis(text)
# 2. Perplexity
# Perplexity calculation typically requires a language model.
# This is just an illustration of the formula, using the text's own
# word-frequency distribution in place of a model's probabilities.
def perplexity(text, model=None):
    # In practice, the model would supply the probability of each word
    words = text.split()
    freq_dist = nltk.FreqDist(words)
    probs = [count / len(words) for count in freq_dist.values()]
    entropy = -sum(p * np.log(p) for p in probs)
    return np.exp(entropy)

print(perplexity(text))
# 3. Burstiness
def burstiness(text):
    # Spread of word-repetition counts: a rough proxy for how "bursty" the text is
    word_counts = Counter(text.split())
    counts = list(word_counts.values())
    spread = np.std(counts)
    # Guard against division by zero when every word appears equally often
    return len(word_counts) / spread if spread > 0 else 0.0

print(burstiness(text))
# 4. Stylometry
def stylometry(text):
    blob = TextBlob(text)
    # Average sentence length in words
    avg_sentence_length = sum(len(sentence.words) for sentence in blob.sentences) / len(blob.sentences)
    # Crude passive-voice proxy: whole-word counts of the auxiliaries "was"/"were"
    words = text.lower().split()
    passive_voice = words.count('was') + words.count('were')
    # Type-token ratio: unique words over total words
    vocabulary_richness = len(set(words)) / len(words)
    return avg_sentence_length, passive_voice, vocabulary_richness

print(stylometry(text))
# 5. Consistency and Coherence Analysis
# Again, a very simple illustration; real systems use far more complex algorithms.
def consistency(text):
    # Use each sentence's first word as a crude "topic" marker
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    topics = [sentence.split()[0] for sentence in sentences]
    topic_changes = len(set(topics))
    return topic_changes

print(consistency(text))
Final Thoughts
As AI advances, we will need more sophisticated tools to detect AI-generated text and avoid misinformation and deception. Although this is still a very active area of research, researchers have already built tools to detect AI-written text. One example is the work of Edward Tian from Princeton University, who developed an experimental tool called GPTZero that uses "perplexity" and "burstiness" to estimate the likelihood that a piece of content was AI-generated. Another example is Noah Smith, a professor and NLP researcher at the University of Washington, whose research focuses on a unique quality of human-written text: intentionality. AI often generates text that lacks intentionality and consistency, though this may change as language models improve. None of the approaches explored in this article is bulletproof; real-life AI-generated text classifiers are usually built by combining different techniques and training on extensive datasets.
Technology is always an exciting addition to our lives, capable of improving their quality, but it can also present us with new challenges and obstacles. These obstacles, however, are never a reason to abandon new technologies; rather, they are an opportunity to develop the protocols and rules that let us take advantage of such great technology ethically.