Why OpenAI's API Is More Expensive for Non-English Languages
Beyond words: How byte pair encoding and Unicode encoding factor into pricing disparities
Byte-Pair Encoding For Beginners
An illustrative guide to BPE tokenizer in plain simple language
Structured Generative AI
How to constrain your model to output defined formats
The Art of Tokenization: Breaking Down Text for AI
Demystifying NLP: From Text to Embeddings
Under-trained and Unused tokens in Large Language Models
Existence of under-trained and unused tokens and Identification Techniques using GPT-2 Small as an Example
This Is How LLMs Break Down the Language
The science and art behind tokenization
LettuceDetect: A Hallucination Detection Framework for RAG Applications
How to capitalize on ModernBERT’s extended context window to build a token-level classifier for hallucination detection
We look at an implementation of the HyperLogLog cardinality estimati
Using clustering algorithms such as K-means is one of the most popul
Level up Your Data Game by Mastering These 4 Skills
Learn how to create an object-oriented approach to compare and evalu
When I was a beginner using Kubernetes, my main concern was getting
Tutorial and theory on how to carry out forecasts with moving averag