Beating ChatGPT 4 in Chess with a Hybrid AI model

Author:Murphy  |  View: 27164  |  Time: 2025-03-22 23:09:05

How well an LLM can solve complex problems

Image by Author: Robots playing chess generated using DALLE-3

Can ChatGPT really play chess? This question motivated me to run a chess match between ChatGPT and my hybrid AI model, an expert chess bot. The first game was against GPT 3.5, and it exposed several limitations of the OpenAI LLM: it was hard to play the match to the end because ChatGPT did not understand the rules of chess well, played many illegal moves, and produced wrong analysis.

This kind of analysis is important for understanding the limitations of LLMs, their long-term reasoning, and their analytical power. By knowing a model's behavior well, we can find ways to work around its flaws and build on its strengths. As AI engineers, we should keep setting up experiments that reveal how models really behave, and plan how to adapt and improve them in our projects. Large language model technology is still very recent and needs to be explored and studied further to ensure it is understood and used well.

Find more details about the first match in this other article:

Can Chat GPT Play chess?

After beating ChatGPT 3.5, we now face a more difficult opponent: its successor, the more powerful ChatGPT 4.0. This new challenge raised a few questions:

  • Has GPT 4.0 really evolved compared to GPT 3.5 in complex analysis?
  • Will we be able to play an entire match now?
  • Will GPT 4.0 commit the same mistakes as GPT 3.5?
  • Will GPT 4.0 be able to beat my Expert AI model?

Let's find out!

Prologue

In this game, my AI model plays White and opens with e4, the King's Pawn Opening. The game then continues into a Giuoco Piano and its popular Pianissimo Variation.

Pianissimo is the most popular variation of the Giuoco Piano. With d3, White opens a diagonal for the dark-squared bishop and defends e4. Often the light-squared bishop then moves to b3, the b1 knight to d2, and the dark-squared bishop to e3 or g5, and White castles kingside.

  1. e4 e5
  2. Nf3 Nc6
  3. Bc4 Bc5
  4. c3 Nf6

GPT 4.0, like GPT 3.5, analyzes each move I make and explains how it will respond, which is interesting because it acts like a coach. I see a great opportunity here to use ChatGPT as a chess coaching tool that helps people learn and improve at this very complex game.

The biggest difference between ChatGPT 3.5 and ChatGPT 4.0 is that ChatGPT 4.0 tracks all moves and includes this history in the prompt while the match is played. This tracking can be crucial for correcting illegal moves and planning better strategies than GPT 3.5 did.
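The same idea can be reproduced from the caller's side by keeping the move list ourselves and rendering it into every prompt. This is only a minimal sketch: `format_movetext`, `build_prompt`, and the prompt wording are hypothetical names chosen for illustration, not what ChatGPT does internally.

```python
def format_movetext(plies):
    """Render a flat list of SAN half-moves as numbered movetext."""
    parts = []
    for i, san in enumerate(plies):
        if i % 2 == 0:  # White's half-move starts a new move number
            parts.append(f"{i // 2 + 1}.")
        parts.append(san)
    return " ".join(parts)


def build_prompt(plies, instruction="Reply with Black's next move in SAN."):
    """Hypothetical prompt layout: full history first, then the request."""
    return f"Moves so far: {format_movetext(plies)}\n{instruction}"


# The opening of this game, up to White's fourth move.
history = ["e4", "e5", "Nf3", "Nc6", "Bc4", "Bc5", "c3"]
print(build_prompt(history))
```

Keeping the history outside the model means there is always a trusted copy of the game to compare against whatever the model reports.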

First Mistake – GPT is cheating?!

ChatGPT played a good, balanced game until move 17, when it blundered and handed my AI an advantage with Qg6. This is a risky attacking move that presumably aims to pressure the e4 pawn and coordinate with the knight on h5 for an attack on the white king. But it is premature: it exposes Black's position, and the queen can become a target for White's pieces if Black is not careful. Furthermore, without adequate support from other pieces, the queen alone is usually not enough to form an effective attack.

ChatGPT blunder – Image by Author

After this move, ChatGPT made a big mistake that I only noticed because I was paying close attention to the game. I played c5, and in its move tracking it simply changed my move to cxd6. Yes, ChatGPT "cheated"!

ChatGPT changed my move to benefit itself in the game! You can check it out in the images below:

ChatGPT answer – Image by Author

I immediately corrected the error, telling ChatGPT that it had made a mistake and insisting on my move. ChatGPT then carried on with 18. c5 dxc5 as if nothing had happened.
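I caught this rewrite by eye, but it is easy to automate if you keep your own copy of the moves. The sketch below compares a trusted move list with the one the model reports and points to the first half-move where they differ. The function names are hypothetical, and the move list is taken from the PGN at the end of this article.

```python
def first_divergence(trusted, reported):
    """Return the index of the first half-move that differs, or None."""
    for i, (ours, theirs) in enumerate(zip(trusted, reported)):
        if ours != theirs:
            return i
    if len(reported) > len(trusted):
        return len(trusted)  # the report contains moves we never played
    return None  # the report matches our record so far


def describe(index):
    """Turn a half-move index into a readable move number and side."""
    side = "White" if index % 2 == 0 else "Black"
    return f"move {index // 2 + 1} ({side})"


# Our record of the game up to White's 18th move.
trusted = (
    "e4 e5 Nf3 Nc6 Bc4 Bc5 c3 Nf6 d3 d6 a4 a6 Nbd2 h6 b4 Bb6 Ba3 Be6 "
    "Qb3 Qd7 Bxe6 fxe6 O-O O-O b5 Na5 Qb2 axb5 axb5 Nh5 g3 Qf7 c4 Qg6 c5"
).split()

# What the model reported: the last move silently replaced by cxd6.
reported = trusted[:-1] + ["cxd6"]

index = first_divergence(trusted, reported)
print(describe(index), trusted[index], "->", reported[index])
```

A check like this only detects that the record was changed. Deciding which version is right still depends on having a trusted source for the moves.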

Second Mistake – Illegal moves again

Blunder after blunder, ChatGPT handed my model more and more advantage and gave away the game.

After move 25, with the game heavily in favor of my AI model, ChatGPT ran out of good options and started playing illegal moves, repeating the error that was common with ChatGPT 3.5. As we see below, it attempted bxc6: ChatGPT has a pawn on the b-file, but White has no piece on c6 to capture.

ChatGPT answer – Image by Author

Blunder after Blunder

Now, on move 26, we have a strong passed pawn on the queenside ready to promote, and ChatGPT gets stuck in a loop of illegal moves, trying to move a rook to squares it cannot reach: Rb8, then Rfb8, then Rb7, then Rfb8 again.

ChatGPT answer and chess board – Image by Author

After 4 attempts, ChatGPT settled on Rf8, a blunder that increased my AI's advantage, as we can see below after I captured this rook on f8. The suggested reply was the obvious capture of the pawn to prevent promotion, but ChatGPT preferred to capture the bishop next to its king.
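In my match I did this correction by hand, but the ask-reject-ask-again loop is simple to sketch. In the code below, `ask` is a hypothetical callable standing in for the LLM, and `legal` is a small illustrative subset of legal replies, not the output of a real move generator. The scripted answers replay the sequence described above.

```python
def ask_until_legal(ask, legal_moves, max_attempts=5):
    """Call ask(rejected) until it returns a move found in legal_moves.

    ask is a stand-in for the LLM call. It receives the moves rejected so
    far, so a real caller could feed them back into the next prompt.
    Returns (move, rejected) or raises ValueError when attempts run out.
    """
    rejected = []
    for _ in range(max_attempts):
        move = ask(rejected)
        if move in legal_moves:
            return move, rejected
        rejected.append(move)
    raise ValueError(f"no legal move after {max_attempts} attempts: {rejected}")


# Scripted stand-in replaying ChatGPT's attempts on move 26.
scripted = iter(["Rb8", "Rfb8", "Rb7", "Rfb8", "Rf8"])


def fake_llm(rejected):
    return next(scripted)


# Illustrative subset of Black's legal replies (an assumption for this demo).
legal = {"Rf8", "Rxb7"}

move, rejected = ask_until_legal(fake_llm, legal)
print(move, rejected)
```

Note that a loop like this only guarantees a legal move, not a good one: Rf8 passes the check and is still a blunder.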

Chess board – Image by Author

On the next move the pawn promotes to a queen, and we have a checkmate in 6 moves.

Chess board – Image by Author

Again, ChatGPT tried to escape the checkmate with several illegal moves, and I had to correct it several times. For example, on move 28 it tried Kf6, a square its king could not even reach, showing again that not even ChatGPT 4.0 can reliably track the board or tell legal moves from illegal ones.
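Even a very cheap geometric test rejects this move. The sketch below assumes the black king stood on f8 at that point, as the PGN at the end of this article indicates (27... Kxf8). It checks only a necessary condition for a king move (one square in any direction) and ignores castling, checks, and occupied squares. The helper names are hypothetical.

```python
def square_to_coords(square):
    """Convert a square name like 'f8' into zero-based (file, rank)."""
    return ord(square[0]) - ord("a"), int(square[1]) - 1


def is_king_step(src, dst):
    """True when dst is one of the squares directly adjacent to src."""
    (f1, r1), (f2, r2) = square_to_coords(src), square_to_coords(dst)
    return max(abs(f1 - f2), abs(r1 - r2)) == 1


print(is_king_step("f8", "f6"))  # the attempted Kf6: two ranks away
print(is_king_step("f8", "e7"))  # the move actually played, Ke7
```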

ChatGPT answer – Image by Author

The end – CHECKMATE!

After a few more moves, we finished the game with a beautiful checkmate with two queens!

Chess board – Image by Author

ChatGPT 4.0 is definitely better than GPT 3.5: this time we were able to play the game to the end, and with more logical moves. Still, we see several limitations when the LLM is applied to a strategy game like chess, such as:

  • Losing track of moves and positions on the board
  • Several attempts to play illegal moves
  • Changing past moves, including my own moves
  • Hallucinations in some analyses that do not match the reality of the game, plus several outright mistakes
  • Giving up a lot of advantage during the game, which suggests there is no well-defined strategy behind the moves

Finally, we see that LLMs are increasingly capable of complex tasks but are still far from performing above expert level without fine-tuning or integration with other models. A multimodal model that combines the board image with the text could refine its perception of the game and produce more accurate analyses; combined with a good strategy, this could greatly improve the model.

If you've made it this far and want to see more about the match, you can watch the full match on YouTube:

FEN: 6Q1/r1p1k3/1b2Qp1p/8/3pP3/6P1/3N1P1P/R4RK1 b - - 0 33
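If you want to inspect this final position without a chess library, the piece-placement field of a FEN string is easy to decode. The following is a minimal sketch (the function name is my own, and it reads only the first FEN field) that confirms the two white queens and the black king's square.

```python
def parse_fen_board(fen):
    """Map square names like 'e6' to piece letters from a FEN string."""
    placement = fen.split()[0]  # only the piece-placement field is used
    board = {}
    for rank_index, row in enumerate(placement.split("/")):
        rank = 8 - rank_index  # FEN lists rank 8 first
        file_index = 0
        for ch in row:
            if ch.isdigit():
                file_index += int(ch)  # a run of empty squares
            else:
                board["abcdefgh"[file_index] + str(rank)] = ch
                file_index += 1
    return board


fen = "6Q1/r1p1k3/1b2Qp1p/8/3pP3/6P1/3N1P1P/R4RK1 b - - 0 33"
board = parse_fen_board(fen)

# Uppercase letters are White's pieces, lowercase are Black's.
white_queens = sorted(sq for sq, piece in board.items() if piece == "Q")
print(white_queens, board["e7"])
```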

PGN: 1. e4 e5 2. Nf3 Nc6 3. Bc4 Bc5 4. c3 Nf6 5. d3 d6 6. a4 a6 7. Nbd2 h6 8. b4 Bb6 9. Ba3 Be6 10. Qb3 Qd7 11. Bxe6 fxe6 12. O-O O-O 13. b5 Na5 14. Qb2 axb5 15. axb5 Nh5 16. g3 Qf7 17. c4 Qg6 18. c5 dxc5 19. Nxe5 Qf6 20. d4 cxd4 21. Nd7 Rf7 22. Nxf6+ gxf6 23. Bb4 Ng7 24. Qc2 Nc6 25. bxc6 Ra7 26. cxb7 Rf8 27. Bxf8 Kxf8 28. b8=Q+ Ke7 29. Qg8 Kd7 30. Qxg7+ Ke8 31. Qb3 Kd8 32. Qg8+ Ke7 33. Qbxe6#
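To work with this game record in code, the movetext can be split into individual half-moves with a regular expression. This is a rough sketch that only handles plain movetext like the line above (no comments, variations, or result markers). The function name is hypothetical.

```python
import re


def movetext_to_plies(movetext):
    """Strip move numbers such as '12.' and return the SAN half-moves."""
    return re.sub(r"\d+\.", " ", movetext).split()


pgn = (
    "1. e4 e5 2. Nf3 Nc6 3. Bc4 Bc5 4. c3 Nf6 5. d3 d6 6. a4 a6 7. Nbd2 h6 "
    "8. b4 Bb6 9. Ba3 Be6 10. Qb3 Qd7 11. Bxe6 fxe6 12. O-O O-O 13. b5 Na5 "
    "14. Qb2 axb5 15. axb5 Nh5 16. g3 Qf7 17. c4 Qg6 18. c5 dxc5 19. Nxe5 Qf6 "
    "20. d4 cxd4 21. Nd7 Rf7 22. Nxf6+ gxf6 23. Bb4 Ng7 24. Qc2 Nc6 25. bxc6 Ra7 "
    "26. cxb7 Rf8 27. Bxf8 Kxf8 28. b8=Q+ Ke7 29. Qg8 Kd7 30. Qxg7+ Ke8 "
    "31. Qb3 Kd8 32. Qg8+ Ke7 33. Qbxe6#"
)

plies = movetext_to_plies(pgn)
full_moves = (len(plies) + 1) // 2  # White moved last, so round up
print(len(plies), full_moves, plies[-1])
```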

If you have any questions or suggestions, you can contact me via LinkedIn: https://www.linkedin.com/in/octavio-b-santiago/

If you want to know more about my Hybrid AI model, please read this other article:

Hacking Chess with Decision Making Deep Reinforcement Learning

Tags: Artificial Intelligence ChatGPT Chess Data Science Large Language Models
