Tokenization is a foundational step in natural language processing (NLP) and machine learning.
Large language models are, at their core, big statistical calculators that work with numbers, not words. Tokenization converts text into those numbers: the text is split into tokens (whole words, subwords, or characters), and each token is replaced by its index in a fixed vocabulary of all the tokens the model knows.
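To make the idea concrete, here is a minimal word-level sketch in Python. It builds a toy vocabulary from a small corpus and maps each word to its integer index; real LLM tokenizers use subword algorithms such as BPE or WordPiece, but the principle is the same. The corpus and function names are illustrative, not from any particular library.

```python
# Toy word-level tokenizer: assign each distinct word an integer ID,
# then convert text into the list of those IDs.

corpus = ["the cat sat on the mat", "the dog sat on the rug"]

# Build the vocabulary: each new word gets the next available index.
vocab = {}
for sentence in corpus:
    for word in sentence.split():
        if word not in vocab:
            vocab[word] = len(vocab)

def tokenize(text: str) -> list[int]:
    """Convert a string into the vocabulary indices of its words."""
    return [vocab[word] for word in text.split()]

print(vocab)                               # {'the': 0, 'cat': 1, 'sat': 2, ...}
print(tokenize("the cat sat on the rug"))  # [0, 1, 2, 3, 0, ...]
```

In practice, subword tokenizers avoid the "unknown word" problem this sketch has: any word not seen during vocabulary construction would raise a KeyError here, whereas a subword scheme can always fall back to smaller pieces.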