Intermediate
Artificial Intelligence & Machine Learning
Q77 / 100
What is tokenization in the context of NLP and language models?
Correct! Well done.
Incorrect.
The correct answer is B) The process of splitting raw text into smaller units (words, subwords, or characters) that a model can map to numerical IDs and process
B
Correct Answer
The process of splitting raw text into smaller units (words, subwords, or characters) that a model can map to numerical IDs and process
Explanation
Tokenization converts text into discrete units — often subword pieces via algorithms like Byte-Pair Encoding (BPE) or WordPiece — which are then mapped to integer IDs and embedded. Subword tokenization balances vocabulary size with the ability to represent rare or unseen words.
Progress
77/100