Intermediate Artificial Intelligence & Machine Learning
Q77 / 100

What is tokenization in the context of NLP and language models?

Correct! Well done.

Incorrect.

The correct answer is B) The process of splitting raw text into smaller units (words, subwords, or characters) that a model can map to numerical IDs and process

B

Correct Answer

The process of splitting raw text into smaller units (words, subwords, or characters) that a model can map to numerical IDs and process

Explanation

Tokenization converts text into discrete units — often subword pieces via algorithms like Byte-Pair Encoding (BPE) or WordPiece — which are then mapped to integer IDs and embedded. Subword tokenization balances vocabulary size with the ability to represent rare or unseen words.

Progress
77/100