Why O is NOT needed for Zomi syllabification

1. What “O” means in BIO tagging

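In standard BIO tagging, B marks the first token of a unit, I marks a token inside a unit, and O marks a token outside every unit.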

The O tag is only meaningful when some tokens are not part of any unit of interest (e.g., non-entity words in NER).

In Zomi syllabification, every character of a word belongs to exactly one syllable. Therefore:

There is no valid position where a character should be tagged as O.
Introducing O would create linguistically impossible patterns (e.g., a character “outside” any syllable).

For Zomi syllable segmentation, the correct tagset is B (the first character of a syllable) and I (every other character of that syllable).

Example: itna → ["i","t","n","a"] with tags:

No O is needed or meaningful here.
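A minimal sketch of how B and I tags can be derived once a word's syllables are known. The function name `syllables_to_tags` is hypothetical, and the split of itna into "it" + "na" is assumed here purely for illustration.

```python
def syllables_to_tags(syllables):
    """Return (characters, tags) for a word given as a list of syllables."""
    chars, tags = [], []
    for syllable in syllables:
        for position, ch in enumerate(syllable):
            chars.append(ch)
            # The first character of a syllable is B, every later one is I.
            tags.append("B" if position == 0 else "I")
    return chars, tags


# Assumed split, for illustration only: itna -> it + na
chars, tags = syllables_to_tags(["it", "na"])
print(list(zip(chars, tags)))
```

Because every character comes from some syllable, the loop never has a reason to emit any tag other than B or I.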

Hyphens in Zomi orthography (e.g., ki-itna) are not passed to the tagger as characters, so the CRF sees only real characters, not hyphens.
There is still no position where an O tag would be appropriate.
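One way this preprocessing might look, assuming hyphens are simply dropped before the character sequence reaches the CRF. The helper `to_characters` is a hypothetical name, not part of any library.

```python
def to_characters(word):
    """Drop orthographic hyphens and return the remaining characters."""
    return [ch for ch in word if ch != "-"]


chars = to_characters("ki-itna")
print(chars)
```

Every element of the resulting list is a real character, so each one receives either B or I.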

The O tag is essential in tasks where some tokens are outside labeled units, such as named entity recognition (NER) and similar span-labeling tasks.

In these tasks, many tokens are not part of any labeled span, so O is required.

In Zomi syllabification, this situation never arises.

BIO with B and I is the linguistically and technically correct choice for Zomi syllable segmentation.
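As a rough check that B and I alone carry the full segmentation, the sketch below rebuilds syllables from a tagged character sequence and rejects any other tag. The name `tags_to_syllables` is hypothetical, and the tag sequence in the usage line assumes the split it + na for illustration.

```python
def tags_to_syllables(chars, tags):
    """Rebuild syllables from characters tagged with B or I."""
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    syllables = []
    for ch, tag in zip(chars, tags):
        if tag not in ("B", "I"):
            # Any other tag (including O) has no meaning in this scheme.
            raise ValueError(f"unexpected tag: {tag!r}")
        if tag == "B" or not syllables:
            # Start a new syllable; a leading I is leniently treated as B.
            syllables.append(ch)
        else:
            syllables[-1] += ch
    return syllables


# Assumed tagging, for illustration only
print(tags_to_syllables(["i", "t", "n", "a"], ["B", "I", "B", "I"]))
```

Treating a leading I as a syllable start is a design choice made here for robustness; a stricter decoder could raise an error instead.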