About Zomi‑Syl
Zomi‑Syl is a modular, dialect‑aware syllabification engine designed for the Zomi language and its closely related varieties.
It provides a unified, linguistically grounded framework for syllable segmentation across dialects, orthographies, and backend architectures.
Zomi‑Syl is built for:
- linguists
- NLP researchers
- educators
- community contributors
- developers integrating Zomi processing into applications
It is open‑source, reproducible, and designed for long‑term maintainability.
🎯 Core Goals
Zomi‑Syl was created to solve several long‑standing challenges in Zomi language technology:
- inconsistent syllabification across dialects
- lack of standardized tooling
- absence of reproducible, testable backends
- difficulty integrating rule‑based and ML‑based approaches
- need for a single, authoritative pipeline for research and production
The project aims to provide:
- dialect‑aware syllabification
- multiple interchangeable backends
- transparent evaluation and benchmarking
- a clean Python API and CLI
- a unified metadata schema
- a single source of truth for profiles and rules
🧩 Architecture Overview
Zomi‑Syl is built on a modular pipeline:
- Profiles — dialect‑specific phonotactic inventories
- Backends — rule‑based, CRF, transformer, and future models
- Registry — unified backend and profile discovery
- Evaluation — benchmarking, metrics, reports
- CLI — user‑friendly command‑line interface
- Python API — programmatic access
This design ensures:
- reproducibility
- extensibility
- dialect flexibility
- backend neutrality
- clean separation of concerns
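The registry idea can be illustrated with a minimal sketch. This is not the Zomi‑Syl API: the names `Backend`, `register_backend`, `get_backend`, `available_backends`, and `WholeWordBackend` are hypothetical, chosen only to show how backends could be discovered by name while staying interchangeable.

```python
from typing import Callable, Dict, List, Protocol


class Backend(Protocol):
    """Hypothetical interface: a backend maps one word to a list of syllables."""

    def syllabify(self, word: str) -> List[str]: ...


# Hypothetical registry: lookup name -> factory that builds a backend.
_BACKENDS: Dict[str, Callable[[], Backend]] = {}


def register_backend(name: str):
    """Class decorator that records a backend under a lookup name."""
    def decorator(cls):
        _BACKENDS[name] = cls
        return cls
    return decorator


def available_backends() -> List[str]:
    """Names of all registered backends, sorted for stable output."""
    return sorted(_BACKENDS)


def get_backend(name: str) -> Backend:
    """Instantiate a registered backend, failing loudly on unknown names."""
    if name not in _BACKENDS:
        raise KeyError(f"unknown backend {name!r}; available: {available_backends()}")
    return _BACKENDS[name]()


@register_backend("whole-word")
class WholeWordBackend:
    """Placeholder backend (assumption): treats every word as one syllable."""

    def syllabify(self, word: str) -> List[str]:
        return [word]
```

Because callers depend only on the `syllabify` method, a rule‑based, CRF, or transformer backend could be swapped in under a different name without changing calling code.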
🌏 Dialect‑Aware by Design
Zomi‑Syl supports multiple Zomi varieties, including:
| Language / Group | ISO 639‑3 code | Glottolog code |
|---|---|---|
| Gangte | gnb | gang1266 |
| Kom Rem | kmm | komr1235 |
| Mate | | |
| Paite | pck | pait1246 |
| Simte | smt | simt1235 |
| Siyin | csy | siyi1235 |
| Tedim | ctd | tedi1235 |
| Thado / Thadou | tcz | thad1235 |
| Thangkhal | | |
| Vaiphei | vap | vaip1236 |
| Zo / Zou | zom | zouu1234 |
| India Zomi | | |
| Myanmar Zomi | | |
| Zolai Standard | | |
Each dialect has its own:
- onset inventory
- nucleus inventory
- coda inventory
- tone behavior (if applicable)
- orthographic conventions
- rule set
See Dialect Profiles for details.
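To make the idea of per‑dialect inventories concrete, here is a small sketch of how onset, nucleus, and coda inventories could drive a rule‑based segmentation. Everything in it is an assumption for illustration: the `Profile` class, the `syllabify` function, and above all the `TOY` inventory, which is a placeholder and not the phonotactics of any real Zomi variety or the project's actual profile format. Tone and orthographic conventions are left out.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Profile:
    """Hypothetical profile: three inventories of orthographic strings."""
    name: str
    onsets: FrozenSet[str]
    nuclei: FrozenSet[str]
    codas: FrozenSet[str]


def syllabify(word: str, profile: Profile) -> Optional[List[str]]:
    """Split word into (onset)? nucleus (coda)? syllables, or return None.

    Longer onsets and shorter codas are tried first, so a consonant between
    two vowels is attached to the following syllable when that is possible.
    """
    onsets = sorted(profile.onsets | {""}, key=len, reverse=True)
    nuclei = sorted(profile.nuclei, key=len, reverse=True)
    codas = sorted(profile.codas | {""}, key=len)

    @lru_cache(maxsize=None)
    def parse(i: int) -> Optional[Tuple[str, ...]]:
        if i == len(word):
            return ()
        for onset in onsets:
            if not word.startswith(onset, i):
                continue
            j = i + len(onset)
            for nucleus in nuclei:
                if not word.startswith(nucleus, j):
                    continue
                k = j + len(nucleus)
                for coda in codas:
                    if not word.startswith(coda, k):
                        continue
                    end = k + len(coda)
                    rest = parse(end)  # backtrack if the remainder cannot be parsed
                    if rest is not None:
                        return (word[i:end],) + rest
        return None

    result = parse(0)
    return None if result is None else list(result)


# Placeholder inventory for demonstration only (NOT real dialect data).
TOY = Profile(
    name="toy",
    onsets=frozenset({"z", "m", "t", "d", "k", "p", "l", "kh"}),
    nuclei=frozenset({"a", "e", "i", "o", "u"}),
    codas=frozenset({"m", "n", "ng", "k", "t", "p", "l"}),
)

print(syllabify("zomi", TOY))   # ['zo', 'mi']
print(syllabify("tedim", TOY))  # ['te', 'dim']
```

With this toy inventory, a string that cannot be covered by the three inventories (for example `"zz"`) yields `None` rather than a guessed segmentation, which makes profile gaps easy to spot.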
🛠 Backends
Zomi‑Syl supports multiple interchangeable backends:
- Rule‑based
- CRF (Conditional Random Field)
- Transformer‑based (experimental)
- Reverse rule‑based (diagnostic)
Each backend is:
- versioned
- benchmarked
- validated
- documented
See Backends for implementation details.
📊 Evaluation & Benchmarking
Zomi‑Syl includes a full evaluation suite:
- accuracy
- boundary F1
- confusion matrices
- confidence scoring
- regression testing
- dataset validation
See Benchmarking Guide.
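As a rough illustration of two of these metrics, the sketch below computes word‑level accuracy and a micro‑averaged boundary F1 from gold and predicted segmentations. The function names and the exact definitions are assumptions; the evaluation suite may define or average these metrics differently.

```python
from typing import List, Sequence, Set, Tuple

Pair = Tuple[List[str], List[str]]  # (gold syllables, predicted syllables)


def boundaries(syllables: List[str]) -> Set[int]:
    """Character offsets of internal boundaries, e.g. ['te', 'dim'] -> {2}."""
    offsets: Set[int] = set()
    position = 0
    for syllable in syllables[:-1]:
        position += len(syllable)
        offsets.add(position)
    return offsets


def boundary_f1(pairs: Sequence[Pair]) -> float:
    """Micro-averaged F1 over internal syllable boundaries."""
    tp = fp = fn = 0
    for gold, predicted in pairs:
        g, p = boundaries(gold), boundaries(predicted)
        tp += len(g & p)   # boundaries found in both
        fp += len(p - g)   # predicted but not in gold
        fn += len(g - p)   # in gold but missed
    if tp + fp + fn == 0:
        return 1.0  # no boundaries anywhere: treated as perfect agreement
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def word_accuracy(pairs: Sequence[Pair]) -> float:
    """Fraction of words whose full segmentation matches the gold standard."""
    if not pairs:
        return 0.0
    return sum(gold == predicted for gold, predicted in pairs) / len(pairs)


sample = [
    (["zo", "mi"], ["zo", "mi"]),    # correct
    (["te", "dim"], ["ted", "im"]),  # boundary placed one character late
]
print(boundary_f1(sample))    # 0.5
print(word_accuracy(sample))  # 0.5
```

Boundary F1 gives partial credit when only some boundaries in a word are wrong, whereas word accuracy counts the whole word as a miss, so the two are usually reported together.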
📦 Installation & Usage
- Install: Installation
- Try the CLI: CLI Reference
- Try the API: Quick Start
🤝 Community & Contributions
Zomi‑Syl is a community‑driven project.
Contributions are welcome from:
- linguists
- developers
- educators
- native speakers
- researchers
See:
- Linguistic Contributors Guide
- Developer Guide
- Infra Maintainers Guide
📜 License
Zomi‑Syl is released under the MIT License, allowing:
- academic use
- commercial use
- modification
- redistribution
Next Steps
- Explore dialects: Dialect Profiles
- Learn the CLI: CLI Reference
- Add a backend: Adding New Backends
- Benchmark models: Benchmarking Guide