গড়েইপা লৌশিং (AI)

Contribute to Bishnupriya Manipuri Language Development

Building open-source AI for 500,000+ Bishnupriya Manipuri (BPY) speakers worldwide

সাম্ভাষা! | Thank you for helping build AI for Bishnupriya Manipuri.

🎯 How You Can Help

  1. Training Data - Submit English↔BPY sentence pairs (BPY speakers | 5 min - 5 hrs)
  2. Code & Models - Training scripts, evaluation, apps (Developers | 1 hr+)
  3. Testing & Feedback - Report translation errors (Anyone | 5 min)

1. Contributing Training Data

This is the most important contribution: more data means better models. We accept data in three ways:

Option A: Quick Single Sentences [5 minutes]

Submit 1-10 sentence pairs via email: usangraha@gmail.com

Subject: BPY Data Submission

English: The sun is hot.
BPY: বেলীগ তপ্তা ইসে।

English: Fifty books.
BPY: য়াংখেইহান লেরিক।

Rules:
  1. Natural BPY - Use everyday language, not literal word-for-word translation
  2. Correct script - Use Bengali script. Example: য়াংখেইহান not yang-khei-han
  3. Complete sentences - Not fragments. Include punctuation ।
  4. No code-mixing - Avoid Hindi/English words unless commonly used
  5. Check with V8.5.3 first - Test your sentence at manipuri.com/articles/bpy.php. If the model already translates it correctly, skip it; we need the sentences it gets wrong.

What we need most:
  • Number + noun: Fifty books, Three children, One hundred rupees
  • Family terms: My father works, Her sister is beautiful
  • Daily verbs: I eat rice, She goes to school, We are coming
  • Questions: What is your name?, Where do you live?
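Rules 2 and 3 can be checked mechanically before you hit send. The sketch below is one possible pre-submission check, not part of the project's tooling. The name `check_pair` and the accepted sentence endings (`।`, `?`, `!`) are assumptions made for illustration.

```python
import re

# Bengali Unicode block (U+0980-U+09FF), the script BPY is written in.
BENGALI = re.compile(r"[\u0980-\u09FF]")
LATIN = re.compile(r"[A-Za-z]")
# Assumed sentence endings: danda for statements, ? and ! otherwise.
ENDINGS = ("।", "?", "!")


def check_pair(english: str, bpy: str) -> list[str]:
    """Return the problems found in one English/BPY pair (empty list if none)."""
    problems = []
    english, bpy = english.strip(), bpy.strip()
    if not english:
        problems.append("English side is empty")
    if not BENGALI.search(bpy):
        problems.append("BPY side has no Bengali-script characters")
    if LATIN.search(bpy):
        problems.append("BPY side contains Latin letters (romanized or code-mixed?)")
    if not bpy.endswith(ENDINGS):
        problems.append("BPY side does not end with ।, ? or !")
    return problems
```

A pair that passes still needs a human read: the script cannot tell whether the BPY is natural (rule 1) or whether a borrowed word is commonly used (rule 4).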

Option B: Bulk CSV Upload [1 hour+]

For 50+ pairs, create a CSV file and email it to usangraha@gmail.com

CSV Format: bpy_training_data.csv

english,bpy_beng,source,notes
My name is John.,মর নাঙহান জন।,contributor_arunita,
The cat is sleeping.,মেকুরগ ঘুমজার।,wikipedia,
Fifty books.,য়াংখেইহান লেরিক।,book_scan_v1,number pattern

Quality checklist:
  • All BPY uses Bengali Unicode, not Romanized
  • No duplicate English sentences
  • No offensive/political/religious content without context
  • You are the author OR text is public domain/CC0
  • Each pair is a full sentence, 5-200 characters
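The mechanical parts of this checklist can be run over a file before emailing it. The following is a minimal sketch, not an official validator. It assumes the header matches the format shown above exactly, that the 5-200 character limit applies to each side of a pair, and that duplicates are compared case-insensitively. `validate_csv` is a hypothetical name.

```python
import csv
import io
import re

EXPECTED_HEADER = ["english", "bpy_beng", "source", "notes"]
BENGALI = re.compile(r"[\u0980-\u09FF]")
LATIN = re.compile(r"[A-Za-z]")


def validate_csv(text: str) -> list[str]:
    """Check CSV text against the checklist; return human-readable problems."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_HEADER:
        return [f"header should be {','.join(EXPECTED_HEADER)}"]
    problems, seen = [], set()
    for row_no, row in enumerate(reader, start=2):  # row 1 is the header
        eng = (row["english"] or "").strip()
        bpy = (row["bpy_beng"] or "").strip()
        key = eng.casefold()
        if key in seen:
            problems.append(f"row {row_no}: duplicate English sentence")
        seen.add(key)
        if not BENGALI.search(bpy) or LATIN.search(bpy):
            problems.append(f"row {row_no}: BPY must be Bengali Unicode, not romanized")
        for name, value in (("english", eng), ("bpy_beng", bpy)):
            if not 5 <= len(value) <= 200:
                problems.append(f"row {row_no}: {name} length {len(value)} outside 5-200")
    return problems
```

To check a file, read it as UTF-8 and pass the contents in, for example `validate_csv(open("bpy_training_data.csv", encoding="utf-8").read())`. Licensing and content rules still need a manual review.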

Option C: Scanned Books [Advanced]

Have BPY books? Help us digitize them.

  1. Check copyright - Only scan public-domain books, or books you hold the rights or permission to share
  2. Scan at 300 DPI - PDF or images
  3. OCR - Upload the scan to Google Drive, then open it with Google Docs to convert it to text
  4. Clean - Remove page numbers, fix OCR errors
  5. Submit - Email ZIP to usangraha@gmail.com with book title + author
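Part of step 4 can be automated. The sketch below assumes page numbers sit on lines of their own, written in ASCII or Bengali digits and optionally wrapped in hyphens. That layout is an assumption about typical OCR output, and `clean_ocr_text` is a hypothetical helper. Fixing misrecognized characters still has to be done by a reader who knows BPY.

```python
import re

# A line holding only a page number: ASCII or Bengali digits (০-৯),
# optionally wrapped in hyphens or spaces, e.g. "12" or "- ১২ -".
PAGE_NUMBER = re.compile(r"^[\s\-]*[0-9\u09E6-\u09EF]+[\s\-]*$")


def clean_ocr_text(text: str) -> str:
    """Drop page-number lines and collapse runs of blank lines."""
    cleaned = []
    for line in text.splitlines():
        line = line.strip()
        if PAGE_NUMBER.match(line):
            continue
        # Keep at most one blank line in a row.
        if not line and (not cleaned or not cleaned[-1]):
            continue
        cleaned.append(line)
    return "\n".join(cleaned).strip()
```

Note that a line of body text consisting only of a number would also be removed, so compare the result against the scan before submitting.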

2. Contributing Code & Models

Base: facebook/nllb-200-distilled-600M | Method: LoRA fine-tuning

Contributions we'd welcome: a BPY→English model, larger base models, multilingual training, and evaluation scripts
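Evaluation scripts are on the wish list above. As one possible starting point, the sketch below scores model outputs against reference translations using exact match and a character-level similarity from Python's `difflib`. Both metrics and the names `normalize` and `evaluate` are assumptions chosen for illustration, not the project's evaluation method; established MT metrics such as chrF or BLEU would need an extra library.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Unify Unicode forms, collapse whitespace, and drop a trailing danda."""
    # NFC gives letters such as য় a single encoding (U+09DF becomes U+09AF U+09BC).
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.split()).rstrip("।").strip()


def evaluate(outputs: list[str], references: list[str]) -> dict[str, float]:
    """Return exact-match rate and mean character similarity (both 0.0-1.0)."""
    if len(outputs) != len(references) or not outputs:
        raise ValueError("need two non-empty lists of equal length")
    exact = 0
    similarity = 0.0
    for out, ref in zip(outputs, references):
        out, ref = normalize(out), normalize(ref)
        exact += out == ref
        similarity += SequenceMatcher(None, out, ref).ratio()
    n = len(outputs)
    return {"exact_match": exact / n, "char_similarity": similarity / n}
```

The Unicode normalization step matters for BPY because the same visible letter can arrive in two encodings depending on the keyboard, which would otherwise count as a mismatch.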

Setup:

git lfs install  # model weights on Hugging Face are stored with Git LFS
git clone https://huggingface.co/BishnupriyaManipuri/nllb-bpy-beng-v8-5-3-merged
cd nllb-bpy-beng-v8-5-3-merged
pip install transformers peft datasets accelerate
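Once the packages are installed, a quick smoke test is to load the merged model and translate one sentence. This is a sketch using the standard `transformers` seq2seq calls and assumes PyTorch is available. `eng_Latn` is NLLB's code for English, but the target-language token this fine-tuned model expects is not stated here, so `tgt_lang` is left as a required argument; check the model card for the right value.

```python
MODEL_ID = "BishnupriyaManipuri/nllb-bpy-beng-v8-5-3-merged"


def translate(text: str, tgt_lang: str, src_lang: str = "eng_Latn",
              model_path: str = MODEL_ID, max_new_tokens: int = 128) -> str:
    """Translate one sentence with the merged model via transformers."""
    # Imported here so this file can be imported without loading PyTorch.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path, src_lang=src_lang)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_path)
    inputs = tokenizer(text, return_tensors="pt")
    # NLLB models pick the output language from the first generated token.
    output_ids = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
```

From inside the cloned directory, pass `model_path="."` to use the local files instead of downloading again. The function reloads the model on every call, which is fine for a single test but too slow for batch work.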

HF Inference Endpoint:

https://hcurzfqqhq3x21kg.us-east-1.aws.endpoints.huggingface.cloud
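The endpoint can be called over plain HTTPS. The sketch below assumes it follows the usual Hugging Face Inference Endpoints convention of a JSON body with an `inputs` field; whether it needs an access token, and what the response looks like, are not documented here, so the token is optional and the response is returned unparsed. `build_request` and `translate_remote` are hypothetical names.

```python
import json
import urllib.request
from typing import Optional

ENDPOINT_URL = "https://hcurzfqqhq3x21kg.us-east-1.aws.endpoints.huggingface.cloud"


def build_request(text: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a POST request carrying {"inputs": text} as JSON."""
    headers = {"Content-Type": "application/json"}
    if token:  # only needed if the endpoint turns out to be protected
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"inputs": text}).encode("utf-8")
    return urllib.request.Request(ENDPOINT_URL, data=body, headers=headers, method="POST")


def translate_remote(text: str, token: Optional[str] = None, timeout: float = 30.0):
    """Send the request and return the decoded JSON response as-is."""
    with urllib.request.urlopen(build_request(text, token), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Print the result of `translate_remote("Fifty books.")` once to see the actual response shape before writing code that depends on it.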

3. Testing & Reporting Errors

Found a bad translation? Email usangraha@gmail.com

Subject: BPY Translation Bug

Input: Fifty books
Output: লেরিকহান লেরিকহান ❌
Expected: য়াংখেইহান লেরিক ✅
Version: V8.5.3

📜 Data License & Rights

By contributing, you agree:

  1. Your submissions are released as CC0 / Public Domain - free for any use, including commercial
  2. You own the rights - Don't submit copyrighted text without permission
  3. No PII - No personal names, addresses, phone numbers in examples
  4. Attribution - We'll list contributors in CONTRIBUTORS.md unless you opt out

We will NOT sell your data, use it for non-BPY purposes, or share your email without permission.