Bilingual Children Speech Classifiers
This repository contains three trained classifiers that predict a bilingual child's first language (L1) from English child speech samples.
The three classifiers are:
linear_svm
logistic_regression
random_forest
Each classifier expects dense sentence embeddings produced with:
sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
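Because the classifiers only accept embeddings from this model, a quick width check before calling `predict` can catch a mismatched embedding model early. The sketch below is illustrative and not part of the repository: `check_embedding_width` and `EXPECTED_DIM` are hypothetical names, the value 384 is the output size of paraphrase-multilingual-MiniLM-L12-v2, and the check assumes the loaded scikit-learn estimator exposes `n_features_in_` (recent scikit-learn versions set it at fit time).

```python
# Output width of paraphrase-multilingual-MiniLM-L12-v2 embeddings.
EXPECTED_DIM = 384


def check_embedding_width(embeddings, classifier):
    """Return the embedding width, or raise ValueError on a mismatch.

    `embeddings` is a 2-D array or list of vectors (one row per text).
    `classifier` is a fitted estimator; if it has no `n_features_in_`
    attribute, the check falls back to EXPECTED_DIM (an assumption).
    """
    if len(embeddings) == 0:
        raise ValueError("No embeddings were provided.")
    width = len(embeddings[0])
    expected = getattr(classifier, "n_features_in_", EXPECTED_DIM)
    if width != expected:
        raise ValueError(
            f"Embedding width {width} does not match the expected {expected}; "
            "was the text embedded with a different model?"
        )
    return width
```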
Files
linear_svm.joblib: trained linear SVM classifier.
logistic_regression.joblib: trained logistic regression classifier.
random_forest.joblib: trained scikit-learn Random Forest classifier.
label_encoder.joblib: fitted label encoder used to map numeric model classes back to readable L1 labels.
run_metadata.json: metadata for trained classifiers.
model_comparison.csv: accuracy and macro-F1 comparison between models.
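The two non-model files can be inspected with the Python standard library alone. The following is a minimal sketch that makes no assumption about the JSON keys or the CSV column names, since neither is documented here; the helper names `load_run_metadata` and `load_model_comparison` are hypothetical.

```python
import csv
import json
from pathlib import Path


def load_run_metadata(path):
    """Return the parsed content of a JSON file such as run_metadata.json."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def load_model_comparison(path):
    """Return the rows of a CSV file such as model_comparison.csv.

    Each row is a dict keyed by whatever header the file contains;
    all values are returned as strings, exactly as csv.DictReader reads them.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

With the files downloaded next to the script, `load_run_metadata("run_metadata.json")` and `load_model_comparison("model_comparison.csv")` return plain Python objects that can be printed or passed to other tools.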
Intended Use
These classifiers were created for an academic text classification assignment using cleaned CHILDES/CHAT bilingual child speech data and CodeX. They are designed to be used with the companion Hugging Face Space or with a local inference script that first embeds text using the Sentence Transformer model above.
Minimal Usage
The example below uses the Random Forest classifier; to use another classifier, load linear_svm.joblib or logistic_regression.joblib instead.
import joblib
from sentence_transformers import SentenceTransformer
embedding_model = SentenceTransformer(
"sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
)
classifier = joblib.load("random_forest.joblib")
label_encoder = joblib.load("label_encoder.joblib")
text = "I want to play with the toys and then go outside"
embedding = embedding_model.encode([text], convert_to_numpy=True)
encoded_prediction = classifier.predict(embedding)
prediction = label_encoder.inverse_transform(encoded_prediction)
print(prediction[0])
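The same steps work for any of the three classifiers, so they can be wrapped in small helpers. This is a sketch rather than the repository's own inference script: `CLASSIFIER_FILES`, `classifier_path`, and `predict_l1` are hypothetical names, and the file names are taken from the Files section.

```python
from pathlib import Path

# Classifier names mapped to the file names listed under Files.
CLASSIFIER_FILES = {
    "linear_svm": "linear_svm.joblib",
    "logistic_regression": "logistic_regression.joblib",
    "random_forest": "random_forest.joblib",
}


def classifier_path(name, model_dir="."):
    """Return the path of the .joblib file for the given classifier name."""
    if name not in CLASSIFIER_FILES:
        raise ValueError(
            f"Unknown classifier {name!r}; choose from {sorted(CLASSIFIER_FILES)}"
        )
    return Path(model_dir) / CLASSIFIER_FILES[name]


def predict_l1(texts, embedding_model, classifier, label_encoder):
    """Embed the texts, classify them, and return readable L1 labels.

    The three objects are passed in already loaded, so this function only
    relies on their encode / predict / inverse_transform methods.
    """
    embeddings = embedding_model.encode(list(texts), convert_to_numpy=True)
    encoded = classifier.predict(embeddings)
    return list(label_encoder.inverse_transform(encoded))
```

To switch models, load a different file with `joblib.load(classifier_path("logistic_regression"))` and pass the result to `predict_l1([text], embedding_model, classifier, label_encoder)`. Passing several texts at once embeds them in a single `encode` call.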
Limitations
These classifiers were trained on a small academic dataset and should not be interpreted as general-purpose or diagnostic language-background detectors. Predictions are best understood as an exploratory machine-learning result within the scope of the training data and preprocessing pipeline.