gbyuvd/miniChembed-prototype

Tags: Sentence Similarity, sentence-transformers, molecular-similarity, cheminformatics, feature-extraction, text-embeddings-inference

gbyuvd committed on Oct 27

Commit 6cbfeb0 · verified · 1 Parent(s): b463a40

Update README.md

Files changed (1)

README.md +4 -1

@@ -14,7 +14,7 @@ library_name: sentence-transformers
 # miniChembed-prototype
-This is a **self-supervised molecular embedding** model trained using the **Barlow Twins** objective on approximately **24K unlabeled SMILES strings**. If validated as effective, it will be scaled to 2.1M molecules. The training data were compiled from public sources including:
+This is an experimental **self-supervised molecular embedding** model trained using the **Barlow Twins** objective on approximately **24K unlabeled SMILES strings**. If validated as effective, it will be scaled to 2.1M molecules. The training data were compiled from public sources including:
 - **ChEMBL34** (Zdrazil et al., 2023)
 - **COCONUTDB** (Sorokina et al., 2021)
@@ -25,6 +25,9 @@ The model maps SMILES strings to a **320-dimensional dense vector space**, optim
 Unlike fixed fingerprints (e.g., ECFP4), this model learns representations directly from **stochastic SMILES augmentations**, encouraging invariance to syntactic variation while potentially maximizing representational diversity across molecules.
 The Barlow Twins objective explicitly minimizes redundancy between embedding dimensions, promoting structured, non-collapsed representations.
+> Note: This is an experimental prototype.
+> Feel free to experiment with and edit the training script as you wish!
+> Correcting my mistake(s), tweaking augmentations, loss weights, optimizer settings, or network architecture could lead to even better representations.
 ---
 ## Model Details
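
For readers who want to tweak the loss weights mentioned in the note, the redundancy-reduction idea behind the Barlow Twins objective can be sketched in a few lines of NumPy. This is a minimal illustration of the general objective, not the training script from this repository: the function name `barlow_twins_loss`, the parameter `lambda_offdiag`, and its default value are assumptions made for the example. The two inputs stand for embeddings of two augmented views of the same batch, for instance two randomized SMILES strings per molecule.

```python
import numpy as np


def barlow_twins_loss(z_a, z_b, lambda_offdiag=5e-3, eps=1e-9):
    """Barlow Twins loss for two views, each of shape (batch, dim).

    lambda_offdiag is a hypothetical default for this sketch; the weight
    used by the actual training script is not stated in the README.
    """
    n = z_a.shape[0]
    # Standardize every embedding dimension across the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    # Cross-correlation matrix between the two views, shape (dim, dim).
    c = z_a.T @ z_b / n
    diag = np.diag(c)
    # Invariance term: matching dimensions should correlate perfectly.
    on_diag = np.sum((diag - 1.0) ** 2)
    # Redundancy term: different dimensions should be decorrelated.
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)
    return on_diag + lambda_offdiag * off_diag


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    view = rng.normal(size=(64, 8))
    unrelated = rng.normal(size=(64, 8))
    print("identical views:", barlow_twins_loss(view, view))
    print("unrelated views:", barlow_twins_loss(view, unrelated))
```

Identical views drive the invariance term toward zero, so the first value printed should be much smaller than the second. Raising `lambda_offdiag` puts more pressure on decorrelating embedding dimensions relative to matching the two views.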