Update README.md
README.md
CHANGED
@@ -14,7 +14,7 @@ library_name: sentence-transformers
 
 # miniChembed-prototype
 
-This is
+This is an experimental **self-supervised molecular embedding** model trained using the **Barlow Twins** objective on approximately **24K unlabeled SMILES strings**. If validated as effective, it will be scaled to 2.1M molecules. The training data were compiled from public sources including:
 
 - **ChEMBL34** (Zdrazil et al., 2023)
 - **COCONUTDB** (Sorokina et al., 2021)
@@ -25,6 +25,9 @@ The model maps SMILES strings to a **320-dimensional dense vector space**, optim
 Unlike fixed fingerprints (e.g., ECFP4), this model learns representations directly from **stochastic SMILES augmentations**, encouraging invariance to syntactic variation while potentially maximizing representational diversity across molecules.
 The Barlow Twins objective explicitly minimizes redundancy between embedding dimensions, promoting structured, non-collapsed representations.
 
+> Note: This is an experimental prototype.
+> Feel free to experiment with and edit the training script as you wish!
+> Correcting my mistake(s), tweaking augmentations, loss weights, optimizer settings, or network architecture could lead to even better representations.
 ---
 
 ## Model Details
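
The README text in this diff says the Barlow Twins objective minimizes redundancy between embedding dimensions. As a rough illustration only, and not the repository's training script, a minimal pure-Python sketch of that loss could look like the following. The function names `standardize` and `barlow_twins_loss` and the weight `lambda_offdiag=5e-3` are assumptions chosen for the example, not values taken from this model.

```python
import math


def standardize(batch, eps=1e-12):
    """Scale each embedding dimension to zero mean and unit variance across the batch."""
    n, d = len(batch), len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(d)]
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in batch) / n) + eps
        for j in range(d)
    ]
    return [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in batch]


def barlow_twins_loss(z_a, z_b, lambda_offdiag=5e-3):
    """Barlow Twins loss for two batches of embeddings of the same inputs.

    z_a, z_b: lists of N embedding vectors (each of length D), one per augmented view,
              e.g. two randomized SMILES strings of the same molecule.
    lambda_offdiag: weight of the redundancy-reduction term (illustrative assumption).
    """
    n, d = len(z_a), len(z_a[0])
    a, b = standardize(z_a), standardize(z_b)
    # Cross-correlation between dimension i of view A and dimension j of view B.
    c = [
        [sum(a[k][i] * b[k][j] for k in range(n)) / n for j in range(d)]
        for i in range(d)
    ]
    # Invariance term: diagonal entries should be 1 (both views agree per dimension).
    on_diag = sum((1.0 - c[i][i]) ** 2 for i in range(d))
    # Redundancy term: off-diagonal entries should be 0 (dimensions are decorrelated).
    off_diag = sum(c[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
    return on_diag + lambda_offdiag * off_diag


# Identical views with decorrelated dimensions give a loss near zero.
decorrelated = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
loss_decorrelated = barlow_twins_loss(decorrelated, decorrelated)

# Identical views whose two dimensions carry the same signal are penalized.
redundant = [[1.0, 1.0], [-1.0, -1.0], [2.0, 2.0], [-2.0, -2.0]]
loss_redundant = barlow_twins_loss(redundant, redundant)
```

In the second case both dimensions are perfectly correlated, so only the off-diagonal term contributes. That is the "redundancy" the objective pushes the encoder to remove.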