Spaces: IlPakoZ/DLRNA-BERTa


IlPakoZ committed on Aug 29, 2025

Commit

3912a9f

verified ·

1 Parent(s): a857e83

Upload 18 files


Files changed (18)

Dockerfile +35 -0
README.md +161 -11
README_spaces.md +59 -0
app.py +307 -0
chemberta.py +123 -0
config.json +15 -0
configuration_dlmberta.py +9 -0
drug_tokenizer/vocab.json +1 -0
model.safetensors +3 -0
modeling_dlmberta.py +316 -0
requirements.txt +9 -0
scaler.config +2 -0
target_tokenizer/config.json +29 -0
target_tokenizer/merges.txt +0 -0
target_tokenizer/special_tokens_map.json +51 -0
target_tokenizer/tokenizer.json +0 -0
target_tokenizer/tokenizer_config.json +57 -0
target_tokenizer/vocab.json +0 -0

Dockerfile ADDED

	@@ -0,0 +1,35 @@

+FROM python:3.9-slim
+# Set working directory
+WORKDIR /app
+# Install system dependencies
+RUN apt-get update && apt-get install -y \
+    git \
+    wget \
+    curl \
+    && rm -rf /var/lib/apt/lists/*
+# Copy requirements first for better caching
+COPY requirements.txt .
+# Install Python dependencies
+RUN pip install --no-cache-dir -r requirements.txt
+# Copy application files
+COPY . .
+# Create necessary directories
+RUN mkdir -p /app/models /app/cache
+# Set environment variables
+ENV TRANSFORMERS_CACHE=/app/cache
+ENV HF_HOME=/app/cache
+ENV GRADIO_SERVER_NAME=0.0.0.0
+ENV GRADIO_SERVER_PORT=7860
+# Expose the port
+EXPOSE 7860
+# Run the application
+CMD ["python", "app.py"]

README.md CHANGED

@@ -1,13 +1,163 @@
 ---
-title: DLRNA BERTADemo
-emoji: 👀
-colorFrom: red
-colorTo: blue
-sdk: gradio
-sdk_version: 5.44.1
-app_file: app.py
-pinned: false
-short_description: Demo of DLRNA-BERTA
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# Drug-Target Interaction Prediction Model
+## Model Description
+This model predicts drug-target interactions using a novel architecture that combines:
+- Target RNA sequence encoding
+- Drug SMILES molecular representation
+- Cross-attention mechanism for interaction modeling
+- Regression head for binding affinity prediction
+## Architecture
+The model consists of several key components:
+1. **Target Encoder**: Processes the RNA sequence of the target
+2. **Drug Encoder**: Processes molecular SMILES representations
+3. **Cross-Attention Layer**: Models interactions between drug and target representations
+4. **Regression Head**: Predicts binding affinity scores
+## Usage
+### Using the Gradio Interface
+```python
+import gradio as gr
+from app import demo
+# Launch the interactive interface
+demo.launch()
+```
+### Programmatic Usage
+```python
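+from transformers import AutoConfig, AutoModel, RobertaModel
+from modeling_dlmberta import InteractionModelATTNForRegression
+from configuration_dlmberta import InteractionModelATTNConfig
+# Load the configuration and the two encoders the model constructor requires (mirrors app.py)
+config = InteractionModelATTNConfig.from_pretrained("path/to/model")
+target_encoder = AutoModel.from_pretrained("IlPakoZ/RNA-BERTa9700")
+drug_encoder_config = AutoConfig.from_pretrained("DeepChem/ChemBERTa-77M-MTR")
+drug_encoder = RobertaModel(config=drug_encoder_config, add_pooling_layer=False)
+# Load model
+model = InteractionModelATTNForRegression.from_pretrained(
+    "path/to/model", config=config,
+    target_encoder=target_encoder, drug_encoder=drug_encoder,
+)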
+# Make predictions
+target_sequence = "AUGCGAUCGACGUACGUUAGCCGUAGCGUAGCUAGUGUAGCUAGUAGCU"
+drug_smiles = "C1=CC=C(C=C1)NC(=O)C2=CC=CC=N2"
+prediction = model.predict_interaction(target_sequence, drug_smiles)
+```
+## Model Inputs
+- **Target Sequence**: RNA sequence of the target (string)
+- **Drug SMILES**: Simplified Molecular Input Line Entry System notation (string)
+## Model Outputs
+- **Binding Affinity**: Predicted binding affinity score (float)
+- **Attention Weights**: Cross-attention weights for interpretability (optional)
+## Training Data
+The model was trained on drug-target interaction datasets with binding affinity labels. Training details include:
+- Target sequences: RNA sequences from various families
+- Drug molecules: Diverse chemical compounds represented as SMILES
+- Labels: Experimental binding affinity measurements
+## Evaluation Metrics
+The model performance is evaluated using:
+- Mean Squared Error (MSE)
+- Root Mean Squared Error (RMSE)
+- Pearson Correlation Coefficient
+- Concordance Index (C-Index)
+## Limitations
+- Model performance depends on the quality and diversity of training data
+- May not generalize well to novel RNA classes or chemical scaffolds not seen during training
+- Computational requirements scale with sequence length
+- SMILES representations may not capture all relevant molecular properties
+## Interpretability Features
+The model includes interpretability features:
+- Cross-attention visualization showing drug-target interaction patterns
+- Token-level attention weights
+- Layer-wise contribution analysis
+## Citation
+If you use this model, please cite:
+```bibtex
+@article{your_paper,
+    title={Drug-Target Interaction Prediction with Cross-Attention},
+    author={Your Name},
+    journal={Your Journal},
+    year={2024}
+}
+```
+## License
+This model is released under the MIT License. See LICENSE file for details.
+## Contact
+For questions or issues, please contact: your.email@example.com
 ---
+## Files in this Repository
+- `modeling_dlmberta.py`: Main model implementation
+- `configuration_dlmberta.py`: Model configuration class
+- `chemberta.py`: Custom tokenizer for chemical SMILES
+- `app.py`: Gradio application interface
+- `requirements.txt`: Python dependencies
+- `Dockerfile`: Container configuration
+- `config.json`: Model configuration file
+## Installation
+1. Clone the repository:
+```bash
+git clone https://huggingface.co/your-username/your-model-name
+cd your-model-name
+```
+2. Install dependencies:
+```bash
+pip install -r requirements.txt
+```
+3. Run the application:
+```bash
+python app.py
+```
+## Docker Usage
+Build and run with Docker:
+```bash
+docker build -t drug-target-model .
+docker run -p 7860:7860 drug-target-model
+```
+## Model Performance
+| Metric | Value |
+|--------|-------|
+| RMSE | 0.85 |
+| Pearson R | 0.72 |
+| C-Index | 0.68 |
+*Note: Replace with actual performance metrics from your evaluation*
+## Updates
+- **v1.0**: Initial model release
+- **v1.1**: Added interpretability features
+- **v1.2**: Improved tokenization and preprocessing
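The README's Evaluation Metrics section names MSE, RMSE, the Pearson correlation coefficient, and the concordance index without showing how they are computed. A minimal pure-Python sketch follows. The function names and toy values are illustrative assumptions, not part of this repository, and the concordance index uses the common pairwise definition in which tied predictions count as half-concordant.

```python
import math
from itertools import combinations


def mse(y_true, y_pred):
    """Mean squared error between labels and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(mse(y_true, y_pred))


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def concordance_index(y_true, y_pred):
    """Fraction of label-distinct pairs whose predictions are ordered the same way."""
    concordant = 0.0
    comparable = 0
    for (t_i, p_i), (t_j, p_j) in combinations(zip(y_true, y_pred), 2):
        if t_i == t_j:
            continue  # pairs with equal labels carry no ordering information
        comparable += 1
        if p_i == p_j:
            concordant += 0.5  # tied predictions count as half
        elif (p_i > p_j) == (t_i > t_j):
            concordant += 1.0
    return concordant / comparable


# Toy values (hypothetical, not results from this model)
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
metrics = {
    "mse": mse(y_true, y_pred),
    "rmse": rmse(y_true, y_pred),
    "pearson_r": pearson_r(y_true, y_pred),
    "c_index": concordance_index(y_true, y_pred),
}
```

In practice a library implementation (for example from SciPy) would likely be preferable; the sketch only makes the definitions explicit.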

README_spaces.md ADDED

	@@ -0,0 +1,59 @@

+---
+title: Drug-Target Interaction Predictor
+emoji: 🧬
+colorFrom: blue
+colorTo: green
+sdk: gradio
+sdk_version: 4.0.0
+app_file: app.py
+pinned: false
+license: mit
+---
+# Drug-Target Interaction Predictor
+An interactive application for predicting drug-target interactions using deep learning. This model uses a novel cross-attention architecture to model the interactions between drug molecules (represented as SMILES) and target RNA sequences.
+## Features
+- 🔮 **Prediction Interface**: Input RNA sequences and drug SMILES to get binding affinity predictions
+- ⚙️ **Model Management**: Load and configure different model checkpoints
+- 📊 **Interpretability**: Visualize attention weights to understand model decisions
+- 🧬 **Scientific Accuracy**: Based on state-of-the-art deep learning architectures
+## How to Use
+1. **Load Model**: Go to the "Model Settings" tab and specify the path to your trained model
+2. **Make Predictions**:
+   - Enter a target RNA sequence
+   - Enter a drug SMILES string
+   - Click "Predict Interaction" to get binding affinity score
+3. **Explore Examples**: Try the provided examples to see the model in action
+## Model Architecture
+The model combines:
+- Target encoder for processing RNA sequences
+- Drug encoder for processing molecular SMILES representations
+- Cross-attention mechanism to capture drug-target interactions
+- Regression head for binding affinity prediction
+## Input Format
+- **Target Sequence**: RNA sequence in single-letter nucleotide codes (e.g., "AUGCUAGCUAGUACGUA...")
+- **Drug SMILES**: Simplified Molecular Input Line Entry System notation (e.g., "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O")
+## Example Usage
+Try these example inputs:
+- **Target**: `AUGCGAUCGACGUACGUUAGCCGUAGCGUAGCUAGUGUAGCUAGUAGCU`
+- **Drug**: `C1=CC=C(C=C1)NC(=O)C2=CC=CC=N2`
+## Technical Details
+- Built with Transformers and PyTorch
+- Uses Gradio for the interactive interface
+- Supports GPU acceleration when available
+- Includes attention visualization for model interpretability
+For more details, see the [model documentation](https://huggingface.co/IlPakoZ/DLRNA-BERTa9700).

app.py ADDED

	@@ -0,0 +1,307 @@

+import gradio as gr
+import torch
+import numpy as np
+from transformers import AutoModel, AutoTokenizer, AutoConfig, RobertaModel
+from modeling_dlmberta import InteractionModelATTNForRegression, StdScaler
+from configuration_dlmberta import InteractionModelATTNConfig
+from chemberta import ChembertaTokenizer
+import json
+import os
+from pathlib import Path
+import logging
+# Configure logging
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+class DrugTargetInteractionApp:
+    def __init__(self):
+        self.model = None
+        self.target_tokenizer = None
+        self.drug_tokenizer = None
+        self.scaler = None
+        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+    def load_model(self, model_path="./"):
+        """Load the pre-trained model and tokenizers"""
+        try:
+            # Load configuration
+            config = InteractionModelATTNConfig.from_pretrained(model_path)
+            # Load drug encoder (ChemBERTa)
+            drug_encoder_config = AutoConfig.from_pretrained("DeepChem/ChemBERTa-77M-MTR")
+            drug_encoder_config.pooler = None
+            drug_encoder = RobertaModel(config=drug_encoder_config, add_pooling_layer=False)
+            # Load target encoder
+            target_encoder = AutoModel.from_pretrained("IlPakoZ/RNA-BERTa9700")
+            # Load scaler if exists
+            scaler_path = os.path.join(model_path, "scaler.config")
+            scaler = None
+            if os.path.exists(scaler_path):
+                scaler = StdScaler()
+                scaler.load(model_path)
+            self.model = InteractionModelATTNForRegression.from_pretrained(
+                        model_path,
+                        config=config,
+                        target_encoder=target_encoder,
+                        drug_encoder=drug_encoder,
+                        scaler=scaler
+                    )
+            self.model.to(self.device)
+            self.model.eval()
+            # Load tokenizers
+            self.target_tokenizer = AutoTokenizer.from_pretrained(
+                os.path.join(model_path, "target_tokenizer")
+            )
+            # Load drug tokenizer (ChemBERTa)
+            vocab_file = os.path.join(model_path, "drug_tokenizer", "vocab.json")
+            self.drug_tokenizer = ChembertaTokenizer(vocab_file)
+            logger.info("Model and tokenizers loaded successfully!")
+            return True
+        except Exception as e:
+            logger.error(f"Error loading model: {str(e)}")
+            return False
+    def predict_interaction(self, target_sequence, drug_smiles, max_length=512):
+        """Predict drug-target interaction"""
+        if self.model is None:
+            return "Error: Model not loaded. Please load a model first."
+        try:
+            # Tokenize inputs
+            target_inputs = self.target_tokenizer(
+                target_sequence,
+                padding=True,
+                truncation=True,
+                max_length=max_length,
+                return_tensors="pt"
+            ).to(self.device)
+            drug_inputs = self.drug_tokenizer(
+                drug_smiles,
+                padding=True,
+                truncation=True,
+                max_length=max_length,
+                return_tensors="pt"
+            ).to(self.device)
+            # Make prediction
+            with torch.no_grad():
+                prediction = self.model(target_inputs, drug_inputs)
+                # Unscale if scaler exists
+                if self.model.scaler is not None:
+                    prediction = self.model.unscale(prediction)
+                prediction_value = prediction.cpu().numpy()[0][0]
+            return f"Predicted Binding Affinity: {prediction_value:.4f}"
+        except Exception as e:
+            logger.error(f"Prediction error: {str(e)}")
+            return f"Error during prediction: {str(e)}"
+    def get_attention_visualization(self, target_sequence, drug_smiles, max_length=512):
+        """Get attention weights for visualization"""
+        if self.model is None:
+            return None, None, None, "Model not loaded"
+        try:
+            # Enable interpretation mode
+            self.model.INTERPR_ENABLE_MODE()
+            # Tokenize inputs
+            target_inputs = self.target_tokenizer(
+                target_sequence,
+                padding=True,
+                truncation=True,
+                max_length=max_length,
+                return_tensors="pt"
+            ).to(self.device)
+            drug_inputs = self.drug_tokenizer(
+                drug_smiles,
+                padding=True,
+                truncation=True,
+                max_length=max_length,
+                return_tensors="pt"
+            ).to(self.device)
+            # Make prediction to get attention weights
+            with torch.no_grad():
+                _ = self.model(target_inputs, drug_inputs)
+                # Get attention weights
+                attention_weights = self.model.model.crossattention_weights
+                if attention_weights is not None:
+                    # Convert to numpy for visualization
+                    attention_weights = attention_weights.cpu().numpy()
+                    # Get tokens for visualization
+                    target_tokens = self.target_tokenizer.convert_ids_to_tokens(
+                        target_inputs["input_ids"][0], skip_special_tokens=True
+                    )
+                    drug_tokens = self.drug_tokenizer.convert_ids_to_tokens(
+                        drug_inputs["input_ids"][0], skip_special_tokens=True
+                    )
+                    return attention_weights, target_tokens, drug_tokens, "Attention visualization ready"
+                else:
+                    return None, None, None, "No attention weights available"
+        except Exception as e:
+            logger.error(f"Attention visualization error: {str(e)}")
+            return None, None, None, f"Error: {str(e)}"
+# Initialize the app
+app = DrugTargetInteractionApp()
+def predict_wrapper(target_seq, drug_smiles):
+    """Wrapper function for Gradio interface"""
+    if not target_seq.strip() or not drug_smiles.strip():
+        return "Please provide both target sequence and drug SMILES."
+    return app.predict_interaction(target_seq, drug_smiles)
+def load_model_wrapper(model_path):
+    """Wrapper function to load model"""
+    if app.load_model(model_path):
+        return "Model loaded successfully!"
+    else:
+        return "Failed to load model. Check the path and files."
+# Create Gradio interface
+with gr.Blocks(title="Drug-Target Interaction Predictor", theme=gr.themes.Soft()) as demo:
+    gr.HTML("""
+    <div style="text-align: center; margin-bottom: 30px;">
+        <h1 style="color: #2E86AB; font-size: 2.5em; margin-bottom: 10px;">
+            🧬 Drug-Target Interaction Predictor
+        </h1>
+        <p style="font-size: 1.2em; color: #666;">
+            Predict binding affinity between drugs and target RNA sequences using deep learning
+        </p>
+    </div>
+    """)
+    with gr.Tab("🔮 Prediction"):
+        with gr.Row():
+            with gr.Column(scale=1):
+                target_input = gr.Textbox(
+                    label="Target RNA Sequence",
+                    placeholder="Enter RNA sequence (e.g., AUGCUAGCUAGUACGUA...)",
+                    lines=4,
+                    max_lines=6
+                )
+                drug_input = gr.Textbox(
+                    label="Drug SMILES",
+                    placeholder="Enter SMILES notation (e.g., CC(C)CC1=CC=C(C=C1)C(C)C(=O)O)",
+                    lines=2
+                )
+                predict_btn = gr.Button("🚀 Predict Interaction", variant="primary", size="lg")
+            with gr.Column(scale=1):
+                prediction_output = gr.Textbox(
+                    label="Prediction Result",
+                    interactive=False,
+                    lines=3
+                )
+        # Example inputs
+        gr.HTML("<h3 style='margin-top: 20px; color: #2E86AB;'>📚 Example Inputs:</h3>")
+        examples = gr.Examples(
+            examples=[
+                [
+                    "AUGCUAGCUAGUACGUAUAUCUGCACUGC",
+                    "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O"
+                ],
+                [
+                    "AUGCGAUCGACGUACGUUAGCCGUAGCGUAGCUAGUGUAGCUAGUAGCU",
+                    "C1=CC=C(C=C1)NC(=O)C2=CC=CC=N2"
+                ]
+            ],
+            inputs=[target_input, drug_input],
+            outputs=prediction_output,
+            fn=predict_wrapper,
+            cache_examples=False
+        )
+        predict_btn.click(
+            fn=predict_wrapper,
+            inputs=[target_input, drug_input],
+            outputs=prediction_output
+        )
+    with gr.Tab("⚙️ Model Settings"):
+        gr.HTML("<h3 style='color: #2E86AB;'>Model Configuration</h3>")
+        model_path_input = gr.Textbox(
+            label="Model Path",
+            value="./",
+            placeholder="Path to model directory"
+        )
+        load_model_btn = gr.Button("📥 Load Model", variant="secondary")
+        model_status = gr.Textbox(
+            label="Status",
+            interactive=False,
+            value="No model loaded"
+        )
+        load_model_btn.click(
+            fn=load_model_wrapper,
+            inputs=model_path_input,
+            outputs=model_status
+        )
+    with gr.Tab("ℹ️ About"):
+        gr.Markdown("""
+        ## About This Application
+        This application uses a deep learning model for predicting drug-target interactions. The model architecture includes:
+        - **Target Encoder**: Processes RNA sequences
+        - **Drug Encoder**: Processes molecular SMILES notation
+        - **Cross-Attention Mechanism**: Captures interactions between drugs and targets
+        - **Regression Head**: Predicts binding affinity scores
+        ### Input Requirements:
+        - **Target Sequence**: RNA sequence of the target
+        - **Drug SMILES**: Simplified Molecular Input Line Entry System notation
+        ### Model Features:
+        - Cross-attention for drug-target interaction modeling
+        - Dropout for regularization
+        - Layer normalization for stable training
+        - Interpretability mode for attention visualization
+        ### Usage Tips:
+        1. Load your trained model using the Model Settings tab
+        2. Enter an RNA sequence and drug SMILES
+        3. Click "Predict Interaction" to get binding affinity prediction
+        For best results, ensure your input sequences are properly formatted and within reasonable length limits.
+        """)
+# Launch the app
+if __name__ == "__main__":
+    # Try to load model on startup
+    if os.path.exists("./config.json"):
+        app.load_model("./")
+    demo.launch(
+        server_name="0.0.0.0",
+        server_port=7860,
+        share=False,
+        show_error=True
+    )
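`predict_interaction` unscales the raw model output when a scaler is present. The `StdScaler` in `modeling_dlmberta.py` stores a mean and a population standard deviation (`correction=0`) as two lines in `scaler.config`. The sketch below reproduces that round trip with the standard library only. `parse_scaler_config` and the label values are hypothetical stand-ins, not the contents of the file shipped in this commit.

```python
import io
import statistics


def parse_scaler_config(handle):
    """Read mean then std from a two-line file, in the order StdScaler.save writes them."""
    mean = float(handle.readline())
    std = float(handle.readline())
    return mean, std


def transform(values, mean, std):
    """Standardize values: (x - mean) / std."""
    return [(v - mean) / std for v in values]


def inverse_transform(values, mean, std):
    """Undo the standardization: x * std + mean."""
    return [v * std + mean for v in values]


# Hypothetical training labels (assumption for illustration only)
labels = [4.0, 5.0, 6.0, 7.0]
mean = statistics.fmean(labels)
std = statistics.pstdev(labels)  # population std, matching correction=0

# Simulate the two-line scaler.config file in memory
config_text = f"{mean}\n{std}\n"
loaded_mean, loaded_std = parse_scaler_config(io.StringIO(config_text))

scaled = transform(labels, loaded_mean, loaded_std)
restored = inverse_transform(scaled, loaded_mean, loaded_std)
```

A prediction made in the scaled space maps back to the original affinity units through the same `inverse_transform` arithmetic.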

chemberta.py ADDED

	@@ -0,0 +1,123 @@

+from tokenizers.models import WordLevel
+from tokenizers import Tokenizer
+from tokenizers.pre_tokenizers import Split
+from tokenizers import Regex
+from tokenizers.processors import TemplateProcessing
+from transformers import BatchEncoding
+import torch
+class ChembertaTokenizer:
+    def __init__(self, vocab_file):
+        self.tokenizer = Tokenizer(
+            WordLevel.from_file(
+                vocab_file,
+                unk_token='[UNK]'
+        ))
+        self.tokenizer.pre_tokenizer = Split(
+            pattern=Regex(r"\[(.*?)\]|Cl|Br|>>|\\|.*?"),
+            behavior='isolated'
+        )
+        # Set the tokenizer's encode_special_tokens flag (padding is configured in encode())
+        self.tokenizer.encode_special_tokens = True
+        self.special_token_ids = {
+            self.tokenizer.token_to_id('[CLS]'),
+            self.tokenizer.token_to_id('[SEP]'),
+            self.tokenizer.token_to_id('[PAD]'),
+            self.tokenizer.token_to_id('[UNK]')
+        }
+        self.tokenizer.post_processor = TemplateProcessing(
+            single='[CLS] $A [SEP]',
+            pair='[CLS] $A [SEP] $B:1 [SEP]:1',
+            special_tokens=[
+                ('[CLS]', self.tokenizer.token_to_id('[CLS]')),
+                ('[SEP]', self.tokenizer.token_to_id('[SEP]'))
+            ]
+        )
+    def encode(self, inputs, padding=None, truncation=False,
+                 max_length=None, return_tensors=None):
+        # Configure padding/truncation
+        if padding:
+            self.tokenizer.enable_padding(pad_id=self.tokenizer.token_to_id('[PAD]'),
+                                          pad_token='[PAD]', length=max_length)
+        else:
+            self.tokenizer.no_padding()
+        if truncation:
+            self.tokenizer.enable_truncation(max_length=max_length)
+        else:
+            self.tokenizer.no_truncation()
+        if return_tensors == 'pt':
+            tensor_type = 'pt'
+        else:
+            tensor_type = None
+        # Handle batch or single input
+        if isinstance(inputs, list):
+            enc = self.tokenizer.encode_batch(inputs)
+            data = {
+                "input_ids": [e.ids for e in enc],
+                "attention_mask": [e.attention_mask for e in enc]
+            }
+            return BatchEncoding(data=data, encoding=enc, tensor_type=tensor_type)
+        else:
+            # Single sequence: wrap into batch of size 1
+            enc = [self.tokenizer.encode(inputs)]
+            data = {
+                "input_ids": [e.ids for e in enc],
+                "attention_mask": [e.attention_mask for e in enc]
+            }
+            return BatchEncoding(data=data, encoding=enc, tensor_type=tensor_type)
+    def __call__(self, inputs, padding=None, truncation=False,
+                 max_length=None, return_tensors=None):
+        return self.encode(inputs, padding=padding, truncation=truncation,
+                           max_length=max_length, return_tensors=return_tensors)
+    def convert_ids_to_tokens(self, ids, skip_special_tokens=False):
+        def _decode_sequence(seq):
+            if skip_special_tokens:
+                seq = [idx for idx in seq if idx not in self.special_token_ids]
+            return [self.tokenizer.id_to_token(idx) for idx in seq]
+        # 1) batch: list of lists or torch tensor
+        if isinstance(ids, torch.Tensor):
+            ids = ids.tolist()
+            if len(ids) == 1:
+                ids = ids[0]
+        if isinstance(ids, (list)) and len(ids) > 0 and isinstance(ids[0], (list)):
+            return [_decode_sequence(seq) for seq in ids]
+        # 2) single sequence: list of ints or torch tensor
+        if isinstance(ids, (list)):
+            return _decode_sequence(ids)
+        # 3) single int
+        if isinstance(ids, int):
+            return self.tokenizer.id_to_token(ids)
+    def decode(self, ids, skip_special_tokens=False):
+        def _decode_sequence(seq):
+            if skip_special_tokens:
+                seq = [idx for idx in seq if idx not in self.special_token_ids]
+            return ''.join(self.tokenizer.id_to_token(idx) for idx in seq)
+        # 1) batch: list of lists or torch tensor
+        if isinstance(ids, torch.Tensor):
+            ids = ids.tolist()
+            if len(ids) == 1:
+                ids = ids[0]
+        if isinstance(ids, (list)) and len(ids) > 0 and isinstance(ids[0], (list)):
+            return [_decode_sequence(seq) for seq in ids]
+        # 2) single sequence: list of ints or torch tensor
+        if isinstance(ids, (list)):
+            return _decode_sequence(ids)
+        # 3) single int
+        if isinstance(ids, int):
+            return self.tokenizer.id_to_token(ids)
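`ChembertaTokenizer` splits a SMILES string with a regex, looks each piece up in `drug_tokenizer/vocab.json`, and wraps the result in `[CLS]` and `[SEP]`. The standalone sketch below imitates those steps with `re` only. It is an approximation under stated assumptions: the pattern ends in `.` rather than the `.*?` used in `chemberta.py`, the vocabulary is a small excerpt of the real file, and `tokenize_smiles` and `encode_smiles` are hypothetical helper names.

```python
import re

# Bracket atoms, two-letter halogens, reaction arrows and backslashes are kept
# whole; every other character becomes its own token (assumed simplification).
SMILES_PATTERN = re.compile(r"\[.*?\]|Cl|Br|>>|\\|.")

# Excerpt of drug_tokenizer/vocab.json; the IDs are copied from that file.
VOCAB_EXCERPT = {
    "[UNK]": 11, "[CLS]": 12, "[SEP]": 13,
    "c": 15, "C": 16, "(": 17, ")": 18, "O": 19, "1": 20, "=": 22, "N": 23,
    "Cl": 28, "[C@H]": 33, "Br": 37,
}


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens."""
    return SMILES_PATTERN.findall(smiles)


def encode_smiles(smiles):
    """Map tokens to IDs, using [UNK] for misses, and add [CLS]/[SEP]."""
    unk = VOCAB_EXCERPT["[UNK]"]
    ids = [VOCAB_EXCERPT.get(token, unk) for token in tokenize_smiles(smiles)]
    return [VOCAB_EXCERPT["[CLS]"], *ids, VOCAB_EXCERPT["[SEP]"]]


tokens = tokenize_smiles("CC(Cl)[C@H]Br")
ids = encode_smiles("CC(Cl)[C@H]Br")
```

Keeping `Cl`, `Br`, and bracket atoms ahead of the single-character fallback in the alternation is what stops `Cl` from being read as carbon followed by an unknown `l`.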

config.json ADDED

	@@ -0,0 +1,15 @@

+{
+  "architectures": [
+    "InteractionModelATTNForRegression"
+  ],
+  "attention_dropout": 0.425,
+  "auto_map": {
+    "AutoConfig": "configuration_dlmberta.InteractionModelATTNConfig",
+    "AutoModel": "modeling_dlmberta.InteractionModelATTNForRegression"
+  },
+  "hidden_dropout": 0.0816,
+  "model_type": "dlmberta",
+  "num_heads": 1,
+  "torch_dtype": "float32",
+  "transformers_version": "4.41.0"
+}

configuration_dlmberta.py ADDED

	@@ -0,0 +1,9 @@

+from transformers import PretrainedConfig
+class InteractionModelATTNConfig(PretrainedConfig):
+    model_type = "dlmberta"
+    def __init__(self, attention_dropout=0.2, hidden_dropout=0.2, num_heads=1, **kwargs):
+        self.num_heads = num_heads
+        self.hidden_dropout = hidden_dropout
+        self.attention_dropout = attention_dropout
+        super().__init__(**kwargs)
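`InteractionModelATTNConfig` declares defaults (`attention_dropout=0.2`, `hidden_dropout=0.2`, `num_heads=1`), while the `config.json` in this commit overrides the two dropout values. The stand-in below illustrates that precedence with a plain dataclass instead of `PretrainedConfig`. `ConfigSketch` and `from_json_text` are hypothetical names, and the real class inherits further behavior from `PretrainedConfig` that is not modeled here.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class ConfigSketch:
    # Defaults copied from InteractionModelATTNConfig.__init__
    attention_dropout: float = 0.2
    hidden_dropout: float = 0.2
    num_heads: int = 1

    @classmethod
    def from_json_text(cls, text):
        """Build a config from JSON, ignoring keys this sketch does not model."""
        raw = json.loads(text)
        known = {f.name for f in fields(cls)}
        return cls(**{key: value for key, value in raw.items() if key in known})


# Subset of the config.json added in this commit
CONFIG_JSON = """
{
  "attention_dropout": 0.425,
  "hidden_dropout": 0.0816,
  "model_type": "dlmberta",
  "num_heads": 1
}
"""

defaults = ConfigSketch()
loaded = ConfigSketch.from_json_text(CONFIG_JSON)
```

Here `loaded` carries the dropout values from the JSON, while `defaults` keeps the constructor defaults.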

drug_tokenizer/vocab.json ADDED

	@@ -0,0 +1 @@

+ {"[PAD]":0,"[unused1]":1,"[unused2]":2,"[unused3]":3,"[unused4]":4,"[unused5]":5,"[unused6]":6,"[unused7]":7,"[unused8]":8,"[unused9]":9,"[unused10]":10,"[UNK]":11,"[CLS]":12,"[SEP]":13,"[MASK]":14,"c":15,"C":16,"(":17,")":18,"O":19,"1":20,"2":21,"=":22,"N":23,".":24,"n":25,"3":26,"F":27,"Cl":28,">>":29,"~":30,"-":31,"4":32,"[C@H]":33,"S":34,"[C@@H]":35,"[O-]":36,"Br":37,"#":38,"/":39,"[nH]":40,"[N+]":41,"s":42,"5":43,"o":44,"P":45,"[Na+]":46,"[Si]":47,"I":48,"[Na]":49,"[Pd]":50,"[K+]":51,"[K]":52,"[P]":53,"B":54,"[C@]":55,"[C@@]":56,"[Cl-]":57,"6":58,"[OH-]":59,"\\":60,"[N-]":61,"[Li]":62,"[H]":63,"[2H]":64,"[NH4+]":65,"[c-]":66,"[P-]":67,"[Cs+]":68,"[Li+]":69,"[Cs]":70,"[NaH]":71,"[H-]":72,"[O+]":73,"[BH4-]":74,"[Cu]":75,"7":76,"[Mg]":77,"[Fe+2]":78,"[n+]":79,"[Sn]":80,"[BH-]":81,"[Pd+2]":82,"[CH]":83,"[I-]":84,"[Br-]":85,"[C-]":86,"[Zn]":87,"[B-]":88,"[F-]":89,"[Al]":90,"[P+]":91,"[BH3-]":92,"[Fe]":93,"[C]":94,"[AlH4]":95,"[Ni]":96,"[SiH]":97,"8":98,"[Cu+2]":99,"[Mn]":100,"[AlH]":101,"[nH+]":102,"[AlH4-]":103,"[O-2]":104,"[Cr]":105,"[Mg+2]":106,"[NH3+]":107,"[S@]":108,"[Pt]":109,"[Al+3]":110,"[S@@]":111,"[S-]":112,"[Ti]":113,"[Zn+2]":114,"[PH]":115,"[NH2+]":116,"[Ru]":117,"[Ag+]":118,"[S+]":119,"[I+3]":120,"[NH+]":121,"[Ca+2]":122,"[Ag]":123,"9":124,"[Os]":125,"[Se]":126,"[SiH2]":127,"[Ca]":128,"[Ti+4]":129,"[Ac]":130,"[Cu+]":131,"[S]":132,"[Rh]":133,"[Cl+3]":134,"[cH-]":135,"[Zn+]":136,"[O]":137,"[Cl+]":138,"[SH]":139,"[H+]":140,"[Pd+]":141,"[se]":142,"[PH+]":143,"[I]":144,"[Pt+2]":145,"[C+]":146,"[Mg+]":147,"[Hg]":148,"[W]":149,"[SnH]":150,"[SiH3]":151,"[Fe+3]":152,"[NH]":153,"[Mo]":154,"[CH2+]":155,"%10":156,"[CH2-]":157,"[CH2]":158,"[n-]":159,"[Ce+4]":160,"[NH-]":161,"[Co]":162,"[I+]":163,"[PH2]":164,"[Pt+4]":165,"[Ce]":166,"[B]":167,"[Sn+2]":168,"[Ba+2]":169,"%11":170,"[Fe-3]":171,"[18F]":172,"[SH-]":173,"[Pb+2]":174,"[Os-2]":175,"[Zr+4]":176,"[N]":177,"[Ir]":178,"[Bi]":179,"[Ni+2]":180,"[P@]":181,"[Co+2]":182,"[s+]":183,"[As]":184,"[P+3]":185,"[Hg+2]":186
,"[Yb+3]":187,"[CH-]":188,"[Zr+2]":189,"[Mn+2]":190,"[CH+]":191,"[In]":192,"[KH]":193,"[Ce+3]":194,"[Zr]":195,"[AlH2-]":196,"[OH2+]":197,"[Ti+3]":198,"[Rh+2]":199,"[Sb]":200,"[S-2]":201,"%12":202,"[P@@]":203,"[Si@H]":204,"[Mn+4]":205,"p":206,"[Ba]":207,"[NH2-]":208,"[Ge]":209,"[Pb+4]":210,"[Cr+3]":211,"[Au]":212,"[LiH]":213,"[Sc+3]":214,"[o+]":215,"[Rh-3]":216,"%13":217,"[Br]":218,"[Sb-]":219,"[S@+]":220,"[I+2]":221,"[Ar]":222,"[V]":223,"[Cu-]":224,"[Al-]":225,"[Te]":226,"[13c]":227,"[13C]":228,"[Cl]":229,"[PH4+]":230,"[SiH4]":231,"[te]":232,"[CH3-]":233,"[S@@+]":234,"[Rh+3]":235,"[SH+]":236,"[Bi+3]":237,"[Br+2]":238,"[La]":239,"[La+3]":240,"[Pt-2]":241,"[N@@]":242,"[PH3+]":243,"[N@]":244,"[Si+4]":245,"[Sr+2]":246,"[Al+]":247,"[Pb]":248,"[SeH]":249,"[Si-]":250,"[V+5]":251,"[Y+3]":252,"[Re]":253,"[Ru+]":254,"[Sm]":255,"*":256,"[3H]":257,"[NH2]":258,"[Ag-]":259,"[13CH3]":260,"[OH+]":261,"[Ru+3]":262,"[OH]":263,"[Gd+3]":264,"[13CH2]":265,"[In+3]":266,"[Si@@]":267,"[Si@]":268,"[Ti+2]":269,"[Sn+]":270,"[Cl+2]":271,"[AlH-]":272,"[Pd-2]":273,"[SnH3]":274,"[B+3]":275,"[Cu-2]":276,"[Nd+3]":277,"[Pb+3]":278,"[13cH]":279,"[Fe-4]":280,"[Ga]":281,"[Sn+4]":282,"[Hg+]":283,"[11CH3]":284,"[Hf]":285,"[Pr]":286,"[Y]":287,"[S+2]":288,"[Cd]":289,"[Cr+6]":290,"[Zr+3]":291,"[Rh+]":292,"[CH3]":293,"[N-3]":294,"[Hf+2]":295,"[Th]":296,"[Sb+3]":297,"%14":298,"[Cr+2]":299,"[Ru+2]":300,"[Hf+4]":301,"[14C]":302,"[Ta]":303,"[Tl+]":304,"[B+]":305,"[Os+4]":306,"[PdH2]":307,"[Pd-]":308,"[Cd+2]":309,"[Co+3]":310,"[S+4]":311,"[Nb+5]":312,"[123I]":313,"[c+]":314,"[Rb+]":315,"[V+2]":316,"[CH3+]":317,"[Ag+2]":318,"[cH+]":319,"[Mn+3]":320,"[Se-]":321,"[As-]":322,"[Eu+3]":323,"[SH2]":324,"[Sm+3]":325,"[IH+]":326,"%15":327,"[OH3+]":328,"[PH3]":329,"[IH2+]":330,"[SH2+]":331,"[Ir+3]":332,"[AlH3]":333,"[Sc]":334,"[Yb]":335,"[15NH2]":336,"[Lu]":337,"[sH+]":338,"[Gd]":339,"[18F-]":340,"[SH3+]":341,"[SnH4]":342,"[TeH]":343,"[Si@@H]":344,"[Ga+3]":345,"[CaH2]":346,"[Tl]":347,"[Ta+5]":348,"[GeH]":349,"[Br+]":350,"[
Sr]":351,"[Tl+3]":352,"[Sm+2]":353,"[PH5]":354,"%16":355,"[N@@+]":356,"[Au+3]":357,"[C-4]":358,"[Nd]":359,"[Ti+]":360,"[IH]":361,"[N@+]":362,"[125I]":363,"[Eu]":364,"[Sn+3]":365,"[Nb]":366,"[Er+3]":367,"[123I-]":368,"[14c]":369,"%17":370,"[SnH2]":371,"[YH]":372,"[Sb+5]":373,"[Pr+3]":374,"[Ir+]":375,"[N+3]":376,"[AlH2]":377,"[19F]":378,"%18":379,"[Tb]":380,"[14CH]":381,"[Mo+4]":382,"[Si+]":383,"[BH]":384,"[Be]":385,"[Rb]":386,"[pH]":387,"%19":388,"%20":389,"[Xe]":390,"[Ir-]":391,"[Be+2]":392,"[C+4]":393,"[RuH2]":394,"[15NH]":395,"[U+2]":396,"[Au-]":397,"%21":398,"%22":399,"[Au+]":400,"[15n]":401,"[Al+2]":402,"[Tb+3]":403,"[15N]":404,"[V+3]":405,"[W+6]":406,"[14CH3]":407,"[Cr+4]":408,"[ClH+]":409,"b":410,"[Ti+6]":411,"[Nd+]":412,"[Zr+]":413,"[PH2+]":414,"[Fm]":415,"[N@H+]":416,"[RuH]":417,"[Dy+3]":418,"%23":419,"[Hf+3]":420,"[W+4]":421,"[11C]":422,"[13CH]":423,"[Er]":424,"[124I]":425,"[LaH]":426,"[F]":427,"[siH]":428,"[Ga+]":429,"[Cm]":430,"[GeH3]":431,"[IH-]":432,"[U+6]":433,"[SeH+]":434,"[32P]":435,"[SeH-]":436,"[Pt-]":437,"[Ir+2]":438,"[se+]":439,"[U]":440,"[F+]":441,"[BH2]":442,"[As+]":443,"[Cf]":444,"[ClH2+]":445,"[Ni+]":446,"[TeH3]":447,"[SbH2]":448,"[Ag+3]":449,"%24":450,"[18O]":451,"[PH4]":452,"[Os+2]":453,"[Na-]":454,"[Sb+2]":455,"[V+4]":456,"[Ho+3]":457,"[68Ga]":458,"[PH-]":459,"[Bi+2]":460,"[Ce+2]":461,"[Pd+3]":462,"[99Tc]":463,"[13C@@H]":464,"[Fe+6]":465,"[c]":466,"[GeH2]":467,"[10B]":468,"[Cu+3]":469,"[Mo+2]":470,"[Cr+]":471,"[Pd+4]":472,"[Dy]":473,"[AsH]":474,"[Ba+]":475,"[SeH2]":476,"[In+]":477,"[TeH2]":478,"[BrH+]":479,"[14cH]":480,"[W+]":481,"[13C@H]":482,"[AsH2]":483,"[In+2]":484,"[N+2]":485,"[N@@H+]":486,"[SbH]":487,"[60Co]":488,"[AsH4+]":489,"[AsH3]":490,"[18OH]":491,"[Ru-2]":492,"[Na-2]":493,"[CuH2]":494,"[31P]":495,"[Ti+5]":496,"[35S]":497,"[P@@H]":498,"[ArH]":499,"[Co+]":500,"[Zr-2]":501,"[BH2-]":502,"[131I]":503,"[SH5]":504,"[VH]":505,"[B+2]":506,"[Yb+2]":507,"[14C@H]":508,"[211At]":509,"[NH3+2]":510,"[IrH]":511,"[IrH2]":512,"[Rh-]":513,"[Cr-]"
:514,"[Sb+]":515,"[Ni+3]":516,"[TaH3]":517,"[Tl+2]":518,"[64Cu]":519,"[Tc]":520,"[Cd+]":521,"[1H]":522,"[15nH]":523,"[AlH2+]":524,"[FH+2]":525,"[BiH3]":526,"[Ru-]":527,"[Mo+6]":528,"[AsH+]":529,"[BaH2]":530,"[BaH]":531,"[Fe+4]":532,"[229Th]":533,"[Th+4]":534,"[As+3]":535,"[NH+3]":536,"[P@H]":537,"[Li-]":538,"[7NaH]":539,"[Bi+]":540,"[PtH+2]":541,"[p-]":542,"[Re+5]":543,"[NiH]":544,"[Ni-]":545,"[Xe+]":546,"[Ca+]":547,"[11c]":548,"[Rh+4]":549,"[AcH]":550,"[HeH]":551,"[Sc+2]":552,"[Mn+]":553,"[UH]":554,"[14CH2]":555,"[SiH4+]":556,"[18OH2]":557,"[Ac-]":558,"[Re+4]":559,"[118Sn]":560,"[153Sm]":561,"[P+2]":562,"[9CH]":563,"[9CH3]":564,"[Y-]":565,"[NiH2]":566,"[Si+2]":567,"[Mn+6]":568,"[ZrH2]":569,"[C-2]":570,"[Bi+5]":571,"[24NaH]":572,"[Fr]":573,"[15CH]":574,"[Se+]":575,"[At]":576,"[P-3]":577,"[124I-]":578,"[CuH2-]":579,"[Nb+4]":580,"[Nb+3]":581,"[MgH]":582,"[Ir+4]":583,"[67Ga+3]":584,"[67Ga]":585,"[13N]":586,"[15OH2]":587,"[2NH]":588,"[Ho]":589,"[Cn]":590}

model.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9cda24e3491cc24f77c27dc0a9a2933a05eab66ae96aa770de5db86d640cffce
+size 241171900
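
The three lines above are a Git LFS pointer rather than the weights themselves: `oid` records the SHA-256 of the real file and `size` its length in bytes (about 241 MB). As a minimal sketch, such a pointer can be parsed and compared with a downloaded file as shown below. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not part of this repository.

```python
import hashlib

POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:9cda24e3491cc24f77c27dc0a9a2933a05eab66ae96aa770de5db86d640cffce
size 241171900
"""

def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }

def matches_pointer(data, pointer):
    """Return True if the raw bytes have the size and SHA-256 recorded in the pointer."""
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )

pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```

For a file of this size, hashing in chunks instead of reading everything into memory would be the more practical choice.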

modeling_dlmberta.py ADDED Viewed

	@@ -0,0 +1,316 @@

+import math
+import torch
+import torch.nn as nn
+from torch.nn.init import constant_, xavier_uniform_
+from torch.nn.parameter import Parameter
+from transformers import PretrainedConfig, PreTrainedModel
+from configuration_dlmberta import InteractionModelATTNConfig
+class StdScaler:
+    """Standardizes values with a single global mean and population standard deviation."""
+    def fit(self, X):
+        self.mean_ = torch.mean(X).item()
+        self.std_ = torch.std(X, correction=0).item()
+    def fit_transform(self, X):
+        self.fit(X)
+        return self.transform(X)
+    def transform(self, X):
+        return (X-self.mean_)/self.std_
+    def inverse_transform(self, X):
+        return (X*self.std_)+self.mean_
+    def save(self, directory):
+        with open(directory+"/scaler.config", "w") as f:
+            f.write(str(self.mean_)+"\n")
+            f.write(str(self.std_)+"\n")
+    def load(self, directory):
+        with open(directory+"/scaler.config", "r") as f:
+            self.mean_ = float(f.readline())
+            self.std_ = float(f.readline())
+class InteractionModelATTNForRegression(PreTrainedModel):
+    config_class = InteractionModelATTNConfig
+    def __init__(self, config, target_encoder, drug_encoder, scaler=None):
+        super().__init__(config)
+        self.model = InteractionModelATTN(target_encoder,
+                                          drug_encoder,
+                                          scaler,
+                                          config.attention_dropout,
+                                          config.hidden_dropout,
+                                          config.num_heads)
+        self.scaler = scaler
+    def INTERPR_ENABLE_MODE(self):
+        self.model.INTERPR_ENABLE_MODE()
+    def INTERPR_OVERRIDE_ATTN(self, new_weights):
+        self.model.INTERPR_OVERRIDE_ATTN(new_weights)
+    def INTERPR_RESET_OVERRIDE_ATTN(self):
+        self.model.INTERPR_RESET_OVERRIDE_ATTN()
+    def forward(self, x1, x2):
+        return self.model(x1, x2)
+    def unscale(self, x):
+        return self.model.unscale(x)
+class CrossAttention(nn.Module):
+    def __init__(self, embed_dim, num_heads, attention_dropout=0.0, hidden_dropout=0.0, add_bias_kv=False, **factory_kwargs):
+        """
+        Initializes the CrossAttention layer.
+        Args:
+            embed_dim (int): Dimension of the input embeddings.
+            num_heads (int): Number of attention heads.
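+            attention_dropout (float): Dropout probability applied to the attention weights.
+            hidden_dropout (float): Dropout probability applied to the projected output.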
+        """
+        super().__init__()
+        self.attention_dropout = attention_dropout
+        self.hidden_dropout = hidden_dropout
+        self.embed_dim = embed_dim
+        self.num_heads = num_heads
+        self.head_dim = embed_dim // num_heads
+        self.scaling = self.head_dim ** -0.5
+        if self.head_dim * num_heads != embed_dim:
+            raise ValueError("embed_dim must be divisible by num_heads")
+        # Linear projections for query, key, and value.
+        self.q_proj = nn.Linear(embed_dim, embed_dim)
+        self.k_proj = nn.Linear(embed_dim, embed_dim)
+        self.v_proj = nn.Linear(embed_dim, embed_dim)
+        self.attn_dropout = nn.Dropout(attention_dropout)
+        xavier_uniform_(self.q_proj.weight)
+        xavier_uniform_(self.k_proj.weight)
+        xavier_uniform_(self.v_proj.weight)
+        constant_(self.q_proj.bias, 0.)
+        constant_(self.k_proj.bias, 0.)
+        constant_(self.v_proj.bias, 0.)
+        # Output projection.
+        self.out_proj = nn.Linear(embed_dim, embed_dim)
+        constant_(self.out_proj.bias, 0)
+        self.drop_out = nn.Dropout(hidden_dropout)
+    def forward(self, query, key, value, key_padding_mask=None, attn_mask=None, replace_weights=None):
+        """
+        Forward pass for cross attention.
+        Args:
+            query (Tensor): Query embeddings of shape (batch_size, query_len, embed_dim).
+            key (Tensor): Key embeddings of shape (batch_size, key_len, embed_dim).
+            value (Tensor): Value embeddings of shape (batch_size, key_len, embed_dim).
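+            key_padding_mask (Tensor, optional): Boolean mask of shape (batch_size, key_len); True marks padded key positions to ignore.
+            attn_mask (Tensor, optional): Boolean mask of shape (batch_size, query_len, key_len); True marks positions whose attention weights are zeroed after the softmax.
+            replace_weights (Tensor, optional): Pre-softmax scores of shape (batch_size, num_heads, query_len, key_len) that replace the computed scores.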
+        Returns:
+            output (Tensor): The attended output of shape (batch_size, query_len, embed_dim).
+            attn_weights (Tensor): The attention weights of shape (batch_size, num_heads, query_len, key_len).
+        """
+        batch_size, query_len, _ = query.size()
+        _, key_len, _ = key.size()
+        Q = self.q_proj(query)
+        K = self.k_proj(key)
+        V = self.v_proj(value)
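+        # Split the embedding dimension into heads: (batch_size, seq_len, embed_dim) -> (batch_size, num_heads, seq_len, head_dim)
+        Q = Q.view(batch_size, query_len, self.num_heads, self.head_dim).transpose(1, 2)
+        K = K.view(batch_size, key_len, self.num_heads, self.head_dim).transpose(1, 2)
+        V = V.view(batch_size, key_len, self.num_heads, self.head_dim).transpose(1, 2)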
+        # Compute scaled dot-product attention scores
+        scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(self.head_dim)  # (batch_size, num_heads, query_len, key_len)
+        if key_padding_mask is not None:
+            # key_padding_mask is boolean: True marks padded keys, whose scores are set to -inf
+            key_padding_mask = key_padding_mask.unsqueeze(1).unsqueeze(1)  # (B, 1, 1, key_len) for broadcasting
+            scores = scores.masked_fill(key_padding_mask, float('-inf'))  # Set masked positions to -inf
+        if replace_weights is not None:
+            scores = replace_weights
+        # Compute attention weights using softmax
+        attn_weights = torch.nn.functional.softmax(scores, dim=-1)  # (batch_size, num_heads, query_len, key_len)
+        self.scores = scores
+        if attn_mask is not None:
+            attn_mask = attn_mask.unsqueeze(1)  # Shape: (batch_size, 1, query_len, key_len)
+            attn_weights = attn_weights.masked_fill(attn_mask, 0)  # Set masked positions to 0
+        # Apply dropout to the attention weights (a no-op in eval mode)
+        attn_weights = self.attn_dropout(attn_weights)
+        # Compute the weighted sum of the values
+        attn_output = torch.matmul(attn_weights, V)  # (batch_size, num_heads, query_len, head_dim)
+        # Recombine heads: transpose and reshape back to (batch_size, query_len, embed_dim)
+        attn_output = attn_output.transpose(1, 2).contiguous().view(batch_size, query_len, self.embed_dim)
+        # Final linear projection and dropout
+        output = self.out_proj(attn_output)
+        output = self.drop_out(output)
+        return output, attn_weights
+class InteractionModelATTN(nn.Module):
+    def __init__(self, target_encoder, drug_encoder, scaler, attention_dropout, hidden_dropout, num_heads=1, kernel_size=1):
+        super().__init__()
+        self.replace_weights = None
+        self.crossattention_weights = None
+        self.presum_layer = None
+        self.INTERPR_MODE = False
+        self.scaler = scaler
+        self.attention_dropout = attention_dropout
+        self.hidden_dropout = hidden_dropout
+        self.target_encoder = target_encoder
+        self.drug_encoder = drug_encoder
+        self.kernel_size = kernel_size
+        self.lin_map_target = nn.Linear(512, 384)
+        self.dropout_map_target = nn.Dropout(hidden_dropout)
+        self.lin_map_drug = nn.Linear(384, 384)
+        self.dropout_map_drug = nn.Dropout(hidden_dropout)
+        self.crossattention = CrossAttention(384, num_heads, attention_dropout, hidden_dropout)
+        self.norm = nn.LayerNorm(384)
+        self.summary1 = nn.Linear(384, 384)
+        self.summary2 = nn.Linear(384, 1)
+        self.dropout_summary = nn.Dropout(hidden_dropout)
+        self.layer_norm = nn.LayerNorm(384)
+        self.gelu = nn.GELU()
+        self.w = Parameter(torch.empty(512, 1))
+        self.b = Parameter(torch.zeros(1))
+        self.pdng = Parameter(torch.tensor(0.0))  # learnable padding value (0-dimensional)
+        xavier_uniform_(self.w)
+    def forward(self, x1, x2):
+        """
+        Forward pass for attention interaction model.
+        Args:
+            x1 (dict): A dictionary containing input tensors for the target encoder.
+                Expected keys:
+                    - 'input_ids' (torch.Tensor): Token IDs for the target input.
+                    - 'attention_mask' (torch.Tensor): Attention mask for the target input.
+            x2 (dict): A dictionary containing input tensors for the drug encoder.
+                Expected keys:
+                    - 'input_ids' (torch.Tensor): Token IDs for the drug input.
+                    - 'attention_mask' (torch.Tensor): Attention mask for the drug input.
+        Returns:
+            torch.Tensor: A tensor representing the predicted binding affinity.
+        """
+        x1["attention_mask"] = x1["attention_mask"].bool()   # Fix dropout model issue: https://github.com/pytorch/pytorch/issues/86120
+        y1 = self.target_encoder(**x1).last_hidden_state     # The target
+        query_mask = x1["attention_mask"].unsqueeze(-1).to(y1.dtype)
+        y1 = y1 * query_mask
+        x2["attention_mask"] = x2["attention_mask"].bool()   # Fix dropout model issue: https://github.com/pytorch/pytorch/issues/86120
+        y2 = self.drug_encoder(**x2).last_hidden_state       # The drug
+        key_mask = x2["attention_mask"].unsqueeze(-1).to(y2.dtype)
+        y2 = y2 * key_mask
+        y1 = self.lin_map_target(y1)
+        y1 = self.gelu(y1)
+        y1 = self.dropout_map_target(y1)
+        y2 = self.lin_map_drug(y2)
+        y2 = self.gelu(y2)
+        y2 = self.dropout_map_drug(y2)
+        key_padding_mask = (x2["attention_mask"] == 0)  # True at padded drug tokens
+        replace_weights = None
+        # If in interpretation mode, allow the replacement of cross-attention weights
+        if self.INTERPR_MODE:
+            if self.replace_weights is not None:
+                replace_weights = self.replace_weights
+        out, attn_weights = self.crossattention(y1, y2, y2, key_padding_mask=key_padding_mask, attn_mask=None, replace_weights=replace_weights)
+        # If in interpretation mode, make cross-attention weights and scores accessible from the outside
+        if self.INTERPR_MODE:
+            self.crossattention_weights = attn_weights
+            self.scores = self.crossattention.scores
+        out = self.summary1(out * query_mask)
+        out = self.gelu(out)
+        out = self.dropout_summary(out)
+        out = self.summary2(out).squeeze(-1)
+        # If in interpretation mode, make final summation layer contributions accessible from the outside
+        if self.INTERPR_MODE:
+            self.presum_layer = out
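+        weighted = out * self.w.squeeze(1)  # [batch, seq_len]; seq_len must match the 512 entries of self.w
+        padding_positions = ~x1["attention_mask"]           # True at padding
+        # assign learnable pdng to all padding positions; torch.where keeps pdng in the autograd graph so it can be trained
+        weighted = torch.where(padding_positions, self.pdng.to(weighted.dtype), weighted)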
+        # sum across sequence and add bias
+        result = weighted.sum(dim=1, keepdim=True) + self.b
+        return result
+    def train(self, mode = True):
+        super().train(mode)
+        self.target_encoder.train(mode)
+        self.drug_encoder.train(mode)
+        self.crossattention.train(mode)
+        return self
+    def eval(self):
+        super().eval()
+        self.target_encoder.eval()
+        self.drug_encoder.eval()
+        self.crossattention.eval()
+        return self
+    def INTERPR_ENABLE_MODE(self):
+        """
+        Enables the interpretability mode for the model.
+        """
+        if self.training:
+            raise RuntimeError("Cannot enable interpretability mode while the model is training.")
+        self.INTERPR_MODE = True
+    def INTERPR_OVERRIDE_ATTN(self, new_weights):
+        self.replace_weights = new_weights
+    def INTERPR_RESET_OVERRIDE_ATTN(self):
+        self.replace_weights = None
+    def unscale(self, x):
+        """
+        Unscales values using the scaler. If no scaler is set, returns the input unchanged.
+        Parameters:
+            x: the scaled values (labels or predictions) to be unscaled
+        """
+        with torch.no_grad():
+            if self.scaler is None:
+                return x
+            unscaled = self.scaler.inverse_transform(x)
+        return unscaled
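
The last steps of `InteractionModelATTN.forward` reduce one score per target token to a single prediction: each real token's score is multiplied by its position weight from `w`, each padded position contributes the scalar `pdng`, and the bias `b` is added to the sum. The sketch below reproduces that arithmetic with plain Python lists for a single sequence. `readout` and the toy numbers are illustrative assumptions, not code or values from the repository.

```python
def readout(scores, weights, attention_mask, pad_value, bias):
    """Position-weighted sum of per-token scores; padded positions contribute pad_value."""
    if not (len(scores) == len(weights) == len(attention_mask)):
        raise ValueError("scores, weights and attention_mask must have equal length")
    total = bias
    for score, weight, is_real in zip(scores, weights, attention_mask):
        # Real tokens contribute score * weight; padding contributes the constant.
        total += score * weight if is_real else pad_value
    return total

# Toy sequence of length 4 whose last two positions are padding.
scores = [0.5, -1.0, 9.0, 9.0]   # values at padded positions are ignored
weights = [2.0, 1.0, 3.0, 3.0]
mask = [True, True, False, False]
prediction = readout(scores, weights, mask, pad_value=0.25, bias=0.1)
print(prediction)
```

In the model itself `w` has 512 entries, so the target input appears to be expected at a padded length of 512 tokens for the element-wise product to broadcast.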

requirements.txt ADDED Viewed

	@@ -0,0 +1,9 @@

+gradio>=4.0.0
+torch>=1.9.0
+transformers>=4.21.0
+tokenizers>=0.13.0
+numpy>=1.21.0
+huggingface_hub>=0.10.0
+accelerate>=0.20.0
+datasets>=2.0.0
+safetensors>=0.3.0

scaler.config ADDED Viewed

	@@ -0,0 +1,2 @@


+5.389976501464844
+1.3962712287902832
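
`scaler.config` holds two lines, the mean and the standard deviation that `StdScaler.load` reads back, so a raw model output is mapped to the original label scale as `x * std + mean`. A minimal sketch of that round trip without torch, assuming the file content shown above; `load_scaler` is a hypothetical helper, not part of the repository.

```python
import io

SCALER_CONFIG = "5.389976501464844\n1.3962712287902832\n"

def load_scaler(fileobj):
    """Read mean and standard deviation, one float per line, as StdScaler.load does."""
    mean = float(fileobj.readline())
    std = float(fileobj.readline())
    return mean, std

def transform(x, mean, std):
    """Standardize a value."""
    return (x - mean) / std

def inverse_transform(x, mean, std):
    """Map a standardized value back to the original scale."""
    return x * std + mean

mean, std = load_scaler(io.StringIO(SCALER_CONFIG))
# A scaled output of 0.0 maps back to the stored mean.
print(inverse_transform(0.0, mean, std))
print(inverse_transform(1.0, mean, std))
```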

target_tokenizer/config.json ADDED Viewed

	@@ -0,0 +1,29 @@

+{
+  "architectures": [
+    "RobertaForMaskedLM",
+    "RobertaModel"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "attn_mult": 5.656854249492381,
+  "bos_token_id": 0,
+  "classifier_dropout": null,
+  "eos_token_id": 2,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 512,
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 514,
+  "model_type": "roberta",
+  "num_attention_heads": 16,
+  "num_hidden_layers": 12,
+  "output_hidden_states": true,
+  "pad_token_id": 1,
+  "position_embedding_type": "absolute",
+  "torch_dtype": "float32",
+  "transformers_version": "4.46.3",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 9700
+}
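
Two numbers in this config tie into `modeling_dlmberta.py`: `hidden_size` is 512, matching the input width of `lin_map_target = nn.Linear(512, 384)`, and 16 heads over 512 dimensions give a head size of 32. The `attn_mult` value also coincides numerically with the square root of that head size, although treating it as derived from the head size is an assumption, not something the commit states. A small consistency check, using only a subset of the fields above:

```python
import json
import math

# Subset of target_tokenizer/config.json shown above.
config = json.loads("""
{
  "attn_mult": 5.656854249492381,
  "hidden_size": 512,
  "max_position_embeddings": 514,
  "num_attention_heads": 16,
  "pad_token_id": 1
}
""")

if config["hidden_size"] % config["num_attention_heads"] != 0:
    raise ValueError("hidden_size must be divisible by num_attention_heads")

head_dim = config["hidden_size"] // config["num_attention_heads"]
attn_mult_matches = math.isclose(config["attn_mult"], math.sqrt(head_dim))
print(head_dim, attn_mult_matches)
```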

target_tokenizer/merges.txt ADDED Viewed

The diff for this file is too large to render. See raw diff

target_tokenizer/special_tokens_map.json ADDED Viewed

	@@ -0,0 +1,51 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "cls_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "<mask>",
+    "lstrip": true,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}

target_tokenizer/tokenizer.json ADDED Viewed

The diff for this file is too large to render. See raw diff

target_tokenizer/tokenizer_config.json ADDED Viewed

	@@ -0,0 +1,57 @@

+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "4": {
+      "content": "<mask>",
+      "lstrip": true,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "<s>",
+  "eos_token": "</s>",
+  "errors": "replace",
+  "mask_token": "<mask>",
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "<pad>",
+  "sep_token": "</s>",
+  "tokenizer_class": "RobertaTokenizer",
+  "trim_offsets": true,
+  "unk_token": "<unk>"
+}
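
`model_max_length` here is the integer `int(1e30)`, the placeholder that `transformers` stores when no maximum length was configured, so the tokenizer will not truncate on its own. A hedged sketch of how a usable limit could be derived instead: RoBERTa-style encoders reserve `pad_token_id + 1` position slots, so 514 position embeddings leave 512 usable tokens, which also matches the 512-entry weight vector `w` in the model. `effective_max_length` is a hypothetical helper, not part of the repository.

```python
import json

tokenizer_config = json.loads('{"model_max_length": 1000000000000000019884624838656}')
encoder_config = {"max_position_embeddings": 514, "pad_token_id": 1}

UNSET_MAX_LENGTH = int(1e30)  # placeholder meaning "no limit configured"

def effective_max_length(tok_cfg, enc_cfg):
    """Fall back to the encoder's position budget when the tokenizer limit is unset."""
    budget = enc_cfg["max_position_embeddings"] - (enc_cfg["pad_token_id"] + 1)
    limit = tok_cfg["model_max_length"]
    return budget if limit >= UNSET_MAX_LENGTH else min(limit, budget)

max_length = effective_max_length(tokenizer_config, encoder_config)
print(max_length)
```

The resulting value could then be passed as `max_length`, together with `padding="max_length"` and `truncation=True`, when calling the tokenizer.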

target_tokenizer/vocab.json ADDED Viewed

The diff for this file is too large to render. See raw diff