---
title: Drug-Target Interaction Predictor
emoji: 🧬
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit
---
# Drug-Target Interaction Predictor

An interactive deep learning application for predicting drug-target interactions using a novel cross-attention architecture. The model processes RNA sequences and drug SMILES representations to predict binding affinity scores (pKd values), with built-in interpretability features.
## Features

- 🔮 Binding affinity prediction: Input RNA sequences and drug SMILES to get quantitative binding affinity predictions
- 📊 Interactive visualizations: Generate cross-attention heatmaps and contribution analysis plots
- 🧬 RNA-drug interaction analysis: Understand how different tokens contribute to binding predictions
- ⚙️ Model management: Load and configure different model checkpoints
- 🎯 Interpretability tools: Visualize attention weights and token-level contributions
- 📈 Performance metrics: Evaluated on multiple RNA classes (aptamers, riboswitches, viral RNA, miRNA)
## How to use

### 1. Prediction tab

- Load model: The model loads automatically on startup (if available in the current directory)
- Enter inputs:
  - Target RNA sequence (nucleotides: A, U, G, C)
  - Drug SMILES string (molecular representation)
- Get results: Click "Predict Interaction" to receive the binding affinity prediction (pKd value); a programmatic sketch follows this list
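If you prefer to call the Space programmatically, `gradio_client` can drive the same prediction endpoint. This is a minimal sketch: the Space ID and `api_name` below are assumptions, so check `Client.view_api()` for the actual values exposed by `app.py`.

```python
# Hedged sketch: calling the prediction endpoint via gradio_client.
# The Space ID and api_name are placeholders; inspect Client.view_api()
# to find the real endpoint exposed by app.py.
from gradio_client import Client

client = Client("username/drug-target-interaction-predictor")  # hypothetical Space ID
result = client.predict(
    "AUGCUAGCUAGUACGUAUAUCUGCACUGC",   # target RNA sequence
    "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O",   # drug SMILES
    api_name="/predict",               # assumed endpoint name
)
print(result)  # binding affinity prediction (pKd)
```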
### 2. Visualizations tab

- Generate analysis: Use the same inputs to create detailed visualizations
- Cross-attention heatmap: Shows interaction patterns between drug and target tokens
- Raw pKd contribution: Displays signed contributions from each target token (available for all predictions)
- Normalized pKd contribution: Shows non-negative relative contributions (only when the predicted pKd > 0)
### 3. Model settings tab

- Custom models: Load your own trained models by specifying the model directory path
- Status monitoring: Check model loading status and configuration
## Model architecture

The model combines pretrained language models with a cross-attention mechanism (a code sketch follows the list):

- Target encoder: RNA-BERTa model for processing RNA sequences
- Drug encoder: ChemBERTa-77M-MTR model [1] for molecular SMILES processing
- Cross-attention: Single-head attention mechanism (384-dimensional embeddings)
- Regression head: Learnable weighted sum with scaling and bias parameters
- Interpretability: Built-in interpretation mode for attention analysis
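For readers who think in code, here is an illustrative PyTorch sketch of the pieces listed above. The class, parameter names, and the choice of target tokens as queries are assumptions; the repository's implementation may differ.

```python
# Illustrative sketch of the described architecture: single-head cross-attention
# over 384-dim embeddings, followed by a learnable weighted-sum regression head.
# Names and details are assumptions, not the repository's actual code.
import torch
import torch.nn as nn

class CrossAttentionDTI(nn.Module):
    def __init__(self, dim: int = 384):
        super().__init__()
        # Single-head cross-attention: target tokens attend to drug tokens
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        # Regression head: per-token learnable weighting plus scale/bias parameters
        self.token_weight = nn.Linear(dim, 1)
        self.scale = nn.Parameter(torch.ones(1))
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, target_emb, drug_emb):
        # target_emb: (B, T, 384) from the RNA encoder
        # drug_emb:   (B, D, 384) from the ChemBERTa encoder
        attended, attn_weights = self.cross_attn(
            query=target_emb, key=drug_emb, value=drug_emb
        )
        token_scores = self.token_weight(attended).squeeze(-1)   # (B, T)
        pkd = self.scale * token_scores.sum(dim=-1) + self.bias  # (B,)
        # attn_weights (B, T, D) would back the cross-attention heatmap;
        # token_scores would back the per-token contribution plots.
        return pkd, attn_weights, token_scores
```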
## Performance on ROBIN test datasets

Evaluated on the external ROBIN test datasets [2] across different RNA classes (the metric computation is sketched after the table):
| RNA Class | Precision | Specificity | Recall | AUROC | F1 Score |
|---|---|---|---|---|---|
| Aptamers | 0.648 | 0.002 | 1.000 | 0.571 | 0.787 |
| Riboswitch | 0.519 | 0.035 | 0.972 | 0.577 | 0.677 |
| Viral RNA | 0.562 | 0.095 | 0.943 | 0.579 | 0.704 |
| miRNA | 0.373 | 0.028 | 0.991 | 0.596 | 0.542 |
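For reference, binary metrics like these can be derived from continuous pKd scores roughly as follows. The decision threshold here is an assumption for illustration, not necessarily the evaluation protocol used above.

```python
# Sketch: deriving binary classification metrics from continuous pKd scores.
# The threshold is an illustrative assumption.
import numpy as np
from sklearn.metrics import (
    precision_score, recall_score, f1_score, roc_auc_score, confusion_matrix
)

def classification_metrics(pkd_pred, labels, threshold=0.0):
    """labels: 1 = binder, 0 = non-binder; pkd_pred: continuous scores."""
    pred = (np.asarray(pkd_pred) > threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(labels, pred).ravel()
    return {
        "Precision": precision_score(labels, pred),
        "Specificity": tn / (tn + fp),
        "Recall": recall_score(labels, pred),
        "AUROC": roc_auc_score(labels, pkd_pred),  # ranking metric, threshold-free
        "F1 Score": f1_score(labels, pred),
    }
```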
## Example usage

Try these example inputs to see the model in action:

Example 1:
- Target: `AUGCUAGCUAGUACGUAUAUCUGCACUGC`
- Drug: `CC(C)CC1=CC=C(C=C1)C(C)C(=O)O`

Example 2:
- Target: `AUGCGAUCGACGUACGUUAGCCGUAGCGUAGCUAGUGUAGCUAGUAGCU`
- Drug: `C1=CC=C(C=C1)NC(=O)C2=CC=CC=N2`
## Input format requirements

Target sequence:
- RNA sequences using nucleotides A, U, G, C
- Maximum length: 512 tokens
- Automatically truncated/padded as needed

Drug SMILES:
- Standard SMILES notation for molecular structures
- Maximum length: 512 tokens
- Example: `CC(C)CC1=CC=C(C=C1)C(C)C(=O)O` (ibuprofen)
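A small validation sketch under these constraints is below; the optional RDKit check is an assumption for convenience, not a dependency of the Space.

```python
# Sketch: pre-validating inputs against the format requirements above.
import re

def validate_rna(seq: str) -> str:
    seq = seq.strip().upper()
    if not re.fullmatch(r"[AUGC]+", seq):
        raise ValueError("RNA sequence must contain only A, U, G, C")
    return seq  # sequences longer than 512 tokens are truncated by the model

def validate_smiles(smiles: str) -> str:
    smiles = smiles.strip()
    try:
        from rdkit import Chem  # optional; assumed available, not required
        if Chem.MolFromSmiles(smiles) is None:
            raise ValueError(f"RDKit could not parse SMILES: {smiles}")
    except ImportError:
        pass  # without RDKit, malformed SMILES are left to the tokenizer
    return smiles
```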
## Technical specifications

- Backbones: RNA-BERTa + ChemBERTa-77M-MTR
- Attention heads: 1 (single-head cross-attention)
- Embedding dimension: 384 for the cross-attention layer
- Maximum sequence length: 512 tokens for both inputs
- Output range: Continuous pKd values (can be negative)
- Scaling: Built-in StdScaler for target value normalization (sketched below)
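The target-value scaling works like ordinary z-score normalization. A minimal sketch, assuming `StdScaler` stores a mean and standard deviation fitted on training pKd values; the repository's implementation may differ:

```python
# Sketch of standard (z-score) target scaling: normalize during training,
# invert at inference to recover pKd. Assumed interface, not the actual class.
import torch

class StdScaler:
    def __init__(self, mean: float, std: float):
        self.mean, self.std = mean, std

    def transform(self, y: torch.Tensor) -> torch.Tensor:
        return (y - self.mean) / self.std

    def inverse_transform(self, y_scaled: torch.Tensor) -> torch.Tensor:
        # Applied to the raw model output to recover pKd; negative values
        # are possible because the output range is unbounded.
        return y_scaled * self.std + self.mean
```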
## Visualization features

### Cross-attention heatmap

- Displays attention weights between drug and target tokens
- Helps identify which molecular features interact with specific RNA regions
- Color intensity represents attention strength

### Contribution analysis

- Unnormalized contributions: Signed values showing positive/negative token impacts
- Normalized contributions: Non-negative values showing relative token importance (only for pKd > 0)
- Token-level breakdown of final prediction components
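Both plot types can be reproduced from the interpretability outputs with standard matplotlib calls. A sketch with placeholder variable names (not the app's internal API):

```python
# Sketch: rendering the attention heatmap and raw contribution plot from
# (assumed) interpretability outputs: attention weights and per-token scores.
import numpy as np
import matplotlib.pyplot as plt

def plot_interpretability(attn, contributions):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

    # Cross-attention heatmap: rows = target tokens, columns = drug tokens
    im = ax1.imshow(attn, aspect="auto", cmap="viridis")
    ax1.set_xlabel("Drug tokens")
    ax1.set_ylabel("Target tokens")
    ax1.set_title("Cross-attention heatmap")
    fig.colorbar(im, ax=ax1, label="Attention weight")

    # Signed (unnormalized) per-token contributions to the predicted pKd
    colors = ["tab:red" if c < 0 else "tab:blue" for c in contributions]
    ax2.bar(np.arange(len(contributions)), contributions, color=colors)
    ax2.set_xlabel("Target token index")
    ax2.set_ylabel("Contribution to pKd")
    ax2.set_title("Raw pKd contributions")

    fig.tight_layout()
    return fig
```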
## Limitations & considerations

- RNA class variation: Performance differs across RNA classes (miRNA shows the lowest precision)
- Novel sequences: May not generalize well to completely unseen RNA families or chemical scaffolds
- Sequence length: Limited to 512 tokens (longer sequences are truncated)
- SMILES limitations: 2D SMILES strings may not capture all 3D molecular properties
- Single attention head: May limit capacity for complex interaction patterns
## Scientific applications
This tool can be used for:
- Drug discovery and design
- RNA-targeted therapeutics research
- Molecular interaction analysis
- Binding affinity prediction
- Structure-activity relationship studies
- Lead compound optimization
## Technical support
For technical issues or questions:
- Check model loading status in the Model Settings tab
- Ensure input sequences are properly formatted
- Verify SMILES notation validity
- Review example inputs for correct format
## Data sources
The model leverages:
- RNA-BERTa: Pre-trained on diverse RNA sequences
- ChemBERTa-77M-MTR: Trained on molecular property prediction tasks [1]
- ROBIN Datasets: External validation across multiple RNA classes [2]
For more detailed technical documentation, model architecture details, and programmatic usage, visit the model repository.
## Citations
[1]

```bibtex
@article{ahmad2022chemberta,
  title={ChemBERTa-2: Towards chemical foundation models},
  author={Ahmad, Walid and Simon, Elana and Chithrananda, Seyone and Grand, Gabriel and Ramsundar, Bharath},
  journal={arXiv preprint arXiv:2209.01712},
  year={2022}
}
```

[2]

```bibtex
@article{krishnan2024reliable,
  title={Reliable method for predicting the binding affinity of RNA-small molecule interactions using machine learning},
  author={Krishnan, Sowmya R and Roy, Arijit and Gromiha, M Michael},
  journal={Briefings in Bioinformatics},
  volume={25},
  number={2},
  pages={bbae002},
  year={2024},
  publisher={Oxford University Press}
}
```