IlPakoZ / DLRNA-BERTa

metadata

title: Drug-Target Interaction Predictor
emoji: 🧬
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit

Drug-target interaction predictor

An interactive deep learning application for predicting drug-target interactions using a novel cross-attention architecture. Given an RNA target sequence and a drug SMILES string, the model predicts a binding affinity score (pKd). It also provides interpretability features such as attention heatmaps and per-token contributions.

Features

🔮 Binding affinity prediction: Input RNA sequences and drug SMILES to get quantitative binding affinity predictions
📊 Interactive visualizations: Generate cross-attention heatmaps and contribution analysis plots
🧬 RNA-drug interaction analysis: Understand how different tokens contribute to binding predictions
⚙️ Model management: Load and configure different model checkpoints
🎯 Interpretability tools: Visualize attention weights and token-level contributions
📈 Performance metrics: Evaluated on multiple RNA classes (Aptamers, Riboswitches, Viral RNA, miRNA)

How to use

1. Prediction tab

Load model: The model loads automatically on startup (if available in the current directory)
Enter inputs:
- Target RNA sequence (nucleotides: A, U, G, C)
- Drug SMILES string (molecular representation)
Get results: Click "Predict Interaction" to receive a binding affinity prediction (pKd value)
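pKd is the negative base-10 logarithm of the dissociation constant in molar units (pKd = -log10 Kd). Each prediction therefore maps to a Kd, and a higher pKd means tighter binding. Below is a minimal conversion sketch in Python. The helper names are hypothetical and are not part of the app.

```python
import math


def pkd_to_kd(pkd: float) -> float:
    """Return the dissociation constant Kd in molar (M) for a given pKd.

    pKd = -log10(Kd / 1 M), hence Kd = 10 ** (-pKd).
    """
    return 10.0 ** (-pkd)


def kd_to_pkd(kd_molar: float) -> float:
    """Inverse conversion: pKd = -log10(Kd), with Kd in molar."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return -math.log10(kd_molar)


# pKd = 9 -> Kd = 1e-9 M (1 nM); pKd = 6 -> Kd = 1e-6 M (1 uM).
# A negative pKd means Kd > 1 M, i.e. very weak predicted binding.
for pkd in (9.0, 6.0, -0.5):
    print(f"pKd {pkd:+.1f} -> Kd {pkd_to_kd(pkd):.3g} M")
```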

2. Visualizations tab

Generate analysis: Use the same inputs to create detailed visualizations
Cross-attention heatmap: Shows interaction patterns between drug and target tokens
Raw pKd contribution: Displays signed contributions from each target token (only when pKd > 0)
Normalized pKd contribution: Shows normalized contributions for all predictions

3. Model settings tab

Custom models: Load your own trained models by specifying the model directory path
Status monitoring: Check model loading status and configuration

Model architecture

The model combines pretrained transformer language models with a cross-attention layer:

Target encoder: RNA-BERTa model for processing RNA sequences
Drug encoder: ChemBERTa-77M-MTR model [1] for molecular SMILES processing
Cross-attention: Single-head attention mechanism (384-dimensional embeddings)
Regression head: Learnable weighted sum with scaling and bias parameters
Interpretability: Built-in interpretation mode for attention analysis
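The cross-attention step can be sketched in NumPy as shown below. This is an illustrative sketch under stated assumptions, not the repository's implementation:

- It assumes that target (RNA) tokens act as queries and drug tokens act as keys and values.
- It uses standard scaled dot-product attention with random projection matrices.
- The function name `single_head_cross_attention` is hypothetical.

The returned matrix has one row per target token and one column per drug token. This is the kind of matrix a cross-attention heatmap displays.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def single_head_cross_attention(target_emb, drug_emb, w_q, w_k, w_v, drug_mask=None):
    """Single-head scaled dot-product cross-attention (illustrative).

    target_emb: (T, d) RNA token embeddings, used as queries (assumption)
    drug_emb:   (D, d) SMILES token embeddings, used as keys and values
    drug_mask:  optional (D,) bool array, False marks padding tokens
    Returns (attended, attn): attended is (T, d), attn is the (T, D)
    weight matrix whose rows each sum to 1.
    """
    q = target_emb @ w_q
    k = drug_emb @ w_k
    v = drug_emb @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if drug_mask is not None:
        # Padding tokens get -inf so they receive zero attention weight.
        scores = np.where(drug_mask[None, :], scores, -np.inf)
    attn = softmax(scores, axis=-1)
    return attn @ v, attn


# Toy inputs with random weights; d = 384 matches the stated width.
rng = np.random.default_rng(0)
d = 384
target = rng.normal(size=(12, d))  # 12 RNA tokens
drug = rng.normal(size=(8, d))     # 8 SMILES tokens
w_q, w_k, w_v = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))

attended, attn = single_head_cross_attention(target, drug, w_q, w_k, w_v)
print(attended.shape, attn.shape)
```

With a single head, the model learns exactly one attention matrix per input pair. That makes the heatmap easy to read, but it is also why a single head may limit how many distinct interaction patterns the layer can express.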

Performance on ROBIN test datasets

Evaluated on external ROBIN test datasets [2] across different RNA classes:

RNA Class	Precision	Specificity	Recall	AUROC	F1 Score
Aptamers	0.648	0.002	1.000	0.571	0.787
Riboswitch	0.519	0.035	0.972	0.577	0.677
Viral RNA	0.562	0.095	0.943	0.579	0.704
miRNA	0.373	0.028	0.991	0.596	0.542
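Two quick consistency checks on the table are written below as a small Python sketch; the helper names are hypothetical.

- F1 is the harmonic mean of precision and recall, so it can be recomputed from the other columns. The values agree to within rounding.
- The confusion-matrix helper shows why near-perfect recall combined with near-zero specificity should be read carefully. A classifier that labels almost every pair as binding has a precision close to the fraction of true binders in the test set.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Reported (precision, recall, F1) per RNA class, copied from the table above.
reported = {
    "Aptamers":   (0.648, 1.000, 0.787),
    "Riboswitch": (0.519, 0.972, 0.677),
    "Viral RNA":  (0.562, 0.943, 0.704),
    "miRNA":      (0.373, 0.991, 0.542),
}
for rna_class, (p, r, f1) in reported.items():
    # Differences of about 0.001 come from rounding of precision and recall.
    print(f"{rna_class:<11} reported F1 {f1:.3f}  recomputed {f1_from(p, r):.3f}")

# Toy counts (hypothetical): a classifier that labels every pair as binding.
print(classification_metrics(tp=100, fp=50, tn=0, fn=0))
```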

Example usage

Try these example inputs to see the model in action:

Example 1:

Target: AUGCUAGCUAGUACGUAUAUCUGCACUGC
Drug: CC(C)CC1=CC=C(C=C1)C(C)C(=O)O

Example 2:

Target: AUGCGAUCGACGUACGUUAGCCGUAGCGUAGCUAGUGUAGCUAGUAGCU
Drug: C1=CC=C(C=C1)NC(=O)C2=CC=CC=N2

Input format requirements

Target sequence:
- RNA sequences using nucleotides A, U, G, C
- Maximum length: 512 tokens
- Automatically truncated/padded as needed
Drug SMILES:
- Standard SMILES notation for molecular structures
- Maximum length: 512 tokens
- Example: CC(C)CC1=CC=C(C=C1)C(C)C(=O)O (Ibuprofen)
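The target constraints above can be checked client-side before the inputs are sent to the model. The sketch below is illustrative:

- `validate_rna`, `pad_or_truncate` and the padding ID `0` are hypothetical.
- Upper-casing and whitespace stripping are assumptions, not documented app behavior.
- The 512 limit counts tokenizer tokens rather than characters, so in practice the tokenizer's own truncation and padding would be used.

```python
from __future__ import annotations

VALID_NUCLEOTIDES = frozenset("AUGC")
MAX_TOKENS = 512  # maximum length stated in this README for both inputs


def validate_rna(sequence: str) -> str:
    """Normalize an RNA sequence and check it only contains A, U, G, C.

    Whitespace is removed and letters are upper-cased (assumed to be
    acceptable normalization; not documented app behavior).
    """
    seq = "".join(sequence.split()).upper()
    if not seq:
        raise ValueError("empty RNA sequence")
    invalid = sorted(set(seq) - VALID_NUCLEOTIDES)
    if invalid:
        raise ValueError(f"invalid nucleotide(s): {', '.join(invalid)}")
    return seq


def pad_or_truncate(token_ids: list[int], max_len: int = MAX_TOKENS,
                    pad_id: int = 0) -> tuple[list[int], list[int]]:
    """Cut token IDs to max_len or right-pad them with pad_id.

    Returns (ids, attention_mask): the mask is 1 for real tokens, 0 for padding.
    pad_id=0 is a placeholder; the real value comes from the tokenizer.
    """
    ids = token_ids[:max_len]
    mask = [1] * len(ids)
    n_pad = max_len - len(ids)
    return ids + [pad_id] * n_pad, mask + [0] * n_pad


print(validate_rna("augc uagc"))                  # normalized to upper case
ids, mask = pad_or_truncate(list(range(1, 601)))  # 600 IDs -> cut to 512
print(len(ids), sum(mask))
```

A DNA-style input containing `T` is rejected rather than silently converted. This keeps the check aligned with the stated A, U, G, C alphabet.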

Technical specifications

Backbone: RNA-BERTa (target encoder) + ChemBERTa-77M-MTR (drug encoder)
Attention heads: 1 (single-head cross-attention)
Embedding dimension: 384 for cross-attention layer
Maximum sequence length: 512 tokens for both inputs
Output range: Continuous pKd values (can be negative)
Scaling: Built-in StdScaler for target value normalization
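Standard scaling maps each training target y to z = (y - μ) / σ, where μ and σ are the mean and standard deviation of the training pKd values. The model predicts z, and the scaler maps it back with y = z·σ + μ. The class below is a hypothetical minimal re-implementation written for illustration. Its attribute and method names are not taken from the repository's `StdScaler`, and whether that class uses the population or the sample standard deviation is an assumption. Because the inverse transform is an unbounded affine map, nothing clips the de-scaled output, which is one reason a prediction can come out negative.

```python
from __future__ import annotations

from statistics import fmean, pstdev


class MinimalStdScaler:
    """Hypothetical stand-in for a standard scaler on pKd targets."""

    def fit(self, values: list[float]) -> MinimalStdScaler:
        self.mean = fmean(values)
        # Population std (assumption); fall back to 1.0 for zero variance.
        self.std = pstdev(values) or 1.0
        return self

    def transform(self, y: float) -> float:
        """pKd -> normalized target z."""
        return (y - self.mean) / self.std

    def inverse_transform(self, z: float) -> float:
        """Normalized model output z -> pKd (no clipping)."""
        return z * self.std + self.mean


# Toy training targets (hypothetical values, not from the real dataset).
scaler = MinimalStdScaler().fit([4.0, 5.5, 6.0, 7.5, 8.0])
z = scaler.transform(6.5)
print(round(z, 3), scaler.inverse_transform(z))
```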

Visualization features

Cross-attention heatmap

Displays attention weights between drug and target tokens
Helps identify which molecular features the model associates with specific RNA regions
Color intensity represents attention strength

Contribution analysis

Unnormalized contributions: Signed values showing positive/negative token impacts
Normalized contributions: Non-negative values showing relative token importance (only for pKd > 0)
Token-level breakdown of final prediction components
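The signed and normalized contributions can be illustrated with a deliberately simple, hypothetical head. This sketch rests on several assumptions:

- The regression head is linear in the per-token features: one score per token, learnable per-position weights, then a scale and a bias. This loosely follows the "learnable weighted sum with scaling and bias parameters" description.
- The head's output is already in pKd units.
- Under these assumptions, the prediction splits exactly into the bias plus one signed term per target token.
- The normalization shown (clip negatives, rescale to the predicted pKd) is an assumed scheme. It is not necessarily the one the app uses.

All function names are hypothetical.

```python
import numpy as np


def head_with_contributions(h, v, w, scale, bias):
    """Hypothetical linear regression head that decomposes exactly.

    h:     (T, d) attended target-token features
    v:     (d,)   projection to one score per token (assumed)
    w:     (T,)   learnable per-position weights of the weighted sum
    scale, bias: scalars
    Returns (prediction, contributions) with
    prediction == bias + contributions.sum().
    """
    token_scores = h @ v                      # (T,)
    contributions = scale * w * token_scores  # signed, one per target token
    prediction = bias + contributions.sum()
    return prediction, contributions


def normalized_contributions(contributions, prediction):
    """One possible non-negative normalization (an assumption).

    Negative contributions are clipped to zero and the rest are rescaled
    to sum to the predicted pKd, so it is only defined for pKd > 0.
    """
    if prediction <= 0:
        raise ValueError("normalized contributions require pKd > 0")
    positive = np.clip(contributions, 0.0, None)
    total = positive.sum()
    if total == 0:
        return np.zeros_like(contributions)
    return positive / total * prediction


# Toy example with random features (hypothetical values).
rng = np.random.default_rng(1)
T, d = 10, 384
h = rng.normal(size=(T, d))
v = rng.normal(scale=d ** -0.5, size=d)
w = np.full(T, 1.0 / T)
pred, contrib = head_with_contributions(h, v, w, scale=2.0, bias=5.0)
print(round(float(pred), 3), np.round(contrib, 3))
if pred > 0:
    print(np.round(normalized_contributions(contrib, pred), 3))
```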

Limitations & considerations

RNA class variation: Performance differs across RNA classes (miRNA shows lower precision)
Novel sequences: May not generalize well to completely unseen RNA families or chemical scaffolds
Sequence length: Limited to 512 tokens (longer sequences are truncated)
SMILES limitations: May not capture all 3D molecular properties
Single attention head: May limit capacity for complex interaction patterns

Scientific applications

This tool can be used for:

Drug discovery and design
RNA-targeted therapeutics research
Molecular interaction analysis
Binding affinity prediction
Structure-activity relationship studies
Lead compound optimization

Technical support

For technical issues or questions:

Check model loading status in the Model Settings tab
Ensure input sequences are properly formatted
Verify SMILES notation validity
Review example inputs for correct format
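A rigorous SMILES check would use a cheminformatics toolkit such as RDKit, whose `Chem.MolFromSmiles` returns `None` for strings it cannot parse. The dependency-free sketch below only catches common structural typos: unbalanced parentheses, unterminated bracket atoms and ring-closure labels that are never closed. A string that passes it can still be chemically invalid. The function name is hypothetical.

```python
def smiles_sanity_check(smiles: str) -> bool:
    """Cheap structural check for a SMILES string (not a full parser).

    Catches unbalanced parentheses, unterminated [bracket atoms] and
    ring-closure labels (digits or %nn) that are opened but never closed.
    """
    if not smiles or any(ch.isspace() for ch in smiles):
        return False
    depth = 0           # current branch nesting depth
    open_rings = set()  # ring-closure labels currently open
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            # Skip the whole bracket atom; digits inside it (isotopes,
            # charges, H counts) are not ring closures.
            end = smiles.find("]", i + 1)
            if end == -1:
                return False
            i = end + 1
            continue
        if ch == "]":
            return False
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        elif ch == "%":
            label = smiles[i + 1:i + 3]  # two-digit ring label, e.g. %10
            if len(label) != 2 or not label.isdigit():
                return False
            open_rings ^= {label}        # open, or close if already open
            i += 3
            continue
        elif ch.isdigit():
            open_rings ^= {ch}
        i += 1
    return depth == 0 and not open_rings


for s in ("CC(C)CC1=CC=C(C=C1)C(C)C(=O)O", "C1=CC=C(C=C1)NC(=O)C2=CC=CC=N2", "CC(C"):
    print(s, smiles_sanity_check(s))
```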

Data sources

The model leverages:

RNA-BERTa: Pre-trained on diverse RNA sequences
ChemBERTa-77M-MTR: Trained on molecular property prediction tasks [1]
ROBIN Datasets: External validation across multiple RNA classes [2]

For more detailed technical documentation, model architecture details, and programmatic usage, visit the model repository.

Citations

[1]

@article{ahmad2022chemberta,
  title={{ChemBERTa-2}: Towards chemical foundation models},
  author={Ahmad, Walid and Simon, Elana and Chithrananda, Seyone and Grand, Gabriel and Ramsundar, Bharath},
  journal={arXiv preprint arXiv:2209.01712},
  year={2022}
}

[2]

@article{krishnan2024reliable,
  title={Reliable method for predicting the binding affinity of RNA-small molecule interactions using machine learning},
  author={Krishnan, Sowmya R and Roy, Arijit and Gromiha, M Michael},
  journal={Briefings in Bioinformatics},
  volume={25},
  number={2},
  pages={bbae002},
  year={2024},
  publisher={Oxford University Press}
}