metadata
title: Drug-Target Interaction Predictor
emoji: 🧬
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit

Drug-target interaction predictor

An interactive deep learning application that predicts drug-target interactions with a novel cross-attention architecture. The model takes an RNA sequence and a drug SMILES string as input, predicts a binding affinity score (pKd), and exposes attention-based views for interpreting the prediction.

Features

  • 🔮 Binding affinity prediction: Input RNA sequences and drug SMILES to get quantitative binding affinity predictions
  • 📊 Interactive visualizations: Generate cross-attention heatmaps and contribution analysis plots
  • 🧬 RNA-drug interaction analysis: Understand how different tokens contribute to binding predictions
  • ⚙️ Model management: Load and configure different model checkpoints
  • 🎯 Interpretability tools: Visualize attention weights and token-level contributions
  • 📈 Performance metrics: Evaluated on multiple RNA classes (Aptamers, Riboswitches, Viral RNA, miRNA)

How to use

1. Prediction tab

  • Load model: The model loads automatically on startup (if available in the current directory)
  • Enter inputs:
    • Target RNA sequence (nucleotides: A, U, G, C)
    • Drug SMILES string (molecular representation)
  • Get results: Click "Predict Interaction" to receive the predicted binding affinity (pKd value)

2. Visualizations tab

  • Generate analysis: Use the same inputs to create detailed visualizations
  • Cross-attention heatmap: Shows interaction patterns between drug and target tokens
  • Raw pKd contribution: Displays signed contributions from each target token (only when pKd > 0)
  • Normalized pKd contribution: Shows normalized contributions for all predictions

3. Model settings tab

  • Custom models: Load your own trained models by specifying the model directory path
  • Status monitoring: Check model loading status and configuration

Model architecture

The model combines state-of-the-art language models with cross-attention mechanisms:

  • Target encoder: RNA-BERTa model for processing RNA sequences
  • Drug encoder: ChemBERTa-77M-MTR model [1] for molecular SMILES processing
  • Cross-attention: Single-head attention mechanism (384-dimensional embeddings)
  • Regression head: Learnable weighted sum with scaling and bias parameters
  • Interpretability: Built-in interpretation mode for attention analysis
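
As a rough illustration of the single-head cross-attention step listed above, the following pure-Python sketch computes scaled dot-product attention between two small sets of token embeddings. It assumes that target tokens act as queries and drug tokens as keys and values, omits the learned projection matrices, and uses 4-dimensional toy vectors instead of 384. None of these choices are confirmed by this README, so treat it as a sketch of the general technique rather than the model's code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention.

    queries: one embedding per target token (assumed role)
    keys, values: one embedding per drug token (assumed role)
    Returns (outputs, weights), where weights[i][j] is the attention
    that query i pays to key j.
    """
    dim = len(queries[0])
    out_dim = len(values[0])
    outputs, weights = [], []
    for q in queries:
        # Dot product of the query with every key, scaled by sqrt(dim).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        row = softmax(scores)
        weights.append(row)
        # Attention-weighted average of the value vectors.
        outputs.append([sum(w * v[d] for w, v in zip(row, values)) for d in range(out_dim)])
    return outputs, weights


# Toy embeddings (hypothetical values, 4 dimensions instead of 384).
target_tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
drug_tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]

outputs, weights = cross_attention(target_tokens, drug_tokens, drug_tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` sums to 1, which is the kind of matrix a cross-attention heatmap displays.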

Performance on ROBIN test datasets

Evaluated on external ROBIN test datasets [2] across different RNA classes:

| RNA class | Precision | Specificity | Recall | AUROC | F1 score |
|---|---|---|---|---|---|
| Aptamers | 0.648 | 0.002 | 1.000 | 0.571 | 0.787 |
| Riboswitch | 0.519 | 0.035 | 0.972 | 0.577 | 0.677 |
| Viral RNA | 0.562 | 0.095 | 0.943 | 0.579 | 0.704 |
| miRNA | 0.373 | 0.028 | 0.991 | 0.596 | 0.542 |
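
The columns above are related through the standard confusion-matrix definitions. The sketch below restates those textbook formulas (it is not taken from the evaluation code, and the function names are illustrative) and recomputes F1 from the reported precision and recall as a consistency check.

```python
def precision(tp, fp):
    """Fraction of predicted binders that are true binders."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of true binders that are predicted as binders."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of true non-binders that are predicted as non-binders."""
    return tn / (tn + fp)


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# (precision, recall, reported F1) copied from the table above.
reported = {
    "Aptamers": (0.648, 1.000, 0.787),
    "Riboswitch": (0.519, 0.972, 0.677),
    "Viral RNA": (0.562, 0.943, 0.704),
    "miRNA": (0.373, 0.991, 0.542),
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: recomputed F1 = {f1_score(p, r):.3f}, reported F1 = {f1:.3f}")
```

The recomputed values agree with the reported ones to within rounding of the inputs. Read together, a recall near 1 and a specificity near 0 mean that almost every pair, binder or not, is classified as positive at the threshold used.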

Example usage

Try these example inputs to see the model in action:

Example 1:

  • Target: AUGCUAGCUAGUACGUAUAUCUGCACUGC
  • Drug: CC(C)CC1=CC=C(C=C1)C(C)C(=O)O

Example 2:

  • Target: AUGCGAUCGACGUACGUUAGCCGUAGCGUAGCUAGUGUAGCUAGUAGCU
  • Drug: C1=CC=C(C=C1)NC(=O)C2=CC=CC=N2

Input format requirements

  • Target sequence:

    • RNA sequences using nucleotides A, U, G, C
    • Maximum length: 512 tokens
    • Automatically truncated/padded as needed
  • Drug SMILES:

    • Standard SMILES notation for molecular structures
    • Maximum length: 512 tokens
    • Example: CC(C)CC1=CC=C(C=C1)C(C)C(=O)O (Ibuprofen)
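
The input rules above can be checked before submitting a pair. The following is a minimal sketch using only the standard library. It assumes one token per nucleotide and a `<pad>` padding token purely for illustration (the real tokenizer may split sequences differently), and the helper names `validate_rna` and `fit_to_length` are hypothetical, not part of the app.

```python
import re

MAX_TOKENS = 512  # limit stated in this README
RNA_PATTERN = re.compile(r"[AUGC]+")


def validate_rna(sequence):
    """Return the upper-cased sequence, or raise ValueError on invalid characters."""
    cleaned = sequence.strip().upper()
    if not RNA_PATTERN.fullmatch(cleaned):
        raise ValueError("RNA sequence may only contain A, U, G, C")
    return cleaned


def fit_to_length(tokens, max_len=MAX_TOKENS, pad_token="<pad>"):
    """Truncate or right-pad a token list to exactly max_len entries."""
    clipped = tokens[:max_len]
    return clipped + [pad_token] * (max_len - len(clipped))


sequence = validate_rna("AUGCUAGCUAGUACGUAUAUCUGCACUGC")
# Assumption: one token per nucleotide, for illustration only.
tokens = fit_to_length(list(sequence))
print(len(sequence), len(tokens))
```

A DNA-style input such as `ATGC` is rejected by this check because `T` is not in the allowed alphabet.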

Technical specifications

  • Backbone: RNA-BERTa and ChemBERTa-77M-MTR encoders
  • Attention heads: 1 (single-head cross-attention)
  • Embedding dimension: 384 for cross-attention layer
  • Maximum sequence length: 512 tokens for both inputs
  • Output range: Continuous pKd values (can be negative)
  • Scaling: Built-in StdScaler for target value normalization
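
For readers less familiar with the output unit: pKd is conventionally defined as the negative base-10 logarithm of the dissociation constant Kd expressed in molar, so a negative pKd corresponds to a Kd above 1 M. The sketch below converts between the two and shows the usual inverse of standard scaling. The mean and standard deviation are hypothetical placeholders, and the functions are illustrative rather than the app's `StdScaler`.

```python
import math


def pkd_to_kd_molar(pkd):
    """Convert pKd to a dissociation constant in molar (Kd = 10 ** -pKd)."""
    return 10.0 ** (-pkd)


def kd_molar_to_pkd(kd):
    """Convert a dissociation constant in molar to pKd (pKd = -log10(Kd))."""
    return -math.log10(kd)


def unscale(z, mean, std):
    """Invert standard scaling: map a normalized value back to pKd units."""
    return z * std + mean


# Hypothetical training-set statistics, chosen only for this example.
TRAIN_MEAN, TRAIN_STD = 5.0, 1.5

raw_output = 0.8  # hypothetical normalized model output
pkd = unscale(raw_output, TRAIN_MEAN, TRAIN_STD)
print(f"pKd = {pkd:.2f}, Kd = {pkd_to_kd_molar(pkd):.2e} M")
```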

Visualization features

Cross-attention heatmap

  • Displays attention weights between drug and target tokens
  • Helps identify which molecular features interact with specific RNA regions
  • Color intensity represents attention strength
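
To read a heatmap like this numerically, it can help to list the strongest token pairs. The sketch below does that for a small attention matrix. The matrix values, the token lists, the row/column orientation (rows as target tokens, columns as drug tokens), and the helper name `top_attention_pairs` are all assumptions made for illustration.

```python
def top_attention_pairs(weights, target_tokens, drug_tokens, k=3):
    """Return the k (weight, target_token, drug_token) triples with the highest weight.

    weights[i][j] is assumed to be the attention between target token i
    and drug token j.
    """
    pairs = []
    for i, row in enumerate(weights):
        for j, weight in enumerate(row):
            pairs.append((weight, target_tokens[i], drug_tokens[j]))
    # Sort by weight only, strongest first.
    pairs.sort(key=lambda pair: pair[0], reverse=True)
    return pairs[:k]


# Hypothetical tokens and attention weights.
example_targets = ["A", "U", "G"]
example_drug = ["C", "=O", "N"]
example_weights = [
    [0.70, 0.20, 0.10],
    [0.10, 0.30, 0.60],
    [0.25, 0.50, 0.25],
]

top_pairs = top_attention_pairs(example_weights, example_targets, example_drug)
for weight, target, drug in top_pairs:
    print(f"{target} <-> {drug}: {weight:.2f}")
```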

Contribution analysis

  • Unnormalized contributions: Signed values showing positive/negative token impacts
  • Normalized contributions: Non-negative values showing relative token importance (only for pKd > 0)
  • Token-level breakdown of final prediction components

Limitations & considerations

  • RNA class variation: Performance differs across RNA classes (miRNA has the lowest precision on the ROBIN test sets, 0.373, against 0.519 to 0.648 for the other classes)
  • Novel sequences: May not generalize well to completely unseen RNA families or chemical scaffolds
  • Sequence length: Limited to 512 tokens (longer sequences are truncated)
  • SMILES limitations: May not capture all 3D molecular properties
  • Single attention head: May limit capacity for complex interaction patterns

Scientific applications

This tool can be used for:

  • Drug discovery and design
  • RNA-targeted therapeutics research
  • Molecular interaction analysis
  • Binding affinity prediction
  • Structure-activity relationship studies
  • Lead compound optimization

Technical support

For technical issues or questions:

  • Check model loading status in the Model Settings tab
  • Ensure input sequences are properly formatted
  • Verify SMILES notation validity
  • Review example inputs for correct format

Data sources

The model leverages:

  • RNA-BERTa: Pre-trained on diverse RNA sequences
  • ChemBERTa-77M-MTR: Trained on molecular property prediction tasks [1]
  • ROBIN Datasets: External validation across multiple RNA classes [2]

For more detailed technical documentation, model architecture details, and programmatic usage, visit the model repository.

Citations

[1]

@article{ahmad2022chemberta,
  title={Chemberta-2: Towards chemical foundation models},
  author={Ahmad, Walid and Simon, Elana and Chithrananda, Seyone and Grand, Gabriel and Ramsundar, Bharath},
  journal={arXiv preprint arXiv:2209.01712},
  year={2022}
}

[2]

@article{krishnan2024reliable,
  title={Reliable method for predicting the binding affinity of RNA-small molecule interactions using machine learning},
  author={Krishnan, Sowmya R and Roy, Arijit and Gromiha, M Michael},
  journal={Briefings in Bioinformatics},
  volume={25},
  number={2},
  pages={bbae002},
  year={2024},
  publisher={Oxford University Press}
}