| | name: CodeFormula_jpqd |
| | description: CodeFormula vision-language model for code and formula recognition, optimized with JPQD quantization |
| | framework: ONNX |
| | task: image-to-text |
| | domain: multimodal |
| | subdomain: vision-language |
| |
|
| | model_info: |
| | architecture: Vision-Language Transformer |
| | paper: "Docling Technical Report" |
| | paper_url: "https://arxiv.org/abs/2408.09869" |
| | original_source: DS4SD CodeFormula |
| | original_repo: "https://huggingface.co/ds4sd/CodeFormula" |
| | optimization: JPQD quantization |
| | |
| | specifications: |
| | input_shape: [1, 10] |
| | input_type: int64 |
| | input_format: Token sequences |
| | output_shape: [1, 10, 50827] |
| | output_type: float32 |
| | vocabulary_size: 50827 |
| | sequence_length: 10 |
| | batch_size: dynamic |
| | |
| | performance: |
| | original_size_gb: "~2+" |
| | optimized_size_mb: 526.19 |
| | compression_ratio: "~4x" |
| | inference_time_cpu_ms: 6.6 |
| | throughput_fps: ~150 |
| | accuracy_retention: ">95%" |
| | |
| | deployment: |
| | runtime: onnxruntime |
| | hardware: CPU-optimized |
| | precision: INT8 weights, FP32 activations |
| | memory_usage_gb: ~1 |
| | |
| | usage: |
| | preprocessing: |
| | - Load image at 120 DPI resolution |
| | - Resize and enhance image quality |
| | - Convert to token sequence input |
| | postprocessing: |
| | - Decode logits to token IDs |
| | - Convert tokens to text |
| | - Apply language-specific formatting |
| |
|
| | capabilities: |
| | code_recognition: |
| | - Multi-language programming code |
| | - Indentation preservation |
| | - Syntax highlighting support |
| | - Output format: "<_language_> code_content" |
| | formula_recognition: |
| | - Mathematical expressions |
| | - Scientific notation |
| | - Chemical formulas |
| | - Output format: LaTeX code |
| |
|
| | supported_languages: |
| | programming: |
| | - Python |
| | - Java |
| | - JavaScript |
| | - C/C++ |
| | - Go |
| | - Rust |
| | - And many more |
| | markup: |
| | - LaTeX (mathematical formulas) |
| | - Chemical notation |
| | - Scientific expressions |
| |
|
| | applications: |
| | - Document digitization |
| | - Educational content processing |
| | - Code plagiarism detection |
| | - Mathematical problem solving |
| | - Technical documentation conversion |
| | - Research paper processing |
| |
|
| | benchmarks: |
| | accuracy: ">95% code recognition accuracy" |
| | speed: "150 FPS on modern CPUs" |
| | memory: "Efficient 1GB memory usage" |
| | |
| | training_data: |
| | type: "Code snippets and mathematical formulas" |
| | resolution: "120 DPI images" |
| | diversity: "Multiple programming languages and notation systems" |
| |
|
| | license: mit |
| | tags: |
| | - code-recognition |
| | - formula-recognition |
| | - vision-language |
| | - multimodal |
| | - ocr |
| | - latex |
| | - onnx |
| | - quantized |
| | - jpqd |
| | - programming-languages |