new

Get trending papers in your email inbox!

Subscribe

Daily Papers

byAK and the research community

Mar 30

C2|Q>: A Robust Framework for Bridging Classical and Quantum Software Development

QSE is emerging as a critical discipline to make quantum computing accessible to a broader developer community; however, most quantum development environments still require developers to engage with low-level details across the software stack - including problem encoding, circuit construction, algorithm configuration, hardware selection, and result interpretation - making them difficult for classical software engineers to use. To bridge this gap, we present C2|Q>, a hardware-agnostic quantum software development framework that translates specific types of classical specifications into quantum-executable programs while preserving methodological rigor. The framework applies modular SE principles by classifying the workflow into three core modules: an encoder that classifies problems, produces Quantum-Compatible Formats, and constructs quantum circuits, a deployment module that generates circuits and recommends hardware based on fidelity, runtime, and cost, and a decoder that interprets quantum outputs into classical solutions. In evaluation, the encoder module achieved a 93.8% completion rate, the hardware recommendation module consistently selected the appropriate quantum devices for workloads scaling up to 56 qubits. End-to-end experiments on 434 Python programs and 100 JSON problem instances show that the full C2|Q> workflow executes reliably on simulators and can be deployed successfully on representative real quantum hardware, with empirical runs limited to small- and medium-sized instances consistent with current NISQ capabilities. These results indicate that C2|Q> lowers the entry barrier to quantum software development by providing a reproducible, extensible toolchain that connects classical specifications to quantum execution. The open-source implementation of C2|Q> is available at https://github.com/C2-Q/C2Q and as a Python package at https://pypi.org/project/c2q-framework/.

  • 7 authors
·
Oct 3, 2025

One-Time Universal Hashing Quantum Digital Signatures without Perfect Keys

Quantum digital signatures (QDS), generating correlated bit strings among three remote parties for signatures through quantum law, can guarantee non-repudiation, authenticity, and integrity of messages. Recently, one-time universal hashing QDS framework, exploiting the quantum asymmetric encryption and universal hash functions, has been proposed to significantly improve the signature rate and ensure unconditional security by directly signing the hash value of long messages. However, similar to quantum key distribution, this framework utilizes keys with perfect secrecy by performing privacy amplification that introduces cumbersome matrix operations, thereby consuming large computational resources, causing delays and increasing failure probability. Here, we prove that, different from private communication, imperfect quantum keys with limited information leakage can be used for digital signatures and authentication without compromising the security while having eight orders of magnitude improvement on signature rate for signing a megabit message compared with conventional single-bit schemes. This study significantly reduces the delay for data postprocessing and is compatible with any quantum key generation protocols. In our simulation, taking two-photon twin-field key generation protocol as an example, QDS can be practically implemented over a fiber distance of 650 km between the signer and receiver. For the first time, this study offers a cryptographic application of quantum keys with imperfect secrecy and paves a way for the practical and agile implementation of digital signatures in a future quantum network.

  • 7 authors
·
Jan 3, 2023

A Comparative Study of Quantum Optimization Techniques for Solving Combinatorial Optimization Benchmark Problems

Quantum optimization holds promise for addressing classically intractable combinatorial problems, yet a standardized framework for benchmarking its performance, particularly in terms of solution quality, computational speed, and scalability is still lacking. In this work, we introduce a comprehensive benchmarking framework designed to systematically evaluate a range of quantum optimization techniques against well-established NP-hard combinatorial problems. Our framework focuses on key problem classes, including the Multi-Dimensional Knapsack Problem (MDKP), Maximum Independent Set (MIS), Quadratic Assignment Problem (QAP), and Market Share Problem (MSP). Our study evaluates gate-based quantum approaches, including the Variational Quantum Eigensolver (VQE) and its CVaR-enhanced variant, alongside advanced quantum algorithms such as the Quantum Approximate Optimization Algorithm (QAOA) and its extensions. To address resource constraints, we incorporate qubit compression techniques like Pauli Correlation Encoding (PCE) and Quantum Random Access Optimization (QRAO). Experimental results, obtained from simulated quantum environments and classical solvers, provide key insights into feasibility, optimality gaps, and scalability. Our findings highlight both the promise and current limitations of quantum optimization, offering a structured pathway for future research and practical applications in quantum-enhanced decision-making.

  • 2 authors
·
Mar 15, 2025

Qiskit Code Assistant: Training LLMs for generating Quantum Computing Code

Code Large Language Models (Code LLMs) have emerged as powerful tools, revolutionizing the software development landscape by automating the coding process and reducing time and effort required to build applications. This paper focuses on training Code LLMs to specialize in the field of quantum computing. We begin by discussing the unique needs of quantum computing programming, which differ significantly from classical programming approaches or languages. A Code LLM specializing in quantum computing requires a foundational understanding of quantum computing and quantum information theory. However, the scarcity of available quantum code examples and the rapidly evolving field, which necessitates continuous dataset updates, present significant challenges. Moreover, we discuss our work on training Code LLMs to produce high-quality quantum code using the Qiskit library. This work includes an examination of the various aspects of the LLMs used for training and the specific training conditions, as well as the results obtained with our current models. To evaluate our models, we have developed a custom benchmark, similar to HumanEval, which includes a set of tests specifically designed for the field of quantum computing programming using Qiskit. Our findings indicate that our model outperforms existing state-of-the-art models in quantum computing tasks. We also provide examples of code suggestions, comparing our model to other relevant code LLMs. Finally, we introduce a discussion on the potential benefits of Code LLMs for quantum computing computational scientists, researchers, and practitioners. We also explore various features and future work that could be relevant in this context.

  • 8 authors
·
May 29, 2024

Quantum Transfer Learning for MNIST Classification Using a Hybrid Quantum-Classical Approach

In this research, we explore the integration of quantum computing with classical machine learning for image classification tasks, specifically focusing on the MNIST dataset. We propose a hybrid quantum-classical approach that leverages the strengths of both paradigms. The process begins with preprocessing the MNIST dataset, normalizing the pixel values, and reshaping the images into vectors. An autoencoder compresses these 784-dimensional vectors into a 64-dimensional latent space, effectively reducing the data's dimensionality while preserving essential features. These compressed features are then processed using a quantum circuit implemented on a 5-qubit system. The quantum circuit applies rotation gates based on the feature values, followed by Hadamard and CNOT gates to entangle the qubits, and measurements are taken to generate quantum outcomes. These outcomes serve as input for a classical neural network designed to classify the MNIST digits. The classical neural network comprises multiple dense layers with batch normalization and dropout to enhance generalization and performance. We evaluate the performance of this hybrid model and compare it with a purely classical approach. The experimental results indicate that while the hybrid model demonstrates the feasibility of integrating quantum computing with classical techniques, the accuracy of the final model, trained on quantum outcomes, is currently lower than the classical model trained on compressed features. This research highlights the potential of quantum computing in machine learning, though further optimization and advanced quantum algorithms are necessary to achieve superior performance.

  • 1 authors
·
Aug 5, 2024

Efficient and practical quantum compiler towards multi-qubit systems with deep reinforcement learning

Efficient quantum compiling tactics greatly enhance the capability of quantum computers to execute complicated quantum algorithms. Due to its fundamental importance, a plethora of quantum compilers has been designed in past years. However, there are several caveats to current protocols, which are low optimality, high inference time, limited scalability, and lack of universality. To compensate for these defects, here we devise an efficient and practical quantum compiler assisted by advanced deep reinforcement learning (RL) techniques, i.e., data generation, deep Q-learning, and AQ* search. In this way, our protocol is compatible with various quantum machines and can be used to compile multi-qubit operators. We systematically evaluate the performance of our proposal in compiling quantum operators with both inverse-closed and inverse-free universal basis sets. In the task of single-qubit operator compiling, our proposal outperforms other RL-based quantum compilers in the measure of compiling sequence length and inference time. Meanwhile, the output solution is near-optimal, guaranteed by the Solovay-Kitaev theorem. Notably, for the inverse-free universal basis set, the achieved sequence length complexity is comparable with the inverse-based setting and dramatically advances previous methods. These empirical results contribute to improving the inverse-free Solovay-Kitaev theorem. In addition, for the first time, we demonstrate how to leverage RL-based quantum compilers to accomplish two-qubit operator compiling. The achieved results open an avenue for integrating RL with quantum compiling to unify efficiency and practicality and thus facilitate the exploration of quantum advantages.

  • 6 authors
·
Apr 14, 2022

Efficient Magic State Cultivation on RP^2

Preparing high-fidelity logical magic states is crucial for fault-tolerant quantum computation. Among prior attempts to reduce the substantial cost of magic state preparation, magic state cultivation (MSC), a recently proposed protocol for preparing T states without magic state distillation, achieves state-of-the-art efficiency. Inspired by this work, we propose a new MSC procedure that would produce a logical T state on a rotated surface code at a further reduced cost. For our MSC protocol, we define a new code family, the RP^2 code, by putting the rotated surface code on RP^2 (a two-dimensional manifold), as well as two self-dual CSS codes named SRP-3 and SRP-5 respectively. Small RP^2 codes are used to hold logical information and checked by syndrome extraction (SE) circuits. We design fast morphing circuits that enable switching between a distance 3 (5) RP^2 code and an SRP-3 (SRP-5) code on which we can efficiently check the correctness of the logical state. To preserve the high accuracy of the cultivated logical T state, we design an efficient and easy-to-decode expansion stage that grows a small RP^2 code to a large rotated surface code in one round. Our MSC protocol utilizes non-local connectivity, available on both neutral atom array and ion trap platforms. According to our Monte Carlo sampling results, our MSC protocol requires about an order of magnitude smaller space-time volume to reach a target logical error rate around 10^{-9} compared to the original MSC protocol.

  • 4 authors
·
Mar 24, 2025

Understanding Quantum Technologies 2025

Understanding Quantum Technologies 2025 is the 8th update of a free open science ebook that provides a 360 degrees overview of quantum technologies from science and technology to geopolitical and societal issues. It covers quantum physics history, quantum physics 101, gate-based quantum computing, quantum computing engineering (including quantum error corrections, quantum computing energetics and a new subsection of the effects of the Lieb-Robinson limit), quantum computing hardware (all qubit types, including quantum annealing and quantum simulation paradigms, history, science, research, implementation and vendors scientific and engineering approaches and roadmaps), quantum enabling technologies (cryogenics, control electronics, photonics, components fabs and manufacturing process, raw materials), unconventional computing (potential alternatives to quantum and classical computing), quantum computing algorithms, software development tools, resource estimate and benchmark tools, use case and case studies analysis methodologies, application use cases per market, quantum communications and cryptography (including QKD, PQC and QPU interconnect technologies), quantum sensing, quantum technologies around the world, quantum technologies societal impact and even quantum fake sciences. The main audience are computer science engineers, developers and IT specialists as well as quantum scientists and students who want to acquire a global view of how quantum technologies work, and particularly quantum computing. This version is an update to the 2024, 2023, 2022, and 2021 editions published respectively in October 2024, 2023, 2022 and 2021. An update log is provided at the end of the book.

  • 1 authors
·
Nov 24, 2021

QuXAI: Explainers for Hybrid Quantum Machine Learning Models

The emergence of hybrid quantum-classical machine learning (HQML) models opens new horizons of computational intelligence but their fundamental complexity frequently leads to black box behavior that undermines transparency and reliability in their application. Although XAI for quantum systems still in its infancy, a major research gap is evident in robust global and local explainability approaches that are designed for HQML architectures that employ quantized feature encoding followed by classical learning. The gap is the focus of this work, which introduces QuXAI, an framework based upon Q-MEDLEY, an explainer for explaining feature importance in these hybrid systems. Our model entails the creation of HQML models incorporating quantum feature maps, the use of Q-MEDLEY, which combines feature based inferences, preserving the quantum transformation stage and visualizing the resulting attributions. Our result shows that Q-MEDLEY delineates influential classical aspects in HQML models, as well as separates their noise, and competes well against established XAI techniques in classical validation settings. Ablation studies more significantly expose the virtues of the composite structure used in Q-MEDLEY. The implications of this work are critically important, as it provides a route to improve the interpretability and reliability of HQML models, thus promoting greater confidence and being able to engage in safer and more responsible use of quantum-enhanced AI technology.

  • 6 authors
·
May 15, 2025 3

A Resource Efficient Quantum Kernel

Quantum processors may enhance machine learning by mapping high-dimensional data onto quantum systems for processing. Conventional feature maps, for encoding data onto a quantum circuit are currently impractical, as the number of entangling gates scales quadratically with the dimension of the dataset and the number of qubits. In this work, we introduce a quantum feature map designed to handle high-dimensional data with a significantly reduced number of qubits and entangling operations. Our approach preserves essential data characteristics while promoting computational efficiency, as evidenced by extensive experiments on benchmark datasets that demonstrate a marked improvement in both accuracy and resource utilization when using our feature map as a kernel for characterization, as compared to state-of-the-art quantum feature maps. Our noisy simulation results, combined with lower resource requirements, highlight our map's ability to function within the constraints of noisy intermediate-scale quantum devices. Through numerical simulations and small-scale implementation on a superconducting circuit quantum computing platform, we demonstrate that our scheme performs on par or better than a set of classical algorithms for classification. While quantum kernels are typically stymied by exponential concentration, our approach is affected with a slower rate with respect to both the number of qubits and features, which allows practical applications to remain within reach. Our findings herald a promising avenue for the practical implementation of quantum machine learning algorithms on near future quantum computing platforms.

  • 4 authors
·
Jul 4, 2025

Adapting Quantum Machine Learning for Energy Dissociation of Bonds

Accurate prediction of bond dissociation energies (BDEs) underpins mechanistic insight and the rational design of molecules and materials. We present a systematic, reproducible benchmark comparing quantum and classical machine learning models for BDE prediction using a chemically curated feature set encompassing atomic properties (atomic numbers, hybridization), bond characteristics (bond order, type), and local environmental descriptors. Our quantum framework, implemented in Qiskit Aer on six qubits, employs ZZFeatureMap encodings with variational ansatz (RealAmplitudes) across multiple architectures Variational Quantum Regressors (VQR), Quantum Support Vector Regressors (QSVR), Quantum Neural Networks (QNN), Quantum Convolutional Neural Networks (QCNN), and Quantum Random Forests (QRF). These are rigorously benchmarked against strong classical baselines, including Support Vector Regression (SVR), Random Forests (RF), and Multi-Layer Perceptrons (MLP). Comprehensive evaluation spanning absolute and relative error metrics, threshold accuracies, and error distributions shows that top-performing quantum models (QCNN, QRF) match the predictive accuracy and robustness of classical ensembles and deep networks, particularly within the chemically prevalent mid-range BDE regime. These findings establish a transparent baseline for quantum-enhanced molecular property prediction and outline a practical foundation for advancing quantum computational chemistry toward near chemical accuracy.

  • 3 authors
·
Oct 7, 2025

Foundations for Near-Term Quantum Natural Language Processing

We provide conceptual and mathematical foundations for near-term quantum natural language processing (QNLP), and do so in quantum computer scientist friendly terms. We opted for an expository presentation style, and provide references for supporting empirical evidence and formal statements concerning mathematical generality. We recall how the quantum model for natural language that we employ canonically combines linguistic meanings with rich linguistic structure, most notably grammar. In particular, the fact that it takes a quantum-like model to combine meaning and structure, establishes QNLP as quantum-native, on par with simulation of quantum systems. Moreover, the now leading Noisy Intermediate-Scale Quantum (NISQ) paradigm for encoding classical data on quantum hardware, variational quantum circuits, makes NISQ exceptionally QNLP-friendly: linguistic structure can be encoded as a free lunch, in contrast to the apparently exponentially expensive classical encoding of grammar. Quantum speed-up for QNLP tasks has already been established in previous work with Will Zeng. Here we provide a broader range of tasks which all enjoy the same advantage. Diagrammatic reasoning is at the heart of QNLP. Firstly, the quantum model interprets language as quantum processes via the diagrammatic formalism of categorical quantum mechanics. Secondly, these diagrams are via ZX-calculus translated into quantum circuits. Parameterisations of meanings then become the circuit variables to be learned. Our encoding of linguistic structure within quantum circuits also embodies a novel approach for establishing word-meanings that goes beyond the current standards in mainstream AI, by placing linguistic structure at the heart of Wittgenstein's meaning-is-context.

  • 4 authors
·
Dec 7, 2020

Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models

State Space Models (SSMs) are emerging as a compelling alternative to Transformers because of their consistent memory usage and high performance. Despite this, scaling up SSMs on cloud services or limited-resource devices is challenging due to their storage requirements and computational power. To overcome this, quantizing SSMs with low bit-width data formats can reduce model size and benefit from hardware acceleration. As SSMs are prone to quantization-induced errors, recent efforts have focused on optimizing a particular model or bit-width for efficiency without sacrificing performance. However, distinct bit-width configurations are essential for different scenarios, like W4A8 for boosting large-batch decoding speed, and W4A16 for enhancing generation speed in short prompt applications for a single user. To this end, we present Quamba2, compatible with W8A8, W4A8, and W4A16 for both Mamba1 and Mamba2 backbones, addressing the growing demand for SSM deployment on various platforms. Based on the channel order preserving and activation persistence of SSMs, we propose an offline approach to quantize inputs of a linear recurrence in 8-bit by sorting and clustering for input x, combined with a per-state-group quantization for input-dependent parameters B and C. To ensure compute-invariance in the SSM output, we rearrange weights offline according to the clustering sequence. The experiments show that Quamba2-8B outperforms several state-of-the-art SSM quantization methods and delivers 1.3times and 3times speed-ups in the pre-filling and generation stages, respectively, while offering 4times memory reduction with only a 1.6% average accuracy drop. The evaluation on MMLU shows the generalizability and robustness of our framework. The code and quantized models will be released at: https://github.com/enyac-group/Quamba.

Stationary Representations: Optimally Approximating Compatibility and Implications for Improved Model Replacements

Learning compatible representations enables the interchangeable use of semantic features as models are updated over time. This is particularly relevant in search and retrieval systems where it is crucial to avoid reprocessing of the gallery images with the updated model. While recent research has shown promising empirical evidence, there is still a lack of comprehensive theoretical understanding about learning compatible representations. In this paper, we demonstrate that the stationary representations learned by the d-Simplex fixed classifier optimally approximate compatibility representation according to the two inequality constraints of its formal definition. This not only establishes a solid foundation for future works in this line of research but also presents implications that can be exploited in practical learning scenarios. An exemplary application is the now-standard practice of downloading and fine-tuning new pre-trained models. Specifically, we show the strengths and critical issues of stationary representations in the case in which a model undergoing sequential fine-tuning is asynchronously replaced by downloading a better-performing model pre-trained elsewhere. Such a representation enables seamless delivery of retrieval service (i.e., no reprocessing of gallery images) and offers improved performance without operational disruptions during model replacement. Code available at: https://github.com/miccunifi/iamcl2r.

  • 4 authors
·
May 4, 2024

Hardware-efficient Variational Quantum Eigensolver for Small Molecules and Quantum Magnets

Quantum computers can be used to address molecular structure, materials science and condensed matter physics problems, which currently stretch the limits of existing high-performance computing resources. Finding exact numerical solutions to these interacting fermion problems has exponential cost, while Monte Carlo methods are plagued by the fermionic sign problem. These limitations of classical computational methods have made even few-atom molecular structures problems of practical interest for medium-sized quantum computers. Yet, thus far experimental implementations have been restricted to molecules involving only Period I elements. Here, we demonstrate the experimental optimization of up to six-qubit Hamiltonian problems with over a hundred Pauli terms, determining the ground state energy for molecules of increasing size, up to BeH2. This is enabled by a hardware-efficient variational quantum eigensolver with trial states specifically tailored to the available interactions in our quantum processor, combined with a compact encoding of fermionic Hamiltonians and a robust stochastic optimization routine. We further demonstrate the flexibility of our approach by applying the technique to a problem of quantum magnetism. Across all studied problems, we find agreement between experiment and numerical simulations with a noisy model of the device. These results help elucidate the requirements for scaling the method to larger systems, and aim at bridging the gap between problems at the forefront of high-performance computing and their implementation on quantum hardware.

  • 7 authors
·
Apr 17, 2017

QuantumLLMInstruct: A 500k LLM Instruction-Tuning Dataset with Problem-Solution Pairs for Quantum Computing

We present QuantumLLMInstruct (QLMMI), an innovative dataset featuring over 500,000 meticulously curated instruction-following problem-solution pairs designed specifically for quantum computing - the largest and most comprehensive dataset of its kind. Originating from over 90 primary seed domains and encompassing hundreds of subdomains autonomously generated by LLMs, QLMMI marks a transformative step in the diversity and richness of quantum computing datasets. Designed for instruction fine-tuning, QLMMI seeks to significantly improve LLM performance in addressing complex quantum computing challenges across a wide range of quantum physics topics. While Large Language Models (LLMs) have propelled advancements in computational science with datasets like Omni-MATH and OpenMathInstruct, these primarily target Olympiad-level mathematics, leaving quantum computing largely unexplored. The creation of QLMMI follows a rigorous four-stage methodology. Initially, foundational problems are developed using predefined templates, focusing on critical areas such as synthetic Hamiltonians, QASM code generation, Jordan-Wigner transformations, and Trotter-Suzuki quantum circuit decompositions. Next, detailed and domain-specific solutions are crafted to ensure accuracy and relevance. In the third stage, the dataset is enriched through advanced reasoning techniques, including Chain-of-Thought (CoT) and Task-Oriented Reasoning and Action (ToRA), which enhance problem-solution diversity while adhering to strict mathematical standards. Lastly, a zero-shot Judge LLM performs self-assessments to validate the dataset's quality and reliability, minimizing human oversight requirements.

  • 1 authors
·
Dec 30, 2024

SeQUeNCe: A Customizable Discrete-Event Simulator of Quantum Networks

Recent advances in quantum information science enabled the development of quantum communication network prototypes and created an opportunity to study full-stack quantum network architectures. This work develops SeQUeNCe, a comprehensive, customizable quantum network simulator. Our simulator consists of five modules: Hardware models, Entanglement Management protocols, Resource Management, Network Management, and Application. This framework is suitable for simulation of quantum network prototypes that capture the breadth of current and future hardware technologies and protocols. We implement a comprehensive suite of network protocols and demonstrate the use of SeQUeNCe by simulating a photonic quantum network with nine routers equipped with quantum memories. The simulation capabilities are illustrated in three use cases. We show the dependence of quantum network throughput on several key hardware parameters and study the impact of classical control message latency. We also investigate quantum memory usage efficiency in routers and demonstrate that redistributing memory according to anticipated load increases network capacity by 69.1% and throughput by 6.8%. We design SeQUeNCe to enable comparisons of alternative quantum network technologies, experiment planning, and validation and to aid with new protocol design. We are releasing SeQUeNCe as an open source tool and aim to generate community interest in extending it.

  • 7 authors
·
Sep 24, 2020

COMET: Towards Partical W4A4KV4 LLMs Serving

Quantization is a widely-used compression technology to reduce the overhead of serving large language models (LLMs) on terminal devices and in cloud data centers. However, prevalent quantization methods, such as 8-bit weight-activation or 4-bit weight-only quantization, achieve limited performance improvements due to poor support for low-precision (e.g., 4-bit) activation. This work, for the first time, realizes practical W4A4KV4 serving for LLMs, fully utilizing the INT4 tensor cores on modern GPUs and reducing the memory bottleneck caused by the KV cache. Specifically, we propose a novel fine-grained mixed-precision quantization algorithm (FMPQ) that compresses most activations into 4-bit with negligible accuracy loss. To support mixed-precision matrix multiplication for W4A4 and W4A8, we develop a highly optimized W4Ax kernel. Our approach introduces a novel mixed-precision data layout to facilitate access and fast dequantization for activation and weight tensors, utilizing the GPU's software pipeline to hide the overhead of data loading and conversion. Additionally, we propose fine-grained streaming multiprocessor (SM) scheduling to achieve load balance across different SMs. We integrate the optimized W4Ax kernel into our inference framework, COMET, and provide efficient management to support popular LLMs such as LLaMA-3-70B. Extensive evaluations demonstrate that, when running LLaMA family models on a single A100-80G-SMX4, COMET achieves a kernel-level speedup of 2.88times over cuBLAS and a 2.02 times throughput improvement compared to TensorRT-LLM from an end-to-end framework perspective.

  • 9 authors
·
Oct 15, 2024

ARCQuant: Boosting NVFP4 Quantization with Augmented Residual Channels for LLMs

The emergence of fine-grained numerical formats like NVFP4 presents new opportunities for efficient Large Language Model (LLM) inference. However, it is difficult to adapt existing Post-Training Quantization (PTQ) strategies to these formats: rotation-based methods compromise fine-grained block isolation; smoothing techniques struggle with significant 4-bit quantization errors; and mixed-precision approaches often conflict with hardware constraints on unified-precision computation. To address these challenges, we propose ARCQuant, a framework that boosts NVFP4 performance via Augmented Residual Channels. Distinct from methods that compromise block isolation or hardware uniformity, ARCQuant maintains a strictly unified NVFP4 format by augmenting the activation matrix with quantized residual channels. This design integrates the error compensation process directly into the matrix reduction dimension, enabling the use of standard, highly optimized GEMM kernels with minimal overhead. Theoretical analysis confirms that the worst-case error bound of our dual-stage NVFP4 quantization is comparable to that of standard 8-bit formats such as MXFP8. Extensive experiments on LLaMA and Qwen models demonstrate that ARCQuant achieves state-of-the-art accuracy, comparable to full-precision baselines in perplexity and downstream tasks. Furthermore, deployment on RTX 5090 and RTX PRO 6000 GPUs confirms practical benefits, achieving up to 3x speedup over FP16. Our code is available at https://github.com/actypedef/ARCQuant .

  • 6 authors
·
Jan 12

Quantum Visual Fields with Neural Amplitude Encoding

Quantum Implicit Neural Representations (QINRs) include components for learning and execution on gate-based quantum computers. While QINRs recently emerged as a promising new paradigm, many challenges concerning their architecture and ansatz design, the utility of quantum-mechanical properties, training efficiency and the interplay with classical modules remain. This paper advances the field by introducing a new type of QINR for 2D image and 3D geometric field learning, which we collectively refer to as Quantum Visual Field (QVF). QVF encodes classical data into quantum statevectors using neural amplitude encoding grounded in a learnable energy manifold, ensuring meaningful Hilbert space embeddings. Our ansatz follows a fully entangled design of learnable parametrised quantum circuits, with quantum (unitary) operations performed in the real Hilbert space, resulting in numerically stable training with fast convergence. QVF does not rely on classical post-processing -- in contrast to the previous QINR learning approach -- and directly employs projective measurement to extract learned signals encoded in the ansatz. Experiments on a quantum hardware simulator demonstrate that QVF outperforms the existing quantum approach and widely used classical foundational baselines in terms of visual representation accuracy across various metrics and model characteristics, such as learning of high-frequency details. We also show applications of QVF in 2D and 3D field completion and 3D shape interpolation, highlighting its practical potential.

  • 3 authors
·
Aug 14, 2025

Qudit Designs and Where to Find Them

Unitary t-designs are some of the most versatile tools in quantum information theory. Their applications range from randomized benchmarking and shadow tomography, to more fundamental ones such as emulating quantum chaos and establishing exponential separations between classical and quantum query complexity. While unitary designs originating from a group structure, such as the Clifford group, have proven to be incredibly useful for qubit systems, unfortunately, this is no longer true for qudits. In fact, the classification of finite-group representations rules out the existence of unitary 2-designs for arbitrary qudit dimensions. This severely limits the applicability of standard quantum information primitives when it comes to qudit systems. We overcome these limitations with a three-fold contribution. First, we introduce a general technique to construct families of weighted state t-designs in arbitrary qudit dimensions. These weighted state-designs generalize classical shadow tomography protocol from qubits to qudits. Second, we introduce a Clifford character RB that allows us to benchmark the qudit Clifford group in any dimension, including non-prime-power dimensions. And third, we establish bounds on the quantum circuit complexity of generating approximate unitary-designs from native gates in existing quantum hardware such as high-spin and cavity-QED qudits. Our work further highlights the analogy between spin and optical coherent states by proving that spin-GKP codewords form a state 2-design while spin coherent states do not; in direct analogy with the optical case. This work is structured as a pedagogical and self-contained introduction to unitary designs and their applications to qudit systems.

  • 5 authors
·
Mar 3

Teleportation of entanglement over 143 km

As a direct consequence of the no-cloning theorem, the deterministic amplification as in classical communication is impossible for quantum states. This calls for more advanced techniques in a future global quantum network, e.g. for cloud quantum computing. A unique solution is the teleportation of an entangled state, i.e. entanglement swapping, representing the central resource to relay entanglement between distant nodes. Together with entanglement purification and a quantum memory it constitutes a so-called quantum repeater. Since the aforementioned building blocks have been individually demonstrated in laboratory setups only, the applicability of the required technology in real-world scenarios remained to be proven. Here we present a free-space entanglement-swapping experiment between the Canary Islands of La Palma and Tenerife, verifying the presence of quantum entanglement between two previously independent photons separated by 143 km. We obtained an expectation value for the entanglement-witness operator, more than 6 standard deviations beyond the classical limit. By consecutive generation of the two required photon pairs and space-like separation of the relevant measurement events, we also showed the feasibility of the swapping protocol in a long-distance scenario, where the independence of the nodes is highly demanded. Since our results already allow for efficient implementation of entanglement purification, we anticipate our assay to lay the ground for a fully-fledged quantum repeater over a realistic high-loss and even turbulent quantum channel.

  • 7 authors
·
Feb 28, 2014

An Architecture for Meeting Quality-of-Service Requirements in Multi-User Quantum Networks

Quantum communication can enhance internet technology by enabling novel applications that are provably impossible classically. The successful execution of such applications relies on the generation of quantum entanglement between different users of the network which meets stringent performance requirements. Alongside traditional metrics such as throughput and jitter, one must ensure the generated entanglement is of sufficiently high quality. Meeting such performance requirements demands a careful orchestration of many devices in the network, giving rise to a fundamentally new scheduling problem. Furthermore, technological limitations of near-term quantum devices impose significant constraints on scheduling methods hoping to meet performance requirements. In this work, we propose the first end-to-end design of a centralized quantum network with multiple users that orchestrates the delivery of entanglement which meets quality-of-service (QoS) requirements of applications. We achieve this by using a centrally constructed schedule that manages usage of devices and ensures the coordinated execution of different quantum operations throughout the network. We use periodic task scheduling and resource-constrained project scheduling techniques, including a novel heuristic, to construct the schedules. Our simulations of four small networks using hardware-validated network parameters, and of a real-world fiber topology using futuristic parameters, illustrate trade-offs between traditional and quantum performance metrics.

  • 2 authors
·
Nov 25, 2021

Fine-Tuning Large Language Models on Quantum Optimization Problems for Circuit Generation

Large language models (LLM) have achieved remarkable outcomes in addressing complex problems, including math, coding, and analyzing large amounts of scientific reports. Yet few works have explored the potential of LLM in quantum computing. The most challenging problem is how to leverage LLMs to automatically generate quantum circuits at a large scale. In this paper, we address such a challenge by fine-tuning LLMs and injecting the domain-specific knowledge of quantum computing. In particular, we investigate the mechanisms to generate training data sets and construct the end-to-end pipeline to fine-tune pre-trained LLMs that produce parameterized quantum circuits for optimization problems. We have prepared 14,000 quantum circuits covering a substantial part of the quantum optimization landscape: 12 optimization problem instances and their optimized QAOA, VQE, and adaptive VQE circuits. The fine-tuned LLMs can construct syntactically correct parametrized quantum circuits in the most recent OpenQASM 3.0. We have evaluated the quality of the parameters by comparing them to the optimized expectation values and distributions. Our evaluation shows that the fine-tuned LLM outperforms state-of-the-art models and that the parameters are better than random. The LLM-generated parametrized circuits and initial parameters can be used as a starting point for further optimization, e.g., templates in quantum machine learning and the benchmark for compilers and hardware.

  • 4 authors
·
Apr 15, 2025

EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization

Mixed-Precision Quantization~(MQ) can achieve a competitive accuracy-complexity trade-off for models. Conventional training-based search methods require time-consuming candidate training to search optimized per-layer bit-width configurations in MQ. Recently, some training-free approaches have presented various MQ proxies and significantly improve search efficiency. However, the correlation between these proxies and quantization accuracy is poorly understood. To address the gap, we first build the MQ-Bench-101, which involves different bit configurations and quantization results. Then, we observe that the existing training-free proxies perform weak correlations on the MQ-Bench-101. To efficiently seek superior proxies, we develop an automatic search of proxies framework for MQ via evolving algorithms. In particular, we devise an elaborate search space involving the existing proxies and perform an evolution search to discover the best correlated MQ proxy. We proposed a diversity-prompting selection strategy and compatibility screening protocol to avoid premature convergence and improve search efficiency. In this way, our Evolving proxies for Mixed-precision Quantization~(EMQ) framework allows the auto-generation of proxies without heavy tuning and expert knowledge. Extensive experiments on ImageNet with various ResNet and MobileNet families demonstrate that our EMQ obtains superior performance than state-of-the-art mixed-precision methods at a significantly reduced cost. The code will be released.

  • 6 authors
·
Jul 19, 2023

Entanglement Purification in Quantum Networks: Guaranteed Improvement and Optimal Time

While the concept of entanglement purification protocols (EPPs) is straightforward, the integration of EPPs in network architectures requires careful performance evaluations and optimizations that take into account realistic conditions and imperfections, especially probabilistic entanglement generation and quantum memory decoherence. It is important to understand what is guaranteed to be improved from successful EPP with arbitrary non-identical input, which determines whether we want to perform the EPP at all. When successful EPP can offer improvement, the time to perform the EPP should also be optimized to maximize the improvement. In this work, we study the guaranteed improvement and optimal time for the CNOT-based recurrence EPP, previously shown to be optimal in various scenarios. We firstly prove guaranteed improvement for multiple figures of merit, including fidelity and several entanglement measures when compared to practical baselines as functions of input states. However, it is noteworthy that the guaranteed improvement we prove does not imply the universality of the EPP as introduced in arXiv:2407.21760. Then we prove robust, parameter-independent optimal time for typical error models and figures of merit. We further explore memory decoherence described by continuous-time Pauli channels, and demonstrate the phenomenon of optimal time transition when the memory decoherence error pattern changes. Our work deepens the understanding of EPP performance in realistic scenarios and offers insights into optimizing quantum networks that integrate EPPs.

  • 5 authors
·
May 4, 2025

Lorentz-Equivariant Quantum Graph Neural Network for High-Energy Physics

The rapid data surge from the high-luminosity Large Hadron Collider introduces critical computational challenges requiring novel approaches for efficient data processing in particle physics. Quantum machine learning, with its capability to leverage the extensive Hilbert space of quantum hardware, offers a promising solution. However, current quantum graph neural networks (GNNs) lack robustness to noise and are often constrained by fixed symmetry groups, limiting adaptability in complex particle interaction modeling. This paper demonstrates that replacing the Lorentz Group Equivariant Block modules in LorentzNet with a dressed quantum circuit significantly enhances performance despite using nearly 5.5 times fewer parameters. Additionally, quantum circuits effectively replace MLPs by inherently preserving symmetries, with Lorentz symmetry integration ensuring robust handling of relativistic invariance. Our Lorentz-Equivariant Quantum Graph Neural Network (Lorentz-EQGNN) achieved 74.00% test accuracy and an AUC of 87.38% on the Quark-Gluon jet tagging dataset, outperforming the classical and quantum GNNs with a reduced architecture using only 4 qubits. On the Electron-Photon dataset, Lorentz-EQGNN reached 67.00% test accuracy and an AUC of 68.20%, demonstrating competitive results with just 800 training samples. Evaluation of our model on generic MNIST and FashionMNIST datasets confirmed Lorentz-EQGNN's efficiency, achieving 88.10% and 74.80% test accuracy, respectively. Ablation studies validated the impact of quantum components on performance, with notable improvements in background rejection rates over classical counterparts. These results highlight Lorentz-EQGNN's potential for immediate applications in noise-resilient jet tagging, event classification, and broader data-scarce HEP tasks.

  • 5 authors
·
Nov 3, 2024

Programmable Heisenberg interactions between Floquet qubits

The fundamental trade-off between robustness and tunability is a central challenge in the pursuit of quantum simulation and fault-tolerant quantum computation. In particular, many emerging quantum architectures are designed to achieve high coherence at the expense of having fixed spectra and consequently limited types of controllable interactions. Here, by adiabatically transforming fixed-frequency superconducting circuits into modifiable Floquet qubits, we demonstrate an XXZ Heisenberg interaction with fully adjustable anisotropy. This interaction model is on one hand the basis for many-body quantum simulation of spin systems, and on the other hand the primitive for an expressive quantum gate set. To illustrate the robustness and versatility of our Floquet protocol, we tailor the Heisenberg Hamiltonian and implement two-qubit iSWAP, CZ, and SWAP gates with estimated fidelities of 99.32(3)%, 99.72(2)%, and 98.93(5)%, respectively. In addition, we implement a Heisenberg interaction between higher energy levels and employ it to construct a three-qubit CCZ gate with a fidelity of 96.18(5)%. Importantly, the protocol is applicable to various fixed-frequency high-coherence platforms, thereby unlocking a suite of essential interactions for high-performance quantum information processing. From a broader perspective, our work provides compelling avenues for future exploration of quantum electrodynamics and optimal control using the Floquet framework.

  • 12 authors
·
Nov 18, 2022

KetGPT - Dataset Augmentation of Quantum Circuits using Transformers

Quantum algorithms, represented as quantum circuits, can be used as benchmarks for assessing the performance of quantum systems. Existing datasets, widely utilized in the field, suffer from limitations in size and versatility, leading researchers to employ randomly generated circuits. Random circuits are, however, not representative benchmarks as they lack the inherent properties of real quantum algorithms for which the quantum systems are manufactured. This shortage of `useful' quantum benchmarks poses a challenge to advancing the development and comparison of quantum compilers and hardware. This research aims to enhance the existing quantum circuit datasets by generating what we refer to as `realistic-looking' circuits by employing the Transformer machine learning architecture. For this purpose, we introduce KetGPT, a tool that generates synthetic circuits in OpenQASM language, whose structure is based on quantum circuits derived from existing quantum algorithms and follows the typical patterns of human-written algorithm-based code (e.g., order of gates and qubits). Our three-fold verification process, involving manual inspection and Qiskit framework execution, transformer-based classification, and structural analysis, demonstrates the efficacy of KetGPT in producing large amounts of additional circuits that closely align with algorithm-based structures. Beyond benchmarking, we envision KetGPT contributing substantially to AI-driven quantum compilers and systems.

  • 4 authors
·
Feb 20, 2024

Less Quantum, More Advantage: An End-to-End Quantum Algorithm for the Jones Polynomial

We present an end-to-end reconfigurable algorithmic pipeline for solving a famous problem in knot theory using a noisy digital quantum computer, namely computing the value of the Jones polynomial at the fifth root of unity within additive error for any input link, i.e. a closed braid. This problem is DQC1-complete for Markov-closed braids and BQP-complete for Plat-closed braids, and we accommodate both versions of the problem. Even though it is widely believed that DQC1 is strictly contained in BQP, and so is 'less quantum', the resource requirements of classical algorithms for the DQC1 version are at least as high as for the BQP version, and so we potentially gain 'more advantage' by focusing on Markov-closed braids in our exposition. We demonstrate our quantum algorithm on Quantinuum's H2-2 quantum computer and show the effect of problem-tailored error-mitigation techniques. Further, leveraging that the Jones polynomial is a link invariant, we construct an efficiently verifiable benchmark to characterise the effect of noise present in a given quantum processor. In parallel, we implement and benchmark the state-of-the-art tensor-network-based classical algorithms for computing the Jones polynomial. The practical tools provided in this work allow for precise resource estimation to identify near-term quantum advantage for a meaningful quantum-native problem in knot theory.

  • 9 authors
·
Mar 7, 2025

Quantum machine learning for image classification

Image classification, a pivotal task in multiple industries, faces computational challenges due to the burgeoning volume of visual data. This research addresses these challenges by introducing two quantum machine learning models that leverage the principles of quantum mechanics for effective computations. Our first model, a hybrid quantum neural network with parallel quantum circuits, enables the execution of computations even in the noisy intermediate-scale quantum era, where circuits with a large number of qubits are currently infeasible. This model demonstrated a record-breaking classification accuracy of 99.21% on the full MNIST dataset, surpassing the performance of known quantum-classical models, while having eight times fewer parameters than its classical counterpart. Also, the results of testing this hybrid model on a Medical MNIST (classification accuracy over 99%), and on CIFAR-10 (classification accuracy over 82%), can serve as evidence of the generalizability of the model and highlights the efficiency of quantum layers in distinguishing common features of input data. Our second model introduces a hybrid quantum neural network with a Quanvolutional layer, reducing image resolution via a convolution process. The model matches the performance of its classical counterpart, having four times fewer trainable parameters, and outperforms a classical model with equal weight parameters. These models represent advancements in quantum machine learning research and illuminate the path towards more accurate image classification systems.

  • 5 authors
·
Apr 18, 2023

Adaptive Graph Shrinking for Quantum Optimization of Constrained Combinatorial Problems

A range of quantum algorithms, especially those leveraging variational parameterization and circuit-based optimization, are being studied as alternatives for solving classically intractable combinatorial optimization problems (COPs). However, their applicability is limited by hardware constraints, including shallow circuit depth, limited qubit counts, and noise. To mitigate these issues, we propose a hybrid classical--quantum framework based on graph shrinking to reduce the number of variables and constraints in QUBO formulations of COPs, while preserving problem structure. Our approach introduces three key ideas: (i) constraint-aware shrinking that prevents merges that will likely violate problem-specific feasibility constraints, (ii) a verification-and-repair pipeline to correct infeasible solutions post-optimization, and (iii) adaptive strategies for recalculating correlations and controlling the graph shrinking process. We apply our approach to three standard benchmark problems: Multidimensional Knapsack (MDKP), Maximum Independent Set (MIS), and the Quadratic Assignment Problem (QAP). Empirical results show that our approach improves solution feasibility, reduces repair complexity, and enhances quantum optimization quality on hardware-limited instances. These findings demonstrate a scalable pathway for applying near-term quantum algorithms to classically challenging constrained optimization problems.

  • 2 authors
·
Jun 17, 2025

iFairy: the First 2-bit Complex LLM with All Parameters in {pm1, pm i}

Quantization-Aware Training (QAT) integrates quantization into the training loop, enabling LLMs to learn robust low-bit representations, and is widely recognized as one of the most promising research directions. All current QAT research focuses on minimizing quantization error on full-precision models, where the full-precision accuracy acts as an upper bound (accuracy ceiling). No existing method has even attempted to surpass this ceiling. To break this ceiling, we propose a new paradigm: raising the ceiling (full-precision model), and then still quantizing it efficiently into 2 bits. We propose Fairypm i, the first 2-bit quantization framework for complex-valued LLMs. Specifically, our method leverages the representational advantages of the complex domain to boost full-precision accuracy. We map weights to the fourth roots of unity {pm1, pm i}, forming a perfectly symmetric and information-theoretically optimal 2-bit representation. Importantly, each quantized weight has either a zero real or imaginary part, enabling multiplication-free inference using only additions and element swaps. Experimental results show that Fairypm i outperforms the ceiling of existing 2-bit quantization approaches in terms of both PPL and downstream tasks, while maintaining strict storage and compute efficiency. This work opens a new direction for building highly accurate and practical LLMs under extremely low-bit constraints.

  • 10 authors
·
Aug 7, 2025

1d-qt-ideal-solver: 1D Idealized Quantum Tunneling Solver with Absorbing Boundaries

We present 1d-qt-ideal-solver, an open-source Python library for simulating one-dimensional quantum tunneling dynamics under idealized coherent conditions. The solver implements the split-operator method with second-order Trotter-Suzuki factorization, utilizing FFT-based spectral differentiation for the kinetic operator and complex absorbing potentials to eliminate boundary reflections. Numba just-in-time compilation achieves performance comparable to compiled languages while maintaining code accessibility. We validate the implementation through two canonical test cases: rectangular barriers modeling field emission through oxide layers and Gaussian barriers approximating scanning tunneling microscopy interactions. Both simulations achieve exceptional numerical fidelity with machine-precision energy conservation over femtosecond-scale propagation. Comparative analysis employing information-theoretic measures and nonparametric hypothesis tests reveals that rectangular barriers exhibit moderately higher transmission coefficients than Gaussian barriers in the over-barrier regime, though Jensen-Shannon divergence analysis indicates modest practical differences between geometries. Phase space analysis confirms complete decoherence when averaged over spatial-temporal domains. The library name reflects its scope: idealized signifies deliberate exclusion of dissipation, environmental coupling, and many-body interactions, limiting applicability to qualitative insights and pedagogical purposes rather than quantitative experimental predictions. Distributed under the MIT License, the library provides a deployable tool for teaching quantum mechanics and preliminary exploration of tunneling dynamics.

  • 5 authors
·
Dec 27, 2025

QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead

Serving LLMs requires substantial memory due to the storage requirements of Key-Value (KV) embeddings in the KV cache, which grows with sequence length. An effective approach to compress KV cache is quantization. However, traditional quantization methods face significant memory overhead due to the need to store quantization constants (at least a zero point and a scale) in full precision per data block. Depending on the block size, this overhead can add 1 or 2 bits per quantized number. We introduce QJL, a new quantization approach that consists of a Johnson-Lindenstrauss (JL) transform followed by sign-bit quantization. In contrast to existing methods, QJL eliminates memory overheads by removing the need for storing quantization constants. We propose an asymmetric estimator for the inner product of two vectors and demonstrate that applying QJL to one vector and a standard JL transform without quantization to the other provides an unbiased estimator with minimal distortion. We have developed an efficient implementation of the QJL sketch and its corresponding inner product estimator, incorporating a lightweight CUDA kernel for optimized computation. When applied across various LLMs and NLP tasks to quantize the KV cache to only 3 bits, QJL demonstrates a more than fivefold reduction in KV cache memory usage without compromising accuracy, all while achieving faster runtime. Codes are available at https://github.com/amirzandieh/QJL.

  • 3 authors
·
Jun 5, 2024

Establishing Baselines for Photonic Quantum Machine Learning: Insights from an Open, Collaborative Initiative

The Perceval Challenge is an open, reproducible benchmark designed to assess the potential of photonic quantum computing for machine learning. Focusing on a reduced and hardware-feasible version of the MNIST digit classification task or near-term photonic processors, it offers a concrete framework to evaluate how photonic quantum circuits learn and generalize from limited data. Conducted over more than three months, the challenge attracted 64 teams worldwide in its first phase. After an initial selection, 11 finalist teams were granted access to GPU resources for large-scale simulation and photonic hardware execution through cloud service. The results establish the first unified baseline of photonic machine-learning performance, revealing complementary strengths between variational, hardware-native, and hybrid approaches. This challenge also underscores the importance of open, reproducible experimentation and interdisciplinary collaboration, highlighting how shared benchmarks can accelerate progress in quantum-enhanced learning. All implementations are publicly available in a single shared repository (https://github.com/Quandela/HybridAIQuantum-Challenge), supporting transparent benchmarking and cumulative research. Beyond this specific task, the Perceval Challenge illustrates how systematic, collaborative experimentation can map the current landscape of photonic quantum machine learning and pave the way toward hybrid, quantum-augmented AI workflows.

  • 31 authors
·
Oct 29, 2025

Hybrid Quantum-Classical Model for Image Classification

This study presents a systematic comparison between hybrid quantum-classical neural networks and purely classical models across three benchmark datasets (MNIST, CIFAR100, and STL10) to evaluate their performance, efficiency, and robustness. The hybrid models integrate parameterized quantum circuits with classical deep learning architectures, while the classical counterparts use conventional convolutional neural networks (CNNs). Experiments were conducted over 50 training epochs for each dataset, with evaluations on validation accuracy, test accuracy, training time, computational resource usage, and adversarial robustness (tested with epsilon=0.1 perturbations).Key findings demonstrate that hybrid models consistently outperform classical models in final accuracy, achieving {99.38\% (MNIST), 41.69\% (CIFAR100), and 74.05\% (STL10) validation accuracy, compared to classical benchmarks of 98.21\%, 32.25\%, and 63.76\%, respectively. Notably, the hybrid advantage scales with dataset complexity, showing the most significant gains on CIFAR100 (+9.44\%) and STL10 (+10.29\%). Hybrid models also train 5--12times faster (e.g., 21.23s vs. 108.44s per epoch on MNIST) and use 6--32\% fewer parameters} while maintaining superior generalization to unseen test data.Adversarial robustness tests reveal that hybrid models are significantly more resilient on simpler datasets (e.g., 45.27\% robust accuracy on MNIST vs. 10.80\% for classical) but show comparable fragility on complex datasets like CIFAR100 (sim1\% robustness for both). Resource efficiency analyses indicate that hybrid models consume less memory (4--5GB vs. 5--6GB for classical) and lower CPU utilization (9.5\% vs. 23.2\% on average).These results suggest that hybrid quantum-classical architectures offer compelling advantages in accuracy, training efficiency, and parameter scalability, particularly for complex vision tasks.

  • 1 authors
·
Sep 14, 2025 2

QMCPy: A Python Software for Randomized Low-Discrepancy Sequences, Quasi-Monte Carlo, and Fast Kernel Methods

Low-discrepancy (LD) sequences have been extensively used as efficient experimental designs across many scientific disciplines. QMCPy (https://qmcsoftware.github.io/QMCSoftware/) is an accessible Python library which provides a unified implementation of randomized LD sequences, automatic variable transformations, adaptive Quasi-Monte Carlo error estimation algorithms, and fast kernel methods. This article focuses on recent updates to QMCPy which broaden support for randomized LD sequences and add new tools to enable fast kernel methods using LD sequences. Specifically, we give a unified description of the supported LD lattices, digital nets, and Halton point sets, along with randomization options including random permutations / shifts, linear matrix scrambling (LMS), and nested uniform scrambling (NUS). We also support higher-order digital nets, higher-order scrambling with LMS or NUS, and Halton scrambling with LMS or NUS. For fast kernel methods, we provide shift-invariant (SI) and digitally-shift-invariant (DSI) kernels, including a new set of higher-order smoothness DSI kernels. When SI and DSI kernels are respectively paired with n LD lattice and digital net points, the resulting Gram matrices permit multiplication and inversion at only O(n log n) cost. These fast operations utilize QMCPy's implementation of the fast Fourier transform in bit-reversed order (FFTBR), inverse FFTBR (IFFTBR), and fast Walsh--Hadamard transform (FWHT).

  • 1 authors
·
Feb 19, 2025

Q-Cluster: Quantum Error Mitigation Through Noise-Aware Unsupervised Learning

Quantum error mitigation (QEM) is critical in reducing the impact of noise in the pre-fault-tolerant era, and is expected to complement error correction in fault-tolerant quantum computing (FTQC). In this paper, we propose a novel QEM approach, Q-Cluster, that uses unsupervised learning (clustering) to reshape the measured bit-string distribution. Our approach starts with a simplified bit-flip noise model. It first performs clustering on noisy measurement results, i.e., bit-strings, based on the Hamming distance. The centroid of each cluster is calculated using a qubit-wise majority vote. Next, the noisy distribution is adjusted with the clustering outcomes and the bit-flip error rates using Bayesian inference. Our simulation results show that Q-Cluster can mitigate high noise rates (up to 40% per qubit) with the simple bit-flip noise model. However, real quantum computers do not fit such a simple noise model. To address the problem, we (a) apply Pauli twirling to tailor the complex noise channels to Pauli errors, and (b) employ a machine learning model, ExtraTrees regressor, to estimate an effective bit-flip error rate using a feature vector consisting of machine calibration data (gate & measurement error rates), circuit features (number of qubits, numbers of different types of gates, etc.) and the shape of the noisy distribution (entropy). Our experimental results show that our proposed Q-Cluster scheme improves the fidelity by a factor of 1.46x, on average, compared to the unmitigated output distribution, for a set of low-entropy benchmarks on five different IBM quantum machines. Our approach outperforms the state-of-art QEM approaches M3 [24], Hammer [35], and QBEEP [33] by 1.29x, 1.47x, and 2.65x, respectively.

  • 3 authors
·
Apr 14, 2025

ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models

Large Language Models (LLMs) have revolutionized natural language processing tasks. However, their practical application is constrained by substantial memory and computational demands. Post-training quantization (PTQ) is considered an effective method to accelerate LLM inference. Despite its growing popularity in LLM model compression, PTQ deployment faces two major challenges. First, low-bit quantization leads to performance degradation. Second, restricted by the limited integer computing unit type on GPUs, quantized matrix operations with different precisions cannot be effectively accelerated. To address these issues, we introduce a novel arbitrary-bit quantization algorithm and inference framework, ABQ-LLM. It achieves superior performance across various quantization settings and enables efficient arbitrary-precision quantized inference on the GPU. ABQ-LLM introduces several key innovations: (1) a distribution correction method for transformer blocks to mitigate distribution differences caused by full quantization of weights and activations, improving performance at low bit-widths. (2) the bit balance strategy to counteract performance degradation from asymmetric distribution issues at very low bit-widths (e.g., 2-bit). (3) an innovative quantization acceleration framework that reconstructs the quantization matrix multiplication of arbitrary precision combinations based on BTC (Binary TensorCore) equivalents, gets rid of the limitations of INT4/INT8 computing units. ABQ-LLM can convert each component bit width gain into actual acceleration gain, maximizing performance under mixed precision(e.g., W6A6, W2A8). Based on W2*A8 quantization configuration on LLaMA-7B model, it achieved a WikiText2 perplexity of 7.59 (2.17downarrow vs 9.76 in AffineQuant). Compared to SmoothQuant, we realized 1.6times acceleration improvement and 2.7times memory compression gain.

  • 9 authors
·
Aug 16, 2024