new

Get trending papers in your email inbox!

Subscribe

Daily Papers

byAK and the research community

Apr 3

A Sublinear Algorithm for Approximate Shortest Paths in Large Networks

Computing distances and finding shortest paths in massive real-world networks is a fundamental algorithmic task in network analysis. There are two main approaches to solving this task. On one hand are traversal-based algorithms like bidirectional breadth-first search (BiBFS) with no preprocessing step and slow individual distance inquiries. On the other hand are indexing-based approaches, which maintain a large index. This allows for answering individual inquiries very fast; however, index creation is prohibitively expensive. We seek to bridge these two extremes: quickly answer distance inquiries without the need for costly preprocessing. In this work, we propose a new algorithm and data structure, WormHole, for approximate shortest path computations. WormHole leverages structural properties of social networks to build a sublinearly sized index, drawing upon the explicit core-periphery decomposition of Ben-Eliezer et al. Empirically, the preprocessing time of WormHole improves upon index-based solutions by orders of magnitude, and individual inquiries are consistently much faster than in BiBFS. The acceleration comes at the cost of a minor accuracy trade-off. Nonetheless, our empirical evidence demonstrates that WormHole accurately answers essentially all inquiries within a maximum additive error of 2. We complement these empirical results with provable theoretical guarantees, showing that WormHole requires n^{o(1)} node queries per distance inquiry in random power-law networks. In contrast, any approach without a preprocessing step requires n^{Ω(1)} queries for the same task. WormHole does not require reading the whole graph. Unlike the vast majority of index-based algorithms, it returns paths, not just distances. For faster inquiry times, it can be combined effectively with other index-based solutions, by running them only on the sublinear core.

  • 5 authors
·
Jun 11, 2024

Intrinsic Image Decomposition via Ordinal Shading

Intrinsic decomposition is a fundamental mid-level vision problem that plays a crucial role in various inverse rendering and computational photography pipelines. Generating highly accurate intrinsic decompositions is an inherently under-constrained task that requires precisely estimating continuous-valued shading and albedo. In this work, we achieve high-resolution intrinsic decomposition by breaking the problem into two parts. First, we present a dense ordinal shading formulation using a shift- and scale-invariant loss in order to estimate ordinal shading cues without restricting the predictions to obey the intrinsic model. We then combine low- and high-resolution ordinal estimations using a second network to generate a shading estimate with both global coherency and local details. We encourage the model to learn an accurate decomposition by computing losses on the estimated shading as well as the albedo implied by the intrinsic model. We develop a straightforward method for generating dense pseudo ground truth using our model's predictions and multi-illumination data, enabling generalization to in-the-wild imagery. We present an exhaustive qualitative and quantitative analysis of our predicted intrinsic components against state-of-the-art methods. Finally, we demonstrate the real-world applicability of our estimations by performing otherwise difficult editing tasks such as recoloring and relighting.

  • 2 authors
·
Nov 21, 2023

Functional Bayesian Tucker Decomposition for Continuous-indexed Tensor Data

Tucker decomposition is a powerful tensor model to handle multi-aspect data. It demonstrates the low-rank property by decomposing the grid-structured data as interactions between a core tensor and a set of object representations (factors). A fundamental assumption of such decomposition is that there are finite objects in each aspect or mode, corresponding to discrete indexes of data entries. However, real-world data is often not naturally posed in this setting. For example, geographic data is represented as continuous indexes of latitude and longitude coordinates, and cannot fit tensor models directly. To generalize Tucker decomposition to such scenarios, we propose Functional Bayesian Tucker Decomposition (FunBaT). We treat the continuous-indexed data as the interaction between the Tucker core and a group of latent functions. We use Gaussian processes (GP) as functional priors to model the latent functions. Then, we convert each GP into a state-space prior by constructing an equivalent stochastic differential equation (SDE) to reduce computational cost. An efficient inference algorithm is developed for scalable posterior approximation based on advanced message-passing techniques. The advantage of our method is shown in both synthetic data and several real-world applications. We release the code of FunBaT at https://github.com/xuangu-fang/Functional-Bayesian-Tucker-Decomposition.

  • 6 authors
·
Nov 8, 2023

When Does Bottom-up Beat Top-down in Hierarchical Community Detection?

Hierarchical clustering of networks consists in finding a tree of communities, such that lower levels of the hierarchy reveal finer-grained community structures. There are two main classes of algorithms tackling this problem. Divisive (top-down) algorithms recursively partition the nodes into two communities, until a stopping rule indicates that no further split is needed. In contrast, agglomerative (bottom-up) algorithms first identify the smallest community structure and then repeatedly merge the communities using a linkage method. In this article, we establish theoretical guarantees for the recovery of the hierarchical tree and community structure of a Hierarchical Stochastic Block Model by a bottom-up algorithm. We also establish that this bottom-up algorithm attains the information-theoretic threshold for exact recovery at intermediate levels of the hierarchy. Notably, these recovery conditions are less restrictive compared to those existing for top-down algorithms. This shows that bottom-up algorithms extend the feasible region for achieving exact recovery at intermediate levels. Numerical experiments on both synthetic and real data sets confirm the superiority of bottom-up algorithms over top-down algorithms. We also observe that top-down algorithms can produce dendrograms with inversions. These findings contribute to a better understanding of hierarchical clustering techniques and their applications in network analysis.

  • 4 authors
·
Jun 1, 2023

CoRe^2: Collect, Reflect and Refine to Generate Better and Faster

Making text-to-image (T2I) generative model sample both fast and well represents a promising research direction. Previous studies have typically focused on either enhancing the visual quality of synthesized images at the expense of sampling efficiency or dramatically accelerating sampling without improving the base model's generative capacity. Moreover, nearly all inference methods have not been able to ensure stable performance simultaneously on both diffusion models (DMs) and visual autoregressive models (ARMs). In this paper, we introduce a novel plug-and-play inference paradigm, CoRe^2, which comprises three subprocesses: Collect, Reflect, and Refine. CoRe^2 first collects classifier-free guidance (CFG) trajectories, and then use collected data to train a weak model that reflects the easy-to-learn contents while reducing number of function evaluations during inference by half. Subsequently, CoRe^2 employs weak-to-strong guidance to refine the conditional output, thereby improving the model's capacity to generate high-frequency and realistic content, which is difficult for the base model to capture. To the best of our knowledge, CoRe^2 is the first to demonstrate both efficiency and effectiveness across a wide range of DMs, including SDXL, SD3.5, and FLUX, as well as ARMs like LlamaGen. It has exhibited significant performance improvements on HPD v2, Pick-of-Pic, Drawbench, GenEval, and T2I-Compbench. Furthermore, CoRe^2 can be seamlessly integrated with the state-of-the-art Z-Sampling, outperforming it by 0.3 and 0.16 on PickScore and AES, while achieving 5.64s time saving using SD3.5.Code is released at https://github.com/xie-lab-ml/CoRe/tree/main.

  • 7 authors
·
Mar 12, 2025 4

Segmentation of 3D pore space from CT images using curvilinear skeleton: application to numerical simulation of microbial decomposition

Recent advances in 3D X-ray Computed Tomographic (CT) sensors have stimulated research efforts to unveil the extremely complex micro-scale processes that control the activity of soil microorganisms. Voxel-based description (up to hundreds millions voxels) of the pore space can be extracted, from grey level 3D CT scanner images, by means of simple image processing tools. Classical methods for numerical simulation of biological dynamics using mesh of voxels, such as Lattice Boltzmann Model (LBM), are too much time consuming. Thus, the use of more compact and reliable geometrical representations of pore space can drastically decrease the computational cost of the simulations. Several recent works propose basic analytic volume primitives (e.g. spheres, generalized cylinders, ellipsoids) to define a piece-wise approximation of pore space for numerical simulation of draining, diffusion and microbial decomposition. Such approaches work well but the drawback is that it generates approximation errors. In the present work, we study another alternative where pore space is described by means of geometrically relevant connected subsets of voxels (regions) computed from the curvilinear skeleton. Indeed, many works use the curvilinear skeleton (3D medial axis) for analyzing and partitioning 3D shapes within various domains (medicine, material sciences, petroleum engineering, etc.) but only a few ones in soil sciences. Within the context of soil sciences, most studies dealing with 3D medial axis focus on the determination of pore throats. Here, we segment pore space using curvilinear skeleton in order to achieve numerical simulation of microbial decomposition (including diffusion processes). We validate simulation outputs by comparison with other methods using different pore space geometrical representations (balls, voxels).

  • 6 authors
·
Sep 4, 2023

Small Language Models Fine-tuned to Coordinate Larger Language Models improve Complex Reasoning

Large Language Models (LLMs) prompted to generate chain-of-thought (CoT) exhibit impressive reasoning capabilities. Recent attempts at prompt decomposition toward solving complex, multi-step reasoning problems depend on the ability of the LLM to simultaneously decompose and solve the problem. A significant disadvantage is that foundational LLMs are typically not available for fine-tuning, making adaptation computationally prohibitive. We believe (and demonstrate) that problem decomposition and solution generation are distinct capabilites, better addressed in separate modules, than by one monolithic LLM. We introduce DaSLaM, which uses a decomposition generator to decompose complex problems into subproblems that require fewer reasoning steps. These subproblems are answered by a solver. We use a relatively small (13B parameters) LM as the decomposition generator, which we train using policy gradient optimization to interact with a solver LM (regarded as black-box) and guide it through subproblems, thereby rendering our method solver-agnostic. Evaluation on multiple different reasoning datasets reveal that with our method, a 175 billion parameter LM (text-davinci-003) can produce competitive or even better performance, compared to its orders-of-magnitude larger successor, GPT-4. Additionally, we show that DaSLaM is not limited by the solver's capabilities as a function of scale; e.g., solver LMs with diverse sizes give significant performance improvement with our solver-agnostic decomposition technique. Exhaustive ablation studies evince the superiority of our modular finetuning technique over exorbitantly large decomposer LLMs, based on prompting alone.

  • 5 authors
·
Oct 21, 2023

Molt Dynamics: Emergent Social Phenomena in Autonomous AI Agent Populations

MoltBook is a large-scale multi-agent coordination environment where over 770,000 autonomous LLM agents interact without human participation, offering the first opportunity we are aware of to observe emergent multi-agent coordination dynamics at this population scale. We introduce Molt Dynamics: the emergent agent coordination behaviors, inter-agent communication dynamics, and role specialization patterns arising when autonomous agents operate as decentralized decision-makers in an unconstrained multi-agent environment. Through longitudinal observation of 90,704 active agents over three weeks, we characterize three aspects. First, spontaneous role specialization: network-based clustering reveals six structural roles (silhouette 0.91), though the result primarily reflects core-periphery organization -- 93.5\% of agents occupy a homogeneous peripheral cluster, with meaningful differentiation confined to the active minority. Second, decentralized information dissemination: cascade analysis of 10,323 inter-agent propagation events reveals power-law distributed cascade sizes (α= 2.57 pm 0.02) and saturating adoption dynamics where adoption probability shows diminishing returns with repeated exposures (Cox hazard ratio 0.53, concordance 0.78). Third, distributed cooperative task resolution: 164 multi-agent collaborative events show detectable coordination patterns, but success rates are low (6.7\%, p = 0.057) and cooperative outcomes are significantly worse than a matched single-agent baseline (Cohen's d = -0.88), indicating emergent cooperative behavior is nascent. These findings establish an empirical baseline for coordination dynamics in decentralized autonomous agent systems, with implications for multi-agent system design, agent communication protocol engineering, and AI safety.

  • 2 authors
·
Mar 3

Olbedo: An Albedo and Shading Aerial Dataset for Large-Scale Outdoor Environments

Intrinsic image decomposition (IID) of outdoor scenes is crucial for relighting, editing, and understanding large-scale environments, but progress has been limited by the lack of real-world datasets with reliable albedo and shading supervision. We introduce Olbedo, a large-scale aerial dataset for outdoor albedo--shading decomposition in the wild. Olbedo contains 5,664 UAV images captured across four landscape types, multiple years, and diverse illumination conditions. Each view is accompanied by multi-view consistent albedo and shading maps, metric depth, surface normals, sun and sky shading components, camera poses, and, for recent flights, measured HDR sky domes. These annotations are derived from an inverse-rendering refinement pipeline over multi-view stereo reconstructions and calibrated sky illumination, together with per-pixel confidence masks. We demonstrate that Olbedo enables state-of-the-art diffusion-based IID models, originally trained on synthetic indoor data, to generalize to real outdoor imagery: fine-tuning on Olbedo significantly improves single-view outdoor albedo prediction on the MatrixCity benchmark. We further illustrate applications of Olbedo-trained models to multi-view consistent relighting of 3D assets, material editing, and scene change analysis for urban digital twins. We release the dataset, baseline models, and an evaluation protocol to support future research in outdoor intrinsic decomposition and illumination-aware aerial vision.

  • 7 authors
·
Feb 24

Causal Attribution of Coastal Water Clarity Degradation to Nickel Processing Expansion at the Indonesia Morowali Industrial Park, Sulawesi

Indonesia's nickel ore export ban has driven rapid expansion of smelting and hydrometallurgical processing capacity at the Indonesia Morowali Industrial Park (IMIP), now the world's largest integrated nickel processing complex, on the coast of Central Sulawesi. Whether this industrialization has degraded the adjacent marine environment remains unquantified. We apply Bayesian structural time-series (BSTS) causal inference to a multi-decadal, multi-sensor satellite ocean color record of the diffuse attenuation coefficient at 490 nm, K_d(490), to test for a causal link between IMIP expansion and nearshore turbidity change. A consensus structural breakpoint, a significant posterior causal effect estimated against a Banda Sea counterfactual, and a distribution-free placebo rank test collectively establish that coastal water clarity deteriorated after the transition from initial nickel pig iron production to hyper-expansion of high-pressure acid leaching facilities for battery-grade nickel. Satellite-derived land cover analysis independently corroborates this timing, showing substantial built-area growth and concurrent tree cover loss within the IMIP footprint. The resulting euphotic zone shoaling occurs in oligotrophic waters supporting high marine biodiversity, where even moderate optical degradation may impair coral photosynthesis and compress depth-dependent reef habitat. These findings quantify a marine environmental cost absent from Indonesia's mineral downstreaming policy discourse and demonstrate a transferable, satellite-based quasi-experimental framework for causal impact assessment at coastal industrial sites in data-limited tropical settings.

Learning 3D Human Shape and Pose from Dense Body Parts

Reconstructing 3D human shape and pose from monocular images is challenging despite the promising results achieved by the most recent learning-based methods. The commonly occurred misalignment comes from the facts that the mapping from images to the model space is highly non-linear and the rotation-based pose representation of body models is prone to result in the drift of joint positions. In this work, we investigate learning 3D human shape and pose from dense correspondences of body parts and propose a Decompose-and-aggregate Network (DaNet) to address these issues. DaNet adopts the dense correspondence maps, which densely build a bridge between 2D pixels and 3D vertices, as intermediate representations to facilitate the learning of 2D-to-3D mapping. The prediction modules of DaNet are decomposed into one global stream and multiple local streams to enable global and fine-grained perceptions for the shape and pose predictions, respectively. Messages from local streams are further aggregated to enhance the robust prediction of the rotation-based poses, where a position-aided rotation feature refinement strategy is proposed to exploit spatial relationships between body joints. Moreover, a Part-based Dropout (PartDrop) strategy is introduced to drop out dense information from intermediate representations during training, encouraging the network to focus on more complementary body parts as well as neighboring position features. The efficacy of the proposed method is validated on both indoor and real-world datasets including Human3.6M, UP3D, COCO, and 3DPW, showing that our method could significantly improve the reconstruction performance in comparison with previous state-of-the-art methods. Our code is publicly available at https://hongwenzhang.github.io/dense2mesh .

  • 5 authors
·
Dec 31, 2019

Measuring Primitive Accumulation: An Information-Theoretic Approach to Capitalist Enclosure in PIK2, Indonesia

Large-scale land enclosure for speculative mega-development constitutes a non-equilibrium spatial process whose velocity, topology, and irreversibility remain poorly quantified. We study the Pantai Indah Kapuk 2 (PIK2) coastal mega-development north of Jakarta, Indonesia, using eight years (2017--2024) of Sentinel-2 land-use/land-cover (LULC) data at 10-meter resolution. The landscape is projected onto a Marxian probability simplex partitioning terrestrial pixels into Commons, Agrarian, and Capital fractions. Fisher-Rao (FR) geodesic distances on this simplex identify a transformation pulse of 0.405~rad/yr during 2019--2020, coinciding with major construction activity. Absorbing Markov chain analysis yields expected absorption times into the built environment of 46.0~years for cropland and 38.1~years for tree cover, with a pooled built-area self-retention rate of 96.4%. Percolation analysis reveals that a giant connected component containing 89--95% of all built pixels persists at occupation probabilities p in [0.096, 0.162], far below the random percolation threshold p_c approx 0.593, indicating planned rather than stochastic spatial growth. The box-counting fractal dimension of the urban boundary increases from d_f = 1.316 to 1.397, consistent with increasingly irregular frontier expansion. These results suggest that information-geometric and statistical-mechanical tools can characterize the kinematic and topological signatures of capitalist spatial accumulation with quantitative precision.

Dynamical Excitation as a probe of planetary origins

We present a set of numerical simulations of the dynamical evolution of compact planetary systems migrating in a protoplanetary disk whose inner edge is sculpted by the interaction with the stellar magnetic field, as described in Yu et al. (2023). We demonstrate that the resulting final distribution of neighbouring planet period ratios contains only a small surviving fraction of resonant systems, in accordance with observations. The resulting planetary architectures are largely in place by the end of the protoplanetary disk phase (within a few Myr), and do not require significant later dynamical evolution. The divergence of planetary pairs during gas disk dispersal also leads to the excitation of eccentricities when pairs cross mean motion resonances in a divergent fashion. The resulting distribution of remnant free eccentricities is consistent with the values inferred from the observation of transit durations and transit timing variations. We furthermore demonstrate that this conclusion is not significantly altered by tides, assuming standard values for tidal dissipation in Earth or Neptune-class planets. These results demonstrate that the observed spacing and residual dynamical excitation of compact planetary systems can be reproduced by migration through a protoplanetary disk, as long as the inner disk boundary is modelled as a gradual rollover, instead of a sharp transition. Such an effect can be achieved when the model accounts for the diffusion of the stellar magnetic field into the disk. The resulting divergence of planetary pairs during the magnetospheric rebound phase breaks the resonant chains, resulting in a better match to observations than disk models with more traditional inner boundaries.

  • 4 authors
·
Oct 1, 2025

Tensor Decomposition Networks for Fast Machine Learning Interatomic Potential Computations

SO(3)-equivariant networks are the dominant models for machine learning interatomic potentials (MLIPs). The key operation of such networks is the Clebsch-Gordan (CG) tensor product, which is computationally expensive. To accelerate the computation, we develop tensor decomposition networks (TDNs) as a class of approximately equivariant networks in which CG tensor products are replaced by low-rank tensor decompositions, such as the CANDECOMP/PARAFAC (CP) decomposition. With the CP decomposition, we prove (i) a uniform bound on the induced error of SO(3)-equivariance, and (ii) the universality of approximating any equivariant bilinear map. To further reduce the number of parameters, we propose path-weight sharing that ties all multiplicity-space weights across the O(L^3) CG paths into a single shared parameter set without compromising equivariance, where L is the maximum angular degree. The resulting layer acts as a plug-and-play replacement for tensor products in existing networks, and the computational complexity of tensor products is reduced from O(L^6) to O(L^4). We evaluate TDNs on PubChemQCR, a newly curated molecular relaxation dataset containing 105 million DFT-calculated snapshots. We also use existing datasets, including OC20, and OC22. Results show that TDNs achieve competitive performance with dramatic speedup in computations. Our code is publicly available as part of the AIRS library (https://github.com/divelab/AIRS/tree/main/OpenMol/TDN{https://github.com/divelab/AIRS/}).

  • 9 authors
·
Jul 1, 2025

KARMA: A Multilevel Decomposition Hybrid Mamba Framework for Multivariate Long-Term Time Series Forecasting

Multivariate long-term and efficient time series forecasting is a key requirement for a variety of practical applications, and there are complex interleaving time dynamics in time series data that require decomposition modeling. Traditional time series decomposition methods are single and rely on fixed rules, which are insufficient for mining the potential information of the series and adapting to the dynamic characteristics of complex series. On the other hand, the Transformer-based models for time series forecasting struggle to effectively model long sequences and intricate dynamic relationships due to their high computational complexity. To overcome these limitations, we introduce KARMA, with an Adaptive Time Channel Decomposition module (ATCD) to dynamically extract trend and seasonal components. It further integrates a Hybrid Frequency-Time Decomposition module (HFTD) to further decompose Series into frequency-domain and time-domain. These components are coupled with multi-scale Mamba-based KarmaBlock to efficiently process global and local information in a coordinated manner. Experiments on eight real-world datasets from diverse domains well demonstrated that KARMA significantly outperforms mainstream baseline methods in both predictive accuracy and computational efficiency. Code and full results are available at this repository: https://github.com/yedadasd/KARMA

  • 7 authors
·
Jun 10, 2025

PowerWalk: Scalable Personalized PageRank via Random Walks with Vertex-Centric Decomposition

Most methods for Personalized PageRank (PPR) precompute and store all accurate PPR vectors, and at query time, return the ones of interest directly. However, the storage and computation of all accurate PPR vectors can be prohibitive for large graphs, especially in caching them in memory for real-time online querying. In this paper, we propose a distributed framework that strikes a better balance between offline indexing and online querying. The offline indexing attains a fingerprint of the PPR vector of each vertex by performing billions of "short" random walks in parallel across a cluster of machines. We prove that our indexing method has an exponential convergence, achieving the same precision with previous methods using a much smaller number of random walks. At query time, the new PPR vector is composed by a linear combination of related fingerprints, in a highly efficient vertex-centric decomposition manner. Interestingly, the resulting PPR vector is much more accurate than its offline counterpart because it actually uses more random walks in its estimation. More importantly, we show that such decomposition for a batch of queries can be very efficiently processed using a shared decomposition. Our implementation, PowerWalk, takes advantage of advanced distributed graph engines and it outperforms the state-of-the-art algorithms by orders of magnitude. Particularly, it responses to tens of thousands of queries on graphs with billions of edges in just a few seconds.

  • 4 authors
·
Aug 22, 2016

FLoRA: Low-Rank Core Space for N-dimension

Adapting pre-trained foundation models for various downstream tasks has been prevalent in artificial intelligence. Due to the vast number of tasks and high costs, adjusting all parameters becomes unfeasible. To mitigate this, several fine-tuning techniques have been developed to update the pre-trained model weights in a more resource-efficient manner, such as through low-rank adjustments. Yet, almost all of these methods focus on linear weights, neglecting the intricacies of parameter spaces in higher dimensions like 4D. Alternatively, some methods can be adapted for high-dimensional parameter space by compressing changes in the original space into two dimensions and then employing low-rank matrix decomposition. However, these approaches destructs the structural integrity of the involved high-dimensional spaces. To tackle the diversity of dimensional spaces across different foundation models and provide a more precise representation of the changes within these spaces, this paper introduces a generalized parameter-efficient fine-tuning framework, FLoRA, designed for various dimensional parameter space. Specifically, utilizing Tucker decomposition, FLoRA asserts that changes in each dimensional parameter space are based on a low-rank core space which maintains the consistent topological structure with the original space. It then models the changes through this core space alongside corresponding weights to reconstruct alterations in the original space. FLoRA effectively preserves the structural integrity of the change of original N-dimensional parameter space, meanwhile decomposes it via low-rank tensor decomposition. Extensive experiments on computer vision, natural language processing and multi-modal tasks validate FLoRA's effectiveness. Codes are available at https://github.com/SJTU-DeepVisionLab/FLoRA.

  • 9 authors
·
May 23, 2024

Towards Open-Ended Visual Scientific Discovery with Sparse Autoencoders

Scientific archives now contain hundreds of petabytes of data across genomics, ecology, climate, and molecular biology that could reveal undiscovered patterns if systematically analyzed at scale. Large-scale, weakly-supervised datasets in language and vision have driven the development of foundation models whose internal representations encode structure (patterns, co-occurrences and statistical regularities) beyond their training objectives. Most existing methods extract structure only for pre-specified targets; they excel at confirmation but do not support open-ended discovery of unknown patterns. We ask whether sparse autoencoders (SAEs) can enable open-ended feature discovery from foundation model representations. We evaluate this question in controlled rediscovery studies, where the learned SAE features are tested for alignment with semantic concepts on a standard segmentation benchmark and compared against strong label-free alternatives on concept-alignment metrics. Applied to ecological imagery, the same procedure surfaces fine-grained anatomical structure without access to segmentation or part labels, providing a scientific case study with ground-truth validation. While our experiments focus on vision with an ecology case study, the method is domain-agnostic and applicable to models in other sciences (e.g., proteins, genomics, weather). Our results indicate that sparse decomposition provides a practical instrument for exploring what scientific foundation models have learned, an important prerequisite for moving from confirmation to genuine discovery.

  • 4 authors
·
Nov 21, 2025

Interpreting Attention Layer Outputs with Sparse Autoencoders

Decomposing model activations into interpretable components is a key open problem in mechanistic interpretability. Sparse autoencoders (SAEs) are a popular method for decomposing the internal activations of trained transformers into sparse, interpretable features, and have been applied to MLP layers and the residual stream. In this work we train SAEs on attention layer outputs and show that also here SAEs find a sparse, interpretable decomposition. We demonstrate this on transformers from several model families and up to 2B parameters. We perform a qualitative study of the features computed by attention layers, and find multiple families: long-range context, short-range context and induction features. We qualitatively study the role of every head in GPT-2 Small, and estimate that at least 90% of the heads are polysemantic, i.e. have multiple unrelated roles. Further, we show that Sparse Autoencoders are a useful tool that enable researchers to explain model behavior in greater detail than prior work. For example, we explore the mystery of why models have so many seemingly redundant induction heads, use SAEs to motivate the hypothesis that some are long-prefix whereas others are short-prefix, and confirm this with more rigorous analysis. We use our SAEs to analyze the computation performed by the Indirect Object Identification circuit (Wang et al.), validating that the SAEs find causally meaningful intermediate variables, and deepening our understanding of the semantics of the circuit. We open-source the trained SAEs and a tool for exploring arbitrary prompts through the lens of Attention Output SAEs.

  • 5 authors
·
Jun 25, 2024

Modular versus Hierarchical: A Structural Signature of Topic Popularity in Mathematical Research

Mathematical researchers, especially those in early-career positions, face critical decisions about topic specialization with limited information about the collaborative environments of different research areas. The aim of this paper is to study how the popularity of a research topic is associated with the structure of that topic's collaboration network, as observed by a suite of measures capturing organizational structure at several scales. We apply these measures to 1,938 algorithmically discovered topics across 121,391 papers sourced from arXiv metadata during the period 2020--2025. Our analysis, which controls for the confounding effects of network size, reveals a structural dichotomy--we find that popular topics organize into modular "schools of thought," while niche topics maintain hierarchical core-periphery structures centered around established experts. This divide is not an artifact of scale, but represents a size-independent structural pattern correlated with popularity. We also document a "constraint reversal": after controlling for size, researchers in popular fields face greater structural constraints on collaboration opportunities, contrary to conventional expectations. Our findings suggest that topic selection is an implicit choice between two fundamentally different collaborative environments, each with distinct implications for a researcher's career. To make these structural patterns transparent to the research community, we developed the Math Research Compass (https://mathresearchcompass.com), an interactive platform providing data on topic popularity and collaboration patterns across mathematical topics.

  • 1 authors
·
Jun 28, 2025

Refinement Module based on Parse Graph of Feature Map for Human Pose Estimation

Parse graphs of the human body can be obtained in the human brain to help humans complete the human pose estimation (HPE). It contains a hierarchical structure, like a tree structure, and context relations among nodes. Many researchers pre-design the parse graph of body structure, and then design framework for HPE. However, these frameworks are difficulty adapting when encountering situations that differ from the preset human structure. Different from them, we regard the feature map as a whole, similarly to human body, so the feature map can be optimized based on parse graphs and each node feature is learned implicitly instead of explicitly, which means it can flexibly respond to different human body structure. In this paper, we design the Refinement Module based on the Parse Graph of feature map (RMPG), which includes two stages: top-down decomposition and bottom-up combination. In the top-down decomposition stage, the feature map is decomposed into multiple sub-feature maps along the channel and their context relations are calculated to obtain their respective context information. In the bottom-up combination stage, the sub-feature maps and their context information are combined to obtain refined sub-feature maps, and then these refined sub-feature maps are concatenated to obtain the refined feature map. Additionally ,we design a top-down framework by using multiple RMPG modules for HPE, some of which are supervised to obtain context relations among body parts. Our framework achieves excellent results on the COCO keypoint detection, CrowdPose and MPII human pose datasets. More importantly, our experiments also demonstrate the effectiveness of RMPG on different methods, including SimpleBaselines, Hourglass, and ViTPose.

  • 3 authors
·
Jan 19, 2025

The AI Community Building the Future? A Quantitative Analysis of Development Activity on Hugging Face Hub

Open source developers have emerged as key actors in the political economy of artificial intelligence (AI), with open model development being recognised as an alternative to closed-source AI development. However, we still have a limited understanding of collaborative practices in open source AI. This paper responds to this gap with a three-part quantitative analysis of development activity on the Hugging Face (HF) Hub, a popular platform for building, sharing, and demonstrating models. First, we find that various types of activity across 348,181 model, 65,761 dataset, and 156,642 space repositories exhibit right-skewed distributions. Activity is extremely imbalanced between repositories; for example, over 70% of models have 0 downloads, while 1% account for 99% of downloads. Second, we analyse a snapshot of the social network structure of collaboration on models, finding that the community has a core-periphery structure, with a core of prolific developers and a majority of isolate developers (89%). Upon removing isolates, collaboration is characterised by high reciprocity regardless of developers' network positions. Third, we examine model adoption through the lens of model usage in spaces, finding that a minority of models, developed by a handful of companies, are widely used on the HF Hub. Overall, we find that various types of activity on the HF Hub are characterised by Pareto distributions, congruent with prior observations about OSS development patterns on platforms like GitHub. We conclude with a discussion of the implications of the findings and recommendations for (open source) AI researchers, developers, and policymakers.

  • 3 authors
·
May 20, 2024 1

Diffusion Models as Optimizers for Efficient Planning in Offline RL

Diffusion models have shown strong competitiveness in offline reinforcement learning tasks by formulating decision-making as sequential generation. However, the practicality of these methods is limited due to the lengthy inference processes they require. In this paper, we address this problem by decomposing the sampling process of diffusion models into two decoupled subprocesses: 1) generating a feasible trajectory, which is a time-consuming process, and 2) optimizing the trajectory. With this decomposition approach, we are able to partially separate efficiency and quality factors, enabling us to simultaneously gain efficiency advantages and ensure quality assurance. We propose the Trajectory Diffuser, which utilizes a faster autoregressive model to handle the generation of feasible trajectories while retaining the trajectory optimization process of diffusion models. This allows us to achieve more efficient planning without sacrificing capability. To evaluate the effectiveness and efficiency of the Trajectory Diffuser, we conduct experiments on the D4RL benchmarks. The results demonstrate that our method achieves it 3-it 10 times faster inference speed compared to previous sequence modeling methods, while also outperforming them in terms of overall performance. https://github.com/RenMing-Huang/TrajectoryDiffuser Keywords: Reinforcement Learning and Efficient Planning and Diffusion Model

  • 7 authors
·
Jul 22, 2024

BitStack: Fine-Grained Size Control for Compressed Large Language Models in Variable Memory Environments

Large language models (LLMs) have revolutionized numerous applications, yet their deployment remains challenged by memory constraints on local devices. While scaling laws have enhanced LLM capabilities, the primary bottleneck has shifted from capability to availability, emphasizing the need for efficient memory management. Traditional compression methods, such as quantization, often require predefined compression ratios and separate compression processes for each setting, complicating deployment in variable memory environments. In this paper, we introduce BitStack, a novel, training-free weight compression approach that enables megabyte-level trade-offs between memory usage and model performance. By leveraging weight decomposition, BitStack can dynamically adjust the model size with minimal transmission between running memory and storage devices. Our approach iteratively decomposes weight matrices while considering the significance of each parameter, resulting in an approximately 1-bit per parameter residual block in each decomposition iteration. These blocks are sorted and stacked in storage as basic transmission units, with different quantities loaded based on current memory availability. Extensive experiments across a wide range of tasks demonstrate that, despite offering fine-grained size control, BitStack consistently matches or surpasses strong quantization baselines, particularly at extreme compression ratios. To the best of our knowledge, this is the first decomposition-based method that effectively bridges the gap to practical compression techniques like quantization. Code is available at https://github.com/xinghaow99/BitStack.

  • 6 authors
·
Oct 31, 2024 6

Neural Collapse in Deep Linear Networks: From Balanced to Imbalanced Data

Modern deep neural networks have achieved impressive performance on tasks from image classification to natural language processing. Surprisingly, these complex systems with massive amounts of parameters exhibit the same structural properties in their last-layer features and classifiers across canonical datasets when training until convergence. In particular, it has been observed that the last-layer features collapse to their class-means, and those class-means are the vertices of a simplex Equiangular Tight Frame (ETF). This phenomenon is known as Neural Collapse (NC). Recent papers have theoretically shown that NC emerges in the global minimizers of training problems with the simplified "unconstrained feature model". In this context, we take a step further and prove the NC occurrences in deep linear networks for the popular mean squared error (MSE) and cross entropy (CE) losses, showing that global solutions exhibit NC properties across the linear layers. Furthermore, we extend our study to imbalanced data for MSE loss and present the first geometric analysis of NC under bias-free setting. Our results demonstrate the convergence of the last-layer features and classifiers to a geometry consisting of orthogonal vectors, whose lengths depend on the amount of data in their corresponding classes. Finally, we empirically validate our theoretical analyses on synthetic and practical network architectures with both balanced and imbalanced scenarios.

  • 6 authors
·
Jan 1, 2023

Tracking Star-Forming Cores as Mass Reservoirs in Clustered and Isolated Regions Using Numerical Passive Tracer Particles

Understanding the physical properties of star-forming cores as mass reservoirs for protostars, and the impact of turbulence, is crucial in star formation studies. We implemented passive tracer particles in clump-scale numerical simulations with turbulence strengths of M_{rm rms} = 2, 10. Unlike core identification methods used in observational studies, we identified 260 star-forming cores using a new method based on tracer particles falling onto protostars. Our findings reveal that star-forming cores do not necessarily coincide with high-density regions when nearby stars are present, as gas selectively accretes onto protostars, leading to clumpy, fragmented structures. We calculated convex hull cores from star-forming cores and defined their filling factors. Regardless of turbulence strength, convex hull cores with lower filling factors tend to contain more protostars and have larger masses and sizes, indicating that cores in clustered regions are more massive and larger than those in isolated regions. Thus, the filling factor serves as a key indicator for distinguishing between isolated and clustered star-forming regions and may provide insights into the star formation processes within clustered regions. We also found that most convex hull cores are gravitationally bound. However, in the M_{rm rms} = 10 model, there are more low-mass, unbound convex hull cores compared to the M_{rm rms} = 2 model. In the M_{rm rms} = 10 model, 16% of the convex hull cores are unbound, which may be explained by the inertial-inflow model. These findings highlight the influence of turbulence strength on the mass and gravitational stability of cores.

  • 4 authors
·
Jan 4, 2025

Solving a Million-Step LLM Task with Zero Errors

LLMs have achieved remarkable breakthroughs in reasoning, insights, and tool use, but chaining these abilities into extended processes at the scale of those routinely executed by humans, organizations, and societies has remained out of reach. The models have a persistent error rate that prevents scale-up: for instance, recent experiments in the Towers of Hanoi benchmark domain showed that the process inevitably becomes derailed after at most a few hundred steps. Thus, although LLM research is often still benchmarked on tasks with relatively few dependent logical steps, there is increasing attention on the ability (or inability) of LLMs to perform long range tasks. This paper describes MAKER, the first system that successfully solves a task with over one million LLM steps with zero errors, and, in principle, scales far beyond this level. The approach relies on an extreme decomposition of a task into subtasks, each of which can be tackled by focused microagents. The high level of modularity resulting from the decomposition allows error correction to be applied at each step through an efficient multi-agent voting scheme. This combination of extreme decomposition and error correction makes scaling possible. Thus, the results suggest that instead of relying on continual improvement of current LLMs, massively decomposed agentic processes (MDAPs) may provide a way to efficiently solve problems at the level of organizations and societies.

CognizantAI Cognizant
·
Nov 12, 2025 3

Coreset Sampling from Open-Set for Fine-Grained Self-Supervised Learning

Deep learning in general domains has constantly been extended to domain-specific tasks requiring the recognition of fine-grained characteristics. However, real-world applications for fine-grained tasks suffer from two challenges: a high reliance on expert knowledge for annotation and necessity of a versatile model for various downstream tasks in a specific domain (e.g., prediction of categories, bounding boxes, or pixel-wise annotations). Fortunately, the recent self-supervised learning (SSL) is a promising approach to pretrain a model without annotations, serving as an effective initialization for any downstream tasks. Since SSL does not rely on the presence of annotation, in general, it utilizes the large-scale unlabeled dataset, referred to as an open-set. In this sense, we introduce a novel Open-Set Self-Supervised Learning problem under the assumption that a large-scale unlabeled open-set is available, as well as the fine-grained target dataset, during a pretraining phase. In our problem setup, it is crucial to consider the distribution mismatch between the open-set and target dataset. Hence, we propose SimCore algorithm to sample a coreset, the subset of an open-set that has a minimum distance to the target dataset in the latent space. We demonstrate that SimCore significantly improves representation learning performance through extensive experimental settings, including eleven fine-grained datasets and seven open-sets in various downstream tasks.

  • 3 authors
·
Mar 20, 2023

Concatenated Matrix SVD: Compression Bounds, Incremental Approximation, and Error-Constrained Clustering

Large collections of matrices arise throughout modern machine learning, signal processing, and scientific computing, where they are commonly compressed by concatenation followed by truncated singular value decomposition (SVD). This strategy enables parameter sharing and efficient reconstruction and has been widely adopted across domains ranging from multi-view learning and signal processing to neural network compression. However, it leaves a fundamental question unanswered: which matrices can be safely concatenated and compressed together under explicit reconstruction error constraints? Existing approaches rely on heuristic or architecture-specific grouping and provide no principled guarantees on the resulting SVD approximation error. In the present work, we introduce a theory-driven framework for compression-aware clustering of matrices under SVD compression constraints. Our analysis establishes new spectral bounds for horizontally concatenated matrices, deriving global upper bounds on the optimal rank-r SVD reconstruction error from lower bounds on singular value growth. The first bound follows from Weyl-type monotonicity under blockwise extensions, while the second leverages singular values of incremental residuals to yield tighter, per-block guarantees. We further develop an efficient approximate estimator based on incremental truncated SVD that tracks dominant singular values without forming the full concatenated matrix. Therefore, we propose three clustering algorithms that merge matrices only when their predicted joint SVD compression error remains below a user-specified threshold. The algorithms span a trade-off between speed, provable accuracy, and scalability, enabling compression-aware clustering with explicit error control. Code is available online.

  • 1 authors
·
Jan 12

Physics-informed cluster analysis and a priori efficiency criterion for the construction of local reduced-order bases

Nonlinear model order reduction has opened the door to parameter optimization and uncertainty quantification in complex physics problems governed by nonlinear equations. In particular, the computational cost of solving these equations can be reduced by means of local reduced-order bases. This article examines the benefits of a physics-informed cluster analysis for the construction of cluster-specific reduced-order bases. We illustrate that the choice of the dissimilarity measure for clustering is fundamental and highly affects the performances of the local reduced-order bases. It is shown that clustering with an angle-based dissimilarity on simulation data efficiently decreases the intra-cluster Kolmogorov N-width. Additionally, an a priori efficiency criterion is introduced to assess the relevance of a ROM-net, a methodology for the reduction of nonlinear physics problems introduced in our previous work in [T. Daniel, F. Casenave, N. Akkari, D. Ryckelynck, Model order reduction assisted by deep neural networks (ROM-net), Advanced Modeling and Simulation in Engineering Sciences 7 (16), 2020]. This criterion also provides engineers with a very practical method for ROM-nets' hyperparameters calibration under constrained computational costs for the training phase. On five different physics problems, our physics-informed clustering strategy significantly outperforms classic strategies for the construction of local reduced-order bases in terms of projection errors.

  • 5 authors
·
Mar 25, 2021

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks

Mechanistic interpretability aims to understand the behavior of neural networks by reverse-engineering their internal computations. However, current methods struggle to find clear interpretations of neural network activations because a decomposition of activations into computational features is missing. Individual neurons or model components do not cleanly correspond to distinct features or functions. We present a novel interpretability method that aims to overcome this limitation by transforming the activations of the network into a new basis - the Local Interaction Basis (LIB). LIB aims to identify computational features by removing irrelevant activations and interactions. Our method drops irrelevant activation directions and aligns the basis with the singular vectors of the Jacobian matrix between adjacent layers. It also scales features based on their importance for downstream computation, producing an interaction graph that shows all computationally-relevant features and interactions in a model. We evaluate the effectiveness of LIB on modular addition and CIFAR-10 models, finding that it identifies more computationally-relevant features that interact more sparsely, compared to principal component analysis. However, LIB does not yield substantial improvements in interpretability or interaction sparsity when applied to language models. We conclude that LIB is a promising theory-driven approach for analyzing neural networks, but in its current form is not applicable to large language models.

  • 10 authors
·
May 17, 2024

Fat Polygonal Partitions with Applications to Visualization and Embeddings

Let T be a rooted and weighted tree, where the weight of any node is equal to the sum of the weights of its children. The popular Treemap algorithm visualizes such a tree as a hierarchical partition of a square into rectangles, where the area of the rectangle corresponding to any node in T is equal to the weight of that node. The aspect ratio of the rectangles in such a rectangular partition necessarily depends on the weights and can become arbitrarily high. We introduce a new hierarchical partition scheme, called a polygonal partition, which uses convex polygons rather than just rectangles. We present two methods for constructing polygonal partitions, both having guarantees on the worst-case aspect ratio of the constructed polygons; in particular, both methods guarantee a bound on the aspect ratio that is independent of the weights of the nodes. We also consider rectangular partitions with slack, where the areas of the rectangles may differ slightly from the weights of the corresponding nodes. We show that this makes it possible to obtain partitions with constant aspect ratio. This result generalizes to hyper-rectangular partitions in R^d. We use these partitions with slack for embedding ultrametrics into d-dimensional Euclidean space: we give a rm polylog(Delta)-approximation algorithm for embedding n-point ultrametrics into R^d with minimum distortion, where Delta denotes the spread of the metric, i.e., the ratio between the largest and the smallest distance between two points. The previously best-known approximation ratio for this problem was polynomial in n. This is the first algorithm for embedding a non-trivial family of weighted-graph metrics into a space of constant dimension that achieves polylogarithmic approximation ratio.

  • 3 authors
·
Sep 9, 2010

Coverage-centric Coreset Selection for High Pruning Rates

One-shot coreset selection aims to select a representative subset of the training data, given a pruning rate, that can later be used to train future models while retaining high accuracy. State-of-the-art coreset selection methods pick the highest importance examples based on an importance metric and are found to perform well at low pruning rates. However, at high pruning rates, they suffer from a catastrophic accuracy drop, performing worse than even random sampling. This paper explores the reasons behind this accuracy drop both theoretically and empirically. We first propose a novel metric to measure the coverage of a dataset on a specific distribution by extending the classical geometric set cover problem to a distribution cover problem. This metric helps explain why coresets selected by SOTA methods at high pruning rates perform poorly compared to random sampling because of worse data coverage. We then propose a novel one-shot coreset selection method, Coverage-centric Coreset Selection (CCS), that jointly considers overall data coverage upon a distribution as well as the importance of each example. We evaluate CCS on five datasets and show that, at high pruning rates (e.g., 90%), it achieves significantly better accuracy than previous SOTA methods (e.g., at least 19.56% higher on CIFAR10) as well as random selection (e.g., 7.04% higher on CIFAR10) and comparable accuracy at low pruning rates. We make our code publicly available at https://github.com/haizhongzheng/Coverage-centric-coreset-selection.

  • 4 authors
·
Oct 27, 2022

What Drives Cluster Cool-Core Transformations? A Population Level Analysis of TNG-Cluster

In this study, we examine the frequency and physical drivers of transformations from cool-core (CC) to non-cool-core (NCC) clusters, and vice versa, in a sample of 352 massive galaxy clusters (M_vir = 10^14-15.3 M_sun) from the TNG-Cluster magnetohydrodynamical cosmological simulation of galaxies. By identifying transformations based on the evolution of central entropy and focusing on z<2.5, we find that clusters frequently undergo such events, depending on their assembly and supermassive black hole histories. On average, clusters experience 2 to 3 transformations. Transformations can occur in both directions and can be temporary, but those to higher entropy cores, i.e. in the direction from CC to NCC states, are the vast majority. CC phases are shorter than NCC phases, and thus overall the TNG-Cluster population forms with low-entropy cores and moves towards NCC states with time. We study the role that mergers play in driving transformations, and find that mergers within ~1Gyr prior to a transformation toward higher (but not lower) entropy cores occur statistically more often than in a random control sample. Most importantly, we find examples of mergers associated with CC disruption regardless of their mass ratio or angular momentum. However, past merger activity is not a good predictor for z=0 CC status, at least based on core entropy, even though clusters undergoing more mergers eventually have the highest core entropy values at z=0. We consider the interplay between AGN feedback and evolving cluster core thermodynamics. We find that core transformations are accompanied by an increase in AGN activity, whereby frequent and repeated (kinetic) energy injections from the central SMBHs can produce a collective, long-term impact on central entropy, ultimately heating cluster cores. Whether such fast-paced periods of AGN activity are triggered by mergers is plausible, but not necessary.

  • 3 authors
·
Mar 3, 2025

Outward Migration of a Gas Accreting Planet: A Semi-Analytical Formula

Type II orbital migration is a key process to regulate the mass and semimajor axis distribution of exoplanetary giant planets. The conventional formula of type II migration generally predicts too rapid inward migration to reconcile with the observed pile-up of gas giant beyond 1 au. Analyzing the recent high-resolution hydrodynamical simulations by Li et al. (2024) and Pan et al. (2025) that show robust outward migration of a gas accreting planet, we here clarify the condition for the outward migration to occur and derive a general semi-analytical formula that can be applied for broad range of planet mass and disk conditions. The striking outward migration is caused by azimuthal asymmetry in corotation torque exerted from cicumplanetary disk regions (connecting to horseshoe flow) that is produced by the planetary gas accretion, while the conventional inward migration model is based on radial asymmetry in the torques from the circumstellar protoplanetry disk. We found that the azimuthal asymmetry dominates and the migration is outward, when the gap depth defined by the surface density reduction factor of 1/(1+K') is in the range of 0.03 lesssim K' lesssim 50. Using simple models with the new formula, we demonstrate that the outward migration plays an important role in shaping the mass and semimajor axis distribution of gas giants. The concurrent dependence of planets' accretion rate and migration direction on their masses and disk properties potentially reproduces the observed pile-up of exoplanetary gas giants beyond 1 au, although more detailed planet population synthesis calculations are needed in the future.

  • 5 authors
·
Nov 28, 2025

Dynamical evolution of massless particles in star clusters with NBODY6++GPU-MASSLESS: I. Free-floating MLPs

Context. Low-mass bodies, such as comets, asteroids, planetesimals, and free-floating planets, are continuously injected into the intra-cluster environment after expulsion from their host planetary systems. These can be modeled as massless particles (MLPs, hereafter). The dynamics of large populations of MLPs, however, has yet received little attention in literature. Aims. We investigate the dynamical evolution of MLP populations in star clusters, and characterize their kinematics and ejection rates. Methods. We present NBODY6++GPU-MASSLESS, a modified version of the N-body simulation code NBODY6++GPU, that allows fast integration of star clusters that contain large numbers of massless particles (MLPs). NBODY6++GPU-MASSLESS contains routines specifically directed at the dynamical evolution of low-mass bodies, such as planets. Results. Unlike stars, MLPs do not participate in the mass segregation process. Instead, MLPs mostly follow the gravitational potential of the star cluster, which gradually decreases over time due to stellar ejections and stellar evolution. The dynamical evolution of MLPs is primarily affected by the evolution of the core of the star cluster. This is most apparent in the outer regions for clusters with higher initial densities. High escape rates of MLPs are observed before the core-collapse, after which escape rates remain stable. Denser star clusters undergo a more intense core collapse, but this does not impact the dynamical evolution of MLPs. The speeds of escaping stars are similar to those of escaping MLPs, when disregarding the high-velocity ejections of neutron stars during the first 50 Myr.

  • 5 authors
·
Dec 11, 2024

RegionE: Adaptive Region-Aware Generation for Efficient Image Editing

Recently, instruction-based image editing (IIE) has received widespread attention. In practice, IIE often modifies only specific regions of an image, while the remaining areas largely remain unchanged. Although these two types of regions differ significantly in generation difficulty and computational redundancy, existing IIE models do not account for this distinction, instead applying a uniform generation process across the entire image. This motivates us to propose RegionE, an adaptive, region-aware generation framework that accelerates IIE tasks without additional training. Specifically, the RegionE framework consists of three main components: 1) Adaptive Region Partition. We observed that the trajectory of unedited regions is straight, allowing for multi-step denoised predictions to be inferred in a single step. Therefore, in the early denoising stages, we partition the image into edited and unedited regions based on the difference between the final estimated result and the reference image. 2) Region-Aware Generation. After distinguishing the regions, we replace multi-step denoising with one-step prediction for unedited areas. For edited regions, the trajectory is curved, requiring local iterative denoising. To improve the efficiency and quality of local iterative generation, we propose the Region-Instruction KV Cache, which reduces computational cost while incorporating global information. 3) Adaptive Velocity Decay Cache. Observing that adjacent timesteps in edited regions exhibit strong velocity similarity, we further propose an adaptive velocity decay cache to accelerate the local denoising process. We applied RegionE to state-of-the-art IIE base models, including Step1X-Edit, FLUX.1 Kontext, and Qwen-Image-Edit. RegionE achieved acceleration factors of 2.57, 2.41, and 2.06. Evaluations by GPT-4o confirmed that semantic and perceptual fidelity were well preserved.

  • 10 authors
·
Oct 29, 2025 1

AutoCoreset: An Automatic Practical Coreset Construction Framework

A coreset is a tiny weighted subset of an input set, that closely resembles the loss function, with respect to a certain set of queries. Coresets became prevalent in machine learning as they have shown to be advantageous for many applications. While coreset research is an active research area, unfortunately, coresets are constructed in a problem-dependent manner, where for each problem, a new coreset construction algorithm is usually suggested, a process that may take time or may be hard for new researchers in the field. Even the generic frameworks require additional (problem-dependent) computations or proofs to be done by the user. Besides, many problems do not have (provable) small coresets, limiting their applicability. To this end, we suggest an automatic practical framework for constructing coresets, which requires (only) the input data and the desired cost function from the user, without the need for any other task-related computation to be done by the user. To do so, we reduce the problem of approximating a loss function to an instance of vector summation approximation, where the vectors we aim to sum are loss vectors of a specific subset of the queries, such that we aim to approximate the image of the function on this subset. We show that while this set is limited, the coreset is quite general. An extensive experimental study on various machine learning applications is also conducted. Finally, we provide a ``plug and play" style implementation, proposing a user-friendly system that can be easily used to apply coresets for many problems. Full open source code can be found at https://github.com/alaamaalouf/AutoCoreset{https://github.com/alaamaalouf/AutoCoreset}. We believe that these contributions enable future research and easier use and applications of coresets.

  • 4 authors
·
May 19, 2023

LORD: Low Rank Decomposition Of Monolingual Code LLMs For One-Shot Compression

Low Rank Decomposition of matrix - splitting a large matrix into a product of two smaller matrix offers a means for compression that reduces the parameters of a model without sparsification, and hence delivering more speedup on modern hardware. Moreover, unlike quantization, the compressed linear layers remain fully differentiable and all the parameters trainable, while being able to leverage the existing highly efficient kernels over floating point matrices. We study the potential to compress Large Language Models (LLMs) for monolingual Code generation via Low Rank Decomposition (LoRD) and observe that ranks for the linear layers in these models can be reduced by upto 39.58% with less than 1% increase in perplexity. We then use Low Rank Decomposition (LoRD) to compress StarCoder 16B to 13.2B parameter with no drop and to 12.3B with minimal drop in HumanEval Pass@1 score, in less than 10 minutes on a single A100. The compressed models speeds up inference by up to 22.35% with just a single line of change in code over huggingface's implementation with pytorch backend. Low Rank Decomposition (LoRD) models remain compatible with state of the art near-lossless quantization method such as SpQR, which allows leveraging further compression gains of quantization. Lastly, QLoRA over Low Rank Decomposition (LoRD) model further reduces memory requirements by as much as 21.2% over vanilla QLoRA while offering similar gains from parameter efficient fine tuning. Our work shows Low Rank Decomposition (LoRD) as a promising new paradigm for LLM compression.

  • 3 authors
·
Sep 25, 2023

Shortcut Partitions in Minor-Free Graphs: Steiner Point Removal, Distance Oracles, Tree Covers, and More

The notion of shortcut partition, introduced recently by Chang, Conroy, Le, Milenkovi\'c, Solomon, and Than [CCLMST23], is a new type of graph partition into low-diameter clusters. Roughly speaking, the shortcut partition guarantees that for every two vertices u and v in the graph, there exists a path between u and v that intersects only a few clusters. They proved that any planar graph admits a shortcut partition and gave several applications, including a construction of tree cover for arbitrary planar graphs with stretch 1+varepsilon and O(1) many trees for any fixed varepsilon in (0,1). However, the construction heavily exploits planarity in multiple steps, and is thus inherently limited to planar graphs. In this work, we breach the "planarity barrier" to construct a shortcut partition for K_r-minor-free graphs for any r. To this end, we take a completely different approach -- our key contribution is a novel deterministic variant of the cop decomposition in minor-free graphs [And86, AGG14]. Our shortcut partition for K_r-minor-free graphs yields several direct applications. Most notably, we construct the first optimal distance oracle for K_r-minor-free graphs, with 1+varepsilon stretch, linear space, and constant query time for any fixed varepsilon in (0,1). The previous best distance oracle [AG06] uses O(nlog n) space and O(log n) query time, and its construction relies on Robertson-Seymour structural theorem and other sophisticated tools. We also obtain the first tree cover of O(1) size for minor-free graphs with stretch 1+varepsilon, while the previous best (1+varepsilon)-tree cover has size O(log^2 n) [BFN19].

  • 6 authors
·
Jul 31, 2023

From Cities to Series: Complex Networks and Deep Learning for Improved Spatial and Temporal Analytics*

Graphs have often been used to answer questions about the interaction between real-world entities by taking advantage of their capacity to represent complex topologies. Complex networks are known to be graphs that capture such non-trivial topologies; they are able to represent human phenomena such as epidemic processes, the dynamics of populations, and the urbanization of cities. The investigation of complex networks has been extrapolated to many fields of science, with particular emphasis on computing techniques, including artificial intelligence. In such a case, the analysis of the interaction between entities of interest is transposed to the internal learning of algorithms, a paradigm whose investigation is able to expand the state of the art in Computer Science. By exploring this paradigm, this thesis puts together complex networks and machine learning techniques to improve the understanding of the human phenomena observed in pandemics, pendular migration, and street networks. Accordingly, we contribute with: (i) a new neural network architecture capable of modeling dynamic processes observed in spatial and temporal data with applications in epidemics propagation, weather forecasting, and patient monitoring in intensive care units; (ii) a machine-learning methodology for analyzing and predicting links in the scope of human mobility between all the cities of Brazil; and, (iii) techniques for identifying inconsistencies in the urban planning of cities while tracking the most influential vertices, with applications over Brazilian and worldwide cities. We obtained results sustained by sound evidence of advances to the state of the art in artificial intelligence, rigorous formalisms, and ample experimentation. Our findings rely upon real-world applications in a range of domains, demonstrating the applicability of our methodologies.

  • 2 authors
·
Jun 1, 2022

Demystifying Local and Global Fairness Trade-offs in Federated Learning Using Partial Information Decomposition

This work presents an information-theoretic perspective to group fairness trade-offs in federated learning (FL) with respect to sensitive attributes, such as gender, race, etc. Existing works often focus on either global fairness (overall disparity of the model across all clients) or local fairness (disparity of the model at each client), without always considering their trade-offs. There is a lack of understanding regarding the interplay between global and local fairness in FL, particularly under data heterogeneity, and if and when one implies the other. To address this gap, we leverage a body of work in information theory called partial information decomposition (PID), which first identifies three sources of unfairness in FL, namely, Unique Disparity, Redundant Disparity, and Masked Disparity. We demonstrate how these three disparities contribute to global and local fairness using canonical examples. This decomposition helps us derive fundamental limits on the trade-off between global and local fairness, highlighting where they agree or disagree. We introduce the Accuracy and Global-Local Fairness Optimality Problem (AGLFOP), a convex optimization that defines the theoretical limits of accuracy and fairness trade-offs, identifying the best possible performance any FL strategy can attain given a dataset and client distribution. We also present experimental results on synthetic datasets and the ADULT dataset to support our theoretical findings.

  • 2 authors
·
Jul 20, 2023

Scaling Implicit Fields via Hypernetwork-Driven Multiscale Coordinate Transformations

Implicit Neural Representations (INRs) have emerged as a powerful paradigm for representing signals such as images, 3D shapes, signed distance fields, and radiance fields. While significant progress has been made in architecture design (e.g., SIREN, FFC, KAN-based INRs) and optimization strategies (meta-learning, amortization, distillation), existing approaches still suffer from two core limitations: (1) a representation bottleneck that forces a single MLP to uniformly model heterogeneous local structures, and (2) limited scalability due to the absence of a hierarchical mechanism that dynamically adapts to signal complexity. This work introduces Hyper-Coordinate Implicit Neural Representations (HC-INR), a new class of INRs that break the representational bottleneck by learning signal-adaptive coordinate transformations using a hypernetwork. HC-INR decomposes the representation task into two components: (i) a learned multiscale coordinate transformation module that warps the input domain into a disentangled latent space, and (ii) a compact implicit field network that models the transformed signal with significantly reduced complexity. The proposed model introduces a hierarchical hypernetwork architecture that conditions coordinate transformations on local signal features, enabling dynamic allocation of representation capacity. We theoretically show that HC-INR strictly increases the upper bound of representable frequency bands while maintaining Lipschitz stability. Extensive experiments across image fitting, shape reconstruction, and neural radiance field approximation demonstrate that HC-INR achieves up to 4 times higher reconstruction fidelity than strong INR baselines while using 30--60\% fewer parameters.

  • 1 authors
·
Nov 23, 2025

WADEPre: A Wavelet-based Decomposition Model for Extreme Precipitation Nowcasting with Multi-Scale Learning

The heavy-tailed nature of precipitation intensity impedes precise precipitation nowcasting. Standard models that optimize pixel-wise losses are prone to regression-to-the-mean bias, which blurs extreme values. Existing Fourier-based methods also lack the spatial localization needed to resolve transient convective cells. To overcome these intrinsic limitations, we propose WADEPre, a wavelet-based decomposition model for extreme precipitation that transitions the modeling into the wavelet domain. By leveraging the Discrete Wavelet Transform for explicit decomposition, WADEPre employs a dual-branch architecture: an Approximation Network to model stable, low-frequency advection, isolating deterministic trends from statistical bias, and a spatially localized Detail Network to capture high-frequency stochastic convection, resolving transient singularities and preserving sharp boundaries. A subsequent Refiner module then dynamically reconstructs these decoupled multi-scale components into the final high-fidelity forecast. To address optimization instability, we introduce a multi-scale curriculum learning strategy that progressively shifts supervision from coarse scales to fine-grained details. Extensive experiments on the SEVIR and Shanghai Radar datasets demonstrate that WADEPre achieves state-of-the-art performance, yielding significant improvements in capturing extreme thresholds and maintaining structural fidelity. Our code is available at https://github.com/sonderlau/WADEPre.

  • 7 authors
·
Feb 2

Empirical Analysis of the Hessian of Over-Parametrized Neural Networks

We study the properties of common loss surfaces through their Hessian matrix. In particular, in the context of deep learning, we empirically show that the spectrum of the Hessian is composed of two parts: (1) the bulk centered near zero, (2) and outliers away from the bulk. We present numerical evidence and mathematical justifications to the following conjectures laid out by Sagun et al. (2016): Fixing data, increasing the number of parameters merely scales the bulk of the spectrum; fixing the dimension and changing the data (for instance adding more clusters or making the data less separable) only affects the outliers. We believe that our observations have striking implications for non-convex optimization in high dimensions. First, the flatness of such landscapes (which can be measured by the singularity of the Hessian) implies that classical notions of basins of attraction may be quite misleading. And that the discussion of wide/narrow basins may be in need of a new perspective around over-parametrization and redundancy that are able to create large connected components at the bottom of the landscape. Second, the dependence of small number of large eigenvalues to the data distribution can be linked to the spectrum of the covariance matrix of gradients of model outputs. With this in mind, we may reevaluate the connections within the data-architecture-algorithm framework of a model, hoping that it would shed light into the geometry of high-dimensional and non-convex spaces in modern applications. In particular, we present a case that links the two observations: small and large batch gradient descent appear to converge to different basins of attraction but we show that they are in fact connected through their flat region and so belong to the same basin.

  • 5 authors
·
Jun 14, 2017

TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting

Time series forecasting is widely used in extensive applications, such as traffic planning and weather forecasting. However, real-world time series usually present intricate temporal variations, making forecasting extremely challenging. Going beyond the mainstream paradigms of plain decomposition and multiperiodicity analysis, we analyze temporal variations in a novel view of multiscale-mixing, which is based on an intuitive but important observation that time series present distinct patterns in different sampling scales. The microscopic and the macroscopic information are reflected in fine and coarse scales respectively, and thereby complex variations can be inherently disentangled. Based on this observation, we propose TimeMixer as a fully MLP-based architecture with Past-Decomposable-Mixing (PDM) and Future-Multipredictor-Mixing (FMM) blocks to take full advantage of disentangled multiscale series in both past extraction and future prediction phases. Concretely, PDM applies the decomposition to multiscale series and further mixes the decomposed seasonal and trend components in fine-to-coarse and coarse-to-fine directions separately, which successively aggregates the microscopic seasonal and macroscopic trend information. FMM further ensembles multiple predictors to utilize complementary forecasting capabilities in multiscale observations. Consequently, TimeMixer is able to achieve consistent state-of-the-art performances in both long-term and short-term forecasting tasks with favorable run-time efficiency.

  • 8 authors
·
May 23, 2024