Get trending papers in your email inbox once a day!
Get trending papers in your email inbox!
SubscribePulsar: Efficient Sphere-based Neural Rendering
We propose Pulsar, an efficient sphere-based differentiable renderer that is orders of magnitude faster than competing techniques, modular, and easy-to-use due to its tight integration with PyTorch. Differentiable rendering is the foundation for modern neural rendering approaches, since it enables end-to-end training of 3D scene representations from image observations. However, gradient-based optimization of neural mesh, voxel, or function representations suffers from multiple challenges, i.e., topological inconsistencies, high memory footprints, or slow rendering speeds. To alleviate these problems, Pulsar employs: 1) a sphere-based scene representation, 2) an efficient differentiable rendering engine, and 3) neural shading. Pulsar executes orders of magnitude faster than existing techniques and allows real-time rendering and optimization of representations with millions of spheres. Using spheres for the scene representation, unprecedented speed is obtained while avoiding topology problems. Pulsar is fully differentiable and thus enables a plethora of applications, ranging from 3D reconstruction to general neural rendering.
A Simple Introduction to the SiMPL Method for Density-Based Topology Optimization
We introduce a novel method for solving density-based topology optimization problems: Sigmoidal Mirror descent with a Projected Latent variable (SiMPL). The SiMPL method (pronounced as ``the simple method'') optimizes a design using only first-order derivative information of the objective function. The bound constraints on the density field are enforced with the help of the (negative) Fermi--Dirac entropy, which is also used to define a non-symmetric distance function called a Bregman divergence on the set of admissible designs. This Bregman divergence leads to a simple update rule that is further simplified with the help of a so-called latent variable. Because the SiMPL method involves discretizing the latent variable, it produces a sequence of pointwise-feasible iterates, even when high-order finite elements are used in the discretization. Numerical experiments demonstrate that the method outperforms other popular first-order optimization algorithms. To outline the general applicability of the technique, we include examples with (self-load) compliance minimization and compliant mechanism optimization problems.
Self-Directed Online Machine Learning for Topology Optimization
Topology optimization by optimally distributing materials in a given domain requires non-gradient optimizers to solve highly complicated problems. However, with hundreds of design variables or more involved, solving such problems would require millions of Finite Element Method (FEM) calculations whose computational cost is huge and impractical. Here we report Self-directed Online Learning Optimization (SOLO) which integrates Deep Neural Network (DNN) with FEM calculations. A DNN learns and substitutes the objective as a function of design variables. A small number of training data is generated dynamically based on the DNN's prediction of the optimum. The DNN adapts to the new training data and gives better prediction in the region of interest until convergence. The optimum predicted by the DNN is proved to converge to the true global optimum through iterations. Our algorithm was tested by four types of problems including compliance minimization, fluid-structure optimization, heat transfer enhancement and truss optimization. It reduced the computational time by 2 ~ 5 orders of magnitude compared with directly using heuristic methods, and outperformed all state-of-the-art algorithms tested in our experiments. This approach enables solving large multi-dimensional optimization problems.
TopRoBERTa: Topology-Aware Authorship Attribution of Deepfake Texts
Recent advances in Large Language Models (LLMs) have enabled the generation of open-ended high-quality texts, that are non-trivial to distinguish from human-written texts. We refer to such LLM-generated texts as deepfake texts. There are currently over 11K text generation models in the huggingface model repo. As such, users with malicious intent can easily use these open-sourced LLMs to generate harmful texts and misinformation at scale. To mitigate this problem, a computational method to determine if a given text is a deepfake text or not is desired--i.e., Turing Test (TT). In particular, in this work, we investigate the more general version of the problem, known as Authorship Attribution (AA), in a multi-class setting--i.e., not only determining if a given text is a deepfake text or not but also being able to pinpoint which LLM is the author. We propose TopRoBERTa to improve existing AA solutions by capturing more linguistic patterns in deepfake texts by including a Topological Data Analysis (TDA) layer in the RoBERTa model. We show the benefits of having a TDA layer when dealing with noisy, imbalanced, and heterogeneous datasets, by extracting TDA features from the reshaped pooled_output of RoBERTa as input. We use RoBERTa to capture contextual representations (i.e., semantic and syntactic linguistic features), while using TDA to capture the shape and structure of data (i.e., linguistic structures). Finally, TopRoBERTa, outperforms the vanilla RoBERTa in 2/3 datasets, achieving up to 7\% increase in Macro F1 score.
TOMATOES: Topology and Material Optimization for Latent Heat Thermal Energy Storage Devices
Latent heat thermal energy storage (LHTES) systems are compelling candidates for energy storage, primarily owing to their high storage density. Improving their performance is crucial for developing the next-generation efficient and cost effective devices. Topology optimization (TO) has emerged as a powerful computational tool to design LHTES systems by optimally distributing a high-conductivity material (HCM) and a phase change material (PCM). However, conventional TO typically limits to optimizing the geometry for a fixed, pre-selected materials. This approach does not leverage the large and expanding databases of novel materials. Consequently, the co-design of material and geometry for LHTES remains a challenge and unexplored. To address this limitation, we present an automated design framework for the concurrent optimization of material choice and topology. A key challenge is the discrete nature of material selection, which is incompatible with the gradient-based methods used for TO. We overcome this by using a data-driven variational autoencoder (VAE) to project discrete material databases for both the HCM and PCM onto continuous and differentiable latent spaces. These continuous material representations are integrated into an end-to-end differentiable, transient nonlinear finite-element solver that accounts for phase change. We demonstrate this framework on a problem aimed at maximizing the discharged energy within a specified time, subject to cost constraints. The effectiveness of the proposed method is validated through several illustrative examples.
Not Every AI Problem is a Data Problem: We Should Be Intentional About Data Scaling
While Large Language Models require more and more data to train and scale, rather than looking for any data to acquire, we should consider what types of tasks are more likely to benefit from data scaling. We should be intentional in our data acquisition. We argue that the topology of data itself informs which tasks to prioritize in data scaling, and shapes the development of the next generation of compute paradigms for tasks where data scaling is inefficient, or even insufficient.
Graph-based Topology Reasoning for Driving Scenes
Understanding the road genome is essential to realize autonomous driving. This highly intelligent problem contains two aspects - the connection relationship of lanes, and the assignment relationship between lanes and traffic elements, where a comprehensive topology reasoning method is vacant. On one hand, previous map learning techniques struggle in deriving lane connectivity with segmentation or laneline paradigms; or prior lane topology-oriented approaches focus on centerline detection and neglect the interaction modeling. On the other hand, the traffic element to lane assignment problem is limited in the image domain, leaving how to construct the correspondence from two views an unexplored challenge. To address these issues, we present TopoNet, the first end-to-end framework capable of abstracting traffic knowledge beyond conventional perception tasks. To capture the driving scene topology, we introduce three key designs: (1) an embedding module to incorporate semantic knowledge from 2D elements into a unified feature space; (2) a curated scene graph neural network to model relationships and enable feature interaction inside the network; (3) instead of transmitting messages arbitrarily, a scene knowledge graph is devised to differentiate prior knowledge from various types of the road genome. We evaluate TopoNet on the challenging scene understanding benchmark, OpenLane-V2, where our approach outperforms all previous works by a great margin on all perceptual and topological metrics. The code is released at https://github.com/OpenDriveLab/TopoNet
LifeGPT: Topology-Agnostic Generative Pretrained Transformer Model for Cellular Automata
The Game of Life (Life), a well known algorithm within the broader class of cellular automata (CA), exhibits complex emergent dynamics, with extreme sensitivity to initial conditions. Modeling and predicting such intricate behavior without explicit knowledge of the system's underlying topology presents a significant challenge, motivating the development of algorithms that can generalize across various grid configurations and boundary conditions. We develop a decoder-only generative pretrained transformer model to solve this problem, showing that our model can simulate Life on a toroidal grid with no prior knowledge on the size of the grid, or its periodic boundary conditions (LifeGPT). LifeGPT is topology-agnostic with respect to its training data and our results show that a GPT model is capable of capturing the deterministic rules of a Turing-complete system with near-perfect accuracy, given sufficiently diverse training data. We also introduce the idea of an `autoregressive autoregressor' to recursively implement Life using LifeGPT. Our results pave the path towards true universal computation within a large language model (LLM) framework, synthesizing of mathematical analysis with natural language processing, and probing AI systems for situational awareness about the evolution of such algorithms without ever having to compute them. Similar GPTs could potentially solve inverse problems in multicellular self-assembly by extracting CA-compatible rulesets from real-world biological systems to create new predictive models, which would have significant consequences for the fields of bioinspired materials, tissue engineering, and architected materials design.
TopoFR: A Closer Look at Topology Alignment on Face Recognition
The field of face recognition (FR) has undergone significant advancements with the rise of deep learning. Recently, the success of unsupervised learning and graph neural networks has demonstrated the effectiveness of data structure information. Considering that the FR task can leverage large-scale training data, which intrinsically contains significant structure information, we aim to investigate how to encode such critical structure information into the latent space. As revealed from our observations, directly aligning the structure information between the input and latent spaces inevitably suffers from an overfitting problem, leading to a structure collapse phenomenon in the latent space. To address this problem, we propose TopoFR, a novel FR model that leverages a topological structure alignment strategy called PTSA and a hard sample mining strategy named SDE. Concretely, PTSA uses persistent homology to align the topological structures of the input and latent spaces, effectively preserving the structure information and improving the generalization performance of FR model. To mitigate the impact of hard samples on the latent space structure, SDE accurately identifies hard samples by automatically computing structure damage score (SDS) for each sample, and directs the model to prioritize optimizing these samples. Experimental results on popular face benchmarks demonstrate the superiority of our TopoFR over the state-of-the-art methods. Code and models are available at: https://github.com/modelscope/facechain/tree/main/face_module/TopoFR.
Vanishing Variance Problem in Fully Decentralized Neural-Network Systems
Federated learning and gossip learning are emerging methodologies designed to mitigate data privacy concerns by retaining training data on client devices and exclusively sharing locally-trained machine learning (ML) models with others. The primary distinction between the two lies in their approach to model aggregation: federated learning employs a centralized parameter server, whereas gossip learning adopts a fully decentralized mechanism, enabling direct model exchanges among nodes. This decentralized nature often positions gossip learning as less efficient compared to federated learning. Both methodologies involve a critical step: computing a representation of received ML models and integrating this representation into the existing model. Conventionally, this representation is derived by averaging the received models, exemplified by the FedAVG algorithm. Our findings suggest that this averaging approach inherently introduces a potential delay in model convergence. We identify the underlying cause and refer to it as the "vanishing variance" problem, where averaging across uncorrelated ML models undermines the optimal variance established by the Xavier weight initialization. Unlike federated learning where the central server ensures model correlation, and unlike traditional gossip learning which circumvents this problem through model partitioning and sampling, our research introduces a variance-corrected model averaging algorithm. This novel algorithm preserves the optimal variance needed during model averaging, irrespective of network topology or non-IID data distributions. Our extensive simulation results demonstrate that our approach enables gossip learning to achieve convergence efficiency comparable to that of federated learning.
Manifold Learning by Mixture Models of VAEs for Inverse Problems
Representing a manifold of very high-dimensional data with generative models has been shown to be computationally efficient in practice. However, this requires that the data manifold admits a global parameterization. In order to represent manifolds of arbitrary topology, we propose to learn a mixture model of variational autoencoders. Here, every encoder-decoder pair represents one chart of a manifold. We propose a loss function for maximum likelihood estimation of the model weights and choose an architecture that provides us the analytical expression of the charts and of their inverses. Once the manifold is learned, we use it for solving inverse problems by minimizing a data fidelity term restricted to the learned manifold. To solve the arising minimization problem we propose a Riemannian gradient descent algorithm on the learned manifold. We demonstrate the performance of our method for low-dimensional toy examples as well as for deblurring and electrical impedance tomography on certain image manifolds.
A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem
Financial portfolio management is the process of constant redistribution of a fund into different financial products. This paper presents a financial-model-free Reinforcement Learning framework to provide a deep machine learning solution to the portfolio management problem. The framework consists of the Ensemble of Identical Independent Evaluators (EIIE) topology, a Portfolio-Vector Memory (PVM), an Online Stochastic Batch Learning (OSBL) scheme, and a fully exploiting and explicit reward function. This framework is realized in three instants in this work with a Convolutional Neural Network (CNN), a basic Recurrent Neural Network (RNN), and a Long Short-Term Memory (LSTM). They are, along with a number of recently reviewed or published portfolio-selection strategies, examined in three back-test experiments with a trading period of 30 minutes in a cryptocurrency market. Cryptocurrencies are electronic and decentralized alternatives to government-issued money, with Bitcoin as the best-known example of a cryptocurrency. All three instances of the framework monopolize the top three positions in all experiments, outdistancing other compared trading algorithms. Although with a high commission rate of 0.25% in the backtests, the framework is able to achieve at least 4-fold returns in 50 days.
MMGP: a Mesh Morphing Gaussian Process-based machine learning method for regression of physical problems under non-parameterized geometrical variability
When learning simulations for modeling physical phenomena in industrial designs, geometrical variabilities are of prime interest. While classical regression techniques prove effective for parameterized geometries, practical scenarios often involve the absence of shape parametrization during the inference stage, leaving us with only mesh discretizations as available data. Learning simulations from such mesh-based representations poses significant challenges, with recent advances relying heavily on deep graph neural networks to overcome the limitations of conventional machine learning approaches. Despite their promising results, graph neural networks exhibit certain drawbacks, including their dependency on extensive datasets and limitations in providing built-in predictive uncertainties or handling large meshes. In this work, we propose a machine learning method that do not rely on graph neural networks. Complex geometrical shapes and variations with fixed topology are dealt with using well-known mesh morphing onto a common support, combined with classical dimensionality reduction techniques and Gaussian processes. The proposed methodology can easily deal with large meshes without the need for explicit shape parameterization and provides crucial predictive uncertainties, which are essential for informed decision-making. In the considered numerical experiments, the proposed method is competitive with respect to existing graph neural networks, regarding training efficiency and accuracy of the predictions.
AMAS: Adaptively Determining Communication Topology for LLM-based Multi-Agent System
Although large language models (LLMs) have revolutionized natural language processing capabilities, their practical implementation as autonomous multi-agent systems (MAS) for industrial problem-solving encounters persistent barriers. Conventional MAS architectures are fundamentally restricted by inflexible, hand-crafted graph topologies that lack contextual responsiveness, resulting in diminished efficacy across varied academic and commercial workloads. To surmount these constraints, we introduce AMAS, a paradigm-shifting framework that redefines LLM-based MAS through a novel dynamic graph designer. This component autonomously identifies task-specific optimal graph configurations via lightweight LLM adaptation, eliminating the reliance on monolithic, universally applied structural templates. Instead, AMAS exploits the intrinsic properties of individual inputs to intelligently direct query trajectories through task-optimized agent pathways. Rigorous validation across question answering, mathematical deduction, and code generation benchmarks confirms that AMAS systematically exceeds state-of-the-art single-agent and multi-agent approaches across diverse LLM architectures. Our investigation establishes that context-sensitive structural adaptability constitutes a foundational requirement for high-performance LLM MAS deployments.
pFedGame -- Decentralized Federated Learning using Game Theory in Dynamic Topology
Conventional federated learning frameworks suffer from several challenges including performance bottlenecks at the central aggregation server, data bias, poor model convergence, and exposure to model poisoning attacks, and limited trust in the centralized infrastructure. In the current paper, a novel game theory-based approach called pFedGame is proposed for decentralized federated learning, best suitable for temporally dynamic networks. The proposed algorithm works without any centralized server for aggregation and incorporates the problem of vanishing gradients and poor convergence over temporally dynamic topology among federated learning participants. The solution comprises two sequential steps in every federated learning round, for every participant. First, it selects suitable peers for collaboration in federated learning. Secondly, it executes a two-player constant sum cooperative game to reach convergence by applying an optimal federated learning aggregation strategy. Experiments performed to assess the performance of pFedGame in comparison to existing methods in decentralized federated learning have shown promising results with accuracy higher than 70% for heterogeneous data.
On Characterizing the Capacity of Neural Networks using Algebraic Topology
The learnability of different neural architectures can be characterized directly by computable measures of data complexity. In this paper, we reframe the problem of architecture selection as understanding how data determines the most expressive and generalizable architectures suited to that data, beyond inductive bias. After suggesting algebraic topology as a measure for data complexity, we show that the power of a network to express the topological complexity of a dataset in its decision region is a strictly limiting factor in its ability to generalize. We then provide the first empirical characterization of the topological capacity of neural networks. Our empirical analysis shows that at every level of dataset complexity, neural networks exhibit topological phase transitions. This observation allowed us to connect existing theory to empirically driven conjectures on the choice of architectures for fully-connected neural networks.
Reducing Training Time in Cross-Silo Federated Learning using Multigraph Topology
Federated learning is an active research topic since it enables several participants to jointly train a model without sharing local data. Currently, cross-silo federated learning is a popular training setting that utilizes a few hundred reliable data silos with high-speed access links to training a model. While this approach has been widely applied in real-world scenarios, designing a robust topology to reduce the training time remains an open problem. In this paper, we present a new multigraph topology for cross-silo federated learning. We first construct the multigraph using the overlay graph. We then parse this multigraph into different simple graphs with isolated nodes. The existence of isolated nodes allows us to perform model aggregation without waiting for other nodes, hence effectively reducing the training time. Intensive experiments on three public datasets show that our proposed method significantly reduces the training time compared with recent state-of-the-art topologies while maintaining the accuracy of the learned model. Our code can be found at https://github.com/aioz-ai/MultigraphFL
A Survey of Reasoning and Agentic Systems in Time Series with Large Language Models
Time series reasoning treats time as a first-class axis and incorporates intermediate evidence directly into the answer. This survey defines the problem and organizes the literature by reasoning topology with three families: direct reasoning in one step, linear chain reasoning with explicit intermediates, and branch-structured reasoning that explores, revises, and aggregates. The topology is crossed with the main objectives of the field, including traditional time series analysis, explanation and understanding, causal inference and decision making, and time series generation, while a compact tag set spans these axes and captures decomposition and verification, ensembling, tool use, knowledge access, multimodality, agent loops, and LLM alignment regimes. Methods and systems are reviewed across domains, showing what each topology enables and where it breaks down in faithfulness or robustness, along with curated datasets, benchmarks, and resources that support study and deployment (https://github.com/blacksnail789521/Time-Series-Reasoning-Survey). Evaluation practices that keep evidence visible and temporally aligned are highlighted, and guidance is distilled on matching topology to uncertainty, grounding with observable artifacts, planning for shift and streaming, and treating cost and latency as design budgets. We emphasize that reasoning structures must balance capacity for grounding and self-correction against computational cost and reproducibility, while future progress will likely depend on benchmarks that tie reasoning quality to utility and on closed-loop testbeds that trade off cost and risk under shift-aware, streaming, and long-horizon settings. Taken together, these directions mark a shift from narrow accuracy toward reliability at scale, enabling systems that not only analyze but also understand, explain, and act on dynamic worlds with traceable evidence and credible outcomes.
Decentralized and Self-adaptive Core Maintenance on Temporal Graphs
Key graph-based problems play a central role in understanding network topology and uncovering patterns of similarity in homogeneous and temporal data. Such patterns can be revealed by analyzing communities formed by nodes, which in turn can be effectively modeled through temporal k-cores. This paper introduces a novel decentralized and incremental algorithm for computing the core decomposition of temporal networks. Decentralized solutions leverage the ability of network nodes to communicate and coordinate locally, addressing complex problems in a scalable, adaptive, and timely manner. By leveraging previously computed coreness values, our approach significantly reduces the activation of nodes and the volume of message exchanges when the network changes over time. This enables scalability with only a minimal trade-off in precision. Experimental evaluations on large real-world networks under varying levels of dynamism demonstrate the efficiency of our solution compared to a state-of-the-art approach, particularly in terms of active nodes, communication overhead, and convergence speed.
A Deep Q-Network Based on Radial Basis Functions for Multi-Echelon Inventory Management
This paper addresses a multi-echelon inventory management problem with a complex network topology where deriving optimal ordering decisions is difficult. Deep reinforcement learning (DRL) has recently shown potential in solving such problems, while designing the neural networks in DRL remains a challenge. In order to address this, a DRL model is developed whose Q-network is based on radial basis functions. The approach can be more easily constructed compared to classic DRL models based on neural networks, thus alleviating the computational burden of hyperparameter tuning. Through a series of simulation experiments, the superior performance of this approach is demonstrated compared to the simple base-stock policy, producing a better policy in the multi-echelon system and competitive performance in the serial system where the base-stock policy is optimal. In addition, the approach outperforms current DRL approaches.
A Neural-Guided Dynamic Symbolic Network for Exploring Mathematical Expressions from Data
Symbolic regression (SR) is a powerful technique for discovering the underlying mathematical expressions from observed data. Inspired by the success of deep learning, recent efforts have focused on two categories for SR methods. One is using a neural network or genetic programming to search the expression tree directly. Although this has shown promising results, the large search space poses difficulties in learning constant factors and processing high-dimensional problems. Another approach is leveraging a transformer-based model training on synthetic data and offers advantages in inference speed. However, this method is limited to fixed small numbers of dimensions and may encounter inference problems when given data is out-of-distribution compared to the synthetic data. In this work, we propose DySymNet, a novel neural-guided Dynamic Symbolic Network for SR. Instead of searching for expressions within a large search space, we explore DySymNet with various structures and optimize them to identify expressions that better-fitting the data. With a topology structure like neural networks, DySymNet not only tackles the challenge of high-dimensional problems but also proves effective in optimizing constants. Based on extensive numerical experiments using low-dimensional public standard benchmarks and the well-known SRBench with more variables, our method achieves state-of-the-art performance in terms of fitting accuracy and robustness to noise.
Aligning Optimization Trajectories with Diffusion Models for Constrained Design Generation
Generative models have had a profound impact on vision and language, paving the way for a new era of multimodal generative applications. While these successes have inspired researchers to explore using generative models in science and engineering to accelerate the design process and reduce the reliance on iterative optimization, challenges remain. Specifically, engineering optimization methods based on physics still outperform generative models when dealing with constrained environments where data is scarce and precision is paramount. To address these challenges, we introduce Diffusion Optimization Models (DOM) and Trajectory Alignment (TA), a learning framework that demonstrates the efficacy of aligning the sampling trajectory of diffusion models with the optimization trajectory derived from traditional physics-based methods. This alignment ensures that the sampling process remains grounded in the underlying physical principles. Our method allows for generating feasible and high-performance designs in as few as two steps without the need for expensive preprocessing, external surrogate models, or additional labeled data. We apply our framework to structural topology optimization, a fundamental problem in mechanical design, evaluating its performance on in- and out-of-distribution configurations. Our results demonstrate that TA outperforms state-of-the-art deep generative models on in-distribution configurations and halves the inference computational cost. When coupled with a few steps of optimization, it also improves manufacturability for out-of-distribution conditions. By significantly improving performance and inference efficiency, DOM enables us to generate high-quality designs in just a few steps and guide them toward regions of high performance and manufacturability, paving the way for the widespread application of generative models in large-scale data-driven design.
Efficient MPC-Based Energy Management System for Secure and Cost-Effective Microgrid Operations
Model predictive control (MPC)-based energy management systems (EMS) are essential for ensuring optimal, secure, and stable operation in microgrids with high penetrations of distributed energy resources. However, due to the high computational cost for the decision-making, the conventional MPC-based EMS typically adopts a simplified integrated-bus power balance model. While this simplification is effective for small networks, large-scale systems require a more detailed branch flow model to account for the increased impact of grid power losses and security constraints. This work proposes an efficient and reliable MPC-based EMS that incorporates power-loss effects and grid-security constraints. %, while adaptively shaping the battery power profile in response to online renewable inputs, achieving reduced operational costs. It enhances system reliability, reduces operational costs, and shows strong potential for online implementation due to its reduced computational effort. Specifically, a second-order cone program (SOCP) branch flow relaxation is integrated into the constraint set, yielding a convex formulation that guarantees globally optimal solutions with high computational efficiency. Owing to the radial topology of the microgrid, this relaxation is practically tight, ensuring equivalence to the original problem. Building on this foundation, an online demand response (DR) module is designed to further reduce the operation cost through peak shaving. To the best of our knowledge, no prior MPC-EMS framework has simultaneously modeled losses and security constraints while coordinating flexible loads within a unified architecture. The developed framework enables secure operation with effective peak shaving and reduced total cost. The effectiveness of the proposed method is validated on 10-bus, 18-bus, and 33-bus systems.
EquiNO: A Physics-Informed Neural Operator for Multiscale Simulations
Multiscale problems are ubiquitous in physics. Numerical simulations of such problems by solving partial differential equations (PDEs) at high resolution are computationally too expensive for many-query scenarios, e.g., uncertainty quantification, remeshing applications, topology optimization, and so forth. This limitation has motivated the application of data-driven surrogate models, where the microscale computations are substituted with a surrogate, usually acting as a black-box mapping between macroscale quantities. These models offer significant speedups but struggle with incorporating microscale physical constraints, such as the balance of linear momentum and constitutive models. In this contribution, we propose Equilibrium Neural Operator (EquiNO) as a complementary physics-informed PDE surrogate for predicting microscale physics and compare it with variational physics-informed neural and operator networks. Our framework, applicable to the so-called multiscale FE^{,2}, computations, introduces the FE-OL approach by integrating the finite element (FE) method with operator learning (OL). We apply the proposed FE-OL approach to quasi-static problems of solid mechanics. The results demonstrate that FE-OL can yield accurate solutions even when confronted with a restricted dataset during model development. Our results show that EquiNO achieves speedup factors exceeding 8000-fold compared to traditional methods and offers an optimal balance between data-driven and physics-based strategies.
V2M4: 4D Mesh Animation Reconstruction from a Single Monocular Video
We present V2M4, a novel 4D reconstruction method that directly generates a usable 4D mesh animation asset from a single monocular video. Unlike existing approaches that rely on priors from multi-view image and video generation models, our method is based on native 3D mesh generation models. Naively applying 3D mesh generation models to generate a mesh for each frame in a 4D task can lead to issues such as incorrect mesh poses, misalignment of mesh appearance, and inconsistencies in mesh geometry and texture maps. To address these problems, we propose a structured workflow that includes camera search and mesh reposing, condition embedding optimization for mesh appearance refinement, pairwise mesh registration for topology consistency, and global texture map optimization for texture consistency. Our method outputs high-quality 4D animated assets that are compatible with mainstream graphics and game software. Experimental results across a variety of animation types and motion amplitudes demonstrate the generalization and effectiveness of our method. Project page: https://windvchen.github.io/V2M4/.
Registering Explicit to Implicit: Towards High-Fidelity Garment mesh Reconstruction from Single Images
Fueled by the power of deep learning techniques and implicit shape learning, recent advances in single-image human digitalization have reached unprecedented accuracy and could recover fine-grained surface details such as garment wrinkles. However, a common problem for the implicit-based methods is that they cannot produce separated and topology-consistent mesh for each garment piece, which is crucial for the current 3D content creation pipeline. To address this issue, we proposed a novel geometry inference framework ReEF that reconstructs topology-consistent layered garment mesh by registering the explicit garment template to the whole-body implicit fields predicted from single images. Experiments demonstrate that our method notably outperforms its counterparts on single-image layered garment reconstruction and could bring high-quality digital assets for further content creation.
OReX: Object Reconstruction from Planar Cross-sections Using Neural Fields
Reconstructing 3D shapes from planar cross-sections is a challenge inspired by downstream applications like medical imaging and geographic informatics. The input is an in/out indicator function fully defined on a sparse collection of planes in space, and the output is an interpolation of the indicator function to the entire volume. Previous works addressing this sparse and ill-posed problem either produce low quality results, or rely on additional priors such as target topology, appearance information, or input normal directions. In this paper, we present OReX, a method for 3D shape reconstruction from slices alone, featuring a Neural Field as the interpolation prior. A modest neural network is trained on the input planes to return an inside/outside estimate for a given 3D coordinate, yielding a powerful prior that induces smoothness and self-similarities. The main challenge for this approach is high-frequency details, as the neural prior is overly smoothing. To alleviate this, we offer an iterative estimation architecture and a hierarchical input sampling scheme that encourage coarse-to-fine training, allowing the training process to focus on high frequencies at later stages. In addition, we identify and analyze a ripple-like effect stemming from the mesh extraction step. We mitigate it by regularizing the spatial gradients of the indicator function around input in/out boundaries during network training, tackling the problem at the root. Through extensive qualitative and quantitative experimentation, we demonstrate our method is robust, accurate, and scales well with the size of the input. We report state-of-the-art results compared to previous approaches and recent potential solutions, and demonstrate the benefit of our individual contributions through analysis and ablation studies.
Spot the Difference: Detection of Topological Changes via Geometric Alignment
Geometric alignment appears in a variety of applications, ranging from domain adaptation, optimal transport, and normalizing flows in machine learning; optical flow and learned augmentation in computer vision and deformable registration within biomedical imaging. A recurring challenge is the alignment of domains whose topology is not the same; a problem that is routinely ignored, potentially introducing bias in downstream analysis. As a first step towards solving such alignment problems, we propose an unsupervised algorithm for the detection of changes in image topology. The model is based on a conditional variational auto-encoder and detects topological changes between two images during the registration step. We account for both topological changes in the image under spatial variation and unexpected transformations. Our approach is validated on two tasks and datasets: detection of topological changes in microscopy images of cells, and unsupervised anomaly detection brain imaging.
Mastering Spatial Graph Prediction of Road Networks
Accurately predicting road networks from satellite images requires a global understanding of the network topology. We propose to capture such high-level information by introducing a graph-based framework that simulates the addition of sequences of graph edges using a reinforcement learning (RL) approach. In particular, given a partially generated graph associated with a satellite image, an RL agent nominates modifications that maximize a cumulative reward. As opposed to standard supervised techniques that tend to be more restricted to commonly used surrogate losses, these rewards can be based on various complex, potentially non-continuous, metrics of interest. This yields more power and flexibility to encode problem-dependent knowledge. Empirical results on several benchmark datasets demonstrate enhanced performance and increased high-level reasoning about the graph topology when using a tree-based search. We further highlight the superiority of our approach under substantial occlusions by introducing a new synthetic benchmark dataset for this task.
Reward Generalization in RLHF: A Topological Perspective
Existing alignment methods share a common topology of information flow, where reward information is collected from humans, modeled with preference learning, and used to tune language models. However, this shared topology has not been systematically characterized, nor have its alternatives been thoroughly explored, leaving the problems of low data efficiency and unreliable generalization unaddressed. As a solution, we introduce a theoretical framework for investigating reward generalization in reinforcement learning from human feedback (RLHF), focusing on the topology of information flow at both macro and micro levels. At the macro level, we portray the RLHF information flow as an autoencoding process over behavior distributions, formalizing the RLHF objective of distributional consistency between human preference and model behavior. At the micro level, we present induced Bayesian networks as a theory of reward generalization in RLHF, introducing fine-grained dataset topologies into generalization bounds. Combining analysis on both levels, we propose reward modeling from tree-structured preference information. It is shown to reduce reward uncertainty by up to Theta(log n/loglog n) times compared to baselines, where n is the dataset size. Validation on three NLP tasks shows that our tree-based reward model achieves an average win rate of 65% against baseline methods, thus improving reward generalization for free via topology design.
Towards quantum-enabled cell-centric therapeutics
In recent years, there has been tremendous progress in the development of quantum computing hardware, algorithms and services leading to the expectation that in the near future quantum computers will be capable of performing simulations for natural science applications, operations research, and machine learning at scales mostly inaccessible to classical computers. Whereas the impact of quantum computing has already started to be recognized in fields such as cryptanalysis, natural science simulations, and optimization among others, very little is known about the full potential of quantum computing simulations and machine learning in the realm of healthcare and life science (HCLS). Herein, we discuss the transformational changes we expect from the use of quantum computation for HCLS research, more specifically in the field of cell-centric therapeutics. Moreover, we identify and elaborate open problems in cell engineering, tissue modeling, perturbation modeling, and bio-topology while discussing candidate quantum algorithms for research on these topics and their potential advantages over classical computational approaches.
Contrastive Deep Nonnegative Matrix Factorization for Community Detection
Recently, nonnegative matrix factorization (NMF) has been widely adopted for community detection, because of its better interpretability. However, the existing NMF-based methods have the following three problems: 1) they directly transform the original network into community membership space, so it is difficult for them to capture the hierarchical information; 2) they often only pay attention to the topology of the network and ignore its node attributes; 3) it is hard for them to learn the global structure information necessary for community detection. Therefore, we propose a new community detection algorithm, named Contrastive Deep Nonnegative Matrix Factorization (CDNMF). Firstly, we deepen NMF to strengthen its capacity for information extraction. Subsequently, inspired by contrastive learning, our algorithm creatively constructs network topology and node attributes as two contrasting views. Furthermore, we utilize a debiased negative sampling layer and learn node similarity at the community level, thereby enhancing the suitability of our model for community detection. We conduct experiments on three public real graph datasets and the proposed model has achieved better results than state-of-the-art methods. Code available at https://github.com/6lyc/CDNMF.git.
PSHuman: Photorealistic Single-view Human Reconstruction using Cross-Scale Diffusion
Detailed and photorealistic 3D human modeling is essential for various applications and has seen tremendous progress. However, full-body reconstruction from a monocular RGB image remains challenging due to the ill-posed nature of the problem and sophisticated clothing topology with self-occlusions. In this paper, we propose PSHuman, a novel framework that explicitly reconstructs human meshes utilizing priors from the multiview diffusion model. It is found that directly applying multiview diffusion on single-view human images leads to severe geometric distortions, especially on generated faces. To address it, we propose a cross-scale diffusion that models the joint probability distribution of global full-body shape and local facial characteristics, enabling detailed and identity-preserved novel-view generation without any geometric distortion. Moreover, to enhance cross-view body shape consistency of varied human poses, we condition the generative model on parametric models like SMPL-X, which provide body priors and prevent unnatural views inconsistent with human anatomy. Leveraging the generated multi-view normal and color images, we present SMPLX-initialized explicit human carving to recover realistic textured human meshes efficiently. Extensive experimental results and quantitative evaluations on CAPE and THuman2.1 datasets demonstrate PSHumans superiority in geometry details, texture fidelity, and generalization capability.
IACT: A Self-Organizing Recursive Model for General AI Agents: A Technical White Paper on the Architecture Behind kragent.ai
This technical white paper introduces the Interactive Agents Call Tree (IACT), a computational model designed to address the limitations of static, hard-coded agent workflows. Unlike traditional systems that require pre-defined graphs or specialized programming, IACT operates as a general-purpose autonomous system driven purely by user dialogue. Given a high-level objective, the system autonomously grows a dynamic, recursive agent topology incrementally tailored to the problem's structure. This allows it to scale its organizational complexity to match open-ended tasks. To mitigate the error propagation inherent in unidirectional function calls, IACT introduces interactional redundancy by replacing rigid invocations with bidirectional, stateful dialogues. This mechanism enables runtime error correction and ambiguity resolution. We describe the architecture, design principles, and practical lessons behind the production deployment of this model in the kragent.ai system, presenting qualitative evidence from real-world workflows rather than exhaustive benchmark results.
From Concept to Manufacturing: Evaluating Vision-Language Models for Engineering Design
Engineering Design is undergoing a transformative shift with the advent of AI, marking a new era in how we approach product, system, and service planning. Large language models have demonstrated impressive capabilities in enabling this shift. Yet, with text as their only input modality, they cannot leverage the large body of visual artifacts that engineers have used for centuries and are accustomed to. This gap is addressed with the release of multimodal vision language models, such as GPT-4V, enabling AI to impact many more types of tasks. In light of these advancements, this paper presents a comprehensive evaluation of GPT-4V, a vision language model, across a wide spectrum of engineering design tasks, categorized into four main areas: Conceptual Design, System-Level and Detailed Design, Manufacturing and Inspection, and Engineering Education Tasks. Our study assesses GPT-4V's capabilities in design tasks such as sketch similarity analysis, concept selection using Pugh Charts, material selection, engineering drawing analysis, CAD generation, topology optimization, design for additive and subtractive manufacturing, spatial reasoning challenges, and textbook problems. Through this structured evaluation, we not only explore GPT-4V's proficiency in handling complex design and manufacturing challenges but also identify its limitations in complex engineering design applications. Our research establishes a foundation for future assessments of vision language models, emphasizing their immense potential for innovating and enhancing the engineering design and manufacturing landscape. It also contributes a set of benchmark testing datasets, with more than 1000 queries, for ongoing advancements and applications in this field.
