Recent Submissions

Preprint

From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning

(arXiv, 2025-04-22) Zhuo, Le; Zhao, Liangbing; Paul, Sayak; Liao, Yue; Zhang, Renrui; Xin, Yi; Gao, Peng; Elhoseiny, Mohamed; Li, Hongsheng; Computer Science Program; Visual Computing Center (VCC); Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division; CUHK MMLab; Shanghai AI Lab; Hugging Face

Recent text-to-image diffusion models achieve impressive visual quality through extensive scaling of training data and model parameters, yet they often struggle with complex scenes and fine-grained details. Inspired by the self-reflection capabilities emergent in large language models, we propose ReflectionFlow, an inference-time framework enabling diffusion models to iteratively reflect upon and refine their outputs. ReflectionFlow introduces three complementary inference-time scaling axes: (1) noise-level scaling to optimize latent initialization; (2) prompt-level scaling for precise semantic guidance; and most notably, (3) reflection-level scaling, which explicitly provides actionable reflections to iteratively assess and correct previous generations. To facilitate reflection-level scaling, we construct GenRef, a large-scale dataset comprising 1 million triplets, each containing a reflection, a flawed image, and an enhanced image. Leveraging this dataset, we efficiently perform reflection tuning on the state-of-the-art diffusion transformer FLUX.1-dev by jointly modeling multimodal inputs within a unified framework. Experimental results show that ReflectionFlow significantly outperforms naive noise-level scaling methods, offering a scalable and compute-efficient solution toward higher-quality image synthesis on challenging tasks.
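Read as an algorithm, the three axes compose into a single search loop: sample several initial noises, then alternately critique and regenerate. Below is a minimal Python sketch of such a loop; every function is a hypothetical stand-in, since the abstract does not expose ReflectionFlow's actual interfaces.

```python
# Minimal sketch of the three-axis loop described above. All functions are
# hypothetical stand-ins illustrating how the axes could compose.
import random

def generate(prompt: str, seed: int) -> str:
    """Stand-in for a diffusion sampler (e.g., FLUX.1-dev)."""
    return f"image({prompt!r}, seed={seed})"

def rewrite_prompt(prompt: str, image: str) -> str:
    """Stand-in for prompt-level scaling: sharpen the semantic guidance."""
    return prompt + " [clarified details]"

def critique(image: str) -> str:
    """Stand-in for producing an actionable reflection on a flawed image."""
    return f"reflection: fix fine-grained details in {image}"

def refine(image: str, reflection: str, prompt: str, seed: int) -> str:
    """Stand-in for reflection-tuned regeneration."""
    return f"refined({image} | {reflection}, seed={seed})"

def reflection_flow(prompt: str, n_seeds: int = 4, n_rounds: int = 3) -> list:
    # Axis 1 (noise-level): try several latent initializations in parallel.
    candidates = [generate(prompt, seed=s) for s in range(n_seeds)]
    for _ in range(n_rounds):
        refreshed = []
        for img in candidates:
            prompt_i = rewrite_prompt(prompt, img)   # axis 2 (prompt-level)
            reflection = critique(img)               # axis 3 (reflection-level)
            refreshed.append(refine(img, reflection, prompt_i,
                                    seed=random.randrange(10**6)))
        candidates = refreshed
    return candidates

print(reflection_flow("a cat juggling three apples")[0])
```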

Conference Paper

Falling Walls, WWW, Modern AI, and the Future of the Universe

(ACM, 2025-04-22) Schmidhuber, Juergen; KAUST Center For Generative AI, Thuwal, Makkah; Computer Science Program; Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division; United Arab Emirates; Swiss AI Lab, IDSIA, Lugano, Ticino, Switzerland

Around 1990, the Berlin Wall came down, the WWW was born at CERN, mobile phones became popular, self-driving cars appeared in traffic, and modern AI based on very deep artificial neural networks emerged, including the principles behind the G, P, and T in ChatGPT. I place these events in the history of the universe since the Big Bang, and discuss what’s next: not just AI behind the screen in the virtual world, but real AI for real robots in the real world, connected through a WWW of machines. Intelligent (but not necessarily super-intelligent) robots that can learn to operate the tools and machines operated by humans can also build (and repair when needed) more of their own kind. This will culminate in life-like, self-replicating and self-improving machine civilisations, which represent the ultimate form of upscaling, and will shape the long-term future of the entire cosmos. The wonderful short-term side effect is that our AI will continue to make people’s lives longer, healthier and easier.

Conference Paper

LUSTER: Link Prediction Utilizing Shared-Latent Space Representation in Multi-Layer Networks

(ACM, 2025-04-22) Yang, Ruohan; Ali, Muhammad Asif; Wang, Huan; Chen, Junyang; Wang, Di; Computer Science Program; Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division; Engineering Research Center of Intelligent Technology for Agriculture, Huazhong Agricultural University, Wuhan, Hubei, China; Huazhong Agricultural University, Wuhan, Hubei, China; Shenzhen University, Shenzhen, Guangdong, China

Link prediction in multi-layer networks is a longstanding problem of predicting missing links based on the observed structures across all layers. Existing link prediction methods in multi-layer networks typically merge the multi-layer network into a single-layer network and/or perform explicit calculations using intra-layer and inter-layer similarity metrics. However, these approaches often overlook the role of coupling in multi-layer networks, specifically the shared information and latent relationships between layers, which in turn limits prediction performance. This highlights the need for methods that can extract representations in a shared-latent space to enhance inter-layer information sharing and prediction performance. In this paper, we propose a novel end-to-end framework for multi-layer networks: Link prediction Utilizing Shared-laTent spacE Representation (LUSTER). LUSTER consists of four key modules: the representation extractor, the latent space learner, the complementary enhancer, and the link predictor. The representation extractor focuses on learning the intra-layer representations of each layer, capturing the data characteristics within the layer. The latent space learner extracts representations from the shared-latent space across different network layers through adversarial training. The complementary enhancer combines the intra-layer representations and the shared-latent space representations through orthogonal fusion, providing comprehensive information. Finally, the link predictor uses the enhanced representations to predict missing links. Extensive experimental analyses demonstrate that LUSTER outperforms state-of-the-art methods for link prediction in multi-layer networks, improving the AUC metric by up to 15.87%.
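One plausible reading of the orthogonal fusion step, sketched below in NumPy: strip from each intra-layer embedding its component along the shared-latent embedding, so the two parts carry complementary rather than redundant information, then concatenate. The row-wise projection, the epsilon term, and the embedding shapes are assumptions, not the paper's exact formulation.

```python
import numpy as np

def orthogonal_fusion(intra: np.ndarray, shared: np.ndarray) -> np.ndarray:
    """Fuse intra-layer and shared-latent embeddings (one row per node).

    Removes from `intra` its projection onto `shared`, keeping only the
    component orthogonal to the shared representation, then concatenates.
    """
    denom = np.sum(shared * shared, axis=-1, keepdims=True) + 1e-8
    proj = (np.sum(intra * shared, axis=-1, keepdims=True) / denom) * shared
    return np.concatenate([intra - proj, shared], axis=-1)

# Example: 5 nodes with 16-dim embeddings from one layer and the shared space.
rng = np.random.default_rng(0)
intra = rng.normal(size=(5, 16))
shared = rng.normal(size=(5, 16))
fused = orthogonal_fusion(intra, shared)  # shape (5, 32), fed to the predictor
```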

Preprint

Chaos-based scalable optoelectronic physical unclonable functions with AI-driven dynamic authentication

(Springer Science and Business Media LLC, 2025-04-22) Zhou, Zhican; Lu, Hang; Nandhakumar, Nakul; Alkhazragi, Omar; Ou, Xiangpeng; Lin, Heming; Ng, Tien Khee; Ooi, Boon S.; Wan, Yating; Integrated Photonic Laboratory, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia; Electrical and Computer Engineering Program; Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division; Photonics Laboratory

The growing reliance on cloud services and Internet of Things (IoT) ecosystems demands scalable, real-time authentication for dynamic multi-user and multi-device environments. Conventional static key management systems, constrained by inflexibility and vulnerabilities, fail to meet these requirements. Herein, we present a chaos-based physical unclonable function (PUF) security system using chaotic vertical-cavity surface-emitting lasers (VCSELs) as key generation sources, achieving ultrafast response rates (>300 Gbps) and exceptionally low energy consumption, below 1 pJ/bit per emitter. To ensure the authentication framework is compatible with this response rate, we developed a compact convolutional neural network (CNN) model that performs dynamic key matching, achieving near-zero false positive rates. To further enhance system security, an adversarial generative framework is implemented to protect key transmission while enhancing the authentication model’s resilience against model inversion attacks. Furthermore, we propose a pluggable, 3D co-packaged PUF hardware design, offering a compact footprint for flexible deployment while maintaining an estimated energy consumption as low as 3.49 pJ/bit. Our solution establishes a robust foundation for scalable, low-latency key distribution and dynamic authentication, addressing the evolving security demands of modern user-intensive, device-dense digital service networks.
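The abstract states only that a compact CNN performs dynamic key matching with near-zero false positives; the PyTorch sketch below shows what such a matcher could look like. The key length, the channel-stacking of enrolled and received keys, and the layer sizes are all assumptions.

```python
import torch
import torch.nn as nn

class KeyMatcher(nn.Module):
    """Compact 1-D CNN that scores whether a received key matches an
    enrolled PUF response; keys are treated as binary sequences."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(2, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(32, 1)  # single match-probability logit

    def forward(self, enrolled: torch.Tensor, received: torch.Tensor) -> torch.Tensor:
        x = torch.stack([enrolled, received], dim=1)  # (batch, 2, key_len)
        z = self.encoder(x).squeeze(-1)               # (batch, 32)
        return torch.sigmoid(self.head(z))            # (batch, 1), in (0, 1)

# Usage: score 8 candidate key pairs, then threshold to accept or reject.
model = KeyMatcher()
enrolled = torch.randint(0, 2, (8, 256)).float()
received = torch.randint(0, 2, (8, 256)).float()
accept = model(enrolled, received) > 0.5
```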

Conference Paper

Multiscale Reservoir Simulation Through Super-Resolution Techniques

(SPE, 2025-04-21) Li, Haotian; Aslaml, Billal; Yan, Bicheng; Energy Resources and Petroleum Engineering Program; Physical Science and Engineering (PSE) Division; Ali I. Al-Naimi Petroleum Engineering Research Center (ANPERC); Deep Geo-Energy & Engineering Modeling Lab (DGYM)

High-resolution reservoir simulation plays a vital role in understanding subsurface flow behavior and optimizing reservoir management strategies. However, the computational cost of fine-scale simulations poses a significant challenge, particularly for large-scale reservoirs and uncertainty quantification studies. Traditional upscaling methods often struggle to preserve critical fine-scale static and dynamic flow features, while recent deep-learning super-resolution techniques lack physical consistency in their predictions. To address these limitations, this paper presents a novel physics-constrained super-resolution framework that efficiently downscales coarse-scale simulation predictions while ensuring physical consistency. Our framework employs a generative adversarial network (GAN) architecture specifically designed for reservoir simulation, featuring multi-input integration of dynamic flow information and static reservoir properties. The network is trained with multiple loss functions through a two-phase strategy, combining traditional reconstruction metrics with physical constraints. The framework processes pressure and saturation fields along with well information, achieving PSNR values of 35.52 dB and 31.63 dB for the pressure and saturation fields, respectively, with corresponding SSIM values of 0.9969 and 0.9797. Detailed analysis shows strong performance in well regions, with R² values of 0.9163 for pressure and 0.9875 for water saturation predictions at well locations, though challenges remain in capturing extreme pressure values near wells and sharp saturation transitions at flood fronts. The framework's ability to maintain high accuracy while significantly reducing computational requirements makes it a promising tool for practical reservoir management applications, offering a balanced solution to the long-standing trade-off between computational efficiency and simulation accuracy.
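A sketch of how such a two-phase composite generator loss might be assembled is below (PyTorch). The L1 reconstruction term, the loss weights, and the bound-violation penalty standing in for the physical constraints are illustrative assumptions; the abstract does not spell out the paper's actual loss terms.

```python
import torch
import torch.nn.functional as F

def generator_loss(pred: torch.Tensor, target: torch.Tensor,
                   disc_logits: torch.Tensor, phase: int,
                   lambda_adv: float = 1e-3, lambda_phys: float = 1e-2):
    """Two-phase loss: reconstruction only in phase 1; adversarial and
    physics-constraint terms added in phase 2."""
    recon = F.l1_loss(pred, target)  # pixel-wise fidelity on pressure/saturation
    if phase == 1:
        return recon
    # Adversarial term: the generator wants the discriminator to say "real".
    adv = F.binary_cross_entropy_with_logits(
        disc_logits, torch.ones_like(disc_logits))
    # Placeholder physics term: water saturation must stay within [0, 1].
    sat = pred[:, 1]  # channel 1 assumed to hold water saturation
    phys = (sat.clamp(max=0.0).pow(2) + (sat - 1.0).clamp(min=0.0).pow(2)).mean()
    return recon + lambda_adv * adv + lambda_phys * phys
```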