
Recent Submissions

  • Adaptive Differentiable Grids for Cryo-Electron Tomography Reconstruction and Denoising

    Wang, Yuanhao; Idoughi, Ramzi; Rückert, Darius; Li, Rui; Heidrich, Wolfgang (Bioinformatics Advances, Oxford University Press (OUP), 2023-09-22) [Article]
    Motivation: Tilt-series cryo-electron tomography is a powerful tool widely used in structural biology to study three-dimensional structures of micro-organisms, macromolecular complexes, etc. Still, the reconstruction process remains an arduous task due to several challenges: the missing-wedge acquisition, sample misalignment and motion, the need to process large data, and especially a low signal-to-noise ratio (SNR). Results: Inspired by the recently introduced neural representations, we propose an adaptive learning-based representation of the density field of the captured sample. This representation consists of an octree structure, where each node represents a 3D density grid optimized from the captured projections during the training process. This optimization is performed using a loss that combines a differentiable image formation model with different regularization terms: total variation, boundary consistency, and a cross-nodes non-local constraint. The final reconstruction is obtained by interpolating the learned density grid at the desired voxel positions. The evaluation of our approach using captured data of viruses and cells shows that our proposed representation is well-adapted to handle missing wedges, and improves the SNR of the reconstructed tomogram. The reconstruction quality is highly improved in comparison to the state-of-the-art methods, while using the lowest computing time footprint.
  • The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute

    Stanić, Aleksandar; Ashley, Dylan; Serikov, Oleg; Kirsch, Louis; Faccio, Francesco; Schmidhuber, Juergen; Hofmann, Thomas; Schlag, Imanol (arXiv, 2023-09-20) [Preprint]
    The Languini Kitchen serves as both a research collective and codebase designed to empower researchers with limited computational resources to contribute meaningfully to the field of language modelling. We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours. The number of tokens on which a model is trained is defined by the model's throughput and the chosen compute class. Notably, this approach avoids constraints on critical hyperparameters which affect total parameters or floating-point operations. For evaluation, we pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length. On it, we compare methods based on their empirical scaling trends, which are estimated through experiments at various levels of compute. This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput. While the GPT baseline achieves better perplexity throughout all our levels of compute, our LSTM baseline exhibits a predictable and more favourable scaling law. This is due to the improved throughput and the need for fewer training tokens to achieve the same decrease in test perplexity. Extrapolating the scaling laws of both models results in an intersection at roughly 50,000 accelerator hours. We hope this work can serve as the foundation for meaningful and reproducible language modelling research.
  • Repetitive DNA sequence detection and its role in the human genome

    Liao, Xingyu; Zhu, Wufei; Zhou, Juexiao; Li, Haoyang; Xu, Xiaopeng; Zhang, Bin; Gao, Xin (Communications Biology, Springer Science and Business Media LLC, 2023-09-19) [Article]
    Repetitive DNA sequences play critical roles in driving evolution, inducing variation, and regulating gene expression. In this review, we summarize the definition, arrangement, and structural characteristics of repeats. We also introduce the diverse biological functions of repeats and review existing methods for automatic repeat detection, classification, and masking. Finally, we analyze the type, structure, and regulation of repeats in the human genome and their role in the induction of complex diseases. We believe that this review will facilitate a comprehensive understanding of repeats and provide guidance for repeat annotation and in-depth exploration of their association with human diseases.
  • Opportunistic Mobile Networks Content Delivery for Important but Non-Urgent Traffic

    Lau, Chun Pong; Ma, Guoqing; Susanto, Hengky; Dang, Shuping; Ng, Kam Shing; Shihada, Basem (IEEE Access, Institute of Electrical and Electronics Engineers (IEEE), 2023-09-18) [Article]
    As delay-tolerant and large-size content, for example, software updates, TV series, and virtual reality related content, becomes more prevalent in mobile networks, the need for efficient content delivery mechanisms becomes increasingly important. The traffic that carries this content is not suitable to be evaluated using traditional network performance metrics, e.g., delay, throughput, and jitter. Based on this insight, we propose the solution of content dissemination from opportunistic mobile social communications (CODOMOC), which utilizes energy cost as an alternative performance metric and exploits daily human activity mobility patterns to determine how, when, and where the content should be disseminated. Then, we introduce two options in CODOMOC to achieve different network operators’ objectives. The two options are the Only Dense (OD) option, which aims at minimizing energy consumption for network operators, and the Broadcast Efficiency (BE) option, which further reduces the total carbon footprint of network operators. CODOMOC is evaluated by comparing it with a mobility-based broadcast method. The results show that CODOMOC reduces the average energy consumption by 51% and 60% in the OD and BE options, respectively. The proposed solution equipped with the two modes is expected to provide a higher degree of flexibility and reduce energy consumption for mobile networks, while, admittedly, the application scope of the solution and the associated methodologies proposed in this paper is restricted to important but non-urgent traffic delivery.
  • Comparative DFT Study of Small Anionic Silver and Copper Clusters: Evolution of Structure and Physicochemical Properties

    Matulis, Vitaly E.; Ivashkevich, Oleg A.; Lappo, Daniil D.; Lyakhov, Dmitry; Michels, Dominik L. (The Journal of Physical Chemistry C, American Chemical Society (ACS), 2023-09-18) [Article]
    Based on both total energy calculations and comparison of experimental and calculated characteristics of the photoelectron spectrum (PHES), the structural assignment of clusters Agₙ⁻ (n = 13–16) and Cuₘ⁻ (m = 14–17) has been made using the density functional theory (DFT) model with our previously developed S2LYP functional. A comparative study of size dependence of geometry, electronic structure, and physicochemical properties has been carried out for a series of anionic silver and copper clusters containing up to 20 atoms. For the cases when two isomers contribute to the experimental PHES, the isomerization barriers and molar ratio of isomers were estimated. It has been shown that the geometry and the properties that are determined mainly by ns-derived electronic states are similar for copper and silver clusters. However, due to the larger contribution of (n–1)d-electrons to the chemical bond, the potential energy surface of copper clusters is less smooth, and these clusters are characterized by higher isomerization energies compared to silver clusters. The isomerization energies of clusters and the number of isomers with similar energies increase with enlarging cluster size. Thus, clusters containing fewer than 20 atoms easily overcome the barriers of intramolecular isomerization (i.e., behave like liquids). However, it is expected that cooled clusters containing several tens of atoms will have a rigid geometry due to high intramolecular isomerization energies.
  • Data Center-Enabled High Altitude Platforms: A Green Computing Alternative

    Abderrahim, Wiem; Amin, Osama; Shihada, Basem (Accepted by IEEE Transactions on Mobile Computing (TMC), 2023-09-17) [Article]
    Information technology organizations and companies are seeking greener alternatives to traditional terrestrial data centers to mitigate global warming and reduce carbon emissions. Currently, terrestrial data centers consume a significant amount of energy, estimated at about 1.5% of worldwide electricity use. Furthermore, the increasing demand for data-intensive applications is expected to raise energy consumption, making it crucial to consider sustainable computing paradigms. In this study, we propose a data center-enabled High Altitude Platform (HAP) system, where a flying data center supports the operation of terrestrial data centers. We conduct a detailed analytical study to assess the energy benefits and communication requirements of this approach. Our findings demonstrate that a data center-enabled HAP is more energy-efficient than a traditional terrestrial data center, owing to the naturally low temperature in the stratosphere and the ability to harvest solar energy. Adopting a data center-HAP can save up to 14% of energy requirements while overcoming the offloading outage problem and the associated delay resulting from server distribution. Our study highlights the potential of a data center-enabled HAP system as a sustainable computing solution to meet the growing energy demands and reduce carbon footprint.
  • CryoAlign: feature-based method for global and local 3D alignment of EM density maps

    He, Bintao; Zhang, Fa; Feng, Chenjie; Yang, Jianyi; Gao, Xin; Han, Renmin (arXiv, 2023-09-17) [Preprint]
    Advances in cryo-electron imaging technologies have led to a rapidly increasing number of density maps. Alignment and comparison of density maps play a crucial role in interpreting structural information, such as conformational heterogeneity analysis using global alignment and atomic model assembly through local alignment. Here, we propose CryoAlign, a fast and accurate global and local cryo-electron microscopy density map alignment method that leverages local density feature descriptors to capture spatial structure similarities. CryoAlign is the first feature-based EM map alignment tool, in which the feature-based architecture enables the rapid establishment of point pair correspondences and robust estimation of alignment parameters. Extensive experimental evaluations demonstrate the superiority of CryoAlign over the existing methods in both alignment accuracy and speed.
  • Fake News Detectors are Biased against Texts Generated by Large Language Models

    Su, Jinyan; Zhuo, Terry Yue; Mansurov, Jonibek; Wang, Di; Nakov, Preslav (arXiv, 2023-09-15) [Preprint]
    The spread of fake news has emerged as a critical challenge, undermining trust and posing threats to society. In the era of Large Language Models (LLMs), the capability to generate believable fake content has intensified these concerns. In this study, we present a novel paradigm to evaluate fake news detectors in scenarios involving both human-written and LLM-generated misinformation. Intriguingly, our findings reveal a significant bias in many existing detectors: they are more prone to flagging LLM-generated content as fake news while often misclassifying human-written fake news as genuine. This unexpected bias appears to arise from distinct linguistic patterns inherent to LLM outputs. To address this, we introduce a mitigation strategy that leverages adversarial training with LLM-paraphrased genuine news. The resulting model yielded marked improvements in detection accuracy for both human and LLM-generated news. To further catalyze research in this domain, we release two comprehensive datasets, GossipCop++ and PolitiFact++, thus amalgamating human-validated articles with LLM-generated fake and real news.
  • TinyML Models for a Low-cost Air Quality Monitoring Device

    Wardana, I Nyoman Kusuma; Fahmy, Suhaib A.; Gardner, Julian W. (IEEE Sensors Letters, Institute of Electrical and Electronics Engineers (IEEE), 2023-09-14) [Article]
    Low-cost air quality monitoring devices can provide high-density spatiotemporal pollution data, thus offering a better opportunity to apply machine learning. Low-cost sensor nodes usually utilize microcontrollers as the main processors, and tinyML brings machine learning (ML) models to these resource-constrained devices. In this letter, we reported the development of a low-cost air quality monitoring device with embedded tinyML models. We deployed two tinyML models on a single microcontroller and performed two tasks: predicting air quality and power parameters (using model predictor) and imputing missing features (using model imputer). The proposed model predictor can estimate parameters with a coefficient of determination above 0.70, and the model imputer effectively estimates the testing data when missing rates are below 80%. By performing the post-training quantization technique, we can further reduce the model size but slightly degrade the accuracies.
  • Scaling the “Memory Wall” for Multi-Dimensional Seismic Processing with Algebraic Compression on Cerebras CS-2 Systems

    Ltaief, Hatem; Hong, Yuxi; Wilson, Leighton; Jacquelin, Mathias; Ravasi, Matteo; Keyes, David E. (ACM/IEEE, 2023-09-11) [Conference Paper]
    We exploit the high memory bandwidth of AI-customized Cerebras CS-2 systems for seismic processing. By leveraging low-rank matrix approximation, we fit memory-hungry seismic applications onto memory-austere SRAM wafer-scale hardware, thus addressing a challenge arising in many wave-equation-based algorithms that rely on Multi-Dimensional Convolution (MDC) operators. Exploiting sparsity inherent in seismic data in the frequency domain, we implement embarrassingly parallel tile low-rank matrix-vector multiplications (TLR-MVM), which account for most of the elapsed time in MDC operations, to successfully solve the Multi-Dimensional Deconvolution (MDD) inverse problem. By reducing memory footprint along with arithmetic complexity, we fit a standard seismic benchmark dataset into the small local memories of Cerebras processing elements. Deploying TLR-MVM execution onto 48 CS-2 systems in support of MDD gives a sustained memory bandwidth of 92.58 PB/s on 35,784,000 processing elements, a significant milestone that highlights the capabilities of AI-customized architectures to enable a new generation of seismic algorithms that will empower multiple technologies of our low-carbon future.
  • Residency Octree: A Hybrid Approach for Scalable Web-Based Multi-Volume Rendering

    Herzberger, Lukas; Hadwiger, Markus; Krüger, Robert; Sorger, Peter; Pfister, Hanspeter; Gröller, Eduard; Beyer, Johanna (arXiv, 2023-09-08) [Preprint]
    We present a hybrid multi-volume rendering approach based on a novel Residency Octree that combines the advantages of out-of-core volume rendering using page tables with those of standard octrees. Octree approaches work by performing hierarchical tree traversal. However, in octree volume rendering, tree traversal and the selection of data resolution are intrinsically coupled. This makes fine-grained empty-space skipping costly. Page tables, on the other hand, allow access to any cached brick from any resolution. However, they do not offer a clear and efficient strategy for substituting missing high-resolution data with lower-resolution data. We enable flexible mixed-resolution out-of-core multi-volume rendering by decoupling the cache residency of multi-resolution data from a resolution-independent spatial subdivision determined by the tree. Instead of one-to-one node-to-brick correspondences, each residency octree node is mapped to a set of bricks from different resolution levels. This makes it possible to efficiently and adaptively choose and mix resolutions, adapt sampling rates, and compensate for cache misses. At the same time, residency octrees support fine-grained empty-space skipping, independent of the data subdivision used for caching. Finally, to facilitate collaboration and outreach, and to eliminate local data storage, our implementation is a web-based, pure client-side renderer using WebGPU and WebAssembly. Our method is faster than prior approaches and efficient for many data channels with a flexible and adaptive choice of data resolution.
  • UBIC-A Blockchain-Less Cryptocurrency

    Caprolu, Maurantonio; Bentafat, Elmahdi; Bakiras, Spiridon; Di Pietro, Roberto (IEEE, 2023-09-06) [Conference Paper]
    In this paper we propose UBIC, a novel blockchain-less architecture that preserves the main advantages of classic cryptocurrencies while avoiding their pitfalls. The proposed construction is general (that is, UBIC can be adopted at par with any other cryptocurrency), though UBIC also satisfies the requirements to support state-sponsored financial services, like Universal Basic Income and Central Bank Digital Currency. Indeed, UBIC stands for Universal Basic Income Coin, to highlight one of its most straightforward use-cases. One of the key features of UBIC is that every user participating in the protocol gets fair and equal access to the rewards, regardless of the available resources, e.g., computational power or financial stake. Moreover, by leveraging standard cryptographic techniques, such as homomorphic encryption and verifiable random functions, UBIC ensures full user privacy and trust in the network, while enjoying a highly scalable architecture. Our experimental results confirm the feasibility of the proposed architecture and demonstrate that UBIC is very efficient in terms of transaction verification time. To the best of our knowledge, this is the first blockchain-less cryptocurrency proposal. Other than being interesting on its own, and being particularly fit to support UBI and Central Bank Digital Currency, the architectural solutions and the technical choices discussed in this contribution have the potential to generate high impact and further research in the field.
  • TENSOR: Lightweight BGP Non-Stop Routing

    Miao, Congcong; Xiao, Yunming; Canini, Marco; Dai, Ruiqiang; Zheng, Shengli; Wang, Jilong; Bu, Jiwu; Kuzmanovic, Aleksandar; Wang, Yachen (ACM, 2023-09) [Conference Paper]
    As the solitary inter-domain protocol, BGP plays an important role in today's Internet. Its failures threaten network stability and will usually result in large-scale packet losses. Thus, the non-stop routing (NSR) capability that protects inter-domain connectivity from being disrupted by various failures, is critical to any Autonomous System (AS) operator. Replicating the BGP and underlying TCP connection status is key to realizing NSR. But existing NSR solutions, which heavily rely on OS kernel modifications, have become impractical due to providers' adoption of virtualized network gateways for better scalability and manageability. In this paper, we tackle this problem by proposing TENSOR, which incorporates a novel kernel-modification-free replication design and lightweight architecture. More concretely, the kernel-modification-free replication design mitigates the reliance on OS kernel modification and hence allows the virtualization of the network gateway. Meanwhile, lightweight virtualization provides strong performance guarantees and improves system reliability. Moreover, TENSOR provides a solution to the split-brain problem that affects NSR solutions. Through extensive experiments, we show that TENSOR realizes NSR while bearing little overhead compared to open-source BGP implementations. Further, our two-year operational experience on a fleet of 400 servers controlling over 31,000 BGP peering connections demonstrates that TENSOR reduces the development, deployment, and maintenance costs significantly - at least by factors of 20, 5, and 10, respectively, while retaining the same SLA with the NSR-enabled routers.
  • Cross-parametric generative adversarial network-based magnetic resonance image feature synthesis for breast lesion classification

    Fan, Ming; Huang, Guangyao; Lou, Junhong; Gao, Xin; Zeng, Tieyong; Li, Lihua (IEEE Journal of Biomedical and Health Informatics, Institute of Electrical and Electronics Engineers (IEEE), 2023-09-01) [Article]
    Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) contains information on tumor morphology and physiology for breast cancer diagnosis and treatment. However, this technology requires contrast agent injection with more acquisition time than other parametric images, such as T2-weighted imaging (T2WI). Current image synthesis methods attempt to map the image data from one domain to another, whereas it is challenging or even infeasible to map the images with one sequence into images with multiple sequences. Here, we propose a new approach of cross-parametric generative adversarial network (GAN)-based feature synthesis (CPGANFS) to generate discriminative DCE-MRI features from T2WI with applications in breast cancer diagnosis. The proposed approach decodes the T2W images into latent cross-parameter features to reconstruct the DCE-MRI and T2WI features by balancing the information shared between the two. A Wasserstein GAN with a gradient penalty is employed to differentiate the T2WI-generated features from ground-truth features extracted from DCE-MRI. The synthesized DCE-MRI feature-based model achieved significantly (p = 0.036) higher prediction performance (AUC = 0.866) in breast cancer diagnosis than that based on T2WI (AUC = 0.815). Visualization of the model shows that our CPGANFS method enhances the predictive power by elevating attention to the lesion and the surrounding parenchyma areas, which is driven by the interparametric information learned from T2WI and DCE-MRI. Our proposed CPGANFS provides a framework for cross-parametric MR image feature generation from a single-sequence image guided by an information-rich, time-series image with kinetic information. Extensive experimental results demonstrate its effectiveness with high interpretability and improved performance in breast cancer diagnosis.
  • Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations

    Haydarov, Kilichbek; Shen, Xiaoqian; Madasu, Avinash; Salem, Mahmoud; Li, Jia; Elsayed, Gamaleldin; Elhoseiny, Mohamed (arXiv, 2023-08-30) [Preprint]
    We introduce Affective Visual Dialog, an emotion explanation and reasoning task as a testbed for research on understanding the formation of emotions in visually grounded conversations. The task involves three skills: (1) Dialog-based Question Answering, (2) Dialog-based Emotion Prediction, and (3) Affective emotion explanation generation based on the dialog. Our key contribution is the collection of a large-scale dataset, dubbed AffectVisDial, consisting of 50K 10-turn visually grounded dialogs as well as concluding emotion attributions and dialog-informed textual emotion explanations, resulting in a total of 27,180 working hours. We explain our design decisions in collecting the dataset and introduce the questioner and answerer tasks that are associated with the participants in the conversation. We train and demonstrate solid Affective Visual Dialog baselines adapted from state-of-the-art models. Remarkably, the responses generated by our models show promising emotional reasoning abilities in response to visually grounded conversations.
  • SAGDTI: self-attention and graph neural network with multiple information representations for the prediction of drug-target interactions

    Li, Xiaokun; Yang, Qiang; Luo, Gongning; Xu, Long; Dong, Weihe; Wang, Wei; Dong, Suyu; Wang, Kuanquan; Xuan, Ping; Gao, Xin (Bioinformatics Advances, Oxford University Press (OUP), 2023-08-26) [Article]
    Motivation: Accurate identification of target proteins that interact with drugs is a vital step in silico, which can significantly foster the development of drug repurposing and drug discovery. In recent years, numerous deep learning-based methods have been introduced to treat drug-target interaction (DTI) prediction as a classification task. The output of this task is binary identification suggesting the absence or presence of interactions. However, existing studies often (i) neglect the unique molecular attributes when embedding drugs and proteins, and (ii) determine the interaction of drug-target pairs without considering biological interaction information. Results: In this study, we propose an end-to-end attention-derived method based on the self-attention mechanism and graph neural network, termed SAGDTI. The aim of this method is to overcome the aforementioned drawbacks in the identification of DTI interaction. SAGDTI is the first method to sufficiently consider the unique molecular attribute representations for both drugs and targets in the input form of the SMILES sequences and three-dimensional structure graphs. In addition, our method aggregates the feature attributes of biological information between drugs and targets through multi-scale topologies and diverse connections. Experimental results illustrate that SAGDTI outperforms existing prediction models, which benefit from the unique molecular attributes embedded by atom-level attention and biological interaction information representation aggregated by node-level attention. Moreover, a case study on SARS-CoV-2 shows that our model is a powerful tool for identifying DTI interactions in real life.
  • Overcoming General Knowledge Loss with Selective Parameter Finetuning

    Zhang, Wenxuan; Janson, Paul; Aljundi, Rahaf; Elhoseiny, Mohamed (arXiv, 2023-08-23) [Preprint]
    Foundation models encompass an extensive knowledge base and offer remarkable transferability. However, this knowledge becomes outdated or insufficient over time. The challenge lies in updating foundation models to accommodate novel information while retaining their original ability. In this paper, we present a novel approach to achieving continual model updates by effecting localized modifications to a small subset of parameters. Guided by insights gleaned from prior analyses of foundational models, we first localize a specific layer for model refinement and then introduce an importance scoring mechanism designed to update only the most crucial weights. Our method is exhaustively evaluated on foundational vision-language models, measuring its efficacy in both learning new information and preserving pre-established knowledge across a diverse spectrum of continual learning tasks, including Aircraft, Birdsnap, CIFAR-100, CUB, Cars, and GTSRB. The results show that our method improves the existing continual learning methods by 0.5% to 10% on average, and reduces the loss of pre-trained knowledge from around 5% to 0.97%. Comprehensive ablation studies substantiate our method design, shedding light on the contributions of each component to controllably learning new knowledge and mitigating the forgetting of pre-trained knowledge.
  • MULGA, a unified multi-view graph autoencoder-based approach for identifying drug-protein interaction and drug repositioning

    Ma, Jiani; Li, Chen; Zhang, Yiwen; Wang, Zhikang; Li, Shanshan; Guo, Yuming; Zhang, Lin; Liu, Hui; Gao, Xin; Song, Jiangning (Bioinformatics, Oxford University Press (OUP), 2023-08-23) [Article]
    Motivation: Identifying drug-protein interactions (DPIs) is a critical step in drug repositioning, which allows reuse of approved drugs that may be effective for treating a different disease and thereby alleviates the challenges of new drug development. Despite the fact that a great variety of computational approaches for DPI prediction have been proposed, key challenges, such as extendable and unbiased similarity calculation, heterogeneous information utilization and reliable negative sample selection, remain to be addressed. Results: To address these issues, we propose a novel, unified multi-view graph autoencoder framework, termed MULGA, for both DPI and drug repositioning predictions. MULGA features: (i) a multi-view learning technique to effectively learn authentic drug affinity and target affinity matrices; (ii) a graph autoencoder to infer missing DPI interactions; and (iii) a new “guilty-by-association”-based negative sampling approach for selecting highly reliable non-DPIs. Benchmark experiments demonstrate that MULGA outperforms state-of-the-art methods in DPI prediction and the ablation studies verify the effectiveness of each proposed component. Importantly, we highlight the top drugs shortlisted by MULGA that target the spike glycoprotein of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), offering additional insights into and potentially useful treatment options for COVID-19. Together with the availability of datasets and source codes, we envision that MULGA can be explored as a useful tool for DPI prediction and drug repositioning.
  • Continual Zero-Shot Learning through Semantically Guided Generative Random Walks

    Zhang, Wenxuan; Janson, Paul; Yi, Kai; Skorokhodov, Ivan; Elhoseiny, Mohamed (arXiv, 2023-08-23) [Preprint]
    Learning novel concepts, remembering previous knowledge, and adapting it to future tasks occur simultaneously throughout a human's lifetime. To model such comprehensive abilities, continual zero-shot learning (CZSL) has recently been introduced. However, most existing methods overuse unseen semantic information that may not be continually accessible in realistic settings. In this paper, we address the challenge of continual zero-shot learning where unseen information is not provided during training, by leveraging generative modeling. The heart of the generative-based methods is to learn quality representations from seen classes to improve the generative understanding of the unseen visual space. Motivated by this, we introduce generalization-bound tools and provide the first theoretical explanation for the benefits of generative modeling to CZSL tasks. Guided by the theoretical analysis, we then propose our learning algorithm that employs a novel semantically guided Generative Random Walk (GRW) loss. The GRW loss augments the training by continually encouraging the model to generate realistic and characterized samples to represent the unseen space. Our algorithm achieves state-of-the-art performance on AWA1, AWA2, CUB, and SUN datasets, surpassing existing CZSL methods by 3-7%.
  • Unsupervised Volumetric Animation

    Siarohin, Aliaksandr; Menapace, Willi; Skorokhodov, Ivan; Olszewski, Kyle; Ren, Jian; Lee, Hsin-Ying; Chai, Menglei; Tulyakov, Sergey (IEEE, 2023-08-22) [Conference Paper]
    We propose a novel approach for unsupervised 3D animation of non-rigid deformable objects. Our method learns the 3D structure and dynamics of objects solely from single-view RGB videos, and can decompose them into semantically meaningful parts that can be tracked and animated. Using a 3D autodecoder framework, paired with a keypoint estimator via a differentiable PnP algorithm, our model learns the underlying object geometry and parts decomposition in an entirely unsupervised manner. This allows it to perform 3D segmentation, 3D keypoint estimation, novel view synthesis, and animation. We primarily evaluate the framework on two video datasets: VoxCeleb 256² and TEDXPeople 256². In addition, on the Cats 256² image dataset, we show it even learns compelling 3D geometry from still images. Finally, we show our model can obtain animatable 3D objects from a single or few images.
