Recent Submissions

  • RETINOBLASTOMA RELATED (RBR) interaction with key factors of the RNA-directed DNA methylation (RdDM) pathway

    León-Ruiz, Jesús; Espinal-Centeno, Annie; Blilou, Ikram; Scheres, Ben; Arteaga-Vázquez, Mario; Cruz-Ramirez, Luis Alfredo (Cold Spring Harbor Laboratory, 2022-01-07) [Preprint]
    Summary: Transposable elements and other repetitive elements are silenced by the RNA-directed DNA methylation pathway (RdDM). In RdDM, POLIV-derived transcripts are converted into double-stranded RNA (dsRNA) by the activity of RDR2 and subsequently processed into 24-nucleotide short interfering RNAs (24-nt siRNAs) by DCL3. 24-nt siRNAs are recruited by AGO4 and serve as guides to direct AGO4-siRNA complexes to chromatin-bound POLV-derived transcripts generated from the template/target DNA. The interaction between POLV, AGO4, DMS3, DRD1, RDM1 and DRM2 promotes DRM2-mediated $\textit{de novo}$ DNA methylation. The Arabidopsis Retinoblastoma protein homolog is a master regulator of cell cycle, stem cell maintenance and development. $\textit{In silico}$ exploration of RBR protein partners revealed that several members of the RdDM pathway contain a motif that confers high affinity binding to RBR, including the largest subunits of POLIV and POLV (NRPD1 and NRPE1), the shared second largest subunit of POLIV and POLV (NRPD/E2), RDR1, RDR2, DCL3, DRM2 and SUVR2. We demonstrate that RBR binds to DRM2, DRD1 and SUVR2. We also report that seedlings from loss-of-function mutants in RdDM and in $\textit{RBR}$ show similar phenotypes in the root apical meristem. Furthermore, we show that RdDM and SUVR2 targets are up-regulated in the $\textit{35S
  • Efficiently Disentangle Causal Representations

    Li, Yuanpeng; Hestness, Joel; Elhoseiny, Mohamed; Zhao, Liang; Church, Kenneth (arXiv, 2022-01-06) [Preprint]
    This paper proposes an efficient approach to learning disentangled representations with causal mechanisms based on the difference of conditional probabilities in original and new distributions. We approximate the difference with models' generalization abilities so that it fits in the standard machine learning framework and can be efficiently computed. In contrast to the state-of-the-art approach, which relies on the learner's adaptation speed to a new distribution, the proposed approach only requires evaluating the model's generalization ability. We provide a theoretical explanation for the advantage of the proposed method, and our experiments show that the proposed technique is 1.9--11.0$\times$ more sample efficient and 9.4--32.4$\times$ quicker than the previous method on various tasks. The source code is available at https://github.com/yuanpeng16/EDCR.
  • Applied phenomics and genomics for improving barley yellow dwarf resistance in winter wheat

    Silva, Paula; Evers, Byron; Kieffaber, Alexandria; Wang, Xu; Brown, Richard; Gao, Liangliang; Fritz, Allan K.; Crain, Jared; Poland, Jesse (Cold Spring Harbor Laboratory, 2022-01-06) [Preprint]
    Barley yellow dwarf (BYD) is one of the major viral diseases of cereals. Phenotyping BYD in wheat is extremely challenging due to similarities to other biotic and abiotic stresses. Breeding for resistance is additionally challenging as the wheat primary germplasm pool lacks genetic resistance, with most of the few resistance genes named to date originating from a wild relative species. The objectives of this study were to i) evaluate the use of high-throughput phenotyping (HTP) from unmanned aerial systems to improve BYD assessment and selection, ii) identify genomic regions associated with BYD resistance, and iii) evaluate genomic prediction models' ability to predict BYD resistance. Up to 107 wheat lines were phenotyped during each of five field seasons under both insecticide treated and untreated plots. Across all seasons, BYD severity was lower with the insecticide treatment and plant height (PTHTM) and grain yield (GY) showed increased values relative to untreated entries. Only 9.2% of the lines were positive for the presence of the translocated segment carrying resistance gene $\textit{Bdv2}$ on chromosome 7DL. Despite the low frequency, this region was identified through association mapping. Furthermore, we mapped a potentially novel genomic region for resistance on chromosome 5AS. Given the variable heritability of the trait (0.211–0.806), we obtained relatively good predictive ability for BYD severity ranging between 0.06–0.26. Including $\textit{Bdv2}$ in the predictive model had a large effect for predicting BYD but almost no effect for PTHTM and GY. This study was the first attempt to characterize BYD using field-HTP and apply GS to predict the disease severity. These methods have the potential to improve BYD characterization, and identifying new sources of resistance will be crucial for delivering BYD resistant germplasm.
  • Scalable CMOS-BEOL compatible AlScN/2D Channel FE-FETs

    Kim, Kwan-Ho; Oh, Seyong; Fiagbenu, Merrilyn Mercy Adzo; Zheng, Jeffrey; Musavigharavi, Pariasadat; Kumar, Pawan; Trainor, Nicholas; Aljarb, Areej; Wan, Yi; Kim, Hyong Min; Katti, Keshava; Tang, Zichen; Tung, Vincent; Redwing, Joan; Stach, Eric A.; Olsson III, Roy H.; Jariwala, Deep (arXiv, 2022-01-06) [Preprint]
    Intimate integration of memory devices with logic transistors is a frontier challenge in computer hardware. This integration is essential for augmenting computational power concurrently with enhanced energy efficiency in big-data applications such as artificial intelligence. Despite decades of efforts, reliable, compact, energy efficient and scalable memory devices are elusive. Ferroelectric Field Effect Transistors (FE-FETs) are a promising candidate but their scalability and performance in a back-end-of-line (BEOL) process remain unattained. Here, we present scalable BEOL compatible FE-FETs using two-dimensional (2D) MoS2 channel and AlScN ferroelectric dielectric. We have fabricated a large array of FE-FETs with memory windows larger than 7.8 V, ON/OFF ratios of greater than 10^7, and ON current density greater than 250 uA/um, all at ~80 nm channel lengths. Our devices show stable retention up to 20000 secs and endurance up to 20000 cycles in addition to 4-bit pulse programmable memory features thereby opening a path towards scalable 3D hetero-integration of 2D semiconductor memory with Si CMOS logic.
  • Decision trees for regular factorial languages

    Moshkov, Mikhail (arXiv, 2022-01-06) [Preprint]
    In this paper, we study arbitrary regular factorial languages over a finite alphabet $\Sigma$. For the set of words $L(n)$ of the length $n$ belonging to a regular factorial language $L$, we investigate the depth of decision trees solving the recognition and the membership problems deterministically and nondeterministically. In the case of the recognition problem, for a given word from $L(n)$, we should recognize it using queries each of which, for some $i\in \{1,\ldots ,n\}$, returns the $i$th letter of the word. In the case of the membership problem, for a given word over the alphabet $\Sigma$ of the length $n$, we should recognize if it belongs to the set $L(n)$ using the same queries. For a given problem and type of trees, instead of the minimum depth $h(n)$ of a decision tree of the considered type solving the problem for $L(n)$, we study the smoothed minimum depth $H(n)=\max\{h(m):m\le n\}$. With the growth of $n$, the smoothed minimum depth of decision trees solving the problem of recognition deterministically is either bounded from above by a constant, or grows as a logarithm, or linearly. For other cases (decision trees solving the problem of recognition nondeterministically, and decision trees solving the membership problem deterministically and nondeterministically), with the growth of $n$, the smoothed minimum depth of decision trees is either bounded from above by a constant or grows linearly. As corollaries of the obtained results, we study the joint behavior of smoothed minimum depths of decision trees for the considered four cases and describe five complexity classes of regular factorial languages. We also investigate the class of regular factorial languages over the alphabet $\{0,1\}$ each of which is given by one forbidden word.
  • Decision trees for binary subword-closed languages

    Moshkov, Mikhail (arXiv, 2022-01-05) [Preprint]
    In this paper, we study arbitrary subword-closed languages over the alphabet $\{0,1\}$ (binary subword-closed languages). For the set of words $L(n)$ of the length $n$ belonging to a binary subword-closed language $L$, we investigate the depth of decision trees solving the recognition and the membership problems deterministically and nondeterministically. In the case of the recognition problem, for a given word from $L(n)$, we should recognize it using queries each of which, for some $i\in \{1,\ldots ,n\}$, returns the $i$th letter of the word. In the case of the membership problem, for a given word over the alphabet $\{0,1\}$ of the length $n$, we should recognize if it belongs to the set $L(n)$ using the same queries. With the growth of $n$, the minimum depth of decision trees solving the problem of recognition deterministically is either bounded from above by a constant, or grows as a logarithm, or linearly. For other types of trees and problems (decision trees solving the problem of recognition nondeterministically, and decision trees solving the membership problem deterministically and nondeterministically), with the growth of $n$, the minimum depth of decision trees is either bounded from above by a constant or grows linearly. We study the joint behavior of minimum depths of the considered four types of decision trees and describe five complexity classes of binary subword-closed languages.
  • Efficient Importance Sampling Algorithm Applied to the Performance Analysis of Wireless Communication Systems Estimation

    Amar, Eya Ben; Rached, Nadhir Ben; Haji-Ali, Abdul-Lateef; Tempone, Raul (arXiv, 2022-01-04) [Preprint]
    When assessing the performance of wireless communication systems operating over fading channels, one often encounters the problem of computing expectations of some functional of sums of independent random variables (RVs). The outage probability (OP) at the output of Equal Gain Combining (EGC) and Maximum Ratio Combining (MRC) receivers is among the most important performance metrics that fall within this framework. In general, closed-form expressions of expectations of functionals applied to sums of RVs are out of reach. A naive Monte Carlo (MC) simulation is of course an alternative approach. However, this method requires a large number of samples for rare-event problems (small OP values, for instance). Therefore, it is of paramount importance to use variance reduction techniques to develop fast and efficient estimation methods. In this work, we use importance sampling (IS), which is known for its efficiency in requiring fewer computations to achieve the same accuracy requirement. In this line, we propose a state-dependent IS scheme based on a stochastic optimal control (SOC) formulation to calculate rare-event quantities that can be written in the form of an expectation of some functional of sums of independent RVs. Our proposed algorithm is generic and is applicable without any restriction on the univariate distributions of the different fading envelopes/gains or on the functional that is applied to the sum. We apply our approach to the Log-Normal distribution to compute the OP at the output of diversity receivers with and without co-channel interference. For each case, we show numerically that the proposed state-dependent IS algorithm compares favorably to most of the well-known estimators dealing with similar problems.
  • Time and space complexity of deterministic and nondeterministic decision trees

    Moshkov, Mikhail (arXiv, 2022-01-04) [Preprint]
    In this paper, we study arbitrary infinite binary information systems each of which consists of an infinite set called universe and an infinite set of two-valued functions (attributes) defined on the universe. We consider the notion of a problem over an information system, which is described by a finite number of attributes and a mapping that assigns a decision to each tuple of attribute values. As algorithms for problem solving, we use deterministic and nondeterministic decision trees. As time and space complexity, we study the depth and the number of nodes in the decision trees. In the worst case, with the growth of the number of attributes in the problem description, (i) the minimum depth of deterministic decision trees grows either almost as a logarithm or linearly, (ii) the minimum depth of nondeterministic decision trees either is bounded from above by a constant or grows linearly, (iii) the minimum number of nodes in deterministic decision trees has either polynomial or exponential growth, and (iv) the minimum number of nodes in nondeterministic decision trees has either polynomial or exponential growth. Based on these results, we divide the set of all infinite binary information systems into five complexity classes, and study for each class issues related to time-space trade-off for decision trees.
  • Rough analysis of computation trees

    Moshkov, Mikhail (arXiv, 2022-01-02) [Preprint]
    This paper deals with computation trees over an arbitrary structure consisting of a set along with collections of functions and predicates that are defined on it. It is devoted to the comparative analysis of three parameters of problems with $n$ input variables over this structure: the complexity of a problem description, the minimum complexity of a computation tree solving this problem deterministically, and the minimum complexity of a computation tree solving this problem nondeterministically. Rough classification of relationships among these parameters is considered and all possible seven types of these relations are enumerated. The changes of relation types with the growth of the number $n$ of input variables are studied.
  • Asymptotic Derivation of Multicomponent Compressible Flows with Heat Conduction and Mass Diffusion

    Georgiadis, Stefanos; Tzavaras, Athanasios (arXiv, 2021-12-27) [Preprint]
    A Type-I model of a multicomponent system of fluids with non-constant temperature is derived as the high-friction limit of a Type-II model via a Chapman-Enskog expansion. The asymptotic model is shown to fit into the general theory of hyperbolic-parabolic systems, by exploiting the entropy structure inherited through the asymptotic procedure. The exact computations are specified in the case of a two-component system. Finally, two convergence results for smooth solutions are presented, from the system with mass-diffusion and heat conduction to the corresponding system without mass-diffusion but including heat conduction and to its hyperbolic counterpart.
  • Unbiased Parameter Inference for a Class of Partially Observed Lévy-Process Models

    Ruzayqat, Hamza Mahmoud; Jasra, Ajay (arXiv, 2021-12-27) [Preprint]
    We consider the problem of static Bayesian inference for partially observed Lévy-process models. We develop a methodology which allows one to infer static parameters and some states of the process, without a bias from the time-discretization of the aforementioned Lévy process. The unbiased method is exceptionally amenable to parallel implementation and can be computationally efficient relative to competing approaches. We implement the method on S&P 500 log-return daily data and compare it to some Markov chain Monte Carlo (MCMC) algorithms.
  • Optimization and uncertainty quantification model for time-continuous geothermal energy extraction undergoing re-injection

    Hoteit, Hussein; He, Xupeng; Yan, Bicheng; Vahrenkamp, Volker (arXiv, 2021-12-10) [Preprint]
    Geothermal field modeling is often associated with uncertainties related to the subsurface static properties and the dynamics of fluid flow and heat transfer. Uncertainty quantification using simulations is a useful tool to design optimum field-development and to guide decision-making. The optimization process includes assessments of multiple time-dependent flow mechanisms, which are functions of operational parameters subject to subsurface uncertainties. This process requires careful determination of the parameter ranges, dependencies, and their probabilistic distribution functions. This study presents a new approach to assess time-dependent predictions of thermal recovery and produced-enthalpy rates, including uncertainty quantification and optimization. We use time-continuous and multi-objective uncertainty quantification for geothermal recovery, undergoing a water re-injection scheme. The ranges of operational and uncertainty parameters are determined from a collected database, including 135 geothermal fields worldwide. The uncertainty calculation is conducted non-intrusively, based on a workflow that couples low-fidelity models with Monte Carlo analysis. Full-physics reservoir simulations are used to construct and verify the low-fidelity models. The sampling process is performed with Design of Experiments, enhanced with space-filling, and combined with analysis of covariance to capture parameter dependencies. The predicted thermal recovery and produced-enthalpy rates are then evaluated as functions of the significant uncertainty parameters based on dimensionless groups. The workflow is applied for various geothermal fields to assess their optimum well-spacing in their well configuration. This approach offers an efficient and robust workflow for time-continuous uncertainty quantification and global sensitivity analysis applied for geothermal field modeling and optimization.
  • Local Mortality Impacts Due to Future Air Pollution Under Climate Change Scenarios

    Ingole, Vijendra; Dimitrova, Asya; Sampedro, Jon; Sacoor, Charfudin; Acacio, Sozinho; Juvekar, Sanjay; Roy, Sudipto; Moraga, Paula; Basagaña, Xavier; Ballester, Joan; Antó, Josep M.; Tonne, Cathryn (Submitted to Journal of Science of Total Environment, 2021-12-10) [Preprint]
    The health impacts of global climate change mitigation will affect local populations differently. We aimed to quantify the local health impacts due to fine particles (PM2.5) under the governance arrangements embedded in the Shared Socioeconomic Pathways (SSPs1-5) under two greenhouse gas concentration scenarios (Representative Concentration Pathways (RCPs) 2.6 and 8.5) in local populations of Mozambique, India, and Spain. Methods: We simulated the SSP-RCP scenarios using the Global Change Analysis Model, which was linked to the TM5-FASST model to estimate PM2.5 levels. PM2.5 levels were calibrated with local measurements. We used comparative risk assessment methods to estimate attributable premature deaths due to PM2.5, linking local population and mortality data with PM2.5–mortality relationships from the literature. We incorporated population projections under the SSPs in sensitivity analysis. Results: PM2.5 attributable burdens in 2050 differed across SSP-RCP scenarios, and scenario-sensitivity varied across populations. Future attributable mortality burden of PM2.5 was highly sensitive to assumptions about how populations will change according to SSP. SSPs reflecting high challenges for adaptation (SSPs 3 and 4) consistently resulted in the highest PM2.5 attributable burdens mid-century. Discussion: Our analysis of local PM2.5 attributable premature deaths under SSP-RCP scenarios in three local populations highlights the importance of both socioeconomic development and climate policy in reducing the health burden from air pollution. Sensitivity of future PM2.5 mortality burden to SSPs was particularly evident in low- and middle-income country settings due either to high air pollution levels or dynamic populations.
  • CLIP2StyleGAN: Unsupervised Extraction of StyleGAN Edit Directions

    Abdal, Rameen; Zhu, Peihao; Femiani, John; Mitra, Niloy J.; Wonka, Peter (arXiv, 2021-12-09) [Preprint]
    The success of StyleGAN has enabled unprecedented semantic editing capabilities, on both synthesized and real images. However, such editing operations are either trained with semantic supervision or described using human guidance. In another development, the CLIP architecture has been trained with internet-scale image and text pairings and has been shown to be useful in several zero-shot learning settings. In this work, we investigate how to effectively link the pretrained latent spaces of StyleGAN and CLIP, which in turn allows us to automatically extract semantically labeled edit directions from StyleGAN, finding and naming meaningful edit operations without any additional human guidance. Technically, we propose two novel building blocks: one for finding interesting CLIP directions and one for labeling arbitrary directions in CLIP latent space. The setup does not assume any pre-determined labels and hence we do not require any additional supervised text/attributes to build the editing framework. We evaluate the effectiveness of the proposed method and demonstrate that extraction of disentangled labeled StyleGAN edit directions is indeed possible, and reveals interesting and non-trivial edit directions.
  • On the Stability of Positive Semigroups

    Moral, Pierre; Horton, Emma; Jasra, Ajay (arXiv, 2021-12-07) [Preprint]
    The stability and contraction properties of positive integral semigroups on locally compact Polish spaces are investigated. We provide a novel analysis based on an extension of V-norm, Dobrushin-type, contraction techniques on functionally weighted Banach spaces for Markov operators. These are applied to a general class of positive and possibly time-inhomogeneous bounded integral semigroups and their normalised versions. Under mild regularity conditions, the Lipschitz-type contraction analysis presented in this article simplifies and extends several exponential estimates developed in the literature. The spectral-type theorems that we develop can also be seen as an extension of Perron-Frobenius and Krein-Rutman theorems for positive operators to time-varying positive semigroups. We review and illustrate in detail the impact of these results in the context of positive semigroups arising in transport theory, physics, mathematical biology and advanced signal processing.
  • Neural Networks for Infectious Diseases Detection: Prospects and Challenges

    Azeem, Muhammad; Javaid, Shumaila; Fahim, Hamza; Saeed, Nasir (arXiv, 2021-12-07) [Preprint]
    The ability of artificial neural networks (ANNs) to learn, correct errors, and transform a large amount of raw data into useful medical decisions for treatment and care has increased their popularity for enhanced patient safety and quality of care. Therefore, this paper reviews the critical role of ANNs in providing valuable insights for patients' healthcare decisions and efficient disease diagnosis. We thoroughly review different types of ANNs presented in the existing literature that advanced the adaptation of ANNs for complex applications. Moreover, we also investigate advances in ANNs for various disease diagnoses and treatments such as viral, skin, cancer, and COVID-19. Furthermore, we propose a novel deep Convolutional Neural Network (CNN) model called ConXNet for improving the detection accuracy of COVID-19 disease. ConXNet is trained and tested using different datasets, and it achieves more than 97% detection accuracy and precision, which is significantly better than existing models. Finally, we highlight future research directions and challenges such as complexity of the algorithms, insufficient available data, privacy and security, and integration of biosensing with ANNs. These research directions require considerable attention for improving the scope of ANNs for medical diagnostic and treatment applications.
  • Joint Posterior Inference for Latent Gaussian Models with R-INLA

    Chiuchiolo, Cristian; Niekerk, Janet van; Rue, Haavard (arXiv, 2021-12-06) [Preprint]
    Efficient Bayesian inference remains a computational challenge in hierarchical models. Simulation-based approaches such as Markov Chain Monte Carlo methods are still popular but have a large computational cost. When dealing with the large class of Latent Gaussian Models, the INLA methodology embedded in the R-INLA software provides accurate Bayesian inference by computing a deterministic mixture representation to approximate the joint posterior, from which marginals are computed. The INLA approach has from the beginning targeted the approximation of univariate posteriors. In this paper we lay out the development foundation of the tools for also providing joint approximations for subsets of the latent field. These approximations inherit a Gaussian copula structure and additionally provide corrections for skewness. The same idea is carried forward also to sampling from the mixture representation, which we now can adjust for skewness.
  • A review of diaphragmless shock tubes for interdisciplinary applications

    Janardhanraj, S.; Karthick, S. K.; Cohen, J.; Farooq, Aamir (arXiv, 2021-12-05) [Preprint]
    Shock tubes have emerged as an effective tool for applications in various fields of research and technology. The conventional mode of shock tube operation employs a frangible diaphragm to generate shockwaves. The last half-century has witnessed significant efforts to replace this diaphragm-bursting method with fast-acting valves. These diaphragmless methods have good repeatability, quick turnaround time between experiments, and produce a clean flow, free of diaphragm fragments in contrast to the conventional diaphragm-type operation. The constantly evolving valve designs are targeting shorter opening times for improved performance and efficiency. The present review is a compilation of the different diaphragmless shock tubes that have been conceptualized, developed, and implemented for various research endeavors. The discussions focus on essential factors, including the type of actuation mechanism, driver-driven configurations, valve opening time, shock formation distance, and operating pressure range, that ultimately influence the shockwave parameters obtained in the shock tube. A generalized mathematical model to study the behavior of these valves is developed. The advantages, limitations, and challenges in improving the performance of the valves are described. Finally, the present-day applications of diaphragmless shock tubes are discussed, and their potential scope in expanding the frontiers of shockwave research and technology is presented.
  • Snapshot HDR Video Construction Using Coded Mask

    Alghamdi, Masheal; Fu, Qiang; Thabet, Ali Kassem; Heidrich, Wolfgang (arXiv, 2021-12-05) [Preprint]
    This paper studies the reconstruction of High Dynamic Range (HDR) video from snapshot-coded LDR video. Constructing an HDR video requires restoring the HDR values for each frame and maintaining the consistency between successive frames. HDR image acquisition from single image capture, also known as snapshot HDR imaging, can be achieved in several ways. For example, the reconfigurable snapshot HDR camera is realized by introducing an optical element into the optical stack of the camera, by placing a coded mask at a small standoff distance in front of the sensor. High-quality HDR images can be recovered from the captured coded image using deep learning methods. This study utilizes 3D-CNNs to perform joint demosaicking, denoising, and HDR video reconstruction from coded LDR video. We enforce more temporally consistent HDR video reconstruction by introducing a temporal loss function that considers the short-term and long-term consistency. The obtained results are promising and could lead to affordable HDR video capture using conventional cameras.
  • MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions

    Soldan, Mattia; Pardo, Alejandro; Alcázar, Juan León; Heilbron, Fabian Caba; Zhao, Chen; Giancola, Silvio; Ghanem, Bernard (arXiv, 2021-12-01) [Preprint]
    The recent and increasing interest in video-language research has driven the development of large-scale datasets that enable data-intensive machine learning techniques. In comparison, limited effort has been made at assessing the fitness of these datasets for the video-language grounding task. Recent works have begun to discover significant limitations in these datasets, suggesting that state-of-the-art techniques commonly overfit to hidden dataset biases. In this work, we present MAD (Movie Audio Descriptions), a novel benchmark that departs from the paradigm of augmenting existing video datasets with text annotations and focuses on crawling and aligning available audio descriptions of mainstream movies. MAD contains over 384,000 natural language sentences grounded in over 1,200 hours of video and exhibits a significant reduction in the currently diagnosed biases for video-language grounding datasets. MAD's collection strategy enables a novel and more challenging version of video-language grounding, where short temporal moments (typically seconds long) must be accurately grounded in diverse long-form videos that can last up to three hours.
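
The importance-sampling entry above (Amar et al.) contrasts naive Monte Carlo with importance sampling for rare-event quantities such as small outage probabilities. As a rough illustration of that general idea only, the sketch below estimates P(sum of i.i.d. log-normal terms < threshold) with a simple static mean-shift proposal. This is not the state-dependent stochastic-optimal-control scheme proposed in that preprint; the function names, the standard-normal parameters, and the choice of shift are all assumptions made for the example.

```python
import math
import random


def outage_naive(n_terms, threshold, n_samples, rng):
    """Naive Monte Carlo estimate of P(sum_i exp(Z_i) < threshold), Z_i ~ N(0, 1)."""
    hits = 0
    for _ in range(n_samples):
        total = sum(math.exp(rng.gauss(0.0, 1.0)) for _ in range(n_terms))
        if total < threshold:
            hits += 1
    return hits / n_samples


def outage_is(n_terms, threshold, n_samples, shift, rng):
    """Importance-sampling estimate of the same probability.

    Each Z_i is drawn from the proposal N(shift, 1) instead of N(0, 1), and every
    sample that lands in the rare event is reweighted by the likelihood ratio
    prod_i phi(z_i) / phi(z_i - shift) = exp(sum_i (-shift * z_i + shift**2 / 2)).
    """
    acc = 0.0
    for _ in range(n_samples):
        zs = [rng.gauss(shift, 1.0) for _ in range(n_terms)]
        if sum(math.exp(z) for z in zs) < threshold:
            log_weight = sum(-shift * z + 0.5 * shift * shift for z in zs)
            acc += math.exp(log_weight)
    return acc / n_samples


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical setting: 4 branches, a low threshold, proposal shifted downward.
    print("naive:", outage_naive(4, 0.2, 20000, rng))
    print("IS:   ", outage_is(4, 0.2, 20000, -3.0, rng))
```

With a single term the target reduces to the standard normal CDF at log(threshold), which gives a convenient sanity check for the reweighting. A negative shift pushes samples toward small sums, so far more of them fall inside the rare event than under naive sampling; how well a fixed shift works depends on the threshold and the number of terms.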
