Preprints
Recent Submissions

RETINOBLASTOMA RELATED (RBR) interaction with key factors of the RNA-directed DNA methylation (RdDM) pathway (Cold Spring Harbor Laboratory, 2022-01-07) [Preprint] Summary: Transposable elements and other repetitive elements are silenced by the RNA-directed DNA methylation pathway (RdDM). In RdDM, POLIV-derived transcripts are converted into double-stranded RNA (dsRNA) by the activity of RDR2 and subsequently processed into 24-nucleotide short interfering RNAs (24-nt siRNAs) by DCL3. 24-nt siRNAs are recruited by AGO4 and serve as guides to direct AGO4-siRNA complexes to chromatin-bound POLV-derived transcripts generated from the template/target DNA. The interaction between POLV, AGO4, DMS3, DRD1, RDM1 and DRM2 promotes DRM2-mediated $\textit{de novo}$ DNA methylation. The Arabidopsis Retinoblastoma protein homolog is a master regulator of cell cycle, stem cell maintenance and development. $\textit{In silico}$ exploration of RBR protein partners revealed that several members of the RdDM pathway contain a motif that confers high-affinity binding to RBR, including the largest subunits of POLIV and POLV (NRPD1 and NRPE1), the shared second largest subunit of POLIV and POLV (NRPD/E2), RDR1, RDR2, DCL3, DRM2 and SUVR2. We demonstrate that RBR binds to DRM2, DRD1 and SUVR2. We also report that seedlings from loss-of-function mutants in RdDM and in $\textit{RBR}$ show similar phenotypes in the root apical meristem. Furthermore, we show that RdDM and SUVR2 targets are upregulated in the $\textit{35S

Efficiently Disentangle Causal Representations (arXiv, 2022-01-06) [Preprint] This paper proposes an efficient approach to learning disentangled representations with causal mechanisms based on the difference of conditional probabilities in original and new distributions. We approximate the difference with models' generalization abilities so that it fits in the standard machine learning framework and can be efficiently computed. In contrast to the state-of-the-art approach, which relies on the learner's adaptation speed to new distribution, the proposed approach only requires evaluating the model's generalization ability. We provide a theoretical explanation for the advantage of the proposed method, and our experiments show that the proposed technique is 1.9-11.0$\times$ more sample efficient and 9.4-32.4 times quicker than the previous method on various tasks. The source code is available at \url{https://github.com/yuanpeng16/EDCR}.

Applied phenomics and genomics for improving barley yellow dwarf resistance in winter wheat (Cold Spring Harbor Laboratory, 2022-01-06) [Preprint] Barley yellow dwarf (BYD) is one of the major viral diseases of cereals. Phenotyping BYD in wheat is extremely challenging due to similarities to other biotic and abiotic stresses. Breeding for resistance is additionally challenging as the wheat primary germplasm pool lacks genetic resistance, with most of the few resistance genes named to date originating from a wild relative species. The objectives of this study were to: i) evaluate the use of high-throughput phenotyping (HTP) from unmanned aerial systems to improve BYD assessment and selection, ii) identify genomic regions associated with BYD resistance, and iii) evaluate the ability of genomic prediction models to predict BYD resistance. Up to 107 wheat lines were phenotyped during each of five field seasons under both insecticide-treated and untreated plots. Across all seasons, BYD severity was lower with the insecticide treatment, and plant height (PTHTM) and grain yield (GY) showed increased values relative to untreated entries. Only 9.2% of the lines were positive for the presence of the translocated segment carrying resistance gene $\textit{Bdv2}$ on chromosome 7DL. Despite the low frequency, this region was identified through association mapping. Furthermore, we mapped a potentially novel genomic region for resistance on chromosome 5AS. Given the variable heritability of the trait (0.211-0.806), we obtained relatively good predictive ability for BYD severity, ranging between 0.06 and 0.26. Including $\textit{Bdv2}$ in the predictive model had a large effect for predicting BYD but almost no effect for PTHTM and GY. This study was the first attempt to characterize BYD using field-HTP and apply GS to predict the disease severity.
These methods have the potential to improve BYD characterization, and identifying new sources of resistance will be crucial for delivering BYD-resistant germplasm.

Scalable CMOS-BEOL compatible AlScN/2D Channel FEFETs (arXiv, 2022-01-06) [Preprint] Intimate integration of memory devices with logic transistors is a frontier challenge in computer hardware. This integration is essential for augmenting computational power concurrently with enhanced energy efficiency in big-data applications such as artificial intelligence. Despite decades of efforts, reliable, compact, energy efficient and scalable memory devices are elusive. Ferroelectric Field Effect Transistors (FEFETs) are a promising candidate but their scalability and performance in a back-end-of-line (BEOL) process remain unattained. Here, we present scalable BEOL compatible FEFETs using two-dimensional (2D) MoS2 channel and AlScN ferroelectric dielectric. We have fabricated a large array of FEFETs with memory windows larger than 7.8 V, ON/OFF ratios of greater than 10^7, and ON current density greater than 250 uA/um, all at ~80 nm channel lengths. Our devices show stable retention up to 20000 secs and endurance up to 20000 cycles in addition to 4-bit pulse programmable memory features, thereby opening a path towards scalable 3D heterointegration of 2D semiconductor memory with Si CMOS logic.

Decision trees for regular factorial languages (arXiv, 2022-01-06) [Preprint] In this paper, we study arbitrary regular factorial languages over a finite alphabet $\Sigma$. For the set of words $L(n)$ of the length $n$ belonging to a regular factorial language $L$, we investigate the depth of decision trees solving the recognition and the membership problems deterministically and nondeterministically. In the case of the recognition problem, for a given word from $L(n)$, we should recognize it using queries each of which, for some $i\in \{1,\ldots ,n\}$, returns the $i$th letter of the word. In the case of the membership problem, for a given word over the alphabet $\Sigma$ of the length $n$, we should recognize if it belongs to the set $L(n)$ using the same queries. For a given problem and type of trees, instead of the minimum depth $h(n)$ of a decision tree of the considered type solving the problem for $L(n)$, we study the smoothed minimum depth $H(n)=\max\{h(m):m\le n\}$. With the growth of $n$, the smoothed minimum depth of decision trees solving the problem of recognition deterministically is either bounded from above by a constant, or grows as a logarithm, or linearly. For other cases (decision trees solving the problem of recognition nondeterministically, and decision trees solving the membership problem deterministically and nondeterministically), with the growth of $n$, the smoothed minimum depth of decision trees is either bounded from above by a constant or grows linearly. As corollaries of the obtained results, we study joint behavior of smoothed minimum depths of decision trees for the considered four cases and describe five complexity classes of regular factorial languages. We also investigate the class of regular factorial languages over the alphabet $\{0,1\}$ each of which is given by one forbidden word.
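
As a small illustration of the smoothed minimum depth defined in this abstract, the sketch below computes $H(n)=\max\{h(m):m\le n\}$ as a running maximum over a list of depths. The function name `smoothed_min_depth` and the sample values of $h$ are hypothetical and are not taken from the paper.

```python
from itertools import accumulate


def smoothed_min_depth(h):
    """Return [H(1), H(2), ...] where H(n) = max{h(m) : m <= n}.

    `h` is a list whose entry at index n-1 holds h(n); the values used
    below are made up purely for illustration.
    """
    # A running maximum makes the sequence non-decreasing in n.
    return list(accumulate(h, max))


if __name__ == "__main__":
    h = [0, 2, 1, 3, 2]           # hypothetical minimum depths h(1..5)
    print(smoothed_min_depth(h))  # [0, 2, 2, 3, 3]
```

Smoothing removes dips in $h(n)$, so $H(n)$ is monotone and its growth type (constant, logarithmic, linear) is easier to state.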

Decision trees for binary subword-closed languages (arXiv, 2022-01-05) [Preprint] In this paper, we study arbitrary subword-closed languages over the alphabet $\{0,1\}$ (binary subword-closed languages). For the set of words $L(n)$ of the length $n$ belonging to a binary subword-closed language $L$, we investigate the depth of decision trees solving the recognition and the membership problems deterministically and nondeterministically. In the case of the recognition problem, for a given word from $L(n)$, we should recognize it using queries each of which, for some $i\in \{1,\ldots ,n\}$, returns the $i$th letter of the word. In the case of the membership problem, for a given word over the alphabet $\{0,1\}$ of the length $n$, we should recognize if it belongs to the set $L(n)$ using the same queries. With the growth of $n$, the minimum depth of decision trees solving the problem of recognition deterministically is either bounded from above by a constant, or grows as a logarithm, or linearly. For other types of trees and problems (decision trees solving the problem of recognition nondeterministically, and decision trees solving the membership problem deterministically and nondeterministically), with the growth of $n$, the minimum depth of decision trees is either bounded from above by a constant or grows linearly. We study joint behavior of minimum depths of the considered four types of decision trees and describe five complexity classes of binary subword-closed languages.

Efficient Importance Sampling Algorithm Applied to the Performance Analysis of Wireless Communication Systems Estimation (arXiv, 2022-01-04) [Preprint] When assessing the performance of wireless communication systems operating over fading channels, one often encounters the problem of computing expectations of some functional of sums of independent random variables (RVs). The outage probability (OP) at the output of Equal Gain Combining (EGC) and Maximum Ratio Combining (MRC) receivers is among the most important performance metrics that fall within this framework. In general, closed-form expressions of expectations of functionals applied to sums of RVs are out of reach. A naive Monte Carlo (MC) simulation is of course an alternative approach. However, this method requires a large number of samples for rare-event problems (small OP values, for instance). Therefore, it is of paramount importance to use variance reduction techniques to develop fast and efficient estimation methods. In this work, we use importance sampling (IS), which is known for requiring fewer computations to achieve the same accuracy requirement. In this line, we propose a state-dependent IS scheme based on a stochastic optimal control (SOC) formulation to calculate rare-event quantities that can be written in the form of an expectation of some functional of sums of independent RVs. Our proposed algorithm is generic and is applicable without any restriction on the univariate distributions of the different fading envelopes/gains or on the functional that is applied to the sum. We apply our approach to the Log-Normal distribution to compute the OP at the output of diversity receivers with and without co-channel interference. For each case, we show numerically that the proposed state-dependent IS algorithm compares favorably to most of the well-known estimators dealing with similar problems.
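
To make the contrast between naive Monte Carlo and importance sampling concrete, here is a minimal sketch on a toy problem. It is not the state-dependent SOC scheme proposed in the paper: it uses a fixed (static) change of measure, and it replaces the Log-Normal setting with i.i.d. Exp(1) variables so that the exact answer is available for comparison. All function names, the threshold `gamma`, and the proposal rate `theta` are assumptions made for illustration.

```python
import math
import random


def exact_prob(n, gamma):
    """Exact P(X_1 + ... + X_n <= gamma) for i.i.d. Exp(1) RVs (Erlang CDF)."""
    return 1.0 - math.exp(-gamma) * sum(
        gamma ** k / math.factorial(k) for k in range(n)
    )


def naive_mc(n, gamma, num_samples, rng):
    """Plain Monte Carlo: fraction of sampled sums that fall below gamma."""
    hits = 0
    for _ in range(num_samples):
        s = sum(rng.expovariate(1.0) for _ in range(n))
        hits += s <= gamma
    return hits / num_samples


def importance_sampling(n, gamma, num_samples, rng, theta):
    """Static IS: sample each X_i from Exp(theta) with theta > 1, then reweight.

    The likelihood ratio of the whole sample is
    prod_i f(x_i) / g(x_i) = theta**(-n) * exp((theta - 1) * s), with s = sum x_i.
    """
    total = 0.0
    for _ in range(num_samples):
        s = sum(rng.expovariate(theta) for _ in range(n))
        if s <= gamma:
            total += theta ** (-n) * math.exp((theta - 1.0) * s)
    return total / num_samples


if __name__ == "__main__":
    n, gamma, num_samples = 4, 0.1, 20000
    rng = random.Random(0)
    print("exact   :", exact_prob(n, gamma))
    print("naive MC:", naive_mc(n, gamma, num_samples, rng))
    # theta = n / gamma centres the proposal's mean sum on the threshold.
    print("IS      :", importance_sampling(n, gamma, num_samples, rng, n / gamma))
```

With a target probability of a few parts per million, the naive estimator rarely observes the event at this sample size, whereas the reweighted estimator draws most of its samples near the threshold.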

Time and space complexity of deterministic and nondeterministic decision trees (arXiv, 2022-01-04) [Preprint] In this paper, we study arbitrary infinite binary information systems, each of which consists of an infinite set called the universe and an infinite set of two-valued functions (attributes) defined on the universe. We consider the notion of a problem over an information system, which is described by a finite number of attributes and a mapping that assigns a decision to each tuple of attribute values. As algorithms for problem solving, we use deterministic and nondeterministic decision trees. As time and space complexity, we study the depth and the number of nodes in the decision trees. In the worst case, with the growth of the number of attributes in the problem description, (i) the minimum depth of deterministic decision trees grows either almost as logarithm or linearly, (ii) the minimum depth of nondeterministic decision trees either is bounded from above by a constant or grows linearly, (iii) the minimum number of nodes in deterministic decision trees has either polynomial or exponential growth, and (iv) the minimum number of nodes in nondeterministic decision trees has either polynomial or exponential growth. Based on these results, we divide the set of all infinite binary information systems into five complexity classes, and study for each class issues related to time-space trade-off for decision trees.

Rough analysis of computation trees (arXiv, 2022-01-02) [Preprint] This paper deals with computation trees over an arbitrary structure consisting of a set along with collections of functions and predicates that are defined on it. It is devoted to the comparative analysis of three parameters of problems with $n$ input variables over this structure: the complexity of a problem description, the minimum complexity of a computation tree solving this problem deterministically, and the minimum complexity of a computation tree solving this problem nondeterministically. A rough classification of relationships among these parameters is considered and all seven possible types of these relations are enumerated. The changes of relation types with the growth of the number $n$ of input variables are studied.

Asymptotic Derivation of Multicomponent Compressible Flows with Heat Conduction and Mass Diffusion (arXiv, 2021-12-27) [Preprint] A Type-I model of a multicomponent system of fluids with non-constant temperature is derived as the high-friction limit of a Type-II model via a Chapman-Enskog expansion. The asymptotic model is shown to fit into the general theory of hyperbolic-parabolic systems, by exploiting the entropy structure inherited through the asymptotic procedure. The exact computations are specified in the case of a two-component system. Finally, two convergence results for smooth solutions are presented, from the system with mass-diffusion and heat conduction to the corresponding system without mass-diffusion but including heat conduction and to its hyperbolic counterpart.

Unbiased Parameter Inference for a Class of Partially Observed Lévy-Process Models (arXiv, 2021-12-27) [Preprint] We consider the problem of static Bayesian inference for partially observed Lévy-process models. We develop a methodology which allows one to infer static parameters and some states of the process, without a bias from the time-discretization of the aforementioned Lévy process. The unbiased method is exceptionally amenable to parallel implementation and can be computationally efficient relative to competing approaches. We implement the method on S&P 500 log-return daily data and compare it to some Markov chain Monte Carlo (MCMC) algorithms.

Optimization and uncertainty quantification model for time-continuous geothermal energy extraction undergoing reinjection (arXiv, 2021-12-10) [Preprint] Geothermal field modeling is often associated with uncertainties related to the subsurface static properties and the dynamics of fluid flow and heat transfer. Uncertainty quantification using simulations is a useful tool to design optimum field-development and to guide decision-making. The optimization process includes assessments of multiple time-dependent flow mechanisms, which are functions of operational parameters subject to subsurface uncertainties. This process requires careful determination of the parameter ranges, dependencies, and their probabilistic distribution functions. This study presents a new approach to assess time-dependent predictions of thermal recovery and produced-enthalpy rates, including uncertainty quantification and optimization. We use time-continuous and multi-objective uncertainty quantification for geothermal recovery, undergoing a water reinjection scheme. The ranges of operational and uncertainty parameters are determined from a collected database, including 135 geothermal fields worldwide. The uncertainty calculation is conducted non-intrusively, based on a workflow that couples low-fidelity models with Monte Carlo analysis. Full-physics reservoir simulations are used to construct and verify the low-fidelity models. The sampling process is performed with Design of Experiments, enhanced with space-filling, and combined with analysis of covariance to capture parameter dependencies. The predicted thermal recovery and produced-enthalpy rates are then evaluated as functions of the significant uncertainty parameters based on dimensionless groups. The workflow is applied for various geothermal fields to assess their optimum well-spacing in their well configuration.
This approach offers an efficient and robust workflow for time-continuous uncertainty quantification and global sensitivity analysis applied for geothermal field modeling and optimization.

Local Mortality Impacts Due to Future Air Pollution Under Climate Change Scenarios (Submitted to Journal of Science of Total Environment, 2021-12-10) [Preprint] The health impacts of global climate change mitigation will affect local populations differently. We aimed to quantify the local health impacts due to fine particles (PM2.5) under the governance arrangements embedded in the Shared Socioeconomic Pathways (SSPs 1-5) under two greenhouse gas concentration scenarios (Representative Concentration Pathways (RCPs) 2.6 and 8.5) in local populations of Mozambique, India, and Spain. Methods: We simulated the SSP-RCP scenarios using the Global Change Analysis Model, which was linked to the TM5-FASST model to estimate PM2.5 levels. PM2.5 levels were calibrated with local measurements. We used comparative risk assessment methods to estimate attributable premature deaths due to PM2.5, linking local population and mortality data with PM2.5-mortality relationships from the literature. We incorporated population projections under the SSPs in sensitivity analysis. Results: PM2.5-attributable burdens in 2050 differed across SSP-RCP scenarios, and scenario sensitivity varied across populations. Future attributable mortality burden of PM2.5 was highly sensitive to assumptions about how populations will change according to SSP. SSPs reflecting high challenges for adaptation (SSPs 3 and 4) consistently resulted in the highest PM2.5-attributable burdens mid-century. Discussion: Our analysis of local PM2.5-attributable premature deaths under SSP-RCP scenarios in three local populations highlights the importance of both socioeconomic development and climate policy in reducing the health burden from air pollution. Sensitivity of future PM2.5 mortality burden to SSPs was particularly evident in low- and middle-income country settings due either to high air pollution levels or dynamic populations.

CLIP2StyleGAN: Unsupervised Extraction of StyleGAN Edit Directions (arXiv, 2021-12-09) [Preprint] The success of StyleGAN has enabled unprecedented semantic editing capabilities, on both synthesized and real images. However, such editing operations are either trained with semantic supervision or described using human guidance. In another development, the CLIP architecture has been trained with internet-scale image and text pairings and has been shown to be useful in several zero-shot learning settings. In this work, we investigate how to effectively link the pretrained latent spaces of StyleGAN and CLIP, which in turn allows us to automatically extract semantically labeled edit directions from StyleGAN, finding and naming meaningful edit operations without any additional human guidance. Technically, we propose two novel building blocks: one for finding interesting CLIP directions and one for labeling arbitrary directions in CLIP latent space. The setup does not assume any predetermined labels and hence we do not require any additional supervised text/attributes to build the editing framework. We evaluate the effectiveness of the proposed method and demonstrate that extraction of disentangled labeled StyleGAN edit directions is indeed possible, and reveals interesting and non-trivial edit directions.

On the Stability of Positive Semigroups (arXiv, 2021-12-07) [Preprint] The stability and contraction properties of positive integral semigroups on locally compact Polish spaces are investigated. We provide a novel analysis based on an extension of V-norm, Dobrushin-type, contraction techniques on functionally weighted Banach spaces for Markov operators. These are applied to a general class of positive and possibly time-inhomogeneous bounded integral semigroups and their normalised versions. Under mild regularity conditions, the Lipschitz-type contraction analysis presented in this article simplifies and extends several exponential estimates developed in the literature. The spectral-type theorems that we develop can also be seen as an extension of Perron-Frobenius and Krein-Rutman theorems for positive operators to time-varying positive semigroups. We review and illustrate in detail the impact of these results in the context of positive semigroups arising in transport theory, physics, mathematical biology and advanced signal processing.

Neural Networks for Infectious Diseases Detection: Prospects and Challenges (arXiv, 2021-12-07) [Preprint] The ability of artificial neural networks (ANNs) to learn, correct errors, and transform a large amount of raw data into useful medical decisions for treatment and care has increased their popularity for enhanced patient safety and quality of care. Therefore, this paper reviews the critical role of ANNs in providing valuable insights for patients' healthcare decisions and efficient disease diagnosis. We thoroughly review different types of ANNs presented in the existing literature that advanced the adaptation of ANNs for complex applications. Moreover, we also investigate advances in ANNs for the diagnosis and treatment of various diseases, such as viral, skin, cancer, and COVID-19. Furthermore, we propose a novel deep Convolutional Neural Network (CNN) model called ConXNet for improving the detection accuracy of COVID-19 disease. ConXNet is trained and tested using different datasets, and it achieves more than 97% detection accuracy and precision, which is significantly better than existing models. Finally, we highlight future research directions and challenges such as complexity of the algorithms, insufficient available data, privacy and security, and integration of biosensing with ANNs. These research directions require considerable attention for improving the scope of ANNs for medical diagnostic and treatment applications.

Joint Posterior Inference for Latent Gaussian Models with R-INLA (arXiv, 2021-12-06) [Preprint] Efficient Bayesian inference remains a computational challenge in hierarchical models. Simulation-based approaches such as Markov Chain Monte Carlo methods are still popular but have a large computational cost. When dealing with the large class of Latent Gaussian Models, the INLA methodology embedded in the R-INLA software provides accurate Bayesian inference by computing a deterministic mixture representation to approximate the joint posterior, from which marginals are computed. The INLA approach has from the beginning targeted the approximation of univariate posteriors. In this paper we lay out the development foundation of the tools for also providing joint approximations for subsets of the latent field. These approximations inherit a Gaussian copula structure and additionally provide corrections for skewness. The same idea is carried forward also to sampling from the mixture representation, which we now can adjust for skewness.

A review of diaphragmless shock tubes for interdisciplinary applications (arXiv, 2021-12-05) [Preprint] Shock tubes have emerged as an effective tool for applications in various fields of research and technology. The conventional mode of shock tube operation employs a frangible diaphragm to generate shockwaves. The last half-century has witnessed significant efforts to replace this diaphragm-bursting method with fast-acting valves. These diaphragmless methods have good repeatability, quick turnaround time between experiments, and produce a clean flow, free of diaphragm fragments, in contrast to the conventional diaphragm-type operation. The constantly evolving valve designs are targeting shorter opening times for improved performance and efficiency. The present review is a compilation of the different diaphragmless shock tubes that have been conceptualized, developed, and implemented for various research endeavors. The discussions focus on essential factors, including the type of actuation mechanism, driver-driven configurations, valve opening time, shock formation distance, and operating pressure range, that ultimately influence the shockwave parameters obtained in the shock tube. A generalized mathematical model to study the behavior of these valves is developed. The advantages, limitations, and challenges in improving the performance of the valves are described. Finally, the present-day applications of diaphragmless shock tubes are discussed, and their potential scope in expanding the frontiers of shockwave research and technology is presented.

Snapshot HDR Video Construction Using Coded Mask (arXiv, 2021-12-05) [Preprint] This paper studies the reconstruction of High Dynamic Range (HDR) video from snapshot-coded LDR video. Constructing an HDR video requires restoring the HDR values for each frame and maintaining the consistency between successive frames. HDR image acquisition from single image capture, also known as snapshot HDR imaging, can be achieved in several ways. For example, the reconfigurable snapshot HDR camera is realized by introducing an optical element into the optical stack of the camera, by placing a coded mask at a small standoff distance in front of the sensor. High-quality HDR images can be recovered from the captured coded image using deep learning methods. This study utilizes 3D-CNNs to perform joint demosaicking, denoising, and HDR video reconstruction from coded LDR video. We enforce more temporally consistent HDR video reconstruction by introducing a temporal loss function that considers the short-term and long-term consistency. The obtained results are promising and could lead to affordable HDR video capture using conventional cameras.

MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions (arXiv, 2021-12-01) [Preprint] The recent and increasing interest in video-language research has driven the development of large-scale datasets that enable data-intensive machine learning techniques. In comparison, limited effort has been made at assessing the fitness of these datasets for the video-language grounding task. Recent works have begun to discover significant limitations in these datasets, suggesting that state-of-the-art techniques commonly overfit to hidden dataset biases. In this work, we present MAD (Movie Audio Descriptions), a novel benchmark that departs from the paradigm of augmenting existing video datasets with text annotations and focuses on crawling and aligning available audio descriptions of mainstream movies. MAD contains over 384,000 natural language sentences grounded in over 1,200 hours of video and exhibits a significant reduction in the currently diagnosed biases for video-language grounding datasets. MAD's collection strategy enables a novel and more challenging version of video-language grounding, where short temporal moments (typically seconds long) must be accurately grounded in diverse long-form videos that can last up to three hours.