• Appendices for Modeling Inter-vehicle Communication in Multi-lane Highways: A Stochastic Geometry Approach

      Farooq, Muhammad Junaid; ElSawy, Hesham; Alouini, Mohamed-Slim (2015-03-16)
    • Appendices for: Improper Signaling in Two-Path Relay Channels

      Gaafar, Mohamed; Amin, Osama; Schaefer, Rafael F.; Alouini, Mohamed-Slim (2016-12-01)
      This document contains the appendices for the work in “Improper Signaling in Two-Path Relay Channels,” which is submitted to 2017 IEEE International Conference on Communications (ICC) Workshop on Full-Duplex Communications for Future Wireless Networks, Paris, France.
    • Application of Bayesian Networks for Estimation of Individual Psychological Characteristics

      Litvinenko, Alexander; Litvinenko, Natalya (2017-07-19)
      In this paper we apply Bayesian networks to develop more accurate final overall estimations of the psychological characteristics of an individual, based on psychological test results. Psychological tests which identify how strongly an individual possesses a certain factor are very popular and quite common in the modern world. We call this value, for a given factor, the final overall estimation. Examples of factors include stress resistance, the readiness to take a risk, the ability to concentrate on complicated work, and many others. An accurate, qualitative, and comprehensive assessment of human potential is one of the most important challenges in any company or collective. The most common way of studying the psychological characteristics of a single person is testing. Psychologists and sociologists are constantly working on improving the quality of their tests. Despite this serious work, the questions in tests often do not produce enough feedback because of the relatively poor estimation systems in use. The overall estimation is usually based on the personal experience and subjective perception of a psychologist, or a group of psychologists, of the investigated psychological personality factors.
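      As a rough illustration of how test answers can be fused into an overall factor estimate (a toy sketch, not from the paper; the factor names and likelihood values below are purely hypothetical):

```python
# Toy naive-Bayes fusion of binary test answers into a posterior over one
# factor F (e.g. stress resistance), F in {'low', 'high'}.
# The likelihood table P(answer = yes | F) is hypothetical.
prior = {'low': 0.5, 'high': 0.5}
likelihood = {
    'q1': {'low': 0.2, 'high': 0.7},
    'q2': {'low': 0.3, 'high': 0.8},
}

def posterior(answers):
    """Bayes update: P(F | answers) is proportional to P(F) * prod_i P(answer_i | F)."""
    post = dict(prior)
    for q, said_yes in answers.items():
        for f in post:
            p = likelihood[q][f]
            post[f] *= p if said_yes else (1.0 - p)
    z = sum(post.values())  # normalize
    return {f: v / z for f, v in post.items()}
```

      A full Bayesian network generalizes this by letting factors depend on each other rather than treating answers as conditionally independent.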
    • Asynchronous Task-Based Polar Decomposition on Manycore Architectures

      Sukkari, Dalal; Ltaief, Hatem; Faverge, Mathieu; Keyes, David E. (2016-10-25)
      This paper introduces the first asynchronous, task-based implementation of the polar decomposition on manycore architectures. Based on a new formulation of the iterative QR dynamically-weighted Halley algorithm (QDWH) for calculating the polar decomposition, the proposed implementation replaces the LU factorization originally used for the condition number estimator with the more suitable QR factorization, to enable software portability across various architectures. Relying on fine-grained computations, the novel task-based implementation is also capable of taking advantage of the identity structure of the matrix involved during the QDWH iterations, which decreases the overall algorithmic complexity. Furthermore, the artifactual synchronization points have been substantially relaxed compared to previous implementations, unveiling look-ahead opportunities for better hardware occupancy. The overall QDWH-based polar decomposition can then be represented as a directed acyclic graph (DAG), where nodes represent computational tasks and edges define the inter-task data dependencies. The StarPU dynamic runtime system is employed to traverse the DAG, to track the various data dependencies, and to asynchronously schedule the computational tasks on the underlying hardware resources, resulting in out-of-order task scheduling. Benchmarking experiments show significant improvements against existing state-of-the-art high performance implementations (i.e., Intel MKL and Elemental) for the polar decomposition on the latest shared-memory systems (i.e., Intel Haswell/Broadwell/Knights Landing, NVIDIA K80/P100 GPUs, and IBM Power8), while maintaining high numerical accuracy.
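      For readers unfamiliar with QDWH, a serial numpy sketch of the dynamically weighted Halley iteration follows (the mathematically equivalent explicit form, not the QR-based, task-parallel formulation of the paper; the exact 2-norms below stand in for the estimators used in practice, and a square nonsingular input is assumed):

```python
import numpy as np

def qdwh_polar(A, max_iter=30, tol=1e-12):
    """Polar decomposition A = U @ H of a square nonsingular A via the
    dynamically weighted Halley (QDWH) recurrence of Nakatsukasa & Higham."""
    alpha = np.linalg.norm(A, 2)                       # upper bound on sigma_max
    beta = 1.0 / np.linalg.norm(np.linalg.inv(A), 2)   # lower bound on sigma_min
    X = A / alpha
    l = beta / alpha                                   # lower bound on sigma_min(X)
    n = A.shape[1]
    I = np.eye(n)
    for _ in range(max_iter):
        # dynamic weights chosen from the current singular-value bound l
        d = (4.0 * (1.0 - l**2) / l**4) ** (1.0 / 3.0)
        a = np.sqrt(1.0 + d) + 0.5 * np.sqrt(
            8.0 - 4.0 * d + 8.0 * (2.0 - l**2) / (l**2 * np.sqrt(1.0 + d)))
        b = (a - 1.0) ** 2 / 4.0
        c = a + b - 1.0
        X_prev = X
        W = X.T @ X
        X = X @ (a * I + b * W) @ np.linalg.inv(I + c * W)
        l = l * (a + b * l**2) / (1.0 + c * l**2)
        if np.linalg.norm(X - X_prev) <= tol:
            break
    U = X
    H = U.T @ A
    return U, 0.5 * (H + H.T)   # symmetrize the positive-definite factor
```

      The paper's contribution lies in expressing each iteration's factorizations as fine-grained tasks in a DAG rather than in the recurrence itself.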
    • Batched Tile Low-Rank GEMM on GPUs

      Charara, Ali; Keyes, David E.; Ltaief, Hatem (2018-02)
      Dense general matrix-matrix multiplication (GEMM) is a core operation of the Basic Linear Algebra Subroutines (BLAS) library and therefore often resides at the bottom of the traditional software stack for most scientific applications. In fact, chip manufacturers give special attention to the GEMM kernel implementation, since this is exactly where most high-performance software libraries extract the hardware performance. With the emergence of big data applications involving large data-sparse, hierarchically low-rank matrices, the off-diagonal tiles can be compressed to reduce the algorithmic complexity and the memory footprint. The resulting tile low-rank (TLR) data format is composed of small data structures, which retain the most significant information for each tile. However, to operate on low-rank tiles, a new GEMM operation and its corresponding API have to be designed for GPUs so that they can exploit the data sparsity structure of the matrix while leveraging the underlying TLR compression format. The main idea consists in aggregating all operations into a single kernel launch to compensate for their low arithmetic intensities and to mitigate the data transfer overhead on GPUs. The new TLR GEMM kernel outperforms the cuBLAS dense batched GEMM by more than an order of magnitude and creates new opportunities for advanced TLR algorithms.
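      The algebraic trick that makes a low-rank GEMM cheap can be sketched independently of the GPU batching machinery (a minimal numpy illustration, not the paper's kernel):

```python
import numpy as np

def lr_gemm(Ua, Va, Ub, Vb):
    """Multiply two low-rank tiles A = Ua @ Va.T and B = Ub @ Vb.T.

    Only the small k_a x k_b core Va.T @ Ub couples the two operands, so the
    cost is O(n * k^2) instead of the O(n^2 * k) of forming a dense product,
    and the result stays factored with rank <= min(k_a, k_b).
    """
    core = Va.T @ Ub          # small inner product of the factors
    return Ua @ core, Vb      # C = (Ua @ core) @ Vb.T
```

      The batched kernel of the paper effectively performs many such factored products in one launch to amortize their low arithmetic intensity.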
    • Borehole Tool for the Comprehensive Characterization of Hydrate-bearing Sediments

      Dai, Sheng; Santamarina, Carlos (Office of Scientific and Technical Information (OSTI), 2018-02-01)
      Reservoir characterization and simulation require reliable parameters to anticipate the response of hydrate deposits and their production rates. The acquisition of the required fundamental properties currently relies on wireline logging, pressure core testing, and/or laboratory observations of synthesized specimens, which are challenged by testing capabilities and innate sampling disturbances. The project reviews hydrate-bearing sediments, their properties, and inherent sampling effects, albeit lessened by developments in pressure core technology, in order to develop robust correlations with index parameters. The resulting information is incorporated into a tool for optimal field characterization and parameter selection with uncertainty analyses. Ultimately, the project develops a borehole tool for the comprehensive in situ characterization of hydrate-bearing sediments, with a design that builds on past developments and characterization experience and draws inspiration from nature and sensor miniaturization.
    • Capacity Bounds for Parallel Optical Wireless Channels

      Chaaban, Anas; Rezki, Zouheir; Alouini, Mohamed-Slim (2016-01)
      A system consisting of parallel optical wireless channels with a total average intensity constraint is studied. Capacity upper and lower bounds for this system are derived. Under perfect channel-state information at the transmitter (CSIT), the bounds have to be optimized with respect to the power allocation over the parallel channels. The optimization of the lower bound is non-convex; however, the KKT conditions can be used to find a list of candidate solutions, one of which is optimal. The optimal solution can then be found by an exhaustive search algorithm, which is computationally expensive. To overcome this, we propose low-complexity power allocation algorithms that are nearly optimal. The optimized capacity lower bound nearly coincides with the capacity at high SNR. Without CSIT, our capacity bounds lead to upper and lower bounds on the outage probability. The outage probability bounds meet at high SNR. The system with both average and peak intensity constraints is also discussed.
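      For intuition on KKT-driven allocation over parallel channels, here is the classical water-filling solution for parallel Gaussian channels (the paper's intensity-constrained objective is different, but the candidate-solution structure arising from the KKT conditions is analogous):

```python
import numpy as np

def waterfill(gains, P):
    """Classical water-filling: maximize sum_i log(1 + p_i * g_i) s.t. sum p_i = P.

    The KKT conditions give p_i = max(mu - 1/g_i, 0); we search over the
    number k of active channels to find the water level mu.
    """
    g = np.asarray(gains, dtype=float)
    order = np.argsort(g)[::-1]            # strongest channels first
    gs = g[order]
    for k in range(len(gs), 0, -1):
        mu = (P + np.sum(1.0 / gs[:k])) / k
        p = mu - 1.0 / gs[:k]
        if p[-1] >= 0.0:                   # weakest active channel still nonnegative
            alloc = np.zeros_like(g)
            alloc[order[:k]] = p
            return alloc
    return np.zeros_like(g)
```

      In the paper's setting the per-channel objective is an intensity-channel rate bound rather than log(1 + p g), which is what makes the optimization non-convex and motivates the proposed low-complexity algorithms.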
    • Comparison of Low-Complexity Diversity Schemes for Dual-Hop AF Relaying Systems

      Gaaloul, Fakhreddine; Alouini, Mohamed-Slim; Radaydeh, Redha M. (Institute of Electrical and Electronics Engineers (IEEE), 2012-02-13)
      This paper investigates the performance of two low-complexity combining schemes, which are based on one- or two-phase observation, to mitigate multipath fading in dual-hop amplify-and-forward relaying systems. For the one-phase-based combining, a single-antenna station is assumed to relay information from a multiple-antenna transmitter to a multiple-antenna receiver, and the activation of the receive antennas is adaptively performed based on the second-hop statistics, regardless of the first-hop conditions. On the other hand, the two-phase-based combining suggests using multiple single-antenna stations between the multiple-antenna transmitter and the single-antenna receiver, where the suitable set of active relays is identified according to the precombining end-to-end fading conditions. To facilitate comparisons between the two schemes, formulations for the statistics of the combined signal-to-noise ratio and some performance measures are presented. Numerical and simulation results are shown to clarify the tradeoff between the achieved diversity-array gain, the processing complexity, and the power consumption.
    • Computation of the Response Surface in the Tensor Train data format

      Dolgov, Sergey; Khoromskij, Boris N.; Litvinenko, Alexander; Matthies, Hermann G. (2014-06-11)
      We apply the Tensor Train (TT) approximation to construct the Polynomial Chaos Expansion (PCE) of a random field, and solve the stochastic elliptic diffusion PDE with the stochastic Galerkin discretization. We compare two strategies for the polynomial chaos expansion: sparse and full polynomial (multi-index) sets. In the full set, the polynomial orders are chosen independently in each variable, which provides higher flexibility and accuracy. However, the total number of degrees of freedom grows exponentially with the number of stochastic coordinates. To cope with this curse of dimensionality, the data is kept compressed in the TT decomposition, a recurrent low-rank factorization. PCE computations on sparse grid sets have been extensively studied, but the TT representation for PCE is a novel approach that is investigated in this paper. We outline how to deduce the PCE from the covariance matrix, assemble the Galerkin operator, and evaluate some post-processing quantities (mean, variance, Sobol indices), staying within the low-rank framework. Two stages are the most demanding. First, we interpolate the PCE coefficients in the TT format using a small number of samples, which is performed via the block cross approximation method. Second, we solve the discretized equation (a large linear system) via the alternating minimal energy algorithm. In the numerical experiments we demonstrate that the full expansion set encapsulated in the TT format is indeed preferable in cases where high accuracy and high polynomial orders are required.
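      The compression at the heart of the TT format can be sketched with the standard TT-SVD of a dense tensor (the paper instead builds the cores from a few samples via block cross approximation, which never forms the dense tensor):

```python
import numpy as np

def tt_svd(tensor, eps=1e-12):
    """Compress a dense tensor into TT cores by sequential truncated SVDs."""
    shape = tensor.shape
    cores, r = [], 1
    C = tensor.copy()
    for k in range(len(shape) - 1):
        C = C.reshape(r * shape[k], -1)
        U, s, Vt = np.linalg.svd(C, full_matrices=False)
        rank = max(1, int(np.sum(s > eps * s[0])))   # drop small singular values
        cores.append(U[:, :rank].reshape(r, shape[k], rank))
        C = s[:rank, None] * Vt[:rank]               # carry the remainder forward
        r = rank
    cores.append(C.reshape(r, shape[-1], 1))
    return cores

def tt_full(cores):
    """Contract TT cores back to a dense tensor (for verification only)."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=(out.ndim - 1, 0))
    return out.reshape([c.shape[1] for c in cores])
```

      Storage drops from the product of all mode sizes to a sum of small three-way cores, which is what makes high polynomial orders affordable.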
    • Design and Analysis of Delayed Chip Slope Modulation in Optical Wireless Communication

      Park, Kihong; Alouini, Mohamed-Slim (2015-08-23)
      In this letter, we propose a novel slope-based binary modulation called delayed chip slope modulation (DCSM) and develop a chip-based hard-decision receiver to demodulate the resulting signal, detect the chip sequence, and decode the input bit sequence. Chips of shorter duration than the bit duration are used to represent changes of state in the amplitude level according to consecutive bit information and to exploit the trade-off between bandwidth and power efficiency. We analyze the power spectral density and error rate performance of the proposed DCSM. Numerical results show that the DCSM scheme can exploit the spectrum more efficiently than the reference schemes while providing an error rate performance comparable to conventional modulation schemes.
    • A Direct Radiative Transfer Equation Solver for Path Loss Calculation of Underwater Optical Wireless Channels

      Li, Changping; Park, Ki-Hong; Alouini, Mohamed-Slim (2014-11-10)
      In this report, we propose a fast numerical solution of the steady-state radiative transfer equation in order to calculate the path loss due to light absorption and scattering in various types of underwater channels. In the proposed scheme, we apply a direct non-uniform method to discretize the angular space and an upwind-type finite difference method to discretize the spatial domain. A Gauss-Seidel iterative method is then applied to solve the fully discretized system of linear equations. The accuracy and efficiency of the proposed scheme are validated by Monte Carlo simulations.
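      The final solve stage in isolation: a plain Gauss-Seidel sweep for a linear system, as a minimal stand-in for the solver applied to the fully discretized RTE (a generic sketch, not the report's discretization):

```python
import numpy as np

def gauss_seidel(A, b, tol=1e-12, max_iter=1000):
    """Solve A x = b by Gauss-Seidel sweeps (converges for diagonally dominant A)."""
    n = len(b)
    x = np.zeros(n)
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            # use already-updated entries x[:i] and previous-sweep entries x[i+1:]
            sigma = A[i, :i] @ x[:i] + A[i, i + 1:] @ x_old[i + 1:]
            x[i] = (b[i] - sigma) / A[i, i]
        if np.linalg.norm(x - x_old, ord=np.inf) < tol:
            break
    return x
```

      Upwind discretizations of transport equations typically produce the diagonally dominant systems for which this iteration converges quickly.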
    • Efficient Outage Probability Evaluation of Diversity Receivers Over Generalized Gamma Channels

      Ben Issaid, Chaouki; Alouini, Mohamed-Slim; Tempone, Raul (2016-10)
      In this paper, we are interested in determining the cumulative distribution function of a sum of generalized Gamma random variables in the setting of rare event simulation. To this end, we present an efficient importance sampling estimator. The main result of this work is the bounded relative error property of the proposed estimator. This result is used to accurately estimate the outage probability of multibranch maximum ratio combining and equal gain combining diversity receivers over generalized Gamma fading channels. Selected numerical simulations are discussed to show the robustness of our estimator compared to naive Monte Carlo.
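      The flavor of the approach on a toy case (exponential rather than generalized Gamma summands, chosen so the left-tail probability has a closed form): sample from a tilted proposal concentrated on the rare event and reweight by the likelihood ratio.

```python
import numpy as np

def is_left_tail(n, gamma, num_samples=200_000, seed=0):
    """Importance-sampling estimate of P(X_1 + ... + X_n < gamma) for iid
    X_i ~ Exp(1), using Exp(rate lam) proposals pushed toward the rare event."""
    rng = np.random.default_rng(seed)
    lam = n / gamma                        # proposal mean per summand is gamma/n
    x = rng.exponential(scale=1.0 / lam, size=(num_samples, n))
    s = x.sum(axis=1)
    # likelihood ratio prod_i f(x_i)/g(x_i) = lam**-n * exp((lam - 1) * s)
    w = lam ** -n * np.exp((lam - 1.0) * s)
    return np.mean((s < gamma) * w)
```

      Naive Monte Carlo would need on the order of 1/P samples just to see the event once; the tilted proposal hits it on most draws and the weights correct the bias.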
    • Energy-Efficient Power Allocation for Fixed-Gain Amplify-and-Forward Relay Networks with Partial Channel State Information

      Zafar, Ammar; Alouini, Mohamed-Slim; Chen, Yunfei; Radaydeh, Redha M. (King Abdullah University of Science and Technology, 2012-06)
      In this report, energy-efficient transmission and power allocation for fixed-gain amplify-and-forward relay networks with partial channel state information (CSI) are studied. In the energy-efficiency problem, the total power consumed is minimized while keeping the signal-to-noise ratio (SNR) above a certain threshold. In the dual problem of power allocation, the end-to-end SNR is maximized under individual and global power constraints. Closed-form expressions for the optimal source and relay powers and the Lagrangian multiplier are obtained. Numerical results show that optimal power allocation with partial CSI provides performance comparable to that of optimal power allocation with full CSI at low SNR.
    • Error Rates of M-PAM and M-QAM in Generalized Fading and Generalized Gaussian Noise Environments

      Soury, Hamza; Alouini, Mohamed-Slim; Yilmaz, Ferkan (IEEE International Symposium on Information Theory - July, 2013 Istanbul, Turkey, 2013-07)
      This letter investigates the average symbol error probability (ASEP) of pulse amplitude modulation and quadrature amplitude modulation coherent signaling over flat fading channels subject to additive white generalized Gaussian noise. The new ASEP results are derived in generic closed form in terms of the Fox H function and the bivariate Fox H function for the extended generalized-K fading case. The utility of this general closed form is that it includes some special fading distributions, like Generalized-K, Nakagami-m, and Rayleigh fading, and special noise distributions such as Gaussian and Laplacian. Some of these special cases are also treated and are shown to yield simplified results.
    • Exploiting Data Sparsity for Large-Scale Matrix Computations

      Akbudak, Kadir; Ltaief, Hatem; Mikhalev, Aleksandr; Charara, Ali; Keyes, David E. (2018-02-24)
      Exploiting data sparsity in dense matrices is an algorithmic bridge between architectures that are increasingly memory-austere on a per-core basis and extreme-scale applications. The Hierarchical matrix Computations on Manycore Architectures (HiCMA) library tackles this challenging problem by achieving significant reductions in time to solution and memory footprint, while preserving a specified accuracy requirement of the application. HiCMA provides a high-performance implementation on distributed-memory systems of one of the most widely used matrix factorizations in large-scale scientific applications, i.e., the Cholesky factorization. It employs the tile low-rank data format to compress the dense data-sparse off-diagonal tiles of the matrix. It then decomposes the matrix computations into interdependent tasks and relies on the dynamic runtime system StarPU for asynchronous out-of-order scheduling, while allowing high user productivity. Performance comparisons and memory footprint measurements on matrix dimensions up to eleven million show a performance gain and memory saving of more than an order of magnitude for both metrics on thousands of cores, against state-of-the-art open-source and vendor-optimized numerical libraries. This represents an important milestone in enabling large-scale matrix computations toward solving big data problems in geospatial statistics for climate/weather forecasting applications.
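      The tile compression underlying the TLR format, sketched with a truncated SVD (a generic illustration of the factored form; HiCMA's production kernels differ):

```python
import numpy as np

def compress_tile(tile, eps):
    """Compress an off-diagonal tile T ~= U @ V.T, keeping singular values
    above eps * sigma_max; storage drops from n*n to 2*n*k entries."""
    U, s, Vt = np.linalg.svd(tile, full_matrices=False)
    k = max(1, int(np.sum(s > eps * s[0])))
    return U[:, :k] * s[:k], Vt[:k].T
```

      Off-diagonal blocks of smooth covariance kernels, as in geospatial statistics, have rapidly decaying singular values, which is why the rank k stays small.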
    • Extreme Computing for Extreme Adaptive Optics: the Key to Finding Life Outside our Solar System

      Ltaief, Hatem; Sukkari, Dalal; Guyon, Olivier; Keyes, David E. (2018)
      The real-time correction of telescopic images in the search for exoplanets is highly sensitive to atmospheric aberrations. The pseudo-inverse algorithm is an efficient mathematical method to filter out this atmospheric turbulence. We introduce a new partial singular value decomposition (SVD) algorithm based on the QR-based Diagonally Weighted Halley (QDWH) iteration for the pseudo-inverse method of adaptive optics. The QDWH partial SVD algorithm selectively calculates the most significant singular values and their corresponding singular vectors. We develop a high performance implementation and demonstrate the numerical robustness of the QDWH-based partial SVD method. We also perform a benchmarking campaign on various generations of GPU hardware accelerators and compare against the state-of-the-art SVD implementation SGESDD from the MAGMA library. Numerical accuracy and performance results are reported using synthetic and real observational datasets from the Subaru telescope. Our implementation outperforms SGESDD by up to fivefold and fourfold performance speedups on ill-conditioned synthetic matrices and real observational datasets, respectively. The pseudo-inverse simulation code will be deployed on-sky for the Subaru telescope during observation nights scheduled for early 2018.
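      What a partial SVD buys for the pseudo-inverse, in miniature: only singular values above a threshold contribute, so the rest need never be computed (a hedged numpy sketch, unrelated to the GPU implementation):

```python
import numpy as np

def truncated_pinv(A, rtol=1e-6):
    """Pseudo-inverse from the significant part of the SVD only: directions
    with sigma_i <= rtol * sigma_max (noise-dominated modes) are discarded."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    keep = s > rtol * s[0]
    return (Vt[keep].T / s[keep]) @ U[:, keep].T
```

      Discarding the small singular values both regularizes the reconstruction against measurement noise and shrinks the computation, which is the motivation for a partial rather than full SVD.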
    • Free-Space Optical Communications: Capacity Bounds, Approximations, and a New Sphere-Packing Perspective

      Chaaban, Anas; Morvan, Jean-Marie; Alouini, Mohamed-Slim (2015-04)
      The capacity of the intensity-modulation direct-detection (IM-DD) free-space optical channel is studied. It is shown that for an IM-DD channel with generally input-dependent noise, the worst noise at high SNR is input-independent Gaussian with variance dependent on the input cost. Based on this result, a Gaussian IM-DD channel model is proposed where the noise variance depends on the optical intensity constraints only. A new recursive approach for bounding the capacity of the channel based on sphere-packing is proposed, which leads to a tighter bound than an existing sphere-packing bound for the channel with only an average intensity constraint. Under both average and peak constraints, it yields bounds that characterize the high SNR capacity within a negligible gap, where the achievability is proved by using a truncated Gaussian input distribution. This completes the high SNR capacity characterization of the channel, by closing the gap in the existing characterization for a small average-to-peak ratio. Simple fitting functions that capture the best known achievable rate for the channel are provided. These functions can be of significant practical importance especially for the study of systems operating under atmospheric turbulence and misalignment conditions. Finally, the capacity/SNR loss between heterodyne detection (HD) systems and IM-DD systems is bounded at high SNR, where it is shown that the loss grows as SNR increases for a complex-valued HD system, while it is bounded by 1.245 bits or 3.76 dB at most for a real-valued one.
    • GraMi: Generalized Frequent Pattern Mining in a Single Large Graph

      Saeedy, Mohammed El; Kalnis, Panos (2011-11)
      Mining frequent subgraphs is an important operation on graphs. Most existing work assumes a database of many small graphs, but modern applications, such as social networks, citation graphs or protein-protein interaction in bioinformatics, are modeled as a single large graph. Interesting interactions in such applications may be transitive (e.g., friend of a friend). Existing methods, however, search for frequent isomorphic (i.e., exact match) subgraphs and cannot discover many useful patterns. In this paper the authors propose GRAMI, a framework that generalizes frequent subgraph mining in a single large graph. GRAMI discovers frequent patterns. A pattern is a graph where edges are generalized to distance-constrained paths. Depending on the definition of the distance function, many instantiations of the framework are possible. Both directed and undirected graphs, as well as multiple labels per vertex, are supported. The authors developed an efficient implementation of the framework that models the frequency resolution phase as a constraint satisfaction problem, in order to avoid the costly enumeration of all instances of each pattern in the graph. The authors also implemented CGRAMI, a version that supports structural and semantic constraints, and AGRAMI, an approximate version that supports very large graphs. The experiments on real data demonstrate that the authors' framework is up to 3 orders of magnitude faster and discovers more interesting patterns than existing approaches.
    • A High Performance QDWH-SVD Solver using Hardware Accelerators

      Sukkari, Dalal E.; Ltaief, Hatem; Keyes, David E. (2015-04-08)
      This paper describes a new high performance implementation of the QR-based Dynamically Weighted Halley Singular Value Decomposition (QDWH-SVD) solver on multicore architectures enhanced with multiple GPUs. The standard QDWH-SVD algorithm was introduced by Nakatsukasa and Higham (SIAM SISC, 2013) and combines three successive computational stages: (1) the polar decomposition calculation of the original matrix using the QDWH algorithm, (2) the symmetric eigendecomposition of the resulting polar factor to obtain the singular values and the right singular vectors, and (3) the matrix-matrix multiplication to get the associated left singular vectors. A comprehensive test suite highlights the numerical robustness of the QDWH-SVD solver. Although it performs up to two times more flops when computing all singular vectors compared to the standard SVD solver algorithm, our new high performance implementation on a single GPU results in up to 3.8x improvements for asymptotic matrix sizes, compared to the equivalent routines from existing state-of-the-art open-source and commercial libraries. However, when only singular values are needed, QDWH-SVD is penalized by performing up to 14 times more flops. The singular-value-only implementation of QDWH-SVD on a single GPU can still run up to 18% faster than the best existing equivalent routines. Integrating mixed precision techniques in the solver can additionally provide up to a 40% improvement at the price of losing a few digits of accuracy, compared to full double precision floating point arithmetic. We further leverage the single GPU QDWH-SVD implementation by introducing the first multi-GPU SVD solver to study the scalability of the QDWH-SVD framework.
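      The three stages can be sketched end to end in numpy, with a scaled Newton iteration standing in for stage (1)'s QDWH polar step (a toy illustration assuming a square nonsingular input, not the paper's GPU solver):

```python
import numpy as np

def svd_via_polar(A, iters=100, tol=1e-13):
    """SVD assembled from the polar decomposition in three stages, as in QDWH-SVD."""
    # Stage 1: polar factor via the Newton iteration X <- (X + X^{-T}) / 2
    # (a simple stand-in for the QDWH iteration used in the paper)
    X = A.copy()
    for _ in range(iters):
        X_new = 0.5 * (X + np.linalg.inv(X).T)
        if np.linalg.norm(X_new - X) < tol:
            X = X_new
            break
        X = X_new
    Up = X
    H = Up.T @ A
    H = 0.5 * (H + H.T)                 # symmetric positive-definite polar factor
    # Stage 2: eigendecomposition of H gives singular values and right vectors
    w, V = np.linalg.eigh(H)
    w, V = w[::-1], V[:, ::-1]          # reorder to conventional descending order
    # Stage 3: one matrix multiply recovers the left singular vectors
    U = Up @ V
    return U, w, V
```

      Stage (2) dominates the extra flops relative to a direct SVD, which is why the singular-value-only case pays the largest penalty.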