Al Farhan, Mohammed A.; Kaushik, Dinesh K.; Keyes, David E.(2017-03-13)[Poster]
Shared memory parallelization of PETSc-FUN3D, an unstructured tetrahedral mesh Euler code previously characterized for distributed memory Single Program, Multiple Data (SPMD) for thousands of nodes, is hybridized with shared memory Single Instruction, Multiple Data (SIMD) for hundreds of threads per node. We explore thread-level performance optimizations on state-of-the-art multi- and many-core Intel processors, including the second generation of Xeon Phi, Knights Landing (KNL). We study the performance on the KNL with different configurations of memory and cluster modes, with code optimizations to minimize indirect addressing and enhance the cache locality. The optimizations employed are expected to be of value other unstructured applications as many-core architecture evolves.
Hamam, Alwaleed A.; Khan, Ayaz H.(2017-03-13)[Poster]
Deep learning is based on a set of algorithms that attempt to model high level abstractions in data. Specifically, RBM is a deep learning algorithm that used in the project to increase it's time performance using some efficient parallel implementation by OpenACC tool with best possible optimizations on RBM to harness the massively parallel power of NVIDIA GPUs. GPUs development in the last few years has contributed to growing the concept of deep learning. OpenACC is a directive based ap-proach for computing where directives provide compiler hints to accelerate code. The traditional Restricted Boltzmann Ma-chine is a stochastic neural network that essentially perform a binary version of factor analysis. RBM is a useful neural net-work basis for larger modern deep learning model, such as Deep Belief Network. RBM parameters are estimated using an efficient training method that called Contrastive Divergence. Parallel implementation of RBM is available using different models such as OpenMP, and CUDA. But this project has been the first attempt to apply OpenACC model on RBM.
We investigate the problem of secure broadcasting over fast fading channels with imperfect main channel state information (CSI) at the transmitter. In particular, we analyze the effect of the noisy estimation of the main CSI on the throughput of a broadcast channel where the transmission is intended for multiple legitimate receivers in the presence of an eavesdropper. Besides, we consider the realistic case where the transmitter is only aware of the statistics of the eavesdropper's CSI and not of its channel's realizations. First, we discuss the common message transmission case where the source broadcasts the same information to all the receivers, and we provide an upper and a lower bounds on the ergodic secrecy capacity. For this case, we show that the secrecy rate is limited by the legitimate receiver having, on average, the worst main channel link and we prove that a non-zero secrecy rate can still be achieved even when the CSI at the transmitter is noisy. Then, we look at the independent messages case where the transmitter broadcasts multiple messages to the receivers, and each intended user is interested in an independent message. For this case, we present an expression for the achievable secrecy sum-rate and an upper bound on the secrecy sum-capacity and we show that, in the limit of large number of legitimate receivers K, our achievable secrecy sum-rate follows the scaling law log((1-a ) log(K)), where is the estimation error variance of the main CSI. The special cases of high SNR, perfect and no-main CSI are also analyzed. Analytical derivations and numerical results are presented to illustrate the obtained expressions for the case of independent and identically distributed Rayleigh fading channels.
Peter, Daniel; Rietmann, Max; Galvez, Percy; Ampuero, Jean Paul(2017-03-13)[Poster]
High-resolution seismic wave simulations often require local refinements in numerical meshes to accurately capture e.g. steep topography or complex fault geometry. Together with explicit time schemes, this dramatically reduces the global time step size for ground-motion simulations due to numerical stability conditions. To alleviate this problem, local time stepping (LTS) algorithms allow an explicit time stepping scheme to adapt the time step to the element size, allowing nearoptimal time steps everywhere in the mesh. This can potentially lead to significantly faster simulation runtimes.
The export option will allow you to export the current search results of the entered query to a file. Different
formats are available for download. To export the items, click on the button corresponding with the preferred download format.
By default, clicking on the export buttons will result in a download of the allowed maximum amount of items.
For anonymous users the allowed maximum amount is 50 search results.
To select a subset of the search results, click "Selective Export" button and make a selection of the items you want to export.
The amount of items that can be exported at once is similarly restricted as the full export.
After making a selection, click one of the export format buttons. The amount of items that will be exported is indicated in the bubble next to export format.