Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
For more information visit: https://cemse.kaust.edu.sa/
Recent Submissions

Uniting Heterogeneity, Inductiveness, and Efficiency for Graph Representation Learning (IEEE Transactions on Knowledge and Data Engineering, IEEE, 2021-07-27) [Article] Recently, building on the message passing paradigm, graph neural networks (GNNs) have greatly advanced the performance of node representation learning on graphs. However, the majority of GNNs are designed only for homogeneous graphs, leading to inferior adaptivity to the more informative heterogeneous graphs with various types of nodes and edges. Also, despite the necessity of inductively producing representations for completely new nodes (e.g., in streaming scenarios), few heterogeneous GNNs can bypass the transductive learning scheme where all nodes must be known during training. Furthermore, the training efficiency of most heterogeneous GNNs has been hindered by their sophisticated designs for extracting the semantics associated with each meta path or relation. In this paper, we propose a wide and deep message passing network (WIDEN) to cope with the aforementioned problems of heterogeneity, inductiveness, and efficiency, which are rarely investigated together in graph representation learning. In WIDEN, we propose a novel inductive, meta path-free message passing scheme that packs up heterogeneous node features with their associated edges from both low- and high-order neighbor nodes. To further improve the training efficiency, we present an active downsampling strategy that drops unimportant neighbor nodes to facilitate faster information propagation.

All Screen-Printed, Polymer-Nanowire Based Foldable Electronics for mm-Wave Applications (Advanced Materials Technologies, Wiley, 2021-07-26) [Article] With the surge in devices for Internet of Things (IoT) applications, there is great interest in flexible electronics that can be mass-manufactured at lower cost. Screen-printing is well-known for mass manufacturing; however, this method has mostly focused on printing metallic patterns. Few efforts have been devoted to printing substrates for high-frequency (mm-wave) electronics, which require low dielectric loss to ensure a decent system efficiency. This paper presents a novel screen-printable composite ink comprising acrylonitrile-butadiene-styrene and ceramic particles, through which dielectric substrates with various thicknesses (down to a few microns), lateral dimensions, and relative permittivities can be printed. A low dielectric loss of 0.0063 at 28 GHz (fifth generation (5G) communication band) makes the substrates suitable for mm-wave electronics. A custom silver-nanowire-based screen-printable ink is utilized for metallic printing to provide high conductivity (3.4 × 10⁶ S m⁻¹) and stable electrical response under bent or folded conditions. As a proof of concept for fully printed mm-wave electronics, a flexible quasi-Yagi antenna operating in the 5G band (26.5–29.5 GHz) is demonstrated that exhibits decent performance in flat as well as bent conditions, confirming the suitability of the material system and printing processes for mass production of IoT and wearable electronics.

Landslide size matters: A new data-driven, spatial prototype (Engineering Geology, Elsevier BV, 2021-07-24) [Article] The standard definition of landslide hazard requires the estimation of where, when (or how frequently) and how large a given landslide event may be. The geoscientific community involved in statistical models has addressed the component pertaining to how large a landslide event may be by introducing the concept of landslide-event magnitude scale. This scale, which depends on the planimetric area of the given population of landslides, in analogy to the earthquake magnitude, has been expressed with a single value per landslide event. As a result, the geographic or spatially-distributed estimation of how large a population of landslides may be when considered at the slope scale has been disregarded in statistically-based landslide hazard studies. Conversely, the estimation of the landslide extent has commonly been part of physically-based applications, though their implementation is often limited to very small regions. In this work, we initially present a review of methods developed for landslide hazard assessment since its first conception decades ago. Subsequently, we introduce for the first time a statistically-based model able to estimate the planimetric area of landslides aggregated per slope unit. More specifically, we implemented a Bayesian version of a Generalized Additive Model where the maximum landslide size per slope unit and the sum of all landslide sizes per slope unit are predicted via a Log-Gaussian model. These “max” and “sum” models capture the spatial distribution of (aggregated) landslide sizes. We tested these models on a global dataset expressing the distribution of co-seismic landslides due to 24 earthquakes across the globe. The two models we present are both evaluated on a suite of performance diagnostics that suggest our models suitably predict the aggregated landslide extent per slope unit.
In addition to a complex procedure involving variable selection and a spatial uncertainty estimation, we built our model over slopes where landslides were triggered in response to seismic shaking, and simulated the expected failing surface over slopes where landslides did not occur in the past. What we achieved is the first statistically-based model in the literature able to provide information about the extent of the failed surface across a given landscape. This information is vital in landslide hazard studies and should be combined with the estimation of landslide occurrence locations. This could ensure that governmental and territorial agencies have a complete probabilistic overview of how a population of landslides could behave in response to a specific trigger. The predictive models we present are currently valid only for the 25 cases we tested. Statistically estimating landslide extents is still in its infancy. Many more applications should be successfully validated before considering such models in an operational way. For instance, the validity of our models should still be verified at the regional or catchment scale, and it needs to be tested for different landslide types and triggers. However, we envision that this new spatial predictive paradigm could be a breakthrough in the literature and, in time, could even become part of official landslide risk assessment protocols.

Aerial Swarms: Recent Applications and Challenges (Current Robotics Reports, 2021-07-23) [Article] Purpose of Review: Currently, there is a large body of research on multi-agent systems addressing their different system-theoretic aspects. Aerial swarms, as one type of multi-agent robotic system, have recently gained huge interest due to their potential applications. However, aerial robot groups are complex multi-disciplinary systems, and research works usually focus on specific system aspects for particular applications. The purpose of this review is to provide an overview of the main motivating applications that drive the majority of research works in this field, and to summarize fundamental and common algorithmic components required for their development. Recent Findings: Most system demonstrations of current aerial swarms are based on simulations; some have shown experiments using a few tens of robots in controlled indoor environments, and a limited number of works have reported outdoor experiments with a small number of autonomous aerial vehicles. This indicates scalability issues of current swarm systems in real-world environments, mainly due to the limited confidence in the individual robot’s localization, swarm-level relative localization, and the rate of exchanged information between the robots that is required for planning safe coordinated motions. Summary: This paper summarizes the main motivating aerial swarm applications and the associated research works. In addition, the main research findings on the core elements of any aerial swarm system, state estimation and mission planning, are also presented. Finally, this paper presents a proposed abstraction of an aerial swarm system architecture that can help developers understand the main required modules of such systems.

Atomistic origin of compositional pulling effect in wurtzite (B, Al, In)xGa1−xN: A first-principles study (Journal of Applied Physics, AIP Publishing, 2021-07-21) [Article] Some fluctuations in composition are commonly observed in epitaxial-grown III-V multinary alloys. These fluctuations are attributed to compositional pulling effects, and an insight into their atomistic origin is necessary to improve current epitaxial growth techniques. In addition, the crystallinity of III-V multinary alloys varies widely depending on the constituent atoms. Using first-principles calculations, we then investigated different geometric configurations of the gallium nitride (GaN)-based ternary alloy X0.125Ga0.875N, where X is the minority atom, which is boron (B), aluminum (Al), or indium (In). The minority atoms are represented as two atoms in the simulation cell, and the energetics of five geometric configurations are analyzed to estimate the most stable configuration. For the B0.125Ga0.875N alloy, the most stable configuration is the one where the minority atoms occupy gallium (Ga) sites in a collinear orientation along the c-axis. In contrast, the configurations along the in-plane direction result in a higher energy state. In0.125Ga0.875N and Al0.125Ga0.875N also show the same trend with a small relative energy difference. These preferential sites of minority atoms are consistent with composition pulling effects in wurtzite nitride phases. Moreover, the degree of crystallinity for wurtzite nitride alloys can be well described by the order of calculated relative energy.

Reinfection with different SARS-CoV-2 clade and prolonged viral shedding in a patient with hematopoietic stem cell transplantation: SARS-CoV-2 Reinfection with different clade. (International journal of infectious diseases : IJID : official publication of the International Society for Infectious Diseases, Elsevier BV, 2021-07-21) [Article] Immunocompromised patients who have SARS-CoV-2 infection pose many clinical and public health challenges. We describe a patient with hematopoietic stem cell transplantation and lymphoma with protracted illness requiring 3 consecutive hospital admissions. Whole genome sequencing confirmed two different SARS-CoV-2 clades. Clinical management issues and the unanswered questions arising are discussed.

CANITA: Faster Rates for Distributed Convex Optimization with Communication Compression (arXiv, 2021-07-20) [Preprint] Due to the high communication cost in distributed and federated learning, methods relying on compressed communication are becoming increasingly popular. Besides, the best theoretically and practically performing gradient-type methods invariably rely on some form of acceleration/momentum to reduce the number of communications (faster convergence), e.g., Nesterov's accelerated gradient descent (Nesterov, 2004) and Adam (Kingma and Ba, 2014). In order to combine the benefits of communication compression and convergence acceleration, we propose a compressed and accelerated gradient method for distributed optimization, which we call CANITA. Our CANITA achieves the first accelerated rate $O\bigg(\sqrt{\Big(1+\sqrt{\frac{\omega^3}{n}}\Big)\frac{L}{\epsilon}} + \omega\big(\frac{1}{\epsilon}\big)^{\frac{1}{3}}\bigg)$, which improves upon the state-of-the-art non-accelerated rate $O\left((1+\frac{\omega}{n})\frac{L}{\epsilon} + \frac{\omega^2+n}{\omega+n}\frac{1}{\epsilon}\right)$ of DIANA (Khaled et al., 2020b) for distributed general convex problems, where $\epsilon$ is the target error, $L$ is the smoothness parameter of the objective, $n$ is the number of machines/devices, and $\omega$ is the compression parameter (larger $\omega$ means more compression can be applied, and no compression implies $\omega=0$). Our results show that as long as the number of devices $n$ is large (often true in distributed/federated learning), or the compression $\omega$ is not very high, CANITA achieves the faster convergence rate $O\Big(\sqrt{\frac{L}{\epsilon}}\Big)$, i.e., the number of communication rounds is $O\Big(\sqrt{\frac{L}{\epsilon}}\Big)$ (vs. $O\big(\frac{L}{\epsilon}\big)$ achieved by previous works). As a result, CANITA enjoys the advantages of both compression (compressed communication in each round) and acceleration (far fewer communication rounds).
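To get a feel for the two rates quoted in the CANITA abstract, the sketch below simply evaluates the expressions inside the big-O terms. It is only an illustration: the function names are hypothetical, the hidden constants are dropped, and the numbers are therefore indicative of scaling rather than actual round counts.

```python
import math


def canita_rate(omega: float, n: int, L: float, eps: float) -> float:
    """Expression inside the accelerated O(.) rate from the abstract (constants dropped)."""
    accel_term = math.sqrt((1.0 + math.sqrt(omega**3 / n)) * L / eps)
    compression_term = omega * (1.0 / eps) ** (1.0 / 3.0)
    return accel_term + compression_term


def diana_rate(omega: float, n: int, L: float, eps: float) -> float:
    """Expression inside the non-accelerated O(.) rate quoted for DIANA (constants dropped)."""
    return (1.0 + omega / n) * L / eps + (omega**2 + n) / (omega + n) * (1.0 / eps)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    L, eps = 1.0, 1e-4
    for omega, n in [(0.0, 100), (10.0, 1000), (100.0, 1000)]:
        print(omega, n, canita_rate(omega, n, L, eps), diana_rate(omega, n, L, eps))
```

With no compression (`omega = 0`) the first expression collapses to `sqrt(L / eps)`, which matches the regime the abstract describes for large `n` or mild compression.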

Decision trees based on 1-consequences (Discrete Applied Mathematics, Elsevier BV, 2021-07-20) [Article] In this paper, we study arbitrary infinite binary information systems, each of which consists of an infinite set of elements and an infinite set of two-valued non-constant functions (attributes) defined on the set of elements. We consider the notion of a problem over an information system, which is described by a finite number of attributes: for a given element, we should determine the values of these attributes. As algorithms for problem solving, we study decision trees that use arbitrary attributes from the considered infinite set of attributes and solve the problem based on 1-consequences. In such a tree, we take into account consequences each of which follows from one equation of the kind “attribute = value” obtained during the decision tree work, and ignore consequences that can be derived only from at least two equations. As time complexity, we study the depth of decision trees. We prove that in the worst case, with the growth of the number of attributes in the problem description, the minimum depth of decision trees based on 1-consequences grows either as a logarithm or linearly.

DMIL-IsoFun: predicting isoform function using deep multi-instance learning. (Bioinformatics (Oxford, England), Oxford University Press (OUP), 2021-07-20) [Article] Motivation: Alternative splicing creates considerable proteomic diversity and complexity from a relatively limited genome. Proteoforms translated from alternatively spliced isoforms of a gene actually execute the biological functions of this gene, and so reflect the functional knowledge of genes at a finer granular level. Recently, some computational approaches have been proposed to differentiate isoform functions using sequence and expression data. However, their performance is far from desirable, mainly due to the imbalance and lack of annotations at the isoform level, and the difficulty of modeling gene-isoform relations. Results: We propose a deep multi-instance learning based framework (DMIL-IsoFun) to differentiate the functions of isoforms. DMIL-IsoFun first introduces a multi-instance learning convolutional neural network trained with isoform sequences and gene-level annotations to extract the feature vectors and initialize the annotations of isoforms, and then uses a class-imbalance Graph Convolution Network to refine the annotations of individual isoforms based on the isoform co-expression network and extracted features. Extensive experimental results show that DMIL-IsoFun improves the Smin and Fmax of state-of-the-art solutions by at least 29.6% and 40.8%. The effectiveness of DMIL-IsoFun is further confirmed on a testbed of human multiple-isoform genes and maize isoforms related to photosynthesis. Availability: The code and data are available at http://www.sduidea.cn/codes.php?name=DMILIsofun. Supplementary information: Supplementary data are available at Bioinformatics online.

BICNet: A Bayesian Approach for Estimating Task Effects on Intrinsic Connectivity Networks in fMRI Data (arXiv, 2021-07-19) [Preprint] Intrinsic connectivity networks (ICNs) are specific dynamic functional brain networks that are consistently found under various conditions including rest and task. Studies have shown that some stimuli actually activate intrinsic connectivity through either suppression, excitation, moderation or modification. Nevertheless, the structure of ICNs and task-related effects on ICNs are not yet fully understood. In this paper, we propose a Bayesian Intrinsic Connectivity Network (BICNet) model to identify the ICNs and quantify the task-related effects on the ICN dynamics. Using an extended Bayesian dynamic sparse latent factor model, the proposed BICNet has the following advantages: (1) it simultaneously identifies the individual ICNs and group-level ICN spatial maps; (2) it robustly identifies ICNs by jointly modeling resting-state functional magnetic resonance imaging (rfMRI) and task-related functional magnetic resonance imaging (tfMRI); (3) compared to independent component analysis (ICA)-based methods, it can quantify the difference of ICN amplitudes across different states; (4) it automatically performs feature selection through the sparsity of the ICNs rather than ad-hoc thresholding. The proposed BICNet was applied to the rfMRI and language tfMRI data from the Human Connectome Project (HCP), and the analysis identified several ICNs related to distinct language processing functions.

Fire in paradise: mesoscale simulation of wildfires (ACM Transactions on Graphics, Association for Computing Machinery (ACM), 2021-07-19) [Article] As a result of changing climatic conditions, wildfires have become an existential threat across various countries around the world. The complex dynamics, paired with their often rapid progression, render wildfires an often disastrous natural phenomenon that is difficult to predict and to counteract. In this paper we present a novel method for simulating wildfires with the goal of realistically capturing the combustion process of individual trees and the resulting propagation of fires at the scale of forests. We rely on a state-of-the-art modeling approach for large-scale ecosystems that enables us to represent each plant as a detailed 3D geometric model. We introduce a novel mathematical formulation for the combustion process of plants (also considering effects such as heat transfer, char insulation, and mass loss) as well as for the propagation of fire through the entire ecosystem. Compared to other wildfire simulations, which employ geometric representations of plants such as cones or cylinders, our detailed 3D tree models enable us to simulate the interplay of geometric variations of branching structures and the dynamics of fire and wood combustion. Our simulation runs at interactive rates and thereby provides a convenient way to explore different conditions that affect wildfires, ranging from terrain elevation profiles and ecosystem compositions to various measures against wildfires, such as cutting down trees as firebreaks, the application of fire retardant, or the simulation of rain.

Multivariate Conway-Maxwell-Poisson Distribution: Sarmanov Method and Doubly-Intractable Bayesian Inference (arXiv, 2021-07-15) [Preprint] In this paper, a multivariate count distribution with Conway-Maxwell (COM)-Poisson marginals is proposed. To do this, we develop a modification of the Sarmanov method for constructing multivariate distributions. Our multivariate COM-Poisson (MultCOMP) model has desirable features such as (i) it admits a flexible covariance matrix allowing for both negative and positive non-diagonal entries; (ii) it overcomes the limitation of the existing bivariate COM-Poisson distributions in the literature that do not have COM-Poisson marginals; (iii) it allows for the analysis of multivariate counts and is not just limited to bivariate counts. Inferential challenges are presented by the likelihood specification as it depends on a number of intractable normalizing constants involving the model parameters. These obstacles motivate us to propose a Bayesian inferential approach where the resulting doubly-intractable posterior is dealt with via the exchange algorithm and the Grouped Independence Metropolis-Hastings algorithm. Numerical experiments based on simulations are presented to illustrate the proposed Bayesian approach. We analyze the potential of the MultCOMP model through a real data application on the numbers of goals scored by the home and away teams in the Premier League from 2018 to 2021. Here, our interest is to assess the effect of a lack of crowds during the COVID-19 pandemic on the well-known home team advantage. A MultCOMP model fit shows that there is evidence of a decreased number of goals scored by the home team, not accompanied by a reduced score from the opponent. Hence, our analysis suggests a smaller home team advantage in the absence of crowds, which agrees with the opinion of several football experts.
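The intractable normalizing constants mentioned in the abstract above already appear in the univariate COM-Poisson marginal, whose probability mass function is proportional to λ^x / (x!)^ν. The sketch below evaluates that marginal by truncating the infinite normalizing sum; the truncation, the function name, and the `max_terms` parameter are assumptions made for illustration and are not the exchange-algorithm approach the paper uses.

```python
import math


def com_poisson_pmf(x: int, lam: float, nu: float, max_terms: int = 200) -> float:
    """Univariate COM-Poisson pmf with the normalizing sum truncated at max_terms.

    Assumption: the truncated sum is an adequate stand-in for the infinite
    series, which holds when lam and 1/nu are moderate.
    """
    # Log of each unnormalized weight: j*log(lam) - nu*log(j!)
    log_weights = [j * math.log(lam) - nu * math.lgamma(j + 1) for j in range(max_terms)]
    # Log-sum-exp for numerical stability
    peak = max(log_weights)
    log_z = peak + math.log(sum(math.exp(w - peak) for w in log_weights))
    return math.exp(x * math.log(lam) - nu * math.lgamma(x + 1) - log_z)


if __name__ == "__main__":
    # With nu = 1 the distribution reduces to an ordinary Poisson(lam).
    print(com_poisson_pmf(3, lam=2.5, nu=1.0))
    # nu > 1 gives under-dispersion, nu < 1 over-dispersion relative to Poisson.
    print(com_poisson_pmf(3, lam=2.5, nu=1.5))
```

Setting `nu = 1` recovers the Poisson distribution, which gives a quick sanity check on the truncated sum.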

Voltage Controlled Domain Wall Motion based Neuron and Stochastic Magnetic Tunnel Junction Synapse for Neuromorphic Computing Applications (Institute of Electrical and Electronics Engineers (IEEE), 2021-07-15) [Preprint] The present work proposes a spintronic neuromorphic system with a spin orbit torque driven, domain wall motion-based neuron and synapse. We propose a voltage-controlled magnetic anisotropy domain wall motion based magnetic tunnel junction (MTJ) neuron. We investigate how the electric field at the gate (pinning site), generated by the voltage signals from pre-neurons, modulates the domain wall motion, which is reflected in the non-linear switching behaviour of the neuron magnetization. For the implementation of synaptic weights, we propose a 3-terminal MTJ with stochastic domain wall motion in the free layer. We incorporate intrinsic pinning effects by creating triangular notches on the sides of the free layer. The pinning of the domain wall and the intrinsic thermal noise of the device lead to the stochastic behaviour of domain wall motion. The control of this stochasticity by the spin orbit torque is shown to realize the potentiation and depression of the synaptic weight. The micromagnetics and spin transport studies in the synapse and neuron are carried out by developing a coupled micromagnetic Non-Equilibrium Green’s Function (MuMag-NEGF) model. The minimization of the writing current pulse width by leveraging the thermal noise and demagnetization energy is also presented. Finally, we discuss the implementation of digit recognition by the proposed system using a spike time dependent algorithm.

Bayesian calibration of order and diffusivity parameters in a fractional diffusion equation (Journal of Physics Communications, IOP Publishing, 2021-07-15) [Article] This work focuses on parameter calibration of a variable-diffusivity fractional diffusion model. A random, spatially-varying diffusivity field with log-normal distribution is considered. The variance and correlation length of the diffusivity field are considered uncertain parameters, and the order of the fractional sub-diffusion operator is also taken to be uncertain and uniformly distributed in the range (0,1). A Karhunen-Loève (KL) decomposition of the random diffusivity field is used, leading to a stochastic problem defined in terms of a finite number of canonical random variables. Polynomial chaos (PC) techniques are used to express the dependence of the stochastic solution on these random variables. A non-intrusive methodology is used, and a deterministic finite-difference solver of the fractional diffusion model is utilized for this purpose. The PC surrogates are first used to assess the sensitivity of quantities of interest (QoIs) to uncertain inputs and to examine their statistics. In particular, the analysis indicates that the fractional order has a dominant effect on the variance of the QoIs considered, followed by the leading KL modes. The PC surrogates are further exploited to calibrate the uncertain parameters using a Bayesian methodology. Different setups are considered, including distributed and localized forcing functions and data consisting of either noisy observations of the solution or its first moments. In the broad range of parameters addressed, the analysis shows that the uncertain parameters having a significant impact on the variance of the solution can be reliably inferred, even from limited observations.

Sex ratio at birth in Vietnam among six subnational regions during 1980–2050, estimation and probabilistic projection using a Bayesian hierarchical time series model with 2.9 million birth records (PLOS ONE, Public Library of Science (PLoS), 2021-07-14) [Article] The sex ratio at birth (SRB, i.e., the ratio of male to female births) in Vietnam has been imbalanced since the 2000s. Previous studies have revealed a rapid increase in the SRB over the past 15 years and the presence of important variations across regions. More recent studies suggested that the nation’s SRB may have plateaued during the 2010s. Given the lack of exhaustive birth registration data in Vietnam, it is necessary to estimate and project levels and trends in the regional SRBs in Vietnam based on a reproducible statistical approach. We compiled an extensive database on regional Vietnam SRBs based on all publicly available surveys and censuses and used a Bayesian hierarchical time series mixture model to estimate and project SRB in Vietnam by region from 1980 to 2050. The Bayesian model incorporates the uncertainties from the observations and year-by-year natural fluctuation. It includes a binary parameter to detect the existence of sex ratio transitions among Vietnamese regions. Furthermore, we model the SRB imbalance using a trapezoid function to capture the increase, stagnation, and decrease of the sex ratio transition by Vietnamese regions. The model results show that four out of six Vietnamese regions, namely, Northern Midlands and Mountain Areas, Northern Central and Central Coastal Areas, Red River Delta, and South East, have existing sex imbalances at birth. The rise in SRB in the Red River Delta was the fastest, as it took only 12 years and was more pronounced, with the SRB reaching the local maximum of 1.146 with a 95% credible interval (1.129, 1.163) in 2013.
The model projections suggest that the current decade will record a sustained decline in sex imbalances at birth, and the SRB should be back to the national SRB baseline level of 1.06 in all regions by the mid-2030s.
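The trapezoid-shaped imbalance described in this abstract (increase, stagnation, decrease) can be sketched as a piecewise-linear function added to a baseline. This is only a deterministic illustration, not the Bayesian hierarchical model of the paper: the function and parameter names are hypothetical, the start year assumes the maximum is reached exactly at the end of the 12-year rise, and the plateau and decline lengths are made-up values.

```python
def trapezoid(t: float, start: float, rise: float, plateau: float,
              fall: float, height: float) -> float:
    """Piecewise-linear trapezoid: 0 -> height over `rise` years, flat for
    `plateau` years, back to 0 over `fall` years (all names hypothetical)."""
    if t <= start:
        return 0.0
    if t < start + rise:
        return height * (t - start) / rise
    if t <= start + rise + plateau:
        return height
    end = start + rise + plateau + fall
    if t < end:
        return height * (end - t) / fall
    return 0.0


def srb(t: float, baseline: float = 1.06, **shape: float) -> float:
    """Sex ratio at birth as baseline plus the trapezoid imbalance."""
    return baseline + trapezoid(t, **shape)


if __name__ == "__main__":
    # Baseline 1.06, 12-year rise and peak 1.146 in 2013 come from the abstract;
    # plateau and fall durations below are illustrative assumptions.
    shape = dict(start=2001, rise=12, plateau=5, fall=15, height=0.086)
    for year in (2000, 2007, 2013, 2025, 2040):
        print(year, round(srb(year, **shape), 3))
```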

A Field Guide to Federated Optimization (arXiv, 2021-07-14) [Preprint] Federated learning and analytics are a distributed approach for collaboratively learning models (or statistics) from decentralized data, motivated by and designed for privacy protection. The distributed learning process can be formulated as solving federated optimization problems, which emphasize communication efficiency, data heterogeneity, compatibility with privacy and system requirements, and other constraints that are not primary considerations in other problem settings. This paper provides recommendations and guidelines on formulating, designing, evaluating and analyzing federated optimization algorithms through concrete examples and practical implementation, with a focus on conducting effective simulations to infer real-world performance. The goal of this work is not to survey the current literature, but to inspire researchers and practitioners to design federated learning algorithms that can be used in various practical applications.

Spatial cluster detection with threshold quantile regression (Environmetrics, Wiley, 2021-07-13) [Article] Spatial cluster detection, which is the identification of spatial units adjacent in space associated with distinctive patterns of data of interest relative to background variation, is useful for discerning spatial heterogeneity in regression coefficients. Some real studies with regression-based models on air quality data show that there exists not only spatial heterogeneity but also heteroscedasticity between air pollution and its predictors. Since low air quality is a well-known risk factor for mortality, various cardiopulmonary diseases, and preterm birth, the analysis at the tail would be of more interest than the center of the air pollution distribution. In this article, we develop a spatial cluster detection approach using a threshold quantile regression model to capture the spatial heterogeneity and heteroscedasticity. We introduce two threshold variables in the quantile regression model to define a spatial cluster. The proposed test statistic for identifying the spatial cluster is the supremum of the Wald process over the space of threshold parameters. We establish the limiting distribution of the test statistic under the null hypothesis that the quantile regression coefficient is the same over the entire spatial domain at the given quantile level. The performance of our proposed method is assessed by simulation studies. The proposed method is also applied to analyze the particulate matter (PM2.5) concentration and aerosol optical depth (AOD) data in the Northeastern United States in order to study geographical heterogeneity in the association between AOD and PM2.5 at different quantile levels.

Optimal Decentralized Algorithms for Saddle Point Problems over Time-Varying Networks (arXiv, 2021-07-13) [Preprint] Decentralized optimization methods have been a focus of the optimization community due to their scalability, the increasing popularity of parallel algorithms, and their many applications. In this work, we study saddle point problems of sum type, where the summands are held by separate computational entities connected by a network. The network topology may change from time to time, which models real-world network malfunctions. We obtain lower complexity bounds for algorithms in this setup and develop optimal methods which meet the lower bounds.

Customized Summarizations of Visual Data Collections (Computer Graphics Forum, Wiley, 2021-07-12) [Article] We propose a framework to generate customized summarizations of visual data collections, such as collections of images, materials, 3D shapes, and 3D scenes. We assume that the elements in the visual data collections can be mapped to a set of vectors in a feature space, in which a fitness score for each element can be defined, and we pose the problem of customized summarizations as selecting a subset of these elements. We first describe the design choices a user should be able to specify for modeling customized summarizations and propose a corresponding user interface. We then formulate the problem as a constrained optimization problem with binary variables and propose a practical and fast algorithm based on the alternating direction method of multipliers (ADMM). Our results show that our problem formulation enables a wide variety of customized summarizations, and that our solver is both significantly faster than state-of-the-art commercial integer programming solvers and produces better solutions than fast relaxation-based solvers.

DES-Tcell is a knowledgebase for exploring immunology-related literature (Scientific Reports, Springer Science and Business Media LLC, 2021-07-12) [Article] T-cells are a subtype of white blood cells circulating throughout the body, searching for infected and abnormal cells. They have multifaceted functions that include scanning for and directly killing cells infected with intracellular pathogens, eradicating abnormal cells, orchestrating immune response by activating and helping other immune cells, memorizing encountered pathogens, and providing long-lasting protection upon recurrent infections. However, T-cells are also involved in immune responses that result in organ transplant rejection, autoimmune diseases, and some allergic diseases. To support T-cell research, we developed the DES-Tcell knowledgebase (KB). This KB incorporates text- and data-mined information that can expedite retrieval and exploration of T-cell relevant information from the large volume of published T-cell-related research. This KB enables exploration of data through concepts from 15 topic-specific dictionaries, including immunology-related genes, mutations, pathogens, and pathways. We developed three case studies using DES-Tcell, one of which validates effective retrieval of known associations by DES-Tcell. The second and third case studies focus on concepts that are common to Graves’ disease (GD) and Hashimoto’s thyroiditis (HT). Several reports have shown that up to 20% of GD patients treated with antithyroid medication develop HT, thus suggesting a possible conversion or shift from GD to HT disease. DES-Tcell found that miR-4442 links to both GD and HT, and that miR-4442 possibly targets the autoimmune disease risk factor CD6, which provides potential new knowledge derived through the use of DES-Tcell. To our knowledge, DES-Tcell is the first KB dedicated to exploring T-cell-relevant information via literature-mining, data-mining, and topic-specific dictionaries.