Computer Science Program

For more information visit: https://cemse.kaust.edu.sa/cs

Recent Submissions

Now showing 1 - 5 of 2829
  • Article

    Editorial: Clinical risk assessment and intervention of gastrointestinal tumors driven by big-data

    (Frontiers Media SA, 2024-02-27) Zhang, Nan; Wang, Wei; Gao, Xin; Gao, Feng; Computer Science Program; Computational Bioscience Research Center (CBRC); Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division; Department of Gastroenterology, The First Hospital of Jilin University, Changchun, China; Department of Pathology, First Affiliated Hospital of Anhui Medical University, Hefei, China; Department of General Surgery, Department of Colorectal Surgery, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China; Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China; Biomedical Innovation Center, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China; Shanghai Artificial Intelligence Laboratory, Shanghai, China

    The molecular mechanisms underlying tumor development are highly intricate. For gastrointestinal tumors, comprehending and controlling the heterogeneous molecular mechanisms may unveil novel therapeutic targets (1, 2). In recent years, the cancer field, including gastrointestinal tumors, has generated an abundance of molecular and phenotypic data. Facilitated by high-throughput technologies, a vast amount of genomic data has rapidly accumulated, giving rise to the era of cancer “big data.” These cancer datasets continue to expand globally, fueled by investments in the research community (3). The utilization and development of extensive databases play a pivotal role in establishing a sturdy framework for deciphering the molecular mechanisms involved in tumor initiation and progression, while concurrently exploring innovative approaches to diagnosis and treatment. In this context, bioinformatics makes precise assessment of clinical risk possible, and multi-omics data help identify suitable biomarkers, enabling personalized and precise diagnosis and treatment for patients with gastrointestinal tumors. The present collection compiles some of the outcomes in these areas, offering new insights and perspectives for the diagnosis, prognosis, and individualized treatment of gastrointestinal tumors.

  • Preprint

    Streamlining in the Riemannian Realm: Efficient Riemannian Optimization with Loopless Variance Reduction

    (arXiv, 2024-03-11) Demidovich, Yury; Malinovsky, Grigory; Richtarik, Peter; King Abdullah University of Science and Technology (KAUST) Thuwal, Saudi Arabia; Computer Science Program; Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division; Visual Computing Center (VCC)

    In this study, we investigate stochastic optimization on Riemannian manifolds, focusing on the crucial variance reduction mechanism used in both Euclidean and Riemannian settings. Riemannian variance-reduced methods usually involve a double-loop structure, computing a full gradient at the start of each outer loop. Determining the optimal inner loop length is challenging in practice, as it depends on strong convexity or smoothness constants, which are often unknown or hard to estimate. Motivated by Euclidean methods, we introduce the Riemannian Loopless SVRG (R-LSVRG) and Riemannian PAGE (R-PAGE) methods. These methods replace the outer loop with a probabilistic full-gradient computation triggered by a coin flip in each iteration, which yields simpler proofs, efficient hyperparameter selection, and sharp convergence guarantees. Using R-PAGE as a framework for non-convex Riemannian optimization, we demonstrate its applicability to various important settings. For example, we derive Riemannian MARINA (R-MARINA) for distributed settings with communication compression, providing the best theoretical communication complexity guarantees for non-convex distributed optimization over Riemannian manifolds. Experimental results support our theoretical findings.
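    The coin-flip mechanism can be illustrated with a small sketch. It rests on assumptions not taken from the paper: a toy problem (the leading eigenvector of A = (1/n) Σ a_i a_iᵀ on the unit sphere), a retraction by renormalization, and vector transport by orthogonal projection onto the tangent space. The helper names (`r_lsvrg`, `project`, `retract`) are hypothetical, and the paper's general update rule may differ. The point to notice is that no outer loop appears: with probability `p` the reference point `w` and its full gradient are refreshed.

```python
import math
import random

# Toy problem (an assumption for illustration, not from the paper): find the
# leading eigenvector of A = (1/n) * sum_i a_i a_i^T on the unit sphere by
# minimizing f(x) = (1/n) * sum_i f_i(x), with f_i(x) = -0.5 * (a_i . x)^2.


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def axpy(alpha, u, v):
    """Return alpha * u + v."""
    return [alpha * p + q for p, q in zip(u, v)]


def project(x, g):
    """Orthogonal projection of g onto the tangent space of the unit sphere at x."""
    return axpy(-dot(x, g), x, g)


def retract(x, step):
    """Retraction: take the step in the ambient space, then renormalize."""
    y = axpy(1.0, step, x)
    norm = math.sqrt(dot(y, y))
    return [c / norm for c in y]


def egrad_i(a, x):
    """Euclidean gradient of f_i(x) = -0.5 * (a . x)^2."""
    return [-dot(a, x) * c for c in a]


def full_rgrad(data, x):
    """Riemannian gradient of the full objective f at x."""
    g = [0.0] * len(x)
    for a in data:
        g = axpy(1.0 / len(data), egrad_i(a, x), g)
    return project(x, g)


def r_lsvrg(data, x0, step=0.01, p=None, iters=3000, seed=0):
    """Loopless SVRG-style iteration on the sphere (sketch)."""
    rng = random.Random(seed)
    n = len(data)
    p = 1.0 / n if p is None else p
    x = list(x0)
    w, gw = list(x), full_rgrad(data, x)  # reference point and its full gradient
    for _ in range(iters):
        a = data[rng.randrange(n)]
        # rgrad f_i(x) - rgrad f_i(w) + rgrad f(w); the w-terms are moved to the
        # tangent space at x by projection (the assumed vector transport).
        correction = axpy(-1.0, project(w, egrad_i(a, w)), gw)
        v = project(x, axpy(1.0, egrad_i(a, x), correction))
        x_next = retract(x, [-step * c for c in v])
        # Coin flip replaces the outer loop: refresh w with probability p.
        if rng.random() < p:
            w, gw = list(x), full_rgrad(data, x)
        x = x_next
    return x


if __name__ == "__main__":
    data = [[3.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
    # Here A = diag(2.5, 0.25, 0.25), so x should end up close to +/-(1, 0, 0).
    x = r_lsvrg(data, [1 / math.sqrt(3)] * 3)
    print([round(c, 3) for c in x])
```

    Because the estimator is exact whenever `x == w`, its variance shrinks as the iterates settle, while the expected refresh period 1/p plays the role that the inner loop length plays in double-loop SVRG.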

  • Preprint

    Task-Oriented GNNs Training on Large Knowledge Graphs for Accurate and Efficient Modeling

    (arXiv, 2024-03-09) Abdallah, Hussein; Afandi, Waleed; Kalnis, Panos; Mansour, Essam; Computer Science Program; Extreme Computing Research Center; Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division; Concordia University

    A Knowledge Graph (KG) is a heterogeneous graph encompassing a diverse range of node and edge types. Heterogeneous Graph Neural Networks (HGNNs) are popular for machine learning tasks such as node classification and link prediction on KGs. However, HGNN methods exhibit excessive complexity that depends on the KG's size, density, and number of node and edge types. AI practitioners therefore handcraft a subgraph of a KG G relevant to a specific task. We refer to this subgraph as a task-oriented subgraph (TOSG), which contains a subset of task-related node and edge types in G. Training the task using the TOSG instead of G alleviates the excessive computation required for a large KG. Crafting the TOSG demands a deep understanding of the KG's structure and the task's objectives; hence, it is challenging and time-consuming. This paper proposes KG-TOSA, an approach to automate TOSG extraction for task-oriented HGNN training on a large KG. In KG-TOSA, we define a generic graph pattern that captures the KG's local and global structure relevant to a specific task. We explore different techniques to extract subgraphs matching our graph pattern, namely (i) two techniques sampling around targeted nodes using biased random walks or influence scores, and (ii) a SPARQL-based extraction method leveraging RDF engines' built-in indices. Because it relies on these indices, the SPARQL-based method achieves negligible preprocessing overhead compared to the sampling techniques. We develop a benchmark of large real KGs and various tasks for node classification and link prediction. Our experiments show that KG-TOSA helps state-of-the-art HGNN methods reduce training time and memory usage by up to 70% while improving model performance, e.g., accuracy and inference time.
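    The first family of extraction techniques samples around target nodes with biased random walks. A minimal sketch of that general idea follows, under assumptions the abstract does not state: the KG is a list of (subject, predicate, object) triples, walks ignore edge direction, and the bias is a hypothetical per-predicate weight `pred_weight`. The function names and the example triples are illustrative only; KG-TOSA's actual graph pattern, bias, and influence-score variant are not reproduced here.

```python
import random
from collections import defaultdict


def build_adjacency(triples):
    """Index every (subject, predicate, object) triple under both endpoints."""
    adj = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((s, p, o))
        adj[o].append((s, p, o))
    return adj


def biased_walk_subgraph(triples, targets, num_walks=10, walk_len=3,
                         pred_weight=None, seed=0):
    """Collect the triples traversed by biased random walks from target nodes.

    pred_weight is a hypothetical bias: predicate -> positive weight
    (predicates not listed get weight 1.0). Edge direction is ignored.
    """
    rng = random.Random(seed)
    adj = build_adjacency(triples)
    weights = pred_weight or {}
    kept = set()
    for start in targets:
        for _ in range(num_walks):
            node = start
            for _ in range(walk_len):
                edges = adj.get(node)
                if not edges:
                    break  # dead end: node has no incident triples
                edge_weights = [weights.get(p, 1.0) for _, p, _ in edges]
                edge = rng.choices(edges, weights=edge_weights)[0]
                kept.add(edge)
                s, _, o = edge
                node = o if node == s else s  # step to the other endpoint
    return kept


if __name__ == "__main__":
    kg = [
        ("paper1", "cites", "paper2"),
        ("paper1", "author", "alice"),
        ("alice", "affiliation", "kaust"),
        ("paper2", "venue", "icml"),
        ("kaust", "country", "sa"),
    ]
    tosg = biased_walk_subgraph(kg, ["paper1"], pred_weight={"cites": 5.0})
    for triple in sorted(tosg):
        print(triple)
```

    The walk length bounds how far from the targets the sample reaches, and the predicate weights steer it toward edge types assumed to matter for the task; the returned set of triples is a candidate TOSG that an HGNN could then be trained on.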

  • Preprint

    LoCoDL: Communication-Efficient Distributed Learning with Local Training and Compression

    (arXiv, 2024-03-07) Condat, Laurent Pierre; Maranjyan, Artavazd; Richtarik, Peter; SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence (SDAIA-KAUST AI); AI Initiatives; Computer Science Program; Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division; Visual Computing Center (VCC)

    In Distributed optimization and Learning, and even more so in the modern framework of federated learning, communication, which is slow and costly, is a critical bottleneck. We introduce LoCoDL, a communication-efficient algorithm that leverages two popular and effective techniques: Local training, which reduces the communication frequency, and Compression, in which short bitstreams are sent instead of full-dimensional vectors of floats. LoCoDL works with a large class of unbiased compressors that includes widely used sparsification and quantization methods. LoCoDL provably benefits from local training and compression and enjoys a doubly accelerated communication complexity, with respect to the condition number of the functions and the model dimension, in the general heterogeneous regime with strongly convex functions. This is confirmed in practice, with LoCoDL outperforming existing algorithms.
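    As an example of the unbiased compressors mentioned above, rand-k sparsification keeps k of the d coordinates chosen uniformly at random and scales them by d/k, so that its expectation equals the input vector. The sketch below shows only this compressor (the names `rand_k`, `sparsify`, and `exact_mean` are hypothetical); LoCoDL's local training steps and update rules are not shown. `exact_mean` checks unbiasedness exactly by averaging over every k-subset.

```python
import itertools
import random


def sparsify(x, idx, k):
    """Keep the coordinates in idx, scaled by d/k, and zero out the rest."""
    scale = len(x) / k
    keep = set(idx)
    return [scale * v if i in keep else 0.0 for i, v in enumerate(x)]


def rand_k(x, k, rng):
    """Unbiased rand-k sparsifier: k of the d coordinates, chosen uniformly."""
    return sparsify(x, rng.sample(range(len(x)), k), k)


def exact_mean(x, k):
    """E[rand_k(x)], computed exactly by averaging over all k-subsets."""
    subsets = list(itertools.combinations(range(len(x)), k))
    total = [0.0] * len(x)
    for idx in subsets:
        total = [t + c for t, c in zip(total, sparsify(x, idx, k))]
    return [t / len(subsets) for t in total]


if __name__ == "__main__":
    x = [1.0, -2.0, 3.0, 0.5, 4.0]
    print(rand_k(x, 2, random.Random(0)))  # at most 2 nonzero entries to send
    print(exact_mean(x, 2))                # equals x: the compressor is unbiased
```

    Each coordinate is selected in a fraction k/d of the subsets, and the d/k scaling cancels that factor. In practice only the k (index, value) pairs need to be transmitted instead of d floats, at the cost of a variance that grows as k shrinks.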

  • Preprint

    Complexity of Deterministic and Strongly Nondeterministic Decision Trees for Decision Tables from Closed Classes

    (Elsevier BV, 2023-10-17) Ostonov, Azimkhon; Moshkov, Mikhail; King Abdullah University of Science and Technology; Computer Science Program; Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division; Applied Mathematics and Computational Science Program; Computational Bioscience Research Center (CBRC)

    In this paper, we consider classes of decision tables with 0-1-decisions closed relative to removal of attributes (columns) and changing decisions assigned to rows. For tables from an arbitrary closed class, we study the dependence of the minimum complexity of deterministic decision trees on various parameters of the tables: the minimum complexity of a test, the complexity of the set of attributes attached to columns, and the minimum complexity of a strongly nondeterministic decision tree. We also study the dependence of the minimum complexity of strongly nondeterministic decision trees on the complexity of the set of attributes attached to columns. Note that a strongly nondeterministic decision tree can be interpreted as a set of true decision rules that cover all rows labeled with the decision 1.
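    To make two of these parameters concrete, the brute-force sketch below computes, for a tiny table, the minimum size of a test (a set of columns that separates every pair of rows with different decisions) and the minimum depth of a deterministic decision tree. It assumes unit cost per attribute, so complexity reduces to the number of attributes or the depth; the paper studies more general complexity measures over closed classes, and the helper names are hypothetical. Strongly nondeterministic trees are not covered.

```python
from functools import lru_cache
from itertools import combinations

# A table is a list of rows (attribute_values, decision), e.g. ((0, 1, 0), 1).
# Assumed: unit cost per attribute and a consistent table (no two equal rows
# with different decisions).


def min_test_size(rows):
    """Fewest columns that separate every pair of rows with different decisions."""
    n_cols = len(rows[0][0])
    pairs = [(a, b) for (a, da), (b, db) in combinations(rows, 2) if da != db]
    for size in range(n_cols + 1):
        for cols in combinations(range(n_cols), size):
            if all(any(a[c] != b[c] for c in cols) for a, b in pairs):
                return size
    raise ValueError("inconsistent table")


def min_tree_depth(rows):
    """Minimum depth of a deterministic decision tree, by exhaustive search."""
    n_cols = len(rows[0][0])

    @lru_cache(maxsize=None)
    def depth(sub):
        if len({d for _, d in sub}) <= 1:
            return 0  # leaf: all remaining rows share one decision
        best = None
        for c in range(n_cols):
            groups = {}
            for row in sub:
                groups.setdefault(row[0][c], []).append(row)
            if len(groups) < 2:
                continue  # column c cannot split this subtable
            cost = 1 + max(depth(tuple(g)) for g in groups.values())
            best = cost if best is None else min(best, cost)
        if best is None:
            raise ValueError("inconsistent table")
        return best

    return depth(tuple(rows))


if __name__ == "__main__":
    # Decision = XOR of columns 0 and 1; column 2 duplicates column 0.
    table = [((0, 0, 0), 0), ((0, 1, 0), 1), ((1, 0, 1), 1), ((1, 1, 1), 0)]
    print(min_test_size(table), min_tree_depth(table))  # 2 2
```

    Querying the attributes of any test one after another already yields a deterministic tree, so under this unit-cost assumption the minimum depth never exceeds the minimum test size; the paper's question is how much smaller it can be for tables from a given closed class.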