The mission of the University of Kentucky (UK) Center for Computational Sciences (CCS) is to enable and enhance the success of our faculty and staff in their pursuit of computational research and education by providing access to leading computational resources and the necessary support services to utilize them effectively. Research Computing and Data (RCD) technologies encompassing high-performance computing, high-throughput computing, computational workflows, AI computing, and Big Data collection, storage, and processing are increasingly important as a critical means to expand our knowledge about the world in every area of human endeavor.

To bring the CCS research computing ecosystem closer to the research community, to educate researchers about its capabilities and its ever-expanding range of applications, and to promote and nurture its role in advancing arts, humanities, science and engineering at the University of Kentucky and beyond, CCS will offer a new Research Computing and Data Seminar Series.

The CCS RCD series will begin in the Spring 2024 semester and consist of eight meetings spaced throughout the semester. The series will include presentations providing an overview of the CCS and its role and resources, practicums and tutorials to help computational researchers become effective users of recently deployed computing and data resources, as well as talks by researchers from a variety of disciplines discussing their computational research, with a particular emphasis on RCD “know-how” they find critically important in their work. We encourage everyone interested in using RCD to support their research or educational work to attend.


The meetings will be held in the theater located in the Davis Marksbury Building, home to the Computer Science Department. They will start at 3:00 pm with refreshments; the presentations will follow, starting at 3:30 pm.

Presentations given by non-CCS staff are scheduled for 25 minutes each, followed by about 10 minutes of discussion. Speakers are expected to present in person.


Zoom access information:

https://uky.zoom.us/s/84474671604

The next seminar:

April 16, 2024
3:00 pm – Refreshments
3:30 pm – Presentation

Speakers:
(1) Chad Risko, Department of Chemistry, University of Kentucky

(2) Hunter Moseley, Department of Molecular and Cellular Biochemistry, University of Kentucky

Titles:
(1) Towards Machine-driven Discovery of Organic Materials

(2) A cautionary tale about properly vetting datasets used in supervised learning predicting metabolic pathway involvement

Abstracts:
(1) There is significant interest in the development of organic materials for applications that span new generations of electronic, optical, and energy generation and storage technologies. The chemical space to be explored for these materials, however, is tremendously large, and at the same time it can often be difficult to derive clear chemical building block-to-material structure–property relationships. As these hurdles have served as significant impediments to the commercial adoption of organic materials in these areas, there is growing interest in using computers and automation to aid in materials design and discovery. Here we will discuss recent advances in the development and use of high-throughput computational protocols, data infrastructures, and machine learning (ML) approaches that offer the potential to explore the wide and varied chemistries of organic materials.

(2) The mapping of metabolite-specific data to pathways within cellular metabolism is a major data analysis step needed for biochemical interpretation. A variety of machine learning approaches, particularly deep learning approaches, have been used to predict these metabolite-to-pathway mappings, utilizing a training dataset of known metabolite-to-pathway mappings. A few such training datasets have been derived from the Kyoto Encyclopedia of Genes and Genomes (KEGG). However, several previously published machine learning approaches utilized an erroneous KEGG-derived training dataset that used SMILES molecular representation strings (KEGG-SMILES dataset) and contained a sizable proportion (∼26%) of duplicate entries. The presence of so many duplicates taints the training and testing sets generated from k-fold cross-validation of the KEGG-SMILES dataset. Therefore, the k-fold cross-validation performance of the resulting machine learning models was grossly inflated by the erroneous presence of these duplicate entries. Here we describe and evaluate the KEGG-SMILES dataset so that others may avoid using it. We also identify the prior publications that utilized this erroneous KEGG-SMILES dataset so their machine learning results can be properly and critically evaluated. In addition, we demonstrate the reduction of model k-fold cross-validation performance after de-duplicating the KEGG-SMILES dataset. This is a cautionary tale about properly vetting previously published benchmark datasets before using them in machine learning approaches. We also present a new benchmark dataset for training and predicting metabolic pathway involvement as well as a new dataset and model design with superior performance. In addition, we present a new tool for troubleshooting and optimizing GPU-utilizing methods within a high-performance computing environment.
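
For readers unfamiliar with how duplicate entries can taint k-fold cross-validation, the following is a minimal Python sketch of the general idea, not the speakers' method or tooling. The toy entries, pathway labels, and function names are hypothetical, and the sketch assumes duplicates are exact repeats of a (SMILES string, label) pair; real datasets may need a stricter notion of equivalence.

```python
def k_fold_indices(n_items, k):
    """Yield (train_idx, test_idx) pairs; item i is assigned to fold i % k."""
    for fold in range(k):
        test = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, test


def leaked_test_entries(entries, k):
    """Count test entries that also appear verbatim in their own training split."""
    leaked = 0
    for train, test in k_fold_indices(len(entries), k):
        train_set = {entries[i] for i in train}
        leaked += sum(1 for i in test if entries[i] in train_set)
    return leaked


def deduplicate(entries):
    """Drop exact duplicates while preserving first-seen order."""
    return list(dict.fromkeys(entries))


# Hypothetical toy data: (SMILES string, pathway label) pairs.
toy_entries = [
    ("CCO", "pathway_A"),
    ("CC(=O)O", "pathway_B"),
    ("CCO", "pathway_A"),          # exact duplicate of the first entry
    ("C1=CC=CC=C1", "pathway_C"),
    ("CC(=O)O", "pathway_B"),      # exact duplicate of the second entry
    ("CCN", "pathway_A"),
]

leaks_before = leaked_test_entries(toy_entries, k=3)
leaks_after = leaked_test_entries(deduplicate(toy_entries), k=3)
print("leaked test entries before de-duplication:", leaks_before)
print("leaked test entries after de-duplication:", leaks_after)
```

Any test entry that also sits in the training split can be answered from memory rather than by generalization, which is why cross-validation scores measured on a dataset with duplicates can overstate model performance.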

The CCS RCD Seminar Series schedule for Spring 2024:

January 30, 2024
Jim Griffioen, CCS and Computer Science Department, University of Kentucky
Slides

February 13, 2024
(1) Ted Kalbfleisch, Department of Veterinary Sciences, University of Kentucky
Slides

(2) Christopher Crawford, Department of Physics, University of Kentucky
Slides

February 27, 2024
(1) Tyler Burkett, UK ITSRCD, University of Kentucky
Slides

(2) Helene Gold & Isaac Wink, UK Libraries, University of Kentucky
Slides

March 19, 2024
(1) Chang-Guo Zhan, Department of Pharmaceutical Sciences, University of Kentucky
Slides

(2) Nicolas Teets, Department of Entomology, University of Kentucky
Slides

April 02, 2024 *** POSTPONED to April 30, 2024 ***
(1) Barry Farmer, CCS, University of Kentucky
(2) Satrio Husodo, ITS RCI, University of Kentucky
(3) Vikram Gazula, CCS, University of Kentucky

April 16, 2024
(1) Chad Risko, Department of Chemistry, University of Kentucky
(2) Hunter Moseley, Department of Molecular and Cellular Biochemistry, University of Kentucky

April 30, 2024
(1) Barry Farmer, CCS, University of Kentucky
(2) Satrio Husodo, ITS RCI, University of Kentucky
(3) Vikram Gazula, CCS, University of Kentucky

May 07, 2024
(1) Isaac Shlosman, Department of Physics, University of Kentucky