Data & Storage – Center for Computational Sciences

Research at the University of Kentucky often involves large datasets that must be stored, processed, and shared across computing systems. CCS provides several storage platforms designed to support computational workflows, collaborative research, and long-term data retention.

These systems include high-performance parallel file systems for active computation, scalable object storage for large datasets, and network-attached storage for persistent research data. This page provides an overview of these storage models and how they are used within CCS environments.

What is Research Storage?

Research storage systems provide the infrastructure needed to manage large datasets generated by modern research workflows.

These systems support activities such as:

storing simulation and analysis outputs
managing large datasets used in computational workflows
sharing research data within research groups or collaborations
retaining datasets beyond active computation

Different storage platforms are designed to support different access patterns, performance needs, and data management goals.

When should I use Research Storage?

Research storage services may be appropriate if:

your research generates datasets too large for local storage
you need shared storage accessible across compute systems
your workflows require high-throughput access during computation
you need a scalable system for storing or sharing large datasets
you need persistent storage for research data that must remain accessible over time

If you are unsure which storage platform is appropriate for your workflow, CCS can help evaluate your data management needs.

Storage Systems

The University of Kentucky’s research storage infrastructure includes multiple storage platforms, each designed for a different class of data workflow. Together, these systems support high-performance computation, collaborative research environments, and long-term dataset retention.

Parallel File Systems (GPFS)

GPFS is optimized for active computational workloads rather than long-term data retention.

Parallel file systems provide high-performance storage designed for active computation on HPC systems. CCS operates GPFS-based storage environments connected to the University’s research computing clusters, enabling fast access to data during simulations, analysis workflows, and other compute-intensive tasks. Each cluster maintains its own GPFS environment, so storage performance can be optimized for the workloads running on that system.

Typical uses include:

active research data used during computation
temporary workspace for simulations and analysis pipelines
shared project directories used by research groups

Storage Quotas/Limits: LCC | MCC | ECC
How to check disk usage
Filesystem Basics
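
As a generic illustration of checking how much space a project directory occupies, the sketch below sums file sizes under a directory tree using only the Python standard library. It is not the CCS quota tool, and the function names are hypothetical. It reports apparent file sizes, which may differ from what a filesystem quota counts. Walking a very large tree generates many metadata operations, so the cluster's own quota commands are likely the better choice for routine checks.

```python
import os
import stat


def directory_usage_bytes(root: str) -> int:
    """Return the total apparent size, in bytes, of regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat does not follow symlinks, so linked data is not counted twice.
                info = os.lstat(path)
            except OSError:
                # Files can disappear or be unreadable while jobs are running; skip them.
                continue
            if stat.S_ISREG(info.st_mode):
                total += info.st_size
    return total


def format_size(num_bytes: int) -> str:
    """Format a byte count with binary units, for example 1536 -> '1.5 KiB'."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024
```

Calling format_size(directory_usage_bytes(path)) on a project directory gives a human-readable total for that tree.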

Object Storage (Ceph)

Object storage is not intended to replace high-performance filesystems used by compute workloads.

Object storage provides scalable storage designed for large datasets and for applications that access data programmatically or through services. CCS operates object storage infrastructure based on Ceph, which provides S3-compatible storage services for research workflows that need scalable capacity or object-based interfaces.

Typical uses include:

storing large research datasets
enabling data services or programmatic access to research data
supporting applications designed to interact with object storage systems
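
Because the service is S3-compatible, programmatic access can be sketched with a general-purpose S3 client. The example below assumes the third-party boto3 library. The endpoint URL, bucket name, environment variable names, and helper functions are all placeholders for illustration, not CCS-provided values, so the actual endpoint and credentials would need to come from CCS.

```python
import os
from pathlib import PurePosixPath


def object_key(project: str, relative_path: str) -> str:
    """Build an object key such as 'myproject/run01/output.h5'."""
    return str(PurePosixPath(project) / relative_path.lstrip("/"))


def make_client(endpoint_url: str):
    """Create an S3 client pointed at an S3-compatible endpoint."""
    # Imported here so object_key() stays usable without boto3 installed.
    import boto3

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        # Hypothetical environment variable names; use whatever your site issues.
        aws_access_key_id=os.environ["S3_ACCESS_KEY"],
        aws_secret_access_key=os.environ["S3_SECRET_KEY"],
    )


def upload_and_list(client, bucket: str, project: str,
                    local_path: str, relative_path: str) -> list:
    """Upload one file, then return the keys stored under the project prefix."""
    key = object_key(project, relative_path)
    client.upload_file(local_path, bucket, key)
    # list_objects_v2 returns at most 1000 keys per call; larger prefixes need pagination.
    response = client.list_objects_v2(Bucket=bucket, Prefix=project + "/")
    return [item["Key"] for item in response.get("Contents", [])]
```

A typical call would pass make_client("https://s3.example.edu") as the client, where the URL is a stand-in for the real endpoint.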

Network-Attached Storage (NAS)

NAS systems prioritize accessibility and reliability rather than high-performance parallel I/O.

Network-attached storage (NAS) provides persistent storage for research datasets that are not actively being processed by HPC systems. NAS environments are deployed through condo storage purchases, allowing research groups to obtain dedicated storage capacity for long-term project data.

CCS periodically coordinates campus storage procurements to allow research groups to participate in shared purchases.

Typical uses include:

retaining datasets after analysis workflows complete
maintaining persistent project data repositories
staging data before or after computational workflows

NAS Storage Documentation

Moving and Sharing Data

Research computing workflows often require moving large datasets between systems, institutions, and collaborators.

CCS provides Globus endpoints on the data transfer nodes (DTNs) associated with each cluster. These endpoints allow researchers to efficiently transfer data between campus systems and external research infrastructure.

Through the institutional Globus service, CCS can also create Globus Guest Collections, allowing researchers to securely share datasets with collaborators without requiring direct system access.

In addition to campus storage infrastructure, CCS works with national research data services such as OURRstore to support long-term archival storage of large research datasets. These services provide durable tape-based storage designed for retaining data beyond the active phases of computational research.

Data Transfer Node Documentation
Globus User Documentation
OURRstore User Documentation

Data Protection Notice

CCS storage platforms are designed to support research workflows but are not intended to serve as the sole copy of important research data.

Researchers are responsible for maintaining appropriate redundant or off-site copies of critical datasets.
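
One way to confirm that a redundant copy actually matches the original is to compare checksums. The sketch below uses only the Python standard library. The function names are hypothetical, and SHA-256 is an assumed choice rather than a CCS requirement.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root) -> dict:
    """Map each file's path relative to root onto its SHA-256 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_copies(original, copy) -> list:
    """Return relative paths that are missing from the copy or differ in content."""
    source = build_manifest(original)
    replica = build_manifest(copy)
    return sorted(
        name for name, digest in source.items() if replica.get(name) != digest
    )
```

An empty list from compare_copies(original_dir, backup_dir) indicates that every file in the original has an identical counterpart in the copy. Files that exist only in the copy are not reported.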
