Cluster Data Movement and Management

The SPDS group is working on several mechanisms to move data between and within clusters. These include parallel file systems and adaptive caching, parallel communications, and scalable file transfer.

Cluster-to-Cluster (C2C) Communications

This project aimed to develop an approach that lets parallel programs communicate directly with other parallel programs without intervening steps, such as serializing data or writing and then reading a file. Our initial prototype performed very well and allowed the nodes of one parallel program to send data directly to the nodes of another parallel program. The mapping problem, which determines what data must be sent to or from which node, is very complex, so the prototype solved it for block mappings only, but it did handle ghost rows/planes nicely. Please see the following papers for more details.

S. Fide and S. Jenks, “Cluster-to-Cluster Communications: A Consistent Communication Paradigm for Distributed Clusters,” in Proceedings of Parallel and Distributed Computing and Systems Conference, Cambridge, MA, 2004.

S. Fide and S. F. Jenks, “A middleware approach for pipelining communications in clusters,” Cluster Computing, to appear.
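
To make the mapping problem described above concrete, here is a minimal sketch for the one-dimensional block-mapping case with ghost rows. It is an illustration only, not the prototype's code: the function names block_range and transfer_plan, the row-wise decomposition, and the ghost parameter are all assumptions made for this example.

```python
def block_range(n_rows, n_parts, rank):
    """Half-open row range [start, stop) owned by `rank` under a block mapping.

    Assumption: rows are split as evenly as possible, with the first
    `n_rows % n_parts` ranks holding one extra row.
    """
    base, extra = divmod(n_rows, n_parts)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def transfer_plan(n_rows, n_src, n_dst, ghost=0):
    """Return (src_rank, dst_rank, start, stop) row ranges to send.

    Each destination node needs its own block plus `ghost` rows on each
    side, clipped to the array bounds. A source node sends the overlap
    between the rows it owns and the rows the destination needs.
    """
    plan = []
    for dst in range(n_dst):
        d_start, d_stop = block_range(n_rows, n_dst, dst)
        need_start = max(0, d_start - ghost)
        need_stop = min(n_rows, d_stop + ghost)
        for src in range(n_src):
            s_start, s_stop = block_range(n_rows, n_src, src)
            lo, hi = max(s_start, need_start), min(s_stop, need_stop)
            if lo < hi:  # non-empty overlap means a message is required
                plan.append((src, dst, lo, hi))
    return plan
```

For example, with 12 rows on a 3-node source program and a 2-node destination program, transfer_plan(12, 3, 2, ghost=1) yields four messages: (0, 0, 0, 4), (1, 0, 4, 7), (1, 1, 5, 8), and (2, 1, 8, 12). Source node 1 sends to both destination nodes, and rows 5 and 6 travel twice because each is a ghost row for one destination and an owned row for the other.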

Parallel File Systems

Parallel file systems store data on multiple computers and can respond to many simultaneous requests with high aggregate throughput. They are therefore ideal for clusters with disks in the nodes and a high-performance network connecting the nodes. Several popular and freely available parallel file systems exist, but these tend to be fairly intolerant of faults and configuration changes. The goal of our project is to develop an adaptive parallel file system that can recover from disk and node failures, can transparently add nodes, and will exploit and adapt to locality of reference. As this work is still in a very early stage, more details will have to wait.
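
As general background on how a parallel file system can spread one file over several nodes, the sketch below shows generic round-robin striping. It is not the design of the adaptive file system described above, and the names locate and split_request, the stripe size, and the round-robin layout are assumptions chosen for illustration.

```python
def locate(offset, stripe_size, n_nodes):
    """Map a file byte offset to (node, offset within that node's local store).

    Assumption: stripes of `stripe_size` bytes are dealt out to nodes in
    round-robin order, and each node stores its stripes back to back.
    """
    stripe = offset // stripe_size
    node = stripe % n_nodes
    local = (stripe // n_nodes) * stripe_size + offset % stripe_size
    return node, local


def split_request(offset, length, stripe_size, n_nodes):
    """Break a read of [offset, offset + length) into per-node pieces.

    Returns a list of (node, local_offset, size) tuples. Pieces that land
    on different nodes can be served at the same time, which is where the
    aggregate throughput comes from.
    """
    pieces = []
    end = offset + length
    while offset < end:
        node, local = locate(offset, stripe_size, n_nodes)
        # Stop at the end of the current stripe or the end of the request.
        size = min(stripe_size - offset % stripe_size, end - offset)
        pieces.append((node, local, size))
        offset += size
    return pieces
```

With 64-byte stripes on 4 nodes, split_request(100, 200, 64, 4) touches all four nodes: (1, 36, 28), (2, 0, 64), (3, 0, 64), and (0, 64, 44). A fixed layout like this is also what makes such systems sensitive to node failures and configuration changes, since every offset is tied to one specific node.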

Scalable File Transfer

Data transfer into clusters and among the nodes of a cluster currently relies on very old transfer technology (FTP and variants), distributed file systems (NFS), or parallel file systems. None of these approaches provides the aggregate end-to-end data transfer performance that the underlying parallel hardware and networks can support. This research will develop tools that allow data transfer performance to scale with the performance of the destination cluster. This work is still early, but look for preliminary results soon!