Lustre
From Yale HPC Wiki
Lustre is a high-performance clustered file system currently offered on our larger clusters, such as bulldogj and bulldogi. Lustre has three main advantages:
- Aggregates petabytes into a single namespace.
- Scales extremely well and provides speeds many times those of traditional NFS file systems.
- Currently imposes no quotas.
Lustre achieves these advantages by making some compromises. For example, Lustre:
- Offers only limited redundancy.
- Only supports local POSIX file locks.
- Must occasionally be purged of data when the file system reaches capacity. Users will be notified in advance.
Users of the Lustre-enabled clusters have two links in their home directory, called "scratch" and "stripe_scratch".
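To see where these two links point on a given cluster, a short script can resolve them. This is a minimal sketch: the link names come from the text above, while the helper name `scratch_links` is hypothetical, and the targets will differ from cluster to cluster.

```python
import os

def scratch_links(home):
    """Map each expected link name to its resolved target, or None if absent.

    The names "scratch" and "stripe_scratch" are taken from this page;
    the function itself is only an illustrative helper.
    """
    targets = {}
    for name in ("scratch", "stripe_scratch"):
        path = os.path.join(home, name)
        # Only report entries that really are symbolic links.
        targets[name] = os.path.realpath(path) if os.path.islink(path) else None
    return targets

if __name__ == "__main__":
    for name, target in scratch_links(os.path.expanduser("~")).items():
        print(name, "->", target or "(no such link)")
```

Run on a login node, this prints the resolved path of each link, or a note that the link is missing.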
How Lustre Works
The Lustre system allows multiple storage servers to cooperate to present a single file system. When a client (a compute node, for example) uses the Lustre file system, it can request data from any one of the servers or from all of them at the same time. For example, bulldogi has ten Lustre servers with 2TB of disk space each, so its Lustre file system has a total size of 10 x 2TB = 20TB.
The Standard Parallel File System Model ("scratch" directories)
The standard parallel file system stores each file on a single server. Say you have a directory of 100 1MB files, and each of these files needs to be read by a single process at the start of your job. On a traditional NFS file system, a single server would handle all 100 requests for the 100 1MB files. On bulldogi's Lustre (ten servers, see above), those 100 files are spread out over the 10 servers, so each server handles only 10 requests for a 1MB file.
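The arithmetic in this example can be checked with a small simulation. The sketch below assumes files are placed round-robin (file i on server i modulo the server count); that placement rule is an illustrative assumption, not a description of Lustre's actual allocator, and `requests_per_server` is a hypothetical helper name.

```python
from collections import Counter

def requests_per_server(num_files, num_servers):
    """Count read requests per server when each whole file lives on one server.

    Assumes round-robin placement: file i is stored on server i % num_servers.
    """
    return Counter(i % num_servers for i in range(num_files))

nfs = requests_per_server(100, 1)      # traditional NFS: a single server
lustre = requests_per_server(100, 10)  # bulldogi: ten Lustre servers

print(max(nfs.values()))     # requests handled by the busiest NFS server: 100
print(max(lustre.values()))  # requests handled by the busiest Lustre server: 10
```

With one request per file, the load on the busiest server drops from 100 requests to 10, matching the figures above.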
The Striped Parallel File System Model ("stripe_scratch" directories)
Lustre can also stripe a single file over multiple servers. For example, say that instead of the 100 1MB files in the example above you had one file of 100MB. Using the standard storage model above, that single 100MB file would be served by only one server. If you were to run a job of 100 processes, each reading 1MB of that 100MB file, all 100 requests would go to that single server.
To keep one server from having to serve all of the data, you can put the file in the "stripe_scratch" directory. In the stripe_scratch directory, a single file is broken up into stripes that are spread across the servers. These stripes are generally around 1 - 4MB (whatever is the optimum value for the system).
Using the example above, stripe_scratch would break that 100MB file into 100 1MB chunks, which would then be served by all ten servers simultaneously with little overhead. There is, however, some overhead, so only use stripe_scratch directories if you have large files being read by more than one client host. Generally speaking, you will see performance improvements on file sizes larger than about 150MB.
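The effect of striping on the 100MB example can be sketched the same way. The code below assumes round-robin striping, where stripe k is stored on server k modulo the server count, and reads aligned to stripe boundaries; both are simplifying assumptions for illustration, and the function names are hypothetical.

```python
from collections import Counter

MB = 1024 * 1024

def server_for_offset(offset, stripe_size, num_servers):
    """Index of the server holding the byte at `offset` under round-robin striping."""
    return (offset // stripe_size) % num_servers

def read_load(file_size, read_size, stripe_size, num_servers):
    """Count how many reads of `read_size` bytes start on each server.

    Reads here are aligned to stripe boundaries, so each read touches one server.
    """
    starts = range(0, file_size, read_size)
    return Counter(server_for_offset(s, stripe_size, num_servers) for s in starts)

# Unstriped: the whole 100MB file is one "stripe" on a single server.
unstriped = read_load(100 * MB, 1 * MB, stripe_size=100 * MB, num_servers=10)
# Striped: 1MB stripes spread over the ten servers.
striped = read_load(100 * MB, 1 * MB, stripe_size=1 * MB, num_servers=10)

print(dict(unstriped))        # all 100 reads land on server 0
print(max(striped.values()))  # each of the ten servers handles 10 reads
```

In this model the 100 one-megabyte reads that all hit one server in the unstriped case are divided evenly, ten per server, once the file is striped.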
More Information
For more information see the official detailed documentation.

