[Bioclusters] Large numbers of files (again...?)

Dan Bolser bioclusters@bioinformatics.org
Wed, 28 Jan 2004 12:25:02 -0000 (GMT)


Hello,

Sorry if this is a repost; I am not sure how the list moderation works, but now that I
am a member of the list I am sending this mail again...

-----

I am looking for information regarding an old problem.

Does anyone have experience dealing with directories containing 'large' numbers of
files, i.e. around 10,000?

Although I know there are plenty of ways to get around the 'argument list too long'
problem in bash (see the sketch below), more generally these directories are sluggish
to handle. This is because the FS uses a linear (unindexed) scan of the directory
listing to find files.

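For instance, a rough sketch of the usual workaround (assuming GNU find and xargs;
'bigDir' and the '*.tmp' pattern are just placeholders):

  # Stream the file names instead of expanding them all on one command line
  find bigDir -maxdepth 1 -type f -name '*.tmp' -print0 | xargs -0 rm
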
I accidentally created a directory with 300,000 files, and it was practically a
death trap for the system.

Does anyone have any suggestions about how to handle this kind of situation? For
example, I was looking into hash functions that use directories as nodes in a hash
tree. By automatically following the right chain of subdirectories you would find your
file, and this underlying layout could be hidden from the user by special tools
operating on a special 'big files' directory.

i.e. $> bigLs bigDir

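To make the idea concrete, a rough sketch of the storing side (just an illustration;
'bigPut', 'bigDir' and the choice of md5sum as the hash are my own made-up details):

  #!/bin/sh
  # bigPut: move a file into bigDir/<first two hex chars of md5(name)>/
  file="$1"
  bucket=$(basename "$file" | md5sum | cut -c1-2)
  mkdir -p "bigDir/$bucket"
  mv "$file" "bigDir/$bucket/"

A matching 'bigLs' would just walk the buckets, and looking up a single file means
recomputing the hash to pick the right subdirectory, so no one directory ever has to
hold more than a small fraction of the files.
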
Is any FS implemented in this way? It is frustrating that mysql can easily handle
millions of records while my file system starts to complain at around 5,000 files in
one directory.

Thanks for any help you can give, even if it is just "Don't put 300,000 files in one
directory!".

Cheers,
Dan.