[Bioclusters] Large numbers of files (again...?)

Joe Landman bioclusters@bioinformatics.org
Wed, 28 Jan 2004 08:12:40 -0500

Hi Dan:

   XFS uses B*-trees everywhere, and in theory it can handle obscene 
numbers of files per directory (obscene >> 10**7).  Under Linux it stays 
fast with several thousand to tens of thousands of entries.  JFS may 
be similar in its ability to handle 10**4 files per directory, though 
it uses a different technology.  None of the other common Linux 
filesystems index directories this way.
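As a side note on the "argument list too long" problem Dan mentions below: it comes from the shell expanding every name onto one command line, and streaming the names instead avoids it entirely. A minimal sketch (the bigDir name and the file count are just examples):

```shell
# Create a directory with many files, then remove them without
# tripping the kernel's argument-list limit.
mkdir -p bigDir
for i in $(seq 1 1000); do : > "bigDir/file$i"; done

# "rm bigDir/*" expands all 1000 names onto one command line and can
# fail for large directories; piping the names through xargs never does:
find bigDir -maxdepth 1 -type f -print0 | xargs -0 rm -f

# Or let find unlink the entries itself:
# find bigDir -maxdepth 1 -type f -delete
```

Either form touches each entry once, so it is limited by the filesystem's directory lookup speed rather than by the shell.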


Dan Bolser wrote:

>Sorry if this is a repost; I am not sure how your moderation works, but now that I
>am a member of the list I am sending this mail again...
>I am looking for information regarding an old problem.
>Does anyone have experience dealing with directories with 'large' numbers of files,
>i.e. around 10,000?
>Although I know there are plenty of tools to get around the 'argument list too long'
>problem in bash, more generally these directories are sluggish to handle. This is
>because the FS uses a linear (unindexed) search of directory listings to find
>an entry. I accidentally created a directory with 300,000 files, and it was
>practically a death trap for the system.
>Does anyone have any suggestions about how to handle this kind of situation? For
>example, I was looking into hashing functions using directories as nodes in the hash
>tree. By automatically following the right set of directories you would find your
>file, and this underlying behavior could be hidden from the user by using special
>tools in a special 'big files' directory,
>i.e. $> bigLs bigDir
>Are any FSs implemented in this way? It is frustrating when mysql can easily handle
>millions of records, but my file system starts to complain at about 5,000 files in
>one directory.
>Thanks for any help you can give, even if it is just "Don't put 300,000 files in one
>directory!"
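The hashed-directory scheme Dan describes can be sketched in a few lines of shell. This is only an illustration of the idea, not any existing filesystem's implementation; the two-level fan-out on the first hex digits of an md5 digest, and the bigStore/bigPut names, are assumptions:

```shell
# Map a filename to a nested path based on its md5 digest, so that no
# single directory accumulates too many entries.  Two levels of two hex
# digits give 256 x 256 = 65,536 leaf directories; 300,000 files spread
# to roughly 5 entries each.
bigPut () {
    name="$1"
    digest=$(printf '%s' "$name" | md5sum | cut -c1-32)
    dir="bigStore/$(printf '%s' "$digest" | cut -c1-2)/$(printf '%s' "$digest" | cut -c3-4)"
    mkdir -p "$dir"
    echo "$dir/$name"        # caller writes the file at this path
}

# Example: store one file under its hashed path.
path=$(bigPut "seq0001.fa")
: > "$path"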

Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman@scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615