[Bioclusters] Large numbers of files (again...?)

Wed, 28 Jan 2004 15:23:46 +0200

Dan Bolser wrote:

>Hello,
>
>Sorry if this is a repost. I am not sure how your moderation works, but now that
>I am a member of the list, I am sending this mail again...
>
>-----
>
>I am looking for information regarding an old problem.
>
>Does anyone have experience dealing with directories with 'large' numbers of files,
>i.e. around 10,000?
>
>Although I know there are plenty of tools to get around the 'argument list too long'
>problem in bash, more generally these directories are sluggish to handle. This is
>because the FS uses a linear (unindexed) search of directory listings to find
>files.
>
>I accidentally created a directory with 300,000 files, and it was practically a
>death trap for the system.
>
>Does anyone have any suggestions about how to handle this kind of situation? For
>example I was looking into hashing functions using directories as nodes in the hash
>tree. By automatically following the right set of directories you would find your
>file, but this underlying behavior could be hidden from the user by using special
>tools in a special 'big files' directory.
>
>i.e. $> bigLs bigDir
>
>Is any FS implemented in this way? It is frustrating when mysql can easily handle
>millions of records, but my file system starts to complain at about 5,000 files in
>one directory.
>
>Thanks for any help you can give, even if it is just "Don't put 300,000 files in one
>directory!".
>
>Cheers,
>Dan.
>
Hi,

Some of my data is stored as a bunch of files, divided between about 70 
directories, each directory containing ~14,000 files.
It's working OK. The FS is ext3 (on Linux kernel 2.4.22). I'm using bash 
for scripting, and it copes with the ~14,000 file names on the command line.
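
The idea from the original question, using directories as nodes of a hash
tree, can be sketched roughly as below. This is only an illustration under
stated assumptions, not a tested tool: the names hashed_path, store and
big_ls (after the bigLs in the question) are made up, and SHA-1 with two
levels of two hex characters each is an arbitrary choice.

```python
import hashlib
import os
import shutil


def hashed_path(root, name, levels=2, width=2):
    """Return root/ab/cd/name, where 'abcd...' is the hex SHA-1 of name.

    Hypothetical helper: levels and width are illustrative defaults.
    """
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    # Cut the first levels*width hex characters into one directory per level.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, name)


def store(root, src):
    """Move an existing file into its hashed bucket under root."""
    dest = hashed_path(root, os.path.basename(src))
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    shutil.move(src, dest)
    return dest


def big_ls(root):
    """Yield every file name stored under root, hiding the bucket layout."""
    for _dirpath, _dirnames, filenames in os.walk(root):
        yield from filenames
```

With two levels of two hex characters there are 256 * 256 = 65,536 leaf
directories, so 300,000 files would average fewer than five per directory.
Finding a file by name never needs a scan: recompute hashed_path(root, name)
and open it directly.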

However, if you want to migrate to MySQL, check out this FS lookalike 
that lets you access MySQL data through a FS interface: 
http://no.spam.ee/~tonu/modules.php?name=News&new_topic=2 . I'm not sure 
it's ready for prime time, though...

Arnon