[BiO BB] CD-HIT Support Request

Fri Mar 12 02:10:57 EST 2010

Hi Everyone,

I am writing a clustering program in Java that calls cd-hit for New, 
Incremental, and Hierarchical clustering. The program works fine for New 
clustering, but when I call cd-hit from within the Java code for 
Incremental clustering, I get errors. The logs are included below. Three 
different errors occurred that I don't understand, and I am seeking your 
assistance with them.
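
For reference, here is a minimal sketch of how an external program such as cd-hit can be launched from Java with its exit value, standard output, and standard error captured separately, as in the logs further down. It is not the code of the program described above: the class name `CommandRunner` and its methods are hypothetical, and it assumes each command-line argument is passed as its own list element so that paths containing spaces need no manual quoting.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of cd-hit or of the program described above.
final class CommandRunner {

    /** Exit value and captured output of a finished process. */
    static final class Result {
        final int exitValue;
        final String stdout;
        final String stderr;

        Result(int exitValue, String stdout, String stderr) {
            this.exitValue = exitValue;
            this.stdout = stdout;
            this.stderr = stderr;
        }
    }

    /** Reads a stream to its end and decodes it with the platform charset. */
    static String drain(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toString();
    }

    /** Runs the command, one list element per argument, and waits for it. */
    static Result run(List<String> command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).start();
        p.getOutputStream().close(); // the child receives no stdin
        // Read stderr on a second thread so a full pipe cannot stall the child.
        final String[] err = new String[1];
        Thread errReader = new Thread(() -> {
            try {
                err[0] = drain(p.getErrorStream());
            } catch (IOException e) {
                err[0] = "<could not read stderr: " + e + ">";
            }
        });
        errReader.start();
        String out = drain(p.getInputStream());
        int exit = p.waitFor();
        errReader.join();
        return new Result(exit, out, err[0]);
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java CommandRunner <program> [args...]");
            return;
        }
        Result r = run(Arrays.asList(args));
        System.out.println("Process Exit Value : " + r.exitValue);
        System.out.println("stdout:\n" + r.stdout);
        System.out.println("stderr:\n" + r.stderr);
    }
}
```

With this approach a path such as C:/cluster files/unipath_2010-3-9.fasta would be a single list element with no surrounding quote characters, because each element reaches the program as one argument.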

This error occurred when I tried to run cd-hit for hierarchical 
clustering from the Linux command prompt.
****************************************************************************************
[root@ip-10-194-215-223 cd-hit]# ./psi-cd-hit-local.pl -i hierarchical93 
-o hierarchical90 -c 0.90
Name "main::formatdb_no" used only once: possible typo at 
./psi-cd-hit-local.pl line 1712.
Name "main::known_singles" used only once: possible typo at 
./psi-cd-hit-local.pl line 1873.
Name "main::longest_ide" used only once: possible typo at 
./psi-cd-hit-local.pl line 966.
Name "main::known_single" used only once: possible typo at 
./psi-cd-hit-local.pl line 1873.
[root@ip-10-194-215-223 cd-hit]#

[root@ip-10-194-215-223 cd-hit]# perl psi-cd-hit.pl -i hierarchical93 -o 
hierarchical90 -c 0.3
Name "main::reformat_seg" used only once: possible typo at psi-cd-hit.pl 
line 65.
Name "main::restart_seg" used only once: possible typo at psi-cd-hit.pl 
line 62.
Can't exec "formatdb": No such file or directory at 
.//psi-cd-hit-local.pl line 1723.
Can not formatdb at .//psi-cd-hit-local.pl line 1724.
[root@ip-10-194-215-223 cd-hit]# vi hierarchical90.log
[root@ip-10-194-215-223 cd-hit]# vi hierarchical90.out
[root@ip-10-194-215-223 cd-hit]#
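
The Perl message `Can't exec "formatdb": No such file or directory` means the script could not find an executable called formatdb on the search path (formatdb ships with the legacy NCBI BLAST tools). A calling program could check for it before starting the script. The sketch below is one way to do that from Java; the class `PathLookup` and its `find` method are hypothetical names used only for illustration.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper: looks for an executable in a PATH-style string.
final class PathLookup {

    /** Returns the first executable regular file called name, if any. */
    static Optional<Path> find(String name, String pathValue) {
        if (pathValue == null) {
            return Optional.empty();
        }
        for (String dir : pathValue.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // empty entries are skipped in this sketch
            }
            Path candidate = Paths.get(dir, name);
            if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Defaults to the tool named in the error message above.
        String tool = args.length > 0 ? args[0] : "formatdb";
        Optional<Path> hit = find(tool, System.getenv("PATH"));
        System.out.println(hit.map(p -> tool + " found at " + p)
                              .orElse(tool + " is not on PATH"));
    }
}
```

If the lookup comes back empty, installing the BLAST tools or adding their directory to PATH before calling the script would be the first thing to try.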

The following output and errors occurred during incremental 
clustering run from within the Java code.
****************************************************************************************
Cluster CMD:         C:\cd-hit-windows\cd_hit_2d.exe -i "C:/cluster 
files/unipath_2010-3-9.clstr" -i2 "c:/cluster 
files/unipath_2010-3-9.fasta" -o "C:/cluster files/unipath_2010-3-9" -c 
0.9 -n 5 -d 50

Mar 9, 2010 6:55:31 AM Here is the standard output of the command:

total seq in db1: 0
total seq in db2: 15732
longest and shortest : 0 and 99999999
Total letters: 0
Mar 9, 2010 6:55:57 AM Process Exit Value : 1
Mar 9, 2010 6:55:57 AM Here is the standard error of the command (if any):

Fatal Error
Memory

Program halted !!
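
The run above reports `total seq in db1: 0` before it stops, so it may be worth confirming that the file given to -i contains sequence records before launching the program. The sketch below counts FASTA header lines in a file. It assumes that both -i and -i2 are expected to be FASTA files, as the cd-hit-2d usage text describes, and the class name `FastaCheck` is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-flight check, not part of cd-hit.
final class FastaCheck {

    /** Counts the lines that start with '>' (FASTA record headers). */
    static int countRecords(Path file) throws IOException {
        int count = 0;
        // ISO-8859-1 maps every byte to a character, so decoding cannot fail.
        try (BufferedReader reader =
                 Files.newBufferedReader(file, StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith(">")) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            Path file = Paths.get(name);
            if (!Files.isReadable(file)) {
                System.out.println(name + ": missing or unreadable");
            } else {
                System.out.println(name + ": " + countRecords(file) + " header line(s)");
            }
        }
    }
}
```

A count of zero, or a missing file, would match the `total seq in db1: 0` line. Note that a header count alone cannot tell a FASTA file from other files whose lines also begin with '>'.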

**************************************************************************************
Cluster CMD:         C:\cd-hit-windows\cd_hit_2d.exe -i "C:/cluster 
files/clusteroutput.clstr" -i2 "c:/cluster files/aric_2010-3-10.fasta" 
-o "C:/cluster files/aric_2010-3-10" -c 0.9 -n 5 -d 50

Mar 10, 2010 6:51:01 AM Here is the standard output of the command:

total seq in db1: 79
total seq in db2: 4776
longest and shortest : 48 and 11
Total letters: 1061
Sequences have been sorted
longest and shortest : 34350 and 11
Total letters: 3167574
compute index table for first database
Reading swap
Comparing with SEG 0
..........1000 compared        0 clustered
..........2000 compared        0 clustered
..........3000 compared        0 clustered
..........4000 compared        0 clustered
.......
4776 compared        0 clustered
writing non-redundant sequences from db2
writing clustering information
program completed !

Total CPU time 103
Mar 10, 2010 6:51:23 AM Process Exit Value : 0
Mar 10, 2010 6:51:23 AM Here is the standard error of the command (if any):

The attached fasta file was generated along with the last part of the 
output/error log. The contents of the fasta file are very unfamiliar 
to me, and I would be very grateful if someone could help me understand them.
Thank you all.
Regards.
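
The attachment to this message is named aric_2010-3-10.clstr. As a reading aid, the sketch below parses the usual cd-hit .clstr layout, in which a `>Cluster N` line starts each cluster and every following line describes one member, with `*` marking the representative sequence and `at NN%` giving a member's identity to it. It assumes the attachment follows that layout; the class name `ClstrParser` and the sequence names in `main` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical reader for the usual cd-hit .clstr layout.
final class ClstrParser {

    // Member line, e.g. "0<TAB>2799aa, >seqA... *" or "1<TAB>2214aa, >seqB... at 90%"
    private static final Pattern MEMBER = Pattern.compile(
        "^\\d+\\s+\\d+(?:aa|nt),\\s*>(.+?)\\.\\.\\.\\s*(\\*|at\\s.*)$");

    /** Maps each cluster label to its member names, in file order. */
    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> clusters = new LinkedHashMap<>();
        List<String> current = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith(">Cluster")) {
                current = new ArrayList<>();
                clusters.put(line.substring(1), current);
            } else if (current != null) {
                Matcher m = MEMBER.matcher(line);
                if (m.matches()) {
                    boolean representative = m.group(2).equals("*");
                    current.add(m.group(1) + (representative ? " (representative)" : ""));
                }
                // Lines that do not match the member pattern are ignored.
            }
        }
        return clusters;
    }

    public static void main(String[] args) {
        // Made-up sample lines, not taken from the attachment.
        List<String> sample = Arrays.asList(
            ">Cluster 0",
            "0\t2799aa, >seqA... *",
            "1\t2214aa, >seqB... at 90%",
            ">Cluster 1",
            "0\t100aa, >seqC... *");
        System.out.println(parse(sample));
        // prints {Cluster 0=[seqA (representative), seqB], Cluster 1=[seqC (representative)]}
    }
}
```

Read this way, a .clstr file is a listing of which sequences were grouped together, not a set of sequences.
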
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: aric_2010-3-10.clstr
URL: <http://www.bioinformatics.org/pipermail/bbb/attachments/20100312/b95f6113/attachment.ksh>