Detecting Frame Shifts by Amino Acid Sequence Comparison

Jean-Michel Claverie

This FTP directory contains the files needed to reproduce the work
described in J. Molecular Biology (1993) 234: 1140-1157, and to extend it
to other problems.

DATA FILES:

Std_Genetic_Code.no.bias -> standard genetic code
Coli.code.bias           -> standard genetic code and E. coli codon bias
Human.code.bias          -> standard genetic code and human codon bias

These files can serve as input for the framscore program.

MATRIX FILES:

TABLE2, TABLE3 and TABLE4 contain the information presented in the
corresponding tables of the article cited above.

REV3.bias.4 is a matrix incorporating the human codon information. It is
cited (Table 5) but not presented in the article.

SOURCE CODE FILES:

framscore.c -> generates a transition matrix from a given frame shift type,
               genetic code and codon bias.
               Stand-alone program, compile with
                    cc framscore.c

pamxt.c     -> generates scoring matrices from a transition matrix and a
               given PAM distance, using a symmetrical or an asymmetrical
               model.
               Stand-alone program, compile with
                    cc pamxt.c -lm

USING THE PROGRAMS:

framscore <genetic code and codon bias file> <frame shift type: [-3,-2,-1,1,2,3]>

ex: framscore Human.code.bias -3 > rev3

generates the transition matrix:

       A      R      N      D      C    .......
     0.033  0.217  0.000  0.000  0.000  .......
     0.095  0.047  0.023  0.023  0.047  .......
     0.000  0.165  0.085  0.085  0.165  .......
     0.000  0.155  0.095  0.095  0.155  .......
     0.000  0.175  0.075  0.075  0.175  .......
     ......................................

as the last part of the output (to stdout).

pamxt number
pamxt number Transition_Matrix [-a]

ex: pamxt 4 rev3 -a

generates the PAMxT.out file with the REV3.bias.4 scoring matrix.

RUNNING THE SEARCH:

The scoring matrices are produced in a format compatible with BLAST.
Source code and information about the BLAST suite can be found in the
/pub/blast directory of this FTP site. Using the scoring matrices with
another similarity search program, such as FASTA, might require a change
in their format.

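For readers who want a feel for how a frame shift transition matrix and a
PAM distance could be combined, here is a minimal sketch in C. It is NOT
the code in pamxt.c. It assumes, from the output file name PAMxT.out only,
that a PAM probability matrix is multiplied by the transition matrix, and
it uses the standard Dayhoff convention that the PAM matrix at distance n
is the n-th power of the PAM 1 matrix. The order of the product, the toy
alphabet size N, and the names mat_mul, mat_power, pam_times_t and row_sum
are all made up for this illustration.

```c
#include <assert.h>
#include <string.h>

#define N 3   /* toy alphabet size (assumption); real matrices cover the amino acids */

/* out = a * b for N x N matrices; out must not alias a or b */
void mat_mul(double a[N][N], double b[N][N], double out[N][N])
{
    int i, j, k;
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            out[i][j] = 0.0;
            for (k = 0; k < N; k++)
                out[i][j] += a[i][k] * b[k][j];
        }
    }
}

/* result = m raised to the power n (n >= 1), by repeated multiplication */
void mat_power(double m[N][N], int n, double result[N][N])
{
    double tmp[N][N];
    int step;

    assert(n >= 1);
    memcpy(result, m, sizeof tmp);
    for (step = 1; step < n; step++) {
        mat_mul(result, m, tmp);
        memcpy(result, tmp, sizeof tmp);
    }
}

/* Hypothetical composition: out = (pam1 ^ n) * t.
   The order of the product is an assumption of this sketch. */
void pam_times_t(double pam1[N][N], int n, double t[N][N], double out[N][N])
{
    double pam_n[N][N];

    mat_power(pam1, n, pam_n);
    mat_mul(pam_n, t, out);
}

/* Sum of one row; stays at 1 (up to rounding) when both inputs have
   rows that sum to 1 */
double row_sum(double m[N][N], int row)
{
    double s = 0.0;
    int j;
    for (j = 0; j < N; j++)
        s += m[row][j];
    return s;
}
```

If both input matrices have rows that sum to 1, every row of the result
also sums to 1 up to rounding, which row_sum can be used to check. Turning
such probabilities into integer scores in BLAST format is a further step
that this sketch does not attempt.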
FOR BETTER RESULTS:

Frame shift similarity searches are subject to the same artefact as
regular similarity searches, namely false positives induced by
low-complexity regions in proteins. Please look into the /pub/jmc/xnu
directory for information on this problem and how to alleviate it with
the XNU program.

==============================================================================
| Jean-Michel CLAVERIE                                                       |
| National Center for Biotechnology Information,                             |
| National Library of Medicine, National Institutes of Health,               |
| Bldg 38 A, 8600 Rockville Pike, Bethesda, MD 20894, USA.                   |
| Phone: (1) 301 496 24 75                                                   |
| Fax: (1) 301 480 92 41                                                     |
| E-mail: jmc@ncbi.nlm.nih.gov                                               |
==============================================================================

January 1994