Comparing the predicted and observed properties of proteins encoded in the genome of Escherichia coli K-12.
Article Details
- Citation
Link AJ, Robison K, Church GM
Comparing the predicted and observed properties of proteins encoded in the genome of Escherichia coli K-12.
Electrophoresis. 1997 Aug;18(8):1259-313.
- PubMed ID
- 9298646
- Abstract
Mining the emerging abundance of microbial genome sequences for hypotheses is an exciting prospect of "functional genomics". At the forefront of this effort, we compared the predictions of the complete Escherichia coli genomic sequence with the observed gene products by assessing 381 proteins for their mature N-termini, in vivo abundances, isoelectric points, molecular masses, and cellular locations. Two-dimensional gel electrophoresis (2-DE) and Edman sequencing were combined to sequence Coomassie-stained 2-DE spots representing the abundant proteins of wild-type E. coli K-12 strains. Greater than 90% of the abundant proteins in the E. coli proteome lie in a small isoelectric point and molecular mass window of 4-7 and 10-100 kDa, respectively. We identified several highly abundant proteins, YjbJ, YjbP, YggX, HdeA, and AhpC, which would not have been predicted from the genomic sequence alone. Of the 223 uniquely identified loci, 60% of the encoded proteins are proteolytically processed. As previously reported, the initiator methionine was efficiently cleaved when the penultimate amino acid was serine or alanine. In contrast, when the penultimate amino acid was threonine, glycine, or proline, cleavage was variable, and valine did not signal cleavage. Although signal peptide cleavage sites tended to follow predicted rules, the length of the putative signal sequence was occasionally greater than the consensus. For proteins predicted to be in the cytoplasm or inner membrane, the N-terminal amino acids were highly constrained compared to proteins localized to the periplasm or outer membrane. Although cytoplasmic proteins follow the N-end rule for protein stability, proteins in the periplasm or outer membrane do not follow this rule; several have N-terminal amino acids predicted to destabilize the proteins.
Surprisingly, 18% of the identified 2-DE spots represent isoforms in which protein products of the same gene have different observed pI and M(r), suggesting they are post-translationally processed. Although most of the predicted and observed values for isoelectric point and molecular mass show reasonable concordance, for several proteins the observed values significantly deviate from the expected values. Such discrepancies may represent either highly processed proteins or misinterpretations of the genomic sequence. Our data suggest that AhpC, CspC, and HdeA exist as covalent homomultimers, and that IcdA exists as at least three isoforms even under conditions in which covalent modification is not predicted. We enriched for proteins based on subcellular location and found several proteins in unexpected subcellular locations.
DrugBank Data that Cites this Article
- Polypeptides
| Name | UniProt ID |
| --- | --- |
| Malate dehydrogenase | P61889 |
| 30S ribosomal protein S10 | P0A7R5 |
| DNA-directed RNA polymerase subunit alpha | P0A7Z4 |
| Aspartate aminotransferase | P00509 |
| Adenylate kinase | P69441 |
| Isocitrate dehydrogenase [NADP] | P08200 |
| ADP-L-glycero-D-manno-heptose-6-epimerase | P67910 |
| Putrescine-binding periplasmic protein | P31133 |
| NH(3)-dependent NAD(+) synthetase | P18843 |
| Adenylosuccinate synthetase | P0A7D4 |
| Dihydroorotase | P05020 |
| Vitamin B12 transporter BtuB | P06129 |
| Outer membrane protein F | P02931 |
| Protein YceI | P0A8X2 |
| Oxygen-insensitive NAD(P)H nitroreductase | P38489 |
| L-arabinose-binding periplasmic protein | P02924 |
| Glucose-1-phosphatase | P19926 |
| 3-methyl-2-oxobutanoate hydroxymethyltransferase | P31057 |
| Histidinol dehydrogenase | P06988 |
| Argininosuccinate synthase | P0A6E4 |
| Ornithine carbamoyltransferase chain I | P04391 |
| Sulfite reductase [NADPH] hemoprotein beta-component | P17846 |
| Outer membrane protein TolC | P02930 |
| Ribose import binding protein RbsB | P02925 |
| Aconitate hydratase B | P36683 |
| Maltose-binding periplasmic protein | P0AEX9 |
| Pyruvate dehydrogenase E1 component | P0AFG8 |
| Phospho-2-dehydro-3-deoxyheptonate aldolase, Phe-sensitive | P0AB91 |
| Enoyl-[acyl-carrier-protein] reductase [NADH] FabI | P0AEK4 |
| Class B acid phosphatase | P0AE22 |
| Fructose-bisphosphate aldolase class 2 | P0AB71 |
| Phosphate-binding protein PstS | P0AG82 |
| Spermidine/putrescine-binding periplasmic protein | P0AFK9 |
| Outer membrane protein A | P0A910 |
| Glyceraldehyde-3-phosphate dehydrogenase A | P0A9B2 |
| D-galactose-binding periplasmic protein | P0AEE5 |
| Thiol:disulfide interchange protein DsbC | P0AEG6 |
| 2-iminobutanoate/2-iminopropanoate deaminase | P0AF93 |
| 3-mercaptopyruvate sulfurtransferase | P31142 |
| Flavohemoprotein | P24232 |
| Ecotin | P23827 |
| Protein UshA | P07024 |
| Glutamate decarboxylase alpha | P69908 |
| Aspartate carbamoyltransferase catalytic subunit | P0A786 |
| Succinate dehydrogenase flavoprotein subunit | P0AC41 |
| 50S ribosomal protein L4 | P60723 |
| Succinate dehydrogenase iron-sulfur subunit | P07014 |
| Ribosome-recycling factor | P0A805 |
| Lactaldehyde dehydrogenase | P25553 |
| Biotin carboxylase | P24182 |
| Thiol:disulfide interchange protein DsbA | P0AEG4 |
| Outer membrane porin C | P06996 |