Comparing the predicted and observed properties of proteins encoded in the genome of Escherichia coli K-12.

Article Details

Citation

Link AJ, Robison K, Church GM

Comparing the predicted and observed properties of proteins encoded in the genome of Escherichia coli K-12.

Electrophoresis. 1997 Aug;18(8):1259-313.

PubMed ID
9298646
Abstract

Mining the emerging abundance of microbial genome sequences for hypotheses is an exciting prospect of "functional genomics". At the forefront of this effort, we compared the predictions of the complete Escherichia coli genomic sequence with the observed gene products by assessing 381 proteins for their mature N-termini, in vivo abundances, isoelectric points, molecular masses, and cellular locations. Two-dimensional gel electrophoresis (2-DE) and Edman sequencing were combined to sequence Coomassie-stained 2-DE spots representing the abundant proteins of wild-type E. coli K-12 strains. Greater than 90% of the abundant proteins in the E. coli proteome lie in a small isoelectric point and molecular mass window of 4-7 and 10-100 kDa, respectively. We identified several highly abundant proteins, YjbJ, YjbP, YggX, HdeA, and AhpC, which would not have been predicted from the genomic sequence alone. Of the 223 uniquely identified loci, 60% of the encoded proteins are proteolytically processed. As previously reported, the initiator methionine was efficiently cleaved when the penultimate amino acid was serine or alanine. In contrast, when the penultimate amino acid was threonine, glycine, or proline, cleavage was variable, and valine did not signal cleavage. Although signal peptide cleavage sites tended to follow predicted rules, the length of the putative signal sequence was occasionally greater than the consensus. For proteins predicted to be in the cytoplasm or inner membrane, the N-terminal amino acids were highly constrained compared to proteins localized to the periplasm or outer membrane. Although cytoplasmic proteins follow the N-end rule for protein stability, proteins in the periplasm or outer membrane do not follow this rule; several have N-terminal amino acids predicted to destabilize the proteins.
Surprisingly, 18% of the identified 2-DE spots represent isoforms in which protein products of the same gene have different observed pI and M(r), suggesting they are post-translationally processed. Although most of the predicted and observed values for isoelectric point and molecular mass show reasonable concordance, for several proteins the observed values significantly deviate from the expected values. Such discrepancies may represent either highly processed proteins or misinterpretations of the genomic sequence. Our data suggest that AhpC, CspC, and HdeA exist as covalent homomultimers, and that IcdA exists as at least three isoforms even under conditions in which covalent modification is not predicted. We enriched for proteins based on subcellular location and found several proteins in unexpected subcellular locations.

DrugBank Data that Cites this Article

Polypeptides
Name | UniProt ID
Malate dehydrogenase | P61889
30S ribosomal protein S10 | P0A7R5
DNA-directed RNA polymerase subunit alpha | P0A7Z4
Aspartate aminotransferase | P00509
Adenylate kinase | P69441
Isocitrate dehydrogenase [NADP] | P08200
ADP-L-glycero-D-manno-heptose-6-epimerase | P67910
Putrescine-binding periplasmic protein | P31133
NH(3)-dependent NAD(+) synthetase | P18843
Adenylosuccinate synthetase | P0A7D4
Dihydroorotase | P05020
Vitamin B12 transporter BtuB | P06129
Outer membrane protein F | P02931
Protein YceI | P0A8X2
Oxygen-insensitive NAD(P)H nitroreductase | P38489
L-arabinose-binding periplasmic protein | P02924
Glucose-1-phosphatase | P19926
3-methyl-2-oxobutanoate hydroxymethyltransferase | P31057
Histidinol dehydrogenase | P06988
Argininosuccinate synthase | P0A6E4
Ornithine carbamoyltransferase chain I | P04391
Sulfite reductase [NADPH] hemoprotein beta-component | P17846
Outer membrane protein TolC | P02930
Ribose import binding protein RbsB | P02925
Aconitate hydratase B | P36683
Maltose-binding periplasmic protein | P0AEX9
Pyruvate dehydrogenase E1 component | P0AFG8
Phospho-2-dehydro-3-deoxyheptonate aldolase, Phe-sensitive | P0AB91
Enoyl-[acyl-carrier-protein] reductase [NADH] FabI | P0AEK4
Class B acid phosphatase | P0AE22
Fructose-bisphosphate aldolase class 2 | P0AB71
Phosphate-binding protein PstS | P0AG82
Spermidine/putrescine-binding periplasmic protein | P0AFK9
Outer membrane protein A | P0A910
Glyceraldehyde-3-phosphate dehydrogenase A | P0A9B2
D-galactose-binding periplasmic protein | P0AEE5
Thiol:disulfide interchange protein DsbC | P0AEG6
2-iminobutanoate/2-iminopropanoate deaminase | P0AF93
3-mercaptopyruvate sulfurtransferase | P31142
Flavohemoprotein | P24232
Ecotin | P23827
Protein UshA | P07024
Glutamate decarboxylase alpha | P69908
Aspartate carbamoyltransferase catalytic subunit | P0A786
Succinate dehydrogenase flavoprotein subunit | P0AC41
50S ribosomal protein L4 | P60723
Succinate dehydrogenase iron-sulfur subunit | P07014
Ribosome-recycling factor | P0A805
Lactaldehyde dehydrogenase | P25553
Biotin carboxylase | P24182
Thiol:disulfide interchange protein DsbA | P0AEG4
Outer membrane porin C | P06996