Toward a catalog of human genes and proteins: sequencing and analysis of 500 novel complete protein coding human cDNAs.

Article Details


Wiemann S, Weil B, Wellenreuther R, Gassenhuber J, Glassl S, Ansorge W, Bocher M, Blocker H, Bauersachs S, Blum H, Lauber J, Dusterhoft A, Beyer A, Kohrer K, Strack N, Mewes HW, Ottenwalder B, Obermaier B, Tampe J, Heubner D, Wambutt R, Korn B, Klein M, Poustka A

Genome Res. 2001 Mar;11(3):422-35.

PubMed ID
11230166

With the complete human genomic sequence being unraveled, the focus will shift to gene identification and to the functional analysis of gene products. The generation of a set of cDNAs, both sequences and physical clones, which contains the complete and noninterrupted protein coding regions of all human genes will provide the indispensable tools for the systematic and comprehensive analysis of protein function to eventually understand the molecular basis of man. Here we report the sequencing and analysis of 500 novel human cDNAs containing the complete protein coding frame. Assignment to functional categories was possible for 52% (259) of the encoded proteins, the remaining fraction having no similarities with known proteins. By aligning the cDNA sequences with the sequences of the finished chromosomes 21 and 22 we identified a number of genes that either had been completely missed in the analysis of the genomic sequences or had been wrongly predicted. Three of these genes appear to be present in several copies. We conclude that full-length cDNA sequencing continues to be crucial also for the accurate identification of genes. The set of 500 novel cDNAs, and another 1000 full-coding cDNAs of known transcripts we have identified, adds up to cDNA representations covering 2%-5% of all human genes. We thus substantially contribute to the generation of a gene catalog, consisting of both full-coding cDNA sequences and clones, which should be made freely available and will become an invaluable tool for detailed functional studies.

DrugBank Data that Cites this Article

Name | UniProt ID
Pyruvate dehydrogenase E1 component subunit beta, mitochondrial | P11177
NADH dehydrogenase [ubiquinone] 1 subunit C2 | O95298
NEDD8-activating enzyme E1 regulatory subunit | Q13564
NADH dehydrogenase [ubiquinone] 1 beta subcomplex subunit 8, mitochondrial | O95169
Glutathione S-transferase kappa 1 | Q9Y2Q3
Transient receptor potential cation channel subfamily V member 1 | Q8NER1
Isocitrate dehydrogenase [NADP] cytoplasmic | O75874
Heat shock protein HSP 90-beta | P08238
Proline synthase co-transcribed bacterial homolog protein | O94903
Methionine adenosyltransferase 2 subunit beta | Q9NZL9
Sphingomyelin phosphodiesterase 4 | Q9NXE4
DNA damage-inducible transcript 4 protein | Q9NX09
Carbonic anhydrase-related protein 10 | Q9NS85
Sodium channel subunit beta-3 | Q9NY72
Enoyl-CoA delta isomerase 2, mitochondrial | O75521
Guanine nucleotide-binding protein G(i) subunit alpha-1 | P63096
RuvB-like 2 | Q9Y230
ATP synthase subunit g, mitochondrial | O75964
Nuclear migration protein nudC | Q9Y266
KAT8 regulatory NSL complex subunit 3 | Q9P2N6
A disintegrin and metalloproteinase with thrombospondin motifs 13 | Q76LX8
Elongation of very long chain fatty acids protein 5 | Q9NYP7
Calcium/calmodulin-dependent protein kinase kinase 1 | Q8N5S9
Eukaryotic translation initiation factor 2-alpha kinase 1 | Q9BQI3
LIM domain kinase 2 | P53671
NUAK family SNF1-like kinase 2 | Q9H093
Serine/threonine-protein kinase SIK2 | Q9H0K1
Transient receptor potential cation channel subfamily M member 3 | Q9HCF6
Solute carrier family 40 member 1 | Q9NP59
Cytochrome b reductase 1 | Q53TN4
V-type proton ATPase subunit S1 | Q15904
Sphingosine kinase 2 | Q9NRA0
Microphthalmia-associated transcription factor | O75030