Molecular cloning, gene organization and expression of the human UDP-GalNAc:Neu5Acalpha2-3Galbeta-R beta1,4-N-acetylgalactosaminyltransferase responsible for the biosynthesis of the blood group Sda/Cad antigen: evidence for an unusual extended cytoplasmic domain. The nucleotide sequences of the short and long transcripts of beta1,4-N-acetylgalactosaminyltransferase have been submitted to the DDBJ, EMBL, GenBank(R) and GSDB Nucleotide Sequence Databases under accession nos AJ517770 and AJ517771 respectively. The human Sd(a) antigen is formed through the addition of an N-acetylgalactosamine residue via a beta1,4-linkage to a sub-terminal galactose residue substituted with an alpha2,3-linked sialic acid residue. We have taken advantage of the previously cloned mouse cDNA sequence of the UDP-GalNAc:Neu5Acalpha2-3Galbeta-R beta1,4-N-acetylgalactosaminyltransferase (Sd(a) beta1,4GalNAc transferase) to screen the human EST and genomic databases and to identify the corresponding human gene. The gene spans over 35 kb of genomic DNA on chromosome 17 and comprises at least 12 exons. As judged by reverse transcription PCR, the human gene is widely expressed, being detected at varying levels in almost all cell types studied. Northern blot analysis indicated that five Sd(a) beta1,4GalNAc transferase transcripts of 8.8, 6.1, 4.7, 3.8 and 1.65 kb were highly expressed in colon and to a lesser extent in kidney, stomach, ileum and rectum. The complete coding nucleotide sequence was amplified from Caco-2 cells. Interestingly, the alternative use of two first exons, named E1(S) and E1(L), leads to the production of two transcripts. These transcripts potentially encode two proteins of 506 and 566 amino acid residues, identical in sequence except for their cytoplasmic tails.
The short form is highly similar (74% identity) to the mouse enzyme, whereas the long form has an unusually long cytoplasmic tail of 66 amino acid residues, which has not yet been described for any other mammalian glycosyltransferase. Transient transfection of Cos-7 cells with the common catalytic domain yielded a soluble form of the protein that catalysed the transfer of GalNAc residues to alpha2,3-sialylated acceptor substrates, forming the GalNAcbeta1-4[Neu5Acalpha2-3]Galbeta1-R trisaccharide common to both Sd(a) and Cad antigens.