Genetic information in DNA is encoded in sequences of how many different molecular letters?

It's not a mistake when we say that ATG is a start codon. Scientists generally consider AUG to be the start codon in an mRNA sequence and ATG to be the start codon in a DNA sequence.

But...
If AUG on an mRNA molecule means "start,"
and mRNA is copied from a DNA template,
and the DNA template is complementary to the mRNA copy,
then why isn't a DNA start codon TAC?

The key thing to remember is that DNA is double stranded.

Here's a DNA sequence; its start codon is the ATG that follows the first two bases:

GC ATG CTG CGA AAC TTT GGC TGA

We've shown the sequence of just one of the DNA strands. It's a shortcut, and it's tidier to look at, and it's how DNA sequences are typically written. If we wanted to, we could include the sequences of both strands:

GC ATG CTG CGA AAC TTT GGC TGA
CG TAC GAC GCT TTG AAA CCG ACT
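The bottom strand can be derived mechanically from the top one by applying the base-pairing rules. Here is a minimal Python sketch of that step. The `PAIR` table and `complement` function are illustrative names, not part of any library, and the sketch assumes an uppercase A/C/G/T sequence, passing any other character (such as the spaces used to group codons) through unchanged.

```python
# Base-pairing rules for DNA: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand.

    Characters that are not A/C/G/T (e.g. spaces between codons)
    are copied through unchanged.
    """
    return "".join(PAIR.get(base, base) for base in strand)


top = "GC ATG CTG CGA AAC TTT GGC TGA"
bottom = complement(top)
print(top)
print(bottom)  # CG TAC GAC GCT TTG AAA CCG ACT
```

Complementing twice returns the original strand, which is a quick sanity check on the pairing table.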

While our shorthand version shows just the top strand, it's actually the bottom strand that RNA polymerase reads to build an mRNA molecule. And if we're being literal about the actual nucleotides in the DNA strand that are read to build the mRNA's AUG start codon, we might consider the start codon on a DNA molecule to be TAC.

But that's not quite right. The chemical structure of DNA gives it a polarity, and the two complementary DNA strands are anti-parallel. That is, the 5' (5-prime) and 3' (3-prime) ends of the two DNA strands face in opposite directions:

5' GC ATG CTG CGA AAC TTT GGC TGA 3'
3' CG TAC GAC GCT TTG AAA CCG ACT 5'

The scientific standard is to write a nucleotide sequence from 5' to 3'. That means we'd have to write the sequence of the bottom strand like this:

5' TCA GCC AAA GTT TCG CAG CAT GC 3'
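Writing the bottom strand 5' to 3' amounts to taking the reverse complement of the top strand: pair each base, then reverse the order. A minimal Python sketch, assuming an uppercase A/C/G/T string with no spaces; `reverse_complement` and `group_codons` are illustrative helper names.

```python
# Base-pairing rules for DNA.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand of seq, written 5' to 3'."""
    # Walk the top strand backwards (3' -> 5') and pair each base,
    # which yields the bottom strand in 5' -> 3' order.
    return "".join(PAIR[base] for base in reversed(seq))


def group_codons(seq: str) -> str:
    """Insert a space every three bases, counting from the 5' end."""
    return " ".join(seq[i:i + 3] for i in range(0, len(seq), 3))


top = "GCATGCTGCGAAACTTTGGCTGA"  # top strand, 5' -> 3'
bottom = reverse_complement(top)
print(group_codons(bottom))  # TCA GCC AAA GTT TCG CAG CAT GC
```

The CAT near the end of the output is the bottom-strand partner of the top strand's ATG, read in the standard 5' to 3' direction.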

It would be more accurate to say that the DNA sequence of the "start codon" on the bottom strand is CAT. But that's an inconvenient way to talk about a protein-coding DNA sequence: everything's not only complementary but also backwards.

For the sake of ease and clarity, scientists tend to ignore the bottom strand (they call it the "non-coding" or "antisense" strand). Instead, they refer to the sequence of the "coding" or "sense" strand: the one that's almost identical to mRNA—the difference of course being that every T in DNA is replaced by a U in RNA. They know there's another strand, and they know how to figure out what its sequence is if they need to.
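The relationship between the two DNA strands and the mRNA can be checked with a short Python sketch. It assumes the example sequences above (the coding strand written 5' to 3', the template strand written 3' to 5' directly beneath it) and uses illustrative helper names; it models only the sequence relationship, not the chemistry of transcription.

```python
# Pairing used when RNA is built against a DNA template: DNA A pairs with RNA U.
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "C": "G", "G": "C"}


def mrna_from_coding(coding: str) -> str:
    """The mRNA matches the coding (sense) strand, with U in place of T."""
    return coding.replace("T", "U")


def mrna_from_template(template_3_to_5: str) -> str:
    """Pair each template base with its RNA partner.

    The template is given 3' -> 5', aligned under the coding strand,
    so the result already comes out 5' -> 3' without any reversal.
    """
    return "".join(DNA_TO_RNA_PAIR[base] for base in template_3_to_5)


coding = "GCATGCTGCGAAACTTTGGCTGA"    # top (coding/sense) strand, 5' -> 3'
template = "CGTACGACGCTTTGAAACCGACT"  # bottom (antisense) strand, 3' -> 5'

mrna = mrna_from_coding(coding)
print(mrna)                                  # GCAUGCUGCGAAACUUUGGCUGA
print(mrna == mrna_from_template(template))  # True
print(mrna.find("AUG"))                      # 2
```

Both routes give the same mRNA, which is why quoting the coding strand (ATG) is a convenient stand-in for the mRNA start codon (AUG).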

In genomics, an open reading frame (ORF) is a stretch of DNA sequence, read in a given reading frame, that contains no stop codon (a codon that signals the end of protein synthesis). A codon is a DNA or RNA sequence of three nucleotides (a trinucleotide) that forms a unit of genomic information encoding a particular amino acid or signaling the termination of protein synthesis (a stop codon). There are 64 different codons: 61 specify amino acids and 3 are used as stop codons. A long open reading frame is often part of a gene (that is, a sequence directly coding for a protein).
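The codon arithmetic in this definition can be checked directly: three positions, with four possible bases at each. A minimal Python sketch, assuming the standard genetic code, whose stop codons are TAA, TAG, and TGA in DNA spelling (some organisms and organelles use variant codes with different stop codons).

```python
from itertools import product

BASES = "ACGT"
# Stop codons of the standard genetic code, written as DNA.
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Every possible trinucleotide: 4 choices at each of 3 positions.
all_codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
amino_acid_codons = [c for c in all_codons if c not in STOP_CODONS]

print(len(all_codons))         # 64
print(len(amino_acid_codons))  # 61
print(len(STOP_CODONS))        # 3
```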

Narration

"Open reading frame" is a terrible term that we're stuck with. What it refers to is a frame of reference; what is being read is the RNA code, and it is being read by the ribosome in order to make a protein. "Open" means that the road is open: the ribosome can keep reading the RNA code and adding one amino acid after another. Now, DNA, though it is a monotonous repetition of As, Cs, Ts, and Gs, has a language, which is transcribed, of course, into RNA and then translated into a protein. When it's translated into a protein, the mRNA is not read one letter at a time; it's read three letters at a time. Those three letters are called a codon, and each codon, whether it's AAA or UUU or AUG, is interpreted by the ribosome, the molecular machine that makes the protein, as a particular amino acid. So AUG codes for one amino acid, UUU codes for another, and so on.

So an open reading frame is the stretch of DNA, or of the RNA transcribed from it, through which the ribosome can travel, adding one amino acid after another, before it runs into a codon that doesn't code for any amino acid. When that happens, the ribosome stops. You'll be pleased to hear that the codons that make that happen are called stop codons, and a stop codon ends an open reading frame. An open reading frame is sometimes long enough for 300 amino acids, sometimes maybe 600, and sometimes longer. The longer an open reading frame is, that is, the further you get before you reach a stop codon, the more likely it is to be part of a gene that codes for a protein.
Now, the final confusing thing about an open reading frame is that because codons are three nucleotides long and DNA has two strands, the ribosome can read an RNA derived from either strand, and it can start reading at any of three positions, each shifted by one letter. So you can actually get three reading frames going in one direction and three going in the other direction. That makes six different reading frames for every piece of DNA, any of which might give you an open reading frame.

How many letters are there in DNA?

One of the first things you learn in Biology 101 is that the genetic code is written in four letters: A, T, C, and G. Each represents a chemical building block of DNA, the molecule that encodes the information necessary to build life as we know it.

How is genetic information encoded in a DNA molecule?

DNA encodes information through the order, or sequence, of the nucleotides along each strand. Each base (A, C, T, or G) can be considered a letter in a four-letter alphabet that spells out biological messages in the chemical structure of the DNA.

What are the 4 letters of the genome?

ACGT is an acronym for the four types of bases found in a DNA molecule: adenine (A), cytosine (C), guanine (G), and thymine (T). A DNA molecule consists of two strands wound around each other. Each strand has a sugar-phosphate backbone, and the two strands are held together by hydrogen bonds between paired bases on opposite strands (A with T, C with G).

How many letters are in each word of the genetic code?

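The genetic code had to be a "language" (using the DNA alphabet of A, T, C, and G) that produced enough DNA "words" to specify each of the 20 known amino acids. One-letter words give only 4 combinations and two-letter words only 4 × 4 = 16, too few; three-letter words give 4 × 4 × 4 = 64, which is enough. So each word of the genetic code, the codon, is three letters long.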