Hi everybody! Welcome back to Synthetic Biology One. Today’s lesson is extremely important because we are going over a critical basic unit of biological function. We are aiming right for the center of the central dogma. We are paying a visit to the queen of the gene. Move over, mitochondria, because there’s a new powerhouse in the cell: I’m talking about the coding sequence.
A coding sequence is a piece of DNA that codes for a protein. Proteins are the basic functional units in biology and are responsible for almost everything that happens in a living cell. So if we can start to recognize coding sequences and get familiar with them, we’ll unlock the most important tool in the synthetic biology toolbox. A coding sequence has three key functional features that will always give it away.
First, it starts with a start codon: ATG. These three nucleotides tell the ribosome where to start making protein. In rare cases, proteins can start with other codons. But these cases make our life more difficult, so we can ignore them. We’re allowed to do that because we are engineers, not scientists, and we care about results! Seriously though, for the purposes of this course, we will assume coding sequences always begin with ATG.
Second, it ends with a stop codon: TAA, TGA, or TAG. This is the code that tells the ribosome to stop making a protein and fall off. My favorite stop codon is TAA and I use it in all my designs. But the other two are quite common as well.
Third, a real coding sequence has to have some meat inside it. That means that between the start codon and the stop codon we expect to see a good chunk of base pairs, a few hundred to a few thousand, with no other stop codons in the reading frame to interrupt the gene. This rule helps us distinguish a real coding sequence from random DNA that just happens to include start and stop codons. If a coding sequence is less than a few hundred base pairs, it might be real, but don’t trust it!
And that’s it. Three simple rules for finding a coding sequence. In fact, they are so simple that they are easily done by a computer. Any common DNA sequence editor software can apply these rules to find things that look like coding sequences automatically, so you don’t have to read through each base pair and mark all the ATGs.
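To make that concrete, here is a minimal sketch of how a program might apply the three rules. It is not how any particular sequence editor works. The function name `find_orfs` and the default `min_length` of 300 base pairs (one reading of "a few hundred") are assumptions made for illustration. The sketch only scans the forward strand, while real tools typically check the reverse complement as well.

```python
# Hypothetical helper: scans one strand of DNA for stretches that satisfy
# the three rules (start codon, stop codon, enough sequence in between).
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TGA", "TAG"}


def find_orfs(dna, min_length=300):
    """Return a sorted list of (start, end, sequence) tuples.

    start and end are 0-based positions in `dna`, with end exclusive.
    min_length is counted in base pairs and includes both the start
    and the stop codon; the default of 300 is an assumed cutoff.
    """
    dna = dna.upper()
    orfs = []
    # A codon is three bases, so there are three ways to read one strand.
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START_CODON:
                # Rule 1: remember the first ATG seen in this frame.
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Rule 2: the first in-frame stop codon ends the candidate.
                end = i + 3
                # Rule 3: keep it only if there is enough "meat" inside.
                if end - start >= min_length:
                    orfs.append((start, end, dna[start:end]))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    # Toy sequence, far shorter than a real gene, so the cutoff is lowered.
    toy = "CC" + "ATG" + "GCT" * 4 + "TAA" + "GG"
    for start, end, seq in find_orfs(toy, min_length=18):
        print(start, end, seq)
```

With the default cutoff the toy sequence returns nothing, which is the third rule doing its job: a short stretch between a start and a stop codon is not trusted.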
A sequence of DNA that looks like a coding sequence because it has a start codon, stop codon, etc., is sometimes called an Open Reading Frame, or ORF. Technically, the term coding sequence, or CDS, only applies when we know from experimental data that it actually codes for a protein. In practice, ORF and CDS are often used interchangeably and you should be familiar with both.
Coding sequences are very useful when you are trying to understand a new piece of DNA. The coding sequences are easy to find with the help of software, and they usually have a key role in the overall function of the DNA. In the next lesson, we’ll introduce some simple strategies for understanding coding sequences when you do find them.
Until then – happy decoding!