Thursday, April 21, 2005

Bead Based Resequencing.

I had a busy day attending seminars yesterday. I already wrote about Phil Skell at our evolutionary genetics seminar. Later in the day, I went to a genomics/bioinformatics seminar on a new sequencing technique. The company (454) has developed a new method that they hope will complement the shotgun sequencing approach. I describe the method below (along with some background on shotgun sequencing) and its potential applications for polymorphism studies.

Shotgun sequencing is the dominant method for genome sequencing in the post-genomic world. The approach, in a nutshell, involves sequencing random fragments of the genome, then assembling those fragments based on sequence overlaps and mate-pair information. You can learn much more about the method here. The shotgun method has the benefit that you do not need to know much about the genetics of the organism whose genome you plan to sequence (of course, the more you know, the easier the process). The main drawback of the method, however, is that it is extremely labor intensive (involving many intermediate steps) and costs an arm and a leg to sequence a eukaryotic genome with good coverage. The quality of a genome sequencing project is defined by how much "coverage" we have of the entire genome. The modern standard is 6-10x coverage. This means that, on average, each nucleotide is sequenced 6-10 times. The higher the coverage, the more confident we are in our nucleotide calls and the higher the probability we have sequenced a majority of the genome. There are diminishing returns to increasing coverage because you end up sequencing the same nucleotides again without getting at the hard-to-sequence regions. This is why 6-10x coverage is considered the ideal (a balance between high coverage and diminishing returns).
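To put rough numbers on the diminishing returns, here is a minimal sketch in Python. It assumes reads land uniformly at random across the genome (the usual Poisson idealization of shotgun sequencing, which is my assumption and not something from the seminar), so the expected fraction of bases sequenced at least once at coverage c is 1 - e^(-c). The function name is just for illustration.

```python
import math

def expected_fraction_covered(coverage):
    """Expected fraction of bases hit by at least one read,
    assuming reads are placed uniformly at random (Poisson model)."""
    return 1.0 - math.exp(-coverage)

# Compare a few coverage levels, including the 6-10x standard.
for c in (0.5, 1, 2, 6, 10):
    frac = expected_fraction_covered(c)
    print(f"{c:>4}x coverage: {frac:.4%} of bases sequenced at least once")
```

Under this idealized model almost everything reachable has been hit by 6x, so going higher mostly re-reads the same bases. The model says nothing about regions that are hard to sequence for biological reasons, which extra coverage does not fix.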

Resequencing projects aim to sequence a portion of the genome, or a set of loci spread throughout the genome, in multiple individuals of an organism with a completely sequenced genome. The sequences are then assembled onto the backbone of the completely sequenced reference genome. The common method is PCR based and similar in technique to the shotgun method, but differs in that it is a targeted process. These studies can give important insights into the function of genes and the evolution of genes and populations.

The new method that I heard about yesterday will allow for whole genome resequencing (as well as targeted resequencing) with much more coverage and for a lower cost. For the cost of resequencing in the classical method with 0.5x coverage, this method can resequence at ~30x in much less time. Yes, I know I sound like an infomercial for 454 (the company behind this technique), but I was pretty impressed and my statements are not influenced by gifts from the company.

The method works by affixing ~1kb ssDNA to tiny beads. The DNA is then replicated, and the replicated strands are also affixed to the beads. They then replicate the DNA using nucleotides that release a pyrophosphate when added to the growing DNA strand. These pyrophosphates can be detected using the method described here. Basically, they can "watch" the DNA strand replicate in real time and record which nucleotide is being added. They then record the order in which nucleotides are added to the chain, and that's the sequence of that fragment.

Presently, the method is good for sequencing 100-200 base pairs (PCR based approaches sequence 700-800bp). Even though they are sequencing smaller fragments, the method is capable of generating more coverage because they sequence many more of those fragments than under the traditional shotgun approach. One major limitation of the method, though, is that it is unable to sequence mononucleotide repeats (e.g., AAAA or GGGG) because all of the nucleotides get added at the same time in the sequencing reaction. Also, because the sequenced fragments are small and there is no mate pair information, the sequences assemble into smaller contigs/scaffolds than under the traditional shotgun approach. This is not a major issue for resequencing projects because the sequences can be assembled on top of the whole genome sequence. If they want their bead based method to be used for de novo sequencing, they must improve sequencing over mononucleotide repeats and increase the size of the sequence reads.
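Here is a toy sketch in Python of why mononucleotide repeats are a problem. It assumes the four nucleotides are offered one type at a time in a fixed repeating order, and that every matching position in a row incorporates during the same offering (that is how I understand pyrosequencing-style chemistry to work, but the flow order, the function names, and the `saturate_at` knob are all made up for illustration). The input string stands for the strand as it is synthesized, not the template.

```python
def flowgram(read, flow_order="TACG", max_cycles=1000):
    """Idealized flows: for each nucleotide offered in turn, count how many
    consecutive positions of the growing strand incorporate it."""
    signals = []
    pos = 0
    flow = 0
    # max_cycles guards against looping forever on unexpected characters.
    while pos < len(read) and flow < max_cycles * len(flow_order):
        base = flow_order[flow % len(flow_order)]
        run = 0
        while pos < len(read) and read[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))  # run == 0 means no incorporation
        flow += 1
    return signals

def call_bases(signals, saturate_at=None):
    """Turn signals back into a sequence. If saturate_at is set, the detector
    cannot distinguish run lengths above that value (hypothetical limit)."""
    called = []
    for base, run in signals:
        if saturate_at is not None:
            run = min(run, saturate_at)
        called.append(base * run)
    return "".join(called)

read = "TTAGGGGCAAAAT"
signals = flowgram(read)
print(signals)
print(call_bases(signals))                  # perfect signal quantification
print(call_bases(signals, saturate_at=1))   # only "something was added"
```

With perfect quantification the read comes back intact, but if the detector can only tell that something was added, GGGG and G look identical and every repeat collapses to a single base. Real signal is noisy rather than strictly saturating, so take this as a cartoon of the problem, not a model of their instrument.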

Despite its limitations, this method could be a welcome advance for researchers interested in resequencing. For instance, if I want to describe patterns of nucleotide variation in my species of interest with a sequenced genome, I could choose a bunch of loci and perform targeted sequencing at about 2x coverage in a bunch of individuals. Using this new method, I could get more coverage at my loci of interest in less time, or perform a whole genome resequencing project in multiple individuals with impressive coverage and not have to worry about choosing loci because I would be sequencing the entire genome. At this point in time, the method has been tested on bacterial genomes (~10 megabases in length), but they plan to extend it to eukaryotic genomes (about 1000 times larger). If it's cheaper, faster, and better (more coverage), this may be the wave of the future in DNA sequencing.
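For a sense of scale, here is a back-of-the-envelope sketch in Python using the numbers from this post: a ~10 megabase genome, ~30x coverage, and reads of 100-200 bp versus 700 bp. It uses the standard relation coverage = (number of reads x read length) / genome size; the function name is mine, and it ignores failed reads and uneven sampling.

```python
import math

def reads_needed(genome_size_bp, read_length_bp, coverage):
    """Number of reads required so that reads * length / genome = coverage."""
    return math.ceil(coverage * genome_size_bp / read_length_bp)

bacterial_genome = 10_000_000            # ~10 megabases, as quoted above
eukaryotic_genome = 1000 * bacterial_genome  # "about 1000 times larger"

for read_len in (100, 200, 700):
    n = reads_needed(bacterial_genome, read_len, 30)
    print(f"{read_len} bp reads: {n:,} reads for 30x of a 10 Mb genome")

print(f"{reads_needed(eukaryotic_genome, 100, 30):,} reads of 100 bp "
      "for 30x of a genome 1000 times larger")
```

The short reads mean millions of reads even for a bacterial genome, which is why the approach only pays off if each read is very cheap and many are produced in parallel.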

2 Comments:

At 11:50 AM, Blogger gaw3 said...

Very, very interesting. How much input DNA do you need? There are these big human genealogy projects, but all they collect is a cheek swab. Right now a lot of those population studies are done on just a few loci, so this approach would be ideal.

 
At 1:04 PM, Blogger RPM said...

They claim that you only need a minimal sample. If you want to resequence an entire genome (i.e., not targeted) you probably need more. For a targeted approach (like a human SNP project), you could probably start with a cheek swab, then use specific primers to amplify the genomic regions in which you have SNP markers. This would be identical to what (I believe) SNP projects do. Then, instead of sequencing the PCR products using the Sanger method, running them on a dHPLC, or using any other traditional method, you would simply begin the bead based protocol with your genomic regions of interest.

 
