Whole gene synthesis is a very mature technology. The general practice is to design and synthesize overlapping single-stranded oligonucleotides and assemble the full-length gene by overlap extension PCR. There is plenty of material online about whole gene synthesis methods, and there are hundreds of patents on whole gene synthesis alone. Common methods include overlap extension PCR (OE-PCR), dual asymmetric PCR (DA-PCR) [2], polymerase cycling assembly (PCA) [3], ligase chain reaction (LCR) [4], and thermodynamically balanced inside-out (TBIO) PCR. To be honest, I have not studied these methods carefully. Whatever they are called, they all come down to the same thing: PCR in which overlapping short oligos are gradually extended into longer fragments by a polymerase.
What is the easiest way to synthesize the whole gene?
Have a DNA synthesis company do it, of course. We only need to provide the DNA sequence. The company synthesizes the dsDNA, clones it into a general-purpose vector, and usually provides sequencing results to confirm that the synthesis is correct. This is undoubtedly the simplest and most convenient route. Whole gene synthesis is also very cheap now, at less than one dollar per bp.
If DNA synthesis companies are so convenient, why synthesize a gene yourself?
① Company synthesis is slow, generally taking 1-2 weeks. With a difficult sequence, such as a coding sequence that is extremely toxic to Escherichia coli, the turnaround is hard to predict (I once ordered a nuclease gene that the company still had not delivered after a month; I then made it myself in a week).
② It is not flexible. The company usually delivers the target gene in a recombinant vector, so after receiving it you have to cut the gene out with restriction enzymes, and the gene itself must not contain those restriction sites. These are generally not big problems, but there is nothing you can do about them.
③ As mentioned above, whole gene synthesis is generally used for heterologous gene expression, and most heterologous expression targets are enzymes, so a large number of mutants may be needed to study the enzyme's properties. The synthesis company provides only one sequence, and each mutant then has to be built separately with specially designed primers. If you synthesize the gene yourself, you can obtain various mutants at the same time by swapping in primers that carry the mutations, which is a clear advantage when building mutants with many mutations.
④ The sequence needs to be kept confidential. After all, you are the one you can trust most.
Some people always prefer to do things themselves. In this article I introduce two methods for synthesizing a gene yourself:
Method 1: one-step assembly based on "bridge" PCR
This method relies on primers annealing to one another and extending with each other as templates, so the primers must alternate between forward and reverse. First, the whole gene sequence is broken into short oligonucleotides, generally no longer than 59 nt, because oligo synthesis providers usually treat 59 nt as a cutoff: above it, both price and turnaround time rise sharply. Each oligonucleotide anneals to the complementary sequence at its 3' end to form a gapped double-stranded product, and DNA polymerase then fills the gaps to give a nicked double-stranded product. The nicks can be sealed with Taq DNA ligase to form a complete double-stranded product, which serves as the template for PCR amplification of the target gene; alternatively, the nicked product can be used directly as the PCR template.
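The tiling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the overlap is a fixed 20 nt, whereas the design step later in this article sets overlaps by Tm. The sketch also does not force the oligo count to be even.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def split_alternating(gene, oligo_len=59, overlap=20):
    """Tile `gene` with overlapping oligos that alternate forward / reverse.

    Returns a list of (strand, start, end, sequence) tuples, where start/end
    are top-strand coordinates and sequence is written 5'->3'.
    oligo_len and overlap are illustrative defaults, not prescribed values.
    """
    gene = gene.upper()
    step = oligo_len - overlap
    oligos, start, index = [], 0, 0
    while True:
        end = min(start + oligo_len, len(gene))
        piece = gene[start:end]
        if index % 2 == 0:
            oligos.append(("F", start, end, piece))                      # top strand
        else:
            oligos.append(("R", start, end, reverse_complement(piece)))  # bottom strand
        if end == len(gene):
            return oligos
        start += step   # the next oligo begins inside the previous overlap
        index += 1
```

If the list comes back with an odd number of oligos, the lengths of the last few would still need manual adjustment so that the final oligo is a reverse one.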
Method 2: stepwise assembly based on step-by-step extension
In this method only the last oligo is reverse; all the others are forward, and adjacent forward oligos share overlapping sequences. The last oligo and the penultimate oligo anneal to each other through their complementary ends and are extended into a double strand in the first PCR cycle. The extended double strand then anneals to the third-from-last oligo and is extended again, and so on until the full-length sequence is synthesized. In theory, each PCR cycle can add only one oligo, so n oligos need at least n − 1 PCR cycles. Because extension proceeds from only one end, primer design is simpler than in method 1, and the number of primers does not have to be even.
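To make the cycle-by-cycle growth concrete, here is a small idealized simulation. It tracks only coordinates, not sequences, and assumes perfect annealing and extension in every cycle. The function names and the 59 nt / 20 nt defaults are illustrative assumptions. In this model the first cycle joins the last two oligos, so n oligos take n − 1 cycles, close to the one-cycle-per-oligo estimate above.

```python
def tile(gene_len, oligo_len=59, overlap=20):
    """Return (start, end) top-strand intervals that tile the gene with overlaps."""
    step = oligo_len - overlap
    intervals, start = [], 0
    while True:
        end = min(start + oligo_len, gene_len)
        intervals.append((start, end))
        if end == gene_len:
            return intervals
        start += step


def simulate_stepwise(gene_len, oligo_len=59, overlap=20):
    """Return [(cycle, product_start, product_end), ...] for stepwise extension."""
    intervals = tile(gene_len, oligo_len, overlap)
    # The product starts as the last (reverse) oligo and grows upstream.
    product_start, product_end = intervals[-1]
    history = []
    for cycle, (start, end) in enumerate(reversed(intervals[:-1]), start=1):
        if end <= product_start:
            raise ValueError("oligo does not overlap the growing product")
        product_start = start          # one more forward oligo incorporated
        history.append((cycle, product_start, product_end))
    return history
```

Real reactions run more cycles than this minimum, because not every molecule is extended in every cycle.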
1. Design PCR primers.
You can use an automatic design tool (introduced in detail later) or design the primers manually. For manual design I recommend SnapGene (I won't go on about how powerful this software is; cracked versions circulate online, so if you have not installed it, search Baidu yourself). After pasting in the complete gene sequence, first open the "Preferences" panel, find the "Primer" option, and set the minimum match length and the minimum Tm at the 3' end to 10 bp and 40℃ respectively.
Mispriming at the 3' end is very harmful to the gene synthesis methods in this article. If these two thresholds are so low that primers become hard to design, they can be raised moderately, to 12 bp and 45℃. Failing that, the only option is to change the codons and redesign.
The primer length is preset to 59 nt, and the design effort goes into the overlap regions, whose lengths are generally set according to their Tm. Balancing Tm across the overlap regions is very important: the Tm values should not differ by more than 3℃, and a Tm of 55-58℃ is recommended. Note that for the first method in this article the number of primers must be even. Primers are drawn one by one from the start of the sequence, each new one starting from the overlap region at the end of the previous one, alternating forward (top strand) and reverse (bottom strand). If the count comes out odd, adjust the lengths of the last few primers so that the final primer is a reverse primer.
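The Tm-balancing rule can be checked with a short script. The sketch below uses the Tm formula quoted later in this article for my tool (Tm = 64 + 0.41×GC − 528/n), reading GC as the percentage of G+C and n as the overlap length; that reading is an assumption, as are the function names and the 15-30 nt search window.

```python
def tm_overlap(seq):
    """Tm = 64 + 0.41*GC - 528/n, with GC taken as percent G+C (assumed) and n as length."""
    seq = seq.upper()
    n = len(seq)
    gc_percent = 100.0 * (seq.count("G") + seq.count("C")) / n
    return 64 + 0.41 * gc_percent - 528 / n


def pick_overlap(gene, end, target_tm=55.0, min_len=15, max_len=30):
    """Grow an overlap leftwards from position `end` until its Tm reaches target_tm.

    Assumes end >= max_len. Falls back to the longest allowed overlap
    if the target is never reached.
    """
    for length in range(min_len, max_len + 1):
        overlap = gene[end - length:end]
        if tm_overlap(overlap) >= target_tm:
            return overlap
    return gene[end - max_len:end]


def tm_spread(overlaps):
    """Difference between the highest and lowest overlap Tm, in degrees C."""
    tms = [tm_overlap(o) for o in overlaps]
    return max(tms) - min(tms)
```

A design would pass the balance rule above when `tm_spread(...)` is at most 3. SnapGene uses a different Tm algorithm, so its numbers will not match these exactly.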
For the second method, draw one forward oligo at the beginning of the sequence and make all the rest reverse through to the end; alternatively, draw one reverse oligo at the end and make all the rest forward back to the beginning. This method places no constraint on the number of primers.
2. Obtain the full-length product by PCR.
Generally, two rounds of PCR are needed. In the first round only a small amount of each oligo is added: 0.5-1 pmol per oligo and 10-15 cycles in a 50 µL reaction are recommended to obtain the full-length template. The DNA polymerase used in this round should lack both 3'→5' and 5'→3' exonuclease activity, because exonuclease activity would destroy the Tm balance of the overlap regions. In the second round, 1 µL of the first-round PCR product is used as template with 20 pmol of the full-length upstream and downstream primers, and the target gene is obtained in 20-25 cycles. This round requires a high-fidelity DNA polymerase.
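The amounts above reduce to simple unit arithmetic, since 1 µM equals 1 pmol/µL. A minimal sketch, assuming the oligos and primers arrive as stocks of known µM concentration and that the assembly oligos are pooled in equal volumes before use (both are assumptions, not part of the protocol above; the function names are hypothetical):

```python
def pool_volume_ul(n_oligos, stock_um, pmol_per_oligo):
    """µL of an equal-volume oligo pool that delivers pmol_per_oligo of EACH oligo.

    Pooling n oligos in equal volumes dilutes each one n-fold,
    and 1 µM is the same as 1 pmol/µL.
    """
    each_um = stock_um / n_oligos
    return pmol_per_oligo / each_um


def primer_volume_ul(pmol, stock_um):
    """µL of a primer stock needed to deliver `pmol` of that primer."""
    return pmol / stock_um
```

For example, ten oligos at 10 µM pooled in equal volumes end up at 1 µM each, so 1 µL of the pool carries 1 pmol of every oligo, and 20 pmol of a 10 µM primer is 2 µL.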
3. Cloning into expression vector.
The target gene is inserted into the vector by homologous recombination, restriction digestion and ligation, or other methods, and transformed into competent cells of Escherichia coli or another host to obtain single colonies. It is best to design the homology arms or restriction sites for the cloning/expression vector in advance and append them to the gene sequence before synthesis; otherwise, separate primers must be designed later to add them.
4. Sequencing and identification
Positive clones are identified by PCR and confirmed by sequencing. Correct clones can be used for downstream expression and purification.
Optimizer: /gems (I could not open the link).
GeneDesign [12]: published in Genome Research. The link given in the paper is /tools/ 104.html, which does open.
I also wrote a tool myself. It covers codon analysis and optimization, automatic conversion of a gene into oligonucleotides, and generation of a reference experimental protocol. The oligos it produces are uniform in length, and their overlap regions have the same Tm. The Tm formula is Tm = 64 + 0.41×GC − 528/n; see my article on PCR primer design for details. The generated oligos can be copied with one click in the format oligo + serial number + space + Tab + sequence (5'-3'), which can be imported directly into SnapGene.
The primers output by my program can be recognized by SnapGene. Click the Primers tool in the menu bar, click the Import Primers from List option, and select Import Sequences from Clipboard.
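The list format described above (name, a space, a Tab, then the 5'-3' sequence) takes only a few lines to produce. A minimal sketch; the function name `format_oligos` is hypothetical, and whether a given SnapGene version accepts exactly this layout is something to verify rather than assume:

```python
def format_oligos(sequences, prefix="oligo"):
    """Return one line per oligo: '<prefix><number> <TAB><sequence 5'->3'>'."""
    lines = []
    for number, seq in enumerate(sequences, start=1):
        lines.append(f"{prefix}{number} \t{seq.upper()}")
    return "\n".join(lines)
```

The resulting text can be copied to the clipboard and imported as described above.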
You can check whether each primer completely covers the target sequence, whether the "head and tail" of the primers conflict, and so on.
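The same coverage check can be done outside SnapGene. The sketch below is an assumption-laden illustration with hypothetical function names: it locates each oligo on the gene by exact string search, trying the reverse complement if the forward search fails, and reports any positions no oligo covers. Exact search returns only the first match, so it can misplace oligos in highly repetitive sequences.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def locate(gene, oligo):
    """Return (start, end, strand) of the oligo on the gene, or None if absent."""
    gene, oligo = gene.upper(), oligo.upper()
    pos = gene.find(oligo)
    if pos >= 0:
        return pos, pos + len(oligo), "+"
    pos = gene.find(reverse_complement(oligo))
    if pos >= 0:
        return pos, pos + len(oligo), "-"
    return None


def uncovered_positions(gene, oligos):
    """Return the 0-based gene positions that no oligo covers."""
    covered = [False] * len(gene)
    for oligo in oligos:
        hit = locate(gene, oligo)
        if hit is None:
            raise ValueError(f"oligo not found on gene: {oligo}")
        for i in range(hit[0], hit[1]):
            covered[i] = True
    return [i for i, is_covered in enumerate(covered) if not is_covered]
```

An empty list means the oligo set covers the whole target sequence.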
Test with GFP as target sequence:
① Find the green fluorescent protein (GFP) sequence on NCBI and paste it into the text box. First, analyze codon preference;
Rare codons for Escherichia coli are marked in red. The native GFP gene contains many of them, so codon optimization can be carried out first.
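A rough version of this rare-codon scan is easy to write. The codon set below is a commonly cited list of E. coli rare codons (those for which supplementary tRNA strains are often used); it is an assumption for illustration and not necessarily the list my tool uses. The function name is also hypothetical.

```python
# Commonly cited E. coli rare codons (illustrative; not an authoritative list).
RARE_CODONS = {"AGG", "AGA", "ATA", "CTA", "CCC", "GGA", "CGG"}


def rare_codons(cds):
    """Return [(codon_number, codon), ...] for rare codons in an in-frame CDS.

    Codon numbers are 1-based; trailing bases that do not fill a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return [(number, codon)
            for number, codon in enumerate(codons, start=1)
            if codon in RARE_CODONS]
```

For the toy sequence `ATGAGGAAACTATAA` this flags codon 2 (AGG) and codon 4 (CTA).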
② Oligos can be generated in two ways. The default is method 1. If the primers are highly similar to one another, you will be prompted to use method 2.
Click the Generate Oligonucleotide button. A summary of the total number of bases pops up, then the serial numbers and sequences of the oligonucleotides are output, and the experimental protocol button and copy button appear at the same time. With one-click copy, the oligos can be imported into SnapGene for analysis.
③ Gene analysis
The GFP gene is completely and evenly covered by the primers, and the Tm values between primers are basically the same (because the program's Tm algorithm differs from SnapGene's, only approximate agreement can be seen in SnapGene).
④ Generate the experimental protocol
In vitro whole gene synthesis generally should not exceed 1000 bp per assembly; the longer a single assembly, the higher the probability of errors. I recommend splitting the gene into segments of about 800 bp, and the program recommends the number of segments based on the length of the sequence you enter.
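One simple way to turn the 800 bp guideline into a segment count is to divide the gene into the fewest near-equal segments that each stay at or below the limit. This is a sketch of that idea only; the exact rule my program applies may differ, and the function name is hypothetical. Real segments would also need overlaps or joining sites between them, which this ignores.

```python
import math


def recommend_segments(gene_len, max_segment=800):
    """Return (segment_count, segment_lengths) with every segment <= max_segment."""
    count = max(1, math.ceil(gene_len / max_segment))
    base, extra = divmod(gene_len, count)
    # Spread any remainder over the first `extra` segments.
    lengths = [base + 1 if i < extra else base for i in range(count)]
    return count, lengths
```

For a 2000 bp gene this gives three segments of 667, 667 and 666 bp, while a 720 bp gene stays in one piece.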
I hope this tool can help you complete the whole gene synthesis. I also wrote a tool to assist vector construction, which will be introduced later.
References
[1] Prodromou, C. and Pearl, L.H. (1992) Recursive PCR: a novel technique for total gene synthesis. Protein Eng., 5, 827–829.
[2] Sandhu, G.S., Aleff, R.A. and Kline, B.C. (1992) Dual asymmetric PCR: one-step construction of synthetic genes. BioTechniques, 12, 14–16.
[3] Stemmer, W.P., Crameri, A., Ha, K.D., Brennan, T.M. and Heyneker, H.L. (1995) Single-step assembly of a gene and entire plasmid from large numbers of oligodeoxyribonucleotides. Gene, 164, 49–53.
[4] Au, L.C., Yang, F.Y., Yang, W.J., Lo, S.H. and Kao, C.F. (1998) Gene synthesis by a LCR-based approach: high-level production of leptin-L54 using synthetic gene in Escherichia coli. Biochem. Biophys. Res. Commun., 248, 200–203.
[5] Gao, X., Yo, P., Keith, A., Ragan, T.J. and Harris, T.K. (2003) Thermodynamically balanced inside-out (TBIO) PCR-based gene synthesis: a novel method of primer design for high-fidelity assembly of longer gene sequences. Nucleic Acids Res., 31, e143.
[6] Xiong, A.S., Yao, Q.H., Peng, R.H., Li, X., Fan, H.Q., Cheng, Z.M. and Li, Y. (2004) A simple, rapid, high-fidelity and cost-effective PCR-based two-step DNA synthesis method for long gene sequences. Nucleic Acids Res., 32, e98.
[7] Young, L. and Dong, Q. (2004) Two-step total gene synthesis method. Nucleic Acids Res., 32, e59.
[8] Hoover, D.M. and Lubkowski, J. (2002) DNAWorks: an automated method for designing oligonucleotides for PCR-based gene synthesis. Nucleic Acids Res., 30, e43.
[9] Rouillard, J.-M., Lee, W., Truan, G., Gao, X., Zhou, X. and Gulari, E. (2004) Gene2Oligo: oligonucleotide design for in vitro gene synthesis. Nucleic Acids Res., 32, W176–W180.
[10] Rydzanicz, R., Zhao, X.S. and Johnson, P.E. (2005) Assembly PCR oligo maker: a tool for designing oligodeoxynucleotides for constructing long DNA molecules for RNA production. Nucleic Acids Res., 33, W521–W525.
[11] Jayaraj, S., Reid, R. and Santi, D.V. (2005) GeMS: an advanced software package for designing synthetic genes. Nucleic Acids Res., 33, 3011–3016.
[12] Richardson, S.M., Wheelan, S.J., Yarrington, R.M. and Boeke, J.D. (2006) GeneDesign: rapid, automated design of multikilobase synthetic genes. Genome Res., 16, 550–556.