In his article "Prospects for Life Science in the 21st Century", Professor Li Baojian stressed the importance of genome research by quoting a sentence from the Science article "The Third Technological Revolution": "The next great era will be the era of the genome revolution, which is now in its initial stage." At the current level of research, almost every topic that touches on an important phenomenon of life is inseparable from the analysis of genes and their functions. On June 26, 2000, the leaders of Britain and the United States, together with the public and private human genome sequencing teams, officially announced to the world that the working draft of the human genome had been completed. Scientists regard this as a sign that life science has entered a new era, the post-genome era. It is therefore necessary to understand the genome, the content of genome research and its progress.

1 Genomics and its research content

The word "genome" was put forward by Winkler in 1920 as a combination of the words "gene" and "chromosome". It describes the complete set of genes and chromosomes in an organism. Watson and Crick discovered the double-helix structure of DNA in 1953, which marked the birth of molecular biology. With the development of many disciplines, biological research has entered a new stage in which different research techniques and methods are combined at the level of biological macromolecules to solve biological problems.

Genome research can be understood as covering three areas: (1) Gene expression profiling, that is, comparing gene expression patterns across different tissues, different developmental stages, normal and disease states, and cells cultured in vitro. Traditional techniques include RT-PCR, the RNase protection assay and Northern blot hybridization, but their disadvantage is that only one gene can be examined at a time. Newer high-throughput expression analysis methods include microarrays, serial analysis of gene expression (SAGE) and DNA chips. (2) Functional study of gene products, that is, proteins, including in vitro protein expression of single genes and proteome studies. (3) Study of protein-protein interactions using the yeast two-hybrid system, the one-hybrid system, the three-hybrid system and the reverse hybrid system.

In 1986, the American scientist Thomas Roderick proposed the term genomics, which refers to the science of genome mapping (including genetic, physical and transcription maps), nucleotide sequence analysis, gene localization and functional analysis of all genes. Genome research therefore has two aspects: structural genomics, which aims at whole-genome sequencing, and functional genomics, which aims at identifying gene function. Structural genomics represents the early stage of genome analysis and focuses on building high-resolution genetic, physical and transcription maps of organisms. Functional genomics represents a new stage of genome analysis, in which the information provided by structural genomics is used to study gene function systematically. It is characterized by high-throughput, large-scale experimental methods combined with statistics and computer analysis. Since the Human Genome Project (HGP) was launched in 1990, great achievements have been made. At the same time, genome projects for model organisms have been under way, and the sequences of several species have been completed one after another. The research focus has shifted from revealing all the genetic information of life to studying function at the molecular level as a whole. The first sign of this shift is the emergence of functional genomics, and the second is the rise of proteomics.

2 Research content of structural genomics

Structural genomics is an important part of genomics. It is the science of determining gene composition and gene location through gene mapping and nucleotide sequence analysis. Genetic information resides on chromosomes, but chromosomes cannot be sequenced directly. The genome, a huge research object, must first be broken down into small regions that are easy to handle. This process is gene mapping. According to the markers and methods used, mapping falls into three types: the construction of high-resolution genetic maps, physical maps and transcription maps of the genome.

2.1 Genetic map

The linear arrangement of genes on a specific chromosome, obtained by analyzing genetic recombination, is called a genetic linkage map. It is built by calculating the recombination frequency between linked genetic markers to determine their relative distance, which is generally expressed in centimorgans (cM; 1 cM corresponds to a recombination frequency of 1% per meiosis). There are many methods for drawing genetic linkage maps, but few linkage maps could be established before DNA polymorphism techniques were developed. With the development of DNA polymorphism analysis, the number of available genetic markers expanded rapidly. Early polymorphic markers included RFLP (restriction fragment length polymorphism), RAPD (random amplified polymorphic DNA) and AFLP (amplified fragment length polymorphism). From the 1980s onward, STR (short tandem repeat, also known as microsatellite) polymorphism analysis appeared, followed in the 1990s by SNP (single nucleotide polymorphism) analysis.
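The relation between recombination frequency and map distance can be sketched in a few lines of Python. This is only an illustration of the definition given above (1 cM = 1% recombination); the function names and the example counts are hypothetical. The simple linear conversion is a reasonable approximation only for closely linked markers, because double crossovers cause it to underestimate larger distances.

```python
def recombination_frequency(recombinants: int, total: int) -> float:
    """Fraction of scored offspring that are recombinant for two linked markers."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= recombinants <= total:
        raise ValueError("recombinants must be between 0 and total")
    return recombinants / total


def map_distance_cm(recombinants: int, total: int) -> float:
    """Map distance in centimorgans, using the definition 1 cM = 1% recombination.

    Assumption: markers are close enough that double crossovers can be ignored.
    """
    return 100.0 * recombination_frequency(recombinants, total)


if __name__ == "__main__":
    # Hypothetical cross: 12 recombinant offspring out of 200 scored.
    print(f"{map_distance_cm(12, 200):.1f} cM")
```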

2.2 Physical map

A physical map is a map in which chromosomes are cut into fragments by restriction endonucleases, and the order of the fragments and the physical distance between genetic markers, measured in base pairs (bp), kilobases (kb) or megabases (Mb), are then determined from overlapping sequences. Taking the physical map of the human genome as an example, it has two meanings. The first is to obtain 30,000 sequence-tagged sites (STSs, defined as single-copy sequences with a known chromosomal position that can be amplified by PCR) distributed throughout the genome. To obtain them, the cDNA of a target gene is cloned and sequenced; about 200 bp of sequence is determined at each end; primers are designed and synthesized and used for amplification with cDNA and genomic DNA as templates respectively; the specific bands are compared and purified; and radioactive probes prepared from the STSs are hybridized in situ with the genome. The result is on average one marker every 100 kb. The second is to construct sets of large overlapping fragments covering each chromosome. First, YACs (yeast artificial chromosomes) carrying inserts of several hundred kb are constructed and mapped to obtain contiguous sets of overlapping YAC clones, which is called low-resolution physical mapping. Mapping is then carried out at the level of DNA fragments of tens of kb: the YACs are randomly sheared and subcloned into cosmids, which is called high-resolution physical mapping.
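The figure of one marker per 100 kb follows from simple arithmetic, sketched below. The haploid human genome size of roughly 3 billion bp is an assumption added here for illustration (the text gives only the number of STSs and the resulting spacing), and the function name is hypothetical.

```python
# Assumed approximate size of the haploid human genome in base pairs.
HUMAN_GENOME_BP = 3_000_000_000


def mean_marker_spacing_kb(genome_bp: int, n_markers: int) -> float:
    """Average distance between evenly distributed markers, in kilobases."""
    if n_markers <= 0:
        raise ValueError("n_markers must be positive")
    return genome_bp / n_markers / 1000


if __name__ == "__main__":
    # 30,000 STSs spread over about 3 billion bp.
    print(mean_marker_spacing_kb(HUMAN_GENOME_BP, 30_000), "kb per marker")
```

Real markers are not evenly spaced, so this is an average density rather than a guaranteed maximum gap.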

2.3 Transcription map

A molecular genetic map constructed with ESTs as markers is called a transcription map. The 5' or 3' terminal sequences of partial cDNAs, obtained from clones picked at random from a cDNA library, are called expressed sequence tags (ESTs) and are generally about 300 to 500 bp long. In general, the 3' untranslated region (3'-UTR) of an mRNA is a relatively specific sequence that represents an individual gene. By locating the EST sequences corresponding to 3'-UTRs with radiation hybrid (RH) mapping, an STS map composed of genes can be formed. By the end of 1998, the number of plant ESTs released in the database of the National Center for Biotechnology Information (NCBI) in the United States had reached tens of thousands, and the number of human ESTs had exceeded 1.8 million. These ESTs not only provide a large number of molecular markers for constructing genome genetic maps, but also provide valuable information for studying gene function in different tissues and organs. In addition, EST programs provide candidate genes for gene identification. Their disadvantage is that random sequencing sometimes fails to capture genes expressed at low abundance and genes whose expression is induced only under special environmental conditions (such as biotic and abiotic stress). Genome sequencing is therefore necessary to make up for the deficiencies of EST projects. By analyzing the genome sequence, we can obtain complete information on genome structure, such as the order of genes on chromosomes, the structure of intergenic regions, the structure of promoters and the distribution of introns.

3 Functional genomics research

Functional genomics, also known as post-genomics, uses the information and products provided by structural genomics to develop and apply new experimental methods for analyzing gene function comprehensively at the genome or system level, so that biological research shifts from the study of a single gene or protein to the systematic study of many genes or proteins at the same time. It is the study of the dynamic biological function of the genome once its static base sequence has been determined. The research content includes gene function discovery, gene expression analysis and mutation detection. The functions of a gene include biochemical functions, such as phosphorylating specific proteins as a protein kinase; cellular functions, such as participating in intercellular and intracellular signal transduction pathways; and developmental functions, such as participating in morphogenesis. Classical methods for analyzing gene expression include subtractive hybridization, differential screening, representational difference analysis of cDNA and mRNA differential display, but these techniques cannot analyze genes comprehensively and systematically. New technologies have therefore emerged, including serial analysis of gene expression (SAGE), cDNA microarrays and DNA chips. The most effective way to identify gene function is to observe the phenotypic changes at the cellular and whole-organism level after gene expression is blocked or increased, so it is necessary to establish model organisms.
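Comparing expression between two states, the core idea behind the expression analysis methods named above, can be illustrated with a log2 fold change. The sketch below is not tied to any particular platform: the gene names, counts, pseudocount and threshold are all hypothetical choices made for the example.

```python
import math


def log2_fold_change(test: float, reference: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of two expression values; the pseudocount avoids division by zero."""
    return math.log2((test + pseudocount) / (reference + pseudocount))


def differential_genes(reference: dict, test: dict, threshold: float = 1.0) -> dict:
    """Genes measured in both states whose |log2 fold change| meets the threshold."""
    changed = {}
    for gene in sorted(reference.keys() & test.keys()):
        lfc = log2_fold_change(test[gene], reference[gene])
        if abs(lfc) >= threshold:
            changed[gene] = lfc
    return changed


if __name__ == "__main__":
    # Hypothetical expression values for three genes in two states.
    normal = {"geneA": 15, "geneB": 100, "geneC": 7}
    disease = {"geneA": 63, "geneB": 100, "geneC": 1}
    for gene, lfc in differential_genes(normal, disease).items():
        print(gene, f"{lfc:+.1f}")
```

A real analysis would also need replicates and a statistical test; a fold change alone only ranks candidates.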

Comparative genomics is a discipline based on genome mapping and sequencing. By comparing known genes and genome structures, it seeks to understand gene function, expression mechanisms and the evolution of species. Using the homology in coding sequence and structure between model organism genomes and the human genome, we can clone human disease genes, reveal the molecular mechanisms of gene function and disease, and clarify the evolutionary relationships of species and the internal structure of genomes. Some general rules have already been drawn from research on model organism genomes: these genomes are generally small, but the proportion of coding genes is high, and there are few repetitive and non-coding sequences; their G+C content is relatively high; the structure and organization of introns and exons are conserved, and splice sites are consistent across many organisms; DNA shows redundancy, that is, duplication; most core biological functions are carried out by a considerable number of orthologous proteins; and linked homologous genes keep the same linkage relationships in different genomes (collinearity). Research on model organism genomes helps reveal the function of human disease genes, allows human disease genes to be cloned through sequence homology, and, by exploiting the experimental systems of model organisms, allows complex traits to be analyzed through comparative mapping, thus deepening our understanding of genome structure. In addition, the field covers the identification of unknown genes by mutagenesis, the study of genomic diversity and the application of bioinformatics.
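One of the features compared across genomes above is G+C content. As a minimal sketch, it can be computed from a DNA sequence as follows; the function name is hypothetical, and ignoring ambiguous symbols such as N is a design choice made for this example.

```python
def gc_percent(sequence: str) -> float:
    """Percentage of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


if __name__ == "__main__":
    # Hypothetical short sequence; real comparisons use whole genomes or large windows.
    print(f"{gc_percent('ATGCGCNNTA'):.1f}% G+C")
```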

4 Proteomics research

Genes are the carriers of genetic information, but proteins are the executors of biological functions. Proteins have their own laws of activity, and studying them only from the perspective of genes is not enough. Only by studying the whole process from gene transcription to protein translation can we truly reveal the laws of life activities. A new discipline, proteomics, has therefore emerged to study the composition and activity of the proteins in cells. The term proteome was first proposed by Wilkins and Williams of Macquarie University in 1994 and appeared in the journal Electrophoresis in 1995. It refers to all the proteins expressed by all genes and their modes of existence, that is, all the protein components expressed by a genome, a cell or a tissue. Proteomics studies the specific proteome present at different times and in different places, together with its functions. It explores the modes of action, mechanisms, regulation and interactions of proteins at the protein level, providing a theoretical basis for clinical diagnosis, pathology, drug screening, drug development and the study of metabolic pathways. Proteomics aims to elucidate the expression patterns and functional patterns of all proteins in an organism, including the identification of protein expression, mode of existence (modified forms), structure, function and interactions. It differs from traditional protein science in that it works at the level of the whole protein complement of an organism or its cells and seeks the laws of life in the overall protein activity of the organism or cell. However, because proteins are diverse, variable and complex, and low-abundance proteins are very hard to detect, the difficulty of this research should not be underestimated. In general, the research can be divided into two aspects: the expression pattern of the proteome (its composition) and the functional pattern of the proteome (currently focused on protein interaction networks).
Proteome research can provide the following information: whether and when the gene products predicted from the gene sequence are translated; the relative concentrations of gene products; the extent of post-translational modification; and so on. Because the number of proteins present is smaller than the number of open reading frames in the genome, the concept of functional proteomics was proposed. The functional proteome refers to the proteins that the genome actively expresses at a specific time and under specific environmental and experimental conditions, and it is only a part of the total proteome. Functional proteomics lies between traditional research on a single protein and proteome research on all proteins: it studies a group of proteins related to a certain function or present under certain conditions.

Analyzing and identifying the composition of the proteome requires characterizing the proteins, which involves two steps: separation and identification. Two-dimensional gel electrophoresis (2-DGE) and mass spectrometry (MS) are the main techniques. In recent years, these technologies and the associated bioinformatics have developed rapidly. The technical system of proteome research includes sample preparation, two-dimensional polyacrylamide gel electrophoresis, protein staining, gel image analysis, protein analysis and proteome databases. The three key technologies are two-dimensional gel electrophoresis, mass spectrometry, and computer image data processing together with protein databases.
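To give a sense of how mass spectrometry data can be linked to a protein database, the sketch below computes the monoisotopic mass of a peptide from standard amino acid residue masses and finds candidate peptides within a mass tolerance. It is a simplified illustration and not a description of any specific identification software: the function names, the candidate list and the tolerance are hypothetical, and modifications and charge states are ignored.

```python
# Standard monoisotopic residue masses in daltons (amino acid minus water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water molecule is added for the free termini


def peptide_mass(peptide: str) -> float:
    """Monoisotopic mass of an unmodified, uncharged peptide."""
    if not peptide:
        raise ValueError("peptide must not be empty")
    try:
        return sum(RESIDUE_MASS[aa] for aa in peptide.upper()) + WATER
    except KeyError as exc:
        raise ValueError(f"unknown residue: {exc.args[0]}") from None


def matching_peptides(observed_mass: float, candidates: list, tolerance: float = 0.01) -> list:
    """Candidate peptides whose calculated mass lies within the tolerance (in Da)."""
    return [p for p in candidates if abs(peptide_mass(p) - observed_mass) <= tolerance]


if __name__ == "__main__":
    # Hypothetical candidate list standing in for a protein database digest.
    print(matching_peptides(146.069, ["GA", "AG", "GG"]))
```

The example also shows a limitation: peptides with the same composition but a different order have the same mass, so mass alone cannot distinguish them and further fragmentation or sequence data are needed.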

5 The birth of genomics-related disciplines

As genomics research deepens, human beings can expect to reveal previously unknown laws of the living world, uncover the mysteries of life and then harness them to serve society and the economy. The intersection of genome research with other disciplines has given birth to new disciplines such as nutritional genomics, environmental genomics, pathogenic genomics, reproductive genomics and population genomics. Among the related disciplines, bioinformatics is becoming the supporting pillar of new industries that attract much attention.

Bioinformatics is a science that takes biological macromolecules as its object, uses computers as its tools, and applies the viewpoints, theories and methods of mathematics and information science to study life phenomena and to organize and analyze exponentially growing biological data. Its research focuses on genomics and proteomics. Its primary objects are DNA, the carrier of genetic information, and the macromolecules that DNA encodes. Using computers and interdisciplinary methods, it looks for regularities in biological information and develops the software needed to collect, organize, publish, extract, process, analyze and mine the sequences and structures of DNA and proteins. It consists of three parts: databases, computer networks and application software. Its research focuses include sequence comparison, gene identification and DNA sequence analysis, protein structure prediction, molecular evolution, and knowledge discovery in databases (KDD). The main scientific problems in this field are: continuing to build and optimize databases; studying new database theories, technologies and software; comparing and analyzing important algorithms; analyzing the information structure of the human genome; studying the origin of the genetic code and biological evolution on the basis of bioinformatics data; and training bioinformatics professionals and establishing national biomedical databases and service systems [5]. The biological data accumulated by the end of the 20th century will lead to new theoretical or major scientific discoveries. Bioinformatics is research based on databases and knowledge discovery. It is bringing revolutionary changes to life science and will have a great impact on medicine, health, food, agriculture and other industries.
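Sequence comparison, the first research focus listed above, can be illustrated with a small dynamic-programming example. The sketch below computes a global alignment score in the style of the Needleman-Wunsch algorithm; the scoring values (match +1, mismatch -1, gap -2) and the function name are arbitrary choices made for illustration, and real tools use substitution matrices and more refined gap penalties.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Best global alignment score of two sequences (Needleman-Wunsch style)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against an empty sequence costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = score[i - 1][j] + gap  # a[i-1] aligned to a gap
            gap_in_a = score[i][j - 1] + gap  # b[j-1] aligned to a gap
            score[i][j] = max(diagonal, gap_in_b, gap_in_a)

    return score[-1][-1]


if __name__ == "__main__":
    # Hypothetical sequences differing by one deleted base.
    print(global_alignment_score("ACGT", "ACT"))
```

The table has one cell per pair of prefixes, so time and memory grow with the product of the two sequence lengths, which is why database-scale searches rely on faster heuristic methods.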

Speaking about life science in the 21st century, Professor Zou Chenglu said that biology made great progress in the 20th century, and that the extensive and profound penetration of mathematical science into biology has revealed the mysteries of life at a new level and completely changed the face of biology. Biology is not only a hot spot in the current development of natural science, but will remain so in the 21st century. Scientists call the 21st century the information age. The combination of biological science and information science is undoubtedly the inevitable result of multidisciplinary development.