① Import a Seurat object that has already been cleaned, normalized and clustered.
② Import a Seurat object without any processing.
Let's briefly introduce the Monocle package, and then try it in these two cases.
Why try these two situations?
http://cole-trapnell-lab.github.io/monocle-release/
Introduction
Monocle introduced the strategy of analyzing single-cell trajectories from RNA-Seq data: it orders cells in pseudotime and shows their trajectory through a biological process such as cell differentiation. Monocle learns this trajectory from the data by unsupervised or semi-supervised learning.
Unsupervised: use Monocle's own tools or Seurat to generate the gene list.
Semi-supervised: based on your own prior knowledge, manually supply genes that are considered important.
Rather than purifying cells into discrete states experimentally, Monocle uses an algorithm to learn the sequence of gene expression changes that each cell must undergo as part of a dynamic biological process. Once it has learned the overall "trajectory" of gene expression changes, Monocle can place each cell at the appropriate position along it. We can then use Monocle's differential analysis toolkit to find the genes that are regulated over the course of the trajectory. If the process has multiple outcomes, Monocle reconstructs a "branched" trajectory. These branches correspond to cellular "decisions", and Monocle provides powerful tools to identify the genes that are affected by them and involved in making them. The website also describes how to analyze branches. Monocle relies on a machine learning technique called reversed graph embedding to construct single-cell trajectories.
Besides constructing single-cell trajectories, Monocle can also perform differential expression analysis and clustering to reveal important genes and cells. In this respect it is similar to Seurat.
Workflow and its similarities and differences with Seurat
res.0.6 is the column that holds each cell's cluster label. Rename this column to "cluster" for later use.
Make sure that the values in this column range from 0 to 8, that is, 9 clusters.
This calculation is done now so that later analysis steps can use the results.
Because these data have already been cleaned, the cleaning does not need to be repeated in Monocle.
According to the author's suggestion, however, even data that has been normalized in Seurat still needs to be normalized again after it is converted into a Monocle object.
First of all, because a cell type can be subdivided into smaller categories, we should take these relationships into account when assigning cell types by marker. For example, cells that express the CD4 gene are CD4+ T cells, and CD4+ T cells are a kind of T cell, so we should tell Monocle that CD4+ T cells are a subset of T cells, so that it does not treat them as two separate categories during classification.
Monocle provides the function newCellTypeHierarchy to define such a hierarchy of cell types.
Match markers to cell types and define their parent-child relationships.
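The idea of a marker-based hierarchy can be illustrated outside of R. The following is only a minimal Python sketch of the concept, not Monocle's API: the CellType class, the classify function, the expression thresholds and the marker genes CD3D and CD79A are all assumptions made for the example (only CD4 comes from the text above). A child type is tested only for cells that already matched its parent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Cell = Dict[str, float]  # gene name -> expression value (hypothetical layout)


@dataclass
class CellType:
    name: str
    is_member: Callable[[Cell], bool]  # marker-based membership test
    parent: Optional[str] = None       # None means a top-level type


def classify(cell: Cell, types: List[CellType]) -> str:
    """Walk down the hierarchy and return the most specific matching type."""
    label: str = "Unknown"
    parent: Optional[str] = None
    while True:
        # Only types whose parent is the current label are candidates.
        matches = [t for t in types if t.parent == parent and t.is_member(cell)]
        if len(matches) == 0:
            return label        # nothing more specific: keep the current label
        if len(matches) > 1:
            return "Ambiguous"  # the cell matches several sibling types
        label = parent = matches[0].name


# Illustrative hierarchy; markers and thresholds are assumptions.
types = [
    CellType("T cell", lambda c: c.get("CD3D", 0) >= 1),
    CellType("CD4+ T cell", lambda c: c.get("CD4", 0) >= 1, parent="T cell"),
    CellType("B cell", lambda c: c.get("CD79A", 0) >= 1),
]

print(classify({"CD3D": 2, "CD4": 3}, types))
```

Because "CD4+ T cell" is declared as a child of "T cell", a cell with CD4 but without the T cell marker is never labeled as a CD4+ T cell in this sketch.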
The next step is to classify the cells.
Check the cell classification.
View the variables that can be used for color discrimination:
Normally this step would include a time variable (such as hours) that distinguishes batches processed at different times, so that the cells can be colored by time point and changes in cell state (development/differentiation) can be followed over time. Although the spleen data were processed at four time points, the cells are not labeled by time point, so the initial state can only be inferred from the known course of cell differentiation.
This is a tree diagram with three branches, indicating that the cells fall mainly into three states; the number 1 in the middle marks a branch point.
The cells in the picture above are colored by cluster. According to the previous Seurat cluster analysis, Cluster 5 (light blue) corresponds to neutrophils, and these cells sit at the tip of the upper branch; Cluster 0 (red) corresponds to B cells, which are mainly located at the tip of the right branch; the blue cells in the upper left corner may be NK cells, but this is not certain. The right branch seems the most plausible initial state. The result is similar to the picture below.
The figure above shows how the cells are distributed along the trajectory. B cells are clearly concentrated at the tip of the right branch, followed by a concentration of T cells, with some neutrophils mixed in between (or the assignment may simply be unclear). However, most of the cells have not been separated, so this result needs to be reprocessed.
Monocle cannot tell which branch is the "root", that is, it does not know which cell state is the earliest, so we can use the root_state parameter to specify which state is the initial one. Monocle then assigns each cell a pseudotime value, and we can observe how gene expression changes over pseudotime. This step can be carried out after the cell ordering is completed.
Create a Seurat object, spleen_monocle, and first remove cells with poor sequencing quality:
Keep all genes that are expressed in at least 3 cells (min.cells = 3).
Keep all cells in which more than 200 genes are detected (min.genes = 200).
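The two thresholds above can be made concrete with a small, language-neutral sketch. The Python below is not Seurat's implementation: the function name filter_counts, the dictionary layout of the count matrix, and the order in which the two filters are applied (cells first, then genes) are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Counts = Dict[str, List[int]]  # gene -> raw counts, one entry per cell


def filter_counts(
    counts: Counts, n_cells: int, min_cells: int = 3, min_genes: int = 200
) -> Tuple[Counts, List[int]]:
    """Return the filtered matrix and the indices of the cells that were kept."""
    # Number of genes detected (count > 0) in each cell.
    genes_per_cell = [
        sum(1 for row in counts.values() if row[j] > 0) for j in range(n_cells)
    ]
    # Keep cells in which more than `min_genes` genes are detected.
    keep_cells = [j for j in range(n_cells) if genes_per_cell[j] > min_genes]

    # Among the kept cells, keep genes detected in at least `min_cells` cells.
    filtered: Counts = {}
    for gene, row in counts.items():
        kept = [row[j] for j in keep_cells]
        if sum(1 for v in kept if v > 0) >= min_cells:
            filtered[gene] = kept
    return filtered, keep_cells


# Toy matrix with 4 genes and 4 cells; thresholds are lowered to fit its size.
toy = {"A": [1, 0, 2, 0], "B": [0, 0, 3, 1], "C": [5, 0, 1, 0], "D": [0, 1, 0, 0]}
toy_filtered, toy_cells = filter_counts(toy, n_cells=4, min_cells=2, min_genes=1)
print(toy_filtered, toy_cells)
```

With real data the defaults (3 cells, 200 genes) correspond to the values quoted above.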
Import the Seurat object into Monocle.
View data:
There are 15655 genes and 1959 cells, which is consistent with the Seurat object created earlier.
The calculation is for the convenience of later analysis.
As mentioned above, the cells can be filtered by their nUMI value.
Note that there is an extra Size_Factor column.
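For readers unfamiliar with size factors, the sketch below shows one simple way such a per-cell scaling value can be defined: each cell's total count divided by the geometric mean of all cells' totals. This is an illustration in Python under that assumption, with a hypothetical function name; it is not claimed to reproduce the exact values in the Size_Factor column.

```python
import math
from typing import List


def size_factors(totals: List[float]) -> List[float]:
    """Divide each cell's total count by the geometric mean of all totals.

    All totals must be positive, otherwise the logarithm is undefined.
    """
    geo_mean = math.exp(sum(math.log(t) for t in totals) / len(totals))
    return [t / geo_mean for t in totals]


# A cell with more total counts than typical gets a size factor above 1.
factors = size_factors([100.0, 400.0, 200.0])
print(factors)
```

Dividing a cell's counts by its size factor puts cells of different sequencing depth on a comparable scale.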
Keep the cells that fall between the two vertical lines:
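The text does not say where the two vertical lines are drawn. A common choice, assumed here, is the mean of the log10 total counts plus or minus two standard deviations. The Python sketch below illustrates that rule with hypothetical function names; the actual bounds used for the spleen data may differ.

```python
import math
import statistics
from typing import List, Tuple


def umi_bounds(totals: List[float], n_sd: float = 2.0) -> Tuple[float, float]:
    """Lower and upper bound at mean +/- n_sd standard deviations on the log10 scale."""
    logs = [math.log10(t) for t in totals]
    mu = statistics.mean(logs)
    sd = statistics.stdev(logs)  # sample standard deviation
    return 10 ** (mu - n_sd * sd), 10 ** (mu + n_sd * sd)


def keep_between(totals: List[float], lower: float, upper: float) -> List[int]:
    """Indices of the cells whose total count lies strictly between the bounds."""
    return [i for i, t in enumerate(totals) if lower < t < upper]


# Ten typical cells plus one very low and one very high outlier.
totals = [1000.0] * 10 + [10.0, 100000.0]
lower, upper = umi_bounds(totals)
kept = keep_between(totals, lower, upper)
print(len(kept), "of", len(totals), "cells kept")
```

Cells far below the lower bound are often empty droplets or damaged cells, and cells far above the upper bound may be doublets, which is why both tails are removed.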
After filtering, 1864 cells and 15655 genes remained.
setOrderingFilter marks a set of genes to be used for later clustering.
plot_ordering_genes plots the dispersion of each gene's expression against its average expression level, and the red line shows Monocle's expectation of the dispersion based on this relationship. The genes we use for clustering are shown as black dots, and other genes are shown as gray dots. I don't quite understand what the empirical dispersion on the vertical axis means here.
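One common way to define an empirical dispersion, assumed here for illustration, comes from the negative binomial model, in which the variance of a gene's counts is mean + dispersion × mean². Solving for the dispersion gives a simple moment estimate. The Python sketch below uses a hypothetical function name, and whether Monocle computes exactly this quantity is not confirmed by the text.

```python
import statistics
from typing import List


def empirical_dispersion(counts: List[float]) -> float:
    """Moment estimate of dispersion from var = mu + dispersion * mu**2.

    Returns 0.0 when the gene shows no more variance than its mean
    (no overdispersion) or when it is not expressed at all.
    """
    mu = statistics.mean(counts)
    var = statistics.variance(counts)  # sample variance
    if mu == 0:
        return 0.0
    return max((var - mu) / mu ** 2, 0.0)


# A gene that is either off or strongly on varies far more than its mean suggests.
switch_like = empirical_dispersion([0, 0, 10, 10])
# A gene with identical counts in every cell has no excess variance.
constant = empirical_dispersion([5, 5, 5, 5])
print(switch_like, constant)
```

Under this reading, genes far above the fitted line vary between cells more than expected for their expression level, which is what makes them informative for clustering or ordering.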
Make a scree plot:
Select the first 8 components for clustering.
If the cells from each time point cluster together, this plot should, roughly speaking, split into four groups.
Here you can use the dpFeature procedure in the Monocle package to select genes.
Another method is to artificially select genes according to biological knowledge:
HSPA1A is an interesting gene found earlier in the Seurat analysis; it is expressed to varying degrees in almost all clusters (as shown below). A literature search showed that this gene encodes a heat shock protein. Expression of this protein is a protective mechanism during ischemia/hypoxia and can be used as a prognostic marker for patients with cardiac arrest [1]. Other researchers have studied the relationship between the duration of cerebral ischemia and the expression of HSPA1A [2]. Although that paper concluded that expression did not differ significantly between the two groups (ischemia times of 30 minutes and 60 minutes), the spleen ischemia here lasted 12 h, 24 h and 72 h, a much longer time span, so the gene is still worth investigating.
So I think we can also choose HSPA1A as a marker gene to reflect the cell state.
build
Nothing below this point has been run!
[1] Jenei, Z.M., Széplaki, G., Merkely, B. et al. Cell Stress and Chaperones (2013) 18: 447. https://doi.org/10.1007/s12192-012-0399-2
[2] Cui Zhi, Jin Shude, Jin Sheng, Lin Dejie, Xia Shujuan. Semi-quantitative analysis of heat shock protein-70 expression in hippocampus based on ischemia duration and cerebral infarction volume in mice. J Korean Neurosurg Soc. 2014; 55(6): 307-12.