Every cell in the body contains the same genetic sequence, yet each cell expresses only a subset of those genes. These cell-specific gene expression patterns, which ensure that a brain cell differs from a skin cell, are partly determined by the three-dimensional structure of the genetic material, which controls the accessibility of each gene.
MIT chemists have now developed a new way to determine these 3D genome structures using generative artificial intelligence. Their technique can predict thousands of structures in just a few minutes, making it much faster than existing experimental methods for analyzing the structures.
Using this technique, researchers could more easily study how the 3D organization of the genome affects the gene expression patterns and functions of individual cells.
“Our goal was to try to predict the three-dimensional genome structure from the underlying DNA sequence,” says Bin Zhang, an associate professor of chemistry and a senior author of the study. “Now that we can do that, which puts this technique on par with the cutting-edge experimental techniques, it can really open up a lot of interesting opportunities.”
MIT graduate students Greg Schuette and Zhuohan Lao are the lead authors of the paper, which appears in Science Advances.
From sequence to structure
Inside the cell nucleus, DNA and proteins form a complex called chromatin, which has several levels of organization that allow cells to pack 2 meters of DNA into the nucleus. Long strands of DNA wind around proteins called histones, giving rise to a structure that resembles beads on a string.
Chemical tags known as epigenetic modifications can be attached to DNA at specific locations, and these tags, which vary by cell type, affect the folding of the chromatin and the accessibility of nearby genes. These differences in chromatin conformation help determine which genes are expressed in different cell types, or at different times within a given cell.
Over the past 20 years, scientists have developed experimental techniques for determining chromatin structures. One of these techniques, known as Hi-C, works by linking together neighboring DNA strands in the cell's nucleus. Researchers then shred the DNA into many small pieces and sequence it to determine which segments are located near each other.
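The counting step behind this kind of experiment can be illustrated with a small sketch. It assumes the sequenced fragments have already been reduced to pairs of base-pair positions on a single chromosome; the function name `contact_counts`, the toy `pairs` list, and the 10,000-base-pair bin size are hypothetical choices for illustration, not part of any Hi-C software.

```python
from collections import Counter


def contact_counts(read_pairs, bin_size):
    """Count how often two genomic bins were found linked together.

    read_pairs: iterable of (pos_a, pos_b) base-pair positions on one
    chromosome (a hypothetical, already-aligned toy input).
    bin_size: width of each genomic bin in base pairs.
    """
    counts = Counter()
    for pos_a, pos_b in read_pairs:
        bin_a, bin_b = pos_a // bin_size, pos_b // bin_size
        # Store the smaller bin first so (a, b) and (b, a) are merged.
        key = (min(bin_a, bin_b), max(bin_a, bin_b))
        counts[key] += 1
    return counts


# Toy data: two of the four pairs link the first bin to the fifth bin.
pairs = [(1_200, 48_500), (49_900, 1_050), (12_000, 13_500), (30_100, 31_000)]
counts = contact_counts(pairs, bin_size=10_000)

for (bin_a, bin_b), n in sorted(counts.items()):
    print(f"bins {bin_a} and {bin_b}: {n} contact(s)")
```

Bins that are linked more often than their separation along the sequence would suggest are inferred to sit close together in 3D space.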
This method can be used on large populations of cells to calculate an average structure for a section of chromatin, or on single cells to determine the structures within that specific cell. However, Hi-C and similar techniques are labor-intensive, and it can take about a week to generate data from one cell.
To overcome these limitations, Zhang and his students developed a model that takes advantage of recent advances in generative AI to create a fast, accurate way to predict chromatin structures in single cells. The AI model they designed can quickly analyze DNA sequences and predict the chromatin structures that those sequences might produce in a cell.
“Deep learning is really good at pattern recognition,” Zhang says. “It allows us to analyze very long DNA segments, thousands of base pairs, and figure out what important information is encoded in those DNA base pairs.”
ChromoGen, the model that the researchers created, has two components. The first component, a deep learning model taught to “read” the genome, analyzes the information encoded in the underlying DNA sequence and in chromatin accessibility data, the latter of which is widely available and cell type-specific.
The second component is a generative AI model that predicts physically accurate chromatin conformations, having been trained on more than 11 million chromatin conformations. These data were generated from experiments using Dip-C (a variant of Hi-C) on 16 cells from human B lymphocytes.
When integrated, the first component informs the generative model how the cell type-specific environment influences the formation of different chromatin structures, and this scheme effectively captures sequence-structure relationships. For each sequence, the researchers use the model to generate many possible structures. That is because DNA is a very disordered molecule, so a single DNA sequence can give rise to many different possible conformations.
“A major complicating factor in predicting the structure of the genome is that there isn't a single solution that we're aiming for. There's a distribution of structures, no matter what portion of the genome you're looking at. Predicting that very complicated, high-dimensional statistical distribution is incredibly challenging,” Schuette says.
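To make the idea of a distribution of structures concrete, the sketch below builds a toy ensemble and summarizes how far apart two positions end up across its members. The conformations are plain 3D random walks, used here only as a stand-in for generated structures; they are not output from ChromoGen, and the function names, bead count, and ensemble size are all illustrative assumptions.

```python
import math
import random
import statistics


def random_walk_conformation(n_beads, rng):
    """Toy stand-in for one conformation: a 3D random walk with unit steps."""
    coords = [(0.0, 0.0, 0.0)]
    for _ in range(n_beads - 1):
        # Pick a random direction by normalizing a Gaussian vector.
        step = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in step))
        x, y, z = coords[-1]
        coords.append((x + step[0] / norm, y + step[1] / norm, z + step[2] / norm))
    return coords


def distance(conformation, i, j):
    """Euclidean distance between beads i and j of one conformation."""
    return math.dist(conformation[i], conformation[j])


rng = random.Random(0)  # fixed seed so the toy ensemble is reproducible
ensemble = [random_walk_conformation(64, rng) for _ in range(200)]

# The same pair of beads is separated by a different distance in each member.
end_to_end = [distance(conf, 0, 63) for conf in ensemble]
mean_distance = statistics.mean(end_to_end)
spread = statistics.stdev(end_to_end)
print(f"mean end-to-end distance: {mean_distance:.2f}, spread: {spread:.2f}")
```

A single "best" structure would reduce this to one number, whereas the quantity of interest is the whole spread of values across the ensemble.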
Fast analysis
Once trained, the model can generate predictions on a much faster timescale than Hi-C or other experimental techniques.
“Whereas you might spend six months running experiments to get a few dozen structures in a given cell type, you can generate a thousand structures in a particular region with our model in 20 minutes on just one GPU,” Schuette says.
After training their model, the researchers used it to generate structure predictions for more than 2,000 DNA sequences, then compared them with the experimentally determined structures for those sequences. They found that the structures generated by the model were the same as or very similar to those seen in the experimental data.
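One simple way to compare two sets of structures for the same region is to average the pairwise distances within each set and correlate the two averaged maps. The sketch below does this on hand-written toy coordinates. The metric, the function names, and the data are assumptions chosen for illustration; the article does not say which comparison the researchers used.

```python
import math


def mean_distance_map(ensemble):
    """Average pairwise-distance matrix over all conformations in an ensemble."""
    n = len(ensemble[0])
    total = [[0.0] * n for _ in range(n)]
    for conf in ensemble:
        for i in range(n):
            for j in range(n):
                total[i][j] += math.dist(conf[i], conf[j])
    return [[value / len(ensemble) for value in row] for row in total]


def upper_triangle(matrix):
    """Flatten the entries above the diagonal (each bead pair counted once)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy ensembles: two four-bead conformations each.
experimental = [
    [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)],
    [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)],
]
generated = [
    [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0.2, 0)],
    [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0.1, 1, 0)],
]

exp_map = mean_distance_map(experimental)
gen_map = mean_distance_map(generated)
r = pearson(upper_triangle(exp_map), upper_triangle(gen_map))
print(f"correlation between mean distance maps: {r:.3f}")
```

A correlation near 1 means the two ensembles agree on which positions tend to be close and which tend to be far apart, although it does not by itself show that the spread of individual structures matches.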
“We typically look at hundreds or thousands of conformations for each sequence, and that gives you a reasonable representation of the diversity of the structures that a particular region can have,” Zhang says. “If you repeat your experiment multiple times, in different cells, you will very likely end up with a very different conformation. That's what our model is trying to predict.”
The researchers also found that the model could make accurate predictions for data from cell types other than the one it was trained on. This suggests that the model could be useful for analyzing how chromatin structures differ between cell types, and how those differences affect their function. The model could also be used to explore the different chromatin states that can exist within a single cell, and how those changes affect gene expression.
Another possible application would be to explore how mutations in a particular DNA sequence change the chromatin conformation, which could shed light on how such mutations may cause disease.
“There are a lot of interesting questions that I think we can address with this type of model,” Zhang says.
The researchers have made all of their data and the model available to others who wish to use them.
The research was funded by the National Institutes of Health.