Types of Bioinformatics Data

Biological processes are affected by a plethora of events that might seem superficially irrelevant but can initiate a cascade that culminates in a complex output. As a result, a variety of data sets emerge and need to be analyzed to understand a biological process in its entirety. This is where bioinformatics comes into play: its role ranges from data collection and management to integration and interpretation, supporting a holistic approach to multifaceted questions in agriculture, health and the environment. Such an approach helps in understanding life as a system, as a whole rather than as individual parts.

Different types of data may be generated; they are briefly described below:

1. Sequence Data:

a. Genomic Data: This includes DNA sequences derived from various sequencing technologies. Such data help us understand the DNA sequence underlying not only the normal phenotype but also the sequence changes that occur in pathological disorders. DNA sequences also help us trace evolutionary trends, identify novel genes and/or species, and perform comparative analyses.

b. Transcriptomic Data: RNA sequences help us identify alternatively spliced mRNAs, post-transcriptional modifications, gene fusions that result in pathologies, mutations, etc.

c. Proteomic Data: The primary structure of a protein is defined by its amino acid sequence. The sequence thus obtained can help in predicting the protein’s 3D structure, post-translational modifications, etc.
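
The link between these three kinds of sequence data can be illustrated with a minimal Python sketch. It assumes the input is the coding strand of a DNA sequence containing only A, C, G and T, and it uses the standard genetic code. The function names `gc_content`, `transcribe` and `translate`, and the toy sequence, are illustrative and not taken from any particular library or data set.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def gc_content(dna: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna) if dna else 0.0


def transcribe(dna: str) -> str:
    """Coding-strand DNA to mRNA: replace T with U."""
    return dna.upper().replace("T", "U")


def translate(dna: str) -> str:
    """Translate coding-strand DNA in frame 1 up to the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]  # raises KeyError on non-ACGT input
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    seq = "ATGGCCTGA"  # toy sequence, assumed for illustration only
    print(transcribe(seq))
    print(translate(seq))
    print(round(gc_content(seq), 2))
```

Real analyses would normally rely on an established toolkit and would also handle ambiguous bases, reading frames and the reverse strand, all of which this sketch ignores.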

2. Structural Data: This pertains to the 3D structures of macromolecules. The data can be obtained experimentally or computationally and can be used to compare two or more similar structures and their functions, predict a novel protein’s structure and function from existing structures, study ligand-receptor interactions, and perform in silico drug-target analysis.

3. Functional Genomics Data: Functional genomics is the study of the genome and its products, including their regulation and interactions with each other. These data are generated using high-throughput techniques such as Next Generation Sequencing, and they bring together genomic, epigenomic, transcriptomic, proteomic, metabolomic and interactomic data to explain how a particular phenotype or pathological condition arises and how it is affected by treatment. Hence, these data help us understand the complexities of gene expression. They can be challenging to work with, but they also offer potential routes towards personalized medicine.

4. Literature Data: Such data provide collective information on subjects ranging from gene annotation to research articles, review papers, case studies and opinions. This helps in validating experimental results and reduces redundant experiments, so that already dwindling resources go towards advancing science rather than re-inventing the wheel.

5. Annotation Data: These data sets include relevant information pertaining to genomes, genes, RNAs or proteins. Annotation makes it feasible for anyone, anywhere, to understand the data presented and keeps the description uniform. For example, a gene may be described by its name, its location, the people who reported it, and the functions of its RNA and protein products.
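
As a rough illustration of what one annotation record might hold, the sketch below models the gene example as a Python dataclass. The class name `GeneAnnotation`, its field names, the coordinate convention and all values are assumptions made for illustration; they do not follow any specific database schema and do not describe a real gene.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GeneAnnotation:
    """One annotation record; all field names are illustrative assumptions."""
    name: str
    chromosome: str
    start: int  # assumed 1-based start coordinate
    end: int    # assumed inclusive end coordinate
    reported_by: list[str] = field(default_factory=list)
    rna_products: list[str] = field(default_factory=list)
    protein_products: list[str] = field(default_factory=list)
    function: str = ""

    def length(self) -> int:
        """Length of the annotated region in base pairs."""
        return self.end - self.start + 1


# Placeholder values, not a real gene.
record = GeneAnnotation(
    name="geneX",
    chromosome="chr1",
    start=1_000,
    end=2_500,
    reported_by=["Author A", "Author B"],
    rna_products=["geneX-mRNA-1"],
    protein_products=["GeneX protein"],
    function="hypothetical kinase",
)
print(record.name, record.chromosome, record.length())
```

Keeping every record in one agreed shape like this is what lets different groups read and exchange the same annotation without ambiguity.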

6. Pathway and Network Data: High-throughput technologies generate colossal amounts of data on genes, transcripts, proteins, metabolites and their interactions. However, to obtain a clear, big picture, these data need to be sorted and analyzed. They can be grouped by origin, similarity, functionality, etc. This, in turn, gives a better understanding of a gene and its role in a biological system rather than as an individual entity. Hence, the predicted or established network, along with the associated pathway, can help us understand the global functionality of genes and their functional products.
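
The idea of grouping interacting entities into a network can be sketched in a few lines of Python. The example assumes interactions arrive as simple pairs of names; the gene names and the helper functions `build_network` and `connected_groups` are hypothetical placeholders, and the grouping shown (connected components) is only one of many possible ways to organize such data.

```python
from collections import defaultdict

# Hypothetical pairwise interactions between gene products (placeholders).
interactions = [
    ("geneA", "geneB"),
    ("geneA", "geneC"),
    ("geneB", "geneC"),
    ("geneD", "geneE"),
]


def build_network(pairs):
    """Build an undirected adjacency map from interaction pairs."""
    network = defaultdict(set)
    for a, b in pairs:
        network[a].add(b)
        network[b].add(a)
    return dict(network)


def connected_groups(network):
    """Group nodes into connected components with a depth-first search."""
    seen, groups = set(), []
    for start in sorted(network):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(network[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


network = build_network(interactions)
print(connected_groups(network))
```

Each resulting group contains entities that are linked directly or indirectly, which is the sense in which a gene is viewed as part of a system rather than in isolation.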

7. Variation Data: These data include information on the changes or effects arising in an individual due to single nucleotide polymorphisms, insertions, deletions, copy number variations, and structural or epigenetic changes in the DNA. These data are used to relate the phenotype to the underlying genetic changes, which in turn helps pave the way for translational and personalized medicine.
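
To make the simplest of these variant classes concrete, the following Python sketch labels a variant by comparing its reference and alternate alleles. The function `classify_variant`, its labels and the toy records are assumptions for illustration only. Copy number variations, larger structural changes and epigenetic changes cannot be detected by this kind of allele comparison and are not covered.

```python
from collections import Counter


def classify_variant(ref: str, alt: str) -> str:
    """Classify a small variant from its reference and alternate alleles."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        return "no change"
    if len(ref) == 1 and len(alt) == 1:
        return "single-nucleotide variant"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "multi-nucleotide substitution"


# Toy records as (position, reference allele, alternate allele); not real data.
variants = [
    (101, "A", "G"),
    (250, "C", "CTT"),
    (312, "GAT", "G"),
    (400, "AC", "GT"),
]

summary = Counter(classify_variant(ref, alt) for _, ref, alt in variants)
for kind, count in summary.items():
    print(kind, count)
```

A summary like this is only a first step; relating variants to phenotype requires annotation and statistical analysis beyond the scope of this sketch.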