Difference Between Similarity and Identity in Sequence Alignment

The key difference between similarity and identity in sequence alignment is that similarity is the likeness (resemblance) between two sequences in comparison while identity is the number of characters that match exactly between two different sequences.

Bioinformatics is an interdisciplinary field of science that mainly involves molecular biology and genetics, computer science, mathematics, and statistics. Sequence alignment is a central technique in bioinformatics. It is the procedure in which DNA, RNA or protein sequences are arranged to identify regions of resemblance that may be a consequence of functional, structural or evolutionary relationships between the sequences. The aligned sequences are presented as rows within a matrix. Gaps are inserted between the residues so that identical or similar characters line up in successive columns.

CONTENTS

1. Overview and Key Difference
2. What is Similarity in Sequence Alignment
3. What is Identity in Sequence Alignment
4. Similarities Between Similarity and Identity in Sequence Alignment
5. Side by Side Comparison – Similarity vs Identity in Sequence Alignment in Tabular Form
6. Summary

What is Similarity?

Similarity in sequence alignment is the resemblance between two sequences when compared. It depends on the identity of the sequences. Similarity describes the extent to which the residues are aligned. Hence, similar sequences tend to have similar properties. In bioinformatics, similarity is used as a measure of the likeness between two proteins.

Figure 01: Similarity in Sequence Alignment

There are two main steps in the sequence alignment process. The first step is pairwise alignment, which finds the best alignment between two sequences (including gaps) using tools such as BLAST, FASTA, and LALIGN. The matching algorithm finds the minimum number of edit operations (indels and substitutions) needed to align one sequence to the other. After pairwise alignment, two quantitative parameters are obtained from each pairwise comparison: identity and similarity.
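The idea of counting the minimum number of edit operations can be sketched with a small dynamic-programming routine. This is only an illustration, assuming every indel and substitution costs 1 (the classic unit-cost edit distance); it is not how BLAST, FASTA, or LALIGN score alignments internally, and the function name `edit_distance` is made up for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of indels and substitutions that turn a into b (unit costs)."""
    # prev[j] is the distance between the already processed prefix of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning i characters of a into an empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca from a
                curr[j - 1] + 1,     # insert cb into a
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("AAGGCTT", "AAGGCAT"))  # 1: a single substitution
print(edit_distance("AAGGCTT", "AAGGC"))    # 2: two indels
```

Only two rows of the table are kept at a time, which is enough when just the distance is needed rather than the alignment itself.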

What is Identity?

Identity in sequence alignment is the number of characters that match exactly between two different sequences. Hence, gaps do not count when assessing identity. The measurement is taken relative to the shorter of the two sequences. An important consequence is that sequence identity is not transitive: if X is identical to Y and Y is identical to Z, X is not necessarily identical to Z. This follows from the way the identity distance measure is defined.

Figure 02: Identity in Sequence Alignment

For example, suppose X has the sequence AAGGCTT, Y has the sequence AAGGC and Z has the sequence AAGGCAT. The identity between X and Y is 100% (5 identical nucleotides / min[length(X), length(Y)] = 5/5). The identity between Y and Z is also 100%. However, the identity between X and Z is only about 86% (6 identical nucleotides / 7).
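The X, Y, Z example can be reproduced with a few lines of Python. This is a minimal sketch, assuming the sequences are compared position by position from the first character without gaps; the helper name `percent_identity` is hypothetical and not taken from any alignment library.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity relative to the shorter sequence (ungapped, left-anchored)."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        raise ValueError("both sequences must be non-empty")
    # zip stops at the end of the shorter sequence, so extra characters
    # in the longer sequence are ignored.
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / shorter


X, Y, Z = "AAGGCTT", "AAGGC", "AAGGCAT"
print(f"X-Y: {percent_identity(X, Y):.1f}%")  # 100.0%
print(f"Y-Z: {percent_identity(Y, Z):.1f}%")  # 100.0%
print(f"X-Z: {percent_identity(X, Z):.1f}%")  # 85.7%
```

X and Z each match Y perfectly over Y's five characters, yet they differ from each other at the sixth position, which is why identity measured this way is not transitive.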

What are the Similarities Between Similarity and Identity in Sequence Alignment?

  • Similarity and identity are both terms used in sequence alignment.
  • Both refer to the resemblance between two sequences.
  • Both are expressed as a percentage value.

What is the Difference Between Similarity and Identity in Sequence Alignment?

Similarity in sequence alignment describes the resemblance between two sequences when compared, while identity in sequence alignment gives the number of characters that match exactly between two different sequences. This is the key difference between similarity and identity in sequence alignment.
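To make the contrast concrete, the sketch below computes both values for one pair of already aligned protein rows. It rests on several assumptions that are not part of the text above: residues count as similar when they fall in the same broad physicochemical group, the grouping in `GROUPS` is illustrative only (alignment tools usually rely on substitution matrices instead), and both percentages are taken relative to the shorter ungapped sequence. The function name and the example rows are hypothetical.

```python
# Illustrative amino-acid groups (an assumption for this example only).
GROUPS = ["AVLIM", "FYW", "KRH", "DE", "STNQ", "C", "G", "P"]
GROUP_OF = {aa: idx for idx, group in enumerate(GROUPS) for aa in group}


def identity_and_similarity(row_a: str, row_b: str) -> tuple[float, float]:
    """Return (percent identity, percent similarity) for two aligned rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    # Denominator: length of the shorter sequence once gap characters are removed.
    shorter = min(len(row_a.replace("-", "")), len(row_b.replace("-", "")))
    if shorter == 0:
        raise ValueError("both rows must contain residues")
    identical = similar = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            continue  # gap columns count toward neither measure
        if x == y:
            identical += 1
            similar += 1  # an identical residue is also a similar one
        elif x in GROUP_OF and GROUP_OF[x] == GROUP_OF.get(y):
            similar += 1  # different residues from the same group
    return 100.0 * identical / shorter, 100.0 * similar / shorter


ident, sim = identity_and_similarity("MKVLDES", "MRIL-EA")
print(f"identity:   {ident:.1f}%")  # 50.0%
print(f"similarity: {sim:.1f}%")    # 83.3%
```

Under these assumptions similarity can never be lower than identity, because every exact match is also counted as a similar position.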

Summary – Similarity vs Identity in Sequence Alignment

Sequence alignment helps to identify regions of resemblance in DNA, RNA or protein sequences that result from functional, structural or evolutionary relationships between them. Similarity and identity are two key terms in the context of sequence alignment. The key difference between them is that similarity is the resemblance between two sequences in comparison, whilst identity is the number of characters that match exactly between two different sequences.