An In silico Approach for Structural and Functional Annotation of Uncharacterized Protein Rv0986 present in Mycobacterium tuberculosis

Tuberculosis (TB) is an ancient infectious disease caused by the human pathogen Mycobacterium tuberculosis (MTB), and it has become the leading cause of death from a single infectious agent worldwide. The uncharacterized protein Rv0986 is closely related to transporters containing the ATP-binding cassette domain and may therefore take part in the export of macrolides as well as lipoproteins. Furthermore, it is associated with a cell division protein. Hence, the protein is likely to have a significant role in mycobacterial infection. So far, however, Rv0986 has not been characterized in detail. In this study, the structural and functional annotation of the protein is therefore described through an in silico approach. The tertiary structures of the protein were predicted with Swiss Model, Modeller, and Phyre2, and validated by Ramachandran plot analysis with PROCHECK, Verify 3D, and the Swiss-Model Interactive Workplace. Z-scores were also applied for the overall structural assessment. This study highlights the importance of this uncharacterized MTB protein and thereby provides an opportunity for drug and vaccine targeting against MTB infection.


INTRODUCTION
Mycobacterium tuberculosis (MTB) is a human pathogen that causes tuberculosis (TB), one of the most lethal infectious diseases. MTB is a rod-shaped, acid-fast, Gram-positive organism. TB ranks above HIV/AIDS as a cause of death globally. The disease spreads when people infected with MTB expel the bacteria into the air, e.g. by coughing. In general, infection of the lungs by MTB causes pulmonary TB, and infection of other sites of the body causes extrapulmonary TB. In 2018, there were an estimated 10 million new cases (range: 9.0-11.1 million) and about 1.2 million deaths (range: 1.1-1.3 million) among HIV-negative people. Therefore, the tertiary structure of the functional protein Rv0986, with detailed physicochemical characterization, is predicted in this study by an in silico approach.

Sequence Retrieval
The amino acid sequence of the functional protein Rv0986 was retrieved from the UniProtKB server (Alex, 2019) under the accession ID P9WQK1. No tertiary structure of Rv0986 was available in the Protein Data Bank. Hence, tertiary structure modeling of Rv0986 was undertaken using the 248-amino-acid protein sequence.
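The retrieval step can be sketched programmatically. The following is a minimal illustration, not the procedure used in this study: it assumes that the public UniProt REST service serves FASTA records at the URL pattern shown, and the helper names `parse_fasta` and `fetch_sequence` are hypothetical.

```python
from urllib.request import urlopen

# Assumed URL pattern of the public UniProt REST service (not taken from the paper).
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def parse_fasta(text):
    """Return (header, sequence) for the first record in a FASTA string."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; keep only the first
            header = line[1:]
        elif header is not None:
            chunks.append(line)
    if header is None:
        raise ValueError("no FASTA header found")
    return header, "".join(chunks)


def fetch_sequence(accession, timeout=30):
    """Download one UniProtKB entry as FASTA and parse it (needs network access)."""
    url = UNIPROT_FASTA_URL.format(accession=accession)
    with urlopen(url, timeout=timeout) as response:
        return parse_fasta(response.read().decode("utf-8"))
```

Under these assumptions, `fetch_sequence("P9WQK1")` would return the header and sequence of the entry, and the sequence length could then be compared with the 248 residues reported here.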

Physicochemical Characterization
Two web-based servers were used to determine the physicochemical properties of the uncharacterized protein Rv0986 present in MTB. ExPASy's ProtParam tool (ProtParam, 2017) was used to predict the amino acid composition, aliphatic index, instability index, extinction coefficients, and grand average of hydropathicity (GRAVY). In addition, the isoelectric point (pI) was predicted with version 2 of the Sequence Manipulation Suite (SMS) (Martin, 2016).
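Two of these quantities have simple closed forms and can be illustrated directly. The sketch below is not ProtParam's own code; it assumes the standard Kyte-Doolittle hydropathy scale for GRAVY and Ikai's formula for the aliphatic index, and the demonstration sequence is hypothetical rather than the Rv0986 sequence.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathicity: mean hydropathy value per residue."""
    sequence = sequence.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def aliphatic_index(sequence):
    """Ikai's aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)),
    where X is the mole percent of each residue."""
    sequence = sequence.upper()
    n = len(sequence)

    def mole_percent(aa):
        return 100.0 * sequence.count(aa) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Hypothetical demonstration sequence, NOT the Rv0986 sequence.
DEMO_SEQUENCE = "MKVLAAGIVE"
print(gravy(DEMO_SEQUENCE), aliphatic_index(DEMO_SEQUENCE))
```

A negative GRAVY value points to a hydrophilic protein, and a larger aliphatic index reflects a higher relative volume of aliphatic side chains.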

Secondary Structure Prediction
The self-optimized prediction method with alignment (SOPMA) (Combet et al., 2000) and PSIPRED (David, 1999) programs were used for the secondary structure prediction of the protein Rv0986. The DISOPRED tool (Vikram and Kumar, 2018) was utilized for disorder prediction.

Tertiary Structure Modeling and Validation
Three web-based servers, including Modeller, were used for tertiary structure modeling. In the homology modeling process, the most suitable template was selected for the final tertiary structure prediction. Ramachandran map analysis with PROCHECK and the Verify 3D server (https://servicesn.mbi.ucla.edu/Verify3D/) were applied to assess the quality of the modeled tertiary structures. The Swiss-Model Interactive Workplace (https://swissmodel.expasy.org/assess) was also used for structural quality assessment of the predicted protein structures. In addition, ProSA-web (Wiederstein and Sippl, 2007) was used to obtain the Z-scores used for the overall tertiary structure quality assessment.

Physicochemical Characterization
The protein Rv0986 is 248 amino acids long, and its sequence was retrieved in FASTA format. This sequence was used as the query to determine the physicochemical characteristics of Rv0986. The instability index is 32.44, which is below 40, so the protein is classified as stable (Guruprasad et al., 1990; Islam et al., 2020). The theoretical isoelectric point (pI) indicates that the protein is acidic (pI 5.63, 5.72*).
The molecular weight of the protein Rv0986 is 27373.11 Da. The high aliphatic index of 95.52 is a positive factor for increased thermostability over a wide temperature range (Gill and Hippel, 1989). The GRAVY index of -0.265 indicates that the protein is hydrophilic and is likely to interact well with water (Ikai, 1980; Shahen et al., 2019) (Table 1).
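The interpretation of these values follows simple cutoffs, which can be written out as a small sketch. The instability cutoff of 40 is the one quoted above; treating pI below 7 as acidic and GRAVY below 0 as hydrophilic are common conventions assumed here, and the function names are hypothetical.

```python
def interpret_properties(instability_index, isoelectric_point, gravy):
    """Apply rule-of-thumb cutoffs to ProtParam-style values."""
    if isoelectric_point < 7:
        charge = "acidic"
    elif isoelectric_point > 7:
        charge = "basic"
    else:
        charge = "neutral"
    return {
        # Instability index below 40 is read as a stable protein.
        "stability": "stable" if instability_index < 40 else "unstable",
        # Assumed convention: pI relative to pH 7.
        "charge": charge,
        # Assumed convention: negative GRAVY means hydrophilic.
        "hydropathy": "hydrophilic" if gravy < 0 else "hydrophobic",
    }


def mean_residue_mass(molecular_weight_da, n_residues):
    """Average mass per residue in Da, a quick sanity check on the reported weight."""
    return molecular_weight_da / n_residues


# Values reported in the text for Rv0986.
rv0986 = interpret_properties(instability_index=32.44,
                              isoelectric_point=5.63,
                              gravy=-0.265)
print(rv0986)
print(mean_residue_mass(27373.11, 248))
```

The mean residue mass comes out close to the roughly 110 Da per residue typical of proteins, which is consistent with the reported molecular weight and sequence length.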

Secondary Structure Prediction
SOPMA was run with its default parameters (window width: 17; similarity threshold: 8; division factor: 4) for secondary structure prediction of the protein Rv0986 present in MTB.
SOPMA used the 248-amino-acid protein sequence (sub-database) together with 33 aligned proteins. It predicted 47.18% of the residues as alpha-helix, 32.66% as random coil, 14.92% as extended strand, and 5.24% as beta turn (Table 2). The PSIPRED program showed high confidence in its prediction of helix, strand, and coil (Fig 1).
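As a consistency check, the reported SOPMA percentages can be converted back to residue counts for a 248-residue sequence. This is a minimal sketch; the dictionary keys and the helper `percent_to_counts` are hypothetical names.

```python
SEQUENCE_LENGTH = 248  # length of Rv0986 as reported in the text

# Secondary structure percentages reported by SOPMA.
sopma_percent = {
    "alpha_helix": 47.18,
    "random_coil": 32.66,
    "extended_strand": 14.92,
    "beta_turn": 5.24,
}


def percent_to_counts(percentages, length):
    """Convert per-state percentages to the nearest whole number of residues."""
    return {state: round(pct * length / 100) for state, pct in percentages.items()}


counts = percent_to_counts(sopma_percent, SEQUENCE_LENGTH)
print(counts)
print(sum(counts.values()))
```

The four states correspond to 117, 81, 37, and 13 residues, respectively, which together account for all 248 residues.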

Protein-Protein and Protein-Polynucleotide Binding Sites
The PredictProtein server predicted the binding sites in the structure of the protein Rv0986; 11 different protein binding sites were identified (Fig 2).

Fig 2: Protein-Protein and Protein-Polynucleotide Binding Sites.

Structure Modeling and Validation
The HHpred tool was used to run Modeller for tertiary structure prediction. The protein sequence in FASTA format was submitted to HHpred (Zimmermann et al., 2018), and Modeller (Webb and Sali, 2016) was then executed. The most suitable template (3TUI_D) was selected from among 250 hits for tertiary structure prediction of the protein Rv0986. This template showed a probability of 100%, an SS score of 27.5, an E-value of 6.6e-32, 239 aligned columns (Cols), and a target length of 366 (data not shown). The generated structure was stored in PDB format (Fig 3). Ramachandran map analysis with PROCHECK (Table 3) indicated excellent model quality: 95.2% of the residues were found in the core regions and 4.8% in the additional allowed regions, with no residues in either the generously allowed or the disallowed regions (Fig 5). There were 209 non-glycine and non-proline residues (100%); 2 residues were end-residues (excl. Gly and Pro); and the glycine and proline residues numbered 18 and 11, respectively (Fig 3).
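The Ramachandran statistics for the Modeller structure can be checked with a short sketch. It assumes the commonly quoted PROCHECK rule of thumb that a good-quality model has more than 90% of its non-glycine, non-proline residues in the most favored (core) regions; the function name and return layout are hypothetical.

```python
def ramachandran_summary(core_pct, allowed_pct, generous_pct, disallowed_pct, n_residues):
    """Convert region percentages to residue counts and apply the >90% core rule of thumb."""
    total = core_pct + allowed_pct + generous_pct + disallowed_pct
    if abs(total - 100.0) > 0.2:  # tolerate rounding in the reported values
        raise ValueError(f"percentages sum to {total}, expected about 100")
    percentages = {
        "core": core_pct,
        "additional_allowed": allowed_pct,
        "generously_allowed": generous_pct,
        "disallowed": disallowed_pct,
    }
    counts = {region: round(pct * n_residues / 100) for region, pct in percentages.items()}
    # Assumed cutoff: more than 90% of residues in the core regions.
    return {"counts": counts, "good_quality": core_pct > 90.0}


# Values reported in the text for the Modeller structure (209 non-Gly, non-Pro residues).
modeller = ramachandran_summary(95.2, 4.8, 0.0, 0.0, 209)
print(modeller)
```

For the Modeller structure this corresponds to 199 residues in the core regions and 10 in the additional allowed regions, which together make up the 209 non-glycine and non-proline residues.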
The Verify 3D tool validated the modeled tertiary protein structure (data not shown).
Similarly, the Phyre2 server was used to predict the tertiary structure of the protein Rv0986. The structure was predicted from the top-ranked template (c5ws4A), with a confidence of 100.0% and a coverage of 99% (Fig 4a). Phyre2 modeled 245 residues (99% of the protein sequence) with 100.0% confidence. Phyre2 also reported the secondary structure content: 16% disordered, 25% beta-strand, and 39% alpha-helix (data not shown). Ramachandran map analysis with PROCHECK indicated 87.9% of the residues in the most favored regions and 11.2% in the additional allowed regions; 0.5% of the residues were in the generously allowed regions and 0.5% in the disallowed regions.

Fig 4 (a): Structure of Rv0986 Predicted by Phyre2.

Fig 4 (b): Structure of Rv0986 Predicted by Swiss Model.

For the structure predicted by Swiss Model, the Ramachandran analysis showed that 92.9% of the total residues were found in the most favored regions, and 92.9% of residues in the additional allowed regions. There were 0.5% of residues in the generously allowed regions and 0.2% of the residues in the disallowed regions (Table 3). In the predicted protein structure, there were 423 non-glycine and non-proline residues (100%), and 4 residues were end-residues (excl. Gly and Pro). Among the total residues in the protein structure, the glycine and proline residues numbered 36 and 22, respectively (Fig 4b). Verify 3D rated the structural quality of the model as excellent (data not shown). The Swiss-Model Interactive Workplace validated the modeled tertiary structure of the protein Rv0986 indicating as the

CONCLUSIONS
In this study, it is concluded that the protein Rv0986 present in Mycobacterium tuberculosis is a functional protein and is thereby associated with microbial infection in humans. As the tertiary structure of the protein is not available in the Protein Data Bank, this paper presents a predicted structural model together with its structural and functional annotation, providing insight into this uncharacterized protein. This study will strengthen our knowledge of MTB pathogenesis and thereby opens an opportunity for drug and vaccine targeting against MTB infection.

ACKNOWLEDGEMENTS
We would like to thank the Dept. of Biochemistry and Molecular Biology and the Dept. of Pharmacy of Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj, Bangladesh, for supporting this study.

CONFLICTS OF INTEREST
The author declares no conflict of interest.