Bacillus cereus is an enteropathogenic bacterium that is widely distributed in the environment and is mainly associated with food poisoning. In the intestine, B. cereus produces enterotoxins that cause diarrhoea, abdominal distress, and vomiting, and it is also responsible for a range of other infections in humans. BCRIVMBC126_02492 is a functional protein of B. cereus that is related to the oxidation of glutathione persulfide in the mitochondria and to cyanide fixation, and it has a variety of other biological functions. Nevertheless, the protein BCRIVMBC126_02492 has not yet been explored. Therefore, structure prediction, functional annotation, and characterization of the protein were carried out in this study. Modeller, Swiss-Model, and Phyre2 were used to generate tertiary structures. The structural quality of the models was assessed by Ramachandran plot analysis, the Swiss-Model Interactive Workplace, and Verify 3D. Furthermore, Z-scores were used to assess the overall quality of the tertiary models. A comparison of the results showed that the model generated by Modeller was more suitable than those from Phyre2 and Swiss-Model. This investigation sheds light on the role of this unexplored protein of B. cereus and can therefore pave the way for a better understanding of pathogenesis and for drug and vaccine targeting opportunities against B. cereus infection.
Bacillus cereus is a ubiquitous, Gram-positive, spore-forming, rod-shaped, non-capsulated, aerobic or facultatively anaerobic bacterium (Sankararaman and Velayuthan, 2013). The saprophytic life cycle of B. cereus takes place mainly in soil. Based on 16S rRNA gene sequences, it is closely related to other members of the B. cereus group, including B. anthracis and B. thuringiensis (Granum, 2017; Cui et al., 2019). B. cereus generally causes two types of food poisoning: the emetic syndrome, which appears 0.5-6 hours after ingestion of contaminated food, and the diarrheal syndrome, which appears 8-16 hours after ingestion (Tallent et al., 2015). B. cereus produces a potent toxin, cereulide, a small, acid-resistant and highly heat-resistant depsipeptide that poses several challenges for the food industry (Rouzeau-Szynalski et al., 2020). B. cereus causes the most side effects in microecological preparations. The underlying causes are the overuse of antibiotics in animal feed and drug additives, which unbalances the intestinal micro-ecosystem and also contributes to weakened immunity and drug resistance (Berthold-Pluta et al., 2015; Guo et al., 2020). B. cereus inhibits the growth of detrimental bacteria and selectively promotes the activity of microorganisms that live in the gastrointestinal tract (Riol et al., 2018).
Additionally, B. cereus promotes the growth of the host by producing beneficial metabolites (Raymond and Bonsall, 2013). The protein BCRIVMBC126_02492 of B. cereus is associated with the oxidation of glutathione persulfide to glutathione and sulfite in the mitochondria, with cyanide fixation, and with other functions in biological systems. However, its tertiary structure, ligand-binding active sites, and physicochemical characteristics have not yet been reported. Therefore, the tertiary structure of the uncharacterized protein BCRIVMBC126_02492, together with its ligand-binding active sites and functional annotations, is proposed in this study through an in silico approach.
2.1 Sequence retrieval
The amino acid sequence of BCRIVMBC126_02492 was obtained from the National Center for Biotechnology Information (NCBI) under the accession ID SCN08319.1. No 3D structure is available in the Protein Data Bank (PDB). The 478-amino-acid protein BCRIVMBC126_02492 of B. cereus was therefore taken forward for secondary and tertiary structure modeling, characterization, and functional annotation.
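The retrieval step can also be scripted. The following is a minimal sketch, assuming the record is downloaded in FASTA format through the NCBI E-utilities efetch endpoint; the helper names (efetch_url, parse_fasta, fetch_sequence) are illustrative and were not part of the original workflow, and the record used in the offline check is made up rather than the real sequence.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities efetch endpoint (assumed retrieval route, not stated in the study).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, db="protein"):
    """Build a URL that returns one record as plain-text FASTA."""
    query = urlencode(
        {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    )
    return f"{EFETCH}?{query}"


def parse_fasta(text):
    """Return (header, sequence) for the first record in a FASTA string."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; keep only the first
            header = line[1:]
        elif header is not None:
            chunks.append(line)
    if header is None:
        raise ValueError("no FASTA header found")
    return header, "".join(chunks)


def fetch_sequence(accession):
    """Download and parse a protein record (requires network access)."""
    with urlopen(efetch_url(accession)) as response:
        return parse_fasta(response.read().decode("utf-8"))


# Offline check with a made-up record (hypothetical, not the real sequence).
header, seq = parse_fasta(">toy example\nMKT\nAYI\n")
print(header, len(seq))
print(efetch_url("SCN08319.1"))
```

Calling fetch_sequence("SCN08319.1") and checking that the returned sequence has 478 residues would be a simple sanity check before the downstream analyses.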
2.2 Physicochemical characterization
Two web-based servers were used to determine the physicochemical properties of the uncharacterized protein. The ProtParam tool was applied to predict the instability index, aliphatic index, amino acid composition, and grand average of hydropathicity (GRAVY) (Gasteiger et al., 2005). In addition, the Sequence Manipulation Suite (SMS) version 2 was used to determine the theoretical isoelectric point (pI) (Martin, Garrity and Yao, 2016).
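Two of these parameters are simple functions of the amino acid composition. The sketch below is an illustration rather than the ProtParam implementation: it computes GRAVY as the mean Kyte-Doolittle hydropathy value per residue, and the aliphatic index as X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent of each residue, which is the formula documented for ProtParam. The function names and the four-residue test sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy scale, one value per standard amino acid.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy value per residue."""
    if not seq:
        raise ValueError("empty sequence")
    # A KeyError here means the sequence contains a non-standard letter.
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index from the mole percent of Ala, Val, Ile, and Leu."""
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


# Hypothetical toy sequence; the real 478-residue sequence is not reproduced here.
toy = "AVIL"
print(gravy(toy), aliphatic_index(toy))
```

A negative GRAVY value indicates a hydrophilic protein overall, and a higher aliphatic index reflects a larger relative volume of aliphatic side chains.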
2.3 Secondary structure prediction
The self-optimized prediction method with alignment (SOPMA) was used to predict secondary structure elements (Combet et al., 2000), and the PSIPRED program (Jones, 1999) was used to predict the secondary structure of BCRIVMBC126_02492. The DISOPRED tool was used for disorder prediction (Thakur and Kumar, 2018).
2.4 Tertiary structure modeling and validation
Homology modeling of the protein BCRIVMBC126_02492 of B. cereus was performed because no tertiary structure was available in the Protein Data Bank (PDB). Three tools were used to predict the tertiary structure of the protein: Modeller (Webb and Sali, 2016) following template selection with HHpred (Zimmermann et al., 2018), Swiss-Model (Gasteiger et al., 2005), and Phyre2 (Kelley et al., 2015). The tertiary structures generated by Modeller, Swiss-Model, and Phyre2 were compared, and the most suitable structure was selected for final validation. The modeled structures were validated by Ramachandran plot analysis with PROCHECK and by Verify 3D (https://servicesn.mbi.ucla.edu/Verify3D/). The Swiss-Model Interactive Workplace (https://swissmodel.expasy.org/assess) was also applied for quality validation of the final tertiary structure. In addition, Z-scores from ProSA-web were used to assess overall model quality.
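Ramachandran plot analysis classifies each residue by its backbone dihedral angles phi and psi. As a minimal sketch of the underlying geometry (not the PROCHECK implementation), the function below computes a dihedral angle from four atom positions; phi would use the atoms C(i-1), N(i), CA(i), C(i), and psi would use N(i), CA(i), C(i), N(i+1). The coordinates in the usage lines are made-up points chosen to give simple angles, not atoms from any model.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points, in (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    if length == 0.0:
        raise ValueError("the two central points coincide")
    axis = tuple(c / length for c in b1)
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, axis) * c for c in axis))
    w = _sub(b2, tuple(_dot(b2, axis) * c for c in axis))
    x = _dot(v, w)
    y = _dot(_cross(axis, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up points around a central bond along the z axis.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
quarter_turn = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(cis, quarter_turn)
```

Each residue's (phi, psi) pair is then assigned to a most favored, additional allowed, generously allowed, or disallowed region, and the percentages per region summarize the stereochemical quality of the model.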
3.1 Physicochemical characterization
The amino acid sequence of BCRIVMBC126_02492 of B. cereus was retrieved in FASTA format and used as the query sequence for determining physicochemical parameters. The instability index of BCRIVMBC126_02492 is 34.60 (<40), which indicates that the protein is stable (Guruprasad et al., 1990). The protein is acidic (pI 5.76, 6.04*), with a molecular weight of 54188.16 Da. The high extinction coefficient (64790) indicates the presence of Cys, Trp, and Tyr residues (Gill and von Hippel, 1989). The high aliphatic index (95.27) of the query protein suggests increased thermostability over a wide temperature range. The protein is hydrophilic and is likely to interact well with water (Uddin et al., 2017), as indicated by its low grand average of hydropathicity (GRAVY) value (-0.256), as shown in Table 1. The amino acid composition, obtained from the ExPASy ProtParam tool, is shown in Table 2. The amino acid composition can help to reveal the active amino acid pocket for drug and vaccine targeting against the protein.
The uncharacterized protein has several functions, including one related to persulfide dioxygenase. This non-heme iron-dependent oxygenase catalyzes the oxidation of glutathione persulfide to glutathione and sulfite in the mitochondria and is involved in a variety of biological functions (Sattler et al., 2015). The protein also has a sulfide dehydrogenase enzymatic function and plays a vital role in cyanide fixation and other biological functions (Spallarossa et al., 2001).
3.2 Secondary structure prediction
For secondary structure prediction, SOPMA was run with its default settings (similarity threshold of 8, window width of 17, and division factor of 4). Using 478 proteins (sub-database) and 33 aligned proteins, SOPMA predicted 39.54 percent of residues as random coil, 36.61 percent as alpha-helix, 16.95 percent as extended strand, and 6.90 percent as beta-turn (Table 3). PSIPRED predicted the helix, strand, and coil regions with high confidence (Fig 1).
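The reported percentages follow directly from per-state residue counts. The sketch below assumes counts of 175, 81, 33, and 189 residues for alpha-helix, extended strand, beta-turn, and random coil; these counts are not given in the text and are back-calculated here from the stated percentages and the 478-residue sequence length, so they should be read as an assumption.

```python
def state_percentages(counts):
    """Convert per-state residue counts to percentages rounded to two decimals."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues counted")
    return {state: round(100.0 * n / total, 2) for state, n in counts.items()}


# Assumed counts, back-calculated from the reported percentages (not from the source).
sopma_counts = {
    "alpha_helix": 175,
    "extended_strand": 81,
    "beta_turn": 33,
    "random_coil": 189,
}

percentages = state_percentages(sopma_counts)
for state, value in percentages.items():
    print(f"{state}: {value:.2f} percent")
```

Under these assumed counts the four states account for all 478 residues, which is consistent with the percentages summing to 100.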
3.3 Protein binding sites and Gene Ontology (GO) prediction
The PredictProtein server was used to predict binding sites. Twelve protein binding sites were identified at positions 4; 7-10; 88-89; 93-94; 134-138; 166; 259; 264-265; 267; 324; 326; and 365, and two polynucleotide binding sites were identified at positions 434-435 and 438. Macromolecule binding sites were found at nine positions: 432; 433; 434; 435; 436; 437; 438; 439; and 440 (Fig 2). Gene Ontology prediction categorized the functional aspects of the protein into molecular function ontology (Table 4), biological process ontology (Table 5), and cellular component ontology (Table 6). The molecular function ontology (Table 4) was predicted as hydrolase activity (35%); thiol ester hydrolase activity (25%); hydrolase activity, acting on ester bonds (25%); hydroxyacylglutathione hydrolase activity (25%); catalytic activity (23%); transition metal ion binding (16%); binding (14%); ion binding (12%); and thiosulfate sulfurtransferase activity (11%). The biological process ontology (Table 5) was predicted as single-organism metabolic process (18%); single-organism process (18%); cellular process (18%); cellular metabolic process (18%); small-molecule biosynthetic process (17%); single-organism biosynthetic process (17%); peptide biosynthetic process (17%); nonribosomal peptide biosynthetic process (17%); oxoacid metabolic process (17%); and organonitrogen compound metabolic process (17%). The cellular component ontology (Table 6) was predicted as cytoplasm (39%); cell part (39%); cell (39%); intracellular (39%);
3.4 Homology Modeling and Structural Validation
The target sequence of BCRIVMBC126_02492 in FASTA format was submitted to the HHpred template selection tool, and the most suitable template (3TP9_A) was selected from 250 hits, with a probability of 100 percent, an E-value of 6.7e-53, an SS score of 53.7, 462 aligned columns, and a target length of 474 (data not shown). The tertiary structure predicted by Modeller was then stored in PDB format (Fig 3). The tertiary structure of the uncharacterized protein was assessed with the Ramachandran map from PROCHECK (Fig 4), which showed that 92.6% of the residues (387) were in the core regions [A,B,L]; 6.0% were in the additional allowed regions [a,b,l,p]; 1.0% were in the generously allowed regions [~a,~b,~l,~p]; and 0.5% were in the disallowed regions. The number of non-glycine and non-proline residues was 418 (100%); there were 2 end-residues (excl. Gly and Pro), 33 glycine residues, and 20 proline residues among the total of 473 residues (Table 7). Verify 3D, a tertiary structure assessment tool, showed that the predicted tertiary structure passed the assessment (data not shown). The
Similarly, a tertiary model of BCRIVMBC126_02492 was built with Phyre2 based on the most suitable template (c3tp9B_), with a confidence of 100.0% and coverage of 98%. In total, 467 residues (98% of the sequence) were modeled with 100.0% confidence using the single highest-scoring template. Phyre2 also reported the secondary structure content as 12% disordered, 32% alpha-helix, and 20% beta-strand (data not shown). The Ramachandran map from PROCHECK showed that 91.3% of the residues were in the most favored regions, 7.2% in the additional allowed regions, 1.2% in the generously allowed regions, and 0.2% in the disallowed regions (Table 7).
The Phyre2 model passed the Verify 3D structure assessment. The Swiss-Model Interactive Workplace, another tertiary structure assessment tool, gave a MolProbity score of 2.81 and 93.82% Ramachandran favored residues, with QMEAN, Cβ, all-atom, solvation, and torsion values of -2.36, -1.45, -2.61, 0.40, and -2.21, respectively (data not shown), thus validating the predicted tertiary structure of the protein BCRIVMBC126_02492.
A 3D model of BCRIVMBC126_02492 was also built with Swiss-Model based on the top five suitable templates (3tp9.1.A; 3r2u.1.A; 3r2u.1.C; 3r2u.1.A; 3r2u.1.C), and the model was selected based on a Qualitative Model Energy Analysis (QMEAN) score of -1.19, a Global Model Quality Estimate (GMQE) score of 0.77, a sequence identity of 45.06%, and coverage of 100%. The generated tertiary structure was stored in PDB format. The Ramachandran map (PROCHECK) showed that 91.9% of the residues were in the most favored regions [A,B,L]; 7.4% in the additional allowed regions [a,b,l,p]; 0.4% in the generously allowed regions; and 0.4% in the disallowed regions (Table 7).
The Swiss-Model structure passed the Verify 3D structure assessment (data not shown). The Swiss-Model Interactive Workplace gave a MolProbity score of 1.41 and 94.05% Ramachandran favored residues, with QMEAN, Cβ, all-atom, solvation, and torsion values of -1.19, -0.54, -0.60, 0.46, and -1.21, respectively (data not shown), which validated the predicted 3D structure of the protein BCRIVMBC126_02492.
The modeled structures of BCRIVMBC126_02492 were also validated with another structure validation server, ProSA-web (Wiederstein and Sippl, 2007). Standard bond angles in the modeled tertiary structures were determined by ProSA-web, which was used to estimate the ‘degree of nativeness’ of the modeled tertiary structures. The Z-scores for the tertiary structures predicted by Modeller, Swiss-Model, and Phyre2 were -10.39, -10.41, and -10.35, respectively. The Z-scores obtained from all three tools are similar, which supports the validity of the predicted tertiary structures. The tertiary structure modeled by Modeller was more acceptable than the structures predicted by Phyre2 and Swiss-Model. This comparison was based on the Ramachandran map analysis, the Verify 3D results, the Swiss-Model Interactive Workplace results, and the Z-score analysis.
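The comparison above can be tabulated. The sketch below collects the PROCHECK percentages and ProSA-web Z-scores reported in this section and ranks the three models by the share of residues in the most favored regions, breaking ties by fewer residues in disallowed regions. Using this single ordering rule is an assumption made for illustration; the study weighed several assessments together, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelQuality:
    name: str
    favored: float     # % residues in most favored regions (PROCHECK)
    allowed: float     # % in additional allowed regions
    generous: float    # % in generously allowed regions
    disallowed: float  # % in disallowed regions
    z_score: float     # ProSA-web Z-score


# Values as reported in the text for each modeling tool.
models = [
    ModelQuality("Modeller", 92.6, 6.0, 1.0, 0.5, -10.39),
    ModelQuality("Swiss-Model", 91.9, 7.4, 0.4, 0.4, -10.41),
    ModelQuality("Phyre2", 91.3, 7.2, 1.2, 0.2, -10.35),
]

# Illustrative ordering rule (an assumption): most favored first, then fewest disallowed.
ranked = sorted(models, key=lambda m: (-m.favored, m.disallowed))

# Spread of the Z-scores across the three models.
z_spread = max(m.z_score for m in models) - min(m.z_score for m in models)

for m in ranked:
    print(f"{m.name}: favored {m.favored}%, disallowed {m.disallowed}%, Z {m.z_score}")
print(f"Z-score spread: {z_spread:.2f}")
```

Under this rule the Modeller structure ranks first, and the narrow Z-score spread reflects how similar the three models are in overall quality.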
This study concludes that the structural model of the protein BCRIVMBC126_02492, with its predicted active sites for ligand binding, is useful for understanding the nature of the protein. The predicted physicochemical parameters and functional annotation are useful for understanding the activity of this protein. The homology-modeled protein provides insights into the functional role of BCRIVMBC126_02492 in pathogenesis, which would help in designing potential therapeutic drugs against the protein.
We are grateful to the Department of Biochemistry and Molecular Biology, and the Department of Pharmacy of Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj, Bangladesh, for supporting this study.
The authors declare no conflict of interest.
Dr. Abduleziz Jemal Hamido, Deputy Managing Editor (Health Sciences), Universe Publishing Group (UniversePG), Haramaya, Ethiopia.
Dept. of Biochemistry and Molecular Biology, Bangabandhu Sheikh Mujibur Rahman Science and Technology University (BSMRSTU), Gopalganj, Bangladesh.
Saikat ASM, and Khalipha ABR. (2020). Structure prediction, characterization, and functional annotation of uncharacterized protein BCRIVMBC126_02492 of Bacillus cereus: an in silico approach. Am. J. Pure Appl. Sci., 2(4), 104-111. https://doi.org/10.34104/ajpab.020.01040111