
PHRP : Osong Public Health and Research Perspectives



Original Article
Genome Sequencing Analysis of Atypical Shigella flexneri Isolated in Korea
Nan-Ok Kim, Hae-young Na, Su-Mi Jung, Gyung Tae Chung, Hyo Sun Kawk, Sahyun Hong
Osong Public Health and Research Perspectives 2017;8(1):78-85.
Published online: February 28, 2017

Division of Enteric Diseases, Center for Infectious Diseases, National Research Institute of Health, Osong, Korea

Corresponding author: Sahyun Hong, E-mail:

Copyright © 2017 Korea Centers for Disease Control and Prevention

This is an open access article under the CC BY-NC-ND license.

  • Objectives
    An atypical Shigella flexneri strain with a plural agglutination pattern [i.e., reacting not only with serum samples containing type antigen II but also with serum samples containing group antigens (3)4 and 7(8)] was selected for genome sequencing, with the aim of obtaining additional comparative information about such strains.
  • Methods
    The genomic DNA of atypical S. flexneri strain NCCP 15744 was sequenced using an Ion Torrent PGM sequencing machine (Life Technologies, USA). The raw sequence data were preprocessed and reference-assembled in the CLC Assembly Cell software (version 4.0.6; CLC bio, USA).
  • Results
    Ion Torrent sequencing produced 1,450,025 single reads with an average length of 144 bp, totaling ~209 Mbp. The NCCP 15744 genome is composed of one chromosome and four plasmids and contains a gtrX gene. Among the published genome sequences of S. flexneri strains, including 2457T, Sf301, and 2002017, strain NCCP 15744 showed high similarity with strain 2002017. The differences between NCCP 15744 and 2002017 are as follows: i) NCCP 15744 carries four plasmids whereas 2002017 carries five; ii) 19 genes (including CI, CII, and cro) were lost in the SHI-O genomic island of NCCP 15744 and six genes were gained as compared with strain 2002017.
  • Conclusion
    Strain NCCP 15744 is genetically similar to 2002017, but these two strains have different multilocus sequence types and serotypes. The exact reason is unclear, but the 19 lost genes may be responsible for the atypical seroconversion of strain NCCP 15744.
Introduction
Shigella spp. are transmitted via the fecal-oral route and cause diseases by invading the colonic epithelium, which results in tissue destruction and massive inflammation. For this reason, Shigella spp. are causative agents of bacillary dysentery in humans [1]. Several groups of serotypically atypical Shigella flexneri strains were isolated in Korea in 2008. Among these, one group displayed a plural agglutination pattern by reacting with serum samples containing type antigen II as well as reacting with serum samples containing group antigens (3)4 and 7(8). This atypical S. flexneri strain NCCP 15744 was registered at the National Culture Collection for Pathogens (NCCP) of Korea National Institute of Health (Korea NIH) as NCCP No. 15744.
NCCP 15744 shows higher antibiotic resistance to ampicillin, streptomycin, and trimethoprim-sulfamethoxazole than do typical S. flexneri strains [2,3]. Atypical strains, or newer subserotypes, are being isolated in different parts of the world, e.g., serotype 4c, which was isolated in China and other East Asian countries [4–7]. Serotype 1c has also been reported in Bangladesh, China, Egypt, and Pakistan [8–12]. Moreover, a serotype X variant (2002017) was reported, and its genome was sequenced in China [13]. All S. flexneri serotypes, except serotype 6, share the same backbone of the basic O-antigen repeat unit, which is a tetrasaccharide consisting of a single N-acetylglucosamine and three rhamnose residues [14]. Glycosylation of any of the four sugars and/or O-acetylation of the last rhamnose residue gives rise to more than 13 known serotypes. The enzymes of both processes are encoded by genes carried by bacteriophages. Glycosylation involves three glucosyltransferase (gtr) genes, one of which is type specific, whereas O-acetylation involves only one gene: O-acetyltransferase (oac) [14]. This O-antigenic variation is a major strategy used by this organism to evade the host’s immunity. These bacteriophage-encoded modifications allow S. flexneri to change its O-antigenicity rather simply.
In this study, we analyzed the genome profile of NCCP 15744 and compared it with that of 2002017 as a reference strain. Strain 2002017 was a dominant serotype in China from 2003 to 2006 and was identified as a variant of serotype X. NCCP 15744 emerged for the first time in Korea in 2008, with fewer than 10 isolates per year (i.e., 2011: 6 strains; 2012: 7 strains; 2013: 1 strain; 2014: 1 strain; and 2015: 1 strain). The major serotype of S. flexneri in Korea was 2a in the same period.
It would be worthwhile to characterize atypical S. flexneri in relation to infections. Additionally, the atypical S. flexneri strains were isolated from children and adults with severe dysentery; these data highlight the need to study these isolates in detail. Specifically, the atypical S. flexneri presented in this study should also be subjected to analysis and identification of novel O-antigen modification genes. Therefore, it would be useful to analyze the genome by sequencing for comparison with the known genomes of other strains.
Materials and Methods
1. Sample collection and biochemical characterization
The atypical S. flexneri NCCP 15744 strain was isolated from a 2-year-old diarrheal patient in 2008. This strain was identified as S. flexneri by means of the API 20E kit (bioMérieux, Marcy-l’Étoile, France). The serotype of the strain was confirmed using a commercially available antiserum kit (Denka Seiken, Niigata, Japan) specific for all type and group factor antigens. Serological reactions were run according to the manufacturer’s instructions.
2. Polymerase chain reaction (PCR) and multilocus sequence type (MLST) analysis
Chromosomal DNA was purified using the Genomic-Prep DNA Isolation Kit (Amersham Biosciences, Roosendaal, Netherlands). All PCRs were conducted using the Expanded High Fidelity Polymerase System (Roche Diagnostics, Mannheim, Germany) or Taq polymerase (Takara Bio Inc., Shiga, Japan). The primers used for gtr and oac gene analysis are listed in Supplementary Table S1. The PCR products were purified and sequenced. For MLST analysis, seven housekeeping genes (i.e., adk, fumC, gyrB, icd, mdh, purA, and recA) were amplified by PCR and sequenced. This analysis was conducted according to a previously published report and the MLST web site [15].
3. Genome sequencing
Genomic DNA was isolated from the atypical S. flexneri NCCP 15744 strain using standard protocols and was sequenced on an Ion Torrent PGM sequencing machine (Life Technologies, Carlsbad, CA, USA). The Ion Torrent sequencing produced 1,450,025 single reads with an average length of 144 bp, totaling ~209 Mbp.
4. Sequencing data analysis
Raw sequence data were preprocessed and reference-assembled using the CLC Assembly Cell software (version 4.0.6; CLC bio, Waltham, MA, USA). Eight novel genes were predicted in the de novo contig, using Glimmer 2.1 [16] with default options. Genomic localization of insertion elements was assessed using ISfinder [17], filtered with an e-value cutoff of 1E-5 and subject coverage of 50%. To compare the overall chromosomal organization among four strains (i.e., NCCP 15744, 2002017, Sf301, and 2457T), dot-plots between the reference genomes were first generated using MUMmer [18]. Additionally, the colinear blocks between the genomes were analyzed by means of the Mauve algorithm [19], and the result was parsed and visualized within a circular map by means of Circos. The average nucleotide identity (ANI) values were calculated in the JSpecies software using Basic Local Alignment Search Tool (BLASTn) alignments [20,21] (Table 1). A prophage search was conducted by means of the PHAge Search Tool (PHAST) [22].
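The insertion-element filter described above (e-value cutoff of 1E-5, subject coverage of 50%) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the `Hit` record, its field names, and the example values are hypothetical, and subject coverage is assumed to mean the span of the alignment on the IS element divided by the full length of that element.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Thresholds quoted in the text.
MAX_EVALUE = 1e-5
MIN_SUBJECT_COVERAGE = 0.50


@dataclass(frozen=True)
class Hit:
    """One alignment hit against an IS element (hypothetical record layout)."""
    query_id: str     # contig or replicon that was searched
    subject_id: str   # name of the IS element in the database
    evalue: float     # expectation value reported for the alignment
    s_start: int      # 1-based start of the alignment on the IS element
    s_end: int        # 1-based end of the alignment on the IS element
    subject_len: int  # full length of the IS element in bp


def subject_coverage(hit: Hit) -> float:
    """Fraction of the IS element spanned by the alignment (assumed definition)."""
    aligned_span = abs(hit.s_end - hit.s_start) + 1  # abs() handles reverse-strand hits
    return aligned_span / hit.subject_len


def passes_filter(hit: Hit) -> bool:
    """Keep a hit only if it meets both thresholds."""
    return hit.evalue <= MAX_EVALUE and subject_coverage(hit) >= MIN_SUBJECT_COVERAGE


def filter_hits(hits: Iterable[Hit]) -> List[Hit]:
    """Return the hits that pass both thresholds, in their original order."""
    return [hit for hit in hits if passes_filter(hit)]


if __name__ == "__main__":
    # Made-up hits for illustration only; these are not data from the study.
    example_hits = [
        Hit("contig_1", "IS_example_A", 1e-30, 1, 700, 768),
        Hit("contig_1", "IS_example_B", 1e-3, 1, 700, 768),
        Hit("contig_2", "IS_example_C", 1e-20, 300, 100, 1000),
    ]
    for hit in filter_hits(example_hits):
        print(hit.subject_id, round(subject_coverage(hit), 2))
```

Of the three made-up hits, only the first passes: the second fails the e-value cutoff and the third covers too little of its IS element.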
5. The nucleotide sequence accession number
The National Center for Biotechnology Information (NCBI) accession number for the genome sequence of atypical S. flexneri NCCP 15744 reported in this study is AWOX00000000.
Results
1. Sequencing data analysis
Raw sequence reads were generated by Ion Torrent PGM sequencing. The 648,242 reads (~96 Mbp, average length 148 bp) were assembled using the genome of strain 2002017 (serotype X variant; GenBank accession no. NC_017328) as a reference. As a result, 620,506 reads (95.7%) were successfully aligned with the reference genome. By combining the reference-assembled contigs with the single de novo contig, a pseudomolecule of genome size 4,641,722 bp was generated for the NCCP 15744 strain (Table 1).
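The read statistics quoted in the text can be cross-checked with simple arithmetic. The sketch below uses only figures given in the Methods and in this subsection; the function names are illustrative, and the depth estimate assumes that the 4,641,722-bp pseudomolecule is the genome size and that reads are spread evenly. It checks each set of figures for internal consistency and does not attempt to reconcile the two read counts.

```python
def total_bases(n_reads: int, mean_read_length_bp: int) -> int:
    """Approximate sequenced bases as read count times mean read length."""
    return n_reads * mean_read_length_bp


def percent_aligned(aligned_reads: int, total_reads: int) -> float:
    """Percentage of reads that aligned to the reference."""
    return 100.0 * aligned_reads / total_reads


def mean_depth(sequenced_bp: int, genome_size_bp: int) -> float:
    """Average depth of coverage, assuming reads are spread evenly over the genome."""
    return sequenced_bp / genome_size_bp


if __name__ == "__main__":
    run_bp = total_bases(1_450_025, 144)      # read count and length from the Methods
    assembled_bp = total_bases(648_242, 148)  # read count and length from this subsection
    print(f"Run output: {run_bp / 1e6:.0f} Mbp")
    print(f"Reads used for assembly: {assembled_bp / 1e6:.0f} Mbp")
    print(f"Aligned: {percent_aligned(620_506, 648_242):.1f}%")
    print(f"Mean depth from assembled reads: {mean_depth(assembled_bp, 4_641_722):.1f}x")
```

Both totals round to the values reported (~209 Mbp and ~96 Mbp), and 620,506 of 648,242 reads corresponds to the stated 95.7%. Under the stated assumptions, the assembled reads give roughly 20-fold average depth.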
2. Genome features
Sequence reads were mapped to the published genomes of three reference strains (i.e., 2457T, Sf301, and 2002017); the reads aligned best with the 2002017 genome (Table 1). The NCCP 15744 genome was thus found to be composed of one chromosome and four plasmids (pSFII_1, pSFII_2, pSFII_3, and pSFII_4), including a large virulence plasmid and a drug resistance plasmid (pSFII_1; Table 2). The chromosome sizes of NCCP 15744 and 2002017 are similar (4,631,995 bp in NCCP 15744 vs. 4,650,865 bp in 2002017), but strain 2002017 has five plasmids instead of four.
3. Seroconversion-related genes (SHI-O) and antibiotic resistance-related genes (SRL and SRLII)
SHI-O (37,894 bp) is the serotype conversion island in S. flexneri, carrying genes for O-antigen modification. The SHI-O island of NCCP 15744 is located at the same site as are SHI-O genes of other S. flexneri reported previously [23]. As confirmed by PCR analysis, this site contains gtrA, gtrB, and gtrX genes for serotype X, 2b, and 3b conversion, identical to the SfX gtr genes published elsewhere. Nevertheless, 19 genes were lost in strain NCCP 15744 compared with strain 2002017 [24,25] (Figure 1, Table 3).
The 48,095-bp Shigella resistance locus (SRL) contains tetracycline, chloramphenicol, ampicillin, and streptomycin resistance genes. It is similar to the SRL island initially discovered in S. flexneri 2a strain YSH6000 [26,27]. SRLII (14,067 bp) contains multiple antibiotic resistance genes, including dihydrofolate reductase (dfrA1), streptothricin acetyltransferase (sat1), and aminoglycoside adenyltransferase (aadA1), which confer resistance to trimethoprim, streptothricin, and streptomycin/spectinomycin, respectively [28,29]. The NCCP 15744 genomic islands SRL and SRLII were found to be almost identical to those of strain 2002017.
4. Lost and gained genes
In total, 132 genes were lost (112 partially and 20 completely) in NCCP 15744 compared with 2002017. Among these lost genes, 19 are missing in the SHI-O genomic island region; seven have an unknown function, four are in the functional category of immunity and regulation, four are involved in DNA replication and recombination, three in lysis, and one gene is related to DNA packaging (Table 3). Meanwhile, on the basis of the de novo contig, six genes were predicted to be gained in NCCP 15744 compared with 2002017, most of which have an unknown function (Supplementary Table S2).
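The category counts above can be checked against Table 3 with a short tally. The sketch below transcribes the locus tag and functional category of each of the 19 SHI-O genes listed in Table 3; the category assignments follow the grouping in that table, and the variable names are illustrative.

```python
from collections import Counter

# (locus_tag, functional category) pairs transcribed from Table 3.
SHI_O_LOST_GENES = [
    ("SFXV_0311", "Unknown"),
    ("SFXV_0312", "Unknown"),
    ("SFXV_0313", "Unknown"),
    ("SFXV_0315", "Unknown"),
    ("SFXV_0316", "Unknown"),
    ("SFXV_0317", "Unknown"),
    ("SFXV_0318", "Unknown"),
    ("SFXV_0322", "Immunity and regulation"),
    ("SFXV_0323", "Immunity and regulation"),
    ("SFXV_0323a", "Immunity and regulation"),
    ("SFXV_0324", "Immunity and regulation"),
    ("SFXV_0327a", "DNA replication and recombination"),
    ("SFXV_0327b", "DNA replication and recombination"),
    ("SFXV_0328a", "DNA replication and recombination"),
    ("SFXV_0328b", "DNA replication and recombination"),
    ("SFXV_0335", "Lysis"),
    ("SFXV_0335a", "Lysis"),
    ("SFXV_0335b", "Lysis"),
    ("SFXV_0347", "DNA packaging"),
]

# Genome-wide totals quoted in the text.
PARTIALLY_LOST = 112
COMPLETELY_LOST = 20

category_counts = Counter(category for _, category in SHI_O_LOST_GENES)

if __name__ == "__main__":
    print("SHI-O genes lost:", len(SHI_O_LOST_GENES))
    for category, count in category_counts.items():
        print(f"  {category}: {count}")
    print("Genome-wide genes lost:", PARTIALLY_LOST + COMPLETELY_LOST)
```

The tally gives 7 + 4 + 4 + 3 + 1 = 19 genes in SHI-O, and 112 + 20 = 132 genes genome-wide, in agreement with the text.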
5. PCR and MLST analysis
PCR analysis against serotype-specific gtr (including gtrII, gtrV, and gtrX) and oac was carried out. Strain NCCP 15744 was found to carry gtrX, but not gtrII, gtrV, or oac. The MLST results indicated that NCCP 15744 belongs to the sequence type complex 245 (which has the allele profile 6, 61, 6, 11, 13, 3, and 50 in the order adk, fumC, gyrB, icd, mdh, purA, and recA, respectively). ST245 is a well-known sequence type in Asian countries and is different from that of 2002017 (ST91).
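Assigning a sequence type from an allele profile amounts to a lookup keyed on the seven allele numbers in a fixed locus order. The following is a minimal sketch under that assumption: only the ST245 profile reported here is included, `PROFILES` and `assign_st` are hypothetical names, and a real analysis would query the full MLST database rather than a one-entry table.

```python
from typing import Dict, Optional, Tuple

# Locus order used in the text for reporting the allele profile.
LOCI: Tuple[str, ...] = ("adk", "fumC", "gyrB", "icd", "mdh", "purA", "recA")

# Hypothetical local profile table; it holds only the profile given in the text.
PROFILES: Dict[Tuple[int, ...], str] = {
    (6, 61, 6, 11, 13, 3, 50): "ST245",
}


def assign_st(alleles: Dict[str, int]) -> Optional[str]:
    """Return the sequence type for an allele profile, or None if it is not in the table."""
    key = tuple(alleles[locus] for locus in LOCI)
    return PROFILES.get(key)


if __name__ == "__main__":
    nccp_15744 = {"adk": 6, "fumC": 61, "gyrB": 6, "icd": 11,
                  "mdh": 13, "purA": 3, "recA": 50}
    print(assign_st(nccp_15744))
```

Keying on a tuple in a fixed locus order means that a change in any single allele yields a different key, which mirrors how two otherwise similar genomes can fall into different sequence types.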
Discussion
Genome sequencing of strain NCCP 15744 revealed that it bears four plasmids and has acquired a Shigella serotype conversion island (SHI-O), via bacteriophage SfX (gtrX), which is responsible for the group 7(8) antigenic determinant. SfX (gtrX) converts serotype Y to serotype X, Y to 3b, and 2a to 2b [30].
The NCCP 15744 genome shares the highest similarity with that of 2002017, but some differences exist between these strains. First, NCCP 15744 has only four plasmids as opposed to five in 2002017 (plasmid pSFXV_2 is absent in NCCP 15744). Second, according to MLST, 2002017 is ST91 whereas NCCP 15744 is ST245. Third, in comparison with strain 2002017, 19 genes were lost in SHI-O and six novel genes were gained in NCCP 15744. Among these lost genes, three (CI, CII, and Cro repressor) in the functional category of immunity and regulation are the regulatory switches that determine whether the Sf bacteriophage would follow a lytic or lysogenic cycle [31,32].
The genomic sequence data revealed that the NCCP 15744 strain has gained two multi-antibiotic resistance genomic islands (SRL and SRLII) encoding genes that confer resistance to five antibiotics (i.e., tetracycline, streptomycin, chloramphenicol, ampicillin, and trimethoprim) that are commonly used for the treatment of shigellosis in Korea. Strain NCCP 15744 even showed higher resistance to ampicillin, streptomycin, and trimethoprim-sulfamethoxazole than did the typical S. flexneri 2a strain. Unlike strain 2002017, which was a major serotype from 2003 to 2006 in China, strain NCCP 15744 was not a dominant serotype from 2003 to 2006 in Korea.
Recently, a novel S. flexneri O-antigen modification, the addition of phosphoethanolamine to RhaII, was identified [33]. The strain in question has the O-antigen phosphoethanolamine transferase gene opt (formerly called lpt-O) carried by a pSFXV_2-like plasmid. This opt gene inactivates the serotype-specific gene gtrX, which generates the Xv serotype. The pSFXV_2-like plasmid is absent in strain NCCP 15744.
ST245 is a common sequence type in Asian countries and comprises a wide range of serotypes (i.e., 1b, 2a, 3a, 3b, 3c, 4a, 4b, 5, X, Y, and 6) [34]. In addition, in our previous study, NCCP 15744 showed 81.3% similarity with the typical 2a strain in pulsed-field gel electrophoresis analysis [2]. According to these data, NCCP 15744 is close to serotype 2a [II: (3)4] and has similarities with 2b [II: 7(8)]. This kind of atypical pattern can arise via inactivation of the gtr locus, resulting in reversion to either the parental or an intermediate serotype [34]. Nonetheless, inactivation of the gtr locus was not detected in NCCP 15744. The exact reason why NCCP 15744 shows a plural agglutination pattern is still unknown. A possible explanation is the action of an unstable bacteriophage that caused the loss of 19 genes (including the CI, CII, and Cro repressor genes).
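The serological reasoning above can be made concrete by treating each serotype as a set of antigenic factors. The sketch below encodes only the two formulas quoted in this paragraph, 2a [II: (3)4] and 2b [II: 7(8)]; the function name and the set representation are illustrative assumptions and not a typing scheme used by the authors.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Antigenic formulas quoted in the text: type antigen plus group antigen(s).
SEROTYPE_FORMULAS = {
    "2a": frozenset({"II", "(3)4"}),
    "2b": frozenset({"II", "7(8)"}),
}


def classify(reactions: Iterable[str]) -> Tuple[str, List[str]]:
    """Compare observed agglutination reactions with the known formulas.

    Returns (serotype, []) on an exact match. Otherwise returns
    ("atypical", names) where names lists every formula that is fully
    contained in the observed reactions.
    """
    observed: FrozenSet[str] = frozenset(reactions)
    for name, formula in SEROTYPE_FORMULAS.items():
        if formula == observed:
            return name, []
    contained = sorted(name for name, formula in SEROTYPE_FORMULAS.items()
                       if formula <= observed)
    return "atypical", contained


if __name__ == "__main__":
    # Plural agglutination pattern described for NCCP 15744.
    print(classify(["II", "(3)4", "7(8)"]))
    # A typical 2a pattern for comparison.
    print(classify(["II", "(3)4"]))
```

With the reactions reported for NCCP 15744, no single formula matches exactly, but both the 2a and 2b formulas are contained in the observed pattern, which is what the term plural agglutination describes.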
These results will facilitate functional studies of intracellularly regulated genes that may be important for the adaptation and growth strategies of this atypical S. flexneri strain during infection.
This study was supported by the Korea National Institute of Health (Grant: 4851-304-210-13).


No potential conflict of interest relevant to this article was reported.

Figure 1
Alignment of SHI-O loci of strains NCCP 15744, 2457T, Sf301, and 2002017.
Table 1
Average nucleotide identity (ANI) of strain NCCP 15744 compared with three reference strains (2002017, Sf301, and 2457T)
Target | Query | ANIb | ANIb alignments | ANIb aligned
2002017 | NCCP 15744 | 99.91% | 4,557 (95.49%) | 4,642,181 (95.36%)
Sf301 | NCCP 15744 | 99.83% | 4,446 (93.17%) | 4,519,560 (92.83%)
2457T | NCCP 15744 | 99.84% | 2,264 (93.55%) | 4,540,984 (93.20%)
Table 2
Genome features of Shigella flexneri NCCP 15744 and 2002017
Feature | NCCP 15744 chromosome | NCCP 15744 plasmids | 2002017 chromosome | 2002017 plasmids
Total length (bp) | 4,631,995 | 223,049; 6,200; 4,042; 3,117 | 4,650,865 | 223,364; 6,850; 6,200; 4,042; 3,180
No. of ORFs | 4,268 | 293; 8; 4; 5 | 4,372 | 302; 11; 8; 4; 6
Percentage of CDS (%) | 79.6 | 75.2; 71.47; 58.86; 27.43 | 83.3 | 79.3; 64.2; 71.5; 58.9; 61.8
G+C content (%) 50.86 45.92 45.92 52.55 45.4 50.86 45.92 45.92 52.55 45.4
IS elements (%) 336 (7) 491 (6) 157 (32) 1 0 0 0
No. of pseudogenes 216 232
No. of rRNAs 13 13 22
No. of tRNAs 98 98 101

ORF, open reading frame; CDS, coding sequence; IS, insertion sequence

Table 3
The list of genes lost in strain NCCP 15744 compared to strain 2002017 in the SHI-O genomic island
No. | Locus_tag | Gene product | Function
1 | SFXV_0311 | Hypothetical protein | Unknown
2 | SFXV_0312 | Putative bacteriophage protein | Unknown
3 | SFXV_0313 | Hypothetical protein | Unknown
4 | SFXV_0315 | Hypothetical protein | Unknown
5 | SFXV_0316 | Hypothetical protein | Unknown
6 | SFXV_0317 | Putative phage-related DNA recombination protein | Unknown
7 | SFXV_0318 | Hypothetical protein | Unknown
8 | SFXV_0322 | Antitermination of transcription at nut site protein | Immunity and regulation
9 | SFXV_0323 | CI protein | Immunity and regulation
10 | SFXV_0323a | cro protein | Immunity and regulation
11 | SFXV_0324 | Putative regulatory protein CII of bacteriophage | Immunity and regulation
12 | SFXV_0327a | Gp56 | DNA replication and recombination
13 | SFXV_0327b | Gp60 | DNA replication and recombination
14 | SFXV_0328a | NinD protein | DNA replication and recombination
15 | SFXV_0328b | NinE protein | DNA replication and recombination
16 | SFXV_0335 | Rz protein | Lysis
17 | SFXV_0335a | Hypothetical protein | Lysis
18 | SFXV_0335b | Hypothetical protein | Lysis
19 | SFXV_0347 | Gene 10 protein | DNA packaging & head and tail morphogenesis
