Frequently Asked Questions

Find answers to common questions about the platform

Input Data Options in PanWSGA

PanWSGA supports two types of input:
  1. CDS FASTA files
  2. Whole genome assembly FASTA files

We strongly recommend uploading the whole genome assembly FASTA whenever possible.

Using a genome assembly allows PanWSGA to:
  • Apply a uniform gene prediction and annotation pipeline
  • Ensure consistent annotation between the query genome and the reference pangenome
  • Reduce annotation-related biases and false positive results
  • Improve accuracy in classifying core, accessory, and unique genes

CDS FASTA files are also supported, particularly when genome assemblies are unavailable. However, results from CDS input may be influenced by:
  • Differences in gene prediction tools
  • Missing or truncated genes
  • Inconsistent annotation standards

Analyses based on genome assemblies generally produce more reliable and comparable results than analyses based on CDS files, especially for unique and accessory gene identification.

PanWSGA performs de novo gene prediction and annotation on uploaded genome assemblies using the same pipeline applied to the reference pangenome.

False Positive Results in PanWSGA

As with all comparative genomics tools, PanWSGA may occasionally produce false positive results, particularly when working with draft genomes or incomplete reference datasets.

False positives occur when a gene is incorrectly classified as present, accessory, or unique due to technical or data-related factors rather than true biological differences.

False positives may arise due to:
  • Fragmented or low-quality genome assemblies
  • Annotation differences between genomes
  • Partial or truncated genes
  • Low-complexity or repetitive sequences
  • Highly divergent homologs or paralogous genes
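Partial or truncated genes can often be spotted before upload. The following is a minimal sketch, not part of PanWSGA, that flags nucleotide CDS sequences that do not look like complete open reading frames. The function name `cds_flags` is hypothetical, and the start and stop codons assume the standard bacterial genetic code (translation table 11). Some CDS files omit the stop codon by convention, so a "no stop codon" flag is a prompt for inspection rather than proof of truncation.

```python
# Assumption: bacterial/archaeal genetic code (translation table 11).
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def cds_flags(seq: str) -> list[str]:
    """Return a list of warnings for a nucleotide CDS; empty if it looks complete."""
    s = seq.strip().upper()
    flags = []
    in_frame = len(s) % 3 == 0
    if not in_frame:
        flags.append("length not a multiple of 3")
    if s[:3] not in START_CODONS:
        flags.append("no start codon")
    if s[-3:] not in STOP_CODONS:
        flags.append("no stop codon")
    # Only scan for internal stops when the reading frame is well defined.
    if in_frame:
        internal = (s[i:i + 3] for i in range(0, len(s) - 3, 3))
        if any(codon in STOP_CODONS for codon in internal):
            flags.append("internal stop codon")
    return flags


if __name__ == "__main__":
    print(cds_flags("ATGAAATAA"))  # complete toy ORF
    print(cds_flags("ATGAAAT"))    # truncated toy sequence
```

Sequences that raise flags are reasonable candidates for manual review if they later appear as unique or accessory genes.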

Genes classified as unique are the most prone to false positives, because this class is more sensitive to reference completeness, assembly quality, and sequence divergence.

PanWSGA minimizes false positives by:
  • Using a species-level, high-quality pangenome reference
  • Applying stringent sequence identity and coverage thresholds
  • Classifying genes based on gene clusters rather than single hits
  • Separating core, accessory, and unique genes explicitly

Users are encouraged to:
  • Use high-quality, well-assembled genomes
  • Update or expand the reference pangenome when possible
  • Manually validate key genes using BLAST or DIAMOND
  • Examine gene length, coverage, and functional annotation
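For manual validation, BLAST+ and DIAMOND can both write tabular output (`-outfmt 6` and `--outfmt 6` respectively) with the default columns qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore. The sketch below filters such output by identity and query coverage. The function name `strong_hits` and the thresholds (90% identity, 80% coverage) are illustrative assumptions, not the values PanWSGA uses internally. Query lengths must be supplied in the same units as the alignment coordinates.

```python
def strong_hits(tabular_text, query_lengths, min_identity=90.0, min_coverage=0.8):
    """Keep hits from 12-column tabular output that pass both thresholds.

    tabular_text  : contents of a BLAST/DIAMOND outfmt 6 file
    query_lengths : dict mapping query ID -> query length
    Thresholds are hypothetical defaults chosen for illustration.
    """
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        qseqid, sseqid = fields[0], fields[1]
        pident = float(fields[2])
        qstart, qend = int(fields[6]), int(fields[7])
        # Fraction of the query covered by this single alignment.
        coverage = (abs(qend - qstart) + 1) / query_lengths[qseqid]
        if pident >= min_identity and coverage >= min_coverage:
            hits.append((qseqid, sseqid, pident, round(coverage, 3)))
    return hits


if __name__ == "__main__":
    example = (
        "geneA\tref1\t98.5\t300\t4\t0\t1\t300\t1\t300\t1e-150\t550\n"
        "geneB\tref2\t70.0\t100\t30\t0\t1\t100\t5\t104\t1e-10\t80\n"
    )
    print(strong_hits(example, {"geneA": 300, "geneB": 400}))
```

A gene reported as unique that nevertheless has a strong hit in the reference set is worth a closer look, since it may be a false positive.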

PanWSGA is designed for genome-level screening and hypothesis generation. Important findings, especially unique or strain-specific genes, should be validated using independent computational or experimental approaches.

General Questions

A pan-genome is the complete set of genes found across all strains of a species. It consists of:
  • Core genes: Present in all strains (essential functions)
  • Accessory genes: Present in some strains (variable functions)
  • Unique genes: Present in only one strain (strain-specific)

The platform supports standard FASTA formats:
  • .fasta - Standard FASTA format
  • .fa - FASTA format
  • .fna - Nucleotide FASTA format

Maximum file size: 200 MB
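The upload constraints above can be checked locally before submitting a file. This is a minimal sketch under stated assumptions: the function name `check_upload` is hypothetical, and the limit is interpreted as 200 × 10^6 bytes, which may differ from how the server counts it.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".fasta", ".fa", ".fna"}
# Assumption: "200 MB" is read as decimal megabytes.
MAX_BYTES = 200 * 1000 * 1000


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems with a candidate upload; empty if none."""
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {suffix or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems


if __name__ == "__main__":
    print(check_upload("genome.fna", 4_500_000))
    print(check_upload("genome.gbk", 4_500_000))
```

For a file on disk, the size can be obtained with `Path(filename).stat().st_size`.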

Analysis time depends on genome size:
  • Small genomes (< 2,000 genes): 2-20 minutes
  • Medium genomes (2,000-5,000 genes): 2-20 minutes
  • Large genomes (> 5,000 genes): 2-20 minutes

The annotation step (eggNOG-mapper) typically takes 2-20 minutes and is optimized for speed.

Analysis & Results

COG (Clusters of Orthologous Groups) classification assigns genes to functional categories:
  • Information Storage & Processing: J (Translation), K (Transcription), L (Replication)
  • Cellular Processes: D (Cell cycle), M (Cell wall), O (Posttranslational modification)
  • Metabolism: C (Energy), E (Amino acids), G (Carbohydrates)
  • Poorly Characterized: R (General function), S (Unknown)

COG categories help understand the functional distribution of genes in your genome.
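The functional distribution can be recomputed from the downloaded annotations. The sketch below tallies COG category letters from a list of category strings. It assumes that a gene may carry several letters at once (for example "KL") and that a value with no letters (such as "-") means no assignment. Both are assumptions about the annotation export rather than documented PanWSGA behaviour, and `cog_distribution` is a hypothetical name.

```python
from collections import Counter


def cog_distribution(categories):
    """Count genes per COG letter; multi-letter values count once per letter."""
    counts = Counter()
    for value in categories:
        letters = {c for c in value.strip().upper() if c.isalpha()}
        if not letters:
            counts["unassigned"] += 1
            continue
        for letter in letters:
            counts[letter] += 1
    return counts


if __name__ == "__main__":
    example = ["J", "KL", "-", "S", "J"]
    for category, n in sorted(cog_distribution(example).items()):
        print(category, n)
```

Because multi-letter genes are counted under each of their categories, the totals can exceed the number of genes.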

Genes are classified based on their presence in the reference pan-genome:
  • Core genes: Present in ≥95% of reference genomes (conserved across species)
  • Accessory genes: Present in at least 5% but fewer than 95% of reference genomes (variable presence)
  • Unique genes: Present in <5% of reference genomes (rare or novel)

This classification helps identify essential vs. variable genetic content.
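The thresholds above can be expressed as a small function. This is a sketch of the stated rule only, not PanWSGA's implementation: the name `classify_gene` is hypothetical, and the boundaries are resolved so that exactly 95% counts as core and exactly 5% counts as accessory.

```python
def classify_gene(n_present: int, n_genomes: int) -> str:
    """Classify a gene by how many reference genomes contain it.

    Integer arithmetic avoids floating-point edge cases at the
    5% and 95% boundaries.
    """
    if n_genomes <= 0:
        raise ValueError("n_genomes must be positive")
    if not 0 <= n_present <= n_genomes:
        raise ValueError("n_present must be between 0 and n_genomes")
    if n_present * 100 >= 95 * n_genomes:
        return "core"
    if n_present * 100 >= 5 * n_genomes:
        return "accessory"
    return "unique"


if __name__ == "__main__":
    for present in (100, 95, 50, 5, 4, 0):
        print(present, classify_gene(present, 100))
```

With 100 reference genomes, a gene found in 95 of them is core, one found in 5 to 94 is accessory, and one found in 4 or fewer is unique.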

You can download all results in multiple formats:
  • FASTA files: Core, Accessory, and Unique gene sequences
  • Excel files: Annotations with COG categories and functional information
  • Images: COG classification pie charts (.png)
  • Text files: Summary statistics and gene mappings

All downloads are available in the "Results & Downloads" tab.

Technical Questions

The platform uses eggNOG-mapper for functional annotation:
  • eggNOG database (version 5.0) for orthology assignment
  • Prokka for gene prediction (when needed)

All tools are optimized for speed while maintaining accuracy.

The platform works best with modern browsers:
  • Google Chrome (recommended)
  • Mozilla Firefox
  • Microsoft Edge
  • Safari (latest version)

Note: JavaScript must be enabled for full functionality.

If an analysis fails, try these steps:
  1. Check that your FASTA file is properly formatted
  2. Verify the file size is under 200 MB
  3. Ensure you selected a valid species and pan-genome
  4. Try refreshing the page and running again
  5. Check the browser console (F12) for error messages

If problems persist, check the Help section for detailed troubleshooting.
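The first troubleshooting step, checking that the FASTA file is properly formatted, can be done with a short script. The sketch below is a hypothetical helper (`fasta_problems`), not a PanWSGA tool. It assumes nucleotide input and accepts only IUPAC nucleotide codes, so gap characters and protein sequences are reported as problems.

```python
# Assumption: nucleotide FASTA; IUPAC ambiguity codes are allowed.
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")


def fasta_problems(text: str) -> list[str]:
    """Return a list of formatting problems found in FASTA text."""
    problems = []
    header = None   # ID of the record currently being read
    seq_len = 0     # sequence length seen so far for that record
    seen = set()
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None and seq_len == 0:
                problems.append(f"record '{header}' has no sequence")
            parts = line[1:].split()
            header = parts[0] if parts else ""
            if not header:
                problems.append(f"line {lineno}: empty header")
            elif header in seen:
                problems.append(f"line {lineno}: duplicate ID '{header}'")
            seen.add(header)
            seq_len = 0
        elif header is None:
            problems.append(f"line {lineno}: sequence before first '>' header")
        else:
            bad = set(line.upper()) - IUPAC_NUCLEOTIDES
            if bad:
                problems.append(f"line {lineno}: invalid characters {sorted(bad)}")
            seq_len += len(line)
    if header is None:
        problems.append("no FASTA records found")
    elif seq_len == 0:
        problems.append(f"record '{header}' has no sequence")
    return problems


if __name__ == "__main__":
    print(fasta_problems(">contig1\nACGTNNACGT\n>contig2\n"))
```

An empty list means the file passed these basic checks. It does not guarantee that the analysis will succeed.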

Still have questions?

Check out our User Guide or contact support for assistance.

Contact us: Panwsga@gmail.com