Faction II Genome Browser Group

From Compgenomics2017

Project Description

The genomics 2017 class was provided with 26 isolates of Salmonella Heidelberg, classified as sporadic, together with sequence data generated on Illumina HiSeq by the CDC (Centers for Disease Control and Prevention). The class proceeded through five distinct stages of analysis and interpretation of that data: assembling the input data into whole genomes made up of short contigs; predicting genes with ab initio gene prediction and RNA prediction; merging the results into a single GFF file for each sporadic isolate to obtain protein functional annotation; performing comparative genomics; and finally putting all of these results into a genome browser. The browser gives access to all the tools, scripts, results and visualizations.

Goals

  • Visualize the output results of previous groups
  • Facility to download the pre-computed results of genome samples and the scripts used for computation
  • [optional] Function to BLAST a sequence provided by user against VFDB

Genome Browser Background

GMOD is a collection of interconnected applications and databases that biologists use as repositories and as tools. It is made up of databases, applications, and adaptor software that connects these components together. GBrowse is probably GMOD's most popular component, and almost all of the databases listed in GMOD Users use GBrowse. Here GBrowse is short for Genome Browser, or Generic Genome Browser. [1]

A genome browser is a web- or desktop-based graphical tool that provides a rapid and reliable display of any requested portion of a genome at any scale, together with dozens of aligned annotation tracks (known genes, predicted genes, ESTs, mRNAs, CpG islands, assembly gaps and coverage, chromosomal bands, mouse homologies, and more). The Genome Browser stacks annotation tracks beneath genome coordinate positions, allowing rapid visual correlation of different types of information. The user can look at a whole chromosome to get a feel for gene density, open a specific cytogenetic band to see a positionally mapped disease gene candidate, or zoom in to a particular gene to view its spliced ESTs and possible alternative splicing. The Genome Browser itself does not draw conclusions; rather, it collates all relevant information in one location, leaving the exploration and interpretation to the user. [2]


Browser Established

We tried many genome browsers (listed below) and finally chose JBrowse to build our own genome browser.

Browser List

Genome Browser    Website                                      Language used
JBrowse           http://jbrowse.org/                          JavaScript & HTML5
D3 Browser        https://github.com/brinkmanlab/GenomeD3Plot  JavaScript
Anno-J            http://www.annoj.org/                        JavaScript
Gaggle            http://gaggle.systemsbiology.net/docs/       Flexible. Python
Biodalliance      http://www.biodalliance.org/index.html       HTML5
GenPlay           http://genplay.net                           Java
GenoVerse         http://wtsi-web.github.io/Genoverse/         HTML5
Genome Projector  http://www.g-language.org/GenomeProjector/   Perl, Asynchronous JavaScript & XML (AJAX)
WebApollo         http://gmod.org/wiki/WebApollo               Groovy
Rgb Browser       http://bioinformatics.ovsa.fr/72/Rgb_130     R


JBrowse

JBrowse is a fast, embeddable genome browser built completely with JavaScript and HTML5, with optional run-once data formatting tools written in Perl.[3]

JBrowse has the following advantages over the other browsers:

  • Very light server resource requirements.[3]
  • The visualization design:
      * highlighting of regions of importance
      * easy retrieval of a region-specific FASTA file
      * quick search
      * detailed information about sequences
  • Supports GFF3, BED, FASTA, Wiggle, BigWig, BAM, VCF (with tabix), REST, and more. BAM, BigWig, and VCF data are displayed directly from chunks of the compressed binary files, no conversion needed.[3]
  • No database is needed to store the GFF files.


Faction II Genome Web Browser

Website address: http://gbrowse2017b.biology.gatech.edu/

The website includes 6 different pages, "Home", "Tools", "Downloads", "Scripts", "Browser" and "Discussion", each providing different functions. It also provides the basic links "About Us", "Contact Us", "Profiles", "Privacy Policies", "Acknowledgement" and "References".

Implementation and Configuration

The website was implemented based on a free template provided at https://medialoot.com/blog/how-to-code-a-homepage-template-with-html5-and-css3/. The template itself is independent of any server-side scripting and is based purely on HTML and CSS. We used this template to construct the Genome Browser for Faction II, introducing server-side and client-side scripting to manage menus, dynamic loading, embeddings, redirections and additional information/links. Some of the visual and aesthetic features we added were learned from various free online resources. The images and backgrounds were obtained from royalty-free sources that allow them to be used freely, with or without modifications to the original file.

The source-controlled version of the Genome Browser is available at https://github.gatech.edu/dwijeratne3/gbrowse2017b.biology.gatech.edu

Home Page

The Home page provides a brief description of the bacterium and the stages of this project, along with portal access to the tools, the genome browser and the downloads section, with all useful links.

Home.png

You can also visit the wiki pages for other groups.

Other groups.png

Tools Page

We have implemented two tools: BLAST and the Virulent Factor Detection tool.

BLAST: Users can BLAST their own samples against all 26 isolates from this study. Users can upload either nucleotide or protein sequences.

Virulent Factor Detection tool: Users can find virulence factors in their isolates with this tool. It uses the BLAST program and the VFDB database to find virulence factors in the input sequence.

Tool.png

Main page of the BLAST and Virulent Factor Detection tools:

Users can paste their sequences and select appropriate parameters for the BLAST program, such as E-value, target alignments, gap open and gap extension penalties, word size and reward value.

Blast.png V.png
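As a rough illustration of how the form parameters above could map onto a BLAST+ command line, the following Python sketch builds a blastn argument list. The helper name, the default values and the database path are assumptions made for illustration, not the code running on the site; the flags themselves are standard BLAST+ options.

```python
def build_blastn_command(query_path, db_path, evalue=1e-5, max_targets=10,
                         gap_open=5, gap_extend=2, word_size=11,
                         reward=2, penalty=-3):
    """Return a blastn argument list (hypothetical helper, assumed defaults)."""
    return [
        "blastn",
        "-task", "blastn",
        "-query", query_path,
        "-db", db_path,
        "-evalue", str(evalue),
        "-max_target_seqs", str(max_targets),
        "-gapopen", str(gap_open),
        "-gapextend", str(gap_extend),
        "-word_size", str(word_size),
        "-reward", str(reward),
        "-penalty", str(penalty),
        # Tabular output limited to the columns a result table would show.
        "-outfmt", "6 qseqid sseqid evalue pident sstart send",
    ]
```

The list can be passed to subprocess.run(cmd, capture_output=True, text=True, check=True). Passing a list rather than a single string keeps user-supplied values away from the shell.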


Result Page: BLAST

The tool returns its results in tabular format, with the query, subject, E-value, percent identity, and subject start and subject end coordinates.

Selection 085.png
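A table like the one above can be produced from BLAST's tab-separated output. The sketch below parses such output into records. It assumes the six columns arrive in the order query, subject, E-value, percent identity, subject start, subject end; the class and function names are illustrative and are not taken from the project's scripts.

```python
from typing import NamedTuple


class BlastHit(NamedTuple):
    query: str
    subject: str
    evalue: float
    percent_identity: float
    subject_start: int
    subject_end: int


def parse_blast_tabular(text):
    """Parse tab-separated BLAST output with the six assumed columns."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue, identity, start, end = line.split("\t")
        hits.append(BlastHit(query, subject, float(evalue),
                             float(identity), int(start), int(end)))
    return hits
```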


Result Page: Virulent Factor Detection tool

The tool returns its results in tabular format, with the VFDB ID, query, E-value, percent identity, and subject start and subject end coordinates.


Selection 086.png
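One possible way to condense such a table is to keep only the strongest VFDB hit for each query. The sketch below is an assumption about how that could be done, not a description of the deployed tool; the cutoff values and the identifiers in the usage note are made up for illustration.

```python
def best_hits(rows, max_evalue=1e-10, min_identity=80.0):
    """Keep the lowest E-value hit per query among rows passing both cutoffs.

    rows: iterable of (vfdb_id, query, evalue, percent_identity) tuples.
    The cutoffs are illustrative defaults, not values used by the project.
    """
    best = {}
    for vfdb_id, query, evalue, identity in rows:
        if evalue > max_evalue or identity < min_identity:
            continue  # hit is too weak under the assumed cutoffs
        current = best.get(query)
        if current is None or evalue < current[2]:
            best[query] = (vfdb_id, query, evalue, identity)
    return best
```

For example, given two passing hits for the same query, only the one with the smaller E-value is kept, and a query whose hits all fail the cutoffs does not appear in the result.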

Downloads Page

This page provides all the source files and result files (sequences and feature files) for download. You can search for any file by name and visualize each sequence or feature by clicking on its icon, which takes you to the JBrowse instance.

Download.png

Scripts Page

All scripts hosted in the Git repository mentioned above are listed here with a description of their purpose, so that users can download them via the browser. Scripts can also be searched by name and description.

Script.png

Genome Visualization Browser Page

This page hosts the visualization itself. Check the boxes on the right to display the tracks you want. To see detailed information about a specific gene, click on it and a box will pop up.

Collecting Input

Both the Genome Assembly group and the Functional Annotation groups provided us with sequence files (FASTA format) and 8 feature files (GFF3 format) for each of the 26 samples. These data were collected and grouped by sample so that configuring JBrowse would be easier.
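The grouping step can be sketched in Python as follows. This assumes every file name begins with a sample ID of the form "SP0001"; that naming scheme and the file names in the usage note are assumptions, and the project's real layout may differ.

```python
import re
from collections import defaultdict

# Assumption: every file name starts with a sample ID such as "SP0001".
SAMPLE_ID = re.compile(r"^(SP\d+)")


def group_by_sample(filenames):
    """Map each sample ID to its FASTA and GFF file names."""
    groups = defaultdict(lambda: {"fasta": [], "gff": []})
    for name in filenames:
        match = SAMPLE_ID.match(name)
        if not match:
            continue  # skip files outside the assumed naming scheme
        lower = name.lower()
        if lower.endswith((".fasta", ".fa")):
            groups[match.group(1)]["fasta"].append(name)
        elif lower.endswith((".gff", ".gff3")):
            groups[match.group(1)]["gff"].append(name)
    return dict(groups)
```

Calling group_by_sample(["SP0001.fasta", "SP0001_abr.gff"]) would place both files under the key "SP0001".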

Converting Input

Different tools were used to generate these data files. Although most of the tools output GFF, their outputs differed slightly in syntax and none were fully compatible with JBrowse out of the box. We therefore wrote an assortment of regular-expression-based scripts in Python to handle the conversion. These scripts were wrapped together in a Bash script and applied to the output of the necessary tools. Additional scripts were written to divide the "Pathway" and "Non-pathway" feature files into Protein and Polypeptide features, giving a total of 10 different genome features to visualize.

The individual scripts can be found in the GitHub repository for the genome browser mentioned above.
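To give a flavor of this kind of regular-expression cleanup, here is a minimal Python sketch. It handles one assumed syntax difference (GTF-style quoted attributes in column 9) and splits features by the type in column 3. Both the specific difference and the function names are assumptions for illustration; the project's actual scripts are in the repository.

```python
import re

# Assumed example of a syntax difference: GTF-style attributes such as
#   gene_id "abc"; product "xyz";
# rewritten to GFF3-style pairs:
#   gene_id=abc;product=xyz
GTF_ATTRIBUTE = re.compile(r'\s*(\w+)\s+"([^"]*)"\s*;?')


def normalize_gff_line(line):
    """Rewrite GTF-style attributes in column 9 as GFF3 key=value pairs.

    Reserved characters inside values are not percent-encoded here.
    """
    if line.startswith("#") or not line.strip():
        return line.rstrip("\n")  # keep directives, comments, blank lines
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(columns)}")
    pairs = GTF_ATTRIBUTE.findall(columns[8])
    if pairs:
        columns[8] = ";".join(f"{key}={value}" for key, value in pairs)
    return "\t".join(columns)


def split_by_type(lines, wanted_type):
    """Return the feature lines whose type (column 3) equals wanted_type."""
    return [line for line in lines
            if not line.startswith("#")
            and line.split("\t")[2:3] == [wanted_type]]
```

Lines that already use key=value attributes pass through normalize_gff_line unchanged, so the function can be applied to mixed input.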

Configuring JBrowse

Adding The Sequence Data and Feature Data

JBrowse supports 2 types of data source: a relational database system and a flat-file system. The Perl scripts packaged with JBrowse allow users to convert their data into these formats. We decided to use the flat-file system as the data source for JBrowse. The Perl scripts that perform the conversion create JSON files, which the JavaScript running in the client's browser can load directly when constructing the front end.

The command to add a sequence file:

$ JBROWSE_LOCATION/bin/prepare-refseqs.pl --fasta SP0001.fasta --out "JBROWSE_LOCATION/bin/data"

The command to add a feature file:

$ JBROWSE_LOCATION/bin/flatfile-to-json.pl --gff abr.gff --trackLabel "Antibiotic Resistance.gff" --out "JBROWSE_LOCATION/bin/data"
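With 26 samples and several feature files each, these two commands are natural candidates for scripting. The Python sketch below builds (but does not run) the commands for one sample directory. The directory layout, the choice of the file stem as the track label and the function name are assumptions for illustration.

```python
from pathlib import Path


def jbrowse_commands(jbrowse_dir, sample_dir, out_dir):
    """Build the JBrowse formatting commands for one sample directory."""
    bin_dir = Path(jbrowse_dir) / "bin"
    commands = []
    for fasta in sorted(Path(sample_dir).glob("*.fasta")):
        commands.append([str(bin_dir / "prepare-refseqs.pl"),
                         "--fasta", str(fasta),
                         "--out", str(out_dir)])
    for gff in sorted(Path(sample_dir).glob("*.gff")):
        commands.append([str(bin_dir / "flatfile-to-json.pl"),
                         "--gff", str(gff),
                         # Assumption: use the file stem as the track label.
                         "--trackLabel", gff.stem,
                         "--out", str(out_dir)])
    return commands
```

Each returned list can then be executed with subprocess.run(command, check=True), so that a failing conversion stops the batch instead of passing silently.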

Configuring Visual Aspects and Additional functions in JBrowse

We modified trackList.json to update the colors and other visual features of JBrowse.
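Edits of this kind can be made by hand or scripted. The following Python sketch sets a color on one track in trackList.json. It assumes canvas-based feature tracks, where a "color" entry under "style" is honored; the function names are illustrative and the track label in the usage note is taken from the command shown earlier.

```python
import json


def set_track_color(track_list, label, color):
    """Set style.color on the track with the given label; True if found."""
    for track in track_list.get("tracks", []):
        if track.get("label") == label:
            track.setdefault("style", {})["color"] = color
            return True
    return False


def update_track_list_file(path, label, color):
    """Load trackList.json, recolor one track, and write the file back."""
    with open(path) as handle:
        track_list = json.load(handle)
    found = set_track_color(track_list, label, color)
    if found:
        with open(path, "w") as handle:
            json.dump(track_list, handle, indent=2)
    return found
```

For example, update_track_list_file("data/trackList.json", "Antibiotic Resistance.gff", "red") would recolor that track if it exists and leave the file untouched otherwise.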



Result.png Detail.png

Discussion Page

This page provides a discussion forum where users can submit their questions and topics. The forum itself is a free hosted forum at http://nabble.com, which took very little effort to set up and embed into our browser.

Discussion.png

Discussion

Overall, our main focus was on delivering 2 functionalities:

  1. Visualize the genome and its features using an existing genome browser
  2. Allow users to run their data against BLAST program

The visualization goal required several group members to work together to format the output from the previous groups and configure it in a JBrowse instance. This involved spending quite a lot of time analyzing the genome feature files at length and writing specialized scripts for varied formats, even within the same genome feature file type.

Configuring a JBrowse instance required not only learning how to use the tools provided by JBrowse itself but also digging into some of the undocumented JBrowse features, in order to provide functionality such as different colors for the features and links that open further details of individual features on external websites.

The second goal was achieved by deploying the relevant tools on the web server and executing them on the fly when users submit their input. The main challenge for this goal was properly connecting the front-end web pages with the back-end tools so that users experience fast and smooth execution.

Presentation Files


Reference

  1. Gmod.org. (2017). Overview - GMOD. [online] Available at: http://gmod.org/wiki/Overview [Accessed 25 Apr. 2017].
  2. Anon, (2017). [online] Available at: https://genome.ucsc.edu/goldenpath/help/hgTracksHelp.html [Accessed 24 Apr. 2017].
  3. Jbrowse.org. (2017). JBrowse | A fast, embeddable genome browser built with HTML5 and JavaScript. [online] Available at: http://jbrowse.org/ [Accessed 25 Apr. 2017].
  4. Buels R, Yao E, Diesh CM, et al. JBrowse: a dynamic web platform for genome visualization and analysis. Genome Biology. 2016;17:66. doi:10.1186/s13059-016-0924-1.
  5. Arakawa K, et al. Genome Projector: zoomable genome map with multiple views. BMC Bioinformatics. 2009;10:31. doi:10.1186/1471-2105-10-31.
  6. Helt G, Lee E, Buels R, Childers C, Reese J, Muñoz-Torres M, Elsik C, Holmes I, Lewis S. WebApollo: A Web-based Sequence Annotation Editor for Distributed Community Annotation.
  7. Genplay.net. (2017). Documentation - GenPlay, Einstein Genome Analyzer. [online] Available at: http://genplay.net/wiki/index.php?title=Documentation [Accessed 12 Apr. 2017].
  8. Mareschal S, Dubois S, Lecroq T, Jardin F. Rgb: a scriptable genome browser for R. Bioinformatics. 2014;30(15):2204-2205.
  9. Skinner ME, Uzilov AV, Stein LD, Mungall CJ, Holmes IH. JBrowse: A next-generation genome browser. Genome Res. 2009.
  10. Shannon PT, Reiss DJ, Bonneau R, Baliga NS. The Gaggle: An open-source software system for integrating bioinformatics software and data sources.