{"id":228,"date":"2024-05-06T19:09:18","date_gmt":"2024-05-06T19:09:18","guid":{"rendered":"https:\/\/sites.wp.odu.edu\/awolter\/?post_type=portfolio&#038;p=228"},"modified":"2024-05-06T19:09:18","modified_gmt":"2024-05-06T19:09:18","slug":"de-novo-transcriptome-reconstruction-with-rna-seq-using-galaxy","status":"publish","type":"portfolio","link":"https:\/\/sites.wp.odu.edu\/awolter\/portfolio\/de-novo-transcriptome-reconstruction-with-rna-seq-using-galaxy\/","title":{"rendered":"De novo transcriptome reconstruction with RNA-Seq using Galaxy\u00a0"},"content":{"rendered":" <h4 class=\"wp-block-heading\">What is De novo transcriptome reconstruction?<\/h4>    <p>De novo transcriptome reconstruction refers to assembling transcripts directly from sequencing reads rather than taking them from an existing annotation. For this example, we will be using RNA-seq data and the Galaxy platform.<\/p>    <h4 class=\"wp-block-heading\">Performing De novo transcriptome reconstruction on RNA Seq in Galaxy<\/h4>    <p>The process below can take a while, about 5 hours or more, due to Galaxy job completion times.<\/p>    <p>We will first check the quality of our reads, trim them for cleaner data, recheck quality, map the reads, assemble and annotate the transcripts, count features, and test for differential expression.<\/p>    <p>To use the files, paste each link below into your web browser to download it, then upload it to Galaxy.<\/p>    <p>We will be using the following files for our project:<\/p>    <p><a href=\"https:\/\/zenodo.org\/record\/583140\/files\/G1E_rep1_forward_read_%28SRR549355_1%29\">https:\/\/zenodo.org\/record\/583140\/files\/G1E_rep1_forward_read_%28SRR549355_1%29<\/a><\/p>    <p><a href=\"https:\/\/zenodo.org\/record\/583140\/files\/G1E_rep1_reverse_read_%28SRR549355_2%29\">https:\/\/zenodo.org\/record\/583140\/files\/G1E_rep1_reverse_read_%28SRR549355_2%29<\/a><\/p>    <p><a href=\"https:\/\/zenodo.org\/record\/583140\/files\/G1E_rep2_forward_read_%28SRR549356_1%29\">https:\/\/zenodo.org\/record\/583140\/files\/G1E_rep2_forward_read_%28SRR549356_1%29<\/a><\/p>    <p><a 
href=\"https:\/\/zenodo.org\/record\/583140\/files\/G1E_rep2_reverse_read_%28SRR549356_2%29\">https:\/\/zenodo.org\/record\/583140\/files\/G1E_rep2_reverse_read_%28SRR549356_2%29<\/a><\/p>    <p><a href=\"https:\/\/zenodo.org\/record\/583140\/files\/Megakaryocyte_rep1_forward_read_%28SRR549357_1%29\">https:\/\/zenodo.org\/record\/583140\/files\/Megakaryocyte_rep1_forward_read_%28SRR549357_1%29<\/a><\/p>    <p><a href=\"https:\/\/zenodo.org\/record\/583140\/files\/Megakaryocyte_rep1_reverse_read_%28SRR549357_2%29\">https:\/\/zenodo.org\/record\/583140\/files\/Megakaryocyte_rep1_reverse_read_%28SRR549357_2%29<\/a><\/p>    <p><a href=\"https:\/\/zenodo.org\/record\/583140\/files\/Megakaryocyte_rep2_forward_read_%28SRR549358_1%29\">https:\/\/zenodo.org\/record\/583140\/files\/Megakaryocyte_rep2_forward_read_%28SRR549358_1%29<\/a><\/p>    <p><a href=\"https:\/\/zenodo.org\/record\/583140\/files\/Megakaryocyte_rep2_reverse_read_%28SRR549358_2%29\">https:\/\/zenodo.org\/record\/583140\/files\/Megakaryocyte_rep2_reverse_read_%28SRR549358_2%29<\/a><\/p>    <p>Please note that in addition to the files listed above we will also require a reference annotation for mouse (a RefSeq GTF file), which can be found at the following address:<\/p>    <p><a href=\"https:\/\/zenodo.org\/record\/583140\/files\/RefSeq_reference_GTF_%28DSv2%29\">https:\/\/zenodo.org\/record\/583140\/files\/RefSeq_reference_GTF_%28DSv2%29<\/a><\/p>    <h4 class=\"wp-block-heading\">Before Trimming<\/h4>    <p>After uploading, we will use the FastQC tool in Galaxy to assess the sequence quality, pictured below.<\/p>    <p>Line 1: G1E forward 1, G1E reverse 1, G1E forward 2, G1E reverse 2<\/p>    <p>Line 2: Megakaryocyte forward 1, Megakaryocyte reverse 1, Megakaryocyte forward 2, Megakaryocyte reverse 2<\/p>    <figure class=\"wp-block-image size-full is-resized\"><a href=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim1.png\"><img loading=\"lazy\" decoding=\"async\" width=\"370\" 
height=\"281\" src=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim1.png\" alt=\"\" class=\"wp-image-229\" style=\"width:398px;height:auto\" srcset=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim1.png 370w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim1-300x228.png 300w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim1-79x60.png 79w\" sizes=\"(max-width: 370px) 100vw, 370px\" \/><\/a><\/figure>    <figure class=\"wp-block-image size-full\"><a href=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim2.png\"><img loading=\"lazy\" decoding=\"async\" width=\"407\" height=\"314\" src=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim2.png\" alt=\"\" class=\"wp-image-230\" srcset=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim2.png 407w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim2-300x231.png 300w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim2-78x60.png 78w\" sizes=\"(max-width: 407px) 100vw, 407px\" \/><\/a><\/figure>    <figure class=\"wp-block-image size-full\"><a href=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim3.png\"><img loading=\"lazy\" decoding=\"async\" width=\"408\" height=\"312\" src=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim3.png\" alt=\"\" class=\"wp-image-231\" srcset=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim3.png 408w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim3-300x229.png 300w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim3-78x60.png 78w\" sizes=\"(max-width: 408px) 100vw, 408px\" 
\/><\/a><\/figure>    <figure class=\"wp-block-image size-full\"><a href=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim4.png\"><img loading=\"lazy\" decoding=\"async\" width=\"412\" height=\"311\" src=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim4.png\" alt=\"\" class=\"wp-image-232\" srcset=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim4.png 412w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim4-300x226.png 300w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim4-79x60.png 79w\" sizes=\"(max-width: 412px) 100vw, 412px\" \/><\/a><\/figure>    <figure class=\"wp-block-image size-full\"><a href=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim5.png\"><img loading=\"lazy\" decoding=\"async\" width=\"412\" height=\"315\" src=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim5.png\" alt=\"\" class=\"wp-image-233\" srcset=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim5.png 412w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim5-300x229.png 300w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim5-78x60.png 78w\" sizes=\"(max-width: 412px) 100vw, 412px\" \/><\/a><\/figure>    <figure class=\"wp-block-image size-full\"><a href=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim6.png\"><img loading=\"lazy\" decoding=\"async\" width=\"413\" height=\"317\" src=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim6.png\" alt=\"\" class=\"wp-image-234\" srcset=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim6.png 413w, 
https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim6-300x230.png 300w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim6-78x60.png 78w\" sizes=\"(max-width: 413px) 100vw, 413px\" \/><\/a><\/figure>    <figure class=\"wp-block-image size-full\"><a href=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim7.png\"><img loading=\"lazy\" decoding=\"async\" width=\"411\" height=\"314\" src=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim7.png\" alt=\"\" class=\"wp-image-235\" srcset=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim7.png 411w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim7-300x229.png 300w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim7-79x60.png 79w\" sizes=\"(max-width: 411px) 100vw, 411px\" \/><\/a><\/figure>    <figure class=\"wp-block-image size-full\"><a href=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim8.png\"><img loading=\"lazy\" decoding=\"async\" width=\"418\" height=\"318\" src=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim8.png\" alt=\"\" class=\"wp-image-236\" srcset=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim8.png 418w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim8-300x228.png 300w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim8-79x60.png 79w\" sizes=\"(max-width: 418px) 100vw, 418px\" \/><\/a><\/figure>    <h4 class=\"wp-block-heading\">After Trimming&nbsp;<\/h4>    <p>As we can see, most of our sequences currently have less than optimal quality, therefore we will groom the data to make it cleaner. 
We will run the Trimmomatic tool in paired-end mode on each of our four read pairs and follow up with an additional FastQC run on the output data. As we can see below, our sequence quality has improved dramatically, and we are now ready to continue on to mapping and reconstruction.<\/p>    <p>Line 1: G1E forward 1, G1E reverse 1, G1E forward 2, G1E reverse 2<\/p>    <p>Line 2: Megakaryocyte forward 1, Megakaryocyte reverse 1, Megakaryocyte forward 2, Megakaryocyte reverse 2<\/p>    <figure class=\"wp-block-image size-full\"><a href=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim10.png\"><img loading=\"lazy\" decoding=\"async\" width=\"413\" height=\"310\" src=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim10.png\" alt=\"\" class=\"wp-image-237\" srcset=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim10.png 413w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim10-300x225.png 300w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim10-80x60.png 80w\" sizes=\"(max-width: 413px) 100vw, 413px\" \/><\/a><\/figure>    <figure class=\"wp-block-image size-full\"><a href=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim11.png\"><img loading=\"lazy\" decoding=\"async\" width=\"419\" height=\"321\" src=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim11.png\" alt=\"\" class=\"wp-image-238\" srcset=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim11.png 419w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim11-300x230.png 300w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim11-78x60.png 78w\" sizes=\"(max-width: 419px) 100vw, 419px\" \/><\/a><\/figure>    <figure class=\"wp-block-image 
size-full\"><a href=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim12.png\"><img loading=\"lazy\" decoding=\"async\" width=\"417\" height=\"311\" src=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim12.png\" alt=\"\" class=\"wp-image-239\" srcset=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim12.png 417w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim12-300x224.png 300w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim12-80x60.png 80w\" sizes=\"(max-width: 417px) 100vw, 417px\" \/><\/a><\/figure>    <figure class=\"wp-block-image size-full\"><a href=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim13.png\"><img loading=\"lazy\" decoding=\"async\" width=\"418\" height=\"319\" src=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim13.png\" alt=\"\" class=\"wp-image-240\" srcset=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim13.png 418w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim13-300x229.png 300w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim13-79x60.png 79w\" sizes=\"(max-width: 418px) 100vw, 418px\" \/><\/a><\/figure>    <figure class=\"wp-block-image size-full\"><a href=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim14-1.png\"><img loading=\"lazy\" decoding=\"async\" width=\"418\" height=\"313\" src=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim14-1.png\" alt=\"\" class=\"wp-image-242\" srcset=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim14-1.png 418w, 
https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim14-1-300x225.png 300w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim14-1-80x60.png 80w\" sizes=\"(max-width: 418px) 100vw, 418px\" \/><\/a><\/figure>    <figure class=\"wp-block-image size-full\"><a href=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim15.png\"><img loading=\"lazy\" decoding=\"async\" width=\"426\" height=\"319\" src=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim15.png\" alt=\"\" class=\"wp-image-243\" srcset=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim15.png 426w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim15-300x225.png 300w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim15-80x60.png 80w\" sizes=\"(max-width: 426px) 100vw, 426px\" \/><\/a><\/figure>    <figure class=\"wp-block-image size-full\"><a href=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim16.png\"><img loading=\"lazy\" decoding=\"async\" width=\"396\" height=\"303\" src=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim16.png\" alt=\"\" class=\"wp-image-244\" srcset=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim16.png 396w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim16-300x230.png 300w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim16-78x60.png 78w\" sizes=\"(max-width: 396px) 100vw, 396px\" \/><\/a><\/figure>    <figure class=\"wp-block-image size-full\"><a href=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim17.png\"><img loading=\"lazy\" decoding=\"async\" width=\"391\" height=\"298\" 
src=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim17.png\" alt=\"\" class=\"wp-image-245\" srcset=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim17.png 391w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim17-300x229.png 300w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/trim17-79x60.png 79w\" sizes=\"(max-width: 391px) 100vw, 391px\" \/><\/a><\/figure>    <h4 class=\"wp-block-heading\">Mapping and Reconstruction<\/h4>    <p>To map our paired-end reads we will use the HISAT2 tool. We will use our paired read outputs from Trimmomatic, making sure we pair the correct files (the forward and reverse reads from the same sample). This will produce 4 BAM files, one for each read pair, 2 for each cell line.<\/p>    <p><strong>HISAT2<\/strong>: an alignment tool that takes FASTA or FASTQ reads as input, either single-end or paired-end<\/p>    <p>For the first part of the transcript assembly we will run the StringTie tool on the newly generated BAM files from HISAT2. This will provide the assembled transcripts we will use to complete the reconstruction. For our reference annotation we will use the RefSeq GTF file (provided above).<\/p>    <p><strong>StringTie<\/strong>: takes a file of mapped short reads and uses it to assemble and quantify transcripts (not to be confused with StringTie merge)<\/p>    <p>Secondly, we will use GffCompare. This tool allows us to annotate the transcripts we just produced with StringTie. 
To do this we will select the output files from StringTie as our input, use the RefSeq GTF file as our reference annotation, and use the mouse (Mus musculus) mm10 genome as our sequence data.<\/p>    <p><strong>GffCompare<\/strong>: compares transcripts that have been previously assembled using another tool against a reference annotation<\/p>    <p>The GffCompare output was as follows:<\/p>    <pre class=\"wp-block-code\"><code># gffcompare v0.11.2 | Command line was:  #gffcompare -r ref_annotation -s ref_seq.fa -e 100 -d 100 -p TCONS StringTie_on_data_57__Assembled_transcripts StringTie_on_data_58__Assembled_transcripts StringTie_on_data_59__Assembled_transcripts StringTie_on_data_60__Assembled_transcripts  #    #= Summary for dataset: StringTie_on_data_57__Assembled_transcripts   #     Query mRNAs :     280 in     263 loci  (121 multi-exon transcripts)  #            (16 multi-transcript loci, ~1.1 transcripts per locus)  # Reference mRNAs :     274 in     183 loci  (250 multi-exon)  # Super-loci w\/ reference transcripts:       75  #-----------------| Sensitivity | Precision  |          Base level:    12.9     |    38.8    |          Exon level:    23.6     |    45.1    |        Intron level:    30.7     |    82.8    |  Intron chain level:     8.0     |    16.5    |    Transcript level:     7.3     |     7.1    |         Locus level:    10.9     |     7.6    |         Matching intron chains:      20         Matching transcripts:      20                Matching loci:      20              Missed exons:     861\/1332\t( 64.6%)             Novel exons:     212\/696\t( 30.5%)          Missed introns:     747\/1163\t( 64.2%)           Novel introns:      46\/431\t( 10.7%)             Missed loci:     108\/183\t( 59.0%)              Novel loci:     168\/263\t( 63.9%)    #= Summary for dataset: StringTie_on_data_58__Assembled_transcripts   #     Query mRNAs :     296 in     273 loci  (141 multi-exon transcripts)  #            (22 multi-transcript loci, ~1.1 transcripts per 
locus)  # Reference mRNAs :     274 in     183 loci  (250 multi-exon)  # Super-loci w\/ reference transcripts:       76  #-----------------| Sensitivity | Precision  |          Base level:    13.2     |    33.6    |          Exon level:    24.4     |    44.9    |        Intron level:    31.4     |    80.9    |  Intron chain level:     9.6     |    17.0    |    Transcript level:     8.8     |     8.1    |         Locus level:    13.1     |     8.8    |         Matching intron chains:      24         Matching transcripts:      24                Matching loci:      24              Missed exons:     845\/1332\t( 63.4%)             Novel exons:     224\/724\t( 30.9%)          Missed introns:     736\/1163\t( 63.3%)           Novel introns:      54\/451\t( 12.0%)             Missed loci:     107\/183\t( 58.5%)              Novel loci:     172\/273\t( 63.0%)    #= Summary for dataset: StringTie_on_data_59__Assembled_transcripts   #     Query mRNAs :     251 in     234 loci  (105 multi-exon transcripts)  #            (16 multi-transcript loci, ~1.1 transcripts per locus)  # Reference mRNAs :     274 in     183 loci  (250 multi-exon)  # Super-loci w\/ reference transcripts:       60  #-----------------| Sensitivity | Precision  |          Base level:     8.1     |    34.7    |          Exon level:    14.8     |    37.7    |        Intron level:    20.0     |    81.8    |  Intron chain level:     4.4     |    10.5    |    Transcript level:     4.0     |     4.4    |         Locus level:     6.0     |     4.7    |         Matching intron chains:      11         Matching transcripts:      11                Matching loci:      11              Missed exons:    1001\/1332\t( 75.2%)             Novel exons:     191\/523\t( 36.5%)          Missed introns:     888\/1163\t( 76.4%)           Novel introns:      35\/285\t( 12.3%)             Missed loci:     123\/183\t( 67.2%)              Novel loci:     154\/234\t( 65.8%)    #= Summary for dataset: 
StringTie_on_data_60__Assembled_transcripts   #     Query mRNAs :     118 in     116 loci  (50 multi-exon transcripts)  #            (2 multi-transcript loci, ~1.0 transcripts per locus)  # Reference mRNAs :     273 in     182 loci  (249 multi-exon)  # Super-loci w\/ reference transcripts:       32  #-----------------| Sensitivity | Precision  |          Base level:     3.4     |    40.4    |          Exon level:     5.9     |    33.6    |        Intron level:     9.0     |    89.7    |  Intron chain level:     2.0     |    10.0    |    Transcript level:     1.8     |     4.2    |         Locus level:     2.7     |     4.3    |         Matching intron chains:       5         Matching transcripts:       5                Matching loci:       5              Missed exons:    1167\/1321\t( 88.3%)             Novel exons:      77\/232\t( 33.2%)          Missed introns:    1023\/1153\t( 88.7%)           Novel introns:      11\/116\t(  9.5%)             Missed loci:     150\/182\t( 82.4%)              Novel loci:      71\/116\t( 61.2%)     Total union super-loci across all input datasets: 396   <\/code><\/pre>    <figure class=\"wp-block-table\"><table><thead><tr><td><strong>Status<\/strong><strong><\/strong><\/td><td><strong>HISAT2 on data 26 and data 25: aligned reads (BAM)<\/strong><\/td><\/tr><\/thead><tbody><tr><td>Status<\/td><td>HISAT2 on data 26 and data 25: aligned reads 
(BAM)<\/td><\/tr><tr><td>Assigned<\/td><td>536859<\/td><\/tr><tr><td>Unassigned_Unmapped<\/td><td>802<\/td><\/tr><tr><td>Unassigned_Read_Type<\/td><td>0<\/td><\/tr><tr><td>Unassigned_Singleton<\/td><td>0<\/td><\/tr><tr><td>Unassigned_MappingQuality<\/td><td>0<\/td><\/tr><tr><td>Unassigned_Chimera<\/td><td>4925<\/td><\/tr><tr><td>Unassigned_FragmentLength<\/td><td>0<\/td><\/tr><tr><td>Unassigned_Duplicate<\/td><td>0<\/td><\/tr><tr><td>Unassigned_MultiMapping<\/td><td>348421<\/td><\/tr><tr><td>Unassigned_Secondary<\/td><td>0<\/td><\/tr><tr><td>Unassigned_NonSplit<\/td><td>0<\/td><\/tr><tr><td>Unassigned_NoFeatures<\/td><td>244990<\/td><\/tr><tr><td>Unassigned_Overlapping_Length<\/td><td>0<\/td><\/tr><tr><td>Unassigned_Ambiguity<\/td><td>333505<\/td><\/tr><\/tbody><\/table><\/figure>    <p><\/p>    <figure class=\"wp-block-table\"><table><thead><tr><td><strong>Status<\/strong><\/td><td><strong>HISAT2 on data 30 and data 29: aligned reads (BAM)<\/strong><\/td><\/tr><\/thead><tbody><tr><td>Status<\/td><td>HISAT2 on data 30 and data 29: aligned reads (BAM)<\/td><\/tr><tr><td>Assigned<\/td><td>629853<\/td><\/tr><tr><td>Unassigned_Unmapped<\/td><td>512<\/td><\/tr><tr><td>Unassigned_Read_Type<\/td><td>0<\/td><\/tr><tr><td>Unassigned_Singleton<\/td><td>0<\/td><\/tr><tr><td>Unassigned_MappingQuality<\/td><td>0<\/td><\/tr><tr><td>Unassigned_Chimera<\/td><td>7694<\/td><\/tr><tr><td>Unassigned_FragmentLength<\/td><td>0<\/td><\/tr><tr><td>Unassigned_Duplicate<\/td><td>0<\/td><\/tr><tr><td>Unassigned_MultiMapping<\/td><td>516965<\/td><\/tr><tr><td>Unassigned_Secondary<\/td><td>0<\/td><\/tr><tr><td>Unassigned_NonSplit<\/td><td>0<\/td><\/tr><tr><td>Unassigned_NoFeatures<\/td><td>311733<\/td><\/tr><tr><td>Unassigned_Overlapping_Length<\/td><td>0<\/td><\/tr><tr><td>Unassigned_Ambiguity<\/td><td>496119<\/td><\/tr><\/tbody><\/table><\/figure>    <figure 
class=\"wp-block-table\"><table><thead><tr><td><strong>Status<\/strong><\/td><td><strong>HISAT2 on data 34 and data 33: aligned reads (BAM)<\/strong><\/td><\/tr><\/thead><tbody><tr><td>Status<\/td><td>HISAT2 on data 34 and data 33: aligned reads (BAM)<\/td><\/tr><tr><td>Assigned<\/td><td>276145<\/td><\/tr><tr><td>Unassigned_Unmapped<\/td><td>325<\/td><\/tr><tr><td>Unassigned_Read_Type<\/td><td>0<\/td><\/tr><tr><td>Unassigned_Singleton<\/td><td>0<\/td><\/tr><tr><td>Unassigned_MappingQuality<\/td><td>0<\/td><\/tr><tr><td>Unassigned_Chimera<\/td><td>7886<\/td><\/tr><tr><td>Unassigned_FragmentLength<\/td><td>0<\/td><\/tr><tr><td>Unassigned_Duplicate<\/td><td>0<\/td><\/tr><tr><td>Unassigned_MultiMapping<\/td><td>358737<\/td><\/tr><tr><td>Unassigned_Secondary<\/td><td>0<\/td><\/tr><tr><td>Unassigned_NonSplit<\/td><td>0<\/td><\/tr><tr><td>Unassigned_NoFeatures<\/td><td>178164<\/td><\/tr><tr><td>Unassigned_Overlapping_Length<\/td><td>0<\/td><\/tr><tr><td>Unassigned_Ambiguity<\/td><td>358853<\/td><\/tr><\/tbody><\/table><\/figure>    <figure class=\"wp-block-table\"><table><thead><tr><td><strong>Status<\/strong><\/td><td><strong>HISAT2 on data 38 and data 37: aligned reads (BAM)<\/strong><\/td><\/tr><\/thead><tbody><tr><td>Status<\/td><td>HISAT2 on data 38 and data 37: aligned reads 
(BAM)<\/td><\/tr><tr><td>Assigned<\/td><td>20886<\/td><\/tr><tr><td>Unassigned_Unmapped<\/td><td>44<\/td><\/tr><tr><td>Unassigned_Read_Type<\/td><td>0<\/td><\/tr><tr><td>Unassigned_Singleton<\/td><td>0<\/td><\/tr><tr><td>Unassigned_MappingQuality<\/td><td>0<\/td><\/tr><tr><td>Unassigned_Chimera<\/td><td>202<\/td><\/tr><tr><td>Unassigned_FragmentLength<\/td><td>0<\/td><\/tr><tr><td>Unassigned_Duplicate<\/td><td>0<\/td><\/tr><tr><td>Unassigned_MultiMapping<\/td><td>33210<\/td><\/tr><tr><td>Unassigned_Secondary<\/td><td>0<\/td><\/tr><tr><td>Unassigned_NonSplit<\/td><td>0<\/td><\/tr><tr><td>Unassigned_NoFeatures<\/td><td>8430<\/td><\/tr><tr><td>Unassigned_Overlapping_Length<\/td><td>0<\/td><\/tr><tr><td>Unassigned_Ambiguity<\/td><td>13004<\/td><\/tr><\/tbody><\/table><\/figure>    <pre class=\"wp-block-code\"><code>   (65 multi-transcript, ~1.8 transcripts per locus)  695 out of 695 consensus transcripts written in gffcmp.combined.gtf (0 discarded as redundant)  <\/code><\/pre>    <p><\/p>    <h4 class=\"wp-block-heading\"><a>FeatureCounts<\/a><\/h4>    <p>GffCompare also produced a combined transcripts file, which I used together with the HISAT2 alignments as input to the featureCounts tool in Galaxy.<\/p>    <p><strong>featureCounts<\/strong>: counts the reads assigned to each feature (here, each transcript), which gives a measure of its expression level.<\/p>    <p>A summary of the 4 featureCounts files is shown in the tables above, one per HISAT2 alignment.<\/p>    <h4 class=\"wp-block-heading\">DESeq2<\/h4>    <p>DESeq2: uses count tables as input to determine differentially expressed features. We are interested in the up- and downregulation of the transcripts, their p-values, and the genes they are associated with. This tool produces two outputs. The first is a tabular file listing p-values and log2 fold changes. 
The second contains graphical representations of the data, shown below.<\/p>    <p>A total of 197 transcripts have a significant p-value.<\/p>    <p>To find downregulated transcripts, we look for negative values in the log2 fold change column.<\/p>    <p>Downregulated: 215 G1E transcripts<\/p>    <p>To find upregulated transcripts, we look for positive values in the log2 fold change column.<\/p>    <p>Upregulated: 189 G1E transcripts<\/p>    <h4 class=\"wp-block-heading\">Results of RNA-seq transcriptome reconstruction:<\/h4>    <figure class=\"wp-block-image size-full\"><a href=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/transcriptome-reconstruction.png\"><img loading=\"lazy\" decoding=\"async\" width=\"420\" height=\"418\" src=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/transcriptome-reconstruction.png\" alt=\"\" class=\"wp-image-246\" srcset=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/transcriptome-reconstruction.png 420w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/transcriptome-reconstruction-300x300.png 300w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/transcriptome-reconstruction-150x150.png 150w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/transcriptome-reconstruction-60x60.png 60w\" sizes=\"(max-width: 420px) 100vw, 420px\" \/><\/a><\/figure>    <figure class=\"wp-block-image size-full\"><a href=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/transcriptome-reconstruction-2-1.png\"><img loading=\"lazy\" decoding=\"async\" width=\"341\" height=\"219\" src=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/transcriptome-reconstruction-2-1.png\" alt=\"\" class=\"wp-image-248\" 
srcset=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/transcriptome-reconstruction-2-1.png 341w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/transcriptome-reconstruction-2-1-300x193.png 300w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/transcriptome-reconstruction-2-1-93x60.png 93w\" sizes=\"(max-width: 341px) 100vw, 341px\" \/><\/a><\/figure>    <p>G1E datasets 1 and 2 complement each other better than Megakaryocyte (Mega) datasets 1 and 2 do. The comparison between the two cell lines shows high variation in counts between Mega and G1E. We can see that the greatest amount of alignment for each dataset is with its companion dataset, followed by the counts of G1E dataset 1 and Mega dataset 2.<\/p>    <figure class=\"wp-block-image size-full\"><a href=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/dispersion-estimates.png\"><img loading=\"lazy\" decoding=\"async\" width=\"327\" height=\"325\" src=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/dispersion-estimates.png\" alt=\"\" class=\"wp-image-249\" srcset=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/dispersion-estimates.png 327w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/dispersion-estimates-300x298.png 300w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/dispersion-estimates-150x150.png 150w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/dispersion-estimates-60x60.png 60w\" sizes=\"(max-width: 327px) 100vw, 327px\" \/><\/a><\/figure>    <figure class=\"wp-block-image size-full\"><a href=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/histo-p-value.png\"><img loading=\"lazy\" decoding=\"async\" width=\"298\" height=\"298\" 
src=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/histo-p-value.png\" alt=\"\" class=\"wp-image-250\" srcset=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/histo-p-value.png 298w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/histo-p-value-150x150.png 150w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/histo-p-value-60x60.png 60w\" sizes=\"(max-width: 298px) 100vw, 298px\" \/><\/a><\/figure>    <figure class=\"wp-block-image size-full\"><a href=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/ma-plot.png\"><img loading=\"lazy\" decoding=\"async\" width=\"294\" height=\"295\" src=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/ma-plot.png\" alt=\"\" class=\"wp-image-251\" srcset=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/ma-plot.png 294w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/ma-plot-150x150.png 150w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/ma-plot-60x60.png 60w\" sizes=\"(max-width: 294px) 100vw, 294px\" \/><\/a><\/figure>    <p><\/p>    <p>The dispersion estimates above show that many of the transcript dispersions were at the minimum; however, we can also see that our fitted and final dispersion estimates follow the trend of a good portion of the data.<\/p>    <p>As we can see in the plots above, the transcripts of G1E showed varying degrees of change in expression; however, a fair number showed no change.<\/p>    <h4 class=\"wp-block-heading\">Results of transcript expression (using IGV)<\/h4>    <p><strong>The GTF dataset from our GffCompare output showed the following genes for the G1E and Megakaryocyte cell lines at the following chromosomal location:<\/strong><\/p>    
<p>chr11:96193539-96206376 on the mouse mm10 gene model.<\/p>    <p><strong>Hoxb13 (3 variations)<\/strong><\/p>    <p><strong>TCONS_00000076<\/strong><\/p>    <p><strong>TCONS_00000087<\/strong><\/p>    <p><strong>TCONS_00000088<\/strong><\/p>    <p><strong>Gm1538 (6 variations, 1 partially located in the region (TSS and 1 exon))<\/strong><\/p>    <figure class=\"wp-block-image size-full\"><a href=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/variations1.png\"><img loading=\"lazy\" decoding=\"async\" width=\"425\" height=\"261\" src=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/variations1.png\" alt=\"\" class=\"wp-image-252\" srcset=\"https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/variations1.png 425w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/variations1-300x184.png 300w, https:\/\/sites.wp.odu.edu\/awolter\/wp-content\/uploads\/sites\/35065\/2024\/05\/variations1-98x60.png 98w\" sizes=\"(max-width: 425px) 100vw, 425px\" \/><\/a><\/figure> ","protected":false},"excerpt":{"rendered":"<p>What is De novo transcriptome reconstruction? De novo transcriptome reconstruction refers to assembling a transcript that does not have a specified genome sequence. For this example, we will be using RNA-seq data and the Galaxy platform. 
Performing De novo transcriptome reconstruction on RNA Seq in Galaxy The process below can take a while, about [&hellip;]<\/p>\n","protected":false},"author":28160,"featured_media":0,"menu_order":0,"comment_status":"open","ping_status":"closed","template":"","meta":{"wds_primary_portfolio-type":0},"portfolio-type":[],"_links":{"self":[{"href":"https:\/\/sites.wp.odu.edu\/awolter\/wp-json\/wp\/v2\/portfolio\/228"}],"collection":[{"href":"https:\/\/sites.wp.odu.edu\/awolter\/wp-json\/wp\/v2\/portfolio"}],"about":[{"href":"https:\/\/sites.wp.odu.edu\/awolter\/wp-json\/wp\/v2\/types\/portfolio"}],"author":[{"embeddable":true,"href":"https:\/\/sites.wp.odu.edu\/awolter\/wp-json\/wp\/v2\/users\/28160"}],"replies":[{"embeddable":true,"href":"https:\/\/sites.wp.odu.edu\/awolter\/wp-json\/wp\/v2\/comments?post=228"}],"wp:attachment":[{"href":"https:\/\/sites.wp.odu.edu\/awolter\/wp-json\/wp\/v2\/media?parent=228"}],"wp:term":[{"taxonomy":"portfolio-type","embeddable":true,"href":"https:\/\/sites.wp.odu.edu\/awolter\/wp-json\/wp\/v2\/portfolio-type?post=228"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}