Integrative methods for reference-independent genome assembly and error detection /

Bibliographic Details
Author / Creator:Bun, Christopher Chean, author.
Imprint:2016.
Ann Arbor : ProQuest Dissertations & Theses, 2016
Description:1 electronic resource (140 pages)
Language:English
Format: E-Resource Dissertations
Local Note:School code: 0330
URL for this record:http://pi.lib.uchicago.edu/1001/cat/bib/11674629
Hidden Bibliographic Details
Other authors / contributors:University of Chicago. degree granting institution.
ISBN:9781369438536
Notes:Advisors: Rick Stevens. Committee members: James Davis; Ian Foster; Robert Grossman; Fangfang Xia.
Dissertation Abstracts International, Volume: 78-06(E), Section: B.
Summary:High-throughput genetic sequencing technologies have driven the proliferation of new genomic data. From the advent of long-read Sanger sequencing to today's low-cost, short-read generation and the upcoming era of single-molecule techniques, methods for the complex genome assembly problem have evolved alongside these technologies and are being introduced at a rapid pace. These algorithms attempt to produce an accurate representation of a target genome from datasets filled with errors and ambiguities. Unfortunately, many of the resulting challenges must be addressed through an algorithm's ad hoc criteria and heuristics, so the algorithm can output assembly hypotheses that contain significant errors. Without an inexpensive or computational approach to assess the quality of a given assembly hypothesis, researchers must make do with draft-level genome projects for downstream analysis. Solving three fundamental challenges will alleviate this issue: (i) automating and incorporating algorithms from the dynamic landscape of genome assembly tools, (ii) developing optimal assembly algorithms best suited for various types, or mixtures, of sequencing data, and (iii) developing an approach to assess de novo genome assembly quality independent of a reference genome.
We provide several contributions towards this effort. We first introduce AssemblyRAST, a general compute orchestration framework and accompanying domain-specific language that facilitates rapid workflow design for genome assembly, analysis, and method discovery. Next, we demonstrate the improvement of genome assemblies through novel integrative algorithmic techniques. Finally, we devise a method for reference-independent assembly evaluation and error identification through supervised learning, along with several applications to further improve existing techniques.

MARC

LEADER 00000ntm a22000003i 4500
001 11674629
005 20170317131747.5
006 m o d
007 cr un|---|||||
008 170317s2016 miu|||||om |||||||eng d
003 ICU
020 |a 9781369438536 
035 |a (MiAaPQD)AAI10239438 
040 |a MiAaPQD  |b eng  |c MiAaPQD  |e rda 
100 1 |a Bun, Christopher Chean,  |e author. 
245 1 0 |a Integrative methods for reference-independent genome assembly and error detection /  |c Bun, Christopher Chean. 
260 |c 2016. 
264 1 |a Ann Arbor :   |b ProQuest Dissertations & Theses,   |c 2016 
300 |a 1 electronic resource (140 pages) 
336 |a text  |b txt  |2 rdacontent 
337 |a computer  |b c  |2 rdamedia 
338 |a online resource  |b cr  |2 rdacarrier 
500 |a Advisors: Rick Stevens. Committee members: James Davis; Ian Foster; Robert Grossman; Fangfang Xia. 
502 |b Ph.D.  |c University of Chicago; Physical Sciences Division; Department of Computer Science  |d 2016. 
510 4 |a Dissertation Abstracts International,   |c Volume: 78-06(E), Section: B. 
520 |a High-throughput genetic sequencing technologies have driven the proliferation of new genomic data. From the advent of long-read Sanger sequencing to today's low-cost, short-read generation and the upcoming era of single-molecule techniques, methods for the complex genome assembly problem have evolved alongside these technologies and are being introduced at a rapid pace. These algorithms attempt to produce an accurate representation of a target genome from datasets filled with errors and ambiguities. Unfortunately, many of the resulting challenges must be addressed through an algorithm's ad hoc criteria and heuristics, so the algorithm can output assembly hypotheses that contain significant errors. Without an inexpensive or computational approach to assess the quality of a given assembly hypothesis, researchers must make do with draft-level genome projects for downstream analysis. Solving three fundamental challenges will alleviate this issue: (i) automating and incorporating algorithms from the dynamic landscape of genome assembly tools, (ii) developing optimal assembly algorithms best suited for various types, or mixtures, of sequencing data, and (iii) developing an approach to assess de novo genome assembly quality independent of a reference genome. 
520 |a We provide several contributions towards this effort. We first introduce AssemblyRAST, a general compute orchestration framework and accompanying domain-specific language that facilitates rapid workflow design for genome assembly, analysis, and method discovery. Next, we demonstrate the improvement of genome assemblies through novel integrative algorithmic techniques. Finally, we devise a method for reference-independent assembly evaluation and error identification through supervised learning, along with several applications to further improve existing techniques. 
546 |a English 
590 |a School code: 0330 
690 |a Computer science. 
690 |a Bioinformatics. 
710 2 |a University of Chicago.  |e degree granting institution. 
720 1 |a Rick Stevens  |e degree supervisor. 
856 4 0 |u http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:10239438  |y ProQuest 
035 |a AAI10239438 
929 |a eresource 
999 f f |i 278c8849-a63f-5a13-9748-20cfb3dc8760  |s e3c22603-2ab4-5448-9ae9-a5a6d0e0f2bf 
928 |t Library of Congress classification  |l Online  |c UC-FullText  |u http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:10239438  |z ProQuest  |g ebooks  |i 11097584