Page 57 - Fister jr., Iztok, and Andrej Brodnik (eds.). StuCoSReC. Proceedings of the 2015 2nd Student Computer Science Research Conference. Koper: University of Primorska Press, 2015
Developing a web application for DNA data analysis

Aleksandar Tošić

University of Primorska, Faculty of Mathematics, Natural Sciences and Information Technologies (UP FAMNIT)

Slovenia, Koper

aleksandar.tosic@upr.si

ABSTRACT

The article outlines the phases of developing a custom web application for cultivar DNA analysis. The nature of the data forces the database to expand in dimensionality, which affects performance. We review the current statistical methods for analyzing genotyping data in population genetics, and discuss a clustering approach for identifying individuals using multidimensional convex hulls to represent clusters.

Keywords

Database, Cultivar, Population genetics, Clustering, Convex hull

1. INTRODUCTION

Genetics holds high hopes for new scientific discoveries. A promising approach is to study how information is inherited within and between species. Molecular markers are commonly used for managing genetic resources of organisms and their relatives. Genotyping using molecular markers is particularly appealing because it significantly accelerates the identification process by fingerprinting each genotype at any stage of development of a plant. Among the many DNA marker systems, microsatellites combine several properties of an ideal marker, such as polymorphic nature and information content, co-dominance, abundance in the genome, availability, high reproducibility, and easy exchange of data between laboratories [7]. With the popularity of microsatellites, several studies have addressed the problem of consistency of genotyping data across different laboratories. The inconsistencies appear mostly due to the different equipment and chemicals used to amplify the DNA. One of the main goals in population genetics is to identify a species using genotyping data, but due to the aforementioned inconsistencies this is not a straightforward task. The existing statistical indicators offer some insight into the data, but in most cases they still require researchers to determine the identity. In this paper we present a database application as a tool that helps researchers view and analyze data. Besides implementing the usual statistical indicators, we encourage researchers to identify their individuals and compare them with the existing data. This way we gather expert knowledge that will be used to reduce the error rates of automatic individual identification using a clustering approach.

The department of Mediterranean Agriculture at the Faculty of Mathematics and Information Technology provided us with genotyping data of Slovenian olive trees. We combined our data with the published datasets from [1], [3], and a Spanish SSR database [11] to create a bigger dataset.

We implemented a database application that computes the most commonly used statistical identifiers for analyzing microsatellite markers. We implemented an easy upload tool, which allows researchers to upload their datasets, retrieve the statistical identifiers, and compare them with other datasets. The application also charts all the results for easier analysis. Ideally, the database containing genotyping data of all sequenced plants would be publicly available. Researchers could upload their genotyping data and the application would identify the cultivars. We tackle the problems of cultivar identification in large databases and discuss possible solutions.

2. BACKGROUND

This section briefly explains the nature of the data obtained from genotyping.

For a basic understanding, we say that a gene is a subset of the DNA sequence that encodes the information needed to form a protein. Genes are inherited from parents and provide the information needed for the organism to evolve. This article focuses on genotyping, which is the process of determining differences in the genetic make-up of an individual by examining its DNA sequence and comparing it to those of other individuals. The goal is to structure the data for each individual (in our case a cultivar) in order to compare groups of individuals and find relationships between them. Every cultivar is represented by a number of loci. Loci are specific locations of a gene, DNA, or positions on a chromosome. These positions are found to hold important information about a species or an individual. When sequencing an individual, researchers usually check which markers are most widely used in the research community; hence, some individuals have missing data at some loci. Each locus contains two (sometimes even more) alleles, stored as integers representing their distances. The distances are crucial for identifying changes in a genetic fingerprint.

3. APPLICATION

The application was developed using a conglomerate of frameworks and open source libraries such as Google's AngularJS, CodeIgniter, and jQuery. We combined modern DOM manipulation tools with a small-footprint yet powerful server-side framework to form the architectural model. The architectural pattern is somewhat different from the

StuCoSReC Proceedings of the 2015 2nd Student Computer Science Research Conference
Ljubljana, Slovenia, 6 October