Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/14722
Full metadata record
DC Field | Value | Language
dc.date.accessioned | 2016-12-16T09:19:14Z |
dc.date.available | 2016-12-16T09:19:14Z |
dc.date.issued | 2016 |
dc.identifier.uri | https://www.um.edu.mt/library/oar//handle/123456789/14722 |
dc.description | B.SC.IT(HONS) | en_GB
dc.description.abstract | Over the years, the effort once put into maintaining a system has shifted towards testing it. In fact, many IT organisations today spend more than 40% of a project's budget on software testing (Capgemini, 2015; Tata, 2015). One way of easing the testing process is to automatically generate synthetic data that resembles the data a system receives in its real environment. Automating this process is highly valuable, since real datasets can be hard to obtain due to data sensitivity and confidentiality issues. Generating synthetic data is not an easy task, however, because each system has different requirements. For instance, some systems have a well-defined data structure and the size of the data they will handle can be estimated, while others, such as real-time processing systems, have no control over the amount of data they receive. ADaGe provides a user interface that enables the end-user to define and tweak the dataset definition so that it satisfies the system's requirements. The end-user can either generate a dataset of fixed size or stream the generated data. ADaGe can generate structured data, including basic data types such as random strings, integers and real numbers, as well as first and last names drawn from existing datasets. Additionally, it can generate unstructured data by reading from a book collection. To produce more statistically realistic data, ADaGe can also generate values that follow the normal distribution. Moreover, ADaGe uses current technologies to scale according to the user's requirements, distributing the work amongst several nodes on a cluster so that it can generate large datasets efficiently. Results show that the application can efficiently generate small to large datasets with varying data types. When compared with other open-source tools, ADaGe proved to be scalable and time-efficient.
ADaGe can also generate a stream of data that continues until the user stops it. With regard to values generated according to the normal distribution, statistical tests show that the generated data fit the distribution. Lastly, the application was successfully used to generate data for real-life scenarios. | en_GB
dc.language.iso | en | en_GB
dc.rights | info:eu-repo/semantics/restrictedAccess | en_GB
dc.subject | Computer software -- Testing | en_GB
dc.subject | Big data | en_GB
dc.subject | Database management | en_GB
dc.title | Automated dataset generator | en_GB
dc.type | bachelorThesis | en_GB
dc.rights.holder | The copyright of this work belongs to the author(s)/publisher. The rights of this work are as defined by the appropriate Copyright Legislation or as modified by any successive legislation. Users may access this work and can make use of the information contained in accordance with the Copyright Legislation, provided that the author is properly acknowledged. Further distribution or reproduction in any format is prohibited without the prior permission of the copyright holder. | en_GB
dc.publisher.institution | University of Malta | en_GB
dc.publisher.department | Faculty of Information & Communication Technology. Department of Computer Information Systems | en_GB
dc.description.reviewed | N/A | en_GB
dc.contributor.creator | Camilleri, Sheryl |
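The record describes what ADaGe generates (typed fields, names drawn from existing datasets, normally distributed values, fixed-size or streamed output) but not how. The following is a minimal Python sketch, assuming a simple schema-driven design, of how those modes could fit together; it is not ADaGe's implementation. All names (`EXAMPLE_SCHEMA`, `generate_fixed`, `generate_stream`, `ks_statistic_normal`), the sample name lists and the N(170, 10²) parameters are hypothetical. The Kolmogorov-Smirnov check is an assumed choice, since the record does not name the statistical tests used, and neither the cluster distribution nor the book-collection text generation is modelled.

```python
import math
import random
import string
import threading
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical stand-ins for "first and last names from existing datasets".
FIRST_NAMES = ["Maria", "Joseph", "Anna", "Paul"]
LAST_NAMES = ["Borg", "Camilleri", "Vella", "Farrugia"]

# A dataset definition maps each field name to a function that draws one value.
Schema = Dict[str, Callable[[random.Random], Any]]


def random_string(rng: random.Random, length: int = 8) -> str:
    """Random ASCII letters of a fixed length."""
    return "".join(rng.choices(string.ascii_letters, k=length))


# Example dataset definition (all fields and parameters are illustrative assumptions).
EXAMPLE_SCHEMA: Schema = {
    "id": lambda rng: rng.randint(1, 1_000_000),             # integer
    "code": random_string,                                   # random string
    "price": lambda rng: round(rng.uniform(0.0, 500.0), 2),  # real number
    "first_name": lambda rng: rng.choice(FIRST_NAMES),
    "last_name": lambda rng: rng.choice(LAST_NAMES),
    "height_cm": lambda rng: rng.gauss(170.0, 10.0),         # normal: mean 170, sd 10
}


def generate_record(schema: Schema, rng: random.Random) -> Dict[str, Any]:
    """Draw one value for every field in the dataset definition."""
    return {name: draw(rng) for name, draw in schema.items()}


def generate_fixed(schema: Schema, n: int, seed: Optional[int] = None) -> List[Dict[str, Any]]:
    """Fixed-size mode: return exactly n records."""
    rng = random.Random(seed)
    return [generate_record(schema, rng) for _ in range(n)]


def generate_stream(schema: Schema, stop: threading.Event,
                    seed: Optional[int] = None) -> Iterator[Dict[str, Any]]:
    """Streaming mode: keep yielding records until `stop` is set (e.g. by the user)."""
    rng = random.Random(seed)
    while not stop.is_set():
        yield generate_record(schema, rng)


def ks_statistic_normal(samples: List[float], mu: float, sigma: float) -> float:
    """One-sample Kolmogorov-Smirnov statistic D against N(mu, sigma^2)."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


if __name__ == "__main__":
    rows = generate_fixed(EXAMPLE_SCHEMA, 1000, seed=42)
    heights = [row["height_cm"] for row in rows]
    d = ks_statistic_normal(heights, 170.0, 10.0)
    # Asymptotic 5% critical value for the one-sample KS test is about 1.36 / sqrt(n).
    print(f"D = {d:.4f}, 5% critical value ~ {1.36 / math.sqrt(len(heights)):.4f}")

    stop = threading.Event()
    for i, row in enumerate(generate_stream(EXAMPLE_SCHEMA, stop, seed=1)):
        print(row)
        if i == 4:
            stop.set()  # simulates the user pressing "stop"; the stream then ends
```

Seeding each generator explicitly keeps runs reproducible. A distributed variant could, for instance, give each node its own seed and share of the records, though how ADaGe actually partitions work across the cluster is not stated in this record.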
Appears in Collections: Dissertations - FacICT - 2016
Dissertations - FacICTCIS - 2016

Files in This Item:
File | Description | Size | Format
16BITSD006.pdf (Restricted Access) | | 2.51 MB | Adobe PDF


Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.