Automated dataset generator

Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/14722

Full metadata record

DC Field	Value	Language
dc.date.accessioned	2016-12-16T09:19:14Z
dc.date.available	2016-12-16T09:19:14Z
dc.date.issued	2016
dc.identifier.uri	https://www.um.edu.mt/library/oar//handle/123456789/14722
dc.description	B.SC.IT(HONS)	en_GB
dc.description.abstract	Over the years, effort put into maintaining a system has shifted to testing the system. In fact, many IT organisations today spend more than 40% of a project’s budget on software testing (Capgemini, 2015; Tata, 2015). One way of making the testing process easier is to automatically generate synthetic data similar to the data entered in the real environment. The benefits offered by an automated process are invaluable since real datasets can be hard to obtain due to data sensitivity and confidentiality issues. Generating synthetic data is not an easy task because each system has different requirements. For instance, some systems have a well-defined data structure and the size of the data they deal with can be estimated, whereas other systems, such as real-time processing systems, have no control over the amount of data being input. ADaGe provides a user interface that enables the end-user to define and tweak the dataset definition to satisfy the system’s requirements. The end-user has the option to generate datasets with a fixed size or to stream the generated data. ADaGe can generate structured data including basic data types such as random strings, integers and real numbers, as well as first and last names from existing datasets. Additionally, it can generate unstructured data by reading from a book collection. In order to generate more statistically significant data, ADaGe can also generate values according to the normal distribution. Moreover, ADaGe makes use of current technologies to scale according to the user’s requirements, making it capable of generating large datasets efficiently by distributing the work amongst several nodes on a cluster. Results show that the application can generate small to large datasets with varying data types efficiently. When compared to other open-source tools, ADaGe proved to be scalable and time-efficient. ADaGe is also capable of generating a stream of data that can be stopped by an action from the user. With regard to the generation of values according to the normal distribution, statistical tests show that the data generated fit the distribution. Lastly, the application was successfully used to generate data for real-life scenarios.	en_GB
dc.language.iso	en	en_GB
dc.rights	info:eu-repo/semantics/restrictedAccess	en_GB
dc.subject	Computer software -- Testing	en_GB
dc.subject	Big data	en_GB
dc.subject	Database management	en_GB
dc.title	Automated dataset generator	en_GB
dc.type	bachelorThesis	en_GB
dc.rights.holder	The copyright of this work belongs to the author(s)/publisher. The rights of this work are as defined by the appropriate Copyright Legislation or as modified by any successive legislation. Users may access this work and can make use of the information contained in accordance with the Copyright Legislation provided that the author must be properly acknowledged. Further distribution or reproduction in any format is prohibited without the prior permission of the copyright holder.	en_GB
dc.publisher.institution	University of Malta	en_GB
dc.publisher.department	Faculty of Information & Communication Technology. Department of Computer Information Systems	en_GB
dc.description.reviewed	N/A	en_GB
dc.contributor.creator	Camilleri, Sheryl
Appears in Collections:	Dissertations - FacICT - 2016; Dissertations - FacICTCIS - 2016

Files in This Item:

File	Description	Size	Format
16BITSD006.pdf Restricted Access		2.51 MB	Adobe PDF
