This research area focuses on the extraction of information from paper-based scribbles, together with the identification of sketched object edges among artistic cues and the geometric interpretation of the sketch from these cues.
Paper-based sketching and scribbling provide a natural way of conveying ideas and concepts, which would otherwise be difficult to express by means of verbal or written communication alone. Indeed, designers in various fields resort to paper-based sketching at the early stages of idea development since sketches are quicker to draw and, hence, communicate the designer's thoughts faster.
Human observers can easily interpret the 3-dimensional geometric shape of a sketched object, often using artistic cues that are introduced to facilitate this interpretation. However, extracting information from paper-based sketches is not a trivial task for a machine, and the same artistic cues may add to the algorithmic complexity required to interpret the sketch. In light of this challenge, our earliest work in this field aimed to reproduce algorithmically the ability of the human visual system to filter and group the stroke information present in paper-based sketches and scribbles, so that such sketches and scribbles can be interpreted by a machine. This has several useful applications, particularly in rapid prototyping, where the extraction of the designer's intended shape information permits the 2-dimensional flat scribble to be transformed into a 3-dimensional virtual model. This would allow industrial designers to explore design issues, such as the functionality and aesthetics of a design concept, more easily and rapidly. This research work has been carried out in collaboration with the Department of Industrial and Manufacturing Engineering.
Further research work in this field has investigated the identification of sketch strokes from artistic cues, such as shadows, introduced to the sketch in order to assist the interpretation of the geometric shape of the sketched object. The work carried out thus far produces a vectorised representation of the sketch, distinguishing the drawing edges from the sketched shadow cues. A geometric interpretation of the sketch's edges may also be obtained, allowing the identification of the basic geometric structure of the object. A full reconstruction would require information on all parts of the object; however, a sketched drawing typically displays the parts that are visible from a particular point of view and omits those that are hidden from it. Although such hidden edges may be drawn by the user, creating what is referred to as a wireframe drawing, these additional strokes require a higher degree of skill and drawing accuracy than is usually expected in rough sketching, particularly from non-expert sketchers. Our research work therefore focuses on exploiting the edge geometry interpretation of the visible edges to deduce the hidden object information. We achieve this through a genetic algorithm approach, which allows us to search for possible hidden geometries using the visible edges to constrain the search space.
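As a rough illustration of this search strategy, the sketch below implements a minimal genetic algorithm in Python. The encoding (hidden-vertex coordinates as a flat list of floats), the placeholder constraint_error objective and all parameter values are illustrative assumptions, not the actual formulation used in our work.

```python
import random

# Hypothetical encoding: a candidate solution lists the unknown (hidden)
# vertex coordinates as floats, e.g. 3 hidden vertices x 3 coordinates.
# The visible-edge constraints are assumed to be captured by
# constraint_error(), which returns 0 for a fully consistent candidate.
def constraint_error(candidate):
    # Placeholder objective: distance of each gene from an assumed target.
    return sum((g - 0.5) ** 2 for g in candidate)

def genetic_search(n_genes=9, pop_size=50, generations=200,
                   mutation_rate=0.1, elite=2):
    pop = [[random.random() for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=constraint_error)                  # fittest first
        next_pop = [ind[:] for ind in pop[:elite]]      # elitism
        while len(next_pop) < pop_size:
            a, b = random.sample(pop[:pop_size // 2], 2)  # select from fitter half
            cut = random.randrange(1, n_genes)            # one-point crossover
            child = a[:cut] + b[cut:]
            child = [g + random.gauss(0, 0.05)            # Gaussian mutation
                     if random.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop
    return min(pop, key=constraint_error)

best = genetic_search()
print(constraint_error(best))
```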
Eye movements have long been recognised as an alternative channel for communicating with, or controlling, a machine such as a computer, substituting traditional peripheral devices. This research area seeks to address various challenges associated with passive eye-gaze tracking under uncontrolled daily-life conditions, including the handling of head and non-rigid face movements, and the reduction or elimination of user calibration for more natural user interaction.
Human eye-gaze tracking has been receiving increasing interest over the years. Recent advancements in mobile technology and a growing interest in capturing natural human behaviour have motivated an emerging interest in tracking and analysing eye movements continuously in unconstrained real-life conditions, referred to as pervasive eye-gaze tracking. The paradigm of pervasive eye-gaze tracking is multi-faceted and typically relates to characteristics that facilitate eye-gaze tracking in uncontrolled real-life scenarios: robustness to varying illumination conditions and extensive head rotations; the capability of estimating eye-gaze at an increased distance from the imaging hardware; reduced or implicit calibration, to allow for situations that do not permit user co-operation or calibration awareness; and the estimation of eye-gaze on mobile devices using their integrated hardware alone, without further hardware modification. This will potentially broaden the application areas of eye-gaze tracking to scenarios that do not permit controlled conditions, such as gaze-based interaction in public spaces.
Our research work is motivated by this increasing interest in pervasive eye-gaze tracking and aims to address several of the main challenges in the field. Specifically, we aim for eye-gaze tracking by joint head and eye pose estimation from image frames captured by a consumer-grade camera under ambient illumination alone. Over several years, we have applied classical computer vision techniques to estimate the eye-gaze from low-resolution eye images, while allowing head and face movement and without requiring prolonged user co-operation during calibration prior to the estimation of gaze. We have developed a spherical eye-in-head rotation model that permits gaze estimation under head movement by compensating for the change in eye region appearance due to head rotation. We have also developed a method for the estimation of head pose under non-rigid face movement that exploits the information contained within the trajectories of a set of feature points spread randomly over the face region, without seeking specific facial landmarks for model fitting, which are susceptible to occlusion during head rotations. More recent work investigates the use of deep learning techniques for eye-gaze estimation, as well as for robust iris centre estimation from eye region images captured under various illumination conditions.
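The essential geometric idea of composing a head pose with an eye-in-head rotation to obtain a gaze direction can be sketched as follows. The rotation parameterisation, the example angles and the camera-frame convention are illustrative assumptions, not the actual formulation of our model.

```python
import numpy as np

def rotation(yaw, pitch):
    """Rotation matrix for a yaw (about y) followed by a pitch (about x), in radians."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
    return Ry @ Rx

# Hypothetical angles recovered by the head pose and eye-in-head estimators.
R_head = rotation(np.radians(20), np.radians(-5))   # head w.r.t. camera frame
R_eye  = rotation(np.radians(-10), np.radians(3))   # eye w.r.t. head frame

forward = np.array([0.0, 0.0, 1.0])                 # optical axis in the eye frame
gaze_dir = R_head @ R_eye @ forward                 # gaze direction in the camera frame
print(gaze_dir / np.linalg.norm(gaze_dir))
```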
Part of this research work has been carried out during the project R&I-2016-010-V WildEye, financed by the Malta Council for Science and Technology (MCST) through FUSION: The R&I Technology Development Programme 2016, in collaboration with Seasus Ltd. This project, concluded in early 2022, aimed to develop a low-cost eye-gaze tracking platform as an alternative communication channel for disabled persons who may not otherwise be able to control a computer via traditional peripheral devices, such as the mouse and keyboard. Ongoing research work involving the use of deep learning forms part of REP-2022-002 LuminEye, financed by the MCST through FUSION R&I: Research Excellence Programme 2022, which addresses the challenge of estimating the iris centre under variable illumination conditions and iris occlusion, for more robust eye-gaze tracking.
The key factor in reconstructing the surface of an object of interest is the measurement of the depth of the object's surface in 3-dimensional space. This research area focuses on the development of low-cost multiple-stereo 3D acquisition algorithms, investigating in particular data fusion algorithms that exploit the data redundancy from multiple cameras to reduce occlusions and improve depth accuracy.
The problem of reconstructing the surface of an object in 3-dimensional space has been studied extensively over the years, with applications ranging from industrial product inspection, to robot guidance and 3-dimensional object modelling. The key factor in the acquisition of such 3-dimensional information is the principle of range measurement, or in simpler terms, the measurement of depth of an object’s surface relative to the position of a known point or plane in space.
Stereo vision is a well-known technique for extracting 3-dimensional information from pairs of images by relating the image coordinates of corresponding points that belong to the same object feature in 3-dimensional space. This is similar to human binocular vision, where the brain receives two horizontally shifted images of a scene, as captured by the eyes, and merges them into a single 3-dimensional view by matching the similarities in the separate views. While the brain performs this fusion of images efficiently, stereo vision methods face several challenges in identifying similar features due to occluded parts and repetitively textured or untextured surface patches.
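For a rectified stereo pair, the underlying relation is that depth is inversely proportional to the disparity d between corresponding points: Z = fB/d, for focal length f and baseline B. A minimal sketch of this pipeline using classical OpenCV block matching is given below; the image paths and calibration values are placeholder assumptions.

```python
import cv2
import numpy as np

# Hypothetical rectified stereo pair; the file paths are placeholders.
left  = cv2.imread("left.png",  cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Classical block matching over the rectified pair.
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0  # fixed-point -> pixels

# Triangulation: Z = f * B / d, for focal length f (pixels) and baseline B (metres).
f_px, baseline_m = 700.0, 0.06          # assumed calibration values
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = f_px * baseline_m / disparity[valid]
```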
This research work concerns the development and implementation of a low-cost multiple-stereo 3-dimensional acquisition system, for the specific purpose of modelling a person's head. Our work focuses on the investigation of data fusion algorithms that exploit the data redundancy from multiple cameras to reduce occlusions and improve depth accuracy. Passive and active lighting techniques are combined in order to address issues related to the reconstruction of repetitively textured patches on the face. Furthermore, techniques that improve the reconstruction resolution are also studied, in order to obtain a reliable, high-resolution 3-dimensional reconstruction of the person's head.
This research work has been carried out during 3D-Head, a project funded by the Malta Council for Science and Technology through the National RTDI Programme (2005), in collaboration with Megabyte Ltd.
The digitisation of musical scores benefits both novice and advanced musicians: novice learners gain help-tools which identify note pitch and rhythmic errors, while more advanced musicians, playing duets or other ensembles, gain expressive play-back of the accompanying parts. This research area focuses on the identification of the note pitches, durations and expressive markings present on a printed musical sheet, as well as an automated expressive play-back of the musical score that mimics the musical performance of a human.
Ever since the 1980s, there has been increasing interest in automated computer systems for music performance. Software for musical typesetting is nowadays often used to compose music; moreover, with the in-built cameras of smart phones and digital tablets, there is increasing interest in applications that can scan and play back musical scores. Our interest in music analysis is therefore two-fold. On one hand, we are interested in optical music recognition, through which it becomes possible to automatically analyse a musical sheet in order to re-write it digitally. This involves identifying the note pitches and durations, as well as the numerous other expressive markings present in the score.
Our second interest lies in the automated musical play-back of the printed score. While music software that allows playback of written scores exists, the result often seems robotic and far removed from the musical performance of a human. A computer system can generate 'perfect' performances that observe all notated instructions of the written score, but a human performer will often deviate from the written notation, introducing fluctuations in the tempo and loudness of the piece as it is played; as a result, no two human performances sound alike. Such fluctuations are expected and recognised as expressive playing. It is therefore desirable that computer-generated performances mimic this expressive playing in order to obtain playbacks of reasonable quality.
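As a toy illustration of the kind of fluctuation involved, the sketch below warps a flat score with a phrase-arch tempo and dynamics curve. The score, the arch shape and the modulation depths are invented for illustration and do not reflect any particular expressive model from our work.

```python
import math

# A toy "score": (MIDI pitch, nominal duration in beats, nominal velocity).
score = [(60, 1.0, 80), (62, 1.0, 80), (64, 1.0, 80), (65, 1.0, 80),
         (67, 2.0, 80), (65, 1.0, 80), (64, 1.0, 80), (62, 2.0, 80)]

def expressive_render(notes, base_tempo=100.0):
    """Apply a phrase-arch tempo curve and matching dynamics to a flat score."""
    rendered, beat, total = [], 0.0, sum(d for _, d, _ in notes)
    for pitch, dur, vel in notes:
        phase = beat / total                       # position within the phrase
        arch = math.sin(math.pi * phase)           # 0 at the edges, 1 mid-phrase
        tempo = base_tempo * (1.0 + 0.08 * arch)   # speed up towards the middle
        seconds = dur * 60.0 / tempo               # realised note length
        velocity = int(vel * (0.9 + 0.2 * arch))   # louder mid-phrase
        rendered.append((pitch, seconds, velocity))
        beat += dur
    return rendered

for note in expressive_render(score):
    print(note)
```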
This work has various applications. On one hand, music is still mainly produced as printed books, and an automated system that interprets sheet music thus allows music books to be digitised. In addition, digitised music makes it possible to create interfaces that allow novice learners to compare their own performance with that indicated by the score, providing music learners with help-tools which identify note pitch and rhythmic errors. The inclusion of expressive playback will allow such learners to listen to musical play-backs of the pieces they are learning, giving them hints to improve their own expressive playing. Moreover, expressive play-back can also benefit more advanced musicians; in particular, musicians playing duets or other ensembles would be able to have the accompanying parts played expressively.
The classification of ceramic artefacts in archaeology, based on qualitative and quantitative attributes related to surface finish, colour and decoration, provides an insight into different cultures and settlements, together with the manufacturing processes used to create the ceramic. In this regard, additional attributes, such as the ceramic texture and the size of the particles that compose the ceramic paste, offer further discriminative information. This research area therefore concerns the development of image processing algorithms for the precise measurement of the texture roughness and particle size of ceramic artefacts.
In archaeology, the provenancing of pottery is essential in order to allow archaeologists to understand ancient societies and the interactions between them. Discriminating between different pottery fabrics and comparing their similarities allows distinction between different cultures and settlements, or correlation between sources of clay and manufacturing centres, besides offering insights into the manufacturing processes employed to create the ceramic and the evolution of the technology used.
The methods used for the classification of ceramic type are based on qualitative and quantitative attributes, such as the surface finish of the ceramic, its colour, its decoration and the generic shape of the artefact, among others. However, other attributes, such as the size of the particles that form the ceramic paste and the characterisation of the texture of the ceramic body, can offer further discriminative information. These include the size distribution of added non-plastic particles, as well as the frequency with which they occur; in addition, the texture of the ceramic may be classified as fine, medium or coarse.
These granulometric and textural attributes are typically described upon visual inspection, which makes their description highly subjective: the human psycho-visual system does not clearly differentiate between different degrees of textural roughness, so the resulting texture classification is subjective and imprecise. This gives rise to the need for image processing algorithms that can measure texture roughness and particle size metrics, and do so repeatedly and with precision.
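One classical, repeatable way to quantify such attributes is morphological granulometry, sketched below. The random test array stands in for a real micrograph of the ceramic fabric, and the structuring-element sizes are arbitrary, so this illustrates the general technique rather than our specific algorithm.

```python
import numpy as np
from scipy import ndimage

# Stand-in for a greyscale micrograph of a ceramic fabric cross-section.
rng = np.random.default_rng(0)
image = rng.random((256, 256))

# Granulometry: grey-scale openings with structuring elements of growing size.
# The drop in total image "volume" at each scale indicates how much of the
# material consists of particles of roughly that size (the pattern spectrum).
sizes = list(range(1, 16, 2))
volumes = [ndimage.grey_opening(image, size=(k, k)).sum() for k in sizes]
pattern_spectrum = -np.diff(volumes)

# A coarse texture concentrates its spectrum at large scales, a fine one at
# small scales, giving an objective, repeatable roughness measure.
print(dict(zip(sizes[1:], pattern_spectrum.round(1))))
```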
This work is being carried out in collaboration with Dr John Betts from the Department of Classics and Archaeology, at the Faculty of Arts.
Neural language models are artificial neural networks that are used to generate text. An interesting use of such models is for caption generation, whereby images are fed into a neural network so that it may learn to generate a suitable textual description. When a neural language model is used for caption generation, the image information can be fed to the network either within the recurrent neural network (RNN), here referred to as conditioning by injecting, or in a layer after the RNN, here referred to as conditioning by merging.
This research work focuses on these two methods of feeding a neural network with image information for caption generation. Empirical results show that merging is superior to injecting across different evaluation metrics. This suggests that the different modalities (visual and linguistic) in caption generation should not be jointly encoded by the RNN; rather, the multimodal integration should be delayed to a subsequent stage. It also suggests that RNNs should not be viewed as actually generating text, but only as encoding it for prediction in a subsequent layer.
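The two conditioning schemes can be sketched as follows in PyTorch. The layer sizes, the choice of an LSTM, and the particular "init-inject" variant shown are illustrative assumptions, not the exact architectures evaluated in this work.

```python
import torch
import torch.nn as nn

class MergeCaptioner(nn.Module):
    """'Merge': the RNN encodes words only; the image joins after the RNN."""
    def __init__(self, vocab=1000, emb=128, hid=256, img_feat=512):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.rnn = nn.LSTM(emb, hid, batch_first=True)
        self.img_proj = nn.Linear(img_feat, hid)
        self.out = nn.Linear(hid * 2, vocab)

    def forward(self, words, img):
        h, _ = self.rnn(self.embed(words))             # linguistic encoding only
        v = self.img_proj(img).unsqueeze(1).expand_as(h)
        return self.out(torch.cat([h, v], dim=-1))     # late multimodal fusion

class InjectCaptioner(nn.Module):
    """'Inject' (init-inject variant): the image conditions the RNN state."""
    def __init__(self, vocab=1000, emb=128, hid=256, img_feat=512):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.rnn = nn.LSTM(emb, hid, batch_first=True)
        self.img_proj = nn.Linear(img_feat, hid)
        self.out = nn.Linear(hid, vocab)

    def forward(self, words, img):
        h0 = self.img_proj(img).unsqueeze(0)           # image sets the initial state
        c0 = torch.zeros_like(h0)
        h, _ = self.rnn(self.embed(words), (h0, c0))   # RNN mixes both modalities
        return self.out(h)

words = torch.randint(0, 1000, (4, 12))   # dummy token batch
img = torch.randn(4, 512)                 # dummy CNN image features
print(MergeCaptioner()(words, img).shape, InjectCaptioner()(words, img).shape)
```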
This research work is being carried out in collaboration with Dr Albert Gatt from the Institute of Linguistics.
Infra-red reflectography is a non-invasive technique to reveal sketches, underdrawings or compositional changes beneath the visible layers of paint in order to shed light on an artist's painting technique. This research area concerns the implementation of an image acquisition platform as well as the development of algorithms for the extraction of information beneath the paint layers and the creation of false colour images.
Infrared (IR) reflectography is a non-invasive technique used to see through layers of paint in order to reveal valuable information on a work of art and the artist's painting technique. Surface pigments appear partially or completely transparent when illuminated with infrared radiation, at wavelengths within the near-infrared region of the electromagnetic spectrum, and imaged with an IR-sensitive camera. The degree of penetration through the layers of paint depends upon the thickness and type of paint used, and the wavelength of the infrared radiation. Sketches and underdrawings in carbon black pigment may be visualised beneath the layers of paint, since the black pigment absorbs the infrared radiation and appears dark.
Our work concerns both the implementation of an image acquisition platform and the development of algorithms for extracting information beneath the paint layers and for creating false colour images. The rendition of false colour images eases the identification of characteristics that may not otherwise be readily discernible, such as the pigments that colour the work of art. False colour images are created by systematically replacing and swapping the red, green and blue channels of a visible colour image and a corresponding IR image of the same painting. This requires that the visible colour and IR images are registered, which in turn necessitates the detection and matching of corresponding features between the two images.
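A common channel-swap rendition, with the IR band mapped to red and the visible red and green shifted down to green and blue, can be sketched as follows, assuming the two images are already registered; the file paths are placeholders.

```python
import cv2

# Hypothetical, already-registered inputs; the paths are placeholders.
vis = cv2.imread("visible.png")                    # BGR visible-light image
ir  = cv2.imread("ir.png", cv2.IMREAD_GRAYSCALE)   # reflectogram of the same size

# IR false-colour mapping: the IR band replaces red, while the visible red
# and green channels shift down to green and blue respectively.
b, g, r = cv2.split(vis)
false_colour = cv2.merge([g, r, ir])               # BGR order: B<-G, G<-R, R<-IR
cv2.imwrite("false_colour.png", false_colour)
```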
The developed image acquisition platform has, to date, been used to image four panel paintings located at the Augustinian Monastery in Rabat, depicting the Virgin of Graces, Saint Paul, Saint Augustine and Saint Catherine of Alexandria. Among the most important artistic creations dating back to the late medieval period, these were painted using a tempera technique on a wooden support. Unfortunately, the panels have been extensively restored over the centuries, and the addition of over-paint and layers of varnish has darkened and yellowed the paintings significantly, reducing their overall legibility and sharpness. The panels also suffer from other forms of deterioration, such as warping of the wood, cracks, and flaking and lifting of paint. IR reflectography forms part of a long-term project that will seek a conservation treatment for these panels. This multi-disciplinary project is being carried out in collaboration with painting conservator Ms Erika Falzon, and involves the contribution of several local and foreign conservation scientists, art historians, and conservators specialised in paint layers and wooden supports.