Browsing by Author "Grond, Marco Marten"
- Text detection in natural images using convolutional neural networks (Stellenbosch : Stellenbosch University, 2017-03)
Grond, Marco Marten; Brink, Willie; Herbst, B. M.
Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences. Applied Mathematics
ENGLISH ABSTRACT: In this study we attempt to solve the problem of text detection in natural images, that is, identifying the regions of a natural image that contain text. Possible applications include assistive technology, human-computer interaction and context extraction. Although humans find the task almost trivial, automating it is difficult: large variations in colour, font, size and orientation must be accounted for, and text shares many features and structures with other objects. We train multiple convolutional neural networks in an attempt to solve this problem. We chose convolutional neural networks both because they have already shown potential in the context of text recognition, and to better understand how they operate. A sliding window approach is taken, where smaller regions of a full image are classified separately before the results are combined to identify text regions in the full image. Because too few annotated natural training images are available, we create a supplementary synthetic dataset. Using the synthetic data as a starting point, we train networks of different structures, after which the same networks are fine-tuned on smaller natural datasets. Networks first trained on the synthetic data outperform networks trained solely on the smaller natural datasets, regardless of structure complexity. This is likely because relevant features cannot be identified from a limited number of training examples. Our experiments further show that a larger network structure is required for generalization, and that smaller datasets are prone to overfitting.
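As an illustration of the sliding window idea described above, the following is a minimal sketch and not the thesis implementation. The window size, the stride, the averaging rule used to combine overlapping windows, and the `mean_intensity` classifier (a hypothetical stand-in for a trained network) are all assumptions, since the abstract does not specify them.

```python
from typing import Callable, List, Sequence, Tuple

Image = Sequence[Sequence[float]]  # grayscale image as rows of pixel values


def sliding_windows(height: int, width: int, size: int, stride: int) -> List[Tuple[int, int]]:
    """Return the top-left (row, col) corner of every window that fits in the image."""
    return [
        (r, c)
        for r in range(0, height - size + 1, stride)
        for c in range(0, width - size + 1, stride)
    ]


def text_score_map(
    image: Image,
    classify: Callable[[List[Sequence[float]]], float],
    size: int,
    stride: int,
) -> List[List[float]]:
    """Classify each window separately, then combine the window scores per pixel.

    Combination rule (an assumption): each pixel receives the average score
    of all windows that cover it; uncovered pixels receive 0.0.
    """
    height, width = len(image), len(image[0])
    totals = [[0.0] * width for _ in range(height)]
    counts = [[0] * width for _ in range(height)]
    for r, c in sliding_windows(height, width, size, stride):
        patch = [row[c:c + size] for row in image[r:r + size]]
        score = classify(patch)
        for i in range(r, r + size):
            for j in range(c, c + size):
                totals[i][j] += score
                counts[i][j] += 1
    return [
        [totals[i][j] / counts[i][j] if counts[i][j] else 0.0 for j in range(width)]
        for i in range(height)
    ]


def mean_intensity(patch: List[Sequence[float]]) -> float:
    """Hypothetical placeholder for the trained CNN: returns the mean pixel value."""
    values = [v for row in patch for v in row]
    return sum(values) / len(values)
```

Thresholding the resulting score map would give candidate text regions. In practice `classify` would wrap a trained network's forward pass on the patch.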
We apply our best-performing trained network to the task of detecting text in full images, by extracting and classifying regions of an image using a sliding window. Image pyramids are also implemented to allow for greater variation in the size of text that can be detected. We find, however, that image pyramids only slightly improve the accuracy over a single image, likely because some scale variation was already present in the network’s training set. Ultimately, we find that convolutional neural networks show promise for the task of text detection in natural images. We also find that training a network on synthetic data and fine-tuning it on natural data improves the overall accuracy.
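The image pyramid idea can be sketched as follows. This is a hedged illustration under stated assumptions, not the method used in the thesis: the factor-of-two downscaling by 2x2 averaging, the stopping rule, and the helper names `halve`, `build_pyramid` and `window_boxes` are all hypothetical choices. The point is that a fixed-size window applied to a downscaled level covers a proportionally larger region of the original image, so larger text can fit inside it.

```python
from typing import List, Tuple

Image = List[List[float]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]    # (top, left, bottom, right) in original pixels


def halve(image: Image) -> Image:
    """Downscale by 2 by averaging each 2x2 block (an odd trailing row/column is dropped)."""
    h, w = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(w)
        ]
        for r in range(h)
    ]


def build_pyramid(image: Image, min_size: int) -> List[Tuple[int, Image]]:
    """Return (scale, level) pairs, halving until the next level could not hold a window."""
    levels = [(1, image)]
    scale, current = 1, image
    while min(len(current), len(current[0])) // 2 >= min_size:
        current = halve(current)
        scale *= 2
        levels.append((scale, current))
    return levels


def window_boxes(image: Image, window: int, stride: int) -> List[Box]:
    """List the original-image region covered by each fixed-size window at every level."""
    boxes = []
    for scale, level in build_pyramid(image, window):
        for r in range(0, len(level) - window + 1, stride):
            for c in range(0, len(level[0]) - window + 1, stride):
                # Multiply by the level's scale to map back to original coordinates.
                boxes.append((r * scale, c * scale, (r + window) * scale, (c + window) * scale))
    return boxes
```

Each box would be classified on its own level with the same network, and the scores merged in original-image coordinates. Libraries such as OpenCV or scikit-image provide smoothed (Gaussian) pyramids that would normally be preferred over this plain block average.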