GRAPHJ: A Forensics Tool for Handwriting Analysis

 

Luca Guarnera, Giovanni Maria Farinella, Sebastiano Battiato and Antonino Furnari

Image Processing Laboratory (IPLAB), University of Catania - Department of Mathematics and Computer Science, Italy

 

Angelo Salici, Claudio Ciampini and Vito Matranga
Raggruppamento Carabinieri Investigazioni Scientifiche RIS di Messina - S.S.114 Km 6,400 98128 Messina

 

{gfarinella,furnari,battiato}@dmi.unict.it
{angelo.salici,claudio.ciampini,vito.matranga}@carabinieri.it
http://iplab.dmi.unict.it/,http://www.carabinieri.it/

 

Abstract. Handwriting analysis is a standard forensics practice to assess the identity of a person from written documents. Forensic document examiners consider different features related to the motion and pressure of the hand, as well as the shape of the different characters and the spatial relationships among them. While examiners rely on standard protocols, documents are generally processed manually. This requires a significant amount of time and may lead to a subjective analysis that is difficult to replicate. Automated forensics tools that perform handwriting analysis on scanned documents are desirable to help examiners extract information in a more objective and replicable way. To this aim, in this paper we present GRAPHJ, a forensics tool for handwriting analysis. The tool has been designed to implement the forensics protocol employed by the “Reparto Investigazioni Scientifiche” (RIS) of the Carabinieri. GRAPHJ allows the examiner to 1) automatically detect text lines as well as the different words within the document; 2) search for a specific character and detect its occurrences in the handwritten text; 3) measure different quantities related to the detected elements (e.g., character height and width); and 4) generate a report containing measurements, statistics and all parameters used during the analysis. The generation of the report helps to improve the repeatability of the whole process. We also present a set of experiments to assess the compliance of GRAPHJ with conventional handwriting analysis methods. Given a set of handwritten documents, the experiments compare measurements and statistics produced by GRAPHJ to those obtained by an expert forensics examiner performing classic manual analysis.

1 Introduction


Forensic handwriting examination is the analytical process of detecting regularities and singularities of a handwritten text to assess the identity of the writer [1]. The analysis focuses on recognizing the fundamental shapes of the stroke, as well as the relative positions and sizes of letters and words. For handwriting analysis methods it is important to adopt a quantitative approach in order to limit any subjective evaluation due to the personal experience of the examiner.

With this goal in mind, many forensics experts make use of a graphometric-based approach, which takes into account the different quantifiable features of handwritten text. Many authors have considered the fundamental principles and techniques of document examination [2–5]. However, handwriting analysis should be robust to variations in the writing process due to several intrinsic and extrinsic causes, such as different writing speeds, dissimulation, tiredness and available space. In this regard, it is well known that the study of character heights is valuable to identify the range of variability of the writer [6]. For instance, Morris [7] underlined the importance of analyzing dimensional parameters and comparing absolute and relative quantities. This kind of analysis, which considers speed, slope and style, can be used to identify an attempted forgery. Hayes [1] showed that dimensions reflect the range of finger and hand movements, which are characteristic of individual expression; e.g., some people produce extremely small writing while others produce much larger writing. Kelly and Lindblom [8] analyzed the ratio of lowercase to uppercase letters, showing that this value can be useful to identify the writer.

In this work, we present GRAPHJ, a tool for handwriting analysis. The proposed tool implements several algorithms to analyze handwritten documents according to the protocol currently used by RIS - Carabinieri in Italy. For instance, GRAPHJ can detect text lines and words in the document and search for all occurrences of a specific character. GRAPHJ is also designed to simplify and improve the documentation of the analysis process by generating a report containing statistics and measurements. Our approach is similar to the one of Fabiańska et al. [9], but our tool allows for automated detection of elements in order to minimize the amount of manual intervention required. Since it is aimed at assisting the examiner in analyzing digitized handwritten documents, GRAPHJ can be considered a multimedia forensics tool [10]. It should be noted that our approach is different from handwriting recognition [11], since we are not interested in transcribing the analyzed text into digital form. To validate GRAPHJ, we performed experiments comparing the produced measurements and statistics to those obtained with classic manual analysis performed by a forensics expert. A video demo of GRAPHJ is available on our web page http://iplab.dmi.unict.it/graphj/.

 

2 GRAPHJ

We developed GRAPHJ as a plugin for ImageJ [12], a standard framework for many image processing tasks. The plugin automates the standard procedures needed to analyze handwritten documents. The implemented algorithms perform three main tasks: 1) automated detection of elements (text lines and words); 2) automated search for instances of a given character; 3) automated measurement of quantities (e.g., distance between words and characters, height and width of characters). Furthermore, the examiner can manually intervene to adjust automated detections and measure other quantities such as absolute and relative heights. A typical workflow employed to perform handwriting analysis in GRAPHJ is shown in Fig. 1. The developed algorithms are detailed in the following sections.

 

2.1 Automated Search of Text Lines

Following the analysis protocol, a text line can in general be divided into three areas: a lower area, a median area and a higher area, as illustrated in Fig. 2. Automated search of text lines is performed in two steps. First, all median areas are detected in the document. Second, the lower and higher areas are identified for each detected median area.

The algorithm to search for median areas is illustrated in Fig. 3 and discussed in the following: 

  1. as a first step, the image is binarized (the resulting binary image is denoted by B) by setting to 0 (black) all pixels whose value exceeds a given threshold T and setting to 1 (white) all other pixels;
  2. a per-row histogram Hr is created by counting the number of zero pixels contained in each pixel row of the binary image B. The histogram contains a number of bins equal to the number of rows of the original image (i.e., the image height);
  3. to detect the central lines of median areas, the algorithm considers all peaks of the histogram whose values are above a user-specified threshold s1, which is introduced to reduce the influence of noise in the search for median areas;
  4. the algorithm then finds the starting and ending rows of each median area. Since histogram values are expected to decay gradually around the peak, this is done by searching for the nearest lower and higher rows whose value is above 1/4 of the histogram value at the given peak.

 

The complete procedure is reported in Algorithm 1, where histRows(B) computes the per-row histogram of zero pixels as discussed above, and findPeaks(Hr) finds the peaks (i.e., local maxima) of histogram Hr and returns both their positions (indMax) and values (valMax). The algorithm returns a list of starting and ending row indexes of all detected median areas (ind).
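The median-area search can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not GRAPHJ's code: B is a list of rows of 0 (black) and 1 (white) pixels, hist_rows and find_peaks mirror histRows and findPeaks, every bin of a flat top counts as a candidate peak, weaker peaks falling inside an already-found area are absorbed, and the 1/4 rule is read as "extend the area while rows stay above a quarter of the peak value".

```python
def hist_rows(B):
    """Per-row histogram Hr: number of black (0) pixels in each row of B."""
    return [row.count(0) for row in B]


def find_peaks(H):
    """Local maxima of H, returned as (indMax, valMax).

    A bin counts as a peak if it is not smaller than either neighbour, so every
    bin of a flat top is reported; duplicates are absorbed in median_areas.
    """
    ind_max, val_max = [], []
    for i, v in enumerate(H):
        left = H[i - 1] if i > 0 else -1
        right = H[i + 1] if i + 1 < len(H) else -1
        if v >= left and v >= right:
            ind_max.append(i)
            val_max.append(v)
    return ind_max, val_max


def median_areas(B, s1):
    """Return sorted (start_row, end_row) pairs, one per detected median area."""
    Hr = hist_rows(B)
    ind_max, val_max = find_peaks(Hr)
    areas = []
    # Strongest peaks first, so weaker peaks inside an existing area are skipped.
    for p, v in sorted(zip(ind_max, val_max), key=lambda t: -t[1]):
        if v <= s1 or any(s <= p <= e for s, e in areas):
            continue
        limit = v / 4
        start = p
        while start > 0 and Hr[start - 1] > limit:  # grow upwards
            start -= 1
        end = p
        while end + 1 < len(Hr) and Hr[end + 1] > limit:  # grow downwards
            end += 1
        areas.append((start, end))
    return sorted(areas)
```

Processing peaks in decreasing order of height is a design choice made here so that a secondary bump inside a text line does not produce a second, overlapping median area.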

Once median areas are detected, the algorithm detects row indexes of higher and lower areas. This is done by looking for the nearest higher and lower empty rows. Such rows are easy to detect since they do not contain any black pixel, and hence histogram Hr has value equal to zero at those locations. If no empty rows can be found, the indexes of higher and lower areas are set to correspond to the starting and ending rows of the related median area. Algorithm 2 reports the procedure used to locate indexes of higher and lower areas. The algorithm returns a list of tuples of four values: starting row index of the higher area, starting row index of median area, ending row index of median area, ending row index of lower area.
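A possible reading of Algorithm 2 is sketched below under the same conventions (Hr is the per-row histogram, median areas are (start, end) row pairs). The text does not say whether the returned bound is the empty row itself or the adjacent text row; this sketch assumes the empty row itself, and the function name is ours.

```python
def higher_lower_areas(Hr, median_areas):
    """Extend each median area (start, end) to the nearest empty rows of Hr.

    Returns (higher, start, end, lower) tuples. If no empty row exists above
    (below) an area, its higher (lower) bound falls back to start (end).
    """
    result = []
    for start, end in median_areas:
        higher = start
        for r in range(start - 1, -1, -1):  # scan upwards
            if Hr[r] == 0:
                higher = r
                break
        lower = end
        for r in range(end + 1, len(Hr)):  # scan downwards
            if Hr[r] == 0:
                lower = r
                break
        result.append((higher, start, end, lower))
    return result
```

Returning r + 1 and r - 1 instead would exclude the blank rows themselves from the higher and lower areas.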

 

 

2.2 Automated Detection of Words

Automated detection of words is performed starting from text lines detected in the binarized image B. The process works in two steps: 

  1. word boundaries are detected;
  2. higher and lower areas are refined for each word.

The first step of the algorithm is illustrated in Fig. 4 and discussed in the following. Let L be a crop of a given text line obtained from the binary image B. A column histogram Hc counting the number of black pixels contained in each column of L is computed. Note that the computation of Hc is similar to that of Hr. To find word boundaries, the algorithm searches for bins in Hc which contain zero values. Such bins represent columns of L that do not contain any black pixel. If the detected gap is larger than a given threshold s2, the starting and ending column indexes of a new word are stored in a list. The algorithm eventually returns a list of tuples of starting and ending indexes (is, ie). The procedure is reported in Algorithm 3.
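The column-histogram search can be sketched as below. We assume L is a list of rows of 0/1 pixels and that a gap is the number of consecutive empty columns between two ink columns; gaps of at most s2 columns are treated as spacing between letters of the same word. The function names are ours, not GRAPHJ's.

```python
def hist_cols(L):
    """Column histogram Hc: number of black (0) pixels in each column of L."""
    return [sum(1 for row in L if row[c] == 0) for c in range(len(L[0]))]


def word_boundaries(L, s2):
    """Return (is, ie) column pairs, splitting words at gaps wider than s2."""
    Hc = hist_cols(L)
    words = []
    start = None      # first column of the word being built
    last_ink = None   # most recent column containing ink
    for c, v in enumerate(Hc):
        if v == 0:
            continue  # empty column: part of a gap
        if start is None:
            start = c
        elif c - last_ink - 1 > s2:  # gap wide enough: close the word
            words.append((start, last_ink))
            start = c
        last_ink = c
    if start is not None:
        words.append((start, last_ink))
    return words
```

For example, a single-row line "##.#....##..#" (with # as ink) yields [(0, 3), (8, 12)] for s2 = 2.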

Once word boundaries are obtained, median, higher and lower areas are detected for each word. This step is performed because words on the same text line may have different sizes and orientations. The procedure works on the image crops w of the words detected using Algorithm 3. To determine the orientation of a given word, its crop w is rotated by different angles α sampled from the interval [−N, N] with step k. For each rotated crop, a row histogram Hw is computed using function histRows, and its maximum m is computed. The correct orientation is given by the angle α that yields the highest value of m. This arises from the observation that, if the word is aligned horizontally, histogram Hw will be strongly peaked. Once the correct orientation has been determined, median, higher and lower areas are detected using Algorithm 1 and Algorithm 2. The whole procedure to detect lower, median and higher areas for each word is reported in Algorithm 4, where function rotate(w, α) rotates word w by α degrees.
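The orientation search can be sketched as follows. Assumptions (not stated in the text): w is a list of rows of 0/1 pixels, rotation is about the crop centre with nearest-neighbour sampling and white fill (GRAPHJ relies on ImageJ, whose interpolation we do not reproduce), N and k are integers, and ties are broken in favour of the smaller rotation.

```python
import math


def hist_rows(B):
    """Per-row histogram: number of black (0) pixels in each row."""
    return [row.count(0) for row in B]


def rotate(img, alpha):
    """Rotate a 0/1 image by alpha degrees about its centre.

    Uses inverse mapping with nearest-neighbour sampling; output pixels that
    map outside the input are filled with white (1).
    """
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    a = math.radians(alpha)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = [[1] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            dx, dy = c - cx, r - cy
            sx = round(cos_a * dx - sin_a * dy + cx)
            sy = round(sin_a * dx + cos_a * dy + cy)
            if 0 <= sy < h and 0 <= sx < w:
                out[r][c] = img[sy][sx]
    return out


def estimate_orientation(w, N, k):
    """Angle in [-N, N] (step k) maximising the peak m of the row histogram."""
    best_alpha, best_m = 0, -1
    for alpha in range(-N, N + 1, k):
        m = max(hist_rows(rotate(w, alpha)))
        # Higher peak wins; on a tie, prefer the smaller rotation.
        if m > best_m or (m == best_m and abs(alpha) < abs(best_alpha)):
            best_alpha, best_m = alpha, m
    return best_alpha
```

With this convention, a stroke that descends by tan(10°) rows per column is straightened by a rotation of +10°, so estimate_orientation returns 10 for it when N = 20 and k = 10.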

 

2.3 Automated Search of Characters

This algorithm searches for all occurrences of a specific character in the document. To this end, the system allows the examiner to select a bounding box around the desired character to define a template T. The algorithm then performs a sliding window search over the whole document to locate possible occurrences of the character. The size of the search window W is equal to that of the template. To gain robustness to small rotations, additional candidates are generated by rotating the content of each search window by +10° and −10°. Each window is assigned a score S_W using the procedure outlined in Algorithm 5. Search windows with scores larger than a threshold set by the operator are retained as correctly detected character instances.

The scoring function reported in Algorithm 5 counts the number of black pixels of template T that are also black in window W.
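A brute-force sketch of this search is given below, assuming the document and the template are lists of 0/1 rows. Each window is scored unrotated and rotated by +10° and −10°, and its best score is kept. The nearest-neighbour rotate helper, the function names and the exhaustive loop are our assumptions, not GRAPHJ's implementation.

```python
import math


def rotate(img, alpha):
    """Rotate a 0/1 window by alpha degrees about its centre (nearest neighbour, white fill)."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    a = math.radians(alpha)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = [[1] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            dx, dy = c - cx, r - cy
            sx = round(cos_a * dx - sin_a * dy + cx)
            sy = round(sin_a * dx + cos_a * dy + cy)
            if 0 <= sy < h and 0 <= sx < w:
                out[r][c] = img[sy][sx]
    return out


def crop(doc, r, c, h, w):
    """Window of size h x w whose top-left corner is (r, c)."""
    return [row[c:c + w] for row in doc[r:r + h]]


def score(T, W):
    """S_W: black template pixels that are also black in window W."""
    return sum(1 for t_row, w_row in zip(T, W)
               for t, v in zip(t_row, w_row) if t == 0 and v == 0)


def search_character(doc, T, threshold, angles=(0, 10, -10)):
    """Slide a template-sized window over doc; keep (row, col, score) above threshold."""
    h, w = len(T), len(T[0])
    hits = []
    for r in range(len(doc) - h + 1):
        for c in range(len(doc[0]) - w + 1):
            W = crop(doc, r, c, h, w)
            best = max(score(T, rotate(W, a)) for a in angles)
            if best > threshold:
                hits.append((r, c, best))
    return hits
```

Because this score only rewards template ink found in the window, a solid black region would also score highly; the text does not say whether GRAPHJ penalizes extra ink, so this sketch does not either.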

 

 

 

2.4 Measures

GRAPHJ automatically measures several quantities related to words and characters. The considered quantities are those currently used in the standard protocol. In particular, the tool implements two functions:

  • automatic computation of the biaxial proportion and relative average;
  • automatic computation of the side expansion and relative average.

 

Automatic computation of the biaxial proportion and relative average. Biaxial proportions are the width and the height of oval characters (see Fig. 5). To convert these measures from pixels to millimeters, we use the dedicated ImageJ functions. For each character, GRAPHJ computes the ratio $\rho_i = w_i / h_i$, where $w_i$ and $h_i$ are the width and height of the $i$-th character, respectively, together with the average of these ratios.
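The per-character ratio and its average take only a few lines. The sketch below also includes a hypothetical px_to_mm helper that converts pixels to millimetres from an assumed scan resolution in dpi (1 inch = 25.4 mm); it merely stands in for the ImageJ calibration mentioned above. Since ρ_i is a ratio of two lengths in the same unit, the conversion does not change it.

```python
from statistics import mean


def px_to_mm(px, dpi):
    """Hypothetical helper: convert a length in pixels to millimetres."""
    return px * 25.4 / dpi


def biaxial_proportions(sizes):
    """sizes: (w_i, h_i) pairs for oval characters, in one consistent unit.

    Returns the per-character ratios rho_i = w_i / h_i and their average.
    """
    rhos = [w / h for w, h in sizes]
    return rhos, mean(rhos)
```

For two characters measuring 10 × 20 and 15 × 15 pixels, the ratios are 0.5 and 1.0 and their average is 0.75.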

Automatic computation of the side expansion and relative average. The side expansion represents the distances between the characters of a word and between words. Distances between words are easily computed using the starting and ending word indexes returned by Algorithm 3. Distances between characters are computed in a similar way. To remove the influence of the lower and upper terminations of characters, these are excluded from the computation. Fig. 6 illustrates the computation of side expansions. For each computed distance between characters (denoted as D(C)) and between words (denoted as D(W)), GRAPHJ calculates the following ratios:

 

 

3 Experimental Analysis

GRAPHJ was tested on 10 different writing samples. The samples were written voluntarily by 10 different right-handed subjects. All documents were written in cursive writing using similar ink and paper. Every subject was asked to write the same long paragraph of text, which was dictated to them. The text included all letters of the Italian alphabet, as well as sentences of different length and complexity.

To compare GRAPHJ performance with standard analysis methods, each sample was manually analyzed by a forensics expert of RIS. In particular, the examiner measured the heights of two groups of 40 different letters, analyzed sequentially, with a precision of 0.1 mm. The first group consists of letters with an upper elongated stroke on the right or on the left side (e.g., “l”, “t”, “d”, “f”, ...), for which the height (U) is measured. The second group consists of letters without elongated strokes (e.g., “a”, “c”, “o”, “m”, ...), for which the height of the body in the median zone (M) is measured.

Table 1 shows the mean µ and standard deviation σ for the two groups of letters. The table compares measurements performed by the forensics expert to those obtained using GRAPHJ on the 10 documents of the dataset. Table 2 reports the mean absolute percentage error of the measurements obtained on the two groups of letters over the 10 documents in the dataset. The results show that GRAPHJ analyses are consistent with the measurements obtained by experts using standard manual techniques. The report generated by GRAPHJ guarantees the repeatability of the process.
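For reference, the statistics mentioned above can be computed as in the following sketch. The text does not specify these details, so we assume that σ is the sample standard deviation and that the mean absolute percentage error takes the expert's values as the reference, averaged over paired measurements (for example, per-document mean heights).

```python
from statistics import mean, stdev


def summarize(heights_mm):
    """Mean and sample standard deviation of a list of letter heights (mm)."""
    return mean(heights_mm), stdev(heights_mm)


def mape(reference, measured):
    """Mean absolute percentage error of `measured` with respect to `reference`."""
    if not reference or len(reference) != len(measured):
        raise ValueError("expected two non-empty lists of equal length")
    return 100 * mean(abs(m - r) / abs(r) for r, m in zip(reference, measured))
```

For instance, expert heights of 2.0 mm and 4.0 mm measured as 2.2 mm and 3.8 mm give relative errors of 10% and 5%, i.e. a MAPE of 7.5%.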

A video demo of GRAPHJ is available on our web page http://iplab.dmi.unict.it/graphj/.

 

4 Conclusion

We have presented GRAPHJ, an automated tool to aid forensics experts in the analysis of handwritten documents. The tool has been implemented as a plugin for ImageJ and automates many operations, such as the detection of elements (e.g., text lines, words and characters) and the measurement of quantities (e.g., character height and width). Experiments show that analyses carried out using GRAPHJ are consistent with those obtained by forensics experts using standard manual techniques.

 

References

  1. R. C. Hayes. Forensic handwriting examination: a definitive guide. ReedWrite Press, 2006.
  2. I. W. Evett and R. N. Totty. A study of the variation in the dimensions of genuine signatures. Journal of the Forensic Science Society, 25(3):207–215, 1985.
  3. R. A. Huber and A. M. Headrick. Handwriting identification: facts and fundamentals. CRC Press, Boca Raton, 1999.
  4. E. Abbey. Natural variation and relative height proportions. International Journal of Forensic Document Examiners, 5:108–116, 1999.
  5. Maciaszek. Natural variation in measurable features of initials. Problems of Forensic Sciences, 85:25–39, 2011.
  6. K. M. Koppenhaver. Forensic document examination: principles and practice. Springer Science & Business Media, 2007.
  7. R. N. Morris. Forensic handwriting identification: fundamental concepts and principles. Academic Press, 2000.
  8. J. Seaman Kelly and B. S. Lindblom. Scientific examination of questioned documents. CRC Press, 2006.
  9. Fabiańska, M. Kukicki, G. Zador, T. Dziedzic, and D. Bułka. GraphLog: computer system supporting handwriting analysis. Problems of Forensic Sciences, 68:394–408, 2006.
  10. S. Battiato, O. Giudice, and A. Paratore. Multimedia forensics: discovering the history of multimedia contents. In Proceedings of the 17th International Conference on Computer Systems and Technologies 2016, pages 5–16. ACM, 2016.
  11. R. Plamondon and S. N. Srihari. Online and off-line handwriting recognition: a comprehensive survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1):63–84, 2000.
  12. M. D. Abramoff, P. J. Magalhães, and S. J. Ram. Image processing with ImageJ. Biophotonics International, 11(7):36–42, 2004.