Indexing and automatic recognition of handwritten text

Celio Hernández Tornero, Verónica Romero Gómez, Joan Andreu Sánchez Peiró, Alejandro Héctor Toselli Rossi, Enrique Vidal Ruiz

Abstract


It is conjectured that the amount of handwritten text held in documents kept by libraries and archives around the world far exceeds the amount of original printed or typewritten text produced to date. Only a tiny fraction of this vast body of documents has been digitized so far, and only an infinitesimal part of that fraction has been transcribed. As a result, the most valuable information in the overwhelming majority of these digital images, namely the information conveyed by the text itself, remains inaccessible for easy reading, editing, indexing and searching. This article introduces several projects, together with effective solutions recently developed within them, for information search in, and full transcription of, images of historical handwritten documents.

DOI: http://dx.doi.org/10.14672/0.2018.1432

Copyright (c) 2018 Celio Hernández Tornero, Verónica Romero Gómez, Joan Andreu Sánchez Peiró, Alejandro Héctor Toselli Rossi, Enrique Vidal Ruiz

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.