Table Analysis and Information Extraction for Medical Laboratory Reports
Published in IEEE Fourth International Conference on Cyber Science and Technology (CyberSciTech), 2018
Recommended citation: W. Xue, Q. Li, Z. Zhang, Y. Zhao and H. Wang, "Table Analysis and Information Extraction for Medical Laboratory Reports," IEEE 4th International Conference on Cyber Science and Technology, 2018, pp. 193-199. https://ieeexplore.ieee.org/document/8511886
Medical laboratory reports are essential documents for health care professionals in patient assessment, diagnosis, and long-term monitoring. Compared with paper files, electronic records are easier to keep up to date, complete, and accurate, and they are already common in modern medical systems. However, recognizing the content of historical medical laboratory reports is still in great demand, especially in developing countries. In this paper, we present a document image processing system for extracting information from medical laboratory reports. Given an image of a medical laboratory report, its table areas and texts are first segmented following a top-down pipeline. Recognition is then performed on each text segment, which may contain Arabic numerals, mathematical symbols, and multilingual characters. We evaluate the system on a new dataset of medical laboratory reports that includes scanned images and camera-captured images. Our experiments demonstrate that the proposed system can effectively segment a medical document according to its layout and recognize text that mixes multiple types of characters and symbols, thereby obtaining information from medical laboratory reports. The proposed system and the public dataset will benefit remote healthcare in developing countries.
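To give a rough sense of what a top-down segmentation pass can look like, here is a minimal sketch that splits a binarized page into row bands and then into cells using projection profiles. It is an illustration under stated assumptions, not the method used in the paper: the abstract does not specify the algorithm, and the names `runs`, `segment`, and the toy `page` array are hypothetical.

```python
from typing import List, Tuple

# Assumption: the page is already binarized, 1 = ink, 0 = background.
Image = List[List[int]]


def runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges where the profile is non-zero."""
    spans = []
    start = None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i  # a new inked run begins
        elif not value and start is not None:
            spans.append((start, i))  # the run ended at a blank gap
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment(image: Image) -> List[Tuple[int, int, int, int]]:
    """Split a page into row bands, then split each band into cells.

    Returns (top, bottom, left, right) boxes with half-open bounds.
    """
    boxes = []
    # Horizontal projection: amount of ink in each pixel row.
    row_profile = [sum(row) for row in image]
    for top, bottom in runs(row_profile):
        band = image[top:bottom]
        # Vertical projection inside the band: amount of ink in each column.
        col_profile = [sum(col) for col in zip(*band)]
        for left, right in runs(col_profile):
            boxes.append((top, bottom, left, right))
    return boxes


# Toy 4x4 "page" with two text rows separated by a blank row.
page = [
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
    [1, 0, 1, 1],
]
print(segment(page))
# → [(0, 1, 0, 2), (0, 1, 3, 4), (2, 4, 0, 1), (2, 4, 2, 4)]
```

Each resulting box would then be passed to a text recognizer. Real reports, especially camera-captured ones, would first need deskewing, noise removal, and gap thresholds, which this sketch omits.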