Kuppili Venkata, Santhilata and Green, Alex. Computational Intelligence to aid Text File Format Identification. 2019 (Unpublished) [Preprint]
English abstract
One of the challenges in digital preservation is identifying the type of a file that can be opened in a simple text editor but whose extension is unknown. The problem is harder when the file is human-readable yet it is not clear how its contents are meant to be used. The Text File Format Identification (TFFI) project was initiated at The National Archives to identify file types from plain-text file contents with the help of computational intelligence models. A methodology that uses AI and machine learning to automate the process was implemented and successfully tested on the test data. The prototype, developed as a proof of concept, achieved up to 98.58% accuracy in detecting five file formats.
Item type: | Preprint |
---|---|
Keywords: | File format identification, Digital Preservation |
Subjects: | J. Technical services in libraries, archives, museum. > JH. Digital preservation. |
Depositing user: | Dr Santhilata Kuppili Venkata |
Date deposited: | 17 Sep 2019 08:31 |
Last modified: | 17 Sep 2019 08:31 |
URI: | http://hdl.handle.net/10760/38969 |