Computational Intelligence to aid Text File Format Identification

Kuppili Venkata, Santhilata and Green, Alex. Computational Intelligence to aid Text File Format Identification, 2019 (Unpublished) [Preprint]

English abstract

One of the challenges in digital preservation is identifying the type of a file that can be opened in a simple text editor but whose extension is unknown. The problem is harder when the file's contents are human-readable yet give no clear indication of how the file is meant to be used. The Text File Format Identification (TFFI) project was initiated at The National Archives to identify file types from plain-text file contents with the help of computational intelligence models. A methodology that uses AI and machine learning to automate the process was successfully tested and implemented on the test data. The prototype, developed as a proof of concept, achieved up to 98.58% accuracy in detecting five file formats.

Item type:	Preprint
Keywords:	File format identification, Digital Preservation
Subjects:	J. Technical services in libraries, archives, museum; J. Technical services in libraries, archives, museum > JH. Digital preservation
Depositing user:	Dr Santhilata Kuppili Venkata
Date deposited:	17 Sep 2019 08:31
Last modified:	17 Sep 2019 08:31
URI:	http://hdl.handle.net/10760/38969
