( 5 months ago )

Technology to use for understanding unstructured documents

General Tech Technology & Software

Charles Kyobe



I have multiple unstructured documents (PDFs and HTML files). Each document follows one of 'n' predictable patterns (layouts), and several documents share each pattern.

I need to write a program to extract information from these documents. Once the program has been trained on a particular pattern, it should automatically pick out the same data points from other documents that follow that pattern.
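To make the requirement concrete, here is a minimal sketch of the "train once per pattern, then reuse" behaviour described above. It is only an illustration under several assumptions: the text has already been extracted from the PDF or HTML file, each data point sits on a single line next to a fixed label, and the function names `learn_template` and `extract` and the sample invoice fields are hypothetical. It uses plain regular expressions rather than any particular machine-learning technique.

```python
import re


def learn_template(example_text, labeled_values):
    """Build one regex per field from a single hand-labeled example document.

    Assumption: the text around a value on its own line (its label and any
    trailing unit) is identical in every document that follows the pattern.
    """
    template = {}
    for field, value in labeled_values.items():
        # Uses the first occurrence of the value; ambiguous values would
        # need character offsets instead of plain strings.
        start = example_text.find(value)
        if start == -1:
            raise ValueError(f"value for {field!r} not found in example")
        end = start + len(value)
        line_start = example_text.rfind("\n", 0, start) + 1
        line_end = example_text.find("\n", end)
        if line_end == -1:
            line_end = len(example_text)
        prefix = example_text[line_start:start]  # fixed text before the value
        suffix = example_text[end:line_end]      # fixed text after the value
        pattern = "^" + re.escape(prefix) + r"(.+?)" + re.escape(suffix) + "$"
        template[field] = re.compile(pattern, re.MULTILINE)
    return template


def extract(template, text):
    """Apply a learned template to another document of the same pattern."""
    result = {}
    for field, pattern in template.items():
        match = pattern.search(text)
        result[field] = match.group(1).strip() if match else None
    return result


# "Training": one example document plus the values a person marked in it.
example = "Invoice No: 123\nDate: 2020-01-01\nTotal: 50.00 USD\n"
labels = {"invoice": "123", "date": "2020-01-01", "total": "50.00"}
template = learn_template(example, labels)

# Reuse on a different document that follows the same pattern.
other = "Invoice No: 987\nDate: 2021-05-17\nTotal: 1200.50 USD\n"
print(extract(template, other))
```

A sketch like this breaks as soon as labels move, wrap across lines, or change wording, which is why I am asking about more robust technologies.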

Which technology should I use to write this program? Any help with a specific algorithm would be much appreciated.
