
Big Data Analytics Choice of Technology


Garry Buttler


( 5 months ago )

I have been asked to assess the possible choices of technology for the problem described below. The options are Hadoop, Hive, and Pig. I do not have much experience with any of them, so I would appreciate a pointer to a good source to read. I have googled and found tons of references, but it is hard to find a step-by-step explanation or comparison.

Here is the task I need to solve.

Users enter sentences into the system. Sentences are broken up into words and stored in a Cassandra column family. Each row is a single word (the key), and the column names are the timestamps at which that word was entered, with no column values.

I need to be able to query the database and extract N words according to the following breakdown:

a_1% must be the top words from period T1 from now into the past
a_2% must be the top words from period T2 from now into the past
a_3% must be the top words from period T3 from now into the past

a_n% must be the top words from period T_n from now into the past

a_1 + a_2 + ... + a_n = 100%

and T1, T2, etc are arbitrary time intervals.
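
To make the requirement concrete, here is a minimal in-memory sketch of the selection in Python. It does not touch Cassandra or any of the tools above. The function name `top_words_by_period`, the dict layout (word -> list of timestamps, mirroring row key -> column names), and the sample data are made up for illustration. It also assumes that each quota is `round(N * share)` and that the same word may appear in more than one period, neither of which is settled by the description above.

```python
from collections import Counter


def top_words_by_period(rows, now, periods, n):
    """Pick the top words for each look-back period.

    rows    -- dict mapping word -> list of numeric timestamps
               (hypothetical stand-in for the column family)
    now     -- current time, same unit as the timestamps
    periods -- list of (share, length) pairs; shares sum to 1.0,
               length is how far back from `now` the period reaches
    n       -- total number of words wanted
    Returns one list of words per period, most frequent first.
    """
    result = []
    for share, length in periods:
        quota = round(n * share)  # assumption: simple rounding of the share
        cutoff = now - length
        counts = Counter()
        for word, stamps in rows.items():
            hits = sum(1 for t in stamps if cutoff <= t <= now)
            if hits:
                counts[word] = hits
        # Sort by count descending, then alphabetically so ties are stable.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        result.append([word for word, _ in ranked[:quota]])
    return result


if __name__ == "__main__":
    sample = {
        "apple": [95, 96, 50],
        "pear": [99],
        "kiwi": [10, 20, 30, 40],
    }
    # N = 4 words: 50% from the last 10 time units, 50% from the last 100.
    print(top_words_by_period(sample, now=100, periods=[(0.5, 10), (0.5, 100)], n=4))
```

Whatever tool is chosen would have to do the per-period counting step at scale; the ranking and quota logic on top of the counts is small by comparison.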

Any suggestion for a choice of technology for this task would be greatly appreciated. We are using Cassandra and are quite familiar with it. Now we need to decide which analytical tool to put on top of it.

Links or specifics would be quite appreciated.
