Big Data for Freshers

Hi, I am an engineering student and I want to know how to start working in the field of Big Data. I want to learn more about it before stepping into the industry, because I have a great interest in Big Data technologies.

The simplest way of describing “big data” is to say “lots and lots and lots of data” – because that’s what the term means. When people talk about big data, though, they also generally mean the process for making sense of all that data – filtering it to draw out meaningful patterns, predictions and conclusions.

What is all this data, then, and where is it coming from? Well, it’s everywhere. Think about your working day and you’ll realise you interact digitally with organisations hundreds, if not thousands, of times: you check your emails on your phone, send tweets, order something online, interact with websites, buy your lunch from the local supermarket, and so on. All these interactions create data points that can be captured.

As it happens, every day we are generating 15 petabytes of data (a petabyte is 1,000 to the power of 5 bytes) and 12 terabytes of tweets worldwide. We create 350 billion meter readings per annum, and 500 million call data records. And those examples are just the tip of an enormous data iceberg.
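As a quick sanity check on the units, the 15-petabytes-per-day figure can be converted into bytes and into a per-second rate. A minimal sketch, assuming decimal (SI) units as in the post (1 PB = 1,000 to the power of 5 bytes); the function name is just illustrative.

```python
# Decimal (SI) units: 1 PB = 1000**5 bytes, as stated in the post.
PETABYTE = 1000 ** 5
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_to_rate(petabytes_per_day):
    """Convert a daily volume in PB into (total bytes per day, bytes per second)."""
    total_bytes = petabytes_per_day * PETABYTE
    return total_bytes, total_bytes / SECONDS_PER_DAY


total, per_second = daily_volume_to_rate(15)
print(f"{total:,} bytes per day")
print(f"about {per_second / 1e9:.1f} GB per second")
```

That works out to roughly 174 GB arriving every second, which is why a single machine cannot keep up.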

Of course, for any one business, you won’t be handling 15 petabytes of data – but you could well be handling hundreds of thousands of data points; if, that is, you have systems in place to capture the way your customers interact with you on their phones, your website, in store, at point of sale, and so on. Each interaction can be captured, giving you reams of useful data that can give you invaluable insights into individual and group customer behaviour – if only you could make sense of it.

That’s where data analytics comes in. Data analytics is the process of making sense of all that data and drawing out useful patterns and insights.

That's a great insight into the field of data analytics, yaswanth k. Can you also elaborate on the tools and technologies I should equip myself with to be able to work in the field of Big Data?

Here is my advice...

Start programming! For example, write a simple program in C or C++: create an array of 1 giga-elements of random single-precision values (4 GB of memory in total) and sort it with a sorting algorithm such as merge sort, heap sort or quicksort.
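Here is a scaled-down sketch of that exercise, written in Python rather than C or C++ to keep all examples in one language. It uses the standard-library `array` type with typecode `'f'` (4-byte single-precision values) and a hand-written merge sort. The element count is an assumption: it is reduced from 2^30 because pure Python would be far too slow and memory-hungry at full size.

```python
import random
from array import array


def merge_sort(values):
    """Return a new sorted list using top-down merge sort (O(n log n))."""
    if len(values) <= 1:
        return list(values)
    mid = len(values) // 2
    left = merge_sort(values[:mid])
    right = merge_sort(values[mid:])
    # Merge the two sorted halves into one sorted list.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def make_random_floats(n, seed=42):
    """Create n random single-precision (4-byte) floats in [0, 1)."""
    rng = random.Random(seed)
    return array("f", (rng.random() for _ in range(n)))


if __name__ == "__main__":
    n = 100_000  # scaled down from the suggested 2**30 elements
    data = make_random_floats(n)
    print(f"{n} floats use {n * data.itemsize} bytes")
    result = merge_sort(data)
    print("sorted correctly:", result == sorted(data))
```

At the full size from the post, 2^30 elements times 4 bytes each is the 4 GB mentioned, which is where a compiled language and careful memory handling start to matter.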

Relational database management systems and desktop statistics and visualization packages often have difficulty handling big data. The work instead requires "massively parallel software running on tens, hundreds, or even thousands of servers". What is considered "big data" varies depending on the capabilities of the users and their tools, and expanding capabilities make Big Data a moving target. Thus, what is considered "big" one year becomes ordinary later. "For some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a significant consideration."

Here are some of the skills you may require for big data:

1. Apache Hadoop

Sure, it’s entering its second decade now, but there’s no denying that Hadoop had a monstrous year in 2014 and is positioned for an even bigger 2015 as test clusters are moved into production and software vendors increasingly target the distributed storage and processing architecture. While the big data platform is powerful, Hadoop can be a fussy beast and requires care and feeding by proficient technicians. Those who know their way around the core components of the Hadoop stack–such as HDFS, MapReduce, Flume, Oozie, Hive, Pig, HBase, and YARN–will be in high demand.

2. Apache Spark

If Hadoop is a known quantity in the big data world, then Spark is a dark horse candidate that has the raw potential to eclipse its elephantine cousin. The rapidly rising in-memory stack is being proffered as a faster and simpler alternative to MapReduce-style analytics, either within a Hadoop framework or outside it. Best positioned as one of the components in a big data pipeline, Spark still requires technical expertise to program and run, thereby providing job opportunities for those in the know.

3. NoSQL

On the operational side of the big data house, distributed, scale-out NoSQL databases like MongoDB and Couchbase are taking over jobs previously handled by monolithic SQL databases like Oracle and IBM DB2. On the Web and with mobile apps, NoSQL databases are often the source of data crunched in Hadoop, as well as the destination for application changes put in place after insight is gleaned from Hadoop. In the world of big data, Hadoop and NoSQL occupy opposite sides of a virtuous cycle.
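To make "distributed, scale-out" a little more concrete, here is a toy sketch of hash-based sharding, one common way a scale-out store can spread records across nodes. It is not the actual partitioning scheme of MongoDB or Couchbase; the node names, the `shard_for` and `distribute` helpers, and the use of a CRC-32 checksum as the hash are all illustrative assumptions.

```python
import zlib
from collections import defaultdict

# Hypothetical three-node cluster; a real store tracks its nodes dynamically.
NODES = ["node-a", "node-b", "node-c"]


def shard_for(key, nodes=NODES):
    """Pick the node that owns a key (CRC-32 is stable across runs, unlike hash())."""
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]


def distribute(records, nodes=NODES):
    """Group records (dicts with an 'id' field) by the node that owns them."""
    placement = defaultdict(list)
    for record in records:
        placement[shard_for(record["id"], nodes)].append(record)
    return dict(placement)


if __name__ == "__main__":
    customers = [{"id": f"customer-{i}", "visits": i % 7} for i in range(12)]
    for node, docs in sorted(distribute(customers).items()):
        print(node, [d["id"] for d in docs])
```

A design note: simple modulo placement moves most keys whenever a node is added or removed, which is why production systems often use range-based sharding or consistent hashing instead.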

4. Machine Learning and Data Mining

People have been mining for data as long as they’ve been collecting it. But in today’s big data world, data mining has reached a whole new level. One of the hottest fields in big data last year was machine learning, which is poised for a breakout year in 2015. Big data pros who can harness machine learning technology to build and train predictive analytic apps such as classification, recommendation, and personalization systems are in super high demand, and can command top dollar in the job market.
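As a small taste of "build and train a classifier", here is a nearest-centroid classifier written with only the Python standard library (Python 3.8+ for `math.dist`). It is a deliberately simple stand-in for real machine learning libraries, and the customer features and segment labels are made up for illustration.

```python
import math
from collections import defaultdict


def train_centroids(samples):
    """Average the feature vectors of each label. samples: list of (features, label)."""
    sums = {}
    counts = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


if __name__ == "__main__":
    # Made-up data: (visits per month, average basket size) -> customer segment.
    training = [
        ([1, 10.0], "occasional"),
        ([2, 12.0], "occasional"),
        ([12, 45.0], "loyal"),
        ([15, 50.0], "loyal"),
    ]
    model = train_centroids(training)
    print(predict(model, [11, 40.0]))
```

"Training" here is just averaging each class; real systems use richer models, but the train-then-predict shape is the same.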

5. Statistical and Quantitative Analysis

This is what big data is all about. If you have a background in quantitative reasoning and a degree in a field like mathematics or statistics, you’re already halfway there. Add in expertise with a statistical tool like R, SAS, Matlab, SPSS, or Stata, and you’ve got this category locked down. In the past, most quants went to work on Wall Street, but thanks to the big data boom, companies in all sorts of industries across the country are in need of geeks with quantitative backgrounds.

6. SQL

The data-centric language is more than 40 years old, but the old grandpa still has a lot of life left in today’s big data age. While it won’t be used for all big data challenges (see: NoSQL above), the simplicity of Structured Query Language makes it a no-brainer for many of them. And thanks to initiatives like Cloudera’s Impala, SQL is seeing new life as the lingua franca for the next generation of Hadoop-scale data warehouses.
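To show why that simplicity matters, here is a small aggregate query run through SQLite, which ships with Python's standard library. SQLite is only a stand-in here (engines such as Impala have their own dialects and run distributed), and the `sales` table, its columns and its rows are invented.

```python
import sqlite3

# One declarative statement: order count and revenue per region, largest first.
QUERY = """
    SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM sales
    GROUP BY region
    ORDER BY revenue DESC
"""


def build_demo_db():
    """Create an in-memory database holding a hypothetical sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, channel TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            ("north", "web", 120.0),
            ("north", "store", 80.0),
            ("south", "web", 200.0),
            ("south", "web", 50.0),
            ("south", "store", 30.0),
        ],
    )
    return conn


def revenue_by_region(conn):
    """Run the aggregate query and return all rows as tuples."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for region, orders, revenue in revenue_by_region(conn):
        print(region, orders, revenue)
    conn.close()
```

The query says what result is wanted, not how to loop over rows, which is exactly what lets SQL-on-Hadoop engines parallelize the same statement across a cluster.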

7. Data Visualization

Big data can be tough to comprehend, but in some circumstances there’s no replacement for actually getting your eyeballs onto data. You can do multivariate or logistic regression analysis on your data until the cows come home, but sometimes exploring just a sample of your data in a tool like Tableau or QlikView can tell you the shape of your data, and even reveal hidden details that change how you proceed. And if you want to be a data artist when you grow up, being well-versed in one or more visualization tools is practically a requirement.
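When the full dataset is too big to load into a visualization tool, one standard way to draw that sample is reservoir sampling (Algorithm R): it picks k items uniformly at random in a single pass, without knowing the total size in advance. The stream of fake sensor readings below is a hypothetical stand-in for a large file or data feed.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)  # inclusive bounds: item i survives with probability k/(i+1)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Stand-in for a large data source read one record at a time.
    readings = (round(20 + random.gauss(0, 3), 1) for _ in range(200_000))
    sample = reservoir_sample(readings, k=1000, seed=7)
    print(len(sample), min(sample), max(sample))
```

Memory use stays at k items no matter how long the stream is, so the resulting sample can be handed to Tableau, QlikView or any plotting tool.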

8. General Purpose Programming Languages

Having experience programming applications in general-purpose languages like Java, C, Python, or Scala could give you the edge over other candidates whose skill sets are confined to analytics. According to Wanted Analytics, there was a 337 percent increase in the number of job postings for “computer programmers” that required background in data analytics. Those who are comfortable at the intersection of traditional app dev and emerging analytics will be able to write their own tickets and move freely between end-user companies and big data startups.

9. Creativity and Problem Solving

No matter how many advanced analytic tools and techniques you have on your belt, nothing can replace the ability to think your way through a situation. The implements of big data will inevitably evolve and new technologies will replace the ones listed here. But if you’re equipped with a natural desire to know and a bulldog-like determination to find solutions, then you’ll always have a job offer waiting somewhere.

Thanks Mr. Sergey Kostrov and Mr. yaswanth k. for taking time out and helping me!! I am gonna start right away with everything you have suggested. Kudos!!

Have a great start and a better future! :-)

Sure, yaswanth k. sir! Thank you!!

Thanks for your time, nancy a, but I am not a resident of Chennai. Hence, the links are of no use!

Hey Ashish A!

I am one of the fans and followers of Big Data. I consider it the future, and whoever is in this industry is a winner!

It's nice to see that you are interested in this topic too. There are so many tools now to understand and use big data correctly.

If you are that interested in big data, I suggest visiting this website to find out more about the newest big data tools.

DataPlay is an integrated suite of applications, which fully meets your analysis, visualization and presentation needs. It gives integrated project management, complete data management, better and faster analysis, as well as automated rich visualization.

You are welcome to ask me any questions you have!

John Rosenberg

Awesome comments


Good to hear that John R. and thanks a ton for your suggestion.

Thanks for the explanation, Ashish A. :)


You could take a training course in Hadoop and big data analytics. I joined one from an online training institute, and you can join it too. For more information, write at:

Thanks for the posts; they were very helpful. I am a student who is also studying big data, and this forum is very helpful.

Our pleasure Aulia R. :)

I have learned new technical tricks for recovering big data with our latest ideas.

Thank you, yaswanth k, for your post about the skills you may require for big data. It was very helpful.

I'm trying to learn Python for machine learning. As I go deeper and deeper, I come across more tools and SDKs and I have no idea what to do with them. I'm confused by things like Theano, NumPy, etc. I hope that I'll understand all of these at some point.

I looked through the posts for data retrieval techniques and learned about a lot of the latest technologies.

Hi luke l., hope you receive all the necessary information from this post. You can write your queries, if any, for the experts in the community to help you out.

Very nice post, and thanks for it. I always like super content like these posts. Excellent, very cool ideas and great content with many kinds of valuable information.

Thank you for the post! It is useful for us.

Big Training courses,


Here I have shared some references about big data and Hadoop training. I hope they will be useful to you.

Big data is a term used to refer to data sets that are too large or complex for traditional data-processing application software to adequately deal with. Data with many cases (rows) offer greater statistical power, while data with higher complexity (more attributes or columns) may lead to a higher false discovery rate.



LIVEWIRE Velachery

NO-01, 2nd floor, railway station service road, Annai Indira Nagar,

Velachery, Chennai-600042

How is big data faring among students these days? Is there still a lot of interest?

Thanks for sharing such valuable content on Big Data Analytics Solutions

Data Management Services

To gain knowledge about big data and data analytics, try ExcelR Solutions.

data science courses in bangalore

Thank you for sharing so much information. I am currently undergoing manual testing training in Chennai, and I found this really helpful. Regards.

Manual Testing Training in Chennai

ExcelR offers Business Analytics / Data Scientist Course / Data Analytics Training & Data Science Course Training in Bangalore, with 100% placement. Data science certification in Bangalore

Nowadays big data is used for predictive analytics and user behaviour analytics, and it is now setting new trends in business.

Hi there! This post couldn't be written any better!

Python Training in Bangalore

The learning pathway of big data involves learning the right technology which is relevant to your job role. Let us discuss the basic big data skills that you need to begin with.

Scala and Java form the core of big data technologies, but you can also choose R or Python if you are uncomfortable coding in those. Focus on these skills one at a time; any one of these programming languages can serve as your basic qualification.

A big data engineer should also know Linux and Bash scripting, since they are used to deploy applications. A Hadoop developer should also be familiar with HBase and ZooKeeper.

If you want to explore more, Flume and Sqoop are also widely popular technologies. The knowledge of these tools and technologies will make you an expert in essential big data skills. This will serve as a foundation for the next level.

Once you have the scripting and programming skills, the next requirement is knowledge of the cloud. Big data technologies call for cloud experience, and you can begin by practicing on a cloud provider such as AWS with smaller datasets.

Knowledge of the Hadoop Distributed File System (HDFS) along with NoSQL is also required for a big data professional, as these form the basic infrastructure of big data. Knowing a NoSQL database relevant to your domain can help you build a strong big data foundation. This pathway can give you basic big data expertise.

There are many other paths to follow once you have these skills, such as a Kafka path or a MapReduce path, depending on your interest. Alternatively, you can go for Pig or Hive to study more of the trending big data technologies.

For a professional aiming for the experience and skill set of a big data architect, advanced certifications and skills are also required. These include knowledge of MongoDB and Cassandra, as both are very commonly used these days.

Many institutions provide CCIE Routing and Switching lab crams, but for me the best one is the SPOTO CCIE Club: they have the best CCIE lab dumps, which are cheap as well as totally reliable.

Awesome post sir, 

Best Training Institute in Marathahalli

Best AWS with Devops Training in Marathahalli

Elegant IT Services is the best training institute in Marathahalli, with 100% placement assistance!!

Thanks for sharing this amazing information about data science; it's very helpful to our company. We also offer big data along with other professional courses. Best big data and data science institute in Hyderabad

Hi All,

This is a knowledge-driven discussion forum, so please don't advertise your startups and ventures here. It's a request.


This is an awesome post. Really very informative and creative content. These concepts are a good way to enhance one's knowledge. I like it, and it has helped me develop very well. Thank you for this brief explanation and the very nice information. I gained good knowledge.

Thanks for all this help. It really helped me do web development with higher accuracy.

Quantum It Innovation

Big Data is a growing field, and you probably have a lot to learn if you want to get into it. I will try to describe the path I took:

1. Start by Learning a Programming Language:

If you want to tackle big data, you should know Python or Java. If you know neither, start with Python. Just start with the basics: for loops, lists, dictionaries, iterating through a list and a dictionary, etc.
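The basics listed above (for loops, lists, dictionaries, and iterating through both) fit in a few lines of Python. The city names and temperature readings are made up for illustration.

```python
# A list of readings, and a dictionary mapping each city to its readings.
temperatures = [21.5, 23.0, 19.8, 25.1]
by_city = {"Chennai": [31.0, 33.5], "Bangalore": [24.0, 26.5], "Delhi": [35.2]}

# Iterating through a list with a for loop.
total = 0.0
for t in temperatures:
    total += t
print("average:", total / len(temperatures))

# Iterating through a dictionary's keys and values together.
for city, readings in by_city.items():
    print(city, "max:", max(readings))

# Building a new dictionary from an existing one.
averages = {city: sum(r) / len(r) for city, r in by_city.items()}
print(averages)
```

Once loops over lists and dictionaries feel natural, the map and reduce steps of MapReduce are just these same patterns applied to much bigger inputs.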

2. Learn about a Big Data Platform:

Once you feel that you can solve basic problems using Python or Java, you are ready for the next step: learning a big data technology like Hadoop or Spark. You could also start with Spark, but I feel that Hadoop is the best place to start, as it gives you more background in the MapReduce paradigm, and you will be able to understand the problems that Spark was introduced to solve.

Once you are through these, you will have gained a basic understanding of the concepts and installed a Hadoop VM on your own machine. You should also try to solve the basic word count problem.

Just read the basic MapReduce code. Don't use iterators and generators yet. This has been the starting point for many of us Hadoop developers.
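The word count problem follows the MapReduce pattern: a map step emits (word, 1) pairs, a shuffle step groups the pairs by key, and a reduce step sums each group. Here is a single-machine sketch of those three phases in plain Python, using lists and loops rather than custom iterators or generators, in line with the advice above. It only imitates what Hadoop does across a cluster; it is not Hadoop's API, and the function names are illustrative.

```python
import re


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every line."""
    pairs = []
    for line in lines:
        for word in re.findall(r"[a-z0-9']+", line.lower()):
            pairs.append((word, 1))
    return pairs


def shuffle_phase(pairs):
    """Shuffle: group the emitted values by key (Hadoop does this between map and reduce)."""
    groups = {}
    for word, count in pairs:
        if word not in groups:
            groups[word] = []
        groups[word].append(count)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values collected for each key."""
    totals = {}
    for word, counts in groups.items():
        totals[word] = sum(counts)
    return totals


if __name__ == "__main__":
    text = [
        "Big data is big",
        "Hadoop processes big data",
    ]
    counts = reduce_phase(shuffle_phase(map_phase(text)))
    for word in sorted(counts):
        print(word, counts[word])
```

In a real cluster, many mappers and reducers run these same steps in parallel on different blocks of the input, and the framework handles the shuffle over the network.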

3. Learn a Little Bit of Bash Scripting:

While you are learning Hadoop and getting your hands dirty with coding, try to read up on shell scripting. It allows you to do simple data-related tasks in the terminal itself.

This learning plan will also help you pick up data science concepts, since Python, big data concepts and shell scripting are used there too.

4. Learn Spark:

Now comes the next part of your learning process. This should be undertaken after a little bit of experience with Hadoop. Spark will provide you with the speed and tools that Hadoop couldn't. But you need to know Scala or Python to use it. That is one of the reasons I suggested that you go with Python if you don't know either Java or Python.

Spark is used for data preparation as well as for machine learning. Hope this helps. Now get working!!!

Colaberry is a proven leader in the data science training and consulting world in the USA. Colaberry has transformed the lives of 4000+ professionals in the USA, helping them achieve their career goals in data analytics. For the first time ever, Colaberry is coming to India with a practical approach to data science training. Colaberry created a unique model that combines the “Harvard Case” method with a “Learn by Doing” approach, transforming the way students learn data science.

The program starts with a problem statement, and students are taught how to address the problem and find the right solution to it. In this process, students are encouraged to learn the concepts as well as apply them to real-world problems. By interacting with a global peer group, students become adept at analyzing issues, making difficult decisions, building on each other's ideas and exchanging perspectives, and are finally transformed into the leaders of tomorrow.

Hadoop skills are in high demand, and this is an indisputable fact! Allied Market Research says the global Hadoop market may reach $84.6 billion by 2021. Big data will keep getting bigger day by day, and big data technology will keep advancing, but Hadoop is a must-know skill in the present-day scenario, as it is the hub of big data solutions for many enterprises, and new technologies like Spark have evolved around Hadoop.

A fresher has a great opportunity in the field of Big Data & Hadoop. I would recommend that you first get a certification before starting the job hunt. First, it will give you a stronger hand when searching for or applying to entry-level Hadoop jobs; second, you’ll also be able to test your knowledge and skills in dealing with real-time problems. Learn from Big data with data science certification

meenati biswal

I have picked up a lot of useful things from this amazing blog. I’d love to return over and over again. Thanks!

Mean stack online training

Thanks for sharing useful information. This is the best blog.
I have also written good content on my website, Innomatics research Labs. Please check it out.
Data science and big data training in hyderabad

You need to learn the latest big data technologies to stay up to date and know all the news.