Data science is the talk of the town today. The field came into prominence around 2008, when advances in the internet and device connectivity produced an immense flow of data, and with it a need for professionals who could handle and analyze data of all kinds in bulk.

Since then, data science courses have gained great momentum, along with a number of myths and facts about the field. We will discuss some of those facts in this blog.

The volume of data worldwide is growing at an astronomical rate. By one widely quoted estimate, the data produced in the last two years alone exceeds all the data produced before that. This shows how important data science has become for handling the explosion of data across platforms.

And there is much more to say in praise of data science.

But First, What is Data Science?

Simply put, data science is a scientific approach to handling data and making the most of it to drive a business. The real challenge lies between those two steps: turning vast, messy data into something a business can act on.

This scientific approach draws on a range of tools, machine learning algorithms and a fair amount of analytical skill, all of which help steer a business in the right direction. The data collected is cleaned and then analyzed with these tools and techniques.

Data – The Game Changer

Data science was born out of data. The sheer volume of data of every type today cannot be underestimated, and it is data that has produced this entirely new field. Anyone interested in data science should therefore know a few facts about data itself before moving on to the facts about data science. With that in mind, let us look at a couple of knowledge bytes about data.

What Is the Biggest Data Unit You Know?

Since childhood we have read about the units of digital data, from the bit and the byte up to the gigabyte and the terabyte. It is just as interesting to know the units used to measure the big data that drives businesses large and small.

Here is a table of data units worth a look.

| Abbreviation | Unit | Value | Size (in bytes) |
|---|---|---|---|
| b | bit | 0 or 1 | 1/8 of a byte |
| B | byte | 8 bits | 1 byte |
| KB | kilobyte | 1,000 bytes | 1,000 bytes |
| MB | megabyte | 1,000² bytes | 1,000,000 bytes |
| GB | gigabyte | 1,000³ bytes | 1,000,000,000 bytes |
| TB | terabyte | 1,000⁴ bytes | 1,000,000,000,000 bytes |
| PB | petabyte | 1,000⁵ bytes | 1,000,000,000,000,000 bytes |
| EB | exabyte | 1,000⁶ bytes | 1,000,000,000,000,000,000 bytes |
| ZB | zettabyte | 1,000⁷ bytes | 1,000,000,000,000,000,000,000 bytes |
| YB | yottabyte | 1,000⁸ bytes | 1,000,000,000,000,000,000,000,000 bytes |

  • If someone asks the fundamental question, “How much data exists in the world today?”, there is probably no definite answer. It has been estimated, though, that by 2025 some 463 exabytes of data will be created every day.
  • We create new data every millisecond. We make about 40,000 search queries on Google each second, which amounts to more than 1.2 trillion searches per year.
  • Data coming into any field or business is no longer just data; it is Big Data.
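
The decimal units and the search arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the names `UNITS`, `to_bytes` and `searches_per_year` are made up for illustration, and the 40,000 queries per second figure is simply the estimate quoted above.

```python
# Decimal (SI) data units: each step up the list is a factor of 1,000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def to_bytes(value, unit):
    """Convert a value in the given decimal unit to bytes."""
    return value * 1000 ** UNITS.index(unit)

def searches_per_year(per_second):
    """Scale a per-second rate to a 365-day year."""
    seconds_per_year = 60 * 60 * 24 * 365
    return per_second * seconds_per_year

print(to_bytes(1, "EB"))          # one exabyte in bytes (1,000 to the power 6)
print(to_bytes(463, "EB"))        # the projected daily volume for 2025, in bytes
print(searches_per_year(40_000))  # a little over 1.2 trillion
```

Keeping the units in an ordered list means the exponent is just the position in the list, which mirrors the Value column of the table.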

Since data science revolves around data and is steadily gaining importance, it has plenty of interesting trivia of its own that deserves attention. If you are a data science enthusiast, this article should draw you further in.

Here I list the most intriguing facts about data science, which will give you a closer view of the subject. Let us know in the comments below which one is your favourite!

# DATA SCIENCE FACTS


1. Data Scientists and Data Analysts are NOT the Same

This is a common myth among people with only a superficial idea of data science. In reality, the work of data scientists and data analysts is quite different. Data analysts find trends and analyze the data, whereas data scientists look for the cause of a trend and forecast upcoming ones. Since data science is a new field, some misconceptions are inevitable.

However, it is worth noting that the two work in tandem. They complement each other and work towards a common goal. Let us now look at some of the basic differences between the two.

| Data Scientist | Data Analyst |
|---|---|
| Discovers unexplored questions that may need an answer | Uses existing information to get workable data on existing questions |
| Skillset: algorithms, data mining, programming, database management, data analysis, machine learning, predictive analysis | Skillset: data mining, modeling, programming, statistical analysis, database management, data analysis |
| They estimate the unknown data | They work with a known data set |
| They choose to address business problems that would have maximum effect | They address the business problem assigned to them |
| They work at macro level | They work at micro level |

2. Data is Never Clean

That’s true. Data is messy. Even when data is collected and cleaned with an eagle eye, one discrepancy or another creeps in at some point. Data scientists know how to work with chaotic, noisy data, cleaning it along the way.

About Dirty Data

Dirty data takes one or more of the following forms:

  • Incomplete
  • Duplicate
  • Irrelevant
  • Inaccurate 
  • Incorrect
  • Misspelled

“It’s way more than just errors. It can break your data science project.”

– towardsdatascience.com

Dirty data is one problem. The bigger problem is joining multiple datasets into a single entity. The data may have been collected from different sources by different people, software, devices and so on, so there is a strong chance that the datasets are inconsistent with one another. The join key may not be consistent, or the format may differ from system to system. Data scientists clean the entire data by re-formatting, screening, organising and so on.
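
To make the join-key problem concrete, here is a small sketch in plain Python. It assumes two hypothetical systems (`crm` and `billing`) that store the same customer IDs in different formats. The helper names and the normalization rule (trim, upper-case, drop dashes) are illustrative choices, not a standard recipe.

```python
def normalize_key(raw):
    """Reduce an ID to one canonical form: trimmed, upper-case, no dashes."""
    return raw.strip().upper().replace("-", "")

def join_on_key(left, right, key):
    """Inner-join two lists of dicts on a normalized key."""
    index = {normalize_key(row[key]): row for row in right}
    joined = []
    for row in left:
        match = index.get(normalize_key(row[key]))
        if match is not None:
            # Merge both rows and store the key in its canonical form.
            joined.append({**match, **row, key: normalize_key(row[key])})
    return joined

# Hypothetical records: the same customers, keyed differently by two systems.
crm = [{"customer_id": " ab-101 ", "name": "Asha"},
       {"customer_id": "AB102", "name": "Ravi"}]
billing = [{"customer_id": "AB101", "amount": 250},
           {"customer_id": "ab-102", "amount": 400}]

for row in join_on_key(crm, billing, "customer_id"):
    print(row)
```

Without the normalization step, neither customer would match, because the raw keys differ in case, spacing and punctuation.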

“You will spend most of your time cleaning and preparing data.”

– Kamil Bartocha, head of data science R&D

The question now is: if data is so unclean, how can any analysis be done on it? The short answer is that, in the end, the data is clean enough to reach the desired outcome.

Several data cleaning techniques are applied at every step to get the data into its least dirty form, and that becomes the basis of the final analysis.
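
As a rough illustration of such techniques, the sketch below handles three of the dirty-data forms listed earlier: incomplete, misspelled and duplicate records. The function names, the sample records and the spelling map are all hypothetical. Real projects usually need rules specific to their own data.

```python
def is_blank(value):
    """True for None or an empty or whitespace-only string."""
    return value is None or (isinstance(value, str) and not value.strip())

def clean_records(records, required, spelling_fixes):
    """Drop incomplete and duplicate records and fix known misspellings."""
    cleaned, seen = [], set()
    for rec in records:
        # Incomplete: a required field is missing or blank.
        if any(is_blank(rec.get(field)) for field in required):
            continue
        # Misspelled: map known variants to one canonical spelling.
        fixed = {k: (spelling_fixes.get(v, v) if isinstance(v, str) else v)
                 for k, v in rec.items()}
        # Duplicate: an identical record has already been kept.
        fingerprint = tuple(sorted(fixed.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(fixed)
    return cleaned

raw = [
    {"city": "Delhi", "sales": 120},
    {"city": "Dehli", "sales": 120},   # misspelled duplicate of the first row
    {"city": "", "sales": 90},         # incomplete: city is blank
    {"city": "Mumbai", "sales": 75},
]
print(clean_records(raw, ["city", "sales"], {"Dehli": "Delhi"}))
```

Spelling is fixed before the duplicate check on purpose, so that "Dehli" and "Delhi" are recognized as the same record.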

3. You Do Not Need to Be Tech Savvy or Hold a PhD to Learn Data Science

Data science sounds like a field for tech-savvy professionals, which leads to the common misbelief that learning it requires a super brain or a PhD.

This is incorrect. As a matter of fact, anyone of average intelligence can learn data science.

Learning data science involves upskilling in the following fields:

  • Statistical modelling
  • Predictive modelling
  • Machine learning
  • Programming
  • Algorithms
  • Analytics

That is the theory behind learning data science. But it would be interesting to hear a few words about data scientists straight from the horse’s mouth.

Joma, a well-known YouTuber and an experienced data scientist at a GAFA (Google/Amazon/Facebook/Apple) company, describes in a video what it takes to learn data science and become a data scientist. His view is summarized in the points below:

  1. One does not need a degree from a high-profile university.
  2. Data scientists come from different backgrounds, such as electrical engineering and economics. Some do not even have a science degree.
  3. To learn data science, one can do an internship or study basic statistics.
  4. Other things, such as programming, algorithms and statistics, can be picked up along the way.
  5. One needs the empathy to ask the right questions of the data.
  6. One needs to learn to write correct SQL queries and a bit of Python, which anyone can do with the right approach.
  7. Data scientists work a lot with data, SQL queries and presentations.
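
To give a flavour of the SQL and Python mentioned in the points above, here is a minimal sketch that uses Python's built-in `sqlite3` module. The `orders` table and its values are invented for illustration. It is not an example from Joma's video.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 60.0), ("South", 40.0)],
)

# A typical analyst-style question: how much did each region sell?
query = """
    SELECT region, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY region
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
for region, n_orders, total in rows:
    print(f"{region}: {n_orders} orders, total {total:.2f}")
conn.close()
```

The SQL does the grouping and aggregation, while Python only loads the data and formats the result, which is a common division of labour.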

In a nutshell, data science is not as heavy as it may look. A feel for the possibilities in data is the main requisite; the rest falls into place along the way.

4. Data Science is Not Just Excel Sheets

In contrast to the previous myth, this one may seem surprising: many people believe that the life of a data scientist revolves around Excel sheets.

This is anything but true. As mentioned before, data science is a vast field whose basic focus is the correct, intended outcome, and data science professionals fight tooth and nail to get it. They use data analytics techniques, SQL queries, statistical analysis, predictive analysis and more.

They do work with Excel sheets, but that is just a small part of their work.

There was a time when Excel sheets played a major role in analysis, with conclusions reached through formulae and calculations. Today, with programming tools like Python and R easily available, most data scientists spend far more of their time coding than in Excel.

5. Data Science Competitions and Real Life Projects are Different

Success in a data science competition (for example, on an online platform like Kaggle) can boost one’s confidence so much that one starts thinking about a data science career. But it is important to understand that a competition and a real-life scenario are quite different.

“ I remember I was a little bit overwhelmed when on my first real-life project all the models that typically worked well on Kaggle, miserably failed. I wish I was prepared for this.”

– Sergii Makarevych, data scientist

Here are a few differences between the two:

| Data science competitions | Real-life projects |
|---|---|
| The number of datasets is limited. | There is no limit on data and datasets. It’s the data that matters. |
| Online competition platforms warn you when you have made an error. | There is no warning. You learn only after you have made a mistake and borne the consequences. You go back and do some data cleaning and rework. |
| You need to write the code just once. | You need to rewrite the code every 5-15 minutes. |
| You do not need to deploy your model. | You definitely deploy your model. |
| There is no authentication or security. | Authentication and security are as important as the data itself. |

So it is safe to say that competitions give fair practice for data science, but they are not enough. You need to get your hands dirty on live, real-time projects to understand the true essence of data science.

6. More Data Does Not Always Mean More Accuracy

I am tempted to use a cliché here: quantity does not always mean quality.

Let us understand this point from the bottom up.

Suppose we have a dataset with exactly the minimum amount of data needed for a correct analysis. This would be an ideal dataset. If we now add more data, the entire dataset has to be rebuilt to account for the new data. While rebuilding, we need to clean the new data and spend time understanding how it deviates from the existing set, if at all.

Even after the new data is cleaned and merged into the existing ideal dataset, some new element may still be dirty without anyone noticing. This degrades the final result or analysis.

In this case, less data was surely better than more data.

Hence, more data does not necessarily mean more insight or more value. Using smart data is the key.
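
A toy calculation shows the effect. The numbers below are invented: a small, clean sample of heights in centimetres, and a new batch in which two values were recorded in inches by mistake. The "true" mean is assumed to be known purely so that the two estimates can be compared.

```python
from statistics import mean

true_mean_cm = 170.0                                  # assumed known, for illustration only
clean_sample = [168.0, 171.0, 169.0, 172.0, 170.0]    # all in centimetres

# Hypothetical new batch: two values are in inches and the
# mistake went unnoticed during cleaning.
new_batch = [171.0, 66.5, 169.0, 67.0]

small_estimate = mean(clean_sample)
large_estimate = mean(clean_sample + new_batch)

print("error with less data:", abs(small_estimate - true_mean_cm))
print("error with more data:", abs(large_estimate - true_mean_cm))
```

The larger dataset gives the worse estimate here, because the extra rows carry an undetected unit error. More data helps only when its quality is at least as good as what is already there.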

7. Data Science Field has Different Roles, Not just Data Scientists

Many people associate data science with data scientists only, ignoring the other prominent roles in the field.

Data science includes all of these:

  • Data engineer: manages the data infrastructure throughout the data science lifecycle. Basic skills include programming tools like Python, database tools like NoSQL and big data tools like Hadoop.

  • Data analyst: answers questions by working through the available data with appropriate tools. Basic skills include programming, data visualisation, statistics, mathematics and, of course, data analysis.

  • Data scientist: works on big data, analyses it and then communicates the findings through reports and presentations. Basic skills include statistics, mathematics, programming, data visualisation, SQL, Hadoop and machine learning.

Apart from these, you can build a career in data science through various other roles.

8. Data Science is Not Meant Only For Large Organizations

Many businesses believe that data science is meant only for big organizations with high-class infrastructure.

This belief stems from a wrong notion of data science. Data science is not about machines, heavy tools or the size of the workforce. Rather, it is made up of big data, statistics, analysis, programming, presentation and some smart people who know how to make the best of data and add value to the organization. It has nothing to do with the size of the organization.

A data scientist needs to arrive at a result that benefits the company, and no one really cares which tools and techniques were used to achieve it.

As for infrastructure, all that is needed is a computing device, an internet connection and some tools that support the data science life cycle. A number of open source tools are available online and can be downloaded to get the ball rolling.

9. Data Science Needs Great Communication Skills

Communication and presentation play a key role in data science.

Communication here refers to two areas:

  • Coordinating within and among teams during the different stages of the data science life cycle.
  • Presenting the final outcome in the most comprehensive and lucid manner.

Without proper communication, the entire exercise may prove futile and may never turn into a substantial product. It is important to learn public speaking, as a lot of presentations are involved.

Learning to write clearly and crisply also improves one’s visibility in and around the organization.

Writing involves:

  • PowerPoint presentations
  • Blogs
  • Emails
  • Reports

An analysis that is not properly communicated, in writing or otherwise, is just a placeholder with no significance.

10. Data Science is Not for Everyone

Let me first throw some light on what a data science interview smells like.

Again, I have taken this information from the well-known YouTuber and data science expert, Joma.

Data science job interview questions revolve around the following:

  • SQL or simple coding, for example in Python; interviewers want to make sure you know these because you will be doing a lot of it on the job.
  • A quantitative analysis or math question involving statistics, probability or linear algebra.
  • Some undergraduate-level math topics such as Bayes’ theorem, probability distributions, the law of large numbers and linear regression.
  • A product interview: you are given a hypothetical product and asked how you would improve it.
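
As an example of the kind of math question listed above, here is a short worked sketch of Bayes' theorem in Python. The scenario and all three probabilities are made up for illustration, and the function name `bayes_posterior` is not from any library.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) given P(H), P(E | H) and P(E | not H)."""
    # Total probability of seeing the evidence at all.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 1% of users churn, a model flags 90% of churners
# and wrongly flags 5% of non-churners.
posterior = bayes_posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
print(f"P(churn | flagged) = {posterior:.3f}")
```

With these numbers the posterior comes out at roughly 15%: most flagged users are not churners, because churners are rare to begin with. Explaining that intuition is usually what the interviewer is looking for.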

So did the questions smell sweet or sour? 

The topics may seem a bit overwhelming to an absolute novice. But those who are well prepared and ready to jump into the pool of data science will find it interesting to glance over such interview topics.

However enchanting it may look from the outside, the data science field is no cakewalk. Even preparing for it takes a good deal of affinity for data.

There are lots of videos and articles on the web suggesting that anyone can be a data scientist. That is true under certain conditions. It is always a good idea to first ask yourself why you want to be in this field, and to do a reality check before taking a blind leap.

Introspection at the start is a great virtue for a successful stint in any field.

Conclusion:

Data science is becoming indispensable as data explodes in almost every field, and it offers good career opportunities. Choosing data science as a career can be a wise decision for anyone who enjoys problem solving and has empathy for data.

As cool as it sounds, it has immense potential for businesses and job seekers alike. But it is advisable not to fall for wrong information about the field.

With its growing popularity, data science has picked up some myths, which we looked at along with some interesting facts. Let me know in the comments if I missed any point.
