Data science is the talk of the town today. It came into the picture around 2008, when advances in the internet and device connectivity produced an immense flow of data. A need was felt for professionals who could handle and analyse this huge bulk of data of all kinds.

Since then, data science has gained great momentum, along with certain myths and facts associated with it. We shall discuss some of those facts in this blog.

Some people do not hesitate to call data science a fad, owing to its fast-growing success and popularity. I strongly disagree: a fad is something short-lived, whereas data science is here to stay for as long as one can imagine.

The volume of data worldwide is growing at an astronomical rate. By a commonly cited estimate, the data produced in the last two years alone is greater in volume than all of the data produced before that. This shows how important data science is for handling the data eruption across platforms.

All this, and there is much more to say in praise of data science.

But First, What is Data Science?

Simply put, data science is the scientific approach to handling data and making the most of it to drive a business. Getting from merely handling data to making the most of it is the real feat in a world of messy and enormous data.

The scientific approach mentioned above encompasses various tools, machine learning, algorithms and a fair bit of analytical skill, which together put wind in the sails of a business. The collected data is cleaned and analyzed with the help of effective tools and techniques.

Check out the most recommended Online Machine Learning Course for better understanding of the concepts.

Data – The Game Changer

Data science was born out of data. The volume of data of every type today cannot be underestimated. Data is truly the game changer that produced this entirely new field. It is therefore worthwhile for anyone interested in data science to learn some facts about data itself before moving on to the facts about data science. With that in mind, let us check out a couple of knowledge bytes about data.

What is the Biggest Data Unit You Know?

Since childhood we have read about the different units of digital data, from the bit and the byte up to the gigabyte or terabyte. But it is exciting to learn the units used to measure all the big data that is floating around and driving the engines of businesses big and small. Check out the facts behind Big Data and Data Science.

Here is a table of some data units worth a look.

Abbreviation | Unit | Value | Size (in bytes)
b | bit | 0 or 1 | 1/8 of a byte
B | byte | 8 bits | 1 byte
KB | kilobyte | 1,000 bytes | 1,000 bytes
MB | megabyte | 1,000² bytes | 1,000,000 bytes
GB | gigabyte | 1,000³ bytes | 1,000,000,000 bytes
TB | terabyte | 1,000⁴ bytes | 1,000,000,000,000 bytes
PB | petabyte | 1,000⁵ bytes | 1,000,000,000,000,000 bytes
EB | exabyte | 1,000⁶ bytes | 1,000,000,000,000,000,000 bytes
ZB | zettabyte | 1,000⁷ bytes | 1,000,000,000,000,000,000,000 bytes
YB | yottabyte | 1,000⁸ bytes | 1,000,000,000,000,000,000,000,000 bytes
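
To get a feel for these units, here is a minimal Python sketch that converts between them. It assumes the decimal convention used in the table (each step is a factor of 1,000); the function names `to_bytes` and `human_readable` are made up for this illustration.

```python
# Decimal units from the table: each step up is a factor of 1,000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def to_bytes(value, unit):
    """Convert a value in the given unit to bytes (1 KB = 1,000 bytes)."""
    return value * 1000 ** UNITS.index(unit)


def human_readable(num_bytes):
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop once the value fits in this unit, or we have run out of units.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000


print(to_bytes(463, "EB"))               # 463 exabytes written out in bytes
print(human_readable(1500))              # a small file
print(human_readable(to_bytes(2, "ZB")))
```

Note that some operating systems and tools use binary steps of 1,024 instead, so the numbers you see on your own machine may differ slightly.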

  • If someone asks the fundamental question “How much data exists in the world today?”, there is probably no definite answer. But it is estimated that by the year 2025, 463 exabytes of data will be created daily.
  • We create new data every millisecond. Roughly 40,000 search queries are made on Google each second, which amounts to more than 1.2 trillion searches per year.
  • Data coming into any field or business is no longer just data; it is Big Data.
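
The per-year search figure can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it takes the 40,000 searches per second quoted above at face value and ignores leap years.

```python
SEARCHES_PER_SECOND = 40_000            # figure quoted in the list above
SECONDS_PER_YEAR = 60 * 60 * 24 * 365   # assumption: a 365-day year

searches_per_year = SEARCHES_PER_SECOND * SECONDS_PER_YEAR
trillions_per_year = searches_per_year / 1e12

print(f"{searches_per_year:,} searches per year")
print(f"about {trillions_per_year:.2f} trillion")
```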

Since data science revolves around data and is gaining remarkable weight, it has, just like data, many interesting trivia trails of its own that deserve attention. If you are a data science enthusiast, this article is sure to draw you even closer to the field.

Here I list the top intriguing facts about data science that will give you a closer view of the subject. Let us know in the comments below which one is your favourite trivia bite!


#TEN

Data Scientists and Data Analysts are NOT the Same

That the two are the same is a common myth among people with only a superficial idea of data science. In reality, the work of data scientists and data analysts is quite different. Data analysts work on finding trends and analyzing data, whereas data scientists work on finding the cause of a trend and forecasting upcoming trends. As data science is a new field, certain misconceptions are bound to pop up.

However, it is worth noting that the two work in tandem. They complement each other and work towards a common goal. Now let us check out some of the basic differences between the two.

Data Scientist | Data Analyst
Discovers unexplored questions that may need an answer. | Uses existing information to get workable data on existing questions.
Skillset: algorithms, data mining, programming, database management, data analysis, machine learning, predictive analysis | Skillset: data mining, modeling, programming, statistical analysis, database management, data analysis
They estimate the unknown data. | They work with known data sets.
They choose to address business problems that would have maximum effect. | They address the business problem assigned to them.
They work at macro level. | They work at micro level.
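
The difference between describing a trend and forecasting one can be sketched in a few lines of Python. This is only a toy illustration: the monthly sales numbers are invented, and a straight-line fit stands in for whatever model a real team would choose.

```python
# Hypothetical monthly sales figures, invented purely for illustration.
months = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.0, 16.0, 18.0]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# "Analyst" view: describe the trend in the data we already have.
slope, intercept = fit_line(months, sales)
print(f"Sales grow by about {slope:.1f} units per month")

# "Scientist" view: extend the fitted line to a month we have not seen yet.
forecast_month_6 = slope * 6 + intercept
print(f"Forecast for month 6: {forecast_month_6:.1f}")
```

Real forecasting also involves checking why the trend exists and how uncertain the prediction is, which a two-number line fit cannot capture.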

#NINE

Data is Never Clean

Data is dirty, but clean enough

That’s true. Data is nasty. Even when data is collected and cleaned with an eagle eye, one discrepancy or another creeps in at some point. Data scientists know how to work with chaos and noise in data, cleaning it along the way.

About Dirty Data

Dirty data takes one or more of the following forms:

  • Incomplete
  • Duplicate
  • Irrelevant
  • Inaccurate 
  • Incorrect
  • Misspelled

“It’s way more than just errors. It can break your data science project.”

– towardsdatascience.com

The collected data being dirty is one problem. The bigger problem is joining multiple datasets into a single entity. The data may have been collected from different sources by different people, software, devices and so on, so there is a high chance that the datasets are not coherent with one another. The join key may not be consistent, or the format may differ from system to system. Data scientists clean the entire data by re-formatting, screening, organising and so on.
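
As a rough illustration of the join problem described above, here is a small Python sketch. Everything in it is hypothetical: the two record lists, the `customer_id` field and the normalisation rule are invented for the example, and real projects would usually use a library such as pandas for the same steps.

```python
# Hypothetical records from two systems that format the join key differently.
crm_rows = [
    {"customer_id": " C-001 ", "name": "Asha"},
    {"customer_id": "c-002", "name": "Ben"},
    {"customer_id": "c-002", "name": "Ben"},   # duplicate
    {"customer_id": "C-003", "name": ""},      # incomplete: name is missing
]
billing_rows = [
    {"customer_id": "C001", "amount": 120.0},
    {"customer_id": "C002", "amount": 75.5},
]


def normalize_key(raw):
    """Trim spaces, upper-case and drop hyphens so keys match across systems."""
    return raw.strip().upper().replace("-", "")


def clean(rows, required):
    """Normalise the key, then drop duplicate and incomplete rows."""
    seen = set()
    cleaned = []
    for row in rows:
        key = normalize_key(row["customer_id"])
        if key in seen:
            continue  # duplicate of a row we already kept
        if any(row.get(field) in (None, "") for field in required):
            continue  # a required field is empty
        seen.add(key)
        cleaned.append({**row, "customer_id": key})
    return cleaned


def join(left, right):
    """Inner join of two cleaned row lists on customer_id."""
    right_by_key = {row["customer_id"]: row for row in right}
    return [
        {**row, **right_by_key[row["customer_id"]]}
        for row in left
        if row["customer_id"] in right_by_key
    ]


joined = join(clean(crm_rows, ["name"]), clean(billing_rows, ["amount"]))
for row in joined:
    print(row)
```

Without the `normalize_key` step, none of the keys would match and the join would come back empty, which is exactly the kind of silent failure that makes dirty data dangerous.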

“You will spend most of your time cleaning and preparing data.”

– Kamil Bartocha, head of data science R&D

The question now is: if data is so unclean, how is any analysis done on it? Well, to put it another way, in the end the data is clean enough to reach the desired outcome.

Several data cleaning techniques are applied at every step to reach the least dirty form of the data, and this becomes the basis of the final analysis.


#EIGHT

You Do Not Need to be Tech Savvy or Hold a PhD to Learn Data Science

no PhD, only skills

Data science sounds like a field for tech-savvy professionals, and this leads to the common misbelief that one needs a super brain or a PhD to be eligible to learn it.

This is absolutely incorrect. As a matter of fact, anyone of average intelligence can learn data science.

Learning data science involves upskilling in the following fields:

  • Statistical modelling
  • Predictive modelling
  • Machine learning
  • Programming
  • Algorithms
  • Analytics

This is the theory behind learning data science. But it would be interesting to hear a few words about data scientists, straight from the horse’s mouth.

Joma, a well-known YouTuber and an experienced data scientist at a GAFA (Google/Amazon/Facebook/Apple) company, describes in a video what it takes to learn data science and become a data scientist. I’ll summarize his views in the points below:

  • One does not need a degree from a high-profile university.
  • Data scientists come from different backgrounds, such as electrical engineering and economics. Some do not even have a science degree.
  • To learn data science, one can do an internship or study basic statistics.
  • Other things, like programming, algorithms and statistics, can be picked up along the way.
  • One needs the empathy to ask the right questions of the data.
  • One needs to learn to write correct SQL queries and a bit of Python, which anyone can do with the right approach.
  • Data scientists work a lot with data, SQL queries and presentations.
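
To show how small the "SQL plus a bit of Python" starting point can be, here is a minimal sketch using Python's built-in `sqlite3` module. The `orders` table and its values are hypothetical, invented only for this example.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("north", 120.0),
        ("south", 80.0),
        ("north", 60.0),
        ("south", 40.0),
        ("east", 200.0),
    ],
)

# A typical first question: how many orders and how much revenue per region?
rows = conn.execute(
    """
    SELECT region, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY region
    ORDER BY total DESC
    """
).fetchall()
conn.close()

for region, n_orders, total in rows:
    print(f"{region}: {n_orders} orders, total {total}")
```

A `GROUP BY` with a couple of aggregates like this covers a surprising share of everyday questions asked of data.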

In a nutshell, data science is not as heavy as it may look. An empathy towards possibilities is the only prerequisite. The rest falls into place as you learn.


#SEVEN

Data Science is Not Just Excel Sheets

Excel sheets are just a part of data science

In contrast to the previous belief, this one may seem surprising: many people think that the life of a data scientist revolves around Excel sheets.

This is anything but true. As mentioned before, data science is a vast field whose basic focus is on reaching the correct, intended outcome. To get that outcome, data science professionals fight tooth and nail. They use various data analytics techniques, SQL queries, statistical analysis, predictive analysis and more.

They do work with Excel sheets, but that is just a small part of their work.

There was a time when Excel sheets, with their formulae and calculations, played a major role in analysing data and arriving at conclusions. Today, with programming tools like Python and R easily available, most data scientists spend far more of their time coding than working in Excel sheets.


#SIX

Data Science Competitions and Real Life Projects are Different

Competition vs real-life project

Success in a data science competition (e.g. on an online platform like Kaggle) may boost one’s confidence so much that one starts thinking of landing a data science career. But it is important to understand that there is quite a lot of difference between a competition and a real-life scenario.

“I remember I was a little bit overwhelmed when on my first real-life project all the models that typically worked well on Kaggle, miserably failed. I wish I was prepared for this.”

– Sergii Makarevych, data scientist

Here are a few differences between the two:

Data science competitions | Real-life projects
The number of datasets is limited. | There is no limit on data and datasets. It’s the data that matters.
On online competition platforms, a warning is given when you have made an error. | There is no warning. You only learn after you have made a mistake and borne the consequences. You go back, start over, and do some data cleaning and rework.
You need to write the code just once. | You need to rewrite the code every 5-15 minutes.
You do not need to deploy your model. | You definitely deploy your model.
There is no authentication or security. | Authentication and security are as important as the data itself.

So, it is safe to say that competitions do give fair practice in data science. But that is not enough. You need to get your hands dirty and work on live, real-world projects to grasp the true essence of data science.


#FIVE

More Data Does Not Always Mean More Accuracy

I am tempted to use a cliche here – Quantity does not always mean Quality. 

Let us understand this point, with reference to data, through a bottom-up approach.

Suppose we have a dataset containing exactly the minimum amount of data needed to make a correct analysis. This would be an ideal dataset. Now if we add some more data, the entire dataset has to be reconstructed to take the new data into account. While reconstructing it, we need to clean the new data and spend time understanding how it deviates from the existing set, if at all.

Even after the new data is cleaned and merged into the existing ideal dataset, some new element may still be dirty without anyone having noticed. This will degrade the final result or analysis.

In this case, less data was surely better than more data.

Hence, more data doesn’t mean more insight or more value addition. Using smart data is the key.
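
The scenario above can be made concrete with a tiny Python sketch. All the numbers are invented: a quantity whose true value is assumed to be 20.0, five clean readings, and three new readings of which one is off by a factor of ten (say, recorded in the wrong unit) and slips through cleaning unnoticed.

```python
from statistics import mean

TRUE_VALUE = 20.0  # hypothetical quantity we are trying to estimate

# The small "ideal" dataset: a handful of clean readings.
clean_readings = [20.1, 19.9, 20.0, 20.2, 19.8]

# New data merged in later; the last reading is dirty but went unnoticed.
new_readings = [20.0, 20.1, 200.0]

small_estimate = mean(clean_readings)
large_estimate = mean(clean_readings + new_readings)

small_error = abs(small_estimate - TRUE_VALUE)
large_error = abs(large_estimate - TRUE_VALUE)

print(f"5 clean readings:       estimate {small_estimate:.2f}, error {small_error:.2f}")
print(f"8 readings (one dirty): estimate {large_estimate:.2f}, error {large_error:.2f}")
```

A single bad value is enough to drag the larger dataset's estimate far from the truth, while the smaller, clean dataset stays close to it.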


#FOUR

Data Science Field has Different Roles, Not just Data Scientists

Many people associate data science with data scientists only, ignoring the other prominent roles in the field.

Data science includes all of these:

  • Data engineer: responsible for managing the data infrastructure throughout the data science lifecycle. Basic skills include programming tools like Python, database tools like NoSQL and big data tools like Hadoop.
  • Data analyst: finds answers to questions by working through the available data using appropriate tools. Basic skills include programming, data visualisation, statistics, mathematics and, of course, data analysis.
  • Data scientist: works on big data, analyses it and then communicates the findings through reports and presentations. Basic skills include statistics, mathematics, programming, data visualisation, SQL, Hadoop and machine learning.

Apart from these, you can build a career in data science through various other roles.


#THREE

Data Science is Not Meant Only For Large Organizations 

Many businesses believe that data science is meant only for big organizations with high-end infrastructure.

This belief springs from a wrong notion about data science. Data science is not made up of machines, heavy tools or the size of working resources. Rather, it is made up of big data, statistics, analysis, programming, presentation and some smart people who know how to make the best of data and add value to the organization. It has nothing to do with whether the organization is big or small.

A data scientist needs to arrive at a result that benefits the company, and no one really cares which tools and techniques were used to achieve that result.

As for infrastructure, all that is needed is a computing device, an internet connection and some tools that help through the data science life cycle. A number of open source tools are available online and can be downloaded to get the ball rolling.


#TWO

Data Science Needs Great Communication Skills

Communication and presentation play a key role in data science.

Communication here refers to two areas – 

  • Coordinating within and among the teams during the different stages of the data science life cycle.
  • Presenting the final outcome in the most comprehensive and lucid manner.

Without proper communication, the entire exercise may prove futile and may not translate into any substantial product. It is important to learn public speaking, as there are a lot of presentations involved.

Also, learning to write better and more crisply enhances one’s visibility in and around the organization.

Writing involves:

  • PowerPoint presentations
  • Blogs
  • Emails
  • Reports

An analysis without proper communication, in writing or otherwise, is just a placeholder with no significance.


#ONE

Data Science is Not for Everyone

Let me first shed some light on what a data science interview smells like.

Again, I have taken this information from the well-known YouTuber and data science expert, Joma.

Data science job interview questions revolve around the following:

  • SQL or simple coding in a language such as Python. Interviewers want to make sure you know these, because you would be doing a lot of both on the job.
  • A quantitative analysis or math question involving statistics, probability or linear algebra.
  • Some undergraduate-level math topics like Bayes’ theorem, distributions, the law of large numbers, linear regression etc.
  • Product interview: they give you a hypothetical product and ask how you would improve it.
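
To give a taste of the math side, here is how a typical Bayes’ theorem question might be worked out in Python. The question and all its numbers are hypothetical: a condition affects 1% of users, a test detects it 90% of the time, and it raises a false alarm 5% of the time. What is the probability that a user who tests positive actually has the condition?

```python
# Hypothetical inputs for a classic interview-style Bayes question.
p_condition = 0.01            # prior: P(condition)
p_pos_given_condition = 0.90  # sensitivity: P(positive | condition)
p_pos_given_healthy = 0.05    # false-positive rate: P(positive | no condition)

# Law of total probability: overall chance of a positive result.
p_positive = (
    p_pos_given_condition * p_condition
    + p_pos_given_healthy * (1 - p_condition)
)

# Bayes' theorem:
# P(condition | positive) = P(positive | condition) * P(condition) / P(positive)
p_condition_given_pos = p_pos_given_condition * p_condition / p_positive

print(f"P(condition | positive) = {p_condition_given_pos:.3f}")
```

The answer comes out far lower than the 90% many people guess, because the condition is rare to begin with. Spotting that is usually the point of the question.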

So did the questions smell sweet or sour? 

The topics may seem a bit overwhelming to an absolute novice in this area. But those who are well prepared and ready to jump into the pool of data science will find it interesting to glance over such interview topics.

However enchanting it may look to a beholder, the data science field is no cakewalk. Even preparing for the field needs a good amount of affinity for data.

There are lots of videos and articles on the web suggesting that anyone can be a data scientist. That is true, under certain conditions. It is always a good idea to first ask yourself why you want to be in this field, and to do a reality check before taking a blind leap.

Introspection in the start is a great virtue for a successful stint in any field.


Conclusion:

With the data explosion in almost every field, data science is becoming indispensable, and it offers a good career opportunity. Considering data science as a career can be a wise decision for anyone who enjoys problem solving and has data empathy.

As cool as it sounds, it has immense potential for both businesses and job seekers. But it is advisable not to fall for wrong information about the field.

With its growing popularity, data science has acquired some myths, which we looked at here along with some interesting facts. Let me know in the comments if I missed any point.

Author

A reader by day and a writer by night, Rashi is a content writer and an ex-IT professional. Science and technology have always fascinated her. And so has art. But that doesn’t stop her from writing anything beyond these subjects. For her, any topic under the sun can be inked. All that is needed is a proper research and a writer’s bent of mind. In her free time she loves to read a book, do some DIY craft or play with her little son. When asked about her career shift from IT to writing, she replies with a smile, “IT gave me living, writing gives me life”.
