Data Science Roles in a Corporate: It is too confusing!!!
I started my corporate journey with the title of Associate Consultant, specializing in supply chain management; to be specific, in what the industry calls indirect procurement. Two years ago I decided to change track because, as a consultant providing services to companies and organizations, I did not find the thrill that an engineer expects to have.
I decided to get into the world of data, especially data science, because that was all I knew about data jobs in the industry. Reality struck hard when I started researching the various roles associated with data science in the IT industry.
I was thrilled to learn that data science is a general term that encapsulates many other titles such as Data Engineer, Data Analyst, Machine Learning Engineer, etc. What I had thought of as a single role, the data scientist, actually needed expertise in many fields.
I see a similar situation among many young graduates who complete their degrees and venture out in search of jobs at leading corporates. The same thing happens to young professionals who have just started their careers and want to switch to data science. However, they are in a better position: they are already in the industry, so industry professionals are easier to reach and more readily available.
Adding on, many corporates believe in reskilling and cross-skilling, which ensures that employees get trained on recent technologies. Students and young graduates have no such privileges, and hence they often end up confused and left in the dark as to what the various roles have to offer them in terms of challenges and career growth.
It also creates confusion as to how one role differs from another: how are machine learning engineers different from data engineers, or data scientists from data engineers?
The main reason behind so many distinct roles being currently created in any organization which deals in data science is perhaps related to the fact that industries want to acquire as many talents as possible to make their data science services as robust as possible.
The only way to do this is to split the different aspects of a data science project across departments and let one owner execute each aspect, instead of having a limited number of individuals drive the whole project under a single banner such as data scientist or data analyst.
Another reason for the rise of so many roles in the data industry is the ever-changing way organizations want to use the data they capture. Some years back the focus was primarily on figuring out trends so that organizations were better prepared to handle them. In the current scenario the focus has expanded: not only getting insights, but also building models that can predict the future, making intelligent applications, and enabling those applications to adapt to the ever-changing demographics of the data and the sources from which it originates.
All of these technological advancements, once thought to be science fiction, have now become reality, and to make that happen organizations had to introduce a variety of roles to handle all the complexities.
In this article, I will go into as much detail as I can about two roles that sound similar but actually are not. Data Engineering and Data Science are interdependent to some extent, but from a broader perspective they are distinct, and we will see how.
I will talk about how Data Engineering is different from Data Science. I will cover this with the help of three factors that set Data Engineering and Data Science apart:
- The job profiles for Data Engineering and Data Science.
- The average pay scale for Careers in Data Engineering and Data Science
- The pathway to Data Engineering and Data Science
Hopefully, this will give you a clear picture of what Data Engineering and Data Science roles deal with and what skillsets are needed to get into them. But before getting started with the differences between the two roles, let us first have an overview of what each role deals with and how each contributes to the overall picture.
Data Engineering: A Bird’s eye view
Data Engineering is the aspect of data science that focuses on the mechanisms that capture data, validate it and make sure it is not erroneous. As is the case with most of the other roles in data science, data engineering is a practical discipline. It does not usually involve data analysis; where it does, it is limited to very basic analysis that comes in handy when data scientists or data analysts start analyzing the data.
Data Engineering is more focused on building data sets, which later result in a comprehensive understanding of the data. Data engineers are usually experts in one or more of the following fields:
- Computer and system architecture
- Database administration
- Database design
- Integrating one application with others so that data flows seamlessly
The key skillsets one should possess for a career in Data Engineering are an in-depth understanding of information theory, information systems and system architecture. Below are some of the key skillsets that recruiters search for while hunting for data engineers.
- Data modelling
- Database theory and administration
- Database reporting
- ETL practice in real time data
- Architectures: Computers, systems and integrated environments
In a nutshell, data engineering is inclined towards data gathering, initial data analytics and data flows in an organization. Data engineers do very limited data analytics and seldom play any role in extracting insights or producing fancy graphs and visualizations from the data.
Data Scientists: A Bird’s eye View
Data Scientists are the storytellers of the corporate world. They are experts in creating compelling visualizations, and they tell the story in the data with such conviction that even the big shots in an organization nod along with what is being said. Data Scientists are considered to be the only species in the data industry who are paid for being a jack of all trades and master of none.
One can consider a data scientist as a part-time mathematician, part-time statistician, part-time programmer, part-time computer engineer, part-time business analyst and part-time thinker. They act as a bridge between the data analysts and the business and can handle both sides of the same coin. They are highly motivated professionals with a knack for finding as many trends as possible in a given chunk of data.
Now that we know that data scientists are the ultimate professionals in the data field, below are some of the skillsets that a data scientist must possess in order to get hired by the HR professionals,
- Data Analytics
- Machine Learning
- Data Visualization and reporting
- Statistics and Mathematics
- Data Cleaning and data munging
- Big Data: Tools, platforms and concepts
- Cloud computing
- Data Warehousing
- Effective communication skills
In essence, a data scientist is someone who not only analyzes the data and gets insights from it but is also an effective communicator who presents the findings as fluently as possible.
Now that we have seen an overall picture of both Data Engineering and Data Science and their key roles and responsibilities in the data world, let us get into the details of the three factors that set the two fields apart.
The job profiles for Data Engineering and Data Science
One of the most fruitful ways to understand how Data Engineering is different from Data Science is to compare the job profiles of the two.
Job Profile in Data Engineering
The usual job description of a data engineer in any corporate mainly focuses on building a robust data infrastructure and pipeline that enables the data scientists to do their jobs. Data engineers are the ones who set up data flows and data pipelines so that the data gets captured and stored, to be retrieved later by the data scientists for further insights and investigation.
Data Engineering not only deals with building a solid data infrastructure but also plays a key role in optimizing it in the corporate setup. The data engineer often engages with different portfolios and experts in the IT landscape, including software developers, database admins, data analysts and data scientists, and ensures that the data-related architecture in each of these areas is optimized. In some extreme cases, data engineers are engaged in a complete redesign of the entire data infrastructure of the organization.
The key responsibilities that are generally seen in the job description of a data engineer are as follows,
- Building and maintaining a robust data pipeline and a solid data architecture.
- Collection and wrangling of data to meet business and technical requirements of the underlying processes and business tools.
- Identify areas of improvements of the data infrastructure; automate manual data related jobs; make the data infrastructure scalable and efficient.
- Make the data infrastructure optimal for Extraction, Transformation and Loading of the data from a wide array of data collecting sources.
- Work with business stakeholders including management, product design teams, software development teams and data analytics teams and help them out with their data related needs.
- Working with the security teams to ensure the data is secured across regions and data centers.
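To make the Extraction, Transformation and Loading responsibility a little more concrete, here is a minimal sketch of such a pipeline in Python. Everything in it is an assumption for illustration: the sample order data, the `orders` table, and the use of an in-memory SQLite database as a stand-in for whatever warehouse an organization actually runs.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this might come from a file, an API or a queue.
RAW_CSV = """order_id,customer,amount
1001,alice,25.50
1002,bob,not_a_number
1003,,12.00
1004,carol,40.00
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: validate each row and drop the erroneous ones."""
    clean = []
    for row in rows:
        if not row["customer"]:
            continue  # customer name is missing
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # amount is not numeric
        clean.append((int(row["order_id"]), row["customer"], amount))
    return clean


def load(rows, conn):
    """Load: store the validated rows in a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(text, conn):
    """Run extract, transform and load in order; return the number of rows loaded."""
    rows = transform(extract(text))
    load(rows, conn)
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = run_pipeline(RAW_CSV, conn)
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(f"rows loaded: {loaded}, total amount: {total}")
```

The validation step is where "making sure the data is not erroneous" happens: bad rows are filtered out before anything reaches the table a data scientist would later query.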
The key working experience that is usually associated with the job description is as follows. Please note that the qualification requirements might vary depending on the job level, years of experience and the position for which the hiring might take place.
It also depends on the organization, the number of employees, the clients and the industry in which it operates. These may change from time to time and depending on the requirements.
- Knowledge in advanced SQL and an experience in working across different relational and non-relational databases including structured and unstructured databases.
- Experience in Big Data technologies including big data pipelines and data sets.
- Ability to dive deep into the internal and external processes of the organization and generate reports to answer specific questions and also find out areas of improvements.
- Working experience with unstructured data.
- Build processes which complement the Extraction, Transformation and Load of the data.
- Ability to integrate large and complex data sets from a wide variety of related and non-related data sources.
- Experience in working across cross functional teams and the ability to work under pressure.
- Experience with Big Data tools such as Hadoop.
- Experience with relational and NoSQL databases such as PostgreSQL and Cassandra.
- Experience with tools like Azkaban, Luigi, Airflow, etc. and related workflow management tools.
- Experience with AWS.
- Experience with Streaming technologies and tools.
- Experience with programming languages such as Python, R, Scala, etc.
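Workflow management tools such as the ones listed above are built around one core idea: run each task only after the tasks it depends on have finished. The sketch below shows that idea using only Python's standard `graphlib` module (Python 3.9+). It does not use the API of Azkaban, Luigi or Airflow, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract_orders": set(),
    "extract_customers": set(),
    "clean_orders": {"extract_orders"},
    "join_tables": {"clean_orders", "extract_customers"},
    "publish_report": {"join_tables"},
}


def run_order(dag):
    """Return one valid execution order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())


def run(dag):
    """Pretend to execute each task in dependency order; return the log."""
    log = []
    for task in run_order(dag):
        log.append(f"ran {task}")  # a real scheduler would call the task here
    return log


order = run_order(PIPELINE)
for line in run(PIPELINE):
    print(line)
```

Real schedulers add retries, scheduling, monitoring and parallel execution on top of this, but the dependency graph is the part a data engineer designs.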
Job Profile for a Data Scientist
The job description of a data scientist particularly focuses on analytical, statistical and programming skills because at the end of the day it is the data scientist who brings out the trends and visualizations from the data so that a well-crafted story is communicated to the business stakeholders for decision making.
Data Scientists are known for their love towards their data and they analyze the data to provide insights and develop data driven solutions to the problems at hand. Data Scientists are expected to have knowledge on a wide spectrum of concepts and technologies including machine learning, programming languages, databases, statistics and reporting.
Usually, a data scientist is expected to work with cross-functional teams including sales, marketing, product and executive leadership of an organization to answer specific questions to gain business insights. The data scientist uses data models to develop optimization opportunities on a product and process level. They can execute simulations on large and complex data sets to develop predictive models and generate insights to help answer questions and optimize business outcomes.
The key responsibilities of a data scientist are the following,
- Ability to work with stakeholders across the organization and use data from the legacy systems to generate insights.
- Ability to analyze large data sets to derive opportunities for process improvement and business strategies.
- Ability to run tests and develop an idea whether new data sources and data from new applications are trustworthy and accurate.
- Develop and use predictive modelling to improve user experience and optimize marketing strategies along with developing models for other business requirements.
- Develop test cases and hypotheses to test data models and develop metrics to measure their accuracy.
- Coordinate with cross functional teams to develop models specific to their needs and help them deploy and run the models in production environments.
- Develop strategies to measure the performance of the models and design feedback loops to improve their performance.
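As a rough illustration of what "developing a predictive model and a metric to measure its accuracy" can look like, here is a minimal sketch in plain Python. It fits a straight line by least squares on training data and measures the mean absolute error on data the model has not seen. The numbers (imagine ad spend against sales) are made up for the example, and a real project would normally use a library such as scikit-learn rather than hand-written formulas.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical data: the model is trained on one part and evaluated on the rest.
train_x = [1, 2, 3, 4, 5, 6]
train_y = [3.1, 4.9, 7.2, 9.0, 10.8, 13.1]
test_x = [7, 8]
test_y = [15.0, 17.2]

slope, intercept = fit_line(train_x, train_y)
predictions = [slope * x + intercept for x in test_x]
mae = mean_absolute_error(test_y, predictions)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, MAE on held-out data={mae:.2f}")
```

Keeping the evaluation data separate from the training data is the design choice that matters here: a metric computed on the same data the model was fitted to would overstate how well it performs.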
The key working experience and qualifications of a data scientist are as follows,
- Strong problem solving skills
- Working experience on statistical programming languages such as R, SAS, etc.
- Experience in Data Architecture
- In depth knowledge on machine learning and their applications to real world problems. Relevant experience is often considered as a plus when recruiters scan through the resume of the applicants.
- Knowledge on advanced statistics and mathematics with good knowledge on probability theory, linear algebra and hypothesis testing.
- Experience in working across teams and stakeholders.
- Coding experience is a must for almost all openings in the corporate world. Experience in data science programming languages such as Python, R, etc. is often an advantage.
- Experience in working with structured and unstructured databases.
- Experience with distributed processing frameworks and cloud platforms such as Spark, DigitalOcean, etc.
- Experience in developing models using advanced machine learning techniques such as boosting, gradient descent, artificial neural networks, random forests, etc.
- Experience with third party data analytics tools and platforms such as Google Analytics, Facebook Insights, AWS, etc.
- Working experience in distributed systems and high performance computing tools.
- Experience with data visualization tools such as ggplot, Matplotlib, Tableau, etc. for communicating results to stakeholders.
Please note however that the above mentioned experience is very generic in nature which can be found in several job descriptions from a variety of organizations. Please also note that there might be categories that might change depending on the business logic and requirements.
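Hypothesis testing, one of the statistical skills listed above, can be sketched in a few lines. The example below uses a permutation test, which is only one of several possible approaches and is chosen here because it needs nothing beyond the standard library. The two samples (say, a metric measured on an old and a new version of a page) are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_permutations=10_000, seed=42):
    """Estimate a two-sided p-value for the difference in group means."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(group_b) - mean(group_a))
    pooled = list(group_a) + list(group_b)
    size_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign observations to the two groups
        diff = abs(mean(pooled[size_a:]) - mean(pooled[:size_a]))
        if diff >= observed - 1e-12:  # small tolerance for floating point noise
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical measurements for two variants.
old_page = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]
new_page = [13.0, 13.4, 12.9, 13.2, 13.5, 13.1]

p_value = permutation_test(old_page, new_page)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value suggests that a difference this large would rarely appear if the two groups were really the same, which is the kind of evidence a data scientist brings to stakeholders before a decision is made.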
The average pay scale for Careers in Data Engineering and Data Science
A Career in Data Engineering
As per recent statistics from Glassdoor, the average salary of a data engineering professional is around $137,776 per year, with a range of $110,000 to $155,000 per year depending on the seniority, skills, hierarchy and type of work the professional is expected to perform on a daily basis.
Going by the same statistics, the average salary of a senior data engineering professional is around $172,603 per year, with a range of $152,000 to $194,000 per year. If we drill down from this generalization to leading MNCs across the globe, we get the results below,
|No.|Organization|Salary Range|Average Salary|
|---|---|---|---|
|1|Amazon|$78,000 to $133,000|$103,849|
|2|HP|$64,000 to $105,000|$86,164|
|3| |$93,000 to $171,000|$122,695|
|4|IBM|$90,000 to $116,000|$99,351|
A Career as a Data Scientist
According to Datajobs.com the average salary of a Data Scientist is about $127,500 per year with a range of $85,000 to $170,000 depending on skills, seniority, the organization and the sector. As per the same website the highest paying sectors for a data scientist are as follows,
- Cloud Services
- Social Networking
- Banking and Finance
Depending on the skillsets, a data scientist working in these domains is expected to earn more than the average salary in some cases. Some of the highest paying tech organizations are as follows,
- Google: $152,856
- Apple: $145,974
- Twitter: $135,360
- Facebook: $134,715
- PayPal: $132,909
- Airbnb: $127,852
- Microsoft: $123,328
Having talked about the average salaries of professionals working in both the domains namely Data Engineering and Data Science, we should also note that the average annual pay scale of a professional is widely dependent on the type of organization, the size of organization, the type of data which is in reference and the type of insights one is expected to dig out of the data.
The pathway to Data Engineering and Data Science
Now that we have had an insight into Data Engineering and Data Science in terms of the roles and responsibilities and the average pay scale, let us now have a peek into what it takes for someone to get into fields like Data Engineering and Data Science.
The Pathway to Career in Data Engineering
According to David Bianco, a data engineer veteran,
“Languages come and go, so it is better to gain a full understanding of the concepts behind building a robust pipeline”
And if one is to follow this advice, one has to let go of the myth that data engineering is all about learning to code and learning the technical know-how behind the data pipeline. Below are the steps that one can follow towards becoming a data engineer,
- Get a Bachelor’s degree and work on small projects: Since Data Engineering deals with many aspects of data architecture and data pipelines, it needs a lot of expertise to get the job done. Having a bachelor’s degree in Computer Science, Physics, Mathematics or Statistics increases the chance of getting hired as a data engineer. Having worked on a small project increases the chance further because of the practical experience one gets from such projects. Students from other fields can also enter the field provided they have an overall knowledge of basic concepts such as data structures, algorithm analysis, software engineering and programming. This can be further enhanced by participating in online coding contests and hackathons and by contributing to open source projects.
- Fine tune your skills related to data engineering: One of the core skillsets a data engineer must possess is the ability to work with both relational and non-relational databases, along with the expertise to work with structured and unstructured data. Hence, it won’t be wrong to say that Data Engineers are expected to have a good knowledge of SQL. Fine tuning this skill to work with integrated applications, and being able to write complex queries to retrieve relevant data from the databases, is always a plus that increases the chances of being hired as a data engineer.
Another skill expected of a data engineer is an understanding of some of the leading programming languages such as Python and R, along with a knowledge of tools such as Spark, Hadoop and Kafka.
A knowledge of database management applications also comes in handy, since Data Engineers spend the majority of their time creating data flows and data pipelines and doing data mining. Fine tuning all the above skills adds an advantage to the profile of a data engineer. The more tuned the skills are, the better the chances of getting hired as a data engineer.
- Start with an entry level data engineering job: This is where the fun part begins, and perhaps this is one of the most confusing parts of trying to get into the lucrative field of data engineering. Your first job may or may not revolve around data engineering, since data engineers handle chunks of data which might be highly protected, secured and delicate. Usually, industries look for people who have experience working in an IT field so that they understand the importance of data being private and highly secure. Hence, try to get into an IT related field. It might seem off track to start with, but gaining an understanding of the industry is equally important, so do not hesitate to take up an entry level job in the IT world.
- Build your profile as a data engineer by adding valuable certifications and professional courses: One of the best ways of letting your profile stand out among the thousands of applicants who apply for a particular job is to keep improving it with certifications, internships and badges from reputed third party vendors that specialize in certain aspects. These vendors might be Oracle, IBM, Microsoft and Cloudera, to name a few. One of the most general certifications for data engineers is the Certified Data Management Professional (CDMP) by Data Management Association International (DAMA). There are many similar professional certifications specific to a skill set or an expertise that add value to your resume, but before you decide to take on one of them be sure to research your area and talk to your seniors or managers to see which certification would add more value to your resume as a data engineer.
- Earn your Master’s Degree: Earning a Master’s degree in Mathematics, Physics, Computer Science or Statistics will add weight to your resume, as a master’s degree explicitly tells the employer that a person is a subject matter expert in that field of study. There are many data engineering professionals who do not hold a master’s degree and are equally qualified and equipped to face challenging situations, but if you are considering a master’s degree it will be a plus. Note here that a master’s degree is completely optional, and if you regularly keep track of the emerging concepts in your field you are good.
The Pathway to Career as a Data Scientist
Data Scientists, as experts like to call them, are jacks of all trades and maybe masters of none. A Data Scientist is obsessed with data and with finding more and more insights by delving into it.
Since the role of a Data Scientist is not limited to the technical ability of getting insights from the data, there are many more competencies that a data scientist should possess, such as leadership skills, project management, analytical storytelling, strong communication skills to communicate the results and persuasion skills to persuade the audience to adopt their findings. The pathway to being a data scientist can be broadly divided into a series of steps, which are as follows,
- Get an Undergraduate Degree: Getting an academic accreditation is the best way to mark the start of your career in the world of data science. An undergraduate qualification ensures that the candidate has been through a structured learning scheme and has got sufficient hands-on experience to complement the skill sets, through small projects and internships ranging from MNCs to startups.
- Develop the competencies needed to become a data scientist: Some of the competencies that a data scientist is expected to possess are as follows,
- Machine Learning
- Data Visualization
- Building Statistical models of the data
- Software Engineering
- Communication and leadership skills
- Data Munging and data cleansing
- Data Research
- Big Data and Big Data Analytics Engines and platforms
- Cloud Applications
- Data Warehousing and Data Structures
- Get specialized in one particular area: A Data Scientist is mainly concerned with getting as many insights as possible from the data they receive. The catch here is that though the insights could be valuable, interesting and tricky at times, they are of no use if they are not relevant to the industry. Hence, it helps to specialize in a particular industry and to gain a thorough understanding of how it operates and which insights it is normally interested in for better performance and optimization. In addition, a specialization in one of the fields such as database management systems, artificial intelligence, machine learning or research is also a plus for an in-demand data scientist in the corporate world.
- Get into an entry level data science job: Once all the above homework is done and you have sufficient exposure to the concepts relating to data science, it is time to start with your first job. Normally, this is an entry level data science job with a limited number of daily tasks to be completed. While getting into the first entry level data science job, it is also important to get an insight into the growth potential of the organization. The greatest experience that one can get out of an entry level job in data science is working in a corporate setup, in collaboration with other teams and under pressure.
Data Engineering and Data Science: A Conclusion
Now that we have looked into the question “How is Data Engineering different from Data Science?”, I hope it is pretty clear how the two fields differ in terms of the three parameters we looked into in detail. We looked at the parameters below to understand how the two fields are different from each other,
- The job profiles for Data Engineering and Data Science
- The average pay scale for Careers in Data Engineering and Data Science
- The pathway to Data Engineering and Data Science
In conclusion, data engineering and data science are very different from each other. While data engineering focuses mainly on the data infrastructure, data flows and data pipelines, data science is more of a science that covers the entire data analytics process, right from visualizing data to getting insights to communicating them to the stakeholders so that everyone is on the same page.
Data Engineers can be seen as data enablers who help data scientists get relevant data in the proper format and shape so that insights can be extracted from it.