DSjobtracker

Getting started with DSjobtracker

The package contains two datasets:

  1. DSraw : Raw dataset with 551 rows and 152 columns
  2. DStidy : Cleaned tidy dataset with 430 rows and 115 columns

Both datasets describe job vacancies related to data science, collected over the span of a month. Each vacancy was found by searching for a specific term (recorded in the Search_Term column) and then following the search results to gather the data manually.

Usage

  1. Install the library from GitHub
     # install devtools if not already installed
     # install.packages("devtools")
     devtools::install_github("thiyangt/DSjobtracker")
  2. Load the library
     library(DSjobtracker)
  3. Load the dataset into your environment
     data("DStidy")

Overview of columns

tibble::glimpse(DStidy)
#> Rows: 430
#> Columns: 115
#> $ ID                                 <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,…
#> $ Consultant                         <chr> "Thiyanga", "Jayani", "Jayani", "J…
#> $ DateRetrieved                      <date> 2020-08-05, 2020-08-07, 2020-08-0…
#> $ DatePublished                      <date> NA, 2020-07-31, 2020-08-06, 2020-…
#> $ Job_title                          <chr> NA, "Junior Data Scientist", "Engi…
#> $ Company                            <chr> NA, "Dialog Axiata PLC", "London S…
#> $ R                                  <fct> 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0…
#> $ SAS                                <fct> 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0…
#> $ SPSS                               <fct> 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0…
#> $ Python                             <fct> 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0…
#> $ MAtlab                             <fct> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Scala                              <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ C_Sharp                            <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ MS_Word                            <fct> 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1…
#> $ Ms_Excel                           <fct> 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1…
#> $ OLE_DB                             <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Ms_Access                          <fct> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1…
#> $ Ms_PowerPoint                      <fct> 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0…
#> $ Spreadsheets                       <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Data_visualization                 <fct> 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0…
#> $ Presentation_Skills                <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Communication                      <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ BigData                            <fct> 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0…
#> $ Data_warehouse                     <fct> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ cloud_storage                      <fct> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Google_Cloud                       <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ AWS                                <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Machine_Learning                   <fct> 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0…
#> $ Deep_Learning                      <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Computer_vision                    <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Java                               <fct> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0…
#> $ Cpp                                <fct> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0…
#> $ C                                  <fct> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0…
#> $ Linux_Unix                         <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ SQL                                <fct> 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0…
#> $ NoSQL                              <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ RDBMS                              <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Oracle                             <fct> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0…
#> $ MySQL                              <fct> 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0…
#> $ PHP                                <fct> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0…
#> $ Flash_Actionscript                 <fct> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0…
#> $ SPL                                <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ web_design_and_development_tools   <fct> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0…
#> $ Wordpress                          <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ AI                                 <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ `Natural_Language_Processing(NLP)` <fct> 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0…
#> $ Microsoft_Power_BI                 <fct> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Google_Analytics                   <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ graphics_and_design_skills         <fct> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0…
#> $ Data_marketing                     <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ SEO                                <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Content_Management                 <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Tableau                            <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0…
#> $ D3                                 <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0…
#> $ Alteryx                            <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ KNIME                              <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Spotfire                           <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Spark                              <fct> 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0…
#> $ S3                                 <fct> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0…
#> $ Redshift                           <fct> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0…
#> $ DigitalOcean                       <fct> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0…
#> $ Javascript                         <fct> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0…
#> $ Kafka                              <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Storm                              <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Bash                               <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Hadoop                             <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0…
#> $ Data_Pipelines                     <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ MPP_Platforms                      <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Qlik                               <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Pig                                <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Hive                               <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0…
#> $ Tensorflow                         <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Map_Reduce                         <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Impala                             <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Solr                               <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Teradata                           <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ MongoDB                            <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Elasticsearch                      <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ YOLO                               <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ agile_execution                    <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0…
#> $ Data_management                    <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ pyspark                            <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Data_mining                        <fct> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0…
#> $ Data_science                       <fct> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0…
#> $ Web_Analytic_tools                 <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ IOT                                <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Numerical_Analysis                 <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Economic                           <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Finance_Knowledge                  <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Investment_Knowledge               <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Problem_Solving                    <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Korean_language                    <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Bash_Linux_Scripting               <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Team_Handling                      <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Debtor_reconcilation               <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Payroll_management                 <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Bayesian                           <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Optimization                       <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Bahasa_Malaysia                    <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ Knowledge_in                       <chr> NA, NA, "Elasticsearch, Logstash, …
#> $ City                               <chr> NA, "Colombo", "Colombo", "Colombo…
#> $ Location                           <chr> "NY", "LK", "LK", "LK", "LK", "Mal…
#> $ Educational_qualifications         <chr> NA, "Degree in Engineering / IT or…
#> $ Salary                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ English_proficiency                <chr> NA, NA, NA, NA, NA, NA, "1", NA, N…
#> $ URL                                <chr> NA, "https://www.google.com/search…
#> $ Search_Term                        <chr> NA, "Data Analysis Jobs in Sri Lan…
#> $ Job_Category                       <fct> Unimportant, Data Science, Data Sc…
#> $ Minimum_Years_of_experience        <dbl> 4, 2, 1, 2, 0, 5, 0, 0, 1, 7, 5, 2…
#> $ Experience                         <chr> "4+", "2-3", "1-2", "2+", "0 years…
#> $ Experience_Category                <fct> More than 2 and less than 5 years,…
#> $ Job_Country                        <chr> NA, "Sri Lanka", "Sri Lanka", "Sri…
#> $ Edu_Category                       <fct> NA, Some Degree, Some Degree, Some…
#> $ Minimum_Salary                     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ Salary_Basis                       <fct> unspecified, unspecified, unspecif…

More information on the meaning of each column is available on the dataset's help page:

?DStidy

Examples

Barplot of top twenty skills required for data science jobs
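
The code behind this plot is not shown here, so the following is only a minimal sketch of how such a ranking could be computed. It assumes the skill columns are 0/1 factors, as in the glimpse() output above. The data frame `toy` and the helper `count_skills()` are hypothetical stand-ins introduced for illustration; they are not part of the package.

```r
# Stand-in for DStidy: three made-up vacancies with 0/1 factor skill columns,
# mirroring the <fct> indicator columns shown in the glimpse() output.
toy <- data.frame(
  ID     = 1:3,
  R      = factor(c(1, 1, 0), levels = c(0, 1)),
  Python = factor(c(1, 1, 1), levels = c(0, 1)),
  SQL    = factor(c(1, 0, 0), levels = c(0, 1))
)

# Hypothetical helper: count the vacancies that list each skill,
# returned in decreasing order of frequency.
count_skills <- function(data, skill_cols) {
  counts <- vapply(
    data[skill_cols],
    function(x) sum(as.character(x) == "1", na.rm = TRUE),
    numeric(1)
  )
  sort(counts, decreasing = TRUE)
}

skill_counts <- count_skills(toy, c("R", "Python", "SQL"))

# Keep at most the twenty most frequent skills.
top_skills <- head(skill_counts, 20)

# Horizontal bars, most frequent skill at the top.
barplot(rev(top_skills), horiz = TRUE, las = 1, xlab = "Number of vacancies")
```

With the package loaded, the same helper could presumably be applied to DStidy by passing the names of the indicator columns listed above (R through Bahasa_Malaysia).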

Wordcloud of software skills

The log of the counts was used so that the skills are easier to compare visually.
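
To illustrate why a log transform helps, here is a small sketch using made-up counts (not values from DStidy). It assumes the natural logarithm; the base used for the original figure is not stated.

```r
# Made-up skill counts for illustration only, not taken from DStidy.
skill_counts <- c(Python = 200, SQL = 150, R = 120, Scala = 5, Storm = 0)

# Drop skills that never occur, since log(0) is -Inf.
nonzero <- skill_counts[skill_counts > 0]

# Natural log of the counts. Note that a count of 1 would map to 0.
log_counts <- log(nonzero)

# Compare the spread before and after the transform.
raw_ratio <- max(nonzero) / min(nonzero)        # 200 / 5 = 40
log_ratio <- max(log_counts) / min(log_counts)  # much smaller than 40

round(log_counts, 2)
```

On the raw scale the most common skill is 40 times as frequent as the rarest one shown, so rare skills would be nearly invisible in a word cloud. After the transform the ratio is far smaller. The transformed values could then be passed as the frequencies to a word cloud function, for example the `freq` argument of `wordcloud::wordcloud()`, though the package used for the original figure is not stated here.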

Required experience and the salary

Software Skills needed for each Job Category
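
A per-category summary like this can be sketched as a grouped sum over the 0/1 skill indicators. The example below is an illustration only, built on a hypothetical stand-in data frame `toy` that reuses the column names Job_Category, R and Python from the glimpse() output; the rows are made up.

```r
# Stand-in for DStidy: three made-up vacancies.
toy <- data.frame(
  Job_Category = factor(c("Data Science", "Data Science", "Unimportant")),
  R            = factor(c(1, 0, 1), levels = c(0, 1)),
  Python       = factor(c(1, 1, 0), levels = c(0, 1))
)

skill_cols <- c("R", "Python")

# Convert the 0/1 factors to a numeric matrix (one column per skill).
indicators <- sapply(toy[skill_cols], function(x) as.numeric(as.character(x)))

# Sum the indicators within each job category: one row per category,
# one column per skill, each cell the number of vacancies listing the skill.
by_category <- rowsum(indicators, group = toy$Job_Category)

by_category
```

The resulting matrix could then be drawn as a heatmap or a grouped bar chart, depending on how many categories and skills are included.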