October 08, 2019 What Exactly is a Job? By Darren Sunart, Data Scientist at Madgex

What Exactly is a Job?

At Madgex, we power job board technology for over 200 brands all around the world, and that number is growing all the time. With our scale and reach, we have a lot of data at our fingertips, so we evolved our Data Science team to look at Machine Learning and Knowledge Graph models to enhance the experience for users of our platform. We started by looking at jobs, and asking the basic question... when we talk about a ‘job’ what exactly do we mean?

So, what exactly is a job?

  1. Are there aspects of a job description which make it belong to a particular sector? 
  2. Or make it likely that it is representing a particular function? 
  3. Or mean that it represents a more senior role than an entry level role? 

The answer to these questions is almost always yes, but getting to a usable answer is rarely simple. Over the last year, we have learnt a lot about our data, and specifically about our jobs data, and the path to answering these questions has turned out to be less straightforward than we expected.

Messy and inaccurate labelling of jobs is an obvious issue, as it makes jobs difficult to categorise, but it has not been our biggest hurdle. Our biggest hurdle has been scaling our analysis across all of the jobs data that we have. For a single job board, it’s relatively easy to build a model and get reasonable predictions for categories like sector, function and salary. However, each of our customers has a completely different ‘ontology’ (set of categories) for describing jobs on their job board.

These differences can be in language, depth and meaning, as our customers each operate in completely different market sectors, and each has its own way of representing what a job means.

Add to that the fact that jobs themselves are complicated: they can be many different things, and their descriptions vary depending on who is writing them.

Confused? Here’s an example:

  • Imagine a job board that specialises in the Accountancy sector posting a job for an IT Director at an accountancy company... they might list the sector for this job as IT
  • On the other hand, a generalist board covering a wide range of market sectors, might list that same IT Director job as being in the Accountancy sector as it’s a job at an accountancy company

Both are correct in their own context, but they are telling different parts of the same story.
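As a rough illustration of the example above, the sketch below stores the same job as it might appear on two boards. The `Listing` record, the board names and the field names are all hypothetical and are not our actual data model; the point is only that the two sector labels disagree even though the underlying job is identical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Listing:
    """Hypothetical record for one job as it appears on one board."""
    board: str
    title: str
    employer_industry: str
    sector_label: str  # the sector chosen under that board's own ontology

listings = [
    # Specialist accountancy board: the sector describes the role.
    Listing("accountancy-specialist-board", "IT Director", "Accountancy", "IT"),
    # Generalist board: the sector describes the employer.
    Listing("generalist-board", "IT Director", "Accountancy", "Accountancy"),
]

# Both listings describe the same underlying job...
same_job = len({(l.title, l.employer_industry) for l in listings}) == 1
# ...but a naive comparison of sector labels says they differ.
labels_agree = len({l.sector_label for l in listings}) == 1

print(f"same job: {same_job}, sector labels agree: {labels_agree}")
```

A model trained on one board's `sector_label` values would therefore learn a different meaning of "sector" from a model trained on the other's.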

An ontology describes the relationships between contexts and categories, and there are very few ontological similarities across job boards.

We started out by building a model per category, per job board. Given that we power job boards for over 200 brands, it quickly became obvious that this approach would not scale. A large proportion of our job boards are English language (UK, USA, Australasia), so we needed a way to model across boards and use the power of all of the data we have. The work was useful, but we needed to think bigger, be bolder, and work smarter.

On the other hand, it did provide a great testbed for approaches and helped us understand the ‘shape’ of the data that we are working with. This has helped with how we have progressed with our work. 

Combining Machine Learning and Knowledge Graphs for scalable recommendations

As we progressed in creating cross-board representations of jobs, we realised we needed to move away from strict ontologies in favour of something much more flexible - Knowledge Graphs. Unlike relational databases, which spread data across many tables, a Knowledge Graph connects everything explicitly, based on content.

In the diagram below, the jobs are represented by nodes and the relationships between jobs are represented by lines.  

Figure 1:

Knowledge graph using collaborative filtering to connect jobs. Edges between nodes represent a user having viewed both jobs within a session. User A edges are blue, User B red and User C green.

For example, you can make relationships between jobs based on user data to identify similar items. In Figure 1, User B viewed the Payroll Administrator, HR Administrator and HR Manager jobs within a session, so edges are drawn between each pair of those jobs within the network to represent this relationship. Working with user data to build relationships in this way is known as ‘Collaborative Filtering’ (CF). Collaborative Filtering takes advantage of user-driven behaviour to make recommendations about what users might want.
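The session-based edges described above can be sketched in a few lines of Python. This is a minimal illustration, not our production pipeline: only User B's session comes from Figure 1, while the sessions for Users A and C, and the function names, are made up for the example.

```python
from collections import Counter
from itertools import combinations

# Jobs viewed within a single session, per user.
# User B matches Figure 1; Users A and C are hypothetical.
sessions = {
    "user_a": ["Accountant", "Payroll Administrator"],
    "user_b": ["Payroll Administrator", "HR Administrator", "HR Manager"],
    "user_c": ["HR Manager", "HR Director"],
}

def co_view_edges(sessions):
    """Count, for every pair of jobs, how many sessions contained both."""
    edges = Counter()
    for viewed in sessions.values():
        # Sorting gives each undirected pair one canonical key.
        for pair in combinations(sorted(set(viewed)), 2):
            edges[pair] += 1
    return edges

def neighbours(job, edges):
    """Jobs that share at least one co-view edge with `job`."""
    return sorted(
        {other for pair in edges if job in pair for other in pair if other != job}
    )

edges = co_view_edges(sessions)
print(neighbours("Payroll Administrator", edges))
```

The edge counts can double as weights: pairs that are co-viewed in many sessions are stronger candidates for a ‘similar jobs’ recommendation than pairs seen together once.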

Figure 2:

Knowledge graph combining collaborative filtering and content-based edges to connect jobs. Dashed lines represent a user having viewed both jobs within a session. User A edges are blue and User B red. Solid lines represent a job belonging to a ‘topic’, where a topic could be ‘Accountancy’ or ‘HR’. Without these content-based edges, the network would be sparse, and the knowledge graph rendered almost useless.

A disadvantage of using Collaborative Filtering alone, known as the ‘cold start problem’, is that new jobs don’t immediately get views or applications, so they start with no connections within the graph. User-driven relationships are therefore sparse, so the graph is not immediately useful for those jobs.
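To make the cold start problem concrete, here is a small sketch in which a newly posted job has no co-view edges but can still be reached through a shared topic, as in Figure 2. The job titles, topic labels and helper names are hypothetical.

```python
# Hypothetical co-view (collaborative filtering) edges between existing jobs.
co_view_edges = {
    ("HR Administrator", "HR Manager"),
    ("HR Administrator", "Payroll Administrator"),
}

# Hypothetical topic assignments, including a job with no views yet.
job_topics = {
    "HR Administrator": "HR",
    "HR Manager": "HR",
    "Payroll Administrator": "Accountancy",
    "HR Business Partner": "HR",  # new job: cold start
}

def degree(job, edges):
    """Number of co-view edges that touch `job`."""
    return sum(job in edge for edge in edges)

def related_via_topic(job, job_topics):
    """Other jobs attached to the same topic node as `job`."""
    topic = job_topics[job]
    return sorted(j for j, t in job_topics.items() if t == topic and j != job)

new_job = "HR Business Partner"
print(degree(new_job, co_view_edges))          # no user-driven edges yet
print(related_via_topic(new_job, job_topics))  # still connected via its topic
```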

Incorporating Content-Based (CB) edges within the knowledge graph should help increase the number of connected jobs (see Figure 2). A 2016 research article on this subject (Yuan et al., 2016) created embeddings based on job adverts. Before creating the embeddings, the authors identified and repeated key pieces of information, such as skills and requirements, to up-weight them for encoding.

Using these embeddings, they could start representing similarity scores between jobs. The results from this were impressive; they achieved 91.8% precision for the top 10 most similar jobs and 100% of jobs had at least one edge.
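The idea of up-weighting key information before encoding, then scoring similarity between jobs, can be illustrated with a deliberately simple stand-in. The sketch below uses plain bag-of-words counts rather than the learned embeddings used in the research, and the job texts, `key_terms` set and `repeat` parameter are assumptions made for the example.

```python
import math
from collections import Counter

def embed(text, key_terms, repeat=3):
    """Bag-of-words vector in which key terms count `repeat` times.

    Multiplying a term's count is equivalent to repeating that term in the
    text before counting, which is how the up-weighting is described above.
    """
    counts = Counter(text.lower().split())
    for term in key_terms:
        if term in counts:
            counts[term] *= repeat
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical, heavily shortened job adverts.
adverts = {
    "payroll": "payroll administrator processing payroll and pensions",
    "hr_admin": "hr administrator supporting payroll and recruitment",
    "developer": "software developer building python services",
}
key_terms = {"payroll", "recruitment", "python"}  # assumed skill vocabulary

vectors = {job: embed(text, key_terms) for job, text in adverts.items()}
for other in ("hr_admin", "developer"):
    print(other, round(cosine(vectors["payroll"], vectors[other]), 3))
```

With real adverts, a learned embedding would capture synonyms and context that word counts miss, but the up-weighting and the cosine comparison work in the same way.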

For us, the key here is that it’s possible to model explicit Content-Based relationships across job boards within a Knowledge Graph. We can use this model to begin building setups that work for all job boards.

Unlike the 2016 research, which modelled jobs from one job board, at Madgex we need to model jobs from hundreds of job boards. If we connect every job to every other job via cosine similarity scores, the total number of connections grows quadratically with the number of jobs. So we reach the same problem as before: scalability.
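To give a sense of how quickly all-pairs connections pile up, the snippet below counts the undirected edges needed to link every job to every other job, which is n(n-1)/2 for n jobs. The job counts are hypothetical and chosen only to show the growth rate.

```python
def pair_count(n_jobs: int) -> int:
    """Number of undirected edges needed to connect every job to every other job."""
    return n_jobs * (n_jobs - 1) // 2

# Hypothetical job counts, not figures from our platform.
for n_jobs in (1_000, 10_000, 100_000):
    print(f"{n_jobs:>7,} jobs -> {pair_count(n_jobs):>13,} possible edges")
```

Each tenfold increase in jobs means roughly a hundredfold increase in edges to compute and store.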

Where does this all lead us? First, we need to build general cross-board representations of jobs to use within a Knowledge Graph. Second, we have to reduce the number of connections within the Knowledge Graph so that it runs efficiently.
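One common way to reduce the number of connections is to keep only the k strongest similarity edges for each job. The sketch below shows that idea under stated assumptions: the similarity scores are invented, and top-k pruning is offered as an illustration rather than as the approach we have settled on.

```python
import heapq

def top_k_edges(similarity, k):
    """Keep only the k highest-scoring neighbours for each job.

    `similarity` maps job -> {other_job: score}.
    """
    return {
        job: heapq.nlargest(k, scores.items(), key=lambda item: item[1])
        for job, scores in similarity.items()
    }

# Hypothetical cosine similarity scores between four jobs.
similarity = {
    "HR Manager": {"HR Administrator": 0.82, "Payroll Administrator": 0.41, "IT Director": 0.05},
    "HR Administrator": {"HR Manager": 0.82, "Payroll Administrator": 0.63, "IT Director": 0.04},
    "Payroll Administrator": {"HR Manager": 0.41, "HR Administrator": 0.63, "IT Director": 0.03},
    "IT Director": {"HR Manager": 0.05, "HR Administrator": 0.04, "Payroll Administrator": 0.03},
}

pruned = top_k_edges(similarity, k=2)
for job, kept in pruned.items():
    print(job, kept)
```

With a fixed k, the number of stored edges grows linearly with the number of jobs instead of with every possible pair, at the cost of discarding weak connections.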

What’s next?

As our work progresses on the Knowledge Graph, we are experimenting with ways to use the technology to benefit our customers. Knowledge Graphs are powerful tools for driving recommendations for job seekers, and we are already leveraging this capability to show candidates ‘similar’ jobs to those that they are currently browsing. We are also:

• Exploring how we can integrate data extracted from CVs into the Knowledge Graph to better understand things like career progression and skills required for roles

• Building combinations of machine learning algorithms to tie together content from job adverts and job seekers’ behaviour, to help enhance the job recommendation system

Look out for more articles from us on how we are using Knowledge Graphs and Machine Learning to transform and enhance the experience for users of our platform.

References

Sarwar, B., Karypis, G., Konstan, J., & Riedl, J. (2001). Item-Based Collaborative Filtering Recommendation Algorithms. Retrieved from http://www.ra.ethz.ch/cdstore/www10/papers/pdf/p519.pdf

Linden, G., Smith, B., & York, J. (2003). Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Computing, 7(1), 76–80. https://doi.org/10.1109/MIC.2003.1167344

Yuan, J., Shalaby, W., Korayem, M., Lin, D., Aljadda, K., & Luo, J. (2016). Solving Cold-Start Problem in Large-scale Recommendation Engines: A Deep Learning Approach. Retrieved from https://arxiv.org/pdf/1611.05480.pdf

Shalaby, W., Alaila, B. E., Korayem, M., Pournajaf, L., Aljadda, K., Quinn, S., & Zadrozny, W. (2018). Help me find a job: A graph-based approach for job recommendation at scale. Proceedings of the 2017 IEEE International Conference on Big Data (Big Data 2017), 1544–1553. https://doi.org/10.1109/BigData.2017.8258088