Keyword-based job search often lacks the expressiveness to communicate the intent of the job seeker adequately. However, in the human resources (HR) technology space, there are plenty of sources of content with this information. For example, a job seeker's resume contains indicators of the perfect job for that person. This project demonstrates how to leverage the natural language context analysis and recommender models of Analytics Zoo in the Databricks* platform on Amazon Web Services (AWS)* to predict a candidate's probability of applying to specific jobs based on the contents of resumes and job descriptions.
Talroo* is a high-volume, data-driven national job advertising network. Talroo’s marketplace partnerships make the network both expansive and highly targeted. These marketplace partners provide Talroo with job seeker keywords, which Talroo translates into relevant jobs at scale. Every month, Talroo processes tens of billions of job impressions and billions of job queries, which result in tens of millions of interaction events such as clicks and conversions. These impressions, queries, and interaction events are all processed by Databricks on AWS and stored in Amazon Simple Storage Service (Amazon S3)* as Parquet tables.
One of Talroo’s job recommendation challenges is that short hire cycles limit history around job advertisements and job seekers. To alleviate this cold start problem, job recommendation systems tend to search via keywords. Unfortunately, this short keyword context lacks the expressiveness to describe the job seeker's intent effectively. In contrast to keywords, resumes offer a much richer source of context in natural language.
Newly developed deep neural networks (DNNs) have shown success as recommender systems by capturing the non-linear relationships in the user-item dataset. Empirical evidence shows that using deeper layers of neural networks offers better recommendation performance [2].
Therefore, combining Natural Language Processing (NLP) and DNNs at scale on a production platform of Databricks on AWS is key to improving the recommender system. Talroo accomplishes this by building an end-to-end solution using BigDL [1].
Data is extracted directly from Talroo’s data processing pipeline to prepare the training, testing, and validation datasets. The job application process is then restored from separate compressed log files of applications, clicks, impressions, and resumes. The general flow is:
- Derive (Resume, Applied-Job) pair for every job seeker from the application log
- Join the (Resume, Applied-Job) pair with the click log to retrieve click info
- Join (Resume, Applied-Job, Click) tuple with the impression log to restore all recommended jobs, regardless of jobs being applied to or not
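The three steps above can be sketched with plain Scala collections. This is only an illustration under stated assumptions: the record types, field names, and join keys (`seekerId`, `jobId`) are hypothetical, since the article does not give the log schemas, and the production pipeline runs on Spark DataFrames rather than in-memory collections.

```scala
// Hypothetical, simplified log records; the real log schemas are not given in the article.
case class Application(seekerId: String, resume: String, jobId: String)
case class Click(seekerId: String, jobId: String)
case class Impression(seekerId: String, jobId: String)
case class Example(resume: String, jobId: String, clicked: Boolean, applied: Boolean)

object RestoreApplications {
  def restore(apps: Seq[Application],
              clicks: Seq[Click],
              impressions: Seq[Impression]): Seq[Example] = {
    // Step 1: (Resume, Applied-Job) pairs for every job seeker in the application log.
    val resumeOf: Map[String, String] = apps.map(a => a.seekerId -> a.resume).toMap
    val applied: Set[(String, String)] = apps.map(a => (a.seekerId, a.jobId)).toSet

    // Step 2: click information, keyed by (seeker, job).
    val clicked: Set[(String, String)] = clicks.map(c => (c.seekerId, c.jobId)).toSet

    // Step 3: walk the impression log so that every recommended job is restored,
    // whether or not it was clicked or applied to. Seekers without a resume are dropped.
    impressions.flatMap { imp =>
      resumeOf.get(imp.seekerId).map { resume =>
        val key = (imp.seekerId, imp.jobId)
        Example(resume, imp.jobId, clicked(key), applied(key))
      }
    }
  }
}
```

Each resulting `Example` keeps the resume text next to a recommended job and records whether that job was clicked or applied to.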
Using five months of resume-search data with queries, impressions, clicks, and conversions, the split between training/testing and validation is 4:1.
Connecting deep learning capability to the Databricks platform on AWS is simple. Once the appropriate package of Java* ARchive files (JARs) is attached, Databricks adds the Analytics Zoo APIs to the classpath. The deep learning model can then be iterated on like any other machine learning model through Databricks' Jupyter notebook interface.
Analytics Zoo Solution
Analytics Zoo is a unified analytics and AI platform open-sourced by Intel. It seamlessly unites Apache Spark*, TensorFlow*, Keras, and BigDL programs into an integrated pipeline that can transparently scale out to large Apache Hadoop* and Spark clusters for distributed training or inference, without needing extra GPU infrastructure.
Databricks on AWS, running Apache Spark and Analytics Zoo, is used to build the end-to-end pipeline for the recommender system, including data integration, feature extraction, model training, and evaluation. The input for the recommender system is a resume-job pair. These pairs are used to train Neural Collaborative Filtering (NCF) [2] recommender models that learn the relationship between jobs and resumes and predict the probability of applying to specific jobs for certain resumes. In the current version, clicks represent positive samples while non-clicks represent negative samples. Figure 1 below illustrates the entire end-to-end pipeline.
The system first reads raw data in Spark as DataFrames, then extracts the features. In the raw datasets, each resume or job provides a document with text content. Vectors of documents are extracted as features to train and test the models.
In this project, we use Global Vectors (GloVe) [3] for word representation. GloVe is an unsupervised learning algorithm that adopts a global log-bilinear regression model to map each word to a vector of real numbers. The vectors are trained on aggregated global word-word co-occurrence statistics from a Wikipedia 2014 corpus. The pre-trained vectors can be downloaded as a zip file from the Stanford NLP Group.
Each document is tokenized and cleaned, and a weighted average of the vectors of its individual words is computed to represent the embedding of the document. Document embeddings show meaningful linear substructures of the document vector space.
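The embedding step can be illustrated with a minimal sketch in plain Scala. Several details here are assumptions rather than the pipeline's actual choices: the cleaning rule (lowercasing and stripping non-alphanumeric characters), the weighting scheme (each word weighted by its count in the document), and the names `DocEmbedding`, `tokenize`, and `embed`. The `vectors` map stands in for the pre-trained GloVe lookup table.

```scala
object DocEmbedding {
  // Lowercase, replace non-alphanumerics with spaces, split on whitespace.
  // A stand-in for the real cleaning step, which the article does not specify.
  def tokenize(text: String): Seq[String] =
    text.toLowerCase
      .replaceAll("[^a-z0-9\\s]", " ")
      .split("\\s+")
      .filter(_.nonEmpty)
      .toSeq

  // Weighted average of word vectors. Assumption: the weight of a word is its
  // count in the document; words missing from the vocabulary are ignored.
  def embed(text: String, vectors: Map[String, Array[Float]], dim: Int): Array[Float] = {
    val counts: Map[String, Float] = tokenize(text)
      .filter(vectors.contains)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size.toFloat }
    val total = counts.values.sum
    val embedding = new Array[Float](dim)
    if (total > 0f) {
      for ((word, count) <- counts; i <- 0 until dim)
        embedding(i) += vectors(word)(i) * count / total
    }
    embedding // all zeros when no word is in the vocabulary
  }
}
```

With a toy two-dimensional vocabulary where `java` maps to (1, 0) and `spark` maps to (0, 1), the text "Java, Java & Spark!" embeds to roughly (0.67, 0.33).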
The system trains a K-means model using native Spark APIs to cluster resumes into several groups. The K-means++ algorithm is chosen to train the model, through Spark's parallel "k-means||" initialization mode.

    import org.apache.spark.ml.clustering.{KMeans, KMeansModel}

    // Cluster the resume embeddings into param.kClusters groups.
    val kmeans = new KMeans()
      .setK(param.kClusters)
      .setSeed(1L)
      .setInitMode("k-means||")
      .setMaxIter(param.numIterations)
      .setFeaturesCol("kmeansFeatures")
      .setPredictionCol("kth")
    val trained: KMeansModel = kmeans.fit(resumeDF)
The system further processes resumes and corresponding jobs in each group, concatenating the embeddings of each resume-job pair as features and converting clicks and non-clicks to positive and negative labels respectively. Finally, the DataFrame of features and labels is transformed into a Resilient Distributed Dataset (RDD) of samples for the optimizer.
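As a minimal sketch of this feature construction, the following plain Scala code concatenates two embeddings and attaches a binary label. The names `LabeledSample` and `PairFeatures` are hypothetical, and the label encoding (1 for a click, 0 for a non-click) is an assumption; a training framework may expect a different encoding.

```scala
// Hypothetical training sample: a feature vector plus a binary label.
case class LabeledSample(features: Array[Float], label: Int)

object PairFeatures {
  // Concatenate the resume and job embeddings into one feature vector.
  // Assumed encoding: a click becomes label 1 (positive), a non-click label 0 (negative).
  def toSample(resumeVec: Array[Float], jobVec: Array[Float], clicked: Boolean): LabeledSample =
    LabeledSample(resumeVec ++ jobVec, if (clicked) 1 else 0)
}
```

Under this scheme, two 50-dimensional embeddings would produce a 100-dimensional feature vector, which would match the 100-unit input of the first dense layer of the model; the embedding size itself is not stated in the article.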
The system then extends Analytics Zoo APIs to build an NCF recommender model for each group (which consists of four dense layers, as shown below) and trains the model. Weights and biases are initialized from a normal distribution with zero mean and a standard deviation of 0.1 to converge faster.

    // Imports assume the BigDL Scala API.
    import com.intel.analytics.bigdl.nn._
    import com.intel.analytics.bigdl.numeric.NumericFloat
    import com.intel.analytics.bigdl.tensor.Tensor

    // Four dense layers (100 -> 40 -> 20 -> 10 -> 2), each followed by ReLU,
    // with a final LogSoftMax over the two classes.
    val model = Sequential[Float]()
    model
      .add(Linear(100, 40, initWeight = Tensor(40, 100).randn(0, 0.1), initBias = Tensor(40).randn(0, 0.1))).add(ReLU())
      .add(Linear(40, 20, initWeight = Tensor(20, 40).randn(0, 0.1), initBias = Tensor(20).randn(0, 0.1))).add(ReLU())
      .add(Linear(20, 10, initWeight = Tensor(10, 20).randn(0, 0.1), initBias = Tensor(10).randn(0, 0.1))).add(ReLU())
      .add(Linear(10, 2, initWeight = Tensor(2, 10).randn(0, 0.1), initBias = Tensor(2).randn(0, 0.1))).add(ReLU())
      .add(LogSoftMax())
The pipeline trains five recommender models in total, and gives an ensemble prediction for each test record.
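The article does not say how the five models are combined, so the following is only one plausible sketch: averaging the click probabilities across models. It assumes that each model emits two log-probabilities, as a final LogSoftMax layer would, and that index 1 corresponds to the click class. Both the class order and the `Ensemble` object are hypothetical.

```scala
object Ensemble {
  // Each inner array holds one model's log-probabilities for (no click, click).
  // Assumption: the ensemble is a simple average of the per-model click probabilities.
  def clickProbability(logProbs: Seq[Array[Float]]): Double = {
    require(logProbs.nonEmpty, "need at least one model output")
    logProbs.map(lp => math.exp(lp(1).toDouble)).sum / logProbs.size
  }
}
```

For instance, if two models assign click probabilities of 0.2 and 0.6 to the same resume-job pair, this averaging yields about 0.4.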
Evaluation and Results
The recommender system is then evaluated on one month of data using two offline evaluation methods: one focused on the quantity of relevant recommendations, and the other on the quality of their ranking.
For quantity evaluation, we measure the number of recommendations that are selected by the job seeker. We apply a precision [4, 5] formula to measure the fraction of relevant instances among the retrieved instances. Precision is an offline way to estimate the click-through rate (CTR) by simulating the ratio of clicks over impressions.
Quality means that the ranking position of the recommendations accurately reflects their relevancy to the resume, meaning the better matching jobs should get higher rankings. Because the current model is a binary ranker, we use mean reciprocal rank (MRR) [4]. MRR is the mean, over multiple queries, of the reciprocal rank of the first relevant instance, which gives a general measure of ranking quality.
Because job seeker behavior usually varies greatly, the final evaluation charts are plotted against the increasing percentile of jobs. We chose not to study recall because it is hard to get authentic false-negative data.
Figure 2 shows offline comparisons between recommender model predictions and search recommendations. The orange diamonds are search metrics, and the blue squares represent the results of the NCF recommender model. The two charts show precision (Figure 2a) and MRR (Figure 2b). By adopting the end-to-end pipeline of the Analytics Zoo solution, we saw improvements of about 10% in MRR and 6% in precision in comparison to search recommendations.
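Written out, the standard definition of precision and its use here as a CTR estimate look as follows. The set names are illustrative labels, not notation from the original study.

```latex
\text{Precision}
  = \frac{\left|\{\text{relevant jobs}\} \cap \{\text{recommended jobs}\}\right|}
         {\left|\{\text{recommended jobs}\}\right|}
  \approx \frac{\text{clicks}}{\text{impressions}}
```

As a made-up example, 3 clicked jobs out of 50 recommended jobs gives a precision of 3/50 = 0.06.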
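MRR can be computed with a few lines of plain Scala. In this sketch each query is represented as a ranked list of Boolean relevance flags (true where the recommended job was clicked). The `RankingMetrics` object and the convention that a query with no relevant result contributes 0 are assumptions, not details from the study.

```scala
object RankingMetrics {
  // Reciprocal of the 1-based rank of the first relevant item, or 0.0 if none is relevant.
  def reciprocalRank(relevance: Seq[Boolean]): Double = {
    val firstHit = relevance.indexOf(true)
    if (firstHit < 0) 0.0 else 1.0 / (firstHit + 1)
  }

  // Mean of the reciprocal ranks over all queries; 0.0 for an empty query set.
  def meanReciprocalRank(queries: Seq[Seq[Boolean]]): Double =
    if (queries.isEmpty) 0.0 else queries.map(reciprocalRank).sum / queries.size
}
```

For three queries whose first relevant result appears at rank 2, at rank 1, and not at all, the reciprocal ranks are 0.5, 1, and 0, so the MRR is 0.5.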
Figure 2: Comparisons between the NCF recommender model predictions and base search recommendations.
This article briefly introduces the challenges Talroo faces with using keywords to recommend jobs and the opportunity of leveraging the rich natural language content of resumes to improve the process. We describe key metrics for success, and then we go through the solution. The final result is an end-to-end deep learning pipeline using Analytics Zoo running in Databricks on AWS to model jobs and resumes through embeddings and predict the chance of a click using the NCF DNN-based recommender. A similar DNN-based recommender system for rich content will likely play a key role in other use cases such as web search and e-commerce; more examples and APIs are in the Analytics Zoo Model Recommendation.
Talroo is a data-driven talent attraction solution designed to help recruiters and talent acquisition professionals get the volume and quality of applications they need to make hires. Through unique audience reach, custom niche networks, and industry-leading client service, Talroo enables companies to find their ideal candidates and reduce cost-per-hire. Talroo has earned a spot on the Inc. 5000 list of fastest-growing companies for five consecutive years. For more information on how your organization can make better hiring decisions, visit Talroo.com