Overview

Lucid's Recommender is an API-based service providing modern real-time recommendations for many different application domains.

The recommender algorithm is powered by Lucid's graph neural networks, which operate on a knowledge graph representation of your product and user database. Lucid improves recommendation results by learning patterns from user behavior, product attributes, and even photos of your products.

Both collaborative and content-based filtering

Recommender systems are typically classified as being based on either collaborative filtering or content-based filtering:

  • Collaborative filtering: these recommender systems look at past interactions between users and items, identifying patterns that predict likely future actions from a user's history. These methods take advantage of recent behavioral trends across users, and can be trained without knowing the details of the items being recommended. That is also their downside: because collaborative filtering relies solely on historical interactions, it performs poorly for new items or items with little interaction history.

  • Content-based filtering: these systems use attributes and metadata on items and users to learn patterns and make recommendations. Such metadata may include descriptions, item properties, prices, ratings, or even images. Content-based filtering handles the "cold-start" problem better, since it can use attributes to recommend items even with little to no interaction history.

Lucid's graph strategy for recommendations combines both filtering techniques into a single, comprehensive solution. By incorporating both user interactions and item attributes, we are able to build and train on a unified data graph.

Setting up your data

To set up a recommendation model, you need to define your set of entities, each with a set of associated entity properties. From there, you link entities together with interactions. The entities and interactions form a graph, where the entities are the vertices and the interactions are the edges. Note that anything you have data on can be considered an entity, including users!
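As a sketch of this structure, the following illustrative Python builds an adjacency list from a few entity and interaction records. The field names mirror the property tables later in this document; the data and the adjacency-list representation are our own illustration, not part of the Lucid API.

```python
from collections import defaultdict

# Illustrative only: entities are the vertices, interactions are the edges.
entities = {
    "0001": {"type": "Media", "name": "Jurassic Park"},
    "0004": {"type": "User", "name": "User A"},
}
interactions = [
    {"type": "Watched", "source_id": "0004", "target_id": "0001"},
]

# Build an adjacency list: each interaction links two entity vertices.
graph = defaultdict(list)
for edge in interactions:
    graph[edge["source_id"]].append((edge["type"], edge["target_id"]))

print(graph["0004"])  # [('Watched', '0001')]
```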

Graph representation of your products and users

The first step to configuring your recommender is to define your entities and interactions. You can consider everything to be an entity, such as products, product lines, users, countries, or even keywords.

Below we will walk through how to set up a simple video recommender. The interactions between videos and users are what the recommender system learns patterns from. For now, note that both the media (blue) and users (green) are entities, linked by past interactions.

Defining your entities

For example, let's consider a simple video recommendation system being set up through the Lucid recommender tool.

id | type | name | metadata
--- | --- | --- | ---
0001 | Media | Jurassic Park | { "type": "movie", "genre": "sci-fi", "keywords": ["dinosaurs"] }
0002 | Media | Lost World | { "type": "movie", "genre": "sci-fi", "keywords": ["dinosaurs"] }
0003 | Media | Rick and Morty | { "type": "tv-show", "genre": ["sci-fi", "animated"], "keywords": ["dinosaurs"] }
0004 | User | User A | { "preferences": ["action", "animated"] }
0005 | User | User B | { "preferences": ["sci-fi", "mystery"] }

Above we can see that there are two different entity types: media and users. Each entity has a unique ID and an optional name (used for display and analytics only). Additionally, each entity can have associated metadata, in a JSON dictionary format. The metadata represents known information about the entity that may be useful in making a recommendation.

Note that some metadata could instead be represented as entities. For example, we could have instead added the following entities:

id | type | name | metadata
--- | --- | --- | ---
0006 | Genre | sci-fi |
0007 | Genre | animated |
0008 | Genre | action |
0009 | Genre | mystery |

Internally, the Lucid engine will analyse the metadata and may automatically create entities for common entries. However, you can create entities yourself if you want to guarantee that the information is used in the graph structure.
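To make this idea concrete, here is a hypothetical sketch of promoting a metadata field ("genre") to its own entities, linked back to the media entity. The `has_genre` interaction type and the ID scheme are our own invention for illustration, not a Lucid convention.

```python
# Hypothetical sketch: promote a metadata field ("genre") to its own entities.
media = {"id": "0003", "type": "Media", "name": "Rick and Morty",
         "metadata": {"type": "tv-show", "genre": ["sci-fi", "animated"]}}

genre_entities = {}   # genre name -> entity record
interactions = []     # links from the media entity to genre entities
next_id = 6

for genre in media["metadata"]["genre"]:
    if genre not in genre_entities:
        genre_entities[genre] = {"id": f"{next_id:04d}", "type": "Genre",
                                 "name": genre}
        next_id += 1
    # "has_genre" is an illustrative interaction type, not a Lucid convention.
    interactions.append({"type": "has_genre",
                         "source_id": media["id"],
                         "target_id": genre_entities[genre]["id"]})

print([e["id"] for e in genre_entities.values()])  # ['0006', '0007']
```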

The following table shows the properties that an entity may have:

Property | Format | Description
--- | --- | ---
id | string | A unique identifier for this entity. If another entity is created with the same ID, it overwrites any previous versions.
type | string | A string indicating the type of entity, e.g. "product", "user", or "country". Recommendations can be filtered downstream based on the type.
name | string | A human-readable name for the entity (optional).
description | string | A description for this entity (optional).
tags | list[str] | A list of tags associated with this entity, used for indexing and tracking (optional).
metadata | JSON | JSON metadata describing details and attributes of the entity. The JSON should be a dictionary with string keys. The values may be numbers, strings, or lists of numbers or strings.
url | string | A URL linking to this entity.
media_url | string or list[string] | If the entity has an associated image or video, a URL or list of URLs may be provided. The Lucid engine will read the media to process and index it as part of the graph.
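Putting these fields together, a complete entity record might look like the following sketch. The field names come from the table above; all of the values, including the URLs, are illustrative placeholders.

```python
import json

# Illustrative entity record using the fields documented above.
entity = {
    "id": "0001",
    "type": "Media",
    "name": "Jurassic Park",                      # optional
    "description": "1993 sci-fi adventure film",  # optional
    "tags": ["catalog-2022"],                     # optional
    "metadata": {"type": "movie", "genre": "sci-fi",
                 "keywords": ["dinosaurs"]},
    "url": "https://example.com/titles/0001",               # placeholder URL
    "media_url": ["https://example.com/posters/0001.jpg"],  # string or list
}

print(json.dumps(entity, indent=2))
```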

Defining your interactions

With your entities in place and their metadata defined, you can now specify the interactions between them. Continuing with our video recommendation engine, here are example interactions:

interaction_type | source_entity_id | target_entity_id | timestamp
--- | --- | --- | ---
Watched | 0004 (User A) | 0001 (Jurassic Park) | 2022-05-12
Watched | 0004 (User A) | 0002 (Lost World) | 2022-08-25
Watched | 0005 (User B) | 0001 (Jurassic Park) | 2022-07-01
Watched | 0005 (User B) | 0003 (Rick & Morty) | 2022-08-16
Favorited | 0005 (User B) | 0003 (Rick & Morty) | 2022-08-16

The following table lists all properties associated with interactions:

Property | Format | Description
--- | --- | ---
type | string | A string indicating the type of interaction, e.g. "purchased", "clicked", or "watched". Recommendations can be filtered downstream based on the type.
source_id | string | The source entity of this interaction.
target_id | string | The target entity of this interaction.
timestamp | datetime | A timestamp associated with this interaction.
metadata | JSON | JSON metadata describing details and attributes of the interaction. The JSON should be a dictionary with string keys. The values may be numbers, strings, or lists of numbers or strings.
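A complete interaction record using these fields might look like the following sketch. The field names come from the table above; the timestamp format and the `watch_duration_minutes` metadata key are illustrative assumptions.

```python
import json

# Illustrative interaction record using the fields documented above.
interaction = {
    "type": "Watched",
    "source_id": "0004",               # User A
    "target_id": "0001",               # Jurassic Park
    "timestamp": "2022-05-12T00:00:00Z",          # assumed ISO-8601 format
    "metadata": {"watch_duration_minutes": 127},  # illustrative attribute
}

print(json.dumps(interaction, indent=2))
```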

Understanding your graph

Your entities and interactions let Lucid's recommender build a graph representation of your data and its connections. The following illustrates the example graph we built.

If we had expanded out some of the metadata tags, we would have a more complex graph that contains more interactions. Note how complex these graphs can get with just a few entities!

Adding images or media

A key value-add of Lucid's algorithms is the ability to incorporate visual information into the recommendation engine. By adding pictures of your products or content, our backend can further improve its accuracy.

You can add one or more images to your entities by using the media_url field. The value can be either a string or an array of strings (for multiple URLs corresponding to multiple images). When data is loaded, Lucid will scrape the images and process them into your trained model index.

Note that many sites rate-limit how many images can be downloaded per second by a user. You may need to whitelist Lucid's servers to bypass these limits, or provide URLs specifically meant to bypass rate limits. Lucid will make best efforts to get the media, but will skip downloading content if it repeatedly fails.

Size and scope of your data model

Your application may involve thousands or millions of users, products, media, or other types of entities. Lucid's graph recommender backend can scale to whatever size you need. It's not uncommon for many different entity types to be defined, with millions of rows in the entity and interaction tables.

We do recommend minimizing the number of entity types when possible. When a model is trained, you can then query it for recommendations based on the entity type (e.g., give me the top media recommendations for all user entities). If you have too many entity classes, it may constrain your recommendations.

Finally, the Lucid backend will help filter data for you. As you stream more training data through our APIs, it can be filtered to only keep the most recent interactions or to remove obsolete entities. This will keep your recommendations and trends fresh.
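Lucid applies this filtering server-side; as a client-side sketch of the same idea, the following keeps only interactions newer than a cutoff date (the data and cutoff are illustrative):

```python
from datetime import datetime, timezone

interactions = [
    {"type": "Watched", "source_id": "0004", "target_id": "0001",
     "timestamp": "2022-05-12T00:00:00+00:00"},
    {"type": "Watched", "source_id": "0005", "target_id": "0003",
     "timestamp": "2022-08-16T00:00:00+00:00"},
]

cutoff = datetime(2022, 7, 1, tzinfo=timezone.utc)

# Keep only interactions at or after the cutoff, so stale behavior
# does not dominate the trained model.
recent = [i for i in interactions
          if datetime.fromisoformat(i["timestamp"]) >= cutoff]

print(len(recent))  # 1
```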

Training a model

Good news! Setting up your data model and collecting the entities and interactions is the hard part. Once you have your data configured with Lucid's API, training the model and getting insights is easy.

After triggering the training API, our servers will churn through your entities and interactions, building an optimized recommendation model using a graph neural network. Once the model is trained, recommendations and trends can be instantly queried for each entity or entity type.

Several options exist for model training, such as data filters and data sources. Please see the section on model training for more details.

Querying recommendations

After training your model you'll want to get your recommendations out. Recommendations can either be read on a per-entity basis (e.g., one API call per entity, often done in real-time) or in batches (e.g., one API call to get all user recommendations).

The following shows an example of a recommendation query to pull video recommendations for all users:

{
    "model_id": "<your model id>",
    "source_type": "user",
    "target_type": "video",
    "max_recommendations": 100,
    "output_format": "csv"
}

Essentially, we are asking the trained model to provide us with all video recommendations for each user entity, with up to 100 per user. Configuring the query parameters lets us generate a wide variety of different recommendation sets with a single trained model!
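The same query can be assembled programmatically. This sketch only builds and serializes the payload using the parameters shown above; the endpoint and transport are not shown, since they depend on your Lucid API setup, and the `build_batch_query` helper is our own illustration.

```python
import json

def build_batch_query(model_id, source_type, target_type,
                      max_recommendations=100, output_format="csv"):
    """Assemble a batch recommendation query payload (illustrative helper)."""
    return {
        "model_id": model_id,
        "source_type": source_type,
        "target_type": target_type,
        "max_recommendations": max_recommendations,
        "output_format": output_format,
    }

payload = build_batch_query("<your model id>", "user", "video")
print(json.dumps(payload, indent=2))
```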

This is a key advantage of the graph neural network approach to recommendation systems: a single trained model can provide many different varieties and combinations of recommendations, trends, and insights.

In addition, once the model is trained, querying the results is almost instantaneous. Thus, you can be assured of low-latency results for all your site integrations.
