Import Data

Data is imported into the Lucid recommender system via simple API calls. You can batch-load data on a regular cadence or continuously to ensure the most pertinent and effective recommendations. All data is uploaded to the API in CSV file format.

The typical steps in maintaining your data are:

  • Periodically batch-load your entity and interaction datasets

  • Tag datasets

  • List datasets you have previously uploaded

  • Delete or archive older datasets no longer needed

Loading data via API

A single endpoint handles all data-related operations. The Lucid backend automatically determines the type of data you are uploading (e.g., entities or interactions).

Data is loaded into the Lucid API through the /data_batch endpoint. Uploads to this endpoint must meet the following format requirements:

  • Data must be in CSV format.

  • Files must be 1GB or less and have no more than 10 million rows.

  • Both entity and interaction data can be uploaded to the same endpoint. They will be automatically differentiated depending on the column names present.

  • The first row must contain the column names:

    • For entity data, the following column names are required: id, type, name, metadata, media_url.

    • For interaction data, the following column names are required: type, source_id, target_id, timestamp, metadata.

    • Additional columns in the CSV files are permitted but will be ignored.

  • Each row in the files represents a single entity or interaction.

  • If an id is repeated (either in the same file or in a subsequent file), it will overwrite the prior version.
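
As an illustration, the short sketch below writes two minimal CSV files with the required header rows. The file names, entity types, and metadata values are hypothetical examples, not required values.

```python
import csv

# Hypothetical entity file with the required header row:
# id, type, name, metadata, media_url
with open("entities.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "type", "name", "metadata", "media_url"])
    writer.writerow(["sku-123", "product", "Trail Running Shoe",
                     '{"brand": "Acme"}', "https://example.com/shoe.jpg"])

# Hypothetical interaction file with the required header row:
# type, source_id, target_id, timestamp, metadata
with open("interactions.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["type", "source_id", "target_id", "timestamp", "metadata"])
    writer.writerow(["purchase", "user-42", "sku-123",
                     "2024-05-01T12:00:00Z", '{"channel": "web"}'])
```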

Some spreadsheet programs, such as Microsoft Excel, may add special characters or cause formatting issues when exporting files to CSV format. If you have issues loading data through the API, open your files in a basic text editor to validate their format and header names.
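
A minimal upload sketch is shown below. The base URL, authentication header, and multipart field name are assumptions for illustration; substitute the values from your account.

```python
import requests

# Assumed values for illustration; substitute your own endpoint and API key.
API_BASE = "https://api.lucid.example.com"
API_KEY = "YOUR_API_KEY"

def upload_batch(csv_path):
    """Upload one CSV file to the /data_batch endpoint."""
    with open(csv_path, "rb") as f:
        response = requests.post(
            f"{API_BASE}/data_batch",
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"file": f},  # assumed multipart field name
        )
    response.raise_for_status()
    # Each upload returns batch_id, timestamp, tags, entity_count, interaction_count.
    return response.json()

print(upload_batch("entities.csv"))
print(upload_batch("interactions.csv"))
```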

Tagging your data

When uploading data, you can optionally add one or more tags to the dataset. Tags make managing your data easier, for example when filtering datasets or combining multiple datasets.

For example, let's say you want to train a recommender system for all of your e-commerce products. In addition, you want to train a second, independent recommender system to recommend blog posts to your users. When uploading datasets, you can add tags such as product-data or blog-data to differentiate the datasets. When training your recommender, you can then specify the tags that each model should use as input.

It is highly recommended that you tag all datasets. If no tags are provided, then a default tag will automatically be applied.
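
For example, a tag might be attached to an upload as sketched below. Whether tags are sent as a form field, query parameter, or JSON body depends on the API, so treat the field name here as an assumption.

```python
import requests

API_BASE = "https://api.lucid.example.com"  # assumed base URL
API_KEY = "YOUR_API_KEY"

# Upload the product catalog with a tag so it can be selected
# later when training the product recommender.
with open("entities.csv", "rb") as f:
    response = requests.post(
        f"{API_BASE}/data_batch",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": f},
        data={"tags": "product-data"},  # assumed form field for tags
    )
response.raise_for_status()
print(response.json()["tags"])
```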

Getting dataset info

Once you've loaded batches of data, you can query their size and details by tag or by batch_id. Sending a GET request to the /data_batch endpoint returns the fields below. The same information is also returned each time you upload a batch.

  • batch_id: A unique ID assigned to this data batch.

  • timestamp: The timestamp for when the data was uploaded.

  • tags: A list of user-provided tags associated with this batch.

  • entity_count: The number of entities in the batch.

  • interaction_count: The number of interactions in the batch.
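
A sketch of querying batch information is shown below. The query parameter names (tags, batch_id) and the shape of the response (a list of batch records when filtering by tag) are assumptions based on the fields above.

```python
import requests

API_BASE = "https://api.lucid.example.com"  # assumed base URL
API_KEY = "YOUR_API_KEY"
headers = {"Authorization": f"Bearer {API_KEY}"}

# Look up all batches carrying a given tag (parameter name assumed).
by_tag = requests.get(f"{API_BASE}/data_batch",
                      headers=headers, params={"tags": "product-data"})
by_tag.raise_for_status()
for batch in by_tag.json():
    print(batch["batch_id"], batch["entity_count"], batch["interaction_count"])

# Or look up a single batch by its ID (parameter name assumed).
one = requests.get(f"{API_BASE}/data_batch",
                   headers=headers, params={"batch_id": "abc123"})
one.raise_for_status()
print(one.json())
```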

Loading data from BigQuery

Often the source data lives in Bigtable or another cloud-based data store. You may be able to write a simple query that periodically extracts your data and formats it for loading into Lucid's API. BigQuery is a common tool for such data loads.
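
As one possible approach, the sketch below runs a BigQuery query, writes the results to a CSV with the required interaction columns, and leaves the file ready to upload. The project, dataset, table, and column names are hypothetical.

```python
import csv
from google.cloud import bigquery

client = bigquery.Client()  # uses your default GCP credentials

# Hypothetical query mapping warehouse columns onto Lucid's interaction schema.
sql = """
    SELECT 'purchase' AS type,
           user_id     AS source_id,
           product_id  AS target_id,
           event_time  AS timestamp,
           TO_JSON_STRING(STRUCT(channel)) AS metadata
    FROM `my-project.analytics.purchases`
    WHERE event_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
"""

with open("interactions.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["type", "source_id", "target_id", "timestamp", "metadata"])
    for row in client.query(sql).result():
        writer.writerow([row["type"], row["source_id"], row["target_id"],
                         row["timestamp"], row["metadata"]])

# The resulting file can then be uploaded to /data_batch as described above.
```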

Enterprise customers can contact us about having Lucid connect directly to their backend datasets, so that our servers query the data directly without going through the API.
