Feed Structure

Definitions

Feed is the term we use for the process of collecting data from a data source. Feeds run on a schedule, make requests to a data source, and then save the responses to those requests in our databases. Every feed has an entry point, the root. A feed creates tasks and also has chain logic (branches) and write logic (leaves).

Tasks - A task contains all of the information needed to extract data from a given source. This can be anything: a set of countries, a domain, a list of URLs. Tasks are sent to chain logic and write logic; depending on the details of the task and of that logic, processing a task will either produce more tasks or write data to our databases.

Root - The root is a special instance of chain logic. When it is time to collect data, the root will receive an empty task, then produce tasks based on business context.

Chain Logic (Branches) - Chain logic is business logic that can result in the production of more tasks or writing to the database.

Write Logic (Leaves) - Write logic is similar to chain logic, but leaves can only write to the database and cannot produce more tasks.
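
The definitions above can be sketched in code. This is a minimal illustration under stated assumptions, not our actual implementation: the `Task` fields, the `run_feed` function, the handler names, and the in-memory `database` list are all hypothetical stand-ins.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Task:
    """Everything needed to extract data from a source (hypothetical shape)."""
    handler: str                      # name of the logic that should process this task
    payload: dict = field(default_factory=dict)
    depth: int = 0                    # recursive depth; the root task is depth 0


# Chain logic (branches) may return new tasks and/or rows to write.
ChainLogic = Callable[[Task], Tuple[List[Task], List[dict]]]
# Write logic (leaves) may only return rows to write, never new tasks.
WriteLogic = Callable[[Task], List[dict]]


def run_feed(chains: Dict[str, ChainLogic],
             leaves: Dict[str, WriteLogic]) -> Tuple[int, List[dict]]:
    """Run a feed from its root; return (tasks produced, rows written)."""
    database: List[dict] = []         # stand-in for the real databases
    produced = 0
    queue = deque([Task(handler="root")])  # the root receives an empty task
    while queue:
        task = queue.popleft()
        if task.handler in leaves:
            database.extend(leaves[task.handler](task))
            continue
        new_tasks, rows = chains[task.handler](task)
        database.extend(rows)
        for new_task in new_tasks:
            new_task.depth = task.depth + 1
            queue.append(new_task)
        produced += len(new_tasks)
    return produced, database


# Example feed: root -> one task per country -> two URLs per country -> write.
def root(task: Task):
    return [Task("by_country", {"country": c}) for c in ("US", "DE")], []


def by_country(task: Task):
    country = task.payload["country"]
    urls = [f"https://example.com/{country}/{i}" for i in (1, 2)]
    return [Task("save", {"url": u}) for u in urls], []


def save(task: Task):
    return [{"url": task.payload["url"], "depth": task.depth}]


produced, rows = run_feed({"root": root, "by_country": by_country}, {"save": save})
```

Here the root is registered as just another chain handler, which mirrors the description of the root as a special instance of chain logic.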

Visualization

It’s useful to think of the feed as a tree, with each level representing the recursive depth a task has reached. Below is a visualization of a simple case: a root that produces 2 tasks, each of which produces 2 tasks that are then handled by the write logic and written to the database. After the feed is complete there will have been:

1 task executed at root (depth 0)
2 tasks executed by chain logic (depth 1)
4 tasks executed by write logic (depth 2)

graph TD
  Root --> Chain1
  Root --> Chain2
  Chain1 --> Write1
  Chain1 --> Write2
  Chain2 --> Write3
  Chain2 --> Write4
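
The per-depth counts for this simple case can be checked with a short sketch. The constants `FAN_OUT` and `MAX_DEPTH` and the function `count_by_depth` are illustrative names chosen to match the example, not part of any real feed.

```python
from collections import Counter

FAN_OUT = 2    # each root/chain task produces 2 tasks, as in the example
MAX_DEPTH = 2  # depth at which tasks are handled by write logic


def count_by_depth(depth=0, counts=None):
    """Walk the example tree and count the tasks executed at each depth."""
    counts = Counter() if counts is None else counts
    counts[depth] += 1
    if depth < MAX_DEPTH:
        for _ in range(FAN_OUT):
            count_by_depth(depth + 1, counts)
    return counts


counts = count_by_depth()
print(dict(counts))  # {0: 1, 1: 2, 2: 4}
```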

<aside> 🎵 This is a simplified example. Because chain tasks are recursive, real feeds can behave in more complex and nuanced ways; the theoretical structure of a feed is a cyclic graph rather than a strict tree.

</aside>

<aside> ⚠️ Any feed that produces 0 tasks is considered failed; however, feeds that write no data are not considered failed by default.

</aside>