Data Language

Overseed uses CUE to define data attributes and their behavior.



What Is CUE, and Why Use It?

At the highest level, and for our purposes, let us call CUE a strongly typed JSON language.

At first, we used a custom JSON schema until CUE came on our radar. CUE offered a structured language with data validation, JSON compatibility, and types.

We chose CUE because it allowed us to do the following:

  1. Allow users to define structures with types as we would in a programming language.
  2. Allow us to build and publish reusable types with an associated data behavior (specs).
  3. Extend the language to add things like arithmetic operators (+ - / *).
  4. Keep a JSON-based format and JSON compatibility.

For example, here is how we can define a person object in CUE.

    {
        person: {
            first_name: "ana"
            last_name:  "moe"
            age:        33
            address: {
                street:    "1234 St."
                state:     "CA"
                country:   "United States"
                latitude:  37.33712269999999
                longitude: -121.8898271
            }
        }
    }

Does that look like JSON? Yes! Because CUE is a superset of JSON.

However, that is not all; CUE also lets us define a person type!

    // person definition
    #Person: {
        first_name: string
        last_name:  string
        age:        int
        address: {
            street:  string
            state:   string
            country: string
            location: {
                latitude:  >=-90 & <=90 & number
                longitude: >=-180 & <=180 & number
            }
        }
    }
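To make the difference between a value and a type concrete, here is a minimal sketch of checking a value against the definition by unifying the two with `&`. The field name `ana` is hypothetical, and the sketch only relies on standard CUE behavior, not on anything specific to Overseed.

```cue
// Same definition as above, repeated so the snippet is self-contained.
#Person: {
	first_name: string
	last_name:  string
	age:        int
	address: {
		street:  string
		state:   string
		country: string
		location: {
			latitude:  >=-90 & <=90 & number
			longitude: >=-180 & <=180 & number
		}
	}
}

// Unifying a concrete value with #Person checks every field
// against its declared type and range.
ana: #Person & {
	first_name: "ana"
	last_name:  "moe"
	age:        33
	address: {
		street:  "1234 St."
		state:   "CA"
		country: "United States"
		location: {
			latitude:  37.33712269999999
			longitude: -121.8898271
		}
	}
}
```

Evaluating this with the `cue` command-line tool (for example `cue vet`) should succeed. Changing `latitude` to a value such as 137.3, or `age` to a string, would be reported as a conflict with the definition.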

Great, but how does this relate to data generation?



Data Generation

Overseed converts schemas into data.

  • For the person object, the attributes were static. So if we asked for ten instances of data, we would get the same person ten times.
  • For the person definition, we can generate data by declaring an attribute whose type is our definition.

    // person definition type
    #Person: {
        first_name: string
        last_name:  string
        age:        int
        address: {
            street:  string
            state:   string
            country: string
            location: {
                latitude:  >=-90 & <=90 & number
                longitude: >=-180 & <=180 & number
            }
        }
    }

    // attribute customer is of type person
    customer: #Person

So how will Overseed generate data from the above schema?

  • All attributes with only a type definition will return a random value for that type.
    • Attributes with type string will return a random string.
    • Attributes with type int will return a random value within the range of an int.
  • All attributes with a range definition will return a random number within that range.
    • For example, latitude may return -55.123.

So if we asked for ten instances of data from the above, we would get ten different objects. Note that the data may not be meaningful for fields that only declare a type, since we have not defined any behavior for those attributes beyond the type itself.

Below we show two instances that may be output from the schema.

    // sample output (two instances)
    [
      {
        "customer": {
          "address": {
            "country": "quas",
            "location": {
              "latitude": -24.002786221589716,
              "longitude": 90.89133895923777
            },
            "state": "omnis",
            "street": "vel"
          },
          "age": -6429151865872097000,
          "first_name": "dignissimos",
          "last_name": "et"
        }
      },
      {
        "customer": {
          "address": {
            "country": "enim",
            "location": {
              "latitude": 80.82559773676047,
              "longitude": -103.13865144814562
            },
            "state": "totam",
            "street": "vel"
          },
          "age": -3734884868815636500,
          "first_name": "aperiam",
          "last_name": "aut"
        }
      }
    ]
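The very large negative ages in this output follow from `age` being declared as a bare `int`. As a small sketch of the range rule described earlier, `age` could be bounded the same way as `latitude` and `longitude`. The bounds 0 and 120 are illustrative, the definition is trimmed for brevity, and this assumes Overseed applies range definitions to `int` fields just as it does to `number` fields, which this page does not confirm.

```cue
// Trimmed-down person definition, for illustration only.
#Person: {
	first_name: string
	last_name:  string
	// Bounded like latitude and longitude in the full definition.
	age: >=0 & <=120 & int
}

// attribute customer is of type person
customer: #Person
```

Under that assumption, generated ages would fall between 0 and 120, while the name fields would still be random strings.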

OK, how do we get better data?

Next, in the Data Design section, we look at how to design our data with behavior in mind.




Data Design