How to Write an Object Model That Doesn't Suck

In this post, we cover some object modeling best practices and how to approach writing one.


When we started DevRev, one of the first items we worked on was defining our object model. Having seen a lot of previous object models and schemas for systems like SFDC and Jira (which are #*$&3d), I knew this was a crucial thing to do correctly.

What is an object model? An object model defines the objects in a system, their attributes, and the relationships between them. Think of it as the schema that every object in the system is based upon.

We debated the core (foundational) types for months before we achieved clarity on what they needed to be.

Was it perfect? Absolutely not. However, because we started with a minimal set of types, we didn’t have a lot of throw-away work, as we could enhance and extend the existing types. Key finding: you’ll never get your initial model correct; the key is building a solid foundation.

Insights

This section covers some key findings and best practices we discovered while going through the process, or in retrospect.

Write a style guide

This is one of the most essential items for ensuring consistency across objects on the platform. Just as programming languages have style guides, you should create one for how you model your objects. This ensures objects are consistently named and common constructs are followed.

The following shows some good and bad examples:

Do

//  consistent structure with snake_case naming
"object_attr_foo";
"object_attr_bar";

Don’t

//  snake_case with inconsistent capitalization
"oBjecT_AttR_foo";

//  camelCase
"objectAttrBar";

Do

// ID attribute correctly postfixed with _id
"name": "foo_id",
// correct ID type
"type": "id"

Don’t

"name": "foo_id",
// should be ID type, not string
"type": "string"

Internally we use constructs like:

  • All naming must use snake_case (your preference may vary, key is consistency)
  • Use common postfixes where necessary (e.g., foo_id and bar_id vs. foo_id and bar_idtype)
  • Abstract where possible and use inheritance and/or mixins across object types (see the section below)
  • Keep root objects clean
  • Embed when the object cannot exist on its own, instantiate when it can
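
Rules like these are easiest to keep when they are enforced automatically rather than by review. Here is a minimal Go sketch (not our actual tooling; the helper name is invented) of checking the snake_case and _id conventions from the list above:

package main

import (
	"fmt"
	"regexp"
	"strings"
)

// snakeCase matches lower_snake_case identifiers.
var snakeCase = regexp.MustCompile(`^[a-z][a-z0-9]*(_[a-z0-9]+)*$`)

// validateAttrName checks two of the rules above: consistent snake_case
// naming and the _id postfix for ID-typed attributes. (Hypothetical helper.)
func validateAttrName(name, fieldType string) error {
	if !snakeCase.MatchString(name) {
		return fmt.Errorf("attribute %q is not snake_case", name)
	}
	if fieldType == "id" && name != "id" && !strings.HasSuffix(name, "_id") {
		return fmt.Errorf("ID attribute %q should end with _id", name)
	}
	return nil
}

func main() {
	fmt.Println(validateAttrName("object_attr_foo", "string")) // <nil>
	fmt.Println(validateAttrName("oBjecT_AttR_foo", "string")) // error: not snake_case
	fmt.Println(validateAttrName("owner", "id"))               // error: should end with _id
}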

You can view an actual sample of our style guide HERE.

Use a markup language to define object types and leverage code generation

Building upon the style guide, we built an internal framework named Archetype for authoring schemas, performing validations, and driving code generation (e.g., Protos, RPCs, etc.). Traditionally, people may author schemas directly in a .proto file; however, that is very manual and doesn’t account for other things you may care about like UI hints, API visibility, etc.

This ended up being one of the most beneficial items, as we were able to generate .proto files for types, the corresponding Go code, MongoDB helper functions, OpenAPI specs, and a ton of other items. It took a little time to build up front; however, the time savings have made it a strategic enabler for us.

The following shows a snippet of this:

- name: atom
  fields:
    - name: id
      devrev_field_type: id
      is_required: true
      is_system: true
      is_immutable: true
      description: Globally unique DevRev Object Name (DON)
      gateway:
        api_required: true
        api_visibility: public
        description: Globally unique object ID.
        id_resolution: none
        summary: true
      ui:
        display_name: ID
        is_hidden: true
    ...

We use this to drive a ton of code generation:

Archetype Definitions
  |
  |--gRPC/Proto
  |   |--.proto
  |   |--Go helpers (via protoc)
  |   |--gRPC (via protoc)
  |
  |--MongoDB
  |   |--Helper methods (e.g., filters, Insert, Update, Delete, etc.)
  |
  |--Gateway
  |   |--OpenAPI spec
  |   |--Resolvers
  |
  |--Documentation
  |   |--Markdown
  |
  |--UI
  |   |--Validations
  |   |--Tool tips
  |   |--Visibility

// TODO: I hope to contribute this to the community so others can use it.
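
To give a feel for how little glue this needs, here is a hedged sketch (not our actual generator) that reads definitions like the one above with gopkg.in/yaml.v3 and emits skeletal .proto messages. The struct fields mirror only the keys shown in the snippet, and the file path is hypothetical:

package main

import (
	"fmt"
	"os"

	"gopkg.in/yaml.v3"
)

// Field and ObjectType mirror only the keys shown in the snippet above;
// the real definitions also carry gateway, UI, and other metadata.
type Field struct {
	Name        string `yaml:"name"`
	Type        string `yaml:"devrev_field_type"`
	Required    bool   `yaml:"is_required"`
	Description string `yaml:"description"`
}

type ObjectType struct {
	Name   string  `yaml:"name"`
	Parent string  `yaml:"parent"`
	Fields []Field `yaml:"fields"`
}

func main() {
	raw, err := os.ReadFile("objects.yaml") // hypothetical path to the definitions
	if err != nil {
		panic(err)
	}

	var types []ObjectType
	if err := yaml.Unmarshal(raw, &types); err != nil {
		panic(err)
	}

	// Emit a skeletal .proto message per object type. A real generator would
	// map field types properly and also produce Go helpers, MongoDB helpers,
	// OpenAPI specs, documentation, and UI hints.
	for _, t := range types {
		fmt.Printf("message %s {\n", t.Name)
		for i, f := range t.Fields {
			fmt.Printf("  // %s\n  string %s = %d;\n", f.Description, f.Name, i+1)
		}
		fmt.Println("}")
	}
}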

Start small and iterate

Spend the time to narrow down and refine which core object types are essential. I cannot stress this enough. Adding more types down the line is much easier than removing them, as removing items is a breaking change that impacts the APIs, etc. This will also force you to be very critical of new object types.

We originally started with a few base types:

Atom
  |
  |--Work
  |--Part
  |--Org
  |--User
  |--Link

We then evolved and expanded those core types with leaf types:

Atom
  |
  |--Work
  |   |--Issue
  |   |--Ticket
  |--Part
  |   |--Product
  |   |--Capability
  |   |--Feature
  |--Org
  |   |--DevOrg
  |   |--RevOrg
  |--User
  |   |--DevUser
  |   |--RevUser
  |--Link

Over time these types expanded as well:

Atom
  |
  |--Work
  |   |--Issue
  |   |--Ticket
  |   |--Conversation
  |--Part
  |   |--Product
  |   |--Capability
  |   |--Feature
  |   |--Runnable
  |   |--Linkable
  |--Org
  |   |--DevOrg
  |   |--RevOrg
  |--User
  |   |--DevUser
  |   |--RevUser
  |--Link
  |--Artifact
  |   |--Emoji
  |--Campaign
  |--...

The key here is to start with a good foundation and build on top of it. You don’t know what you don’t know, so there’s no sense in trying to come up with every possible type. Originally I had tried to model all the types, but we brought it back to the core types and iterated. I’m glad we did, as the product and the needs of our users changed, which led to a divergence from the types we had assumed we would need.

Use inheritance and class modeling constructs

This will help with consistency and uniformity across objects. For example, we have a root base “class” called Atom which all objects inherit from. This root base class defines the common attributes we want on every object (e.g., the attribute used for multi-tenancy, object versioning, etc.).

If a pattern like this isn’t followed, required items risk being forgotten in individual definitions. We do this at multiple levels (we limit it to four to keep things simple). In our schema definition (shown below), we reference the parent, which is pulled in during generation:

- name: work
  # inherits from atom
  parent: urn:devrev:objects:atom
  fields:
  ...
- name: issue
  # inherits from work which inherits from atom
  parent: urn:devrev:objects:work
  fields:
  ...

If you look at the previous structure, you can see the inheritance hierarchy:

Atom
  |
  |--Work
  |   |--Issue
  |   |--Ticket
  |   |--Conversation
  |--Part
  |   |--Product
  |   |--Capability
  |   |--Feature
  |   |--Runnable
  |   |--Linkable
  |--Org
  |   |--DevOrg
  |   |--RevOrg
  |--User
  |   |--DevUser
  |   |--RevUser
  |--Link
  |--Artifact
  |   |--Emoji
  |--Campaign
  |--...
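
In generated code, this kind of hierarchy maps naturally onto composition. A minimal Go sketch, with field names invented for illustration, of how Atom’s common attributes flow down to Work and Issue:

package main

import "fmt"

// Atom holds the attributes every object inherits:
// identity, tenancy, and versioning. (Illustrative fields only.)
type Atom struct {
	ID       string // globally unique DON
	DevOrgID string // tenancy key
	Version  int
}

// Work inherits from Atom via embedding.
type Work struct {
	Atom
	Title   string
	OwnerID string
}

// Issue inherits from Work, which inherits from Atom.
type Issue struct {
	Work
	Priority string
}

func main() {
	iss := Issue{}
	iss.ID = "don:core:dvrv-us-1:devo/55:issues/789" // promoted from Atom
	iss.Title = "Fix login flow"                     // promoted from Work
	iss.Priority = "p1"
	fmt.Printf("%+v\n", iss)
}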

Types != Instances; model types (generics), not instances

This comes up fairly frequently when teams come up with proposals for new object types. When modeling, they are thinking about what an instance of the object should look like, but not necessarily the type. The two are very different. An object model should define the types which can be used to implement the instances you need.

For example, one recent discussion was on handling surveys for customer satisfaction (CSAT). Right off the bat I knew there would be multiple types of surveys besides CSAT.

Initially, the proposal encoded the survey type into the attribute name; this is bad and doesn’t scale.

Do

  ...
  "id":"don:...:ticket/44",
  "object_type": "ticket",
  // ability to handle multiple survey responses
  "surveys": [
    {
      "type": "CSAT",
      // these can be kept disjoint and resolved
      "response":"good",
      ...
    },
    {
      "type": "feedback",
      "response":"foobar",
      ...
    }
  ...

Don’t

  ...
  "id":"don:...:ticket/44",
  "object_type": "ticket",
  "csat_respose": "good"
  ...

As new survey types come along, nothing needs to be restructured and we have uniformity. Instead, we turn an early-binding, build-time item into something late-binding, customizable, and run-time, providing much more flexibility.

Pro-tip: encode things into attribute names sparingly, and try to avoid it where possible.

In another example, say you had a use case where you allow users to attach files to a variety of objects (say foo, bar, and bas). You could model this a few ways:

Do

  • Create a generic attachment type that can be referenced by each type

Don’t

  • Create an attachment type for each object type (foo_attachment, bar_attachment, bas_attachment)

By abstracting things into a single type you ensure consistency; if the type changes, you only have to manage it in a single place. We have an ‘artifact’ object type we created for a scenario exactly like this, which is used for attachments, uploads, knowledge-base content, design docs, etc.
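
A rough Go sketch of the generic approach (field names are illustrative, not our actual schema): a single artifact type pointing at its parent by ID, rather than one attachment type per object:

package main

import "fmt"

// Artifact is a single generic type for any file-like object
// (attachments, uploads, knowledge-base content, design docs, ...).
// It points at its parent by ID instead of being redefined per type.
type Artifact struct {
	ID       string
	ParentID string // DON of the foo/bar/bas object it is attached to
	FileName string
	MimeType string
}

func main() {
	a := Artifact{
		ID:       "don:core:dvrv-us-1:devo/55:artifact/12",
		ParentID: "don:core:dvrv-us-1:devo/55:issues/789",
		FileName: "repro.log",
		MimeType: "text/plain",
	}
	fmt.Printf("%+v\n", a)
	// The bad alternative: foo_attachment, bar_attachment, bas_attachment,
	// each duplicating these fields and each needing its own migration path.
}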

Plan for multi-tenancy in the object

We knew we were building a multi-tenant SaaS platform, so tenancy had to be done correctly. There is a time and place for physical separation of data; however, it isn’t always required, and logical tenancy will do in most cases. Even if you don’t initially plan for multi-tenancy or need it, it’s never a bad idea to plan ahead for it.

Rather than partitioning tenant data at the storage layer, which would have been very inefficient, we built multi-tenancy into every one of our objects. All objects inherit from the root base class (Atom), which includes a tenancy key unique to each tenant (dev_org and rev_org). Using this, we have built-in safeguards to ensure only actors from a given tenant can access that tenant’s objects. Our authorization, mongo-client, and other services all leverage this tenancy key and use it for validations.

Even if there is a case for physical segmentation, it could still leverage this attribute to define which physical partition it goes to.
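
As an illustration of the kind of safeguard this enables, here is a hedged sketch using the official MongoDB Go driver. The helper names are hypothetical (not our actual mongo-client), but the idea is that every query is forced through the tenancy key:

package store

import (
	"context"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
)

// scopeToTenant merges the caller's filter with the tenancy key so a
// query can never cross tenant boundaries, even if the caller forgets it.
func scopeToTenant(devOrgID string, filter bson.M) bson.M {
	scoped := bson.M{"dev_org": devOrgID}
	for k, v := range filter {
		scoped[k] = v
	}
	return scoped
}

// FindForTenant is a hypothetical helper every read path goes through.
func FindForTenant(ctx context.Context, coll *mongo.Collection, devOrgID string, filter bson.M) (*mongo.Cursor, error) {
	return coll.Find(ctx, scopeToTenant(devOrgID, filter))
}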

Remember, you don’t know what you don’t know; doing some of these things initially will allow you to grow and evolve without having to worry about a massive overhaul later.

Just use globally unique IDs

Do IDs need to be globally unique? Unique within a partition? What about in the future? It’s much easier to plan for the worst up front than to rip and replace things down the line. Each ID in our system is universally unique, which gives us flexibility and minimizes the risk of changes down the line. This approach also ensures that any future mergers will not cause issues.

For example, if you have unique object IDs per customer but the IDs are not universally unique, you can have object IDs that overlap across customers (not good):

  • cust A: obj id: aaa
  • cust B: obj id: aaa

Now say cust A and cust B merge and you need to merge their data. What happens? In this scenario there is overlap, so merging will be a problem.

Also, if you do merge, you’d need to do a translation where some IDs change, turning into massive updates, as old ID references would become invalid.

Now, if you had taken the universally unique approach, you wouldn’t have any merge conflicts if you needed to consolidate, as the IDs don’t overlap:

  • cust A: obj id: A/aaa
  • cust B: obj id: B/aaa

Notice how object IDs are prefixed with the customer ID, making each of them unique.

Taking inspiration from the AWS ARN, we created what we call the DevRev Object Name (DON) ID format. The structure of this ID format is similar to the following:

<don>:<service>:<partition>(:<type>/<id>)+

where:

  • <don> is the type of identifier; in the future we may have additional specifications
  • <service> is the name of the service creating the DON. For example, "core" or "identity"
  • <partition> is the name of the well-known instance the SOR is located in
  • <type> is the type of an object in the path
  • <id> is the identifier of the object without a prefix

The following are some examples of DONs:

An issue: don:core:dvrv-us-1:devo/55:issues/789

A comment on the above issue: don:core:dvrv-us-1:devo/55:issues/789:comment/5. Note how this is nested in the issue.

An issue for a different org (partition): don:core:dvrv-us-1:devo/44:issues/789. Note the same issue ID but a different devo ID.

An application: don:identity:dvrv-us-1:app/67. Note there is only one <type>/<id> component.

One powerful thing about this is that, because everything is built upon the core tenancy keys (e.g., dev_org), we get a ton of flexibility, clarity, and overlap avoidance.

For example, look at the following 2 IDs from the above examples:

  • don:core:dvrv-us-1:devo/55:issues/789
  • don:core:dvrv-us-1:devo/44:issues/789

These are issues from two different orgs; they have the same issue ID, but because of the devo component in the DON, both are globally unique. This allows us to keep the per-type IDs clean while still maintaining a globally unique ID format.
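
A small Go sketch of parsing a DON according to the structure above (the type and function names are illustrative):

package main

import (
	"fmt"
	"strings"
)

// DON captures the parts of a DevRev Object Name, e.g.
// don:core:dvrv-us-1:devo/55:issues/789
type DON struct {
	Service   string
	Partition string
	Path      [][2]string // ordered (type, id) pairs, e.g. (devo, 55), (issues, 789)
}

func ParseDON(s string) (DON, error) {
	parts := strings.Split(s, ":")
	if len(parts) < 4 || parts[0] != "don" {
		return DON{}, fmt.Errorf("not a valid DON: %q", s)
	}
	d := DON{Service: parts[1], Partition: parts[2]}
	for _, seg := range parts[3:] {
		typeAndID := strings.SplitN(seg, "/", 2)
		if len(typeAndID) != 2 {
			return DON{}, fmt.Errorf("bad path segment %q in %q", seg, s)
		}
		d.Path = append(d.Path, [2]string{typeAndID[0], typeAndID[1]})
	}
	return d, nil
}

func main() {
	d, _ := ParseDON("don:core:dvrv-us-1:devo/55:issues/789:comment/5")
	fmt.Printf("%+v\n", d)
}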

// TODO: I hope to share more here on our DON format.

Decouple object schemas from the underlying storage implementation

Don’t let the underlying storage platform bias your object models. It is much easier to start with a generic model and then extend it to account for the storage model.

For example, we used a document database (MongoDB) since JSON is the primary payload construct, and document databases are very flexible (optional fields may be omitted, the schema is simple to extend, etc.). Regardless of whether you choose a document store, a key/value store, or a relational database, a good object model should easily mold to each system type.

By abstracting from the underlying storage system, you can enable flexibility to change platforms (if need be).
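
One way to keep that abstraction honest is to hide storage behind an interface the rest of the system codes against. A minimal Go sketch, with illustrative names rather than our actual code:

package store

import "context"

// Issue is the logical object; nothing here is MongoDB-specific.
type Issue struct {
	ID       string
	DevOrgID string
	Title    string
}

// IssueStore is the only contract the rest of the system sees.
// A Mongo-, SQL-, or KV-backed implementation can sit behind it.
type IssueStore interface {
	Get(ctx context.Context, devOrgID, id string) (*Issue, error)
	Upsert(ctx context.Context, issue *Issue) error
}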

TL;DR

  • Write a style guide for consistency
  • Define object types in a markup language and generate code from it
  • Start small and iterate
  • Try things out before making core changes
  • Model types (generics), not instances
  • Use inheritance and class modeling constructs
  • Plan for multi-tenancy in the object
  • Just use globally unique IDs
  • Decouple object schemas from the underlying storage implementation