When laying the foundations of a platform meant to serve both present and future needs, an architect must carefully weigh several primary considerations to ensure the system is not only technically sound but also poised for growth and success.
Here are some key considerations, each paired with critical questions to ask and insightful advice, including considerations for leveraging AI.
Core Considerations
Technology Selection
Latest vs. Stable
Balancing cutting-edge capabilities with the stability of the chosen tech stack ensures reliability. Evaluate trade-offs by considering the technology’s roadmap, compatibility, and system criticality.
Ecosystem and Community Support
Robust ecosystems accelerate development. Assess ecosystems by contributor activity, update frequency, and resource availability.
Insight: Consider emerging technologies aligned with long-term vision. Ensure fallback plans for calculated bets on new tech.
Here are some of the bets we made, including the rationale behind each:
- Open Policy Agent (OPA): OPA is a flexible, efficient policy engine used primarily for authorization, and its flexibility and efficiency have made it a popular choice in Kubernetes (k8s) environments. Its ability to be compiled to WebAssembly (WASM) enhances its versatility.
- DuckDB: DuckDB functions as a high-performance, client-side database that supports full client-driven analytics. It stands out for its great future potential and compatibility with traditional approaches, providing a reliable fallback.
- LangChain: LangChain acts as an abstraction layer between service code and both internal and external models. This role is crucial for providing flexibility and simplicity, streamlining the integration of future models with minimal refactoring. Its open-source nature, coupled with adoption by well-known entities, underscores its value.
- Edge, Functions, and WASM: This trio delivers content and logic as close to the client as possible. The rationale is to minimize latency, which is critical for user experience, while offloading functions to scalable platforms where you pay only for consumption. Because binary code can run on edge PoPs and in client browsers via WASM, and function services support additional languages, this strategy eliminates the need to build out global Points of Presence (PoPs) ourselves.
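The LangChain bet above hinges on keeping service code decoupled from any single model provider. A minimal sketch of that abstraction-layer idea in Python (class and method names are illustrative, not LangChain's actual API):

```python
from abc import ABC, abstractmethod

class ModelBackend(ABC):
    """Common interface so services never call a provider SDK directly."""
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class EchoBackend(ModelBackend):
    """Stand-in backend; a real one would wrap an internal or external model."""
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

class ModelRouter:
    """Services depend on the router; swapping models becomes a config change."""
    def __init__(self, backends: dict, default: str):
        self.backends = backends
        self.default = default

    def complete(self, prompt: str, model: str = "") -> str:
        return self.backends[model or self.default].complete(prompt)

router = ModelRouter({"echo": EchoBackend()}, default="echo")
print(router.complete("hello"))  # echo: hello
```

Adding a new model means registering one more backend; no service code changes, which is the "minimal refactoring" property the bet is about.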
Service Architecture
Scalability
Design for both vertical and horizontal scaling. Consider auto-scaling, stateless design, and load balancers. Embrace Kubernetes (k8s), functions and PaaS where possible.
APIs, Flexibility and Modularity
Use an API-first mindset with well-defined interfaces between services and externally. Ensure consistency in API structure, and enforce service contracts between services. Use an efficient and well-defined model for communication between services like gRPC and protobuf. Ideally, auto-generate OpenAPI specs and publish.
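In practice, enforcing a service contract means the schema, not the caller, decides what a valid message looks like. A toy stdlib sketch of that idea (in a real system this would be a protobuf message generated from a shared .proto file; the type and fields here are made up):

```python
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class CreateTicketRequest:
    """Hypothetical request type; the schema is the contract."""
    title: str
    priority: int

    def validate(self) -> None:
        # Contract enforcement: reject payloads that violate the schema.
        if not self.title:
            raise ValueError("title is required")
        if self.priority not in range(0, 4):
            raise ValueError("priority must be 0-3")

def decode(payload: str) -> CreateTicketRequest:
    req = CreateTicketRequest(**json.loads(payload))
    req.validate()
    return req

req = decode(json.dumps({"title": "Login fails", "priority": 2}))
print(asdict(req))
```

Because both sides share one definition, an invalid payload fails at the boundary rather than deep inside a service.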
Maintainability
Support evolution with well-defined interfaces and documentation. Adopt standards and review processes. Publish API specs and keep them current (automation makes this sustainable).
Tip: Design a robust service mesh for easier communication management and improved observability.
Our considerations:
- 100% cloud native
- don’t assume state, embrace stateless
- require service contracts and enforce them
- API-first thinking
- Enable, don’t just write docs, explain reasoning
- Autogenerate as much as possible
- Leverage service mesh and mTLS
Infrastructure and Operations
Cloud-native Considerations
Expertise in cloud services is crucial. Understand cost models, best practices, scaling capabilities, and complexities. If you’re starting clean, prefer cloud-native constructs, including containers and functions, over monoliths and VMs. Educate yourself on the latest technologies, and don’t let yourself be doomed to reverting to legacy constructs. VMs are dead, long live cloud-native (well, at least for any modern app).
Infrastructure as Code (IaC)
Minimize manual configurations and prioritize the definition of deployments and configurations through code. Leverage management tools like Terraform.
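Tools like Terraform work by diffing a declared desired state against actual state and applying only the delta. A toy illustration of that declare-then-reconcile model (resource names are invented):

```python
# Desired state lives in code and review; actual state is what the cloud has.
desired = {"vpc-main": {"cidr": "10.0.0.0/16"}, "sg-web": {"port": 443}}
actual = {"vpc-main": {"cidr": "10.0.0.0/16"}, "sg-old": {"port": 80}}

def plan(desired: dict, actual: dict) -> dict:
    """Compute the change set, the way an IaC tool builds its plan."""
    return {
        "create": sorted(set(desired) - set(actual)),
        "destroy": sorted(set(actual) - set(desired)),
        "update": sorted(k for k in desired.keys() & actual.keys()
                         if desired[k] != actual[k]),
    }

print(plan(desired, actual))
# {'create': ['sg-web'], 'destroy': ['sg-old'], 'update': []}
```

The payoff is that every change is a reviewable diff, not an untracked console click.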
Monitoring
Service visibility is crucial to any deployed service. Ensure you have adequate monitoring tools and the necessary visibility. Consider forcing users to troubleshoot through the monitoring tools instead of direct access to ensure the visibility is adequate.
Security in Development Process and CI/CD
Integrate security from the start. Use security scanning tools, secure access controls, and developer training.
Advice: Leverage cloud-native technologies while monitoring costs closely. Try before you buy with various cloud vendors.
Tip: If you’re a startup, many cloud providers offer startup programs which can provide up to $200K+ in service credits.
Our considerations:
- leverage fully cloud-native and cloud services (f*ck monolithic constructs)
- use PaaS where possible/necessary (e.g., Kafka, Dynamo, BigQuery, etc.)
- service mesh for communication between services
- full CI/CD and code-driven deployments for all platform and infrastructure
Object Modeling and Definition
Type Definition Language
Use a common language and structure to define all object types across the system. Validate schemas, and ensure structure and style are aligned with a style guide. Write a style guide (you can find ours HERE). Explore opportunities to leverage this definition language to drive things like tooltips and other items.
Code Generation
While using a standard object type definition language, write generation tools using things like Jinja to auto-generate necessary files (e.g., protos, gRPC specs, APIs). The more this is automated, the less error-prone the process becomes.
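A minimal sketch of that generation step, using the stdlib `string.Template` as a stand-in for a Jinja template (the message and field names are illustrative):

```python
from string import Template

# Render a gRPC-style .proto message from a language-neutral type definition.
PROTO_TMPL = Template("message $name {\n$fields}\n")

def render_proto(name: str, fields: list) -> str:
    """fields: list of (field_name, field_type) pairs from the type definition."""
    body = "".join(f"  {ftype} {fname} = {i};\n"
                   for i, (fname, ftype) in enumerate(fields, start=1))
    return PROTO_TMPL.substitute(name=name, fields=body)

print(render_proto("Ticket", [("id", "string"), ("priority", "int32")]))
```

One definition file can then feed many generators (protos, API stubs, docs) so the artifacts never drift apart.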
Structure
Follow object and class structure best practices. Embrace polymorphism, and collapse types where traits are uniform and interfaces are similar. Use inheritance to ensure all child object types have the attributes of their parents. Leverage mixins, literally or figuratively, and prefer them over redefining the same attributes in multiple types.
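A small Python sketch of the mixin-plus-inheritance pattern described above, with multi-tenancy baked into the base of the object model (the type names are illustrative):

```python
import uuid

class TenantScoped:
    """Mixin: multi-tenancy defined once, at the root of the object model."""
    def __init__(self, tenant_id: str, **kwargs):
        super().__init__(**kwargs)
        self.tenant_id = tenant_id

class Auditable:
    """Mixin: every object gets a stable id without redefining it per type."""
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.id = uuid.uuid4().hex

class WorkItem(TenantScoped, Auditable):
    """Parent type; all children inherit tenant_id and id automatically."""
    def __init__(self, title: str, **kwargs):
        super().__init__(**kwargs)
        self.title = title

class Ticket(WorkItem):
    def __init__(self, severity: int, **kwargs):
        super().__init__(**kwargs)
        self.severity = severity

t = Ticket(severity=1, title="Login fails", tenant_id="acme")
print(t.tenant_id, t.title, t.severity)  # acme Login fails 1
```

Traits like tenancy and auditability live in exactly one place, so new child types cannot forget them.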
Insight: Define a markup language which can be used to auto-generate necessary code. Learn more with How to Write an Object Model That Doesn’t Suck
Our considerations:
- build multi-tenancy into the object model
- build a unified object type definition language and auto-generation platform
- auto-generate and standardize as much as possible
- start small, iterate
Data Management
Choosing the Right Data Platform
The choice is driven by the data types, use cases, and performance needs. Consider consistency, availability, and partition tolerance (CAP theorem), along with the interaction methods, and choose the right platform. We segment this into 3 areas: intermediate (e.g., Kafka), transactional, and analytics/data warehousing. Leverage multiple platforms and services where necessary to reduce risk and ensure you’re using the right platform for the job.
Choosing the Right Data Types
Use the correct storage medium/format to ensure the best possible experience. This will vary by the data platform being leveraged. Consider extensibility, optimization support (e.g., compression, sparse data), read/write amplification, and overheads. Consider formats like Parquet for analytics and Avro for flexible data structures.
Data Privacy
Data privacy and controls are crucial when it comes to any system of record (SOR). Go above and beyond what is required and leverage encryption, backups, multi-factor authentication (AuthN), and granular authorization (AuthZ). Embrace a zero-trust security model, and plan for General Data Protection Regulation (GDPR) and other standards. Data is the new gold; ensure all threat vectors are covered.
Data Integrity and Consistency
Implement transactional mechanisms or eventual consistency in NoSQL databases. For analytics/data warehouse cases, ensure as little lag as possible and do everything possible to ensure data quality (e.g., avoid missing data/duplication).
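One common transactional mechanism for avoiding duplication is the idempotent write: key each record by a unique event id so that a retried delivery is harmless. A stdlib sketch using SQLite (table and ids are invented):

```python
import sqlite3

# A unique event id plus ON CONFLICT makes replayed messages harmless,
# avoiding duplicate rows when an upstream pipeline retries delivery.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_id TEXT PRIMARY KEY, payload TEXT)")

def record(event_id: str, payload: str) -> None:
    with conn:  # transaction: commits on success, rolls back on error
        conn.execute(
            "INSERT INTO events VALUES (?, ?) "
            "ON CONFLICT(event_id) DO NOTHING",
            (event_id, payload),
        )

record("evt-1", "created")
record("evt-1", "created")  # retried delivery: no duplicate row
count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)  # 1
```

The same pattern (dedupe on a stable key, write in a transaction) applies whether the sink is a warehouse table or a NoSQL store with conditional puts.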
Insight: Explore hybrid storage solutions for various data needs.
Our considerations:
- plan to store everything, indefinitely
- leverage best-of-breed format(s) at the right area (e.g., avro, parquet)
- keep cost overheads as low as possible
- provide a solution for customer data sovereignty (they store/own their data)
- build multi-tenancy into the object model
- never leak or lose data!
Developer Experience and Productivity
Ease of Development and Deployment
Use Docker and Kubernetes for streamlined processes. Leverage GitHub and explore the utilization of AI assists like Copilot.
Extensibility and Customization
Enable platform adoption through extensibility. Implement plugin architectures or SDKs. Prioritize the use of shared components, prefer making things generically vs. single use-case.
Avoid Black Box and Knowledge Gaps
Ensure developers all have the necessary context and promote active sharing and collaboration. The more context they have, including the more visibility on what others are doing, the more productive all can be.
Eliminate Noise
One of the most annoying things for any developer is context switching or being interrupted with something potentially irrelevant or a generic status question. Look for tools that simplify focus, cluster similar items, and prioritize correctly. Look for a tool that automates status updates and notifications downstream. (hint: DevRev)
Tip: Encourage documentation and knowledge sharing to foster innovation.
Our considerations:
- developer time is expensive: automate wherever possible and let them focus
- prioritize objectively based on customer demand and impact
- focus on shared components and reuse
- build the platform we want, and love, to use (aka DevRev)
User and Operational Security
Comprehensive Access Control
Implement fine-grained access control, with full visibility of actions and audit trails.
Identity Provider (IdP)
Use a cloud-based IdP like Okta, Auth0, Google, etc. Ensure all authentication is centrally controlled and all app access flows through this choke point. This ensures immediate removal of access in the case of malicious activity.
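The choke-point idea is that every app validates a centrally issued, signed token instead of managing its own credentials, so revoking at the IdP revokes everywhere. Real deployments would use OIDC/JWT via the IdP's libraries; this stdlib HMAC version (secret and claims invented) only illustrates the flow:

```python
import base64, hashlib, hmac, json, time

SECRET = b"shared-signing-key"  # in reality: the IdP's signing key material

def issue(subject: str, ttl: int = 3600) -> str:
    """Central issuer: sign the claims so apps can verify without a DB hit."""
    claims = json.dumps({"sub": subject, "exp": int(time.time()) + ttl})
    body = base64.urlsafe_b64encode(claims.encode()).decode()
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"

def verify(token: str) -> dict:
    """Every app calls this; access dies centrally when tokens stop issuing."""
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise PermissionError("bad signature")
    claims = json.loads(base64.urlsafe_b64decode(body))
    if claims["exp"] < time.time():
        raise PermissionError("expired")
    return claims

print(verify(issue("alice"))["sub"])  # alice
```

Short token lifetimes are what make the central removal immediate: once the IdP stops issuing, access expires on its own.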
Data Segregation
Use encryption and secure multi-tenancy architectures.
Harden the User
Teach and enable employees to follow best practices and embrace a paranoid mindset when it comes to security and privacy.
Advice: Regularly update security practices to adapt to new threats.
Our considerations:
- Use Okta for SSO/Auth, ditch AD (FSMO roles are a dirty word in the new age)
- Create structured groups and access roles in apps and map to IdP groups
- build multi-tenancy into the object model
- leverage mTLS between services
- define everything via IaC and enforce reviews/approvals/compliance
- start compliance regulation early (e.g., SOC2)
- protect the user from themselves
Performance and User Experience
Edge Computing and Caching
Use CDNs and edge solutions for performance and scale. Try to push as much as possible out to the edge.
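Pushing content to the edge mostly comes down to cache policy: long TTLs for immutable assets the CDN can serve, no caching for personalized responses. A small sketch of such a policy map (the values are illustrative choices, not a standard):

```python
# Per-asset-class Cache-Control policy; the CDN honors these headers.
CACHE_POLICY = {
    "static": "public, max-age=31536000, immutable",  # hashed JS/CSS/images
    "api": "private, no-store",                       # per-user data at origin
    "page": "public, max-age=60, stale-while-revalidate=600",
}

def cache_headers(asset_class: str) -> dict:
    """Default to no-store so unknown asset classes are never cached."""
    return {"Cache-Control": CACHE_POLICY.get(asset_class, "no-store")}

print(cache_headers("static")["Cache-Control"])
```

Fingerprinted asset names (e.g., app.3f9a.js) are what make the year-long `immutable` TTL safe: a new build produces a new URL.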
Client-Side Technologies
Evaluate the benefits of offloading work to the client side.
Insight: Continuously optimize user experience through monitoring and feedback.
- Leverage client-side computation via WASM and DuckDB
- Use CDN POPs to scale delivery network vs. build
- Push as much logic as possible to the edge (client or CDN)
Additional Considerations for AI Integration
AI Readiness
Ensure data infrastructure can support AI and ML workloads with scalable storage and compute resources. How can your infrastructure adapt to the dynamic needs of AI/ML models?
Data Quality and Availability
High-quality, accessible data is crucial for training effective AI models. What strategies will ensure data quality and accessibility for AI applications?
Ethical and Responsible AI
Incorporate ethical considerations and transparency in AI applications to build trust. How will you address potential biases and ethical concerns in AI models?
AI Governance
Implement governance frameworks for AI usage, model validation, and continuous learning. What processes are in place for model governance, validation, and lifecycle management?
Embrace a mindset of continuous learning and improvement, be open to revisiting decisions as technologies and requirements evolve, and remain focused on delivering value. Incorporating AI into your architecture requires thoughtful preparation in data management, ethical considerations, and system flexibility to harness AI’s full potential responsibly.
To learn more about our actual design and decisions, see No Legacy, No Limits - The DevRev Blueprint for a Tech Utopia.