Typical software workflows gather, process, and store information. They are triggered by events and can pass information to other workflows. Workflows come in all shapes and sizes, ranging from simple to very complex with many moving parts.
Information, in the form of data, can come from anywhere, local or remote, via communication protocols that format it into a processable form. Executing logic on the gathered data changes an application's state. The resulting state then needs to be stored for use by other applications. Storage options range from fast but temporary to slower but longer-lasting.
The issue is that "data" is not the same everywhere. Data shared with other organizations differs from data shared within an organization. Crossing organizational boundaries matters.
AI Assistants, like OpenAI's ChatGPT, are here. Blockchains, thanks to the crypto boom, are also here. AI can calculate and analyze data, while blockchains can store data in an immutable and fault-tolerant vault. AI Assistants can answer questions about data and can be accessed from anywhere via an API. Ethereum blockchain smart contracts can perform logic and keep data safe from modification and system failure for a long time while permitting access from anywhere.
AI Assistants should process, and smart contracts should store, data that can be easily verified. AI Assistants will take on more responsibility for data processing as costs decrease and trust in AI builds. Blockchain smart contracts will store more data, faster, as costs decrease and performance improves.
I developed a proof-of-concept application cluster in which an OpenAI GPT-4o Assistant stores shared data in a Web3 Ethereum distributed ledger, implemented as a smart contract running on the Energy Web Foundation (EWF) Volta blockchain. The Energy Web Foundation supports the Volta testnet and the EWC mainnet, fast proof-of-authority Ethereum blockchains designed for the energy sector.
A C# application acts as a data pump that gathers and formats data before passing it to the AI Assistant using MQTT messaging. The AI Assistant analyzes the data and initiates actions that result in shared state data being sent to the blockchain.
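To make the data-pump flow concrete, here is a minimal publish-side sketch assuming the MQTTnet client library; the broker address, topic name, and payload shape are illustrative assumptions, not the proof of concept's actual values.

```csharp
// Minimal data-pump sketch using the MQTTnet client library (an assumption;
// broker address, topic, and payload shape are illustrative).
using System;
using System.Text.Json;
using System.Threading.Tasks;
using MQTTnet;
using MQTTnet.Client;

public static class DataPump
{
    public static async Task Main()
    {
        var client = new MqttFactory().CreateMqttClient();

        var options = new MqttClientOptionsBuilder()
            .WithTcpServer("broker.example.com", 1883) // hypothetical broker
            .Build();

        await client.ConnectAsync(options);

        // Gather and format a reading, then hand it to the AI Assistant's input topic.
        var payload = JsonSerializer.Serialize(new
        {
            deviceId = "meter-42",          // hypothetical device
            kilowattHours = 12.7,
            timestampUtc = DateTime.UtcNow
        });

        var message = new MqttApplicationMessageBuilder()
            .WithTopic("assistant/input")   // hypothetical topic
            .WithPayload(payload)
            .Build();

        await client.PublishAsync(message);
        await client.DisconnectAsync();
    }
}
```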
State data is divided into confidential and shareable data.
Shareable data is stored in smart contract hash tables that map onto SQL tables. This enables straightforward extract, transform, and load (ETL) into a local Oracle MySQL database or Microsoft Excel for analysis and display. Capacity depends on several factors, but a smart contract holding around 100,000 records of non-trivial complexity should be feasible, and capacity grows as the data structures are simplified. Shared data is transparent and available across organizational boundaries; it cannot be changed and remains available for as long as the blockchain is alive, providing an indelible history that can be analyzed and audited. Data access is controlled by the smart contract, so roles and permissions should be addressed before contract deployment.
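To make the hash-table-to-SQL mapping concrete, here is a minimal sketch of the "load" step, assuming the MySqlConnector library; the SharedRecord shape, table name, and column names are hypothetical, not the actual contract schema.

```csharp
// ETL "load" step sketch: a C# record mirroring a hypothetical smart contract
// entry is inserted into a local MySQL table. MySqlConnector is assumed;
// the record shape, table, and columns are illustrative.
using System.Threading.Tasks;
using MySqlConnector;

public sealed record SharedRecord(byte[] CustomerHash, string Status, long Timestamp);

public static class EtlLoader
{
    public static async Task LoadAsync(SharedRecord rec)
    {
        await using var conn = new MySqlConnection(
            "Server=localhost;Database=shared_ledger;Uid=etl;Pwd=...;"); // hypothetical
        await conn.OpenAsync();

        // CustomerHash is the 256-bit key that links back to the confidential SQL data.
        using var cmd = new MySqlCommand(
            "INSERT INTO shared_records (customer_hash, status, ts) VALUES (@h, @s, @t);",
            conn);
        cmd.Parameters.AddWithValue("@h", rec.CustomerHash);
        cmd.Parameters.AddWithValue("@s", rec.Status);
        cmd.Parameters.AddWithValue("@t", rec.Timestamp);
        await cmd.ExecuteNonQueryAsync();
    }
}
```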
Confidential data is stored and managed in a secure local or remote SQL database. Confidential data needed for referential integrity, such as customer identity, is protected in the smart contract using 256-bit hash keys.
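A sketch of how a confidential identifier can be reduced to a 256-bit hash key before anything reaches the contract; SHA-256 is assumed here for illustration (an Ethereum contract might equally use Keccak-256), and the salt handling is a hypothetical addition, not part of the proof of concept.

```csharp
// Deriving a 256-bit hash key from a confidential identifier. SHA-256 is
// assumed for illustration; a contract-side lookup might instead use Keccak-256.
using System.Security.Cryptography;
using System.Text;

public static class HashKeys
{
    public static byte[] ForCustomer(string customerId)
    {
        // Low-entropy identifiers are guessable from their hash alone, so a
        // secret salt/pepper (kept with the confidential SQL data) is prudent.
        const string pepper = "secret-pepper"; // hypothetical; store securely
        return SHA256.HashData(Encoding.UTF8.GetBytes(pepper + customerId));
    }
}
```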
Partitioning and clustering based on the confidentiality of data enhances safe data sharing and collaboration across organizational boundaries. The simplest approach is binary: confidential (i.e., private) vs. shared (i.e., public). Many use cases, such as distributed energy resources, could benefit from system architectures that are designed up front to protect what should be protected and share what should be shared.
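One way to make the binary partition explicit is to split each domain record into a confidential part and a shared part at the type level, so nothing crosses the boundary by accident; the field names below are illustrative, not the proof of concept's actual schema.

```csharp
// Binary partition expressed at the type level: the confidential half stays in
// SQL, the shared half (keyed by the 256-bit hash) goes to the smart contract.
// Field names are illustrative.
public sealed record ConfidentialCustomer(
    string CustomerId,      // never leaves the SQL database
    string Name,
    string BillingAddress);

public sealed record SharedUsage(
    byte[] CustomerHash,    // 256-bit hash key linking back to ConfidentialCustomer
    double KilowattHours,
    long TimestampUtc);
```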
A cluster architecture built on advanced AI and blockchains enables components that keep data in different locations for different reasons, while taking advantage of the decreasing cost of machine intelligence. Let AI do what it does best, SQL do what it does best, and blockchains do what they do best. All working together.
As a system grows, clusters can be cloned, with multiple smart contracts running on one blockchain or across multiple blockchains. Partitioning a cluster to stay within the data limits of a single smart contract keeps complexity at bay as long as shared data is limited. Using public blockchains for production may or may not be prudent, but they are cost-effective for beta systems.
Separation of data storage into "can see" and "no can see" makes sense if the goal is for wider "can see" collaboration in a world where danger lurks around every virtual corner. Data can't be found if it's not there, or well hidden.
Dave Hardin