Thinking of the distributed cloud like a giant computer

In my intro to computer engineering class, my professor, James Ho, emphasized that a computer has two key components: memory and calculation. A calculator just takes inputs, performs a function, and returns outputs, but a computer stores some outputs in an internal state and uses that state in future computations. So one model is: the next state is a function of the inputs and the current state, and the outputs are a function of the inputs and the current state as well.
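
In state-machine terms, that model might look like the little sketch below. The transition functions are placeholders I made up, not anything from the lecture; the point is just that the state is the memory and the two functions are the calculation.

```python
# A minimal sketch of the "memory + calculation" model: a state machine
# where both the next state and the output depend on the current state
# and the input. The transition functions here are placeholders.

class Computer:
    def __init__(self, initial_state):
        self.state = initial_state  # the "memory" half

    def step(self, inputs):
        # the "calculation" half: functions of (inputs, current state)
        output = self.output_fn(inputs, self.state)
        self.state = self.next_state_fn(inputs, self.state)
        return output

    # Placeholder transition functions. A calculator is the special case
    # where next_state_fn ignores everything and output_fn ignores state.
    @staticmethod
    def next_state_fn(inputs, state):
        return state + inputs

    @staticmethod
    def output_fn(inputs, state):
        return state + inputs
```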

But how does this translate to the cloud? In a distributed world, there are a few different components, which are usually computers themselves. We have different types of storage systems: databases, distributed filesystems, block filestores. From one angle, these form tiers of caching (small, fast-access systems like databases or distributed caches versus large, slow-access filestores holding big records). We have systems for data flow: queues, notification services, aggregators, and so on. From my AWS perspective, SNS, SQS, and Kinesis are popular tools for abstracting data movement across systems. We have compute systems, serverless setups that run the logic we deliberately keep out of the data-flow and data-store layers. There are workflow orchestrators and a supply of tools for machine learning and statistics, security support, networking support, and other things that are not coming to mind at the moment.
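
As a rough illustration of that data-movement tier (my own sketch, with a made-up topic ARN and queue URL, not real resources), the boto3 calls look roughly like this:

```python
# A sketch of moving data between systems with SNS and SQS via boto3.
# The topic ARN and queue URL below are hypothetical placeholders.
import json
import boto3

sns = boto3.client("sns")
sqs = boto3.client("sqs")

TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:example-topic"                  # hypothetical
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"    # hypothetical

# Fan data out: one publisher, many subscribed queues or functions.
sns.publish(TopicArn=TOPIC_ARN, Message=json.dumps({"event": "record_created"}))

# Pull data in: a consumer draining its queue.
resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=5)
for msg in resp.get("Messages", []):
    # ...process msg["Body"]...
    sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```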

If we are looking down from above, how do we call this a computer? Do we observe that every database is a computer, that every SQS queue and log aggregator is another computer, and that we are working on computer farms? Or do we pretend we are working with one giant computer, where state is spread across the world through our databases and compute is spread across our serverless setups and our Docker containers? Our networking tools are arguably the first level of the calculation work, since they do a little computation while grouping and routing data. Or maybe they're just really long, slow wires. Machine learning tools and the like can be inputs, and they can also be internal computation steps, taking in the current state plus inputs and giving us results for the future state plus output.
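
Under the giant-computer reading, one "tick" might look like the sketch below: a stateless, Lambda-style handler that treats a database as the machine's memory. The table name, keys, and transition logic are all hypothetical; it is just the state-machine model with the state pushed into the storage tier.

```python
# A sketch of one "tick" of the giant computer: a stateless handler that
# reads the machine's state from a database, computes, and writes it back.
# The DynamoDB table name and transition logic are hypothetical.
import boto3

table = boto3.resource("dynamodb").Table("global-state")  # hypothetical table

def handler(event, context):
    # The current state lives in the storage tier, not in the compute node.
    item = table.get_item(Key={"pk": event["entity_id"]}).get("Item", {})
    state = item.get("state", 0)

    # The calculation half: next state and output from (inputs, state).
    next_state = state + event.get("increment", 0)
    output = {"entity_id": event["entity_id"], "new_state": next_state}

    # Persist the next state for future "ticks".
    table.put_item(Item={"pk": event["entity_id"], "state": next_state})
    return output
```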

Going in a different direction, how do we apply techniques like pipelining? Are we in a slow world where explicit checks (if statements at the software level) are the name of the game? The timing on data transfer is long and variable: milliseconds in a good case, maybe microseconds in the best case, but hardly the single-digit nanoseconds we can handle at the gate level. Could we even pipeline? If A sends requests to B at a high rate (one every 50 nanoseconds), how controlled are the arrival times? Can we keep the messages in order? The abyss between A and B might be 5 milliseconds wide, but can we satisfy a hardware pipeline with a 50 nanosecond clock cycle, one message per cycle? Or will we get two messages in one cycle and an empty cycle on the next? Is there even a point?
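
Putting rough numbers on that abyss (my own back-of-the-envelope using the hypothetical figures above, not measurements):

```python
# How many messages are "in the wire" between A and B if A sends one
# every 50 ns and the one-way latency is 5 ms? These are the made-up
# numbers from the paragraph above, not measured values.
one_way_latency_s = 5e-3   # 5 ms abyss between A and B
send_interval_s = 50e-9    # A sends a request every 50 ns

in_flight = one_way_latency_s / send_interval_s
print(f"messages in flight: {in_flight:,.0f}")  # 100,000
```

A hundred thousand requests in flight before the first one even arrives is a very different beast from a five-stage hardware pipeline.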

Published: 2019-08-11