UVM Compute

Project Partners

Josh Bongard

Project Summary

In Progress (Started June 2022)


UVM Compute is a project to create a general-purpose supercomputer out of the spare capacity of computers visiting a web page that runs its script. This is much like SETI@home (https://setiathome.berkeley.edu/) and other projects that use spare computing power to work on tasks. UVM Compute was started in 2021 using Pyodide; it is nearly complete but still needs more work.

Details

The requirements for UVM Compute are purposefully minimal to make it easier for the open source community to work with. The workflow for getting research code to execute safely in the browser went through multiple evolutions until we discovered the Pyodide project. We chose AWS mainly because we weren't sure exactly what the code base would need in the end, and it's a realistic choice for others who may want to launch their own version of UVM Compute. For example, educators or small research groups with small budgets would likely host this in the cloud if they were to adopt it.
 

The stack for the application is Node.js, which runs the React.js website and the server API. The queue is backed by a Redis database that holds requests waiting to be validated, encrypted, or saved. The database used to store "research jobs" (the uploaded Python code) is MySQL, though it could easily be adapted for MongoDB or something else, because the heavy lifting is done in the browser; a modern browser is the only remaining requirement to complete the application data flow.
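As a rough illustration, a research job record moving through this stack might look something like the following TypeScript interface; the field names here are assumptions for the sketch, not the project's actual schema.

```typescript
// Illustrative shape of a "research job" record as stored in MySQL.
// These field names are assumptions, not the project's real schema.
interface ResearchJob {
  id: string;                 // unique job identifier
  pythonSource: string;       // the uploaded Python code (encrypted at rest)
  status: "queued" | "running" | "complete" | "failed";
  resultBlob: Buffer | null;  // raw text result, filled in when the job finishes
  createdAt: Date;
}
```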

Any overabundance of server activity is handled by placing requests in a queue so the backend can validate, encrypt, and save each request in turn without holding up the website that the user sees.
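A minimal sketch of that pattern, assuming a Redis list accessed through the ioredis client; the queue name and pipeline steps are placeholders for whatever the real backend does:

```typescript
import Redis from "ioredis";

const redis = new Redis(); // assumes a reachable Redis instance

// API side: push the raw request onto the queue immediately so the
// HTTP handler returns without blocking the website.
async function enqueueRequest(payload: object): Promise<void> {
  await redis.lpush("pending-requests", JSON.stringify(payload));
}

// Background consumer: pop one request at a time and run the
// validate/encrypt/save steps in turn.
async function consumeRequests(): Promise<never> {
  while (true) {
    const item = await redis.brpop("pending-requests", 0); // blocks until work arrives
    if (!item) continue;
    const request = JSON.parse(item[1]);
    // validate(request); encrypt(request); save(request); // hypothetical steps
  }
}
```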
 
 
The Pyodide package is a virtual machine of sorts that we run in a web worker, which keeps it contained and allows the process to be managed. If an error occurs or the process takes too long, the monitoring code can shut down and restart the entire Python runtime, freeing the browser from the background process. There are also a few throttles that slow the rate at which new jobs are processed if the connection is poor, to limit bandwidth.
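The watchdog pattern described here might look roughly like this on the main thread; the worker script name, message shapes, and time budget are assumptions for the example:

```typescript
// Pyodide runs inside a web worker; the main thread restarts the whole
// runtime if a job errors out or exceeds its time budget.
let worker = new Worker("pyodide-worker.js"); // hypothetical worker script

function runJob(pythonCode: string, timeoutMs = 60_000): Promise<string> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      worker.terminate();                       // kill the stuck Python runtime
      worker = new Worker("pyodide-worker.js"); // boot a fresh one for the next job
      reject(new Error("job timed out"));
    }, timeoutMs);

    worker.onmessage = (e: MessageEvent) => {
      clearTimeout(timer);
      e.data.error ? reject(new Error(e.data.error)) : resolve(e.data.result);
    };

    worker.postMessage({ code: pythonCode });
  });
}
```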
 
WebGPU (https://gpuweb.github.io/gpuweb/) is another interesting technology that is still under development; it is the W3C specification for GPU access on the web. It started development in 2017 and has recently seen updates to its draft. The Pyodide community (and several others) are ready to jump on WebGPU as soon as it's implemented, as it has been a much-requested feature on their GitHub: https://github.com/pyodide/pyodide/issues/1911
 
Right now, results are saved as a blob in the MySQL database. We didn't have time to create a very sophisticated export process for the data. For this version, results can be anything that's returned from the research program as text, such as an array, an object, JSON, a comma-separated list, etc. That data is packaged as a .txt file that the researcher can download. Lab members had excellent input about feeding those results back into the same program, or setting up a series of programs to further process the results, but unfortunately that was outside our timeline.
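For illustration, handing that .txt file to the researcher from the browser could look like the sketch below; the endpoint path is hypothetical:

```typescript
// Fetch a stored result and trigger a .txt download in the browser.
async function downloadResult(jobId: string): Promise<void> {
  const response = await fetch(`/api/results/${jobId}`); // hypothetical endpoint
  const text = await response.text(); // whatever text the research program returned
  const url = URL.createObjectURL(new Blob([text], { type: "text/plain" }));
  const link = document.createElement("a");
  link.href = url;
  link.download = `${jobId}-results.txt`;
  link.click();
  URL.revokeObjectURL(url);
}
```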
 
Currently only one job can be processed at a time in a browser; however, I anticipated wanting to run more in the future and have left space for that to be implemented (see the sketch below). Pyodide is executed within a web worker, and it would be fairly simple to manage two workers instead of one. What is not implemented is for the controller to look at available resources and decide whether multiple workers are appropriate for the device.
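One possible heuristic for that missing controller step, using the browser's hardwareConcurrency hint; the cap of two workers and the worker script name are assumptions:

```typescript
// Sketch of the unimplemented controller step: inspect the device
// before deciding how many Pyodide workers to spin up.
function chooseWorkerCount(): number {
  const cores = navigator.hardwareConcurrency || 1;
  return Math.min(2, Math.max(1, cores - 1)); // leave headroom for the page itself
}

const workers = Array.from(
  { length: chooseWorkerCount() },
  () => new Worker("pyodide-worker.js") // same hypothetical worker script as above
);
```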
 
To be specific, only one job is ever being worked on in the browser at one time. If other jobs are loaded, they are waiting in the queue to be run next or are being sent back to the server as results.
 
As for threading within the researcher's code, this is not allowed by the Pyodide virtual machine; the threading modules are listed as included but not working: https://pyodide.org/en/stable/usage/wasm-constraints.html?highlight=threading#included-but-not-working-modules
 
Unfortunately we don't collect partial results, so an interrupted job needs to be restarted. I do my best to save jobs in the browser cache (always encrypted) so that requests can be limited on page refresh or if the device is interrupted in some way (e.g., a poor internet connection). But if I'm running a job on my computer and then turn it off and log on from my phone, that job now needs to be re-downloaded from the server by another browser (maybe my computer, maybe another user's) to be completed.
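A minimal sketch of that encrypted caching step, using the Web Crypto API and localStorage; the storage format and key handling here are simplified assumptions, not the project's actual scheme:

```typescript
// Cache a job encrypted in the browser so a page refresh doesn't
// force a re-download from the server.
async function cacheJob(jobId: string, code: string, key: CryptoKey): Promise<void> {
  const iv = crypto.getRandomValues(new Uint8Array(12));
  const ciphertext = await crypto.subtle.encrypt(
    { name: "AES-GCM", iv },
    key,
    new TextEncoder().encode(code)
  );
  // Store the IV and ciphertext together under the job's id.
  localStorage.setItem(
    `job-${jobId}`,
    JSON.stringify({
      iv: Array.from(iv),
      data: Array.from(new Uint8Array(ciphertext)),
    })
  );
}
```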
 
 
We currently only support Python, via the Pyodide virtual machine, but other language runtimes compiled to WebAssembly are available like Pyodide and could be implemented in a similar way.