Within Windmill: Not recommended
This page is part of our section on Persistent storage & databases which covers where to effectively store and manage the data manipulated by Windmill. Check that page for more options on data storage.
Windmill is not designed to store heavy data that extends beyond the execution of a script or flow. Indeed, the worker executing a computation is not necessarily the same as the one that ran the previous computation, so the data would have to be retrieved from another location.
Instead, Windmill is very convenient to use alongside data storage providers to manipulate large amounts of data.
There are however internal methods to persist data between executions of jobs.
States and Resources
Within Windmill, you can use States and Resources to store a transient state that can be represented as a small JSON value.
States
States are used by scripts to keep data persistent between runs of the same script by the same trigger (schedule or user).
In Windmill, States are considered as resources, but they are excluded from the Workspace tab for clarity. They are displayed on the Resources menu, under a dedicated tab.
A state is an object stored as a resource of the resource type state, which is meant to persist across distinct executions of the same script.
import requests
from wmill import set_state, get_state

def main():
    # Get temperature from last execution (None on the first run)
    last_temperature = get_state()
    # Fetch the temperature in Paris from wttr.in
    response = requests.get("http://wttr.in/Paris?format=%t")
    # Drop the unit suffix and convert to a number so the comparison is numeric
    new_temperature = float(response.text.strip().rstrip("°CF"))
    # Set current temperature to state
    set_state(new_temperature)
    # Compare last_temperature and new_temperature
    if last_temperature is None:
        return "No previous temperature to compare with."
    elif last_temperature < new_temperature:
        return "The temperature has increased."
    elif last_temperature > new_temperature:
        return "The temperature has decreased."
    else:
        return "The temperature has remained the same."
States are what enable Flows to watch for changes in most event watching scenarios (trigger scripts). The pattern is as follows:
- Retrieve the last state or, if undefined, assume it is the first execution.
- Retrieve the current state in the external system you are watching, e.g. the list of users having starred your repo or the maximum ID of posts on Hacker News.
- Calculate the difference between the current state and the last internal state. This difference is what you will want to act upon.
- Set the new state as the current state so that you do not process the elements you just processed.
- Return the differences calculated previously so that you can process them in the next steps. You will likely want to use a for-loop over the items and trigger one Flow per item. This is exactly the pattern used when your Flow is in the mode of "Watching changes regularly".
The convenience functions to do this are:
TypeScript
- getState(), which retrieves an object of any type (internally a simple Resource) at a path determined by getStatePath, which is unique to the user currently executing the Script, the Flow in which it is currently being called (if any), and the path of the Script.
- setState(value: any), which sets the new state.
Please note it requires importing the wmill client library from Deno/Bun.
Python
- get_state(), which retrieves an object of any type (internally a simple Resource) at a path determined by get_state_path, which is unique to the user currently executing the Script, the Flow in which it is currently being called (if any), and the path of the Script.
- set_state(value: Any), which sets the new state.
Please note it requires importing the wmill client library from Python.
Custom Flow States
Custom flow states are a way to store data across steps in a flow. You can set and retrieve a value given a key from any step of a flow, and it will be available globally within that flow. The value is stored in the flow state itself and thus has the same lifetime as the flow job.
It's a powerful escape hatch when passing data as output/input is not feasible and using getResource/setResource would clutter the workspace and make for an inconvenient UX.
TypeScript
import * as wmill from "windmill-client@1.297.0"

export async function main(x: string) {
  await wmill.setFlowUserState("FOO", 42)
  return await wmill.getFlowUserState("FOO")
}
Python
import wmill
#extra_requirements:
#wmill==1.297.0

def main(x: str):
    wmill.set_flow_user_state("foobar", 43)
    return wmill.get_flow_user_state("foobar")
Resources
States are a specific type of resource in Windmill where the type is state and the path is automatically calculated for you based on the schedule path (if any) and the script path. In some cases, you want to set the path arbitrarily and/or use a different type than state. In this case, you can use the setResource and getResource functions. The same resource can be used across different scripts and flows.
- setResource(value: any, path?: string, initializeToTypeIfNotExist?: string): sets a resource at a given path. This is equivalent to setState but allows you to set an arbitrary path and choose a type other than state if wanted. See API.
- getResource(path: string): gets a resource at a given path. See API.
The states can be seen in the Resources section of the Windmill app with a Resource Type of state.
Variables are similar to resources but have no types, can be tagged as secret (in which case they are encrypted by the workspace key) and can only store strings. In some situations, you may prefer setVariable/getVariable to resources.
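Because variables can only store strings, a structured value has to be serialized before it is stored and parsed after it is read. The sketch below shows that round trip with JSON. The `set_variable` and `get_variable` functions here are in-memory stand-ins written for this example, not the wmill client's functions, whose exact signatures are not shown on this page; `save_settings`, `load_settings` and the path `u/alice/settings` are hypothetical as well.

```python
import json
from typing import Any, Dict

# In-memory stand-in for a string-only variable store, for local runs only.
_variables: Dict[str, str] = {}


def set_variable(path: str, value: str) -> None:
    # Mimic the "strings only" constraint described above.
    if not isinstance(value, str):
        raise TypeError("variables can only store strings")
    _variables[path] = value


def get_variable(path: str) -> str:
    return _variables[path]


def save_settings(path: str, settings: Dict[str, Any]) -> None:
    # Serialize the structured value to a JSON string before storing it.
    set_variable(path, json.dumps(settings))


def load_settings(path: str) -> Dict[str, Any]:
    # Parse the stored string back into a Python object.
    return json.loads(get_variable(path))


save_settings("u/alice/settings", {"retries": 3, "region": "eu"})
restored = load_settings("u/alice/settings")
```

If the value is naturally a JSON object and does not need to be secret, a resource is likely the simpler choice since it stores JSON directly.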
In conclusion, setState and setResource are convenient ways to persist JSON between multiple script executions.
Shared Directory
For heavier ETL processes or sharing data between steps in a flow, Windmill provides a Shared Directory feature.
The Shared Directory allows steps within a flow to share data by storing it in a designated folder.
Although the Shared Directory is recommended for passing data between steps within a flow, it's important to note that it requires all steps to be executed on the same worker and that the data stored in it is strictly ephemeral to the flow execution.
To enable the Shared Directory, follow these steps:
- Open the Settings menu in the Windmill interface.
- Go to the Shared Directory section.
- Toggle on the option for Shared Directory on './shared'.
Once the Shared Directory is enabled, you can use it in your flow by referencing the ./shared folder. This folder is shared among the steps in the flow, allowing you to store and access data between them.
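As a rough illustration, one step can write an intermediate file into the folder and a later step can read it back. In a real flow each function below would be the `main` of its own step; they are shown together, under the hypothetical names `write_step` and `read_step`, so the sketch is self-contained. The file name `rows.json` and the `shared_dir` parameter are also assumptions made for this example.

```python
import json
from pathlib import Path
from typing import List


def write_step(rows: List[dict], shared_dir: str = "./shared") -> str:
    """Earlier step: dump intermediate rows to a file in the shared folder."""
    folder = Path(shared_dir)
    folder.mkdir(parents=True, exist_ok=True)  # harmless if it already exists
    target = folder / "rows.json"  # hypothetical file name
    target.write_text(json.dumps(rows))
    # Return only the file name; the heavy data stays on disk.
    return target.name


def read_step(file_name: str, shared_dir: str = "./shared") -> int:
    """Later step: load the rows written earlier and return how many there are."""
    rows = json.loads((Path(shared_dir) / file_name).read_text())
    return len(rows)
```

Passing only the file name as the step result keeps the step outputs small, while the data itself lives in the folder for the duration of the flow execution.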