In order to install Wendelin on SlapOS, you need to request a SlapOS instance
from Vifib or reuse an existing one and overwrite it (different SR).
Once you have the webrunner available, create a configuration for
a custom software release:
git clone https://lab.nexedi.com/nexedi/slapos.git
Edit custom_wendelin/software.cfg to contain
version = versions
repository = https://lab.nexedi.com/klaus/wendelin.git
revision = caaf87de2c7e1edc212cd348e04c7da663de9529
revision = 1ec989d62586a1d88808a953c2e3fefc4bbb42ea
Here, two custom branches/forks of the main repositories are used and
required. If you want to substitute the revisions with the newest versions,
please use the klaus/wendelin fork and the portal_callable branch of the
corresponding repository.
You may also want to install Keras/TensorFlow. In this case, adjust the
extends part accordingly. This may, however, cause problems during building
and should be avoided if it is not needed.
Once you have set up the configuration for the software release,
open and build the software release (Home > Open Software Releases > select
custom_wendelin > green arrow button). Note that opening a new software
release erases all previous data and software from the instance, so
make sure you have no data to lose in case you re-purposed a SlapOS instance.
After the software release is built and the services are available, you need
to request a custom frontend for ERP5 to get an IPv4 address for it (the
linked tutorial describes this towards the end of section 2).
Once the front-end is available, add an ERP5 site, configure the database
and fix the consistencies to make ERP5 ready (follow the previously linked
tutorial if unsure what to do).
When ERP5 is ready, install the business template erp5_wendelin_configurator.
This BT contains (almost) everything needed for data ingestion with Wendelin.
Once the business template is installed, go to My Favorites > Configure your Site
> Wendelin. Unselect Jupyter unless you need it and start the configuration. Wait
until the configurator has completed. You should now see a number of new modules
whose names start with "Data".
Finally, go to your preferences (My Favorites > Preferences > Default Site
Preferences > User Interface) and select the source code editor of your choice.
Save and enable (Action Bar) the preference.
Follow the instructions on the Embulk website to install (and test) Embulk.
Install three custom plugins for Embulk, either by pulling them as pre-compiled gems:
embulk gem install embulk-input-filename
embulk gem install embulk-parser-none-bin
embulk gem install embulk-output-wendelin
or by getting the sources from the plugins' repositories and compiling them yourself.
To test the whole setup, well-formatted, easily accessible data is desirable.
Weather data provided by the European Climate Assessment is suitable.
If you want to follow this tutorial closely, go to the ECA&D download page
and download the blended Daily Cloud Cover (CC) data set (a direct download
link is provided there).
Unpack the data into a new directory. Then create an additional directory and
copy only a few files (e.g. CC_STAID000001.txt, CC_STAID000002.txt) into
it. This directory and these files will act as a test set for the ingestion.
You can (and should) also truncate the test files to only contain 20-100 data
lines to make testing and debugging easier.
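If you prefer to prepare the test directory with a small script, here is a rough
sketch; the SRC and DST paths and the 100 line cut-off are placeholders to adjust
to your setup.

# Rough sketch: copy a truncated test subset of the CC data into a separate directory.
# SRC, DST and the 100 line cut-off are placeholders.
import os

SRC = "/path/to/unpacked/cc/data"   # directory with the unpacked CC files
DST = "/path/to/test/data"          # small directory used for the test ingestion
os.makedirs(DST, exist_ok=True)

for name in ("CC_STAID000001.txt", "CC_STAID000002.txt"):
    with open(os.path.join(SRC, name)) as src_file, \
         open(os.path.join(DST, name), "w") as dst_file:
        for line_number, line in enumerate(src_file):
            if line_number >= 100:  # keep only the first lines (header plus some data)
                break
            dst_file.write(line)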
Of course you can also use any other data for ingestion. However, if you
want to follow this tutorial closely, it is suggested to stick to the
weather CC data for now.
In the following, ERP5 component names are marked as such.
For some reason an important Portal Category was not included in the installed
business templates, so it needs to be created manually.
Go to My Favorites > Configure Categories. Search for title %Use%, descend to Big Data,
then descend to Ingestion. Action > Add Category, fill it in and save
(do not validate in this case).
It takes a while for the portal categories to update and become available.
To speed up this process, go to the cache management tool
and press "Clear all cache factories". Now the newly created Portal Category should be available.
This Callable Script is responsible for writing data to a stream.
Go to My Favorites > Portal Callables > Add PyData Script. Fill in the fields, save and
validate. Use the following argument list for the script:
data_chunk=None, out_cc_stream=None, bucket_reference=None
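A minimal sketch of what the body of such a write-to-stream script can look like
is shown here; it assumes that out_cc_stream is the Data Stream document created
for the ingestion and that its appendData method appends the raw chunk received
from Embulk (bucket_reference is not needed in this minimal form).

# Minimal sketch: append the raw chunk received from Embulk to the Data Stream.
# Assumption: out_cc_stream is the target Data Stream document.
out_cc_stream.appendData(data_chunk)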
This component represents the local machine from which the data is acquired.
This component represents the action performed on the raw uploaded data. In this
case we want to write into a Data Stream (i.e. call the Callable Script described two
sections above).
This component represents the type of data within Wendelin, a Data Stream. Originally the data
was in CSV format, hence the name.
Do not forget to fill in Quantity Unit and Use.
The Data Supply component connects the Data Operation (with
its Callable Script) and the Data Product which determines
the format of the data (in this case Data Stream).
Two Data Supply Lines need to be added representing the Data Operation and
Data Product respectively.
For Product or Service you can use the wheel to select the previously created
Data Operation or Data Product.
Go to My Favorites > Manage Ingestion Policies > Add Ingestion Policy.
Fill in the fields, save and validate it.
Then go to Metadata and set the id to weather-cc.
This is important, as it is the name of the API endpoint Embulk will try to reach.
The Ingestion Policy represents the API endpoint Embulk will communicate with.
This Callable Script parses the Embulk tag which provides information about the type
of data uploaded. The names for the automatically created components
Data Ingestion and Data Stream are determined based on this information.
The script also determines the Data Product to use.
Here is the code for copy/paste:
reference_tuple = reference.split('.')
# The tag specified in the embulk configuration
data_product_tag = reference_tuple[-1]
# The Data Product
data_product_reference = "Weather-CC-CSV"
movement_reference = data_product_reference
# Return the movement dictionary used to set up the Data Ingestion
return {'resource_reference' : data_product_reference,
        'reference' : movement_reference}
Embulk only needs a single configuration file telling it where to find the input-file(s)
and how to process them before writing (uploading) them to the specified location.
Create a file upload_wendelin.yml on your local PC (with Embulk installed) containing the Embulk configuration for the three plugins installed above.
Replace /path/to/data/directory to point to the (test) directory containing the prepared
test data files.
Replace XXXXX with your instance and YYYYY with the id
(not reference!) of your Ingestion Policy. Further, fill in your ERP5 user
and password for ZZZZZ and PPPPP.
Now that everything is set up, you are ready to try and upload the example weather data. Run
embulk -J-O run upload_wendelin.yml
Embulk will then try to upload the data. If everything finishes
without errors shown in the Embulk output, Wendelin accepted the data.
Some errors can appear in the Embulk output; if they do, re-check the configuration described above.
If Embulk does not report any errors, check Modules > Data Ingestions. You should see
a new Data Ingestion object.
Go to Modules > Data Streams. Here you should also see a
new object called Weather-CC-CSV. Inspect the stream and verify it has data in it.
In the right column, Total size (bytes) should
not be 0, but should show a few thousand bytes, depending on the amount
of test data you uploaded.
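If Jupyter was installed, the same check can also be done with a short script;
here a rough sketch, assuming the standard data_stream_module id and the getSize()
accessor on Data Stream.

# Rough sketch: list every Data Stream with its reference and size in bytes.
# Assumptions: data_stream_module is the module id, getSize() returns the size in bytes.
portal = context.getPortalObject()
for data_stream in portal.data_stream_module.contentValues(portal_type="Data Stream"):
    print("%s: %s bytes" % (data_stream.getReference(), data_stream.getSize()))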
A quick overview of the different modules and components used for the ingestion: