We also want to have some data to work with. A simple way to get that is to generate it using scikit-learn's method to generate ten blobs in a three-dimensional space.
To make it more realistic we also add an 'id' column of strings, which in real life is often the customer id or the IP address of the IoT device, or similar.
Last we write the dataset as a CSV file, which despite being an awful format, is the one I encounter every day.