Other articles

  1. Data wrangling in Pandas and Spark - Time series of energy production

    An important part of a data scientist's everyday work is data wrangling. Input data must be accessed, retrieved, understood, and transformed before machine learning can be applied to create predictive models. While most of the talk these days is about deep learning, this less sexy topic is arguably more important in real-life situations.

  2. Which is the most popular language for data science in 2017?

    While performing a market analysis for a business case, I looked at some trends in the field and ran a quick query to answer the above question "for fun". Since the result was both striking and rather unexpected, I decided to share it with you in this blog post. (Plus, it gave me an excuse to add support for Bokeh ...

  3. Building a data lake 1: Weather and time

    I am currently building a data lake that will be used to improve operations at an energy company using machine learning. Among the many interesting topics, the following are prioritized: Can we predict the energy production of hydroelectric, solar, and wind power plants? Can we predict energy consumption from weather reports? After all, homeowners need to heat their homes more on a cold winter day than on a sunny day in October.

  4. Using Folium to show geographic data

    This post demonstrates how to

    • Display a map from OpenStreetMap using Folium
    • Add custom shape files to define regions of interest
    • Color the regions of interest based on data in a pandas DataFrame
    • (New) Brief introduction to GeoPandas.

    The code of this post is available at https://github.com/rsandstroem/IPythonNotebooks/blob/master/GeoMapsFoliumDemo/GeoMapsFoliumDemo.ipynb .
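
    Independently of that notebook, the steps listed above can be sketched roughly as follows. This is a minimal illustration, not the post's actual code: the two square regions, their names, the `square` and `build_map` helpers, and the output file name are all hypothetical, and a plain GeoJSON dictionary stands in for a real shape file. It assumes the `folium` and `pandas` packages are installed.

```python
def square(name, lon, lat, size=0.5):
    """Return a GeoJSON Feature for a square region (hypothetical stand-in for a shape file)."""
    ring = [
        [lon, lat],
        [lon + size, lat],
        [lon + size, lat + size],
        [lon, lat + size],
        [lon, lat],  # GeoJSON rings repeat the first point to close the polygon
    ]
    return {
        "type": "Feature",
        "properties": {"name": name},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


# Two made-up regions of interest, given as [longitude, latitude] corners.
REGIONS = {
    "type": "FeatureCollection",
    "features": [square("west", 7.0, 46.5), square("east", 8.5, 46.5)],
}


def build_map(values, out_path="regions.html"):
    """Draw REGIONS on an OpenStreetMap base map, colored by `values` (name -> number)."""
    # Imported lazily so the GeoJSON helpers above run without these packages.
    import folium
    import pandas as pd

    df = pd.DataFrame({"name": list(values), "value": list(values.values())})
    m = folium.Map(location=[46.8, 8.2], zoom_start=8)  # OpenStreetMap tiles by default
    folium.Choropleth(
        geo_data=REGIONS,
        data=df,
        columns=["name", "value"],          # key column first, then the value column
        key_on="feature.properties.name",   # where the key lives in each GeoJSON feature
        fill_color="YlGn",
    ).add_to(m)
    m.save(out_path)
    return m
```

    Calling `build_map({"west": 10, "east": 25})` should write an HTML file that can be opened in a browser. The region names in the data must match the `name` property of the GeoJSON features, since that is how the values are joined to the shapes.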

    August 1, 2017: This post is almost a year old, but I decided to make some technical improvements to it for a better viewing experience online. (It is the Swiss National Day after all!) During this renovation, I stumbled upon GeoPandas, a great package worth mentioning in this context.

  5. Simple MongoDB demo

    Introduction

    This blog post is a tutorial that introduces the basics of MongoDB and how to use it within a Python environment. To accomplish this we will use pymongo to connect Python with MongoDB.

    MongoDB is a convenient NoSQL database that uses a JSON-like syntax to store and query documents. I frequently use MongoDB when dealing with streaming data from sensors or APIs. In big data scenarios I find MongoDB powerful because it allows for replication and sharding, thus enabling distributed computing and high availability, but that is a topic for another tutorial!
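
    To give a flavor of what this looks like from Python, here is a minimal sketch rather than the tutorial's own code. It assumes `pymongo` is installed and a MongoDB server is listening on `localhost:27017`; the database name `demo_db`, the collection name `readings`, the field names, and the helpers `make_reading` and `store_and_query` are all made up for illustration.

```python
from datetime import datetime, timezone


def make_reading(sensor_id, value, when=None):
    """Build one sensor reading as a plain dict, which pymongo stores as a document."""
    return {
        "sensor_id": sensor_id,
        "value": value,
        "timestamp": when or datetime.now(timezone.utc),
    }


def store_and_query(uri="mongodb://localhost:27017"):
    """Insert one reading and return all readings of sensor 's1' with value > 20."""
    # Imported lazily so make_reading works even without pymongo installed.
    from pymongo import MongoClient

    # Fail after 2 seconds instead of the default 30 if no server is reachable.
    client = MongoClient(uri, serverSelectionTimeoutMS=2000)
    collection = client["demo_db"]["readings"]  # hypothetical names, created on first insert

    collection.insert_one(make_reading("s1", 21.5))

    # Queries are dicts as well; "$gt" is MongoDB's greater-than operator.
    return list(collection.find({"sensor_id": "s1", "value": {"$gt": 20}}))
```

    Both the stored document and the query are ordinary Python dictionaries, which is what makes the JSON-style interface convenient for data arriving from sensors or APIs.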
