On Wednesday, due to the half-day, we had a pretty short class. I was able to get through the lectures for week four (because there are only a few), which dealt with a library called seaborn as well as using pandas in conjunction with matplotlib's features. Over the weekend, I intend to both complete the optional practice assignment for Week 3 and get a head-start on the final assignment for Week 4 (the last week of the course!). On completion of the course, I will then focus on tying up the loose ends with the Weekly Report.
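Since the course itself supplies the details, here is only a minimal sketch of the kind of seaborn-on-top-of-pandas plotting the Week 4 lectures cover. The room/temperature columns are invented for illustration, not real Energize data:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # draw off-screen so this runs without a display
import seaborn as sns

# A tiny made-up dataset in the spirit of room sensor readings
df = pd.DataFrame({
    "room": ["101", "101", "102", "102"],
    "temperature": [68.5, 70.1, 66.2, 67.8],
})

# seaborn reads the DataFrame directly and labels the axes from column names
ax = sns.barplot(data=df, x="room", y="temperature")
ax.figure.savefig("rooms.png")
```

The nice part is how little glue code is needed: seaborn pulls the columns and axis labels straight out of the DataFrame, then hands the drawing off to matplotlib.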
Today, I split my time again between Coursera and work on the weekly report. Over the weekend, I had worked a lot on the project, which had four different levels of difficulty (these were just so that we would have options on how much to push ourselves -- not for additional grades or benefits). I had figured out the second hardest level, so after refining and polishing my project, I switched focus to learning how to send e-mails using Python scripts. I drew up a quick script using this tutorial. In the future, I'll need to deploy and test this on the server, as well as start week 4 of the Coursera course!
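An e-mail script of this kind is mostly standard library: smtplib plus email.message. This is a hedged sketch of the shape mine takes, not the deployed code — every address, host, and port below is a placeholder:

```python
import smtplib
from email.message import EmailMessage

def build_report_email(sender, recipient, body):
    # Assemble the message; headers are set like dictionary keys
    msg = EmailMessage()
    msg["Subject"] = "Weekly Report"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(body)
    return msg

def send_email(msg, host="localhost", port=25):
    # Actually connecting to an SMTP server is deferred until deployment
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)

# Placeholder addresses for illustration only
msg = build_report_email("reports@example.com", "me@example.com", "Report attached.")
print(msg["Subject"])
```

Keeping message-building separate from sending makes it easy to test locally before the server is involved.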
On Friday, I worked on the data visualizations course. I finished the lecture videos and began working on the project, which opened with a reading: a study on making graphs easier to read when they include margins of error, which is also the focus of the Week 3 project.
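In matplotlib terms, margins of error come down to the yerr parameter on a bar (or errorbar) call. A minimal sketch with invented numbers:

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

# Invented yearly means, each with a margin of error
years = [2017, 2018, 2019]
means = [41.2, 44.8, 39.5]
errors = [2.1, 1.4, 3.0]

fig, ax = plt.subplots()
# yerr draws the error bars; capsize adds the little horizontal caps
ax.bar(years, means, yerr=errors, capsize=4)
ax.set_ylabel("value")
fig.savefig("error_bars.png")
```

The study's point is that readers struggle to interpret those bars at a glance, which is exactly what the Week 3 project asks us to improve on.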
Since I had worked a lot at home on the weekly report yesterday, I decided to focus on Coursera today, getting through all but two of the week's videos. The videos covered topics like histograms, box plots, and heatmaps. I'm excited to apply these more advanced visualization techniques to both the weekly projects and to my work with Energize!
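Each of those three chart types is essentially a one-liner in matplotlib. A self-contained sketch with randomly generated stand-in data:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
readings = rng.normal(loc=68, scale=3, size=200)  # fake temperature readings

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(9, 3))
ax1.hist(readings, bins=20)     # histogram: shape of the distribution
ax2.boxplot(readings)           # box plot: median, quartiles, outliers
ax3.imshow(rng.random((5, 5)))  # heatmap: a grid of values shown as colors
fig.savefig("week_plots.png")
```

Each axis gets its own chart, so all three views of the data end up side by side in one figure.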
Today, having finished Weeks 1 and 2, I started Week 3 of Coursera and watched a video about subplots. Additionally, I continued my work with the weekly report for the latter half of the class. At home, I finished debugging the date/time functionality, testing the new strategy on example data. As of now, the functionality seems to be fixed!
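The subplots idea in a nutshell: plt.subplots builds one figure holding a grid of axes, and options like sharey keep the panels on a common scale. A tiny sketch with toy data:

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

# One figure, two side-by-side axes; sharey gives both the same y scale
fig, (left, right) = plt.subplots(1, 2, sharey=True)
left.plot([1, 2, 3], [1, 4, 9])
right.plot([1, 2, 3], [9, 4, 1])
fig.savefig("subplots_demo.png")
```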
Yesterday, I again tried to split my time between debugging the Weekly Report and Coursera.
This time, however, I got pretty far ahead with Coursera. As of now, I'm really close to having all the data cleaned and ready to be plotted. As for the weekly report, I was able to implement the different timestamp format in my free time, but the current issue is that SQL won't accept the new format. The solution for this problem is in progress.
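I haven't written up the exact database details here, but a common version of this problem (and the usual fix) is converting pandas Timestamps to ISO-8601 strings before handing them to SQL. A sketch assuming SQLite, with an invented table and columns:

```python
import sqlite3
import pandas as pd

# SQLite has no native datetime column type, so a standard approach is
# to store timestamps as ISO-8601 text
ts = pd.Timestamp("2018-11-09 14:30:00")
iso = ts.strftime("%Y-%m-%d %H:%M:%S")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (room TEXT, first_seen TEXT)")
conn.execute("INSERT INTO events VALUES (?, ?)", ("101", iso))
row = conn.execute("SELECT first_seen FROM events").fetchone()
print(row[0])
```

The string round-trips cleanly, and ISO-formatted text still sorts and compares correctly in SQL.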
Today, I split my time evenly between Coursera and debugging the Weekly Report.
During the first 30 minutes of the class, I finished the Coursera videos and began the project, which looks interesting and is coincidentally about weather conditions!
During the second half of class, I worked on debugging the faulty times (see my Day 24 post for more details on the issue). I inserted "test rooms" into the data to see where they would land, re-creating the specific buggy data (November 9th was recorded as the first time a room was too cold, yet the 5th as the last time) and changing it until it worked. The issue seemed to be my use of the np.min function: when I isolated it, it raised "TypeError: cannot perform reduce with flexible type." I think this means that np.min cannot take the minimum of the timestamps in their current form. However, I have test code that does a bit more conversion work to find the minimum, so the next step is implementing that directly in the DataFrame.
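For what it's worth, "flexible type" in that error usually means NumPy is seeing plain strings rather than real datetimes, which would explain why np.min balks. A small sketch of the error and the conversion-based fix, using invented timestamps:

```python
import numpy as np
import pandas as pd

# Timestamps stored as strings have a "flexible" (string) dtype
times = np.array(["2018-11-09 08:00", "2018-11-05 16:00"])
try:
    np.min(times)  # on many NumPy versions this raises the TypeError
except TypeError as err:
    print(err)

# The fix: convert to real datetimes first, then take the minimum
earliest = pd.to_datetime(pd.Series(times)).min()
print(earliest)
```

Once the values are genuine datetimes, pandas' own .min() handles the comparison correctly.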
As you probably know if you have been reading this blog for a while, Energize students deploy much of their code to a remote server that runs Linux. On Monday night, one of the other students using the server contacted me to let me know that I should back up any files I was storing there, because he was going to run updates on it. When I sent these files to myself (including the first weekly report, which has yet to be deployed to a mailing list), I peeked at the output file and discovered a bug in the timing features.
In other words, for some of the rows, the first time a problem occurred was actually later than the last time! I resolved to debug later, but I also needed to work on Coursera, which I focused on during class.
In class and at home, I worked through almost all of the videos, learning along the way how to make scatter plots, line graphs, and bar charts in Matplotlib! I'm definitely gaining comfort with this library faster than I did with pandas, probably because I'm now more comfortable with Python and more efficient at Googling and searching documentation.
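All three of those chart types boil down to one method call apiece. A quick self-contained sketch with toy data:

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [2, 3, 5, 4]

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(9, 3))
ax1.scatter(x, y)  # scatter plot: individual points
ax2.plot(x, y)     # line graph: points joined in order
ax3.bar(x, y)      # bar chart: one bar per x value
fig.savefig("three_charts.png")
```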
Today, I mainly worked on testing and documentation, but I also started the second course in the Applied Data Science with Python Specialization on Coursera.
First, I deployed the new task system to the server and tested that the cron job was running successfully. After completing this, I worked on documentation (linked here) to help those at the facility understand the report. Additionally, I updated the GitHub repository's README and repository description.
I still had twenty minutes left of class, so I began the next Coursera course. Now, I'll continue working on the course at home, especially over the long weekend!
After finishing the rough version of the weekly report, I consulted with Mr. Navkal about including additional pertinent data to make the report as useful as possible. He sent along a list of data points that should be included for each room:
On Monday (before today's class), I put in about an hour of work at home in order to accomplish these objectives. In class today, I was able to finish a version that achieved all these objectives!
One challenge that I encountered was that in my first draft of this version, I calculated all these data points for each day. However, as I was finding the mean and median temperature and carbon dioxide throughout the week, I realized that these statistics could not be aggregated daily and then again weekly: the mean of the means of different parts (days) is not necessarily the mean of all the data, and the same goes for the medians. I eventually saved all the temperature and carbon dioxide values, along with their room numbers, to a sort of copy database, which was not accessed until the weekly task (Task 4). This was a good example of why it is important to keep copies of the raw data, so that accurate analysis is easier to obtain later on.
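The mean-of-means pitfall is easy to demonstrate with toy numbers: when the days have different numbers of readings, averaging the daily averages over-weights the small days.

```python
import statistics

# Two "days" of readings with very different sizes (invented values)
day1 = [60, 60, 60, 60]   # four readings, mean 60
day2 = [80]               # one reading, mean 80

mean_of_means = statistics.mean([statistics.mean(day1), statistics.mean(day2)])
true_mean = statistics.mean(day1 + day2)

print(mean_of_means)  # 70: the lone day2 reading counts as much as all of day1
print(true_mean)      # 64: every reading weighted equally
```

Medians break in the same way, which is why the raw values have to survive until the weekly task runs.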
Now that the bulk of the data analysis work is done, I will work on deploying this new code to the server and start gathering data in real time!