Today, I worked for about an hour on debugging the Weekly Report. My main objective was to fix the incorrect values in the Days with Problems column, and I'm happy to say that I succeeded in debugging this error!
Luckily, I realized that I still had the code for the all_data DataFrame (the one I had previously used to get the correct Days With Problems values), albeit commented out, so I first uncommented that code. Then I had to figure out exactly what was being stored in all_data and how I was going to merge the all_data table with my new weekly_log table. Once I had made sense of the data, I cleaned up the merge from which the DataFrame originated. Finally, I used a series of groupby commands to isolate just the day for each problematic interval and find the number of unique days belonging to each room; this took a few tries to get right. I then merged the new DataFrame into the weekly_log. After checking back against my manually calculated test data, I concluded that the error had been fixed.
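For the curious, the groupby-and-merge steps above look roughly like this. This is only a sketch: aside from all_data and weekly_log, which are real names from the project, the column names (room, interval_start) and the sample values are my own stand-ins.

```python
import pandas as pd

# Hypothetical sample of problematic intervals (column names and
# values are illustrative, not the project's real schema)
all_data = pd.DataFrame({
    "room": ["101", "101", "101", "202"],
    "interval_start": pd.to_datetime([
        "2020-03-02 08:00", "2020-03-02 14:00",  # two intervals, same day
        "2020-03-03 09:00", "2020-03-02 10:00",
    ]),
})

# Isolate just the day for each problematic interval
all_data["day"] = all_data["interval_start"].dt.date

# Count the number of unique days belonging to each room
days_with_problems = (
    all_data.groupby("room")["day"]
    .nunique()
    .reset_index(name="days_with_problems")
)

# Merge the per-room counts into the weekly log
weekly_log = pd.DataFrame({"room": ["101", "202"]})
weekly_log = weekly_log.merge(days_with_problems, on="room", how="left")
```

Room 101 has intervals on two distinct days, so it gets a count of 2 even though it has three intervals, which is exactly the deduplication nunique provides.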
Today, I worked for about an hour on debugging the Weekly Report.
I discovered that the issues with timestamps were actually human errors, not programmatic errors. However, I looked deeper into the discrepancies in the number of days with problems. This number is too high in the program's results because it counts any day on which the room has data instead of counting only the days that actually have problems.
This should not happen, because Task II filters out any intervals that don't have problems. At first, I thought that perhaps when I reference the old database in Task IV, I unknowingly bring back the days without problems. I tested this suspicion using the debugger and some strategically placed breakpoints. First, I tried breaking at the end of the "Task III" portion of the code, which led me to discover that the DataFrame at the end of Task III also contained the unproblematic intervals. This meant that the problem was not in Task IV at all -- it had to be earlier, since the data being aggregated in Task IV already contained the unproblematic intervals.
That shouldn't have been possible, because Task II should filter those intervals out before they go into the daily database in Task III... however, when I broke at Task II, I finally realized the issue.
Sometime in January, I had changed the central DataFrame of Task III to include all intervals, not just problematic ones, so that I could find the true highest and lowest values. However, I had not realized that the "Days With Problems" column would be aggregated incorrectly.
Now that I know the origin of the problem is not with the switch to the historical report, my task is to develop a solution that correctly counts the number of days with problems.
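In pandas terms, the fix will probably boil down to filtering before counting. Here is a minimal sketch of the difference; the is_problem flag, the column names, and the toy data are all assumptions I'm making for illustration, not the project's actual schema.

```python
import pandas as pd

# Toy data: one room with data on three days, but a problem on only one
intervals = pd.DataFrame({
    "room": ["101", "101", "101"],
    "day": ["2020-03-02", "2020-03-03", "2020-03-04"],
    "is_problem": [True, False, False],
})

# Buggy count: every day on which the room has data (overcounts)
days_with_data = intervals.groupby("room")["day"].nunique()

# Corrected count: only days containing at least one problematic interval
days_with_problems = (
    intervals[intervals["is_problem"]]
    .groupby("room")["day"]
    .nunique()
)
```

With the toy data, the buggy version reports 3 days for room 101 while the filtered version reports 1, which mirrors the inflated counts described above.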
The last couple months have been pretty busy for me, so I haven't gotten around to posting in a while. (Don't worry -- I'll definitely be posting more regularly, especially as I start working on the report more often!)
As for the class I started for the 5 new girls I recruited into the program (more on that here), they all finished the course! After that, I taught them version control (for those who didn't know it) with Git and GitHub, showed them how to use the PyCharm IDE and debugger, and guided them through the same data36 pandas tutorial (all 3 parts) that I used when I was first learning the library. In the last couple weeks, Dan, Ayush, Justin, and I all demoed our projects for them as examples of the kinds of things they will be making.
Now, they are beginning projects in the areas of water consumption and political data. (They were placed into smaller groupings based on their interests. Rishika and Holly are working on water consumption data, while Madeline, Avanthika, and Sarah are working on political data.) They are also starting blogs similar to this one -- stay tuned for links to their blogs on this page!
I'm so proud of how far they have all come -- it is a super awesome achievement to learn a whole language and some of the basics of development, as well as start working on real-life projects, in less than 2 months!
In other news, the Weekly Report is currently undergoing testing. I used test rooms similar to the ones I used to test version 1, and manually calculated values to compare against the data I was getting from the report. Currently, it seems that there are some errors, which I am in the process of debugging. Specifically, there are discrepancies in the Days With Problems counts as well as in some of the reported times.
Finally, the future of the Weekly Report is looking very bright. I have been lucky enough to get the opportunity for a really exciting partnership on the project, but I'll go into more detail on that once we get started in a couple of weeks.
Overall, I am really excited for the future of not only my own projects, but those of the new recruits!