In class today, I set out to find the issue with the mean and median values. I created a tester file in which I ran np.mean and np.median on what should have been the temperatures in the test series. While at first I got drastically different values from those in the output file, I soon realized that I had been misinterpreting what my own program was doing. The test data was [80, 70, 70, 70, 70, 70, 70, 70, 5], whose mean is about 64, but the output in the report was 42.5. However, my program filters for problematic rooms only, meaning that the series would have been whittled down to [80, 5] -- the mean of which is 42.5. I made this clearer by renaming the columns to "Mean Problematic Room" and "Median Problematic Room".
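The discrepancy is easy to reproduce. A minimal sketch, assuming hypothetical comfort-range thresholds of 65 and 75 degrees for what counts as "problematic":

```python
import numpy as np

# The full test series vs. the problematic-only subset the program
# actually aggregates (thresholds here are illustrative assumptions).
all_temps = np.array([80, 70, 70, 70, 70, 70, 70, 70, 5])
problematic = all_temps[(all_temps > 75) | (all_temps < 65)]  # -> [80, 5]

print(np.mean(all_temps))    # ~63.9, what I expected at first
print(np.mean(problematic))  # 42.5, what the report showed
```

The same filtering explains the median as well: the median of [80, 5] is also 42.5.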
After this, I dealt with another Git adventure. For the past week or so, I have been making structural changes to the way the report is organized in order to improve readability and efficiency. So that I would not ruin my working (albeit slightly inefficient) version, I created a new branch on which to make these changes. Now that the new version was finished, I merged the branches back together. Then, I realized that the old code was in the file called task_three, while the new one resided in task_three_cleanup. An attempt to rename both of these (to task_three_old and task_three, respectively) resulted in a lot of merge issues. Essentially, it seemed that Git had tried to merge the old file with the cleanup file and drop all that code haphazardly into the main file. After learning how to revert changes, I did so with the last few commits, but the code was still not the same! After thinking about possible solutions for a while, I realized that, fortunately, I had not deleted the other branch. Recovering that branch, I changed it to the default and just deleted the master branch!
After this, I began deploying to the server (with the old names!). There is now only one part left to set up -- I need to create an e-mail account for this purpose and make sure it's actually sending mail. Then, the report will finally be done!
Yesterday (12/17), we had another snow day, so I did not have class. Today, I continued testing the Weekly Report and managed to verify all the values -- except for the means and medians! When I checked these against the real values, they were significantly off for both my test rooms and for both CO2 and temperature. I'll need to continue investigating this error.
I just realized I hadn't made a blog post for Friday's class. This past weekend was pretty busy for me, as I competed in three programming competitions (the American Computer Science League on Friday, the Acton-Boxborough Competition for Informatics and Computing on Saturday, and the USA Computing Olympiad on Sunday).
On Friday, I continued testing the Weekly Report on sample data. I found that rooms with only one problem interval experienced issues in Tasks 3 and 4, so I spent time trying to debug that. This process involved a lot of if-statements. If the listing of all the problems for a room (normally a multi-row DataFrame) came back as a single-row Series, I had to convert it to a DataFrame. I think most of the problems are fixed now, but I'm still testing for more complicated test cases.
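The core of those if-statements can be sketched like this (the function and the room/column names here are illustrative, not my actual ones):

```python
import pandas as pd

def as_frame(problems):
    """Promote a single-row Series back into a one-row DataFrame."""
    if isinstance(problems, pd.Series):
        # A single row selected by label comes back as a Series indexed
        # by column name; to_frame().T restores the DataFrame shape.
        return problems.to_frame().T
    return problems

# Example: room "B" has one problem row, room "A" has two.
problems = pd.DataFrame(
    {"room": ["A", "A", "B"], "temp": [80, 5, 90]}
).set_index("room")

single = as_frame(problems.loc["B"])  # was a Series; now a 1-row DataFrame
```

With a helper like this, the rest of the aggregation code can assume it always receives a DataFrame.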
Yesterday at home, I continued working on the weekly report. So that we could aggregate the times of the most extreme values in Task Four, I copied all the date/time values into a different database, which would then be read from weekly. Additionally, I wrote more conversion functions to fix the SQLite type errors I had been getting.
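By "conversion functions" I mean things along these lines (the table and column names are illustrative): pandas hands back numpy scalars such as np.int64, which sqlite3 does not accept as parameters, so adapters have to be registered or values converted before insertion.

```python
import sqlite3
import numpy as np

# Teach sqlite3 how to store numpy scalar types (hypothetical sketch).
sqlite3.register_adapter(np.int64, int)
sqlite3.register_adapter(np.float64, float)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE extremes (room TEXT, time TEXT, temp REAL)")
conn.execute(
    "INSERT INTO extremes VALUES (?, ?, ?)",
    ("101", "2019-12-16 14:05", np.float64(80.0)),  # numpy scalar, adapted
)
temp = conn.execute("SELECT temp FROM extremes").fetchone()[0]
```

Registering adapters once up front is tidier than sprinkling int(...) and float(...) casts throughout the insertion code.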
Today in class, I finished a rough Task Four with the new requirements. However, when I looked at the output file, I noticed that none of the "intervals" values seemed to be numbers! (They looked something like b'\x05\x00\x00\x00\x00\x00\x00\x00'.) After Googling this strange format, I learned that bytes is an actual data type, to and from which integers can be converted. I decided to trace the cause of the problem. When I printed to a CSV file just before writing the data to the database, it printed correctly, but the same action directly after reading it back in was impossible to decipher! This meant that when I was writing to the database, the integers were somehow being converted into bytes. Luckily for me, I found that there is an integer method, int.from_bytes, which can reverse this conversion.
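The decoding step itself is a one-liner. The mystery value was just the integer 5 stored as eight little-endian bytes:

```python
# The "unreadable" intervals value from the output file:
raw = b"\x05\x00\x00\x00\x00\x00\x00\x00"

# Eight bytes, least-significant byte first -> the original integer.
intervals = int.from_bytes(raw, byteorder="little")
print(intervals)  # 5
```

There is also a matching int.to_bytes for going the other way, which is presumably how the value got into the database in that form.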
After solving this problem, I began testing the code by running task_one_and_two a certain number of times, followed by task_three_cleanup (essentially simulating 'interval' runs and 'daily' runs of the program). However, when I ran task_one_and_two only once before running the daily file, I ran into an error! During calculation of the intervals, I grouped the temperature database by a boolean column ("High Temp?") stating whether each reading was too warm or too cold. I then counted the group sizes to obtain the number of intervals. However, when a room had only one entry, it came back as a Series, not a DataFrame! I am still trying to fix this issue.
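The counting step looks roughly like this (room names and values are made up for illustration; only the "High Temp?" column name is from my actual code):

```python
import pandas as pd

# Simulated temperature records: room "102" has only one entry,
# which is the case that caused the trouble downstream.
temps = pd.DataFrame({
    "room": ["101", "101", "101", "102"],
    "High Temp?": [True, True, False, True],
})

# Count how many readings fall into each (room, too-warm/too-cold) bucket.
intervals = temps.groupby(["room", "High Temp?"]).size()
```

The groupby itself behaves fine; it's selecting a single-entry room's rows afterwards that yields a Series instead of a DataFrame, which is the same shape mismatch I hit on Friday.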
Overall, I made good progress in the past 24 hours!
I would have had class today (12/10), but I was on a field trip for one of my other classes. That said, the Weekly Report was not forgotten! Yesterday, I put in a couple hours and fixed Task III (the daily task) to adapt to the new objectives. Now, I need to fix Task IV (the weekly task) to aggregate the updated daily values.
Additionally, I got a grade for my final Coursera assignment, so I received my Course Certificate!
Today, we had a short class. I continued work on implementing the new feature (max and min CO2) into the cleaned-up file. Deciding when to merge, and which functions should be aggregated at which times, took quite a bit of logical reasoning. However, I feel I made good progress.
Yesterday, I worked on the Weekly Report, since I had finished Coursera. Looking at my code for task three (daily aggregation), I realized that due to the amount of time and debugging strategies needed to get it working, the code used a lot of unnecessary data frames. In order to resolve this, I drew out a diagram of which tables depended on which. Then, I wrote out another file (task_three_cleanup) in order to whittle it down to the necessary bits, making it cleaner and more understandable. Additionally, I began working on an additional feature that was requested by the facility -- adding in the maximum and minimum CO2 values and the times at which those values occurred.
Andover schools had a very long Thanksgiving break due to snow days! I used this opportunity to work a lot on both Coursera and the mailing list for the weekly report. When I began to update the weekly report, I realized that I had made a mistake -- when the server was updated a few days ago, I had forgotten to save the crontab somewhere else! Therefore, I had to rewrite the crontab in class today. Additionally, I worked on deploying and testing the SMTP mail server in order to send the weekly report to the mailing list. Finally, since I had basically finished the Coursera project over the weekend, I made the final touches and submitted it in H-Block.
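The sending side of the mailing-list work amounts to something like the sketch below. All the addresses and the server are placeholders, not the real deployment values, and the actual send is commented out since it needs a running SMTP server:

```python
import smtplib
from email.message import EmailMessage

# Build the weekly report message (addresses are hypothetical).
msg = EmailMessage()
msg["Subject"] = "Weekly Report"
msg["From"] = "reports@example.com"
msg["To"] = "mailing-list@example.com"
msg.set_content("Report attached.")

# With a local SMTP daemon running, sending would look like:
# with smtplib.SMTP("localhost") as smtp:
#     smtp.send_message(msg)
```

A crontab entry would then invoke this script weekly, which is exactly the part of the deployment I still need to test end to end.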