The last couple months have been pretty busy for me, so I haven't gotten around to posting in a while. (Don't worry -- I'll definitely be posting more regularly, especially as I start working on the report more often!)
As for the class I started for the 5 new girls I recruited into the program (more on that here), they all finished the course! After that, I taught them version control with Git and GitHub (for those who didn't already know it), how to use the PyCharm IDE and debugger, and guided them through the same data36 pandas tutorial (all 3 parts) that I used when I was first learning the library. In the last couple of weeks, Dan, Ayush, Justin, and I all demoed our projects for them as examples of the kinds of things they will be making.

Now, they are beginning projects in the areas of water consumption and political data. (They were placed into smaller groups based on their interests. Rishika and Holly are working on water consumption data, while Madeline, Avanthika, and Sarah are working on political data.) They are also starting blogs similar to this one -- stay tuned for links to their blogs on this page! I'm so proud of how far they have all come -- it is a super awesome achievement to learn a whole language and some of the basics of development, as well as start working on real-life projects, in less than 2 months!

In other news, the Weekly Report is currently undergoing testing. I used test rooms similar to the ones I used to test version 1, and manually calculated values to compare against the data I was getting from the report. Currently, there seem to be some errors, which I am in the process of debugging. Specifically, the number of days with problems and some of the times are incorrect.

Finally, the future of the Weekly Report is looking very bright. I have been lucky enough to get the opportunity for a really exciting partnership on the project, but I'll go into more detail on that once we get started in a couple of weeks. Overall, I am really excited for the future of not only my own projects, but those of the new recruits!
On Saturday, I spent about an hour and a half setting up the test for the new Weekly Report and testing the warm and cold spreadsheet.
I adapted the test I had used before, which included test rooms with made-up temperature and CO2 values to reflect a variety of test cases, to fit into the historical report. Since the report produced was comprehensive, I decided to focus on temperature for that day -- everything checks out with the values I determined manually with a calculator (a process which took a decent amount of time even with the few data points I had -- that's why automation is so helpful).

Next, I will test the report on carbon dioxide values and then start deploying to the server. This new version will only use cron for the fifteen-minute logging and for the two programs (task_zero and generate_historical_report) that run at the end of each week. Additionally, since school is closed, the values collected will not be meaningful; they are simply a test of the capabilities of this new report.

I also have a bit of functionality to add to the final piece of the new report, based on what I was told by Facility members in January. In the automated email, I should include the top 5 or so rooms that need attention, so that the Facility members can look at them. This is an easily reachable goal, as it simply requires sorting each DataFrame and calling DataFrame.head() to return its first 5 rows.

Recently, I had been thinking about ways we can get more people into the Energize program (since I was previously the only member in 10th grade or below, as well as the only girl). Since everyone has a lot of unexpected extra time due to COVID-19, I recruited some girls I know and set up an online "class". This week, I began teaching them about Python and Data Science so that they would be prepared to start working with Energize.
About a week ago, I talked to Mr. Navkal and began recruitment. I ended up with a group of 5 girls, all in eighth and ninth grade, eager to learn or review Python and join Energize! Since then, we have had three class sessions over Zoom. As a syllabus, we have been using Codecademy's Learn Python 3 course. I'm excited to continue running the class and see where this goes!

On Wednesday and Thursday, I spent a total of about 45 minutes on the historical report.
I mainly spent this time integrating Task Three (the creation of the "daily" reports) into the main program. I also separated out what I had from Task One of the old report to create the logging program, which is now a standalone program that will run every 15 minutes. In future sessions, I need to test Task Three more comprehensively to make sure the data it produces is accurate and the code is bug-free. I also need to integrate Task Four as the final step in creating a weekly report based on historical data.

Yesterday, I spent about an hour working on the historical report. After debugging an issue with Task 0, I finished integrating Task 2 into the new system.
When creating a report from historical data, you need a lot more filtering than you do when running the numbers in real time -- you have to filter first by the week itself (selecting the week you want to report on), and then by day of the week. I had used a dictionary to successfully filter out which days were school days (this is a basic implementation which assumes that every weekday is a school day -- I still have to get access to the school calendar, somehow scrape those dates, and add the updated values into the dictionary) -- but after implementing most of Task 2 and testing the results, I realized I had never actually filtered out which week I needed to select.

For now, I used a simple input function to determine the start date of the selected week. (Hopefully, this will evolve into an interactive front-end where users can select the day and the parameters.) Once I had the start date, I added 7 days to make the end date, looped through all the days in between, and only set those weekdays to True in the dictionary.

After this, I finished integrating Task 2 into the generate_historical_report program. Right now, it logs a TemperatureProblemDatabase and a CarbonDioxideProblemDatabase the same way the old task_two did. (Currently, it performs Task 2 once for each day -- it should run Task 3 at the end of the loop as a way of "daily" aggregation, so as to save data in between days.)

Today, I worked for about half an hour on the historical report.
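Going back to the week-selection step described above (take a start date, add 7 days, and set only the weekdays in that window to True), here is a minimal sketch of how that dictionary could be built. The function name school_days_for_week, the all_days argument, and the sample dates are hypothetical and not taken from my actual program, and the start date is hard-coded instead of read from input. It also keeps the same simplifying assumption that every weekday is a school day.

```python
from datetime import date, timedelta


def school_days_for_week(start_date, all_days):
    """Return {day: bool}, True only for weekdays in the 7 days from start_date.

    Hypothetical helper: assumes every weekday is a school day.
    """
    end_date = start_date + timedelta(days=7)
    # Start with every known day switched off.
    is_school_day = {day: False for day in all_days}
    current = start_date
    while current < end_date:
        if current in is_school_day:
            # weekday() is 0 for Monday through 6 for Sunday.
            is_school_day[current] = current.weekday() < 5
        current += timedelta(days=1)
    return is_school_day


# Illustrative data: every day in March 2020, with the week starting March 9.
march = [date(2020, 3, 1) + timedelta(days=i) for i in range(31)]
flags = school_days_for_week(date(2020, 3, 9), march)
selected = sorted(day for day, flag in flags.items() if flag)
print(selected)
```

Once the real school calendar is available, the weekday check is the only line that would need to change.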
My main objective was to link Task 0, which I had created on March 2nd, with the generate_historical_report program. I was able to successfully link them through the use of a filtered SQL database. Next, I worked on integrating Task 2 into the generate_historical_report program. The current setup is a for loop that traverses the data by day (right now, it runs 7 times, but in the future I will change it to a while loop that stops once the end of the week has been reached), and runs what was task_two (filtering which rooms are problematic at a certain interval) on that day's data. While I managed to integrate this into the for loop, right now it isn't actually separated out by day -- it is just running 7 times on ALL of the data. I need to fix this as well as the other issues next time.

I just realized that I never posted about my work last Monday, March 2nd. I worked on the Weekly Report for about an hour and a half.
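As a note on the "running 7 times on ALL of the data" problem above, one possible fix is to slice the data to a single day inside the loop before handing it to the problem filter. This is only a sketch under assumptions: find_problem_rooms is a hypothetical stand-in for task_two, the column names and the 75-degree threshold are made up, and the real program reads from a SQL database and not from an in-memory DataFrame.

```python
import pandas as pd


def find_problem_rooms(day_data, max_temp=75):
    """Hypothetical stand-in for task_two: rooms warmer than max_temp."""
    too_warm = day_data[day_data["temperature"] > max_temp]
    return sorted(too_warm["room"].unique().tolist())


# Made-up readings covering two days of the selected week.
readings = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-03-09 08:00", "2020-03-09 08:15",
        "2020-03-10 08:00", "2020-03-10 08:15",
    ]),
    "room": ["101", "102", "101", "102"],
    "temperature": [78, 70, 71, 80],
})

week_start = pd.Timestamp("2020-03-09")
problems_by_day = {}
for offset in range(7):
    day_start = week_start + pd.Timedelta(days=offset)
    day_end = day_start + pd.Timedelta(days=1)
    # Keep only the rows that fall inside this one day.
    in_day = (readings["timestamp"] >= day_start) & (readings["timestamp"] < day_end)
    day_data = readings[in_day]
    if day_data.empty:
        continue  # no data logged for this day
    problems_by_day[day_start.date()] = find_problem_rooms(day_data)

print(problems_by_day)
```

Each pass through the loop now sees one day of rows, so a room that is only warm on Monday is not reported for the other days as well.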
My main goal last Monday was to create a task_zero as a precursor to the rest of the tasks currently running. My current approach is to log data at every possible time interval, but filter it before it enters the report process, so that my systems only calculate on relevant values (that is, when school is in session and the building climate control systems are powered on). Task 0 would allow a week of raw data to be selected (either manually by the user in an interactive front-end, or in an automated fashion each Friday), and then filtered for whether or not school is in session.

I plan to use a Datetime:Boolean dictionary that has each day of the year as the key, and the boolean for whether it is a school day as the value. I implemented a basic version of this where I set the boolean value to (start_date.weekday() < 5), and filtered the values based on this dictionary. While it does successfully filter, it also takes around a minute to run, because reading from the incredibly large DataFrame and applying pd.to_datetime is quite time-consuming. In the future, I hope to integrate the actual school calendar into this dictionary as well as figure out ways to increase the efficiency of the program, if necessary.

Yesterday, I worked for about a half hour on generating the historical report.
On Monday, I tried to index the central DataFrame using a MultiIndex (timestamp, room #) so that I could select data by time, but encountered some KeyErrors when I tried to actually get a specific row. In the debugging process yesterday, I took a step back, created a test DataFrame, and tried to index it. This helped me realize my very simple mistake -- that I should have been using .loc instead of just appending brackets to the name of the DataFrame!

Additionally, I started planning out exactly how I was going to select a week of data. The most likely option seems to be taking the desired start time, adding one week to it, and getting all of the data in between. (I had considered using some kind of for loop to separate out each interval, but some weeks may be missing some intervals due to days off from school or other issues. Therefore, I decided to separate out each interval after selecting the initial week of data.)

Yesterday, I worked for around an hour on the new version of the Weekly Report.
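To illustrate the MultiIndex mistake and the week-selection plan from the post above, here is a small test DataFrame along the lines of the one I used for debugging. The column names, room numbers, and values are made up, and this is just one plausible way to reproduce the KeyError: plain brackets on a DataFrame look for a column label, while .loc looks up the row index.

```python
import pandas as pd

# Made-up readings indexed by (timestamp, room).
readings = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-02-10 08:00", "2020-02-10 08:00",
        "2020-02-10 08:15", "2020-02-18 08:00",
    ]),
    "room": ["101", "102", "101", "101"],
    "temperature": [68.0, 71.5, 69.0, 70.0],
}).set_index(["timestamp", "room"]).sort_index()

first_time = pd.Timestamp("2020-02-10 08:00")

# Plain brackets search the COLUMNS, so a row key raises KeyError.
try:
    readings[(first_time, "101")]
    bracket_lookup_failed = False
except KeyError:
    bracket_lookup_failed = True

# .loc searches the row index, so the same key works.
one_row = readings.loc[(first_time, "101")]

# Select one week: start time, plus one week, and everything in between.
week_start = pd.Timestamp("2020-02-10")
week_end = week_start + pd.Timedelta(weeks=1)
times = readings.index.get_level_values("timestamp")
week = readings[(times >= week_start) & (times < week_end)]

print(bracket_lookup_failed, one_row["temperature"], len(week))
```

The week selection here uses a boolean mask on the timestamp level, which also copes with weeks that are missing some intervals.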
I mainly focused on trying to generate a report from historical data (as compared to taking a week to generate each new report). This would be helpful because it would further the application of separation of concerns; the part of the program that analyzes the data would be able to run immediately on any week of data. This would allow thresholds to be changed easily, to create a different report from the same data. As I began this new version, I wrote pseudocode and documented issues I encountered along the way, so I could check back on them later. Right now, my document looks like this:

Issues:
To do:
I just realized that I never updated this blog with news of my 2 presentations!
First, I presented my work to the Andover Green Advisory Board. This was an exciting experience for me; talking about my work to an audience for the first time was amazing, especially when they were so engaged and helpful! A week later, I presented to the Facility Department. For this presentation, I went more in-depth into what the report can offer as a workable product. I was thrilled with the response, as the members began discussing different use cases and options to add to the report's functionality.

In between all this, I have had a few issues with the report itself. (For the last couple weeks, it seems to have been running on the same week of data.) Now that we are on February Break at school, I can get to work on debugging this issue and any others that arise. I also want to create a version that operates entirely on historical data.

I am so grateful to AGAB and to the Facility for giving me these amazing opportunities to present!
Author: I'm a high school senior and programming enthusiast.