Over the past month or so, a lot of exciting things have happened!
First, I've continued meeting with the group of new recruits. We've mainly been working on figuring out the logistics of starting Energize projects while adjusting to a new hybrid school schedule. They've spent some time exploring the data in their chosen areas of expertise as well. In the next couple of weeks, they will officially start their projects.
For me personally, I've been working on a new version of the reporting software that pulls data directly from Metasys instead of relying on 15-minute logs I create myself. This is really important for two reasons: it greatly widens the scope of data my software can report on, and because Metasys is a widely used system, I can adapt the software to produce reports for almost any school or building. However, I've also been busy with the new hybrid schedule, and converting the data from Metasys into the format my software accepts has proven a bit of a challenge so far -- I'll post more updates here the next time I work on it.
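For context, a rough sketch of what that kind of conversion might look like with pandas is below -- the raw column names ("Object Name", "Timestamp", "Value") are made up, since the real Metasys export format differs:

```python
import pandas as pd

# Hypothetical raw Metasys export; the real columns will differ
raw = pd.read_csv("metasys_export.csv")

converted = pd.DataFrame({
    "room": raw["Object Name"].str.extract(r"(Room \d+)")[0],
    "time": pd.to_datetime(raw["Timestamp"]),
    "value": pd.to_numeric(raw["Value"], errors="coerce"),
})

# Average each room's readings onto the 15-minute grid the software expects
converted = (
    converted.dropna()
    .set_index("time")
    .groupby("room")
    .resample("15min")["value"]
    .mean()
    .reset_index()
)
converted.to_csv("formatted_log.csv", index=False)
```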
Finally, I presented a status update on this work to the Andover Green Advisory Board at their Wednesday meeting. I'm really grateful to them for giving me the awesome opportunity to present!!
I just realized -- I never updated this blog after I was featured on CNBC as a HomeGROWN Hero for my teaching initiatives!
Here's the full article: https://grow.acorns.com/teenager-teaches-kids-to-code-in-quarantine/
I was also featured on the CNBC television special that aired in July.
This was a really awesome and exciting experience for me!
NEWS: The Eagle-Tribune and Andover Townsman wrote an article about my teaching initiatives! Link:
Over the last couple of weeks, I've also continued working on the visualization component of the reporting engine: I've focused on refining the look of the visualizations and adding a table of rooms with likely sensor issues to the end of the report. I still need to fix formatting & spacing issues with this sensor table, which will conclude the first stage of the development process. Working with matplotlib has been really fun! :)
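For anyone curious, here's a minimal sketch of how a table like that can be appended to a matplotlib PDF report -- the sensor-issue data is made up, and the spacing tweaks are just one way of approaching the formatting problems I mentioned:

```python
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
import pandas as pd

# Stand-in data for the rooms with likely sensor issues
sensor_issues = pd.DataFrame({
    "Room": ["101", "214"],
    "Likely Issue": ["stuck CO2 reading", "temperature spikes"],
})

with PdfPages("report.pdf") as pdf:
    # ... figures for the main visualizations would be saved here ...
    fig, ax = plt.subplots(figsize=(8.5, 11))
    ax.axis("off")
    table = ax.table(
        cellText=sensor_issues.values,
        colLabels=sensor_issues.columns,
        loc="upper center",
    )
    # Spacing fixes: turn off auto font sizing and stretch the rows
    table.auto_set_font_size(False)
    table.set_fontsize(10)
    table.scale(1, 1.5)
    pdf.savefig(fig)
    plt.close(fig)
```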
Over the last couple of weeks, a few really exciting things happened:
1. I was named Andover Youth Services' Youth of the Week because of the virtual teaching initiatives I started there! I really appreciate that Energize Andover gave me the freedom to start running a virtual Python class with a group of 5 girls, an experience I eventually used to start virtual technology classes through the Youth Services.
2. While I recently began a summer internship, I have still been working on my Energize projects in my free time. Kate, the PhD student working with Energize through the BU URBAN program, helped me figure out the best way to represent the data visually using matplotlib. I have written a script that generates a PDF of several visualizations (sample images are at the end of this post).
I have also separated out the rooms likely to have sensor issues into their own spreadsheet based on certain conditions in the data. I am now making adjustments to the script to refine the visualizations a bit more.
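As a rough illustration of that kind of filtering -- the conditions and column names below are placeholders, not the project's actual rules:

```python
import pandas as pd

# Placeholder data; real columns and thresholds differ
data = pd.read_csv("room_data.csv")

# Example conditions that might suggest a faulty sensor:
# readings frozen at zero, or far outside a plausible range
suspicious = data[(data["co2"] <= 0) | (data["temp"] < 40) | (data["temp"] > 100)]

# Write the suspect rooms to their own spreadsheet
suspicious.to_csv("sensor_issues.csv", index=False)
```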
EDIT: I forgot to mention that I also started teaching a bit of matplotlib to the new recruits! I showed them my project and explained how it works.
Also, here are some sample images of what my visualizations will look like:
A few important things happened over the last couple of weeks, so here's an update on them.
1. School is ending this week, so in order to give the new recruits a bit of a break, the Energize recruit class will meet only once every 3 weeks. However, the plan is for the students to work on their projects of interest in their small groups.
2. I finished testing the historical data version of the weekly report -- it is completely free of bugs and usable.
3. I will be working with a PhD student, Kate, through the BU URBAN program! We met yesterday to discuss how I can apply my project to a health context.
Kate is currently studying ways to use carbon dioxide as an indicator of ventilation quality, which is particularly relevant to places looking to reopen safely during the COVID-19 pandemic, since poor ventilation can increase the risk of disease spread. We are currently looking into connecting that idea to the reporting engine.
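As a rough sketch of the idea (the 1,000 ppm threshold is a commonly cited rule of thumb, not a value from the project, and the column names are made up):

```python
import pandas as pd

# Placeholder log of CO2 readings per room
log = pd.read_csv("co2_log.csv", parse_dates=["time"])

CO2_THRESHOLD = 1000  # ppm; a common guideline, not a project value
log["poor_ventilation"] = log["co2_ppm"] > CO2_THRESHOLD

# Share of intervals above the threshold, per room -- a simple
# proxy for which rooms may have ventilation problems
summary = log.groupby("room")["poor_ventilation"].mean().sort_values(ascending=False)
print(summary)
```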
On Monday, I continued training the new recruits. Before class, I spent about an hour developing challenges for them to take on (and corresponding solutions) using the student database. (You can see them at my GitHub repo here.)
When class started, they were divided into two groups (one for water consumption and one for political data). Each group worked on challenges that were tailored to the type of data they wanted to work with.
As some students have Chromebooks and are using repl.it (an absolutely fantastic tool!), we tried to integrate the database into repl. When this did not work, I asked them to collaborate with their groups through repl, with the one group member who could run code against the database locally testing each time the group was ready. This worked much better, but figuring out the setup took a bit more time than expected.
Back in September, I had figured out how to read from a sqlite file based on the example file in the database's Google Drive folder. In order to give the new recruits an exercise in "real-world" problem-solving (as opposed to a classroom-like environment), I gave them the same challenge to start, having them glean knowledge from the example rather than teaching it to them directly. Interestingly, both groups were getting errors even though their code was correct; we eventually realized that the databases had somehow become empty when they were copied into the repository.
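For reference, the core of that sqlite-reading exercise looks roughly like this -- the file and table names here are placeholders:

```python
import sqlite3
import pandas as pd

# Placeholder file and table names
connection = sqlite3.connect("students.db")
students = pd.read_sql_query("SELECT * FROM students", connection)
connection.close()

# An empty result here was the clue that the copied
# database files had lost their contents
print(len(students), "rows")
print(students.head())
```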
I am really proud of both groups for adapting really well to both the challenge and the technical issues that came up along the way. The rest of the challenges involve using the pandas library -- I can't wait to see where they go with them next Monday! (Also, keep on the lookout for blog links next week!)
Today, I worked for about an hour on debugging the Weekly Report. My main objective was to fix the incorrect values in the Days with Problems column, and I'm happy to say that I succeeded in debugging this error!
Luckily, I realized that I still had the code for the all_data DataFrame (the one I had previously used to get the correct Days With Problems values), albeit commented out, so I first uncommented that code. Once I had that set up, I had to figure out exactly what was being stored in all_data and how to merge the all_data table with my new weekly_log table. Once I had made sense of the data, I cleaned up the merge from which the DataFrame originated. Finally, I used a series of groupby commands to isolate the day of each problematic interval and count the number of unique days belonging to each room, which took a few tries to get right. After merging the new DataFrame into the weekly_log and checking against my manually calculated test data, I concluded that the error was fixed.
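Here's a rough reconstruction of that groupby step with toy data -- all_data and weekly_log are from the post, but the column names and sample values are guesses, not the project's actual schema:

```python
import pandas as pd

# Toy stand-ins for the real all_data and weekly_log tables
all_data = pd.DataFrame({
    "room": ["101", "101", "101", "102"],
    "time": pd.to_datetime(
        ["2020-02-03 08:00", "2020-02-03 08:15", "2020-02-04 08:00", "2020-02-03 08:00"]
    ),
    "has_problem": [True, True, False, True],
})
weekly_log = pd.DataFrame({"room": ["101", "102"]})

# Keep only problematic intervals, reduce each timestamp to its day,
# then count the unique days per room
days_with_problems = (
    all_data[all_data["has_problem"]]
    .assign(day=lambda df: df["time"].dt.date)
    .groupby("room")["day"]
    .nunique()
    .rename("Days with Problems")
    .reset_index()
)

# Merge the corrected counts back into the weekly log
weekly_log = weekly_log.merge(days_with_problems, on="room", how="left")
print(weekly_log)  # room 101 -> 1 day, not 2, since both problems fell on one day
```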
Today, I worked for about an hour on debugging the Weekly Report.
I discovered that the issues with timestamps were actually human errors, not programmatic errors. However, I looked deeper into the discrepancies in the number of days with problems. This number is too high in the program's results because it counts every day on which a room has data, rather than only the days that actually have problems.
This should not happen, because Task II filters out any intervals that don't have problems. At first, I thought that perhaps when I reference the old database in Task IV, I unknowingly bring back the days without problems. I tested this suspicion using the debugger and some strategically placed breakpoints. First, I broke at the end of the "Task III" portion of the code, which led me to discover that the DataFrame at the end of Task III also contained the unproblematic intervals. This meant the problem was not in Task IV at all -- it had to be earlier, since the data being aggregated in Task IV already contained the unproblematic intervals.
That shouldn't have been possible, because Task II should filter those intervals out before they go into the daily database in Task III... however, when I broke at Task II, I finally realized the issue.
Sometime in January, I had changed the central DataFrame of Task III to include all intervals, not just problematic ones, so that I could find the true highest and lowest values. However, I had not realized that the "Days With Problems" column would be aggregated incorrectly.
Now that I know the origin of the problem is not with the switch to the historical report, my task is to develop a solution that correctly counts the number of days with problems.
The last couple months have been pretty busy for me, so I haven't gotten around to posting in a while. (Don't worry -- I'll definitely be posting more regularly, especially as I start working on the report more often!)
As for the class I started for the 5 new girls I recruited into the program (more on that here), they all finished the course! After that, I taught them version control (for those who didn't know it) with Git and GitHub, showed them how to use the PyCharm IDE and debugger, and guided them through the same data36 pandas tutorial (all 3 parts) that I used when I was first learning the library. In the last couple of weeks, Dan, Ayush, Justin, and I all demoed our projects for them as examples of the kinds of things they will be making.
Now, they are beginning projects in the areas of water consumption and political data. (They were placed into smaller groupings based on their interests. Rishika and Holly are working on water consumption data, while Madeline, Avanthika, and Sarah are working on political data.) They are also starting blogs similar to this one -- stay tuned for links to their blogs on this page!
I'm so proud of how far they have all come -- it is a super awesome achievement to learn a whole language and some of the basics of development, as well as start working on real-life projects, in less than 2 months!
In other news, the Weekly Report is currently undergoing testing. I used test rooms similar to the ones I used to test version 1, and manually calculated values to compare against the data coming out of the report. Currently, there are some errors that I am in the process of debugging: specifically, the number of days with problems and some of the timestamps are incorrect.
Finally, the future of the Weekly Report is looking very bright. I have been lucky enough to get the opportunity for a really exciting partnership on the project, but I'll go into more detail on that once we get started in a couple of weeks.
Overall, I am really excited for the future of not only my own projects, but those of the new recruits!
On Saturday, I spent about an hour and a half setting up the test for the new Weekly Report and testing the warm and cold spreadsheet.
I adapted the test I had used before, which included test rooms with made-up temperature and CO2 values covering a variety of test cases, to fit the historical report. Since the report produced was comprehensive, I decided to focus on temperature for that day -- everything checked out against the values I determined manually with a calculator (a process that took a decent amount of time even with the few data points I had -- that's why automation is so helpful).
Next, I will test the report on carbon dioxide values and then start deploying to the server. This new version will use cron only for the fifteen-minute logging and for the two programs (task_zero and generate_historical_report) that run at the end of each week. Additionally, since school is closed, the values collected will not be meaningful; they will simply serve as a test of the capabilities of this new report.
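As a rough sketch, the crontab for that setup might look something like this -- the paths and logger script name are made up, though task_zero and generate_historical_report are the real program names from above:

```
# Hypothetical paths; only the two weekly program names are real
# Log data every 15 minutes
*/15 * * * * /usr/bin/python3 /path/to/logger.py
# End-of-week tasks, run Sunday night
55 23 * * 0 /usr/bin/python3 /path/to/task_zero.py
59 23 * * 0 /usr/bin/python3 /path/to/generate_historical_report.py
```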
I also have a bit of functionality to add to the final piece of the new report, based on what Facilities members told me in January: the automated email should include the top 5 or so rooms that need attention, so that they can look at those rooms first. This is an easily reachable goal, since it just requires sorting each DataFrame and taking the top five rows with DataFrame.head().
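Sketching that out (the sort key here is a guess at the actual severity measure):

```python
import pandas as pd

# Placeholder report output; the real sort key may differ
report = pd.read_csv("weekly_report.csv")

# Sort by severity and take the top five rooms for the automated email
top_rooms = report.sort_values("Days with Problems", ascending=False).head(5)
print(top_rooms)
```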