Last week, I presented my working Metasys report at the Andover Green Advisory Board meeting! Before then, I made some changes, such as consolidating everything into one script and parameterizing the file name, which brought it closer to a product than a prototype. The presentation went great, and I got some really awesome questions and suggestions from the Board. I'm so grateful for the opportunity to present and get this kind of feedback. In a similar vein, I will be receiving feedback from the Facilities department sometime soon.
When I finished the Metasys version of the reporting engine, I sent the reports over to Mr. Navkal. After he suggested some small changes to make the visualizations more clear and descriptive (one of the main ones being to migrate the sensor issues page to a spreadsheet just for clarity), he sent me the raw data from March. I was then able to run my programs on the data and successfully create four weekly reports, one for each week of March. These ran smoothly and error-free!
I will be presenting my new software at the Andover Green Advisory Board meeting this Wednesday the 28th!
It's been a while since I updated this blog, but I have some very exciting updates!!
After trying a few different testing methods and following some rooms through the process, I realized that I had been examining the output against an old version of my program. In the newer version of my code, the low_co2 spreadsheet (now renamed sensor_issues to avoid another misunderstanding of this nature) flags rooms with possible sensor issues in temperature and/or CO2, not just CO2. It is also filtered so that a room with a potential temperature sensor issue won't appear in the warm or cold spreadsheets, since its data is unreliable, but it may still appear in the CO2 spreadsheets if there's no issue with its CO2 sensor, and vice versa. In other words, rooms listed in sensor_issues because of a temperature sensor issue may still appear in the high_co2 spreadsheet, but rooms listed because of a CO2 sensor issue will not. This explains the issue from the earlier post and provides a lesson on the importance of both maintaining and frequently referencing quality documentation.
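The filtering rule above can be sketched in pandas. This is just an illustration of the logic, not my actual code, and all the column and variable names here are made up:

```python
import pandas as pd

# Toy data standing in for the per-room flags my program computes.
rooms = pd.DataFrame({
    "room": ["101", "102", "103", "104"],
    "temp_sensor_issue": [True, False, False, False],
    "co2_sensor_issue": [False, True, False, False],
    "too_warm": [True, True, False, True],
    "high_co2": [True, True, True, False],
})

# Rooms with a suspected temperature sensor issue are excluded from the
# warm/cold spreadsheets, but can still appear in the CO2 ones (and vice versa).
warm = rooms[rooms["too_warm"] & ~rooms["temp_sensor_issue"]]
high_co2 = rooms[rooms["high_co2"] & ~rooms["co2_sensor_issue"]]

# The sensor_issues sheet lists any room flagged for either sensor.
sensor_issues = rooms[rooms["temp_sensor_issue"] | rooms["co2_sensor_issue"]]
```

Here room 101 (temperature sensor issue) drops out of the warm sheet but keeps its high-CO2 entry, which is exactly the behavior that confused me earlier.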
After confirming the validity of the spreadsheet data, I was then able to connect it to the visualization aspect, which involved debugging several data type errors and other issues of that level. The report now runs successfully from start to finish (apart from a minor issue with an extra line appearing on some of the graphs - EDIT: this was fixed by adding in an additional sort!).
This stage of the project has been a great learning process, both in terms of the difficulties of integrating different systems and the importance of keeping up with good practice as a software developer.
I haven't updated this blog in a while, but here's what I've been up to over the last couple months with my project.
I debugged the SQL issues, which was a time-consuming process, but eventually I realized that the old data still sitting in the permanent database could be causing the problem. However, after relocating to a fresh database, 1) the error was still happening, and 2) there was still data from February in the table deep into the processing stage where the error occurs. After opening the new MetasysLog database in a database viewer and sorting by timestamp, it was clear that everything in that database was from September, which meant the program was still feeding in old data from another database. It turned out that although I had been updating DailyDatabase, DailyTempDatabase, DailyCarbonDatabase, and FilteredT3Database (all from Task 3), the errors in Task 4 meant I never reached the end of Task 4, where the databases get cleared. For this reason, the old data from February was still sitting in those databases. After discovering the bug, I created and ran a separate file to clear them.
Next, after clearing the databases, when reading from DailyDatabase, I got a `ValueError: invalid literal for int() with base 10: b'=\x00\x00\x00\x00\x00\x00\x00'`. To solve this, I wrote a function fix_bytes() and applied it to the columns that were producing the byte values (all the Intervals Too ____ columns) just before writing to SQL. Then, in posttask4.py, I modified the convert_to_int function to be compatible with floats as well. This made everything run smoothly!
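Interestingly, that byte string is an 8-byte little-endian integer (`'='` is 0x3d, i.e. 61), which suggests one plausible reconstruction of fix_bytes(). To be clear, this is my guess at the shape of the fix, not the exact code:

```python
def fix_bytes(value):
    """Convert raw 8-byte little-endian integers back to Python ints.

    Hypothetical sketch: also tolerates floats and numeric strings,
    mirroring the convert_to_int change described above.
    """
    if isinstance(value, bytes):
        return int.from_bytes(value, byteorder="little")
    return int(float(value))
```

Applied with something like `df[col] = df[col].map(fix_bytes)` on each of the affected columns just before writing to SQL.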
After fixing the data type conversion issues, I was able to run everything with no errors! However, I then opened the resulting spreadsheets to find that almost every room was flagged as having a potential CO2 sensor issue... and they all had values around 60-80 ppm... which were the same numbers as their temperatures in Celsius... I wasn't sure where the program had swapped the CO2 values for temperature ones. After checking the databases at each stage of the program, I was able to zero in on the problem: a basic copy error in convert_metasys_data.py was replacing all of the CO2 values with the temperature values during the transformations. After fixing this and tinkering a bit more with the conversions, I was able to resolve the issue!
Now that the application seems to have cleared basic testing and runs without obvious errors, I have to check the details.
I've already noticed one potential issue: some rooms with moments of both low CO2 and high CO2 appear only in the low CO2 spreadsheet, while other rooms in the same situation appear in both. This inconsistency should be examined further, and I might create a smaller testing file in order to catch more cases like it.
I've been working on getting the report up and running using Metasys data instead of the data logged by my own systems. This involves converting the format of the Metasys data into the format accepted by the permanent database `TempAndCO2Log`. This process was lengthier than I anticipated it would be, but once I had figured it out I still had some obstacles remaining.
First, the system was having trouble indexing, even though the indices of the data table within the database and those of the table I wanted to add looked identical. I was able to debug this once I realized that I had to set the index of the new table to the room number, because the database used the room number as its index (it just didn't look that way when printed).
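The fix boils down to a one-line set_index call. Here's an illustrative sketch with made-up column names and values, just to show the idea:

```python
import pandas as pd

# The permanent table keeps the room number as its index, so the
# incoming rows have to match before they can be appended.
new_rows = pd.DataFrame({
    "Room": ["A101", "A102"],
    "Temperature": [21.5, 22.0],
    "CO2": [450, 600],
})
new_rows = new_rows.set_index("Room")  # align with the database's index

# With matching indices, an append like the following works, e.g.:
# new_rows.to_sql("TempAndCO2Log", conn, if_exists="append")
```

The subtle part is that a default RangeIndex and a named index can print almost identically, which is why the mismatch was so hard to spot.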
Next, I got some ValueErrors like this one:
`ValueError: could not convert string to float: '2020 Sept - AHS ZN-T OA-T & CO2.csv'`
I was confused as to why I had any kind of data point ending in .csv, let alone one the program was trying to read as a float! Eventually, I figured out that there was some descriptive information at the bottom of the original spreadsheet that had somehow filtered its way into the data. In fact, the string above had been mistakenly read as a CO2 value!
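Trimming the footer can be done in a couple of lines of pandas. The helper name, the number of footer rows, and the sample data below are all assumptions for illustration:

```python
import pandas as pd

def drop_footer(df, n_footer_rows=2):
    """Drop the trailing descriptive rows that aren't real data."""
    return df.iloc[: len(df) - n_footer_rows]

# Toy frame mimicking an export whose last rows are descriptive text.
raw = pd.DataFrame({
    "CO2": ["450", "612",
            "2020 Sept - AHS ZN-T OA-T & CO2.csv", "exported by ..."],
})
clean = drop_footer(raw)
clean = clean.assign(CO2=clean["CO2"].astype(float))  # now safe to convert
```

(pandas' `read_csv` also has a `skipfooter` parameter that can do this at load time.)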
After adding lines to delete the bottom couple of rows from the original file, I ran into another issue with the permanent database -- now that the faulty rows were in the database, it was still throwing errors! I tried running some SQL commands like this one:
`DELETE FROM TempAndCO2Log WHERE Timestamp='Metasys File'`
However, these commands didn't actually delete anything from the database -- I'm not quite sure why, and will try to figure it out in the future. In the meantime, I was able to open the database in the database browser we use and remove all the rows where the timestamp was set to 'Metasys File.' This fixed the problem in the short term, but I still want to figure out why the SQL didn't work.
After that, I finally got to the point of running generate_historical_report on the Metasys data! However, the process wasn't error-free. As the standard story goes, the little elves descended on my code at some point since July -- generate_historical_report seems to be having data type/rounding issues, not only with the Metasys data but also with the old data that my systems had logged before (and from which I had successfully generated reports). I'm going to continue looking into this over the next few days and hopefully figure out what went wrong.
For more documentation on all of this, see the issue history on the GitHub repository.
A few weeks ago, I made a breakthrough! I was able to successfully use the .melt() and .pivot_table() functions to completely rearrange the data. I now have a rough version of the program that converts the data from the Metasys format into the format accepted by my systems. I'm currently working on testing a full report generated directly from the Metasys data -- more updates to come!
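The melt-then-pivot idea looks roughly like this. The column names and the "one column per sensor point" layout are assumptions about the Metasys export, not the exact format:

```python
import pandas as pd

# Toy wide-format export: one column per sensor point per room.
wide = pd.DataFrame({
    "Timestamp": ["09:00", "09:15"],
    "Room101 ZN-T": [21.0, 21.5],
    "Room101 CO2": [450, 470],
})

# melt() unpivots the point columns into long (row-per-reading) form.
long = wide.melt(id_vars="Timestamp", var_name="point", value_name="value")

# Split each point name into its room and measurement parts.
long[["Room", "Measure"]] = long["point"].str.split(" ", n=1, expand=True)

# pivot_table() then reshapes into a row per (timestamp, room) with one
# column per measurement -- the layout my systems accept.
tidy = long.pivot_table(index=["Timestamp", "Room"],
                        columns="Measure", values="value")
```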
Also, I've met with the recruits once or twice since I last updated this blog -- they are now beginning the same Coursera courses that I completed during my independent study! :)
Over the past month or so, a lot of exciting things have happened!
First, I've continued meeting with the group of new recruits. We've mainly been working on figuring out the logistics of starting Energize projects while adjusting to a new hybrid school schedule. They've spent some time exploring the data in their chosen areas of expertise as well. In the next couple of weeks, they will officially start their projects.
For me personally, I've been working on a new version of the reporting software that pulls data directly from Metasys instead of relying on the 15-minute logs I create myself. This is really important because it greatly widens the scope of data my software can report on, and because Metasys is a widely used system, it means I can adapt the software to produce reports for almost any school or building. However, I've also been busy with the new hybrid schedule, and converting the Metasys data into the format accepted by my software has proven a bit of a challenge so far -- I'll post more updates here the next time I work on it.
Finally, I presented a status update on this work to the Andover Green Advisory Board at their Wednesday meeting. I'm really grateful to them for giving me the awesome opportunity to present!!
I just realized - I never updated this blog after I was featured on CNBC as a HomeGROWN Hero for my teaching initiatives!
Here's the full article: https://grow.acorns.com/teenager-teaches-kids-to-code-in-quarantine/
I was also featured on the CNBC television special that aired in July.
This was a really awesome and exciting experience for me!
NEWS: The Eagle-Tribune and Andover Townsman wrote an article about my teaching initiatives! Link:
Over the last couple of weeks, I've also continued working on the visualization component of the reporting engine: I've focused on refining the look of the visualizations and adding a table of rooms with likely sensor issues to the end of the report. I still need to fix formatting & spacing issues with this sensor table, which will conclude the first stage of the development process. Working with matplotlib has been really fun! :)
Over the last couple of weeks, a few really exciting things happened:
1. I was named Andover Youth Services' Youth of the Week because of the virtual teaching initiatives I started there! I really appreciate that Energize Andover gave me the freedom to start running a virtual Python class with a group of 5 girls, an experience I eventually used to start virtual technology classes through the Youth Services.
2. While I recently began a summer internship, I have still been working on my Energize projects in my free time. Kate, the PhD student working with Energize through the BU URBAN program, helped me figure out the best way to visually represent the data using matplotlib. I have written a script that generates a PDF of the resulting visualizations.
I have also separated out the rooms likely to have sensor issues into their own spreadsheet based on certain conditions in the data. I am now making adjustments to the script to refine the visualizations a bit more.
EDIT: I forgot to mention that I also started teaching a bit of matplotlib to the new recruits! I showed them my project and explained how it works.
Also, here are some sample images of what my visualizations will look like: