Last week, sadly, was the final week of my Independent Study! During the week, I worked on a paper that will summarize my experience this semester, as well as a presentation for the Facility and other groups like the Andover Green Advisory Board.
Additionally, over the weekend I implemented the changes discussed in the previous blog post (including all values in the highest/lowest/mean/median calculations, and splitting the report into four sheets to improve readability). Now I am debugging on the server -- for some reason, some of the tables in SQLite don't seem to be saving, but since I just deployed yesterday, it's possible that my testing is getting in its own way!

Over the last couple of weeks, I officially wrapped up the Independent Study, and the term came to an end. I have been working on several projects to showcase what I've learned and accomplished over the semester.
First, I wrote a paper outlining my work throughout the semester. This process took up most of the time I had during the final few classes, and the paper went through a few different drafts. At the same time, I finally updated the report itself to show the real mean, median, highest, and lowest values, instead of computing them from the pool of problematic values only. Next, I developed a presentation that showcased the Weekly Report and, more generally, my journey so far with Energize. Last week, I presented this talk (as well as a 3-day demo version of the report) to the Andover Green Advisory Board. Tomorrow, it will be presented to the Facility staff themselves! Currently, I still only have the 3-day demo version, because I discovered that my program had not been logging data into the permanent database (or, for that matter, any database) the way it should. I am still debugging this issue, but I am excited to present the structure of the report and the way it works!

At yesterday's meeting, Mr. Navkal and I discussed consolidating the Weekly Report into a more readable, presentation-friendly format, as a giant data table is not the most appealing form in which to present results. Therefore, I spent some time today working on scripts to solve this issue, based on examples supplied by Mr. Navkal.
At first, I wrote a program to print out the facts in a more condensed form.
I also realized that since the highest, lowest, mean, and median values are all calculated from the pool of problematic values rather than from all the values for a room, I should probably have a version that computes them from the full set instead. I began working on this version today as well. Overall, the changes I'm making should help make the final product more readable and helpful.

Today, I continued playing around with Matplotlib and tried to plot some relevant data from the Weekly Report. However, I had to cut this time short due to a club fair event, at which I represented the Andover Robotics Club along with leaders of the club's other two teams.
In the evening, I attended a public meeting on Community Choice Aggregation, or CCA. My interest in the program was sparked a few months ago, when Mr. Navkal told me about it and later presented the idea to the Environmental Club (of which I am a member). At that point, I had the idea to use my data visualization skills to help create interactive charts in support of the program. (CCA is a program that enables a town to develop a default electricity plan that uses more renewable energy than the state requires; consumers can then choose between the default and options with less or more renewable energy.) While I probably won't get to this project before my Independent Study ends, I definitely want to work on it as soon as possible!

Today, I mostly worked on my final presentation. I wrote a rough draft of a script that discusses my journey with Energize (so far!), from the start of the Independent Study to the completion of the Weekly Report. Additionally, I began playing around with Matplotlib in PyCharm in the last 10 minutes of class (I had only ever used it in a Jupyter Notebook, so this was new).
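A first Matplotlib experiment along these lines might look like the sketch below. The room label, readings, and 65-75 °F comfort band are all made up for illustration; they are not actual Energize data.

```python
# Minimal sketch of plotting Weekly Report-style data with Matplotlib.
# All values below are hypothetical examples.
import matplotlib
matplotlib.use("Agg")  # render off-screen, so no display is needed
import matplotlib.pyplot as plt

readings = list(range(9))
temps = [80, 70, 70, 70, 70, 70, 70, 70, 5]  # invented sample series

fig, ax = plt.subplots()
ax.plot(readings, temps, marker="o", label="Room 101 (hypothetical)")
ax.axhspan(65, 75, alpha=0.2, label="comfort range (assumed)")
ax.set_xlabel("Reading number")
ax.set_ylabel("Temperature (°F)")
ax.legend()
fig.savefig("weekly_report_temps.png")
```

Running a script like this from PyCharm produces a PNG on disk instead of the inline chart a Jupyter Notebook would show, which is the main workflow difference.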
Happy New Year!!
Today, I worked to solidify the remainder of the Weekly Report -- and successfully sent the email using the Python script! Now I am able not only to produce a comprehensive report each week, but also to send that report to any e-mail list. This exciting milestone came at just the right time -- sadly, the term ends in a couple of weeks, meaning that my Independent Study is drawing to a close. I am so lucky to have received this opportunity back in September, and happy that I've learned so much about data science and the real world of development since then! As for final plans, I will prepare a presentation over the next few weeks, but the details are TBD.

In class today, I set out to find the issue with the mean and median values. I created a tester file in which I ran np.mean and np.median on what should have been the temperatures in the test series. While at first I got drastically different values from what was in the output file, I soon realized that I had been misinterpreting what my own program was doing. The test data was [80, 70, 70, 70, 70, 70, 70, 70, 5], the mean of which is somewhere in the lower 60s, but the output in the report was 42.5. However, my program filters for problematic values only, meaning that the series had been whittled down to [80, 5] -- the mean of which is indeed 42.5. I made this clearer by renaming the columns to "Mean Problematic Room" and "Median Problematic Room".
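The discrepancy is easy to reproduce. A minimal sketch, where the 65-75 °F comfort thresholds are my assumption rather than the report's actual limits:

```python
import numpy as np

temps = [80, 70, 70, 70, 70, 70, 70, 70, 5]

# Mean over the full series:
print(np.mean(temps))        # ~63.9

# The report filtered to problematic values first. Assuming a
# hypothetical 65-75 °F comfort band:
problematic = [t for t in temps if t < 65 or t > 75]
print(problematic)           # [80, 5]
print(np.mean(problematic))  # 42.5
```

Both numbers are "correct"; they just answer different questions, which is why the renamed columns help.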
After this, I dealt with another Git adventure. For the past week or so, I have been making structural changes to the way the report is organized in order to improve readability and efficiency. So that I would not ruin my working (albeit slightly inefficient) version, I created a new branch on which to make these changes. Now that the new version was finished, I merged the branches back together. Then I realized that the old code was in the file called task_three, while the new code resided in task_three_cleanup. An attempt to rename both of these (to task_three_old and task_three, respectively) resulted in a lot of merge issues. Essentially, it seemed that Git had tried to merge the old file with the cleanup file and drop all that code haphazardly into the main file. After learning how to revert changes, I reverted the last few commits, but the code was still not the same! After thinking about possible solutions for a while, I realized that, fortunately, I had not deleted the other branch. Recovering that branch, I made it the default and simply deleted the master branch! After this, I began deploying to the server (with the old names!). There is now only one part left to set up -- I need to create an e-mail account for this purpose and make sure it's actually sending mail. Then the report will finally be done!

Yesterday (12/17), we had another snow day, so I did not have class. Today, I continued testing the Weekly Report and managed to verify all the values -- except for the means and medians! When I checked these against the real values, they were significantly off for both of my test rooms and for both CO2 and temperature. I'll need to continue investigating this error.
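The branch rescue described earlier boils down to keeping the surviving branch and deleting the broken one. A self-contained sketch in a throwaway repository (the branch and file names follow the posts; the exact commands used at the time may have differed, and making a branch the "default" on a hosting service like GitHub is a settings change rather than a Git command):

```shell
set -eu
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git symbolic-ref HEAD refs/heads/master   # start on a branch named master
git config user.email demo@example.com
git config user.name demo
echo "old report code" > task_three.py
git add . && git commit -qm "working version"
git checkout -qb task_three_cleanup       # branch for the restructuring
echo "restructured report code" > task_three.py
git commit -qam "cleanup version"
# After a botched merge/rename, the cleanup branch still holds good code.
# Stay on it and drop master entirely:
git branch -D master
git branch                                # only task_three_cleanup remains
```

Because a branch is just a pointer to a commit, deleting master loses nothing that the surviving branch already contains.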
I just realized I hadn't made a blog post for Friday's class. This past weekend was pretty busy for me, as I competed in three programming competitions (the American Computer Science League on Friday, the Acton-Boxborough Competition for Informatics and Computing on Saturday, and the USA Computing Olympiad on Sunday).
On Friday, I continued testing the Weekly Report on sample data. I found that rooms with only one problem interval caused issues in Tasks 3 and 4, so I spent time trying to debug that. This process involved a lot of if-statements: if the listing of all the problems for a room (typically a multi-row DataFrame) came back as a single-row Series, I had to convert it to a DataFrame. I think most of the problems are fixed now, but I'm still testing more complicated cases.

Yesterday at home, I continued working on the Weekly Report. So that we could aggregate the times of the most extreme values in Task Four, I copied all the date/time values into a different database, which will then be read from weekly. Additionally, I used more conversion functions to fix the SQLite type errors I had been getting.
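That Series-vs-DataFrame guard can be written once as a helper. A sketch, with invented column names and data for illustration:

```python
import pandas as pd

def as_frame(problems):
    """Normalize a room's problem listing to a DataFrame.

    Selecting a single matching row from a DataFrame can yield a
    Series; converting it keeps the downstream code uniform.
    """
    if isinstance(problems, pd.Series):
        # .to_frame() makes a one-column frame; .T transposes it into
        # a one-row frame, matching the multi-row case.
        return problems.to_frame().T
    return problems

# Hypothetical example: a room with a single problem interval.
df = pd.DataFrame({"room": ["101", "101"], "temp": [80, 5]})
single = df.iloc[0]       # a Series, not a DataFrame
fixed = as_frame(single)  # now a one-row DataFrame
```

One caveat with this approach: the transposed frame's columns end up with `object` dtype, so numeric columns may need re-casting afterward.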
Today in class, I finished a rough version of Task Four with the new requirements. However, when I looked at the output file, I noticed that the "intervals" values did not seem to be numbers! (They looked something like b'\x05\x00\x00\x00\x00\x00\x00\x00'.) After Googling this strange format, I learned that bytes is an actual data type, to and from which integers can be converted. I decided to trace the cause of the problem: when I printed the values to a CSV file just before writing them to the database, they printed correctly, but doing the same directly after reading them back in produced the indecipherable output. This meant that the integers were somehow being converted to bytes as they were written to the database. Luckily for me, I found that there is an integer method, from_bytes, which can reverse this conversion.

After solving this problem, I began testing the code by running task_one_and_two a certain number of times, followed by task_three_cleanup (essentially simulating 'interval' runs and 'daily' runs of the program). However, when I ran task_one_and_two only once before running the daily file, I ran into an error! During calculation of the intervals, I grouped the temperature database by a boolean column ("High Temp?") stating whether the reading was too warm or too cold, and then counted the group sizes to get the number of intervals. However, when a room had only one entry, the result came back as a Series, not a DataFrame! I am still trying to fix this issue. Overall, I made good progress in the past 24 hours!
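The decoding step is a one-liner. In the sketch below, the little-endian byte order matches the b'\x05\x00...' blob above; the adapter line at the end is one plausible prevention, under the assumption (not confirmed in the post) that the stray values were NumPy integers being stored as raw bytes:

```python
import sqlite3
import numpy as np

# The report's "intervals" column came back as 8-byte blobs like this:
raw = b'\x05\x00\x00\x00\x00\x00\x00\x00'

# int.from_bytes reverses the conversion; the byte order here is
# little-endian, which is what the blob above implies.
count = int.from_bytes(raw, byteorder="little")
print(count)  # 5

# Possible prevention (an assumption about the cause): teach sqlite3
# to store NumPy integers as plain Python ints instead of blobs.
sqlite3.register_adapter(np.int64, int)
```

The symmetric method, `int.to_bytes`, goes the other way, which makes it easy to round-trip values while testing.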
Author: I'm a high school senior and programming enthusiast.