I tested and debugged the functionality I worked on last time and added a further layer of abstraction: another function that gets the data from aggregating participation in three elections directly. Because elections were indexed a little differently than town meetings, the call to agg_voter_data had to be different.
Instead of printing this data out, I wanted a way to save it somewhere, so I returned the values I was calculating as a Series. I then combined all of these Series into a DataFrame, which was written out to a CSV file.
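As a rough sketch of that Series-to-DataFrame-to-CSV pattern (the function body, sample names, and column names here are hypothetical placeholders, not the project's actual code):

```python
import pandas as pd

def agg_voter_data(event_cols):
    # Hypothetical stand-in: return one statistic per engagement
    # category for the three events passed in.
    return pd.Series(
        {"kept_momentum": 0.42, "gained_momentum": 0.18, "lost_momentum": 0.40},
        name="-".join(event_cols),
    )

# Aggregate several samples, then combine the Series into one DataFrame.
samples = [["tm_2017", "tm_2018", "tm_2019"], ["el_2012", "el_2014", "el_2016"]]
results = pd.DataFrame([agg_voter_data(s) for s in samples])

# Each Series becomes one row, labeled by its name.
results.to_csv("engagement_stats.csv")
```

Building the DataFrame from a list of Series keeps each sample as one labeled row, which makes the CSV easy to scan later.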
From this CSV, I hope to visualize voter participation over time across a variety of samples. I created pie charts for the sample of voters who did not participate in the first event, comparing the percentage that stayed inactive throughout with the percentage that became active in a later event within the selected time frame.
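A minimal matplotlib pie chart of that comparison might look like the following (the counts, labels, and output file name are invented for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs in a script
import matplotlib.pyplot as plt

# Hypothetical counts: of voters inactive in the first event, how many
# stayed inactive vs. became active in a later event in the time frame.
stayed_inactive = 640
became_active = 360

fig, ax = plt.subplots()
ax.pie(
    [stayed_inactive, became_active],
    labels=["Stayed inactive", "Active in a later event"],
    autopct="%1.1f%%",  # annotate each slice with its percentage
)
ax.set_title("Voters inactive in the first event")
fig.savefig("inactive_voters_pie.png")
```

`autopct` handles the percentage math, so the chart stays in sync with the underlying counts.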
My goal right now is to continue highlighting different data points and bringing them to life through various types of visualizations. In addition to the pie and bar charts I've created so far, I want to make line graphs that go into more detail on voter participation in things like local and state elections.
voter engagement deep dive
Today, I debugged my visualizations from last time and then began using pandas to pull together some statistics on voter engagement over time.
My goals at the outset were to collect a few key data points about how voter engagement evolved over time:
- Did citizens who were active in earlier years keep up momentum or lose it?
- How often does the reverse happen? Are citizens increasing their engagement over time?
- In general, is engagement trending upward or downward each year?
I used statistics about town meeting attendance as well as voter participation in a number of elections starting from 2012.
First, I used some groupby functionality to dissect the sample based on how many town meetings were attended in 2017, 2018, and 2019. This way, I could see what percentage of each subgroup kept up momentum (attended at least one meeting in 2017 and returned in the next two years), gained momentum (began attending in 2018 or 2019), or lost momentum over time. I then repeated the same process to record participation in samples of three local or state elections from 2012 through 2019.
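A toy version of that momentum classification could look like this (I'm using `apply` and `value_counts` here rather than the project's actual groupby code, and the attendance columns are made up):

```python
import pandas as pd

# Toy attendance data: count of town meetings attended per voter per year.
df = pd.DataFrame({
    "attended_2017": [2, 0, 1, 0],
    "attended_2018": [1, 1, 0, 0],
    "attended_2019": [3, 2, 0, 0],
})

def momentum(row):
    # Classify each voter by when their attendance started or stopped.
    if row["attended_2017"] > 0:
        if row["attended_2018"] > 0 and row["attended_2019"] > 0:
            return "kept"    # active early and returned both later years
        return "lost"        # active early but dropped off
    if row["attended_2018"] > 0 or row["attended_2019"] > 0:
        return "gained"      # began attending in 2018 or 2019
    return "inactive"        # never attended

# Share of the sample in each category.
shares = df.apply(momentum, axis=1).value_counts(normalize=True)
print(shares)
```

The same classifier works for election participation once the columns are renamed, which is what made a generic three-event function possible.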
Now, I have a function that aggregates engagement data given any three events (election or town meeting) and provides these statistics. I'm still trying to figure out one issue with standardizing this function, as the town meeting data and election participation data are stored in different ways.
civic engagement updates
I've been busy with the college application process over the last few months, but I just wanted to update this blog on my progress with the civic engagement project! So far, I've continued exploring the database and started working on some bar graphs and charts using matplotlib to help bring the data to life visually. This includes data on how voter registration and engagement as calculated within the database corresponds to student status and precinct membership. I'm currently working on generating accurate and clear bar graphs for these data points as well as a line chart representing the progression of voter engagement for different subgroups over time. I hope to add other visualizations to the project depending on what other data I encounter!
Civic engagement data analysis update
It's been a while since my last post, as things have been quite busy with college applications, but I have worked a bit on my new project and just wanted to post a quick update here!
The new data I'm working with centers on voter engagement at the local, state, and national levels. I started by analyzing voter registration, first as compared to precinct and then as compared to student status. I followed those results with analysis of the civic engagement score, a precalculated number that factors in various aspects of voter engagement, allowing for a more balanced view across precincts and between students and non-students. Together, this code should shed light on whether voter activity differs between precincts, and whether students exhibit different engagement patterns than their non-student peers. After extracting those key sections of the data, I moved on to what I'm currently working on: finding out whether there are correlations between civic engagement and more environmentally focused statistics like water or energy usage.
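Checking for such correlations in pandas can be as simple as a `corr()` call; here's a sketch with entirely fabricated numbers and hypothetical column names:

```python
import pandas as pd

# Fabricated per-household data, just to show the shape of the analysis.
df = pd.DataFrame({
    "engagement_score": [3.1, 4.5, 2.0, 5.2, 3.8],
    "water_usage":      [120,  95, 150,  80, 110],
    "energy_usage":     [560, 480, 610, 430, 520],
})

# Pearson correlation between engagement and each environmental metric.
corr = df.corr()["engagement_score"].drop("engagement_score")
print(corr)
```

With real data, a scatter plot alongside the coefficient helps confirm the relationship isn't driven by a few outliers.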
Ideally, all of the functions I'm creating analyze different key cross-sections of the data, and will work on any batch of data formatted in the same way as our SQLite database. In the future, I hope to configure it so that users can compare data points over time, rather than just seeing an overview of current conditions. But first, I plan to create visualizations for these key points and expand the project's scope toward a fuller picture of engagement statistics.
Summer updates + new project!
It's been a busy summer, but I just wanted to update this blog about what I've been up to!
With my teaching initiatives at AYS, I led the transition to in-person classes for summer 2021. We ran two 4-week sessions of Python and one 4-week session each of CAD and Web Design at the Cormier Youth Center. Instructors were from both inside and outside of the Robotics Club. This brought the total number of students to 83 and instructors to 17. I'm so grateful that I could scale the teaching initiatives I started at Energize Andover to impact 100 people! (Thank you once again Mr. Navkal for supporting me!)
I also returned to Codio for my second summer as an intern! I was so grateful for their flexibility in terms of my schedule, which allowed me to teach at the Youth Center on weekdays.
For my next project, I am analyzing civic engagement data -- more about that in a future post!
Last week, I was honored to present my work to the Facilities! I explained the new version of my script and its capabilities, and asked for feedback/direction for the future. They liked the tool and were interested in using it to assist in future CO2 and temperature analysis! I'm really excited about this and am glad my work will help the Town analyze data in the future.
Stay tuned for information on my next projects!
Last week, I presented my working Metasys report at the Andover Green Advisory Board meeting! Before then, I did make some changes, such as consolidating everything into one script and parameterizing the file name, which make it closer to a product than a prototype. The presentation went great and I got some really awesome questions and suggestions from the Board. I'm so grateful for the opportunity to present and get this kind of feedback. In a similar vein, I will be receiving feedback from the Facilities sometime soon.
When I finished the Metasys version of the reporting engine, I sent the reports over to Mr. Navkal. After he suggested some small changes to make the visualizations more clear and descriptive (one of the main ones being to migrate the sensor issues page to a spreadsheet just for clarity), he sent me the raw data from March. I was then able to run my programs on the data and successfully create four weekly reports, one for each week of March. These ran smoothly and error-free!
I will be presenting my new software at the Andover Green Advisory Board meeting this Wednesday the 28th!
Working Metasys version developed!
It's been a while since I updated this blog, but I have some very exciting updates!!
After trying a few different testing methods and following some rooms through the process, I realized that I had been examining output from an old version of my program. In the newer version of my code, the low_co2 spreadsheet (now renamed sensor_issues to avoid another misunderstanding of this nature) designates rooms with possible sensor issues in temperature and/or CO2, not just CO2. It is also filtered so that a room with a potential temperature sensor issue won't appear in the warm or cold spreadsheets, since its temperature data is unreliable, but it may still appear in the CO2 spreadsheet if there's no issue with its CO2 sensor, and vice versa. This means that rooms listed in sensor_issues for a temperature sensor issue may still appear in the high_co2 spreadsheet, but those listed for a CO2 sensor issue will not. This explains the issue from the earlier post and provides a lesson on the importance of both maintaining and frequently referencing quality documentation.
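That filtering rule can be sketched with boolean masks (the flags and room numbers below are illustrative, not the report's real schema):

```python
import pandas as pd

# Illustrative room data with flags for suspected sensor issues.
rooms = pd.DataFrame({
    "room": ["101", "102", "103", "104"],
    "temp_sensor_issue": [True, False, False, False],
    "co2_sensor_issue":  [False, True, False, False],
    "too_warm":          [True, False, True, False],
    "high_co2":          [True, True, False, True],
})

# A room with a suspect temperature sensor is excluded from the warm list
# (its temperature data is unreliable) but can still be flagged for CO2,
# and vice versa.
warm = rooms[rooms["too_warm"] & ~rooms["temp_sensor_issue"]]
high_co2 = rooms[rooms["high_co2"] & ~rooms["co2_sensor_issue"]]
sensor_issues = rooms[rooms["temp_sensor_issue"] | rooms["co2_sensor_issue"]]
```

Here room 101 has a temperature sensor issue, so it drops out of `warm` but still shows up in `high_co2`, matching the behavior described above.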
After confirming the validity of the spreadsheet data, I was then able to connect it to the visualization aspect, which involved debugging several data type errors and other issues of that level. The report now runs successfully from start to finish (apart from a minor issue with an extra line appearing on some of the graphs - EDIT: this was fixed by adding in an additional sort!).
This stage of the project has been a great learning process, both in terms of the difficulties of integrating different systems and the importance of keeping up with good practice as a software developer.
I haven't updated this blog in a while, but here's what I've been up to over the last couple months with my project.
I debugged the SQL issues, which was a time-consuming process, but eventually I realized that old data still sitting in the permanent database could be causing the problem. Yet after switching to a fresh database, 1) the error was still happening, and 2) data from February was still in the table deep into the processing stage where the error occurs. After opening the new MetasysLog database in a database viewer and sorting by timestamp, it was clear that everything in that database was from September, which meant the program was still feeding in old data from another database. It turned out that although I had been updating DailyDatabase, DailyTempDatabase, DailyCarbonDatabase, and FilteredT3Database (all from Task 3), the errors on Task 4 meant I never reached the end of Task 4, where the databases get cleared. That's why the old data from February was still in the databases. Once I found the bug, I created and ran a separate file to clear the databases.
Next, after clearing the databases, reading from DailyDatabase raised a `ValueError: invalid literal for int() with base 10: b'=\x00\x00\x00\x00\x00\x00\x00'`. To solve this, I wrote a function fix_bytes() and applied it to the columns that were producing the byte values (all the Intervals Too ____ columns) just before writing to SQL. Then, in posttask4.py, I modified the convert_to_int function to be compatible with floats as well. This made everything run smoothly!
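I haven't shown the real fix_bytes() here, but a plausible sketch, assuming the blobs are little-endian 8-byte integers (which matches the error value, since 0x3d is 61), would be:

```python
def fix_bytes(value):
    # Some SQLite columns came back as raw 8-byte blobs instead of ints.
    # Assumption: the blobs encode little-endian integers; anything else
    # passes through unchanged.
    if isinstance(value, (bytes, bytearray)):
        return int.from_bytes(value, byteorder="little")
    return value

def convert_to_int(value):
    # Made compatible with floats (and numeric strings) as well as ints.
    return int(float(fix_bytes(value)))

print(fix_bytes(b"=\x00\x00\x00\x00\x00\x00\x00"))  # → 61
```

Routing every value through float() first is what lets the same converter accept ints, floats, and numeric strings alike.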
After fixing the data type conversion issues, I was able to run everything with no errors! However, I then opened the resulting spreadsheets to find that almost every room was flagged as having a potential CO2 sensor issue... and they all had values around 60-80 ppm... which were the same numbers as their temperatures in Celsius... I wasn't sure where the program had swapped the CO2 values for temperature ones. After checking the databases at each stage of the program, I was able to zero in on the problem: a basic copy error in convert_metasys_data.py replaced all of the CO2 values with the temperature values during the transformations. After fixing this and tweaking the conversions a bit more, the issue was resolved!
Now that the application seems to have cleared basic testing and runs without obvious errors, I have to check the details.
I've already noticed one potential issue: some rooms have periods of both low and high CO2 but appear only in the low-CO2 spreadsheet, while other rooms in the same situation appear in both. This inconsistency needs further examination, and I might create a smaller testing file to catch more cases like it.
I'm a high school senior and programming enthusiast.