Today, I continued with Week 2 of the Coursera course!
The course introduced DataFrames, the most important data structure in Pandas. (Of course, I had worked with DataFrames in the other tutorials and the actual code prior to this, but it was nice to have a more formal, in-depth lecture about it.)
One of the biggest topics covered was boolean masking, which is a way to show only the subset of a DataFrame that meets a certain condition (filling the rest with NaN).
Interestingly, the technique described here (using the .where() and .dropna() methods) was different from the one I had used in the other tutorials and in the Energize code. However, they both served exactly the same purpose.
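To show the difference concretely, here's a tiny made-up example (the DataFrame, column names, and threshold below are just for illustration, not the actual Energize data):

import pandas as pd

# A small example DataFrame
df = pd.DataFrame({'Location': ['Room 101', 'Room 102', 'Room 103'],
                   'Temperature': [68.0, 63.5, 71.2]})

# The course's approach: .where() keeps every row, filling non-matching ones with NaN,
# and .dropna() then removes the NaN rows
masked = df.where(df['Temperature'] < 65).dropna()

# The approach I'd been using: boolean indexing, which drops non-matching rows directly
filtered = df[df['Temperature'] < 65]

# Both end up with just the Room 102 row
print(masked)
print(filtered)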
Additionally, there was a lot of other experimentation with DataFrames. I'm getting closer to finishing Week 2, so I'll probably get to the project in a couple days.
On Thursday, I continued with Week 2 of the Coursera course, which began exploring Pandas. While I already knew some of the material, the longer videos went much more in-depth than the tutorials I had read before.
The first couple of videos dove into Pandas Series, data structures that store and index elements in a specific order. They were really fascinating to follow along with!
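For example (a made-up example, not the course's actual one), a Series might look like this:

import pandas as pd

# A Series pairs each value with an index label, keeping everything in order
temps = pd.Series([68.0, 63.5, 71.2], index=['Room 101', 'Room 102', 'Room 103'])

print(temps['Room 102'])   # look up by label -> 63.5
print(temps.iloc[0])       # look up by position -> 68.0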
After I went home, I decided to get a little more formal about my experimentation with Task Scheduler. I created a spreadsheet, in which I changed settings one by one and recorded whether or not the job ran on schedule.
Through this method, I figured out that if I changed the settings such that the computer did not sleep when the lid was closed in the first place, it worked! While a good temporary workaround (all the jobs ran perfectly the next day!), this isn't the most power-efficient solution.
Today, I remembered that I had been in a similar situation in the past -- where the Windows settings looked fine, but the desired outcome (at that point, it had been enabling my microphone and camera) wasn't being achieved. The solution back then was to go into my device-specific (not OS) settings, where those components were disabled. Now, I wondered if something similar was happening with the wake timers. However, it seemed like some settings were missing from the device-specific settings center, which had been updated recently. This was confirmed by many online forums I read, some of them posted as recently as this week.
Luckily, I should be able to switch over to the school's Ubuntu server in a few days! Still, this was a good experience that let me strengthen my problem-solving muscles.
Yesterday, I finalized and published this very blog!
Additionally, I finished up the last couple minutes of the last video from Week 1 of the Coursera course. At home that night, I studied for a quiz on all the material so far, which I would take the next day.
For the rest of the class period, I worked on the Task Scheduler conundrum. I discovered that the task had run three times in the last two days, all without me doing anything and at fairly random times: on Monday night at around 11 PM, and on Tuesday at around 7 PM and 10 PM, the task was shown to have run and data had been written to the file! Even more confusingly, during the last run, a test project which I had created also wrote to its file. I'm not sure why it did this, but I hypothesized it could have something to do with plugging the computer in before going to bed, even though I had turned off the setting which said "Run only if computer is on A/C Power."
Going home for the afternoon, I continued trying to figure out this issue. Wondering if the action of waking the computer was disabled somewhere else, I Googled around -- but some of the settings people mentioned (like the "Allow wake timers" option in Power Options) were missing on my computer! I will keep trying to figure this out!
Yesterday, I mainly continued with the Coursera course. I'm almost done with Week 1 -- now I mainly have to study for an upcoming "quiz"!
Some of the advanced Python features were really interesting and can make code much more efficient. For example, the lambda keyword provides a quick way to create a function that fits into a single expression.
Ex: example_function = lambda a, b: a - b
(This would return the difference of two inputs, a and b)
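And to show how it gets used (my own example, not one from the course):

print(example_function(10, 4))   # prints 6

# Lambdas also make handy one-off "key" functions, e.g. sorting pairs by their second element
pairs = [('b', 3), ('a', 1), ('c', 2)]
print(sorted(pairs, key=lambda pair: pair[1]))   # [('a', 1), ('c', 2), ('b', 3)]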
Additionally, I noticed that the scheduled task did not run. There could be many reasons for this, but my current theory is that my computer's lid was closed (so it was asleep) at the time of the job. I tried changing several of the settings so that this would not prevent the task from running.
One issue I found was that it would only run the task if the computer was connected to AC power. I changed that, but it still did not run the program. Another idea I had was to check the box that said "Wake the computer to perform task". That, too, resulted in the same outcome.
The task scheduler is still a work in progress, but I'm going to keep trying until it works!
I met with Mr. Navkal on Thursday. He gave me a new assignment: to use the methods of analyzing data I had set up to accomplish a few meaningful tasks:
At home, I was able to accomplish this using the Pandas functions I had learned earlier.
Where new_data is the dataset and temp_min has been set to 65 degrees, this accomplishes the third objective:
print(new_data[new_data.Temperature < temp_min][['Location', 'Temperature', 'CO2']])
Once I finished these objectives, I began catching up on blog posts from the first week, and began a Coursera course taught through the University of Michigan (it's a 5-part specialization, the first of which I began this weekend). The course material is really interesting so far!
In addition, I received my next assignment: to record the data regularly as a "cron job" (or via the Windows equivalent, Task Scheduler) and put it in a CSV file. The scheduled times would be at the start of the school day, at 10:00 AM, at noon, and at 2:00 PM. Of course, this required that I learn what a cron job is -- essentially, scheduling your computer to run a task automatically at a certain time. I thought this was really interesting.
After trying out Task Scheduler in a practice file, I combined it with my updates in the problem_areas_cron.py file. Of course, I haven't really been able to test it out yet, since 7:39 AM tomorrow hasn't happened yet, but hopefully it will work!
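The general idea is that the scheduler's only job is to launch the script at the set times; the script itself fetches the readings and appends a timestamped row to the CSV. Here's a minimal sketch of that pattern (the file name, columns, and log_reading helper are made up for illustration, not the actual problem_areas_cron.py code):

import csv
from datetime import datetime

def log_reading(csv_path, location, temperature):
    # Append one timestamped row; Task Scheduler (or cron) just has to run this script on time
    with open(csv_path, 'a', newline='') as f:
        writer = csv.writer(f)
        writer.writerow([datetime.now().isoformat(), location, temperature])

# Hypothetical usage -- the real script pulls its readings from the Energize API instead
log_reading('readings.csv', 'Room 101', 68.0)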
Today, I was able to finish the sorting from yesterday and commit back to my fork on GitHub.
Here's how my sorting code worked (this was in pure Python; at this point, I hadn't used Pandas all that much):
I made a two-dimensional list (with 3 rows) beforehand.
While iterating over the CSV file (which the sample code was already doing as it made its requests), I appended the current value to two of the list's rows -- one to be sorted and one to preserve the original order -- and appended the label to the third row. After the loop finished, I sorted the first row of the list. Then I printed in the order of the sorted first row; for each value, the second row still told me its original index, so I could match it with the correct name at that index in the third row.
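Here's a small sketch of that idea with made-up numbers (the readings and names below are just examples, not the actual ahs_power.py code):

# rows[0] gets sorted, rows[1] preserves the original order, rows[2] holds the labels
rows = [[], [], []]

# Pretend these (section, kW) pairs came from iterating over the CSV / API responses
readings = [('Gym', 41.0), ('Cafeteria', 12.5), ('Library', 27.3)]
for label, value in readings:
    rows[0].append(value)
    rows[1].append(value)
    rows[2].append(label)

rows[0].sort()

# For each sorted value, look up its original position to recover the matching label
for value in rows[0]:
    original_index = rows[1].index(value)
    print(rows[2][original_index], value)

(Looking back, two sections with identical readings would collide in the .index() lookup, but it did the job here.)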
As an additional note: before this, I had a bit of a foundation in Git/GitHub, but this experience is really solidifying my Version Control knowledge! I used a lot of StackOverflow to guide me through the process and troubleshoot. In particular, I had cloned Mr. Navkal's original repository to my computer, and I wanted to figure out how to commit back to my fork instead of the original one.
In the end, I read that I had to run git remote set-url origin (link), which worked well.
After I figured this out, I thought, "Wait, how can I actually use Pandas to work with this data?" Turning my attention to the ahs_air file, I realized that the CSV file I was reading from didn't have actual data -- instead, it contained room numbers and the IDs they were using to access the API. If I wanted to store and read the actual data using Pandas, I would need to write to a different CSV file.
By the time I figured out how to write each row (by replacing the print statement in the original file) to the new CSV and read in the data with Pandas, class was over. I'm so excited to continue working with the data, especially now that I can apply my knowledge from the tutorials!
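In rough terms, the pattern looked something like this (the file name and the sample row are made up; the real loop writes one row per API response):

import csv
import pandas as pd

# Write rows out to a new CSV instead of printing them
with open('ahs_air_data.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Location', 'Temperature', 'CO2'])   # header
    writer.writerow(['Room 101', 68.0, 612])              # in the real code, one of these per row of data

# Later, Pandas can read the new file straight into a DataFrame
df = pd.read_csv('ahs_air_data.csv')
print(df.head())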
After I finished the Pandas tutorials last class, Mr. Navkal e-mailed me about next steps, linking me to a GitHub repository containing example code to fetch data from Energize's API. (Here's my fork, which has been updated past this point) The data is vast and contains detailed sensor-based information about various environmental factors of the facilities at Andover High (and some overall information about the other schools in the district).
After forking the repository and setting everything up on my computer, I started looking at the data. One file that stood out to me was ahs_power.py, which retrieved the amount of energy in kilowatts that each section of the school was using. The example code just printed out the sections in alphabetical order with their respective energy usages, so I decided to try sorting them in ascending order of how much energy they used.
Not having yet made the connection that I could write to a new CSV file, I read in the data and started working on a way to sort and print the results with a two-dimensional list.
At the end of this class, this sorting was still a work in progress.
Today was the first official day of my Independent Study!
For some background, I discovered computer programming last year and have been learning as much as I can from a variety of programs. This summer, I attended a summer camp and an online class, where I learned a lot of object-oriented Java, data structures, and algorithms. At the start of the school year, I found myself in a programming class that covered material I already knew, thanks to my summer work. I discussed this with my teacher and guidance counselor, and was lucky enough to get the opportunity to work with Energize Andover as an Independent Study.
For the past couple of days, I had been informally working through tutorials for a data science library called Pandas, which is used with Python to analyze data stored in a table-like format (such as CSV files). I was able to finish the third and final part of the tutorials today.
So far, I've learned how to use a myriad of functions: counting and summing columns, finding maximums, minimums, means, and medians, and merging and sorting data.
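To give a flavor of a few of them (on a made-up DataFrame, not the tutorial's actual dataset):

import pandas as pd

df = pd.DataFrame({'Name': ['a', 'b', 'c'], 'Value': [3, 1, 2]})

print(df['Value'].count())      # 3
print(df['Value'].sum())        # 6
print(df['Value'].max())        # 3
print(df['Value'].mean())       # 2.0
print(df.sort_values('Value'))  # rows reordered from smallest to largest Value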
One challenge I faced was adapting the lesson to a slightly different environment -- while the tutorial assumes you use Jupyter Notebook, we use PyCharm IDE at Energize Andover.
Overall, the tutorials were a great learning resource, and I can't wait to use these in real-world applications!