Week 10 : Working on Pandas

Changing documentation

This week I started to tackle a documentation issue in pandas. The problem was subtle, but it could easily confuse users. The optional allow_duplicates parameter of DataFrame.insert was listed in the API reference with a default value of <no_default>. After digging into the source code, I realized that pandas sets the parameter’s default to an internal sentinel object, lib.no_default, so it can check whether the user explicitly passed an argument. If they didn’t, the code sets allow_duplicates to False. However, the sentinel itself, <no_default>, shows up directly in the documentation and doesn’t tell the user what the parameter actually defaults to. I followed the official contributing guide to change the documentation and the docstring, where it suggests:

The utility script scripts/validate_docstrings.py can be used to get a csv summary of the API documentation. And also validate common errors in the docstring of a specific class, function or method.
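
The sentinel pattern described above can be sketched in plain Python. This is a minimal, hypothetical illustration rather than pandas’ actual code: _NoDefault, no_default, and insert_column are made-up names standing in for lib.no_default and DataFrame.insert, and the function works on a plain list of column names instead of a DataFrame. It shows why a signature rendered from the code displays <no_default>, while the docstring is where the effective default (False) has to be spelled out.

```python
import inspect
from typing import List


class _NoDefault:
    """Hypothetical stand-in for pandas' internal lib.no_default sentinel."""

    def __repr__(self) -> str:
        # Signature renderers use repr() for defaults, so this is what users see.
        return "<no_default>"


no_default = _NoDefault()


def insert_column(columns: List[str], loc: int, name: str,
                  allow_duplicates=no_default) -> List[str]:
    """Return a copy of columns with name inserted at position loc.

    allow_duplicates : bool, default False
        Whether name may be inserted when it is already present.
    """
    # The sentinel lets the function tell "argument omitted" apart from
    # any value the caller passed explicitly.
    if allow_duplicates is no_default:
        allow_duplicates = False
    if not allow_duplicates and name in columns:
        raise ValueError(f"cannot insert {name}, already exists")
    new_columns = list(columns)
    new_columns.insert(loc, name)
    return new_columns


# The rendered signature exposes the sentinel, not the effective default.
print(inspect.signature(insert_column))
```

In this sketch, omitting allow_duplicates behaves exactly like passing allow_duplicates=False, so documenting the parameter as “bool, default False” describes its real behaviour better than <no_default> does.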

Week 8 : Getting familiar with Pandas

This week we finalized the decision to contribute to Pandas. Everyone in our group was able to successfully install the development environment for Pandas. We started to get familiar with the Pandas project, the community, and its codebase. We also joined the project’s Slack for easier communication. We then started looking for places in the project where we could contribute. We first looked at some issues tagged “good first issue”, but many of them were already assigned to other contributors. We then looked at documentation-related and testing-related issues. The plan is to find an issue in one of these two categories and claim it as a starting point. We have also made plans to join the Pandas developer meetings that happen every Wednesday, although this week’s meeting was canceled. Maybe we can learn more about making first contributions in those meetings.

After watching the reports by the other groups, it seems like some groups know very well what they want to do and which new features they want to contribute to their projects. I think we can take inspiration from them for what we can contribute to Pandas. However, I believe our pace of contributing is on schedule. Our group has everything set up and is ready to contribute to Pandas once we decide on an issue.

Week 7 : Pandas

In the process of deciding which project we’ll contribute to, we looked at several options such as VS Code, Zed, and FreeCodeCamp. We finally decided on Pandas because we judged it to be very active and beginner-friendly. Most of our group has worked with Pandas before and is familiar with Python. In our meeting we also planned ahead and tasked everyone with installing the development environments for both Pandas and FreeCodeCamp, which is our backup project in case one of us can’t install the development environment for Pandas.

Week 6 : Choosing a Project & small contributions

Small Contributions

I’ve made contributions to OpenStreetMap and Wikipedia. For OpenStreetMap, I replaced a closed restaurant with the restaurant that took its place, and deleted a closed food court next to my dorm. The closed restaurant was one of my favorites, and it was really sad to see it move somewhere else. This was my proudest change because it helps others who want to find a place to eat in the area. For Wikipedia, I updated some equipment lists with new acquisitions and more credible sources. The biggest challenge I faced was finding reliable sources to support my changes, so that I could be sure I wasn’t spreading misinformation.

I also found a broken link to the CAS academic policy on the course homepage and created an issue about it in the GitHub repo.

Choosing a project

I want to contribute to a project that I’m actively using or have used in the past. That way I’m already somewhat familiar with its functionality, and using a project that I’ve contributed to gives a great sense of accomplishment. However, after looking at Beyond-All-Reason, a real-time strategy game I’ve played, I don’t think it’s a good fit because it’s written in a language I don’t know (Lua). Therefore I want to contribute to a project written in a language I’m familiar with (Python, Java, or C), which shouldn’t be hard to find since these are among the most common languages.

Week 5 : Extension Project Presentations & OSS Conference

Presentations

This week we presented our browser extensions in class. Our extension was on the simpler side, but it still had all the required components of an open source project. The biggest takeaway from our project is the value of in-person collaboration. Although we all had busy schedules, we still found time to meet and work together in the library. It allowed us to talk through our ideas and communicate much more efficiently than over text or Zoom. It also made debugging easier since we could look directly at each other’s screens.

The extensions and presentations by the other groups were all fabulous; it was fascinating to see their creativity and the different problems they solved. I realized that the scope of what can be done in just a browser is much larger than I previously thought, especially when using APIs to integrate backend services or LLMs.

OSS Conference Videos

Wednesday’s class was asynchronous, so we watched selected videos from the Linux Foundation’s Open Source Summit. One concern I’ve had for a long time about open source projects is security, which Linus Torvalds discussed directly. He believes that the community factor is the most important, citing the recent XZ backdoor incident as a positive sign since the exploit was caught before it did much damage. He argues that such community vigilance comes directly from the project being open source, whereas with hardware and hardware-related bugs, secrecy makes it hard both for malicious hackers to plant backdoors and for developers like Linus to work on those bugs. Craig McLuckie also points out how people are using AI to generate large numbers of hostile packages to flood package registries. This threatens the trust that open source projects rely on to thrive, which means we need to choose our dependencies wisely.

Presentation Styles

From the presentations I’ve watched this week, I realized that technical details are sometimes not as important as I thought. When I present, I should focus more on conveying a clear core idea to the audience and making the presentation easy to understand, even for people who aren’t familiar with the topic. To do so, I should relate to the audience more, use more analogies, and explain the reasoning behind certain decisions.

Week 4 : Git & Project Evaluation

This week we learned git in depth and how it helps us version control our code. It was especially helpful to learn what git does in the background after we type git commands in the terminal. I got the chance to look at the .git directory, which stores the commit history and other data about our repository, and to view its contents using git commands such as git log. I had only used git and GitHub as a VCS for my own projects, never for collaborating with others on a large project. I’m excited to use new tools such as git branch when contributing to FOSS projects.

We also looked at open source projects to evaluate them based on their activity and ease of contribution. Our group looked at Godot Engine, an open source game engine. We found the project to be very active and very friendly to new contributors. There were more than 3100 contributors, both the number of commits and the number of issues were in the tens of thousands, and the latest commits or issues had been submitted hours or even minutes before our review. There’s a clear document on how to contribute, and there are example issues that are suitable for beginners.

I’ve also looked at the evaluations written by other class members, and it seems that most large open source projects such as Godot or Scikit-learn are friendly to new contributors and have clear contributing guidelines. However, with these large projects it might also be harder for me to find places to contribute, since I’m not very familiar with them and getting familiar might take too much time. I think the part that excites me most about working on an open source project is knowing that I will be using the same project I’ve contributed to. So I plan on looking into some projects I’ve used and seeing if any of them are a good fit for me to contribute to. Choosing a project I’ve already used also means I’m already somewhat familiar with its functionality, which will make it easier to understand the codebase.

I’ve also made some contributions to OpenStreetMap, updating the status of closed restaurants around my dorm that have been replaced by new ones. It’s exciting to know that I’ve made the map just a bit more accurate than before.

Week 3 : First Project Progress

Browser Add-on project

Our first open source project was to make a browser extension. We started off in groups and learned how to create a simple browser extension in Firefox. Then we made a slightly more complicated extension that allowed the user to select a beast from a popup. We analyzed the FOSS elements of the MDN WebExtension example repository. This example was super helpful for seeing what we needed to include in our own browser extension repository.

We also examined some existing third-party browser extensions, such as Return YouTube Dislike. Return YouTube Dislike is an open source browser extension for both Chrome and Firefox. My biggest contribution was helping find the repository and the FOSS elements in this project. It was really interesting to see it include README and CONTRIBUTING files in dozens of languages.

We discussed some ideas for the browser extension that we’ll be making, such as one that tracks the time spent on a given tab, but we haven’t decided on the exact idea yet. We have arranged a meeting next week to work on the project in person. I found the group work part of this project to be very exciting. In all of my previous CS classes, projects and labs required individual work and involved little to no collaboration.

Week 2 : Code of Conduct

Code of Conduct Activity & Reflection on Presentation

The community element in open source development is arguably more important than the code element. As a project’s community grows, it inevitably requires strangers who might never interact in real life to work closely together. They might live totally different lives and hold completely opposite values. Thus there needs to be a set of guidelines or rules to ensure the project functions efficiently as an organization. I believe the code of conduct is important because it determines how contributors to the project interact, collaborate, and resolve disputes with each other. The Go Community Code of Conduct serves as a good example: it explicitly defines what contributors should and shouldn’t do.

The Go Community Code of Conduct is adapted from the Contributor Covenant, version 1.4, and adds a section on Gopher Values and the goals of the code of conduct on top of the original covenant. This shifts some of the focus toward positive behaviours that benefit collaboration, while the original covenant focuses on negative behaviours to avoid. The Eclipse Community Code of Conduct is also adapted from the Contributor Covenant. Eclipse expanded the section on enforcement actions and added a no-retaliation rule, which protects contributors who raise issues from being punished or harassed.

The Sugar Labs Code of Conduct is based on the Ubuntu Code of Conduct. It focuses more on how to collaborate with each other than on protecting members from the malicious behaviour that the Contributor Covenant addresses. It reads more like a guide to what contributors should do than a set of rules telling them what not to do.

Another example is the Docker Code of Conduct, which is adapted from the Slack Developer Community Code of Conduct, The Ada Initiative, geekfeminism.org, and the Drupal Events Code of Conduct. Although it’s shorter than the ones above, the Docker Code of Conduct covers in-person interactions in addition to digital ones. It mentions unacceptable behaviours at in-person events such as “Inappropriate or unwanted physical contact”, and also specifies physical enforcement measures such as “expulsion from the conference without refund, and referrals to venue security or local law enforcement” in case the code of conduct is breached.

In the presentation How to Drive Consensus & Transparency in OpenSource Communities, the speakers Jill Lovato and Trishan de Lanerolle offer another perspective on the code of conduct. They argue that although a code of conduct can help safeguard community members from others’ extreme behaviours, it’s up to the contributors to build consensus on how the project’s day-to-day operations should run. Sometimes referring to a rigid rulebook like the code of conduct is less efficient than simply letting “lazy consensus” resolve the issue.

Week 1 : Getting Started

What is Open Source?

Open Source to me means that the line between users and developers is not so rigid. Users can always inspect the source code, and if there is a part they don’t like or a feature they think is missing, they can become developers and contribute to the project. This has both pros and cons. I remember learning about how a hacker spent three years making genuine contributions to the Linux xz library to gain the trust of the other contributors before installing a backdoor. This is an almost unavoidable consequence of letting anyone contribute. On the other hand, the backdoor was found and fixed because a developer using the library noticed a performance slowdown and inspected the source code. I think this total transparency is what makes open source unique and so important for the software community.

Open Source Projects I Use

  • Windows Subsystem for Linux (WSL)
  • Wikipedia
  • Docker
  • Visual Studio Code (VS Code)

I use WSL and Docker for my development environment because they keep it isolated and consistent. WSL runs a Linux distribution on my Windows machine, and Docker allows me to run programs in containers. VS Code is my editor of choice because it has many useful extensions. I use Wikipedia for quick searches on topics I’m not very familiar with.