Week 10: Working on Pandas
Changing documentation
This week I started to tackle a documentation issue in pandas. The problem is subtle but can be confusing for users. The optional allow_duplicates parameter of the method DataFrame.insert was listed in the API reference with a default value of <no_default>. Digging into the source code, I found that pandas initializes the parameter to an internal sentinel object, lib.no_default, so it can check whether the user explicitly passed an argument. If they didn’t, the code sets allow_duplicates to False. <no_default> is then put directly into the docstring, which doesn’t tell the user the default behaviour of the parameter. To change the documentation and the docstring, I followed the official contributing guide, which suggests:
“The utility script scripts/validate_docstrings.py can be used to get a csv summary of the API documentation. And also validate common errors in the docstring of a specific class, function or method.”
However, I could not find the file scripts/validate_docstrings.py in the newest version (3.0.x) of pandas. Looking at the source code of older versions, I found that the file existed in the previous version (2.3.x). I’m not sure whether I should bypass the verification and just submit a pull request with my changes. I will reach out to some of the more senior contributors about what I should do in this case. I will also try to update the contributing guide.
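To make the sentinel behaviour described earlier more concrete, here is a minimal sketch in plain Python. It does not use pandas at all: the _NoDefault class, the no_default object and the toy insert function are hypothetical stand-ins I wrote for illustration, assuming the real code follows the same general pattern. It shows how a sentinel default lets a function detect whether an argument was passed, and why the signature then displays <no_default> instead of False.

```python
import inspect


class _NoDefault:
    """Hypothetical stand-in for an internal sentinel such as lib.no_default."""

    def __repr__(self) -> str:
        # This repr is what ends up in any auto-generated signature.
        return "<no_default>"


no_default = _NoDefault()


def insert(columns, loc, column, allow_duplicates=no_default):
    """Toy insert on a list of column names (not the pandas implementation)."""
    # The sentinel means "the caller did not pass anything", so the
    # effective default is resolved here, inside the function body.
    if allow_duplicates is no_default:
        allow_duplicates = False
    if not allow_duplicates and column in columns:
        raise ValueError(f"cannot insert {column}, already exists")
    result = list(columns)
    result.insert(loc, column)
    return result


# The signature shows the sentinel, not the effective default of False.
signature_text = str(inspect.signature(insert))
print(signature_text)
```

In this sketch the effective default only exists inside the function body, so a reader of the signature alone cannot tell that omitting allow_duplicates behaves like passing False. That is the gap the documentation change is meant to close.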
