Technical Writer Application for 2020 Season of Docs

Photo by Kaitlyn Baker on Unsplash

Documentation is essential to the adoption of open source projects as well as to the success of their communities. Season of Docs brings together technical writers and open source projects to foster collaboration and improve documentation in the open source space.

https://opensource.googleblog.com/2020/06/season-of-docs-now-accepting-technical.html

Open source organization for proposal

Bokeh

Title of technical writing project proposal

Improving the Documentation Experience for Bokeh Developers

Description of technical writing project proposal

Current documentation state

Bokeh has done a tremendous job of documenting visualization use cases in the User Guide [1]. The Reference [2] covers all the API methods that Bokeh's models provide. The documentation has grown large, and there is no easy way to find misspellings, repeated words, or formatting issues in the text [3].

You can find dozens of code examples on GitHub [4] that show how you might use Bokeh with your own data. Some of these examples appear inline in the documentation, but not all of them are referenced [5]. Users may spend a considerable amount of time trying to figure out how a tool works without realizing that code they can reference already exists. For example, you can use Themes to style a plot in Bokeh, but the examples live in the Reference, when one would expect to find them inline or referenced in the User Guide [6][7].

Lastly, a subset of the Bokeh documentation could benefit from the inclusion of metadata. Bokeh builds its documentation with Sphinx [8], a tool that makes it easy to document software projects. Sphinx does not automatically include any structured data in the HTML pages it generates. Metadata in this case means data that describes an HTML page, such as a short description of its content. When users search for “Bokeh docs” on a search engine, the results they get back do not describe the content of each page. When links to the Bokeh documentation are shared on social media sites or forums, there is no way to preview the content of the page before clicking.

Proposed documentation state

Automated checks for spelling, repetition, and writing style errors

Vale Linter [9] is available as a GitHub Action [10]. It checks for spelling, repetition, and styling issues on every pull request. This Action can be added to the existing build process Bokeh uses for pull requests on GitHub. Automated checks would surface existing errors in the documentation so they can be fixed, and would prevent future errors from creeping in. Vale Linter can also enforce a consistent writing style across all documentation, for example by suggesting the term "JavaScript" over "Javascript" or by preferring active voice over passive voice.
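To illustrate the kind of repetition error such a check catches, here is a minimal Python sketch. It is not how Vale is implemented or configured; the function name and the regular expression are assumptions made purely for illustration, and the sketch only looks for doubled words within a single line.

```python
import re

# A word, some whitespace, then the same word again (case-insensitive).
REPEATED_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)


def find_repeated_words(text):
    """Return (line_number, word) pairs for each doubled word in text."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in REPEATED_WORD.finditer(line):
            hits.append((line_number, match.group(1)))
    return hits
```

Running a check like this on every pull request is the idea behind the automated step; in practice Vale would do the work, with its own rules and configuration.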

Additional cross-referencing across docs

Different parts of the documentation should link back and forth for a more complete discussion. Users interested in learning more about a topic should be able to navigate to the Reference from the User Guide. Users interested in seeing an example of an API method should also be able to navigate to the User Guide from the Reference. All examples found in the GitHub repository should either be referenced or exist inline in the documentation.
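One way to find examples that are neither referenced nor shown inline could be sketched in Python as follows. This is a minimal sketch that assumes the documentation mentions each example by its repository path; the function name and the sample paths are hypothetical, and the real Bokeh docs may reference examples differently.

```python
def unreferenced_examples(example_paths, doc_texts):
    """Return the example paths that no documentation page mentions."""
    combined = "\n".join(doc_texts)
    # Assumption: a page "references" an example if its path appears verbatim.
    return sorted(path for path in example_paths if path not in combined)


# Hypothetical inputs, for illustration only.
examples = ["examples/a/themes.py", "examples/a/lines.py"]
pages = ["See examples/a/lines.py for a complete script."]
missing = unreferenced_examples(examples, pages)
```

Here `missing` would list the examples that still need a link or an inline listing.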

Metadata across docs

Search engines extract and collate the metadata found on web pages to describe and classify them. Including metadata, such as descriptions, in the Bokeh documentation would give users more data when browsing search engine result pages. This metadata would also provide rich previews when sharing links to these pages. Some metadata would appear alongside these links, giving readers a preview of the content before clicking. HTML metadata, like a description, can be specified by manually adding the "meta" directive to some pages. Later, Sphinx extensions can be developed to automate adding relevant metadata throughout the entire documentation.

Timeline

Pre-community bonding

  • Stay active as a contributor by tackling documentation issues
  • Start a friction log to keep track of areas of documentation needing improvements

Community bonding

  • Establish project requirements
  • Schedule a time to meet with mentors
  • Agree on method of providing progress and updates

Week 1

  • Set up and test Vale to check for existing spelling and repetition errors
  • Identify terms that are wrongly flagged as spelling errors (http, Bokeh, JupyterLab, etc.) so they can be ignored
  • Add a new text file with list of terms to ignore when checking for spelling errors

Week 2 and Week 3

  • Identify suggested terms to use throughout documentation for consistency
  • Add a new style guide for suggested terms
  • Configure Vale to run on every pull request submitted to Bokeh

Week 4 and Week 5

  • Start working on improving cross-referencing across Bokeh documentation
  • Identify existing Bokeh examples not shown in-line in documentation
  • Link examples in the documentation to the source code location on GitHub

Week 6 and Week 7

  • Review topics covered in the User Guide
  • Identify topics to link to sections in the Reference

Week 8

  • Identify pages on https://bokeh.org/ and manually add metadata
  • Investigate existing Sphinx extensions that can be used to add metadata across docs

Week 9

  • Integrate existing Sphinx extension or develop a new Sphinx extension to automatically add metadata across docs

Week 10

  • Test Sphinx extension(s)

Week 11

  • Finish remaining tasks
  • Start working on Season of Docs project report

Week 12

  • Finish project report
  • Submit project report to Google

References

  1. User Guide - https://docs.bokeh.org/en/latest/docs/user_guide.html
  2. Reference - https://docs.bokeh.org/en/latest/docs/reference.html
  3. Documentation spelling and formatting - https://github.com/bokeh/bokeh/issues/8448
  4. Bokeh Examples - https://github.com/bokeh/bokeh/tree/master/examples
  5. Include example code of PolyEditTool and PolyDrawTool Docs - https://github.com/bokeh/bokeh/issues/9962
  6. Add mention of Themes to "Styling Visual Attributes" docs page - https://github.com/bokeh/bokeh/issues/9007
  7. Reference Guide should link to Users Guide where appropriate. - https://github.com/bokeh/bokeh/issues/9363
  8. Sphinx - https://www.sphinx-doc.org/en/master/
  9. Vale - https://github.com/errata-ai/vale
  10. Vale Linter - https://github.com/marketplace/actions/vale-linter

When Google and Stack Overflow don't pick up

Larry David knows all about phone etiquette.

Image source: NBC News

Pick up the phone, baby

I recently worked on improving some phone number validation logic at Winnie. We validate a batch of phone numbers and send them off to a third-party service. Some of the numbers we were sending were deemed invalid by the service. This was preventing us from automating some data updates we wanted to run daily. How hard could validating digits be?

Some validation boxes we already checked off included:

  • regex pattern matching to keep only digits (e.g. removing the non-digit characters from 281-330-8004)
  • checking if the value is 10 characters long (e.g. 2813308004 has no country code)
  • checking if the value is 11 characters long (e.g. 12813308004 has a country code)
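Roughly, those checks looked something like the sketch below. This is not our actual code; the function name is made up, and treating the leading digit of an 11-character value as the country code 1 is an assumption for the example.

```python
import re


def normalize_us_number(raw):
    """Return the 10-digit number, or None if the length checks fail."""
    digits = re.sub(r"\D", "", raw)  # keep only the digits
    if len(digits) == 10:
        return digits
    # Assumption: an 11-digit value is valid only with a leading 1.
    if len(digits) == 11 and digits.startswith("1"):
        return digits[1:]
    return None
```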

An edge case we were not considering was 800 numbers! A code change went out to ignore these types of phone numbers. The next day we were able to send a new batch of phone numbers to the third party with no issues. Problem solved? Not quite.

Man with a plan

We were still sending the service invalid phone numbers. Perfectly valid-looking phone numbers were being rejected. For example, 234-911-5678 is an invalid phone number. How? There are no stray non-digit characters and it looks like a valid phone number!

It turns out there is something called the North American Numbering Plan. Under the modern plan, a U.S. phone number must adhere to the NPA-NXX-xxxx format. Each component in this format must follow certain rules. The valid range for the first digit in the NPA component is 2-9. The valid range for each digit in the xxxx component is 0-9. 123-234-5678 is invalid because the first digit is a 1. In the example above, 234-911-5678 was invalid because it violated the following rule: the second and third digit in the NXX component can't both be 1.
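To make those rules concrete, here is a rough sketch that checks only the rules mentioned above. It is nowhere near the full numbering plan, and the function name and messages are made up for the example.

```python
import re


def stated_nanp_problems(number):
    """Return the rules described above that a 10-digit number breaks."""
    digits = re.sub(r"\D", "", number)
    if len(digits) != 10:
        return ["expected exactly 10 digits"]
    npa, nxx = digits[:3], digits[3:6]
    problems = []
    if npa[0] not in "23456789":
        problems.append("NPA must start with 2-9")
    if nxx[1:] == "11":
        problems.append("NXX cannot end in 11")
    # The xxxx rule (each digit 0-9) holds once non-digits are stripped.
    return problems
```

With this sketch, 234-911-5678 trips the NXX rule and 123-234-5678 trips the NPA rule, while 281-330-8004 passes both.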

I was determined to avoid translating these rules to brittle Python code. I knew there had to be a solution we could leverage instead of reinventing the wheel.

1-800-GOOGLE-IT

What does a software engineer do when they're stuck? Turn to Google. Here are some search queries I tried:

  • "npa nxx validator"
  • "npa nxx github"
  • "npa nxx python"

No luck. The Stack Overflow results I got were not what I was looking for. Where was the accepted answer I yearned for? Finally, I Googled "django phone look up". One of the first results was a GitHub link for django-phonenumber-field. I started searching for more of the same terms in this repository: "nxx", "valid", "is_valid". On a side note, the search experience on GitHub has improved tremendously.

I finally found a promising method in the source code:

def is_valid(self):
    """
    checks whether the number supplied is actually valid
    """
    return phonenumbers.is_valid_number(self)

I searched for is_valid_number to get the method definition but got nothing. I realized that phonenumbers was an external package that the project relied on. I immediately Googled the package, skimmed the README and tested it with our invalid phone numbers. It worked! I was confident that this package was enough for our needs and soon it found a home in our requirements.txt file.

I went back and looked at the django-phonenumber-field README and saw the following:

The answer to my problem was right there! All I had to do was read the freaking docs.

Can you hear me now?

Would I have saved 5 minutes by skipping straight to the README instead of browsing the source code? Sure. But being able to read code, especially code that you didn't write, is a useful skill. Plus, GitHub has made it even easier to navigate code on their platform. Can you tell I'm a fan of GitHub?

Googling is a skill. Reading source code is a skill. Reading documentation is a skill. Combine these skills with communicating effectively and what do you get? Probably something better than a 10x engineer.