Reading and understanding code

Do humans read and understand code in the same way that they read and understand natural language?

A fascinating question in language research is how humans read and understand programming languages, or code. Traditionally, coding has been seen as a problem-solving skill, with little attention given to its similarities to natural language. However, just like spoken and written languages, code has structure, patterns, and conventions that shape how people read and write it. As programming becomes more common in everyday life, it’s important to understand how our brains process it.

Our research explores whether people use the same mental strategies to read and understand programming languages as they do with natural languages. In particular, we are interested in whether reading unpredictable or “unnatural” code slows comprehension, much as confusing sentence structures do in natural language. As a first step, we examined eye-tracking data from an existing dataset in which programmers read different functions and answered a question about each function’s output.

Our analysis demonstrated that:

  • Unlike in natural language, predictability did not significantly affect reading times for code.
  • Contrary to expectations, higher predictability led to fewer regressions (i.e., eye movements back to earlier text), suggesting that code and natural-language text are processed with different strategies.
  • Programmers allocate more attention to content tokens (e.g., variable names) than to function words, which influences their reading behavior.
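
To make the regression measure concrete, here is a minimal sketch of how regressions might be counted from an eye-tracking record. It assumes that each fixation has already been mapped to the position of a token in a linearized token sequence, which is a simplification for code spread over several lines. The function name `count_regressions` and the sample fixation sequence are hypothetical and are not taken from the dataset or the paper.

```python
def count_regressions(fixated_token_indices):
    """Count fixations that land on an earlier token than the previous fixation.

    Assumption: each entry is the index of the fixated token in reading order.
    """
    regressions = 0
    # Compare each fixation with the one immediately before it.
    for prev, curr in zip(fixated_token_indices, fixated_token_indices[1:]):
        if curr < prev:  # the eyes moved backward in the text
            regressions += 1
    return regressions


# Hypothetical fixation sequence over tokens 0..5 of a short function.
fixations = [0, 1, 2, 1, 3, 4, 2, 5]
print(count_regressions(fixations))  # prints 2 (jumps 2 -> 1 and 4 -> 2)
```

A per-token variant of the same count could then be related to each token's predictability, which is the kind of comparison the findings above describe.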

These insights highlight fundamental differences between natural and programming language processing, opening avenues for further research into the cognitive demands of code comprehension.

Read the full paper here.