The Importance of Client JavaScript Recon
Introduction
This is a high-level overview of performing reconnaissance against the JavaScript content in web applications. Subsequent articles will go into more detail.
The History
Long ago, web applications, or rather web pages, were static. This was the original "World Wide Web". Each click sent a request to the server, which composed a new page and returned it in the HTTP response. This was sub-optimal: it required a full page refresh, and bandwidth was scarce at the time. From a pentest perspective, we had to infer what the code was doing based on the responses to specific input. These "echoes" gave us, at best, an idea of what the code was doing. And we were often wrong.
Then, in 1995, a scripting language was embedded in the browser. It was called JavaScript and had nothing to do with Java. Now web pages could provide a level of interactivity with the user. Independently, the XMLHttpRequest API appeared: it originated with Microsoft's Outlook Web Access team and first shipped as an ActiveX object in Internet Explorer 5 (1999), and it was not standardized under the name XMLHttpRequest until a 2006 W3C draft, by which point all major browsers supported it. XMLHttpRequest was a watershed moment in web applications because it allowed JavaScript running in the browser to communicate with the server after page loading completed. Combined with a data format (XML or JSON) and HTML/CSS, this gave us what was called the Ajax movement. As time went on, JavaScript frameworks and libraries such as Angular and React emerged to help developers build web applications, completing the ecosystem.
This is where we find ourselves today.
JavaScript Sensitive Information Exposure
Now, with access to the code, we have the potential to find issues with a web application. In addition, development teams, whether under time pressure, through a lack of understanding, or for other reasons, may leave sensitive information in the JavaScript code sent to the browser.
This can include:
API keys and credentials.
Backend infrastructure details.
Old or unreleased APIs.
Sensitive links.
Interesting and entertaining comments.
Finding JavaScript
This may seem obvious, but there are subtleties we can explore when extracting JavaScript content from a web application. The trivial case is the linked JavaScript that is downloaded by your browser based on a link in an HTML file.
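Even this trivial case is worth scripting. The sketch below is a minimal example using only the Python standard library; the target URL is a placeholder and would of course be replaced with a host you are authorized to test.

    # Minimal sketch: list the JavaScript files linked from a page.
    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from urllib.request import urlopen

    TARGET = "https://example.com/"  # hypothetical target

    class ScriptCollector(HTMLParser):
        def __init__(self):
            super().__init__()
            self.sources = []

        def handle_starttag(self, tag, attrs):
            # <script src="..."> entries are the linked JavaScript we want to pull down
            if tag == "script":
                for name, value in attrs:
                    if name == "src" and value:
                        self.sources.append(urljoin(TARGET, value))

    html = urlopen(TARGET).read().decode("utf-8", errors="replace")
    collector = ScriptCollector()
    collector.feed(html)
    for src in collector.sources:
        print(src)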
But how has that file evolved over the course of its development? What did it look like a week, a month, or a year ago? Would the difference between the file now and an older version tell us anything?
What if we could go back in time, find those older versions, and compare and contrast? In fact we can, using tools such as the Wayback Machine. Here, a simple diff can help us discover interesting possibilities. The Wayback Machine and similar archives can also surface JavaScript code that is no longer in production, which may expose older APIs that are still operational and forgotten on the server, or perhaps still-functional API keys.
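The Wayback Machine exposes its index through the CDX API, which makes this easy to script. The sketch below is a rough illustration, with a placeholder file path: it asks the CDX API for unique captures of a JavaScript file and diffs the oldest against the newest. The "id_" suffix in the snapshot URL requests the raw archived body rather than the rewritten page.

    # Minimal sketch: pull two archived versions of a JavaScript file from the
    # Wayback Machine CDX API and diff them. The target path is a placeholder.
    import difflib
    import json
    from urllib.request import urlopen

    JS_URL = "example.com/static/app.js"  # hypothetical target file

    # Ask the CDX API for unique captures of the file (collapsed on content digest).
    cdx = ("https://web.archive.org/cdx/search/cdx"
           f"?url={JS_URL}&output=json&filter=statuscode:200&collapse=digest")
    rows = json.loads(urlopen(cdx).read())
    captures = rows[1:]  # the first row is the field header
    if len(captures) < 2:
        raise SystemExit("not enough archived versions to compare")

    def fetch(timestamp, original):
        # The "id_" suffix asks the archive for the raw, unmodified response body.
        url = f"https://web.archive.org/web/{timestamp}id_/{original}"
        return urlopen(url).read().decode("utf-8", errors="replace").splitlines()

    oldest, newest = captures[0], captures[-1]
    diff = difflib.unified_diff(
        fetch(oldest[1], oldest[2]), fetch(newest[1], newest[2]),
        fromfile=f"app.js@{oldest[1]}", tofile=f"app.js@{newest[1]}", lineterm="")
    print("\n".join(diff))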
The Unlinked
But both current production code and the older production code provided by the Wayback Machine only show us JavaScript files that are, or at some point were, linked from the HTML of the web application. In other words, these files are known to the application. What if there were interesting files residing on the server file system that were addressable but not linked from the HTML, such as backup files or files that a deployment simply renamed?
Let's call these files "The Unlinked".
How would we find them?
Clearly we would need to "fuzz" for them using wordlists that might help us find these files. We could base these wordlists on filenames we have actually found. We could also fuzz for common leftover files that might be there, and for directory names based on the known directory paths of the web application.
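As a simple illustration, the sketch below probes for leftover copies of a file we already know about, using a handful of common backup suffixes. The base URL, filename, and suffix list are assumptions; a real engagement would use a proper fuzzer such as ffuf and a much larger wordlist.

    # Minimal sketch: probe for unlinked "leftover" copies of a known JavaScript
    # file. The base URL, filename, and suffixes are assumptions for illustration;
    # only run this against hosts you are authorized to test.
    from urllib.error import HTTPError, URLError
    from urllib.parse import urljoin
    from urllib.request import urlopen

    BASE = "https://example.com/static/"   # hypothetical directory
    KNOWN = "app.js"                       # a file we already found linked
    SUFFIXES = [".bak", ".old", ".orig", ".save", "~", ".1"]

    for suffix in SUFFIXES:
        candidate = urljoin(BASE, KNOWN + suffix)
        try:
            status = urlopen(candidate, timeout=10).getcode()
        except HTTPError as err:
            status = err.code
        except URLError:
            continue
        if status == 200:
            print(f"possible unlinked file: {candidate}")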
Assessment
Once we have downloaded the JavaScript files, we can employ a number of tools to investigate them for interesting content. These are typically command-line tools that use either regular expressions or abstract syntax trees (ASTs) to find content.
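As a rough example of the regex approach, the sketch below sweeps a directory of downloaded JavaScript files for a few illustrative patterns: an AWS-style access key ID, a Google-style API key, generic "secret"-looking assignments, and internal-looking URLs. The patterns are deliberately simple; purpose-built and AST-based scanners will catch far more.

    # Minimal sketch of a regex-based sweep over downloaded JavaScript files.
    # The patterns are illustrative, not exhaustive.
    import re
    import sys
    from pathlib import Path

    PATTERNS = {
        "AWS access key": re.compile(r"AKIA[0-9A-Z]{16}"),
        "Google API key": re.compile(r"AIza[0-9A-Za-z_\-]{35}"),
        "generic secret": re.compile(r"(?i)(api[_-]?key|secret|token)\s*[:=]\s*['\"][^'\"]{8,}['\"]"),
        "internal URL": re.compile(r"https?://[a-z0-9.\-]*(internal|staging|dev)[a-z0-9.\-]*[^\s'\"]*", re.I),
    }

    # Walk the directory given on the command line (default: current directory).
    for path in Path(sys.argv[1] if len(sys.argv) > 1 else ".").rglob("*.js"):
        text = path.read_text(errors="replace")
        for label, pattern in PATTERNS.items():
            for match in pattern.finditer(text):
                line = text.count("\n", 0, match.start()) + 1
                print(f"{path}:{line}: {label}: {match.group(0)[:80]}")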
Alternatively, for production sites with linked content, we can use an intercepting proxy such as Burp Suite or ZAP to spider (crawl) the site for JavaScript code and then use plugins to assess the content of those files. Many of these plugins attach to the built-in scanner and "passively" scan responses as the site is crawled. In these cases it is important to exercise as much of the site's functionality as possible to ensure you find code that may be held back by "lazy loading" or "code splitting".
Conclusion
Finding, collecting, and curating client-side JavaScript is important in order to inspect it for exposed secrets, APIs, indicators of infrastructure, and credentials. This blog series will discuss some of the techniques development teams and penetration testers can use to determine the level of exposure.