Most AR programs use some amount of text to convey information to a user. This is problematic from a development perspective, as users may engage with the program in many different uncontrolled environments, producing backgrounds that can render text unreadable. As such, the best way to display text may differ significantly from VR and may depend on the program's expected use cases.
It's difficult to have issues with readable text if there isn't text to read. AR benefits greatly from being 3D and from being able to show information as it might exist in the real world, minimizing the amount of explanation needed. AR programs should aim to display items both symbolically and literally whenever possible. For example, when displaying instructions, it's often more effective to provide arrows and icons and to overlay the expected movement to convey what should be done.
Unavoidable explanatory text should take up as little screen real estate as possible and remain anchored to the screen or the environment, depending on which is more contextually relevant. This is mirrored in industry guidance. UI explanations may benefit more from being fixed to a position on the screen, while text associated with objects should be directly connected to the object. Text should not obscure or get "in the way of" interactive content. These issues come up less in HMDs, since ideas like "screen space" are more abstract and further removed from the user's experience, but the overall guidance remains the same.
Critical text instructions may also benefit from a voice-over reading them out, in case the text is unreadable to the user. Longer explanations may also benefit from this. Zooming and scrolling are often more difficult and unintuitive in AR environments, including in HMDs. Large paragraphs or chunks of text should be avoided, especially if confined to a fixed space. In essence, the mental strain that text can produce, whether through quantity or obstruction, should be minimized in virtually all cases.
Text should be placed where a user will naturally see it. Content may be associated with the user's head (or screen), their body, or the world. Head-associated content moves the text as the user's head moves, either immediately or adaptively changing where the text is placed in the world while keeping it fixed on the user's screen. Head-locked content is associated with better task performance, but users tend to prefer it less. Head-locked content above eye level but within the field of view is preferred for static tasks. Body-locked content outside of the standard field of view, where text moves with the rotation and motion of the body, is preferred for tasks with significant movement, especially when the wider environment must be regularly viewed. World-locked (or world-fixed) presentations are effective when text needs to be viewed seamlessly with the wider environment, such as in proximity to a given object. In contexts where full attention must be paid to the wider environment (e.g. driving), world-locked text is preferred, as it leaves the most attention available for non-AR content.
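The difference between the three anchoring modes can be made concrete with a small sketch. The Python below assumes a simplified top-down 2D world in which a pose is a position plus a yaw angle. The names `Pose`, `offset_from`, and `label_position` are hypothetical and do not correspond to any AR SDK; a real application would use full 3D transforms from its tracking framework.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Pose:
    """Hypothetical 2D pose: position in metres, yaw in radians (CCW from +x)."""
    x: float
    y: float
    yaw: float


def offset_from(pose: Pose, forward: float, left: float) -> Tuple[float, float]:
    """World position of a point `forward` metres ahead of and `left` metres
    to the left of the given pose."""
    cos_y, sin_y = math.cos(pose.yaw), math.sin(pose.yaw)
    return (pose.x + forward * cos_y - left * sin_y,
            pose.y + forward * sin_y + left * cos_y)


def label_position(mode: str, head: Pose, body: Pose,
                   world_anchor: Tuple[float, float],
                   forward: float = 1.0, left: float = 0.0) -> Tuple[float, float]:
    """Return where a text label sits in the world for each anchoring mode."""
    if mode == "head":
        # Head-locked: follows every head rotation, so it stays fixed on screen.
        return offset_from(head, forward, left)
    if mode == "body":
        # Body-locked: follows the torso, so the user can glance away from it.
        return offset_from(body, forward, left)
    if mode == "world":
        # World-locked: stays at a fixed point, e.g. next to an object.
        return world_anchor
    raise ValueError(f"unknown anchoring mode: {mode}")
```

With the head turned 90 degrees to the left while the body still faces along +x, the head-locked label ends up one metre along +y, the body-locked label stays one metre along +x, and the world-locked label does not move at all.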
Text placement may be adaptively altered to select the best location at any time, given known information about color and the surrounding environment. This may also be used to reduce the information load on the user at any given time, decluttering the screen to present only the most relevant text. In certain contexts, text may also be placed on top of objects to label relevant parts and features, changing orientation or style as the object itself changes. This technique may be more effective for model visualization and teaching tools, where understanding the exact parts of the model may be more relevant than delivering instructions to the user.
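One possible way to pick a location adaptively is to score a few candidate regions by how well the background there separates from the text color and how cluttered it is. The sketch below is only an illustration: the scoring formula, the `clutter_weight` parameter, and the function names are assumptions rather than an established method, and luminance is approximated with Rec. 709 weights applied directly to 0-1 color values.

```python
from statistics import mean, pstdev


def luminance(rgb):
    """Approximate luminance of an (r, g, b) color with components in 0..1."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def score_region(pixels, text_rgb, clutter_weight=1.0):
    """Higher is better: far from the text's luminance and visually uniform.

    `pixels` is a list of sampled background colors for one candidate region.
    `clutter_weight` is a hypothetical tuning knob, not a published constant.
    """
    lums = [luminance(p) for p in pixels]
    separation = abs(luminance(text_rgb) - mean(lums))
    clutter = pstdev(lums)  # spread of luminance, a rough proxy for busyness
    return separation - clutter_weight * clutter


def best_region(regions, text_rgb):
    """Return the name of the candidate region with the highest score."""
    return max(regions, key=lambda name: score_region(regions[name], text_rgb))
```

For white text, a uniformly dark wall scores higher than either a bright sky or a high-contrast patterned surface, so the label would be moved over the wall.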
The background behind text directly affects its legibility, in 2D environments as well as in AR. The two lines below demonstrate this.
This text is easy to read on this background.
This text is challenging or impossible to read on this background.
Because backgrounds can change quickly in AR environments, text can quickly become difficult to read. Putting a background behind the text, also known as billboarding, allows for some control over how the text is viewed. However, these backgrounds often come at the cost of immersion, as they block the view of the environment behind the text and stand out from their surroundings. Backgrounds that contrast strongly with the text are preferred to maximize readability, often at an opacity of 30-50%. The polarity between text and background may be strongly positive or negative, depending on the surrounding environment. Negative polarity (white text on black) is often better for high-brightness environments, whereas positive polarity (black text on white) is better for low-brightness environments. In any case, adding some background to the text is generally recommended to improve readability. Users tend to prefer white text on dark backgrounds. If a background cannot be used, dark text is often preferred, as most real-world scenes tend to be well-lit and lighter in color.
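The effect of a semi-transparent background can be estimated numerically. The sketch below borrows the relative luminance and contrast ratio formulas from WCAG 2.x, which were designed for 2D screens, so applying them to AR is an assumption. It also assumes simple alpha compositing in sRGB space, which roughly models handheld or video pass-through displays; optical see-through headsets add light to the scene and cannot render true black, so the numbers would not carry over directly. The function names are illustrative.

```python
def _linearize(c):
    """Convert one sRGB component (0..1) to linear light, per WCAG 2.x."""
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """WCAG relative luminance of an sRGB color with components in 0..1."""
    r, g, b = (_linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b):
    """WCAG contrast ratio between two colors, from 1 (none) to 21 (maximum)."""
    lum_a, lum_b = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def composite(panel_rgb, env_rgb, opacity):
    """Color seen behind the text when a panel of the given opacity (0..1)
    is blended over the environment color."""
    return tuple(opacity * p + (1.0 - opacity) * e
                 for p, e in zip(panel_rgb, env_rgb))


# Example: white text over a bright scene, with and without a 50% black panel.
white, black = (1.0, 1.0, 1.0), (0.0, 0.0, 0.0)
bright_scene = (0.9, 0.9, 0.9)
without_panel = contrast_ratio(white, bright_scene)
with_panel = contrast_ratio(white, composite(black, bright_scene, 0.5))
```

In this example the panel raises the contrast of white text against the bright scene several times over, which is consistent with the guidance that even a partially transparent background helps.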
There is no single best font, especially given that variations in user needs and preferences may demand different fonts in different use cases. Google provides an overview of best industry practices for AR fonts; medium weights with wide spacing and high contrast between letters provide the best results. Focus may also impact text readability, particularly in headsets that do not use cameras to display content to a user. Certain specialized high-visibility fonts have been created to maintain legibility even at low visual focus. In general, fonts that are easily readable at many distances in the real world (often sans-serif, high contrast, large width) translate well into AR. Given the lower pixel density of some displays, the accents and fine details of certain fonts should be taken into account, as they may result in indistinguishable characters.
An outline can be added to text to provide higher contrast across environments without adding a background or billboard. In general, a background is preferred. If an outline is used with no background, then black text with a slight (1px) white outline is recommended. White text with a slight black outline may also be acceptable. In both cases, the outline should be minimal to prevent the "washing-out" effect of an outline that covers most of the character.
In most cases, white text on a black background is both the most effective option and the one users prefer, a finding repeated across the literature. A background only requires 50% opacity to be functional for most users. Fonts should be chosen to maximize the distinguishing features between characters, with font weights set to medium values. If text must not have a background, black text with a white outline is best. In low-light environments, contrast polarity should be flipped, prioritizing black text on white backgrounds and, where no background is used, white text with dark outlines.
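These recommendations can be collected into a small lookup, sketched below in Python. The `TextStyle` type and `recommend_style` function are hypothetical names introduced for illustration, and the boolean inputs are a simplification: a real application would need its own way to estimate lighting and would likely expose these choices as user preferences.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TextStyle:
    """Hypothetical container for the styling choices discussed above."""
    text_color: str
    background_color: Optional[str]
    background_opacity: float
    outline_color: Optional[str]
    font_weight: str = "medium"


def recommend_style(low_light: bool, allow_background: bool) -> TextStyle:
    """Map the summary guidance onto a concrete style.

    Default: white text on a black background at 50% opacity.
    No background: black text with a thin white outline.
    Low light: both choices are flipped.
    """
    text, contrast = ("black", "white") if low_light else ("white", "black")
    if allow_background:
        return TextStyle(text, contrast, 0.5, None)
    # Without a background, the outlined option uses the opposite text color.
    return TextStyle(contrast, None, 0.0, text)
```

For instance, `recommend_style(low_light=False, allow_background=False)` yields black text with a white outline, while `recommend_style(low_light=True, allow_background=True)` yields black text on a white background.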
A literature review providing flowcharts of currently understood best practices dependent on context can be found here.