Validation of a Lip Fullness Scale for Assessment of Lip Augmentation

Z. Paul Lorenc, MD

Background: Given the growing use of dermal fillers for cosmetic lip augmentation, a validated instrument with which to measure lip fullness is desirable in the clinic and as an efficacy endpoint in clinical studies. The authors developed and conducted a validation study of a Medicis-developed lip fullness scale.

Methods: The Medicis Lip Fullness Scale consists of separate five-point scales for the upper and lower lips, with three photographs exemplifying each grade. Five board-certified dermatologists or plastic surgeons assessed 85 test photographs for each lip on two separate occasions for the first round of validation (photograph versus photograph). Three of the evaluators also graded lip fullness in 39 live subjects, followed 2 weeks later by scoring of the same subjects’ photographs for the second round of validation (live versus photographic).

Results: Within-observer agreement between the two sequential photographic evaluations was almost perfect (weighted κ = 0.81). Between-observer weighted κ values ranged from 0.60 to 0.83 for the upper lip and 0.61 to 0.82 for the lower lip. Exact agreement between the live and photographic assessments of the same subjects was 60 percent and 52 percent for upper and lower lips, respectively.

Conclusions: The Medicis Lip Fullness Scale showed high interrater and intrarater reliability in comparisons of test photographs and moderate to substantial reliability in live assessment of patients versus photographs. The Medicis Lip Fullness Scale is suitable for grading lip fullness in clinical trials. (Plast. Reconstr. Surg. 129: 822e, 2012.)

CLINICAL QUESTION/LEVEL OF EVIDENCE: Diagnostic, III.

Since the introduction of modern dermal fillers, lip augmentation has become one of the most commonly requested aesthetic procedures. Aesthetic standards vary across cultures and over time, but at present, full, well-defined lips are the ideal in Western cultures. The “ideal lip” should be full, with the correct balance between the upper and lower lips, and a well-defined vermilion. In addition to augmentation to achieve this ideal, fillers are used to correct age-related losses such as vertical rhytides, effacement of the lip margins, and thinning of the lips.

Hyaluronic acid fillers are currently the mainstay of lip augmentation, although only one such product, small-gel-particle hyaluronic acid (Restylane; Medicis Aesthetics, Inc., Scottsdale, Ariz.), has been approved for this purpose. Other hyaluronic acid fillers are approved for the correction of moderate to severe wrinkles and folds, such as nasolabial folds. Currently marketed fillers are made from bacterial hyaluronic acid, which is identical to the hyaluronic acid found in all vertebrate species and which is chemically modified by cross-linking to increase its persistence in the skin. Relative to other fillers, most hyaluronic acid-based fillers are classified as having a medium duration of effect (3 to 12 months). Historically, collagen-based fillers were used for lip augmentation, although the requirement for allergic skin testing with bovine collagen and the short duration of action (3 months) were disadvantages. Long-lasting or permanent fillers are also used less frequently than hyaluronic acid fillers, in part because complications (e.g., nodule formation, filler migration) and poor cosmetic results in this hyperdynamic area are not easily reversed.

From Lenox Hill Hospital, the Lorenc Aesthetic Plastic Surgery Center, Medicis Aesthetics, and private practice.
Received for publication September 15, 2011; accepted November 21, 2011.
Copyright ©2012 by the American Society of Plastic Surgeons
DOI: 10.1097/PRS.0b013e31824a2df0

Disclosure: Dr. Kane has served as an advisor or consultant for Allergan, Inc., BioForm Medical, Inc., Medicis Pharmaceutical Corporation, QMed, and Stiefel Laboratories, Inc.; has served as a speaker or a member of a speakers bureau for Allergan, Medicis, QMed, and Sanofi-Aventis; has received grants for clinical research from Coapt Systems, Medicis, and Revance Therapeutics; and owns stock, stock options, or bonds from Allergan, Medicis, and Revance. Dr. Lorenc has served as an advisor or consultant for Johnson & Johnson Pharmaceutical Research & Development, LLC, and Medicis Pharmaceutical Corporation. Ms. Lin is an employee of Medicis. Dr. Smith has served as a consultant for Fibrocell Science, Galderma, Medicis Pharmaceutical Corporation, Miramar Labs, SkinMedica, and Suneva Medical; has received grants for clinical research from Allergan, Fibrocell Science, Galderma, Medicis, Miramar Labs, Revance, SkinMedica, and Suneva Medical; and does not own stock, stock options, or other equity in any of the companies.

A validated measurement scale would be useful for physicians evaluating the appearance of the
lip. A validated scale is one with a high degree of
interobserver and intraobserver agreement so that
similar results are obtained when the appearance
of the lip is rated by different individuals or by the
same individual at different times. Such an instrument could be used as a visual aid for physicians
and patients to discuss lip augmentation goals and
determine whether lip augmentation is desired.
Patients could use the scale to indicate their desired degree of lip augmentation, and physicians
could use it to communicate the likely outcome of
treatment. A validated lip fullness scale also could
be used to compare the results of lip augmentation in the clinic and to evaluate the efficacy of lip
augmentation in clinical studies.

With the help of recognized experts in the
fields of dermatology and plastic surgery, a grading system was developed by Medicis Pharmaceutical Corporation to evaluate lip fullness and the
effects of augmentation on soft-tissue volume. The
Medicis Lip Fullness Scale is a five-point photonumeric scale that uses a set of comparison photographs and terms to grade overall lip fullness.
Separate scales were developed for the upper and
lower lips. This report describes the development
of the Medicis Lip Fullness Scale and a validation
study of the instrument’s interobserver and intraobserver agreement and its reliability in comparing lip fullness in live subjects with that observed in their photographs.

The Medicis Lip Fullness Scale was developed
in conjunction with board-certified dermatologists and plastic surgeons and consists of separate
scores for upper and lower lip fullness, ranging
from 1 to 5 (Table 1). Some of these physicians
also participated in the validation study as examiners, but only for the comparison of live assessment versus photographic assessment (i.e., the
second round of assessment). However, development of the scale was completed at least 6 months
before the second validation study took place; furthermore, entirely different sets of subject photographs were used for development and validation.

People from the general population were recruited by advertising. Because subject involvement was limited to having photographs taken, no
institutional review board approval or waiver was
needed; however, written, informed consent was
obtained. Subjects posed for photographs in imaging centers (Canfield Scientific, Inc., Fairfield,
N.J.); special care was taken to ensure that all
photographs were taken with standardized photographic parameters and subject positioning.
Participants were not patients seeking lip augmentation but were representative of the local communities and included both sexes, a range of ages,
and different ethnicities to ensure that the full
range of lip fullness was covered. For the Medicis
Lip Fullness Scale photoguide, two representative
photographs were chosen for each grade of the
scale (Fig. 1). The photographs used in the guide
show a frontal view of the lower face extending
from the mid nose to the chin, with the lips slightly
parted to allow a clear view without distortion of
shape or size because of pressure. The lip of interest is clearly visible, and the other lip is digitally pixelated.

The validation process was conducted in two
phases. The first phase was an assessment of photographs to evaluate whether the Medicis Lip Fullness Scale could be used consistently by different
evaluators or by the same evaluator at different
times. The second phase was a comparison of live
assessment with photographic assessment. This
phase was performed to evaluate whether the
Medicis Lip Fullness Scale could be used for live
assessment and for photographic assessment.

Table 1. Definitions Used for Lip Fullness
Grade   Description of Lips
1       Very thin
2       Thin
3       Medium
4       Full
5       Very full

Volume 129, Number 5 • Medicis Lip Fullness Scale

For the photographic validation study, 85 photographs each of the upper and lower lips were evaluated independently by five physicians. The test photographs, selected to reflect the complete range of lip fullness, matched the photographic parameters and positioning of the images used in the photoguide. Photographs of upper lips and lower lips were chosen separately based on the fullness of the lips; as a result, photographs of some subjects used only the upper or lower lip.

The evaluators independently rated the fullness of each lip in the photographs using the Medicis Lip Fullness Scale photoguide. Assessments were at least 2 weeks apart so that the evaluator would not remember his or her previous assessment. The photographic sequence was rerandomized with a computer program in a different sequence for the second assessment to further ensure that the evaluator was not relying on memory when he or she performed the second assessment.

For comparison of photographic grading with live grading, assessments were made by each of three evaluators on two occasions. The first occasion was live assessment of the 39 subjects, and the second occasion was photographic assessment of the same 39 subjects, using photographs taken at the end of the first assessment. Initially, the evaluators assessed 39 subjects for upper and lower lip fullness in person using the photoguide. These subjects were chosen to represent the entire range of lip fullness. Standardized photographs of these 39 subjects were taken at the time of the live evaluation, and these photographs were graded by the evaluators at least 14 days later.

Fig. 1. Photographs from the Medicis Lip Fullness Scale photoguide chosen as representative of lips that are very thin (above and second row), thin (third and fourth rows), and medium (fifth row and below). Note that for clarity of illustration, the upper lip scale photographs have the bottom lip pixelated and the lower lip scale photographs have the upper lip pixelated.

Plastic and Reconstructive Surgery • May 2012

Statistical Analysis
Agreement between observers, within the same observer on two different occasions for the photographic evaluations, and between live and photographic assessments, was analyzed by calculating the overall proportion of observed agreement (the sum of the ratings in complete agreement, divided by the total number of observations). Agreement was also measured using a κ coefficient, which expresses the degree of agreement between two different sets of observations, compensating for the proportion of ratings that might coincide by chance.

To assess agreement among raters, pairwise weighted κ coefficients were calculated for each possible pair of observers, resulting in 10 κ coefficients for each of the upper and lower lip scales for the photographic validation. In addition, an overall κ value based on responses from all five evaluators was generated for each scale. The weighted κ coefficients for intrarater and interrater reliability were assessed in accordance with Landis and Koch as presented in Table 2.

Calculation of sample size showed that, with five evaluators each assessing 80 photographs (the planned enrollment), the weighted κ coefficient could be calculated within 0.084 point assuming 60 percent agreement and a 95 percent confidence interval. The actual enrollment was 85 patients for evaluation of each lip in the photographic evaluations.
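The agreement statistics described in this section can be illustrated with a short sketch. It computes the proportion of observed agreement and Cohen's weighted κ for two sets of ratings on the five-point scale, then forms every pairwise comparison among five raters. The report does not state which weighting scheme was used, so the linear disagreement weights here are an assumption, and the ratings, rater labels, and function names are hypothetical illustration values rather than study data.

```python
from itertools import combinations

GRADES = (1, 2, 3, 4, 5)  # the five scale grades listed in Table 1


def observed_agreement(a, b):
    """Proportion of items that two rating lists score identically."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("rating lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def weighted_kappa(a, b, grades=GRADES):
    """Cohen's weighted kappa with linear weights (assumed, not from the report)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("rating lists must be non-empty and of equal length")
    n = len(a)
    k = len(grades)
    index = {g: i for i, g in enumerate(grades)}
    # Cross-tabulate the two sets of ratings.
    counts = [[0] * k for _ in range(k)]
    for x, y in zip(a, b):
        counts[index[x]][index[y]] += 1
    row = [sum(counts[i]) / n for i in range(k)]
    col = [sum(counts[i][j] for i in range(k)) / n for j in range(k)]
    observed = 0.0  # weighted disagreement actually seen
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 at the extremes
            observed += weight * counts[i][j] / n
            expected += weight * row[i] * col[j]
    if expected == 0.0:
        raise ValueError("kappa is undefined when both raters use one identical grade")
    return 1.0 - observed / expected


def pairwise_kappas(ratings_by_rater):
    """Weighted kappa for every unordered pair of raters."""
    return {
        (r1, r2): weighted_kappa(ratings_by_rater[r1], ratings_by_rater[r2])
        for r1, r2 in combinations(sorted(ratings_by_rater), 2)
    }


# Hypothetical ratings of six photographs by five raters (not study data).
hypothetical_ratings = {
    "A": [1, 2, 3, 4, 5, 3],
    "B": [1, 2, 3, 4, 4, 3],
    "C": [2, 2, 3, 5, 5, 3],
    "D": [1, 3, 3, 4, 5, 2],
    "E": [1, 2, 4, 4, 5, 3],
}

for pair, kappa in pairwise_kappas(hypothetical_ratings).items():
    print(pair, round(kappa, 2))
```

Five raters give 10 unordered pairs, which matches the 10 pairwise κ coefficients described above. Quadratic weights would penalize two-grade disagreements more heavily; that variant would change only the `weight` line.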

Photographic Evaluation
Demographic characteristics of the photographic subjects are summarized in Table 3.

Within-observer agreement between the two assessments in the photographic evaluation was almost perfect, according to prespecified criteria, with a weighted κ value of 0.81 for both upper and lower lips (Table 4). The between-observer weighted κ values varied between 0.60 and 0.83 for the upper lip and between 0.61 and 0.82 for the lower lip, indicating substantial to almost perfect agreement among raters (Table 4). These results showed that the Medicis Lip Fullness Scale can be used consistently by different evaluators and by the same evaluator at different times for rating of photographs.
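The labels applied to these κ values follow the bands in Table 2. As a minimal sketch, the mapping can be written as a lookup that treats each band's lower limit as an inclusive threshold. How values falling between the printed bands (for example 0.195) or below zero are handled is an assumption here, since Table 2 does not cover them, and the function name is hypothetical.

```python
# Lower limits of the Table 2 bands, highest first.
LANDIS_KOCH_BANDS = [
    (0.80, "Almost perfect"),
    (0.60, "Substantial"),
    (0.40, "Moderate"),
    (0.20, "Fair"),
    (0.00, "Poor"),
]


def agreement_level(kappa):
    """Return the Table 2 label for a weighted kappa value."""
    if kappa > 1.0:
        raise ValueError("kappa cannot exceed 1.0")
    for lower_limit, label in LANDIS_KOCH_BANDS:
        if kappa >= lower_limit:
            return label
    # Assumption: negative values are not listed in Table 2; treat them as "Poor".
    return "Poor"


# Weighted kappa values reported for the photographic evaluation.
for value in (0.81, 0.60, 0.83, 0.61, 0.82):
    print(value, agreement_level(value))
```

Under these thresholds the reported intrarater value of 0.81 falls in the "Almost perfect" band, and the interrater ranges span "Substantial" to "Almost perfect", consistent with the wording of the paragraph above.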
Live versus Photographic Evaluation
The overall exact agreement between the live assessment and the photographic assessments was 59.8 percent for the upper lip and 52.1 percent for the lower lip (Table 5). The overall within-observer weighted κ value stratified by rater was 0.65 for the upper lip and 0.64 for the lower lip, indicating moderate to substantial agreement within raters. These results showed that the Medicis Lip Fullness Scale could be used consistently for live assessment and for photographic assessment.
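The exact-agreement percentages can be checked against the study design with simple arithmetic. The sketch below assumes that each of the three evaluators rated all 39 subjects, giving 117 paired ratings per lip, and searches for the whole-number counts of exact matches that round to the reported percentages. The counts it finds are back-calculated for illustration and are not reported by the authors; the function name is hypothetical.

```python
def matching_counts(percent, n, decimals=1):
    """Return every count k in 0..n whose percentage of n rounds to `percent`."""
    return [k for k in range(n + 1) if round(100 * k / n, decimals) == percent]


SUBJECTS = 39    # live subjects in the second validation round
EVALUATORS = 3   # evaluators who graded live and from photographs
PAIRED_RATINGS = SUBJECTS * EVALUATORS  # assumes no missing ratings

# Mean exact agreement reported in Table 5, pooled over the three evaluators.
for lip, percent in (("upper", 59.8), ("lower", 52.1)):
    print(lip, "pooled:", matching_counts(percent, PAIRED_RATINGS))

# Range of per-evaluator exact agreement reported in Table 5.
for lip, low, high in (("upper", 53.8, 66.7), ("lower", 46.2, 56.4)):
    print(lip, "per evaluator:",
          matching_counts(low, SUBJECTS), matching_counts(high, SUBJECTS))
```

Under that assumption each reported percentage corresponds to a single whole-number count, for example 70 of 117 paired ratings for the upper lip and 61 of 117 for the lower lip, so the figures in Table 5 are internally consistent with 39 subjects and three evaluators.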

Table 2. Agreement of Weighted κ Coefficients
Coefficient   Level of Agreement
0–0.19        Poor
0.20–0.39     Fair
0.40–0.59     Moderate
0.60–0.79     Substantial
0.80–1.0      Almost perfect

Table 3. Demographic Characteristics of the Photographic Subjects
Characteristic        Upper Lip (%)   Lower Lip (%)
No. of patients       85              85
Mean age ± SD, yr     40.2 ± 15.0     39.5 ± 14.5
Age group
  18–34 years         40 (47)         43 (51)
  35–54 years         28 (33)         25 (29)
  ≥55 years           17 (20)         17 (20)
Male                  32 (38)         29 (34)
Female                53 (62)         56 (66)
White                 71 (84)         68 (80)
Hispanic              7 (8)           9 (11)
African American      4 (5)           4 (5)
Asian                 3 (4)           4 (5)

The photographic portion of this validation assessment showed very good overall reproducibility. The intraobserver agreement (ability of each evaluator to assign the same scores for the same subject at different times) was high for both the upper lip and the lower lip. Interobserver agreement (the ability of different evaluators to rate lip fullness consistently for the same subject) was moderate to substantial for each lip fullness scale, upper and lower. The comparison of live to photographic grading showed similar results. We found moderate to substantial intraobserver agreement of Medicis Lip Fullness Scale scores between live subjects and photographs of the same subjects, rated on separate occasions. Lower intraobserver agreement when comparing ratings between live and photographic examples is to be expected because these are different modalities. In live grading, the evaluator has a chance to examine the lips fully, can ask the subject to position the lips appropriately for assessment, and can view the subject from several angles. For a photographic assessment, the evaluator scores the subject based on the images taken at that instant. A single such image shows only two dimensions at one point in time, without movement, and thus contains far less information with which an evaluator can assign a score. These differences could explain clinical study results that showed a greater number of significant effects with in-person evaluation than with photographic evaluation. Using live subjects in formulating the Medicis Lip Fullness Scale ensured excellent clinical correlation between this objective measure of fullness and patient satisfaction with the outcome of lip augmentation.

It is important for clinicians to have validated
measurement tools to document and communicate aesthetic goals and outcomes. Without usable
measurement tools, judging and comparing the
efficacy of products for lip augmentation relies on
case reports, subjective comparisons to baseline,
or other unreliable methods that may lead to false
assumptions about clinical performance.

Valid measurement of the success of lip augmentation has been an elusive goal. Existing methods to measure size and volume of various anatomical areas of the face and other sites have not been proven reliable or clinically meaningful for the lips. Some clinical studies have used unvalidated, ad hoc scales or generalized aesthetic measures (e.g., Global Aesthetic Improvement Scale, Cosmetic Improvement Scale, Look and Feel of the Lips and Mouth, and Self-Perception of Age) to assess lip augmentation outcomes. A scale created specifically to quantify lip aesthetics has been reported in the literature but has not been used to assess outcomes in any subsequent clinical studies published in the medical literature. A purely numeric assessment of lip volume has also been described but was not validated. A more recently published lip fullness scale describes more careful development and validation that nonetheless were not as robust as those performed for the Medicis Lip Fullness Scale.

The photonumeric five-grade Lip Fullness Grading Scale was developed based on photographs of the upper and lower lips. The distribution of fullness scores was different for the upper and lower lips, which underscores the importance of assessing these two areas separately. Unfortunately, validation of the Lip Fullness Grading Scale was compromised by significant methodologic shortcomings, including a compressed, 2-day process concurrent with development and validation of several other scales. Because the photographs were viewed in close proximity in time, it is likely that evaluators’ scoring was highly biased by recall of previous assessments. In contrast, recall bias during development and validation of the Medicis Lip Fullness Scale was minimized by separating sessions in time and randomizing the order in which images were presented at each viewing. Thirty-five photographs were used to assess the validity of the Lip Fullness Grading Scale, far fewer than the 80 considered adequate to power validation of the Medicis Lip Fullness Scale. Images for the Lip Fullness Grading Scale have the lips pressed together, which can distort their apparent size, and frontal and lateral views are shown without guidance on how to rate the two distinct views if they differ. Finally, the Lip Fullness Grading Scale used computer techniques to electronically modify (i.e., “morph”) the lip thickness of a single subject to exemplify all five grades. The authors concluded that the Lip Fullness Grading Scale should not be used for live grading because the digital changes do not translate clinically to changes seen in the aging face. This may be a serious limitation to effective clinical research, because evaluation based solely on photographs, in the case of nasolabial folds, has been shown to underestimate effects observed during live evaluation.

Table 4. Intrarater and Interrater Agreement among Five Evaluators for Two Ratings of Test Photographs, at Least 2 Weeks Apart
Statistic                                  Upper Lip           Lower Lip
Intrarater agreement
  Exact agreement
    Mean                                   69.9                70.6
    Range                                  61.2–80.0           51.8–87.1
  Weighted κ (95% CI)                      0.81 (0.78–0.84)    0.81 (0.78–0.84)
  Range of individual weighted κ values    0.70–0.87           0.63–0.90
Interrater agreement*
  Range of exact agreement, %              45.9–73.5           45.3–75.3
  Weighted κ range                         0.60–0.83           0.61–0.82
  Round 1                                  0.47                0.43
  Round 2                                  0.50                0.49
CI, confidence interval.
*All interrater statistics reflect 10 pairwise comparisons among the five evaluators.

Table 5. Intrarater Agreement among Three Evaluators for Live versus Photographic Comparisons of the Medicis Lip Fullness Scale
Statistic                                  Upper Lip           Lower Lip
Exact agreement
  Mean                                     59.8                52.1
  Range                                    53.8–66.7           46.2–56.4
Weighted κ (95% CI)                        0.65 (0.56–0.74)    0.64 (0.56–0.72)
Range of individual weighted κ values      0.62–0.68           0.61–0.68
CI, confidence interval.

In a recently completed pilot study of lip augmentation, treated patients typically achieved at least a one-grade increase from baseline in Medicis Lip Fullness Scale scores. Ratings for the Medicis Lip Fullness Scale showed high agreement among the patient, the treating physician, and a blinded study evaluator. These ratings were also consistent with results of the subjective Global Aesthetic Improvement Scale. These observations suggest that the Medicis Lip Fullness Scale measures changes in lip fullness that are clinically meaningful and aesthetically visible. The results could be appreciated by trained professionals and by untrained lay people. The recent approval by the U.S. Food and Drug Administration of small-gel-particle hyaluronic acid for lip augmentation based on Medicis Lip Fullness Scale data demonstrates that the scale is accepted by regulatory authorities as a meaningful measure of efficacy.

The Medicis Lip Fullness Scale is a robust,
validated tool for the assessment of lip fullness and
lip augmentation procedures. It could be used
consistently by different evaluators (i.e., high interrater reliability) or by the same evaluator at
different times (i.e., high intrarater reliability) for
photographic evaluation. It also had moderate to
substantial reliability for live assessment of patients versus photographic assessment. For these
reasons, it can be used as a primary efficacy endpoint in evaluating the outcome of lip augmentation procedures in clinical trials. Use of the
Medicis Lip Fullness Scale allows compilation of
data from different evaluators and comparison of
findings at different time points. Such evaluations
can be performed by grading subjects live or by
grading of photographs at a later time. It may also
be used as a communication tool for physicians to
relate treatment goals to patients.

Michael A. C. Kane, M.D.
115 East 67th Street
New York, N.Y. 10065

This study was funded by Medicis Pharmaceutical
Corporation, Scottsdale, Arizona. The authors acknowledge the following physicians for their role in validating
the lip scale: Lisa Donofrio, M.D. (New Haven, Conn.);
Charles Finn, M.D. (Chapel Hill, N.C.); Richard
Glogau, M.D. (San Francisco, Calif.); Mark Jewell,
M.D. (Eugene, Ore.); Mark Rubin, M.D. (Beverly Hills,
Calif.); and Robert Weiss, M.D. (Baltimore, Md.). The
authors also acknowledge the editorial and writing assistance of Kate Casano, M.H.Sci., M.S.Hyg., a consultant and medical writer for Premier Healthcare Resource, in Morristown, N.J., and Robert A. Gatley, M.D.,
and Michael J. Theisen, Ph.D., of Complete Healthcare
Communications, Inc., Chadds Ford, Pa., whose work
was funded by Medicis Pharmaceuticals Corp.

1. Morris CL, Stinnett SS, Woodward JA. Patient-preferred sites
of restylane injection in periocular and facial soft-tissue augmentation. Ophthal Plast Reconstr Surg. 2008;24:117–121.
2. Niamtu J III. New lip and wrinkle fillers. Oral Maxillofac Surg
Clin North Am. 2005;17:17–28, v.
3. Bisson M, Grobbelaar A. The esthetic properties of lips: A
comparison of models and nonmodels. Angle Orthod. 2004;
4. Klein AW. In search of the perfect lip: 2005. Dermatol Surg.
5. Alam M, Gladstone H, Kramer EM, et al. ASDS guidelines of
care: Injectable fillers. Dermatol Surg. 2008;34(Suppl 1):S115–
6. Medicis Aesthetics, Inc. Restylane (hyaluronic acid): Instructions for use. Scottsdale, Ariz: Medicis Aesthetics, Inc.; 2011.
7. Ali MJ, Ende K, Maas CS. Perioral rejuvenation and lip augmentation. Facial Plast Surg Clin North Am. 2007;15:491–500, vii.
8. Segall L, Ellis DA. Therapeutic options for lip augmentation.
Facial Plast Surg Clin North Am. 2007;15:485–490, vii.
9. Smith KC. Reversible vs. nonreversible fillers in facial aesthetics:
Concerns and considerations. Dermatol Online J. 2008;14:3.
10. Sarnoff DS, Saini R, Gotkin RH. Comparison of filling agents
for lip augmentation. Aesthet Surg J. 2008;28:556–563.
11. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159–174.
12. Cohen SR, Holmes RE. Artecoll: A long-lasting injectable
wrinkle filler material: Report of a controlled, randomized,
multicenter clinical trial of 251 subjects. Plast Reconstr Surg.
2004;114:964–976; discussion 977–979.
13. Solish N, Swift A. An open-label, pilot study to assess the
effectiveness and safety of hyaluronic acid gel in the restoration of soft tissue fullness of the lips. J Drugs Dermatol.
14. Bosniak S, Cantisano-Zilkha M, Glavas IP. Nonanimal stabilized hyaluronic acid for lip augmentation and facial rhytid
ablation. Arch Facial Plast Surg. 2004;6:379–383.
15. Carruthers A, Carruthers J, Monheit GD, Davis PG, Tardie G.
Multicenter, randomized, parallel-group study of the safety
and effectiveness of onabotulinumtoxinA and hyaluronic
acid dermal fillers (24-mg/ml smooth, cohesive gel) alone
and in combination for lower facial rejuvenation. Dermatol
Surg. 2010;36:2121–2134.
16. Carruthers J, Carruthers A, Monheit GD, Davis PG. Multicenter, randomized, parallel-group study of onabotulinum toxin A and hyaluronic acid dermal fillers (24-mg/ml
smooth, cohesive gel) alone and in combination for lower
facial rejuvenation: Satisfaction and patient-reported outcomes. Dermatol Surg. 2010;36:2135–2145.
17. Downie J, Mao Z, Rachel Lo TW, et al. A double-blind,
clinical evaluation of facial augmentation treatments: A comparison of PRI 1, PRI 2, Zyplast and Perlane. J Plast Reconstr
Aesthet Surg. 2009;62:1636–1643.
18. Lemperle G, Anderson R, Knapp TR. An index for quantitative
assessment of lip augmentation. Aesthet Surg J. 2010;30:301–310.
19. Rossi AB, Nkengne A, Stamatas G, Bertin C. Development
and validation of a photonumeric grading scale for assessing
lip volume and thickness. J Eur Acad Dermatol Venereol. 2011;
20. Carruthers A, Carruthers J, Hardas B, et al. A validated lip
fullness grading scale. Dermatol Surg. 2008;34(Suppl 2):
21. Carruthers A, Carruthers J. “Scale Summit.” Dermatol Surg.
2008;34(Suppl 2):S149.
22. Carruthers A, Carruthers J, Hardas B, et al. A validated grading scale for crow’s feet. Dermatol Surg. 2008;34(Suppl 2):
23. Carruthers A, Carruthers J, Hardas B, et al. A validated grading scale for marionette lines. Dermatol Surg. 2008;34(Suppl 2):
24. Carruthers A, Carruthers J, Hardas B, et al. A validated grading scale for forehead lines. Dermatol Surg. 2008;34(Suppl 2):
25. Carruthers A, Carruthers J, Hardas B, et al. A validated brow
positioning grading scale. Dermatol Surg. 2008;34(Suppl 2):
26. Jones D. Are aesthetic grading scales only a research tool?
Dermatol Surg. 2010;36:1817–1818.