Sample characteristics
A total of 330 and 257 participants completed the initial (test) and retest surveys, respectively. Each sample (GP, DM and RA) had 110 participants (Table 1). After excluding participants whose responses were illogical or could not be matched to their test response (N = 21 and 18, respectively) or were unusable (N = 19), the final analysis sample comprised 220 participants (66.67%). Table 1 describes the demographic and test characteristics of all participants included in the analysis. The majority of participants in the GP sample were aged 50 years or older (52.1%), and the largest age category was 50–64 years (30.1%). In contrast, most participants in the patient sample (83.0%) were over 50 years old, and the largest age category was 65 years or older (65.3%), suggesting that the patient sample was older overall. With regard to education, most participants in both the GP (45.2%) and patient (59.2%) samples reported an intermediate level. The proportion of participants with a low education level was slightly higher in the GP sample (15.1%) than in the patient sample (12.9%), and a high education level was also more common in the GP sample (37.0%) than in the patient sample (26.5%). Overall, education levels were more widely spread in the GP sample. Gender was roughly evenly split in both samples.
As expected, the mean self-reported health score was lower in the patient sample (69) than in the GP sample (80). Mean survey completion time also differed between the groups, with the GP sample completing the survey faster than the patient sample (13.5 minutes). In the total sample, four participants reported a different age category and 24 reported a different education level. Completion time and self-reported health, measured using the visual analogue scale (VAS), remained similar in all groups.
Dimension ranking and swing weight
At the aggregate level, consistency between the test and the retest in the top-ranked dimension was low in all samples. In the total sample, 93 of 220 participants (42.27%) chose the same top-ranked dimension. In the GP and patient samples, 36/73 and 57/147 participants, respectively, chose the same top-ranked dimension. This corresponds to consistency of 49.32% and 38.78%, which was considered low. Individual-level consistency was similarly low: a significant correlation between test and retest rankings was found for 36.46% of participants in the total sample, 41.10% in the GP sample and 34.01% in the patient sample.
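The aggregate consistency figures are simple proportions of the counts reported above. The short sketch below reproduces them; the function name `percent_agreement` is introduced here for illustration only and is not taken from the study's analysis code.

```python
def percent_agreement(n_same: int, n_total: int) -> float:
    """Percentage of participants with the same top-ranked dimension at test and retest."""
    return 100.0 * n_same / n_total


# Counts as reported in the text: 36 of 73 (GP) and 57 of 147 (patients).
gp = percent_agreement(36, 73)
patients = percent_agreement(57, 147)
# The total sample pools both groups: 93 of 220.
total = percent_agreement(36 + 57, 73 + 147)

print(f"GP {gp:.2f}%, patients {patients:.2f}%, total {total:.2f}%")
# prints: GP 49.32%, patients 38.78%, total 42.27%
```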
For the swing weights, individual-level ICCs in the total sample indicated poor agreement for five dimensions (MO, EX, LO, CO and PA) and moderate agreement for four (DA, CG, AX and SD) (Table 2). In the GP sample, agreement was poor for three dimensions (MO, AX and CO), moderate for five (DA, EX, LO, SD and PA) and good for one (CG). In the patient sample, agreement was poor for all dimensions except one, which showed moderate agreement. Although consistency in the top-ranked dimension was at a similar level in the GP and patient samples, agreement in the swing weights differed markedly: the patient sample showed poor agreement in eight of nine dimensions (89%), whereas the GP sample showed poor agreement in only three of nine (33%). Standard errors (SEs) are displayed in Online Resource 1.
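As an illustration of how a test-retest ICC can be computed and labelled, the sketch below implements ICC(2,1) (two-way random effects, absolute agreement, single measurement) in plain Python. Both the ICC form and the cut-offs (below 0.5 poor, 0.5 to 0.75 moderate, 0.75 to 0.9 good, above 0.9 excellent) are assumptions: they follow a commonly used convention, and this section does not state which form or thresholds the study applied. The swing weights in the example are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1) for a list of rows, one per participant: [test, retest]."""
    n = len(scores)          # participants
    k = len(scores[0])       # measurement occasions (2 for test-retest)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def classify(icc):
    """Assumed agreement labels; the study's own cut-offs may differ."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc < 0.9:
        return "good"
    return "excellent"


# Hypothetical swing weights for three participants (test, retest).
weights = [[1, 2], [2, 3], [3, 4]]
icc = icc_2_1(weights)
print(round(icc, 3), classify(icc))
```

In this toy data every participant shifts by the same amount between occasions, so the ranking is preserved but absolute agreement is reduced, which is why an absolute-agreement ICC falls below 1.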
Level rating
In the total sample, 45 (20.45%) and 46 (20.90%) participants gave illogical responses in the test and the retest, respectively, and were excluded from this part of the analysis. A total of 152 participants were included in the analysis, and 23 participants gave illogical responses in both the test and the retest. The demographic characteristics of these participants are shown in Table 3, which indicates that consistently illogical responses were given mainly by participants who were older and had an intermediate level of education. The ICC values for the level weights were all statistically significant (Table 4). Agreement was poor for 70.37% and moderate for 29.63% of the level ratings.
In the GP sample, 12 (16.44%) and 13 (17.81%) participants were excluded because of illogical responses in the test and the retest, respectively. A total of 54 participants were included in the final analysis, and six participants consistently gave illogical responses. All ICC values were statistically significant except one (UA level 4). More than half (54%) of the values showed moderate agreement, and the remainder (46%) showed poor agreement.
In the patient sample, 33 (22.45%) participants were excluded from each of the test and the retest because of illogical responses. A total of 114 participants were included in the final analysis, and 17 participants consistently gave illogical responses. Two ICC values were not statistically significant, indicating no agreement between the test and retest values. Of the significant values, most (77%) indicated poor agreement and the remainder (23%) moderate agreement, suggesting that reliability was lower in the patient sample than in the GP sample. SEs are provided in Online Resource 2.
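The inclusion counts for the total and GP samples follow from counting participants who were illogical at both occasions only once. A minimal sketch of that arithmetic is given below, using the counts reported above and the sample sizes of 220 and 73; the function name `n_included` is hypothetical.

```python
def n_included(n_sample, n_test, n_retest, n_both):
    """Participants left after excluding anyone illogical at test or retest.

    n_both (illogical at both occasions) is subtracted once so that these
    participants are not counted twice among the exclusions.
    """
    n_excluded = n_test + n_retest - n_both
    return n_sample - n_excluded


total_included = n_included(220, 45, 46, 23)
gp_included = n_included(73, 12, 13, 6)
print(total_included, gp_included)
# prints: 152 54
```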
Anchoring
In the total sample, percentage agreement for the pairwise comparison was 82.73%, indicating a high level of consistency. The unweighted kappa also showed good agreement, with a value of 0.64 (95% CI: 0.54–0.75). Overall, 117 participants consistently preferred dead and 69 participants consistently preferred the worst health state. Only 34 participants changed their response between the test and the retest. In the total sample, the mean anchoring factor was -0.09 in the test and -0.14 in the retest. When the anchoring factors were compared, the overall ICC was 0.12 (confidence interval: -0.015 to 0.25), indicating poor agreement. Considering only those who consistently chose dead or consistently chose the worst health state, the ICCs were 0.12 (confidence interval: -0.057 to 0.30) and 0.12 (confidence interval: -0.12 to 0.34), respectively, again indicating poor agreement.
Percentage agreement for the pairwise comparison task was 83.56% in the GP sample and 82.31% in the patient sample, which was considered good. Similarly, the unweighted kappa was 0.65 (95% CI: 0.48–0.83) and 0.64 (95% CI: 0.52–0.76), respectively. In the GP sample, 39 participants consistently preferred dead, 21 participants consistently preferred the worst health state, and 13 participants changed their response between the test and the retest. In the patient sample, 73 participants consistently preferred dead, 48 participants consistently preferred the worst health state, and 26 participants changed their response.
In the test, the mean anchoring factor was -0.13 and -0.08 for the GP and patient samples, respectively. In the retest, the mean anchoring factor was -0.14 in both samples. The ICCs comparing the anchoring factors across all participants were -0.00066 (p > 0.05) for the GP sample and 0.16 (p < 0.05) for the patient sample, indicating no agreement and poor agreement, respectively. Among GP participants who consistently chose dead, the ICC was -0.017 (p > 0.05), and among those who consistently chose the worst health state it was 0.57 (p < 0.01). This indicates that participants who chose the worst health state produced more consistent anchoring values than those who preferred dead. Among patients who consistently chose dead, the ICC was 0.19 (p > 0.05), and among those who consistently chose the worst health state it was -0.040 (p > 0.05). This indicates no agreement in either group.
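To show how percentage agreement and unweighted kappa relate, the sketch below computes both from a 2x2 test-by-retest table for the patient sample. The diagonal counts (73 and 48) are those reported above. The text does not say how the 26 participants who changed their response were split between the two directions, so the even 13/13 split used here is purely an assumption for illustration.

```python
def agreement_and_kappa(table):
    """Observed agreement and Cohen's unweighted kappa for a square table.

    table[i][j] = number of participants choosing option i at test
    and option j at retest.
    """
    n = sum(sum(row) for row in table)
    size = len(table)
    p_o = sum(table[i][i] for i in range(size)) / n
    row_tot = [sum(row) for row in table]
    col_tot = [sum(row[j] for row in table) for j in range(size)]
    # Agreement expected by chance from the marginal totals.
    p_e = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2
    return p_o, (p_o - p_e) / (1 - p_e)


# Rows: test (dead, worst state); columns: retest (dead, worst state).
# Off-diagonal cells are an ASSUMED even split of the 26 changers.
patient_table = [[73, 13],
                 [13, 48]]
p_o, kappa = agreement_and_kappa(patient_table)
print(f"agreement {100 * p_o:.2f}%, kappa {kappa:.2f}")
# prints: agreement 82.31%, kappa 0.64
```

Under this assumed split the result is close to the values reported for the patient sample; a different split of the changers would alter the marginal totals and hence kappa, but not the percentage agreement.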
Utility decrements and value set
Table 5 shows the ICC values obtained when comparing the 36 individual-level utility decrements. In the total sample, 35 of the 36 ICC values were significant. Of these, the ICC indicated poor agreement for 23 decrements and moderate agreement for 12 decrements.
In the GP sample, the ICC value was significant for 33 decrements. Of these, one ICC value indicated good agreement (CG2), 21 indicated moderate agreement and 11 indicated poor agreement. In the patient sample, the ICC value was significant for 32 decrements. Of these, eight indicated moderate agreement and 24 indicated poor agreement, suggesting that reliability was lower in the patient sample than in the GP sample. SEs are listed in Online Resource 3.
Utility decrements at the aggregate level were also compared between the test and the retest. The mean overall utility decrement was similar in the test and the retest (0.08). Figure 2 shows small differences in the aggregate-level decrements between the test and the retest at each level in the total sample. The mean difference was 0.004. Figure 3 shows the distribution of the utility decrements graphically. The QQ plot (Figure 3) shows that the distribution of the aggregate utility decrements is similar in the test and the retest, with most points lying close to the 45-degree line through the origin.

Utility decrements for levels 3 and 5 in the test and the retest (total sample)

QQ plot based on the empirical distributions of the aggregate-level utility decrements in the test and the retest (total sample). Dimensions: 1, mobility; 2, everyday activities; 3, fatigue; 4, loneliness; 5, cognition; 6, anxiety; 7, sadness/depression; 8, control; 9, pain
Table 6 shows the results of the Kolmogorov-Smirnov (KS) tests and paired t-tests comparing the aggregate-level utility decrements. In the total, GP and patient samples, the D and t statistics were not statistically significant, except for the D statistic for EX5 in the total sample and the D statistic for PA3, indicating that the test and retest distributions of these decrements did not differ significantly. The t statistics were not statistically significant in the GP sample, indicating no significant difference in means. A difference in means may not be apparent in the KS test, because that test is driven partly by the shape of the distribution and the mean has less influence on its overall significance.
The final health state rankings were compared between the test and the retest using Spearman's rank correlation test. Rho was 0.26 (p < 0.05), 0.26 (p < 0.05) and 0.26 (p < 0.05) in the total, GP and patient samples, respectively, indicating a positive relationship between the test and retest health state rankings.
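The two tests address different questions: the KS D statistic measures the largest gap between the two empirical distributions, whereas the paired t statistic measures the mean within-participant change. The sketch below computes both test statistics in plain Python (without p-values); the function names and the example decrements are hypothetical and are not taken from Table 6.

```python
import math
from statistics import mean, stdev


def ks_statistic(a, b):
    """Two-sample KS D: largest absolute gap between the empirical CDFs."""
    def ecdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in set(a) | set(b))


def paired_t(test, retest):
    """Paired t statistic: mean difference divided by its standard error."""
    diffs = [r - t for t, r in zip(test, retest)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical utility decrements for four participants.
test_dec = [0.10, 0.20, 0.30, 0.40]
retest_dec = [0.12, 0.19, 0.33, 0.41]

d_stat = ks_statistic(test_dec, retest_dec)
t_stat = paired_t(test_dec, retest_dec)
print(f"D = {d_stat:.2f}, t = {t_stat:.2f}")
```

In practice a statistics library would also supply the p-values; this sketch only shows what each statistic measures.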
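For two rankings without ties, Spearman's rho can be computed directly from the squared rank differences. The sketch below assumes untied ranks and uses a hypothetical ranking of five health states; it is not the study's analysis code.

```python
def spearman_rho(rank_test, rank_retest):
    """Spearman's rho for two untied rankings of the same items."""
    n = len(rank_test)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_test, rank_retest))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks of five health states at test and retest.
rho = spearman_rho([1, 2, 3, 4, 5], [2, 1, 3, 5, 4])
print(rho)
# prints: 0.8
```

A value near 1 means the two rankings order the health states almost identically, a value near 0 means little relationship, and a value near -1 means the order is reversed.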
Regression
The regressions of the cumulative difference in utility decrements on age, gender and patient sample are shown in Online Resource 4. The age coefficients were positive, indicating that older participants had a larger cumulative difference. However, this was statistically significant only for the 50–64 age group in Model 1, and it appears to be related to sample membership (GP or patient), because it was no longer statistically significant when the interaction was introduced in the second regression (Model 2). The patient coefficient was negative in both regressions. Although statistically significant, the coefficient in Model 2 increased when the interaction term was introduced. Being male was associated with a smaller difference, which was statistically significant in Model 1 and of borderline statistical significance in Model 2. Overall, these results suggest that age has a slight effect on the cumulative difference in utility decrements, but that the effect of age was not uniform across the samples.