This article describes how to control output formatting with prompts when using ChatGPT's API for advanced natural language processing. In particular, it focuses on how to get the model to interpret an output-format specification correctly, comparing natural language, TypeScript type expressions, and zod schema expressions.
With the advent of ChatGPT's API, it has become easy to create applications using advanced NLP. However, when trying to create something that can actually be used at the product level, it is necessary to control the non-deterministic behavior of the LLM, which turns out to be quite difficult.
One of the challenges is getting the LLM's output into a form that downstream programs can parse. More concretely, since JSON is usually the output format of choice: what prompt maximizes the probability that the LLM outputs JSON with the correct format and the specified structure?
The simple answer is to tell the LLM, in natural language, to output JSON in the desired format and structure. This is sufficient while the data structure is simple, but as the data becomes more complex, specifying the structure in natural language tends to become lengthy or ambiguous. An alternative is to express it in a widely used data-structure notation, such as TypeScript types or zod schema definitions, as sketched below.
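To make the comparison concrete, here is a minimal sketch of the same structure expressed in all three notations. The `Question` shape is a hypothetical illustration, not the actual prompt from the experiment:

```typescript
import { z } from "zod";

// 1. Natural language (embedded in the prompt as plain text):
//    "Output a JSON object with a `title` string and a `choices` array of strings."

// 2. TypeScript type expression (pasted into the prompt as source text):
type Question = {
  title: string;
  choices: string[];
};

// 3. zod schema expression (also pasted into the prompt as source text):
const QuestionSchema = z.object({
  title: z.string(),
  choices: z.array(z.string()),
});
```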
When "choosing an appropriate notation for the structure definition used to convey the expected output structure in a prompt," the questions are whether the model receiving the prompt will interpret that notation correctly, and how easily the notation can direct the LLM's attention to the parts of the definition that matter.
In our product at SparkleAI, we have habitually adopted TypeScript type definitions, a notation engineers are used to, without any particular verification, and this has not caused problems so far. However, some constraints are hard to express in a type (e.g., character-count limits), and in that respect it needs to be evaluated against schema definitions such as zod's, as illustrated below.
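For example, a character-count limit has no first-class representation in a TypeScript type, while zod states it directly in the schema. A minimal sketch (the 100-character limit is an arbitrary illustration):

```typescript
import { z } from "zod";

// TypeScript: a length limit has nowhere to live in the type itself;
// the best we can do is a comment, which is not enforced.
type Title = string; // must be at most 100 characters (not expressed by the type)

// zod: the same constraint is part of the schema definition.
const TitleSchema = z.string().max(100);
```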
In this post, I investigate and summarize how often ChatGPT succeeds in producing correctly formatted output when the output format is specified in natural language, TypeScript, and zod.
First, a quick comparison of each notation's original purpose and how it might behave when repurposed as an output-format specification:

| Notation | Purpose | Notes for this use |
| --- | --- | --- |
| Natural language | Communication | Ambiguous; the output may not come back in the specified type |
| TypeScript type expression | Type checking of program source | Many programs are written in TS, so it is likely to be interpreted correctly |
| zod schema | Schema validation at program runtime | More specific than type expressions, but does that specificity affect the result? |
When a prompt specifies the number of characters or the number of items to output, ChatGPT often fails to follow the constraint. A common prompt-engineering workaround is therefore to generate once and fix the character counts afterwards. However, since we also want to measure how character-count and item-count constraints, such as those expressible in zod, affect the results, we use a question-generation prompt with character-count constraints as the test subject.
We use the following prompt, with the output-format definition swapped between natural language, a TypeScript type expression, and a zod schema expression.
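The exact prompt text is not reproduced here; the following is a hypothetical sketch of how the format-definition slot can be swapped between the three variants:

```typescript
// Hypothetical sketch: the actual prompt text used in the experiment is not
// reproduced here. Only the format-definition slot changes between variants.
const formatDefinitions = {
  NL: "Output a JSON object with a `title` of at most 100 characters and an array `choices` of exactly 4 strings.",
  TS: `type Question = {
  title: string; // at most 100 characters
  choices: string[]; // exactly 4 items
};`,
  ZD: `const QuestionSchema = z.object({
  title: z.string().max(100),
  choices: z.array(z.string()).length(4),
});`,
} as const;

function buildPrompt(format: keyof typeof formatDefinitions, document: string): string {
  return [
    "Generate questions from the following document.",
    "Output format:",
    formatDefinitions[format],
    "",
    "Document:",
    document,
  ].join("\n");
}
```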
Prompts specifying the output format in natural language (NL), TypeScript (TS), and zod (ZD) were each run 30 times against 10 different documents (300 runs per format), and the following checks were tabulated: whether the output parses as JSON (parse), whether the parsed JSON matches the specified schema (schema), whether the item-count constraint is met (count), and whether the character-length constraint is met (length).
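As a sketch of how these four checks can be implemented, assuming the hypothetical `Question` structure from earlier. The real tally counts every constrained string field, which is why the length denominators in the results below exceed 300; this sketch checks a single field for brevity:

```typescript
import { z } from "zod";

// Hypothetical schema mirroring the sketch above (100-char title, 4 choices).
const QuestionSchema = z.object({
  title: z.string(),
  choices: z.array(z.string()),
});

// One evaluation per model output. Each check only runs if the previous
// one passed, which is why the denominators shrink from row to row below.
function evaluate(raw: string) {
  let data: unknown;
  try {
    data = JSON.parse(raw); // parse: is the output valid JSON at all?
  } catch {
    return { parse: false, schema: false, count: false, length: false };
  }
  const result = QuestionSchema.safeParse(data); // schema: does the structure match?
  if (!result.success) {
    return { parse: true, schema: false, count: false, length: false };
  }
  const q = result.data;
  return {
    parse: true,
    schema: true,
    count: q.choices.length === 4, // count: item-count constraint respected?
    length: q.title.length <= 100, // length: character-count constraint respected?
  };
}
```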
Assuming that differences between prompts would show up more clearly in settings with higher generation diversity, we fixed the temperature to a realistic upper limit of 1.2, a value used for tasks that call for diverse outputs.
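A sketch of one generation call, assuming the official openai Node SDK; the model name is illustrative:

```typescript
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function generate(prompt: string): Promise<string> {
  const res = await client.chat.completions.create({
    model: "gpt-3.5-turbo", // illustrative; any chat-completion model works
    temperature: 1.2, // fixed high temperature to maximize output diversity
    messages: [{ role: "user", content: prompt }],
  });
  return res.choices[0].message.content ?? "";
}
```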
| Check | NL | TS | ZD |
| --- | --- | --- | --- |
| parse | 0.84 (251/300) | 0.95 (284/300) | 0.94 (278/296) |
| schema | 0.57 (143/251) | 1.0 (283/284) | 1.0 (278/278) |
| count | 0.80 (114/143) | 0.73 (206/283) | 0.82 (228/278) |
| length | 0.80 (1556/1950) | 0.76 (2861/3788) | 0.79 (2935/3736) |
The large gap between NL and TS/ZD in the parse and schema checks indicates that it is still easier to control the structure of the output with an artificial language than with natural language.
Beyond that, there is no significant difference between TypeScript and zod in the parse and schema checks. On the other hand, TypeScript's count and length scores are worse than both zod's and natural language's, suggesting that the way item-count and character-count constraints are expressed in a TypeScript type may be less likely to attract the model's attention than a schema's explicit specification.
Based on the above, schema expressions perform best numerically as the easiest way to produce the intended structure. However, besides zod there are other schema libraries such as Yup, io-ts, and joi, which have similar but subtly different syntax, and we have not verified that they can be controlled as intended. Also, since any one schema library is less universal than TypeScript itself, its syntax may change in the future, which is worth keeping in mind.
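For instance, the same hypothetical constraints look like this in zod and Yup; the intent is identical, but the source text the LLM sees in the prompt differs:

```typescript
import { z } from "zod";
import * as yup from "yup";

// The same hypothetical constraints in two schema libraries. The intent is
// identical, but the surface syntax embedded in the prompt differs.
const zodSchema = z.object({
  title: z.string().max(100),
  choices: z.array(z.string()).length(4),
});

const yupSchema = yup.object({
  title: yup.string().max(100).required(),
  choices: yup.array().of(yup.string().required()).length(4),
});
```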
This article compared natural language, TypeScript, and zod as ways to control ChatGPT's output format, focusing on how reliably each lets ChatGPT generate the intended data structure and how well each directs the model's attention to constraints.
As a result, we found that specifying the output structure in an artificial language is more reliable than natural language: both the rate at which the output parses and the rate at which it matches the specified schema are higher for TypeScript and zod than for natural language, with no significant difference between the two.
However, TypeScript was inferior to both zod and natural language at constraining item counts and string lengths. This is likely because such constraints are hard to express in a TypeScript type itself, and therefore attract less attention than a schema's explicit specification.
From that point of view, schema expressions are the most reliable way to produce the intended structure. However, schema expressions are less universal than TypeScript and their syntax is subject to change, so some care is needed in this respect.
Weighing these, we consider TypeScript type expressions the most practical choice for specifying JSON output and controlling its structure. Since TypeScript types are slightly weaker than schemas at constraining item counts and string lengths, we recommend controlling those elements with additional plain-text instructions in the prompt, as sketched below.
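An illustrative sketch of that hybrid: structure via a TypeScript type, numeric constraints restated as plain-text instructions (the type and constraint values are hypothetical):

```typescript
// Recommended hybrid: structure via a TypeScript type,
// numeric constraints restated as plain-text instructions.
const outputFormat = `type Question = {
  title: string;
  choices: string[];
};`;

const prompt = `Generate questions from the document below.
Output format (TypeScript type):
${outputFormat}

Constraints:
- title must be at most 100 characters
- choices must contain exactly 4 items

Document:
...`;
```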