cjfcsjt commited on
Commit
0a71441
·
verified ·
1 Parent(s): bf830a4

Upload seeclick_task_prompts.py with huggingface_hub

Browse files
Files changed (1) hide show
  1. seeclick_task_prompts.py +267 -0
seeclick_task_prompts.py ADDED
@@ -0,0 +1,267 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+
2
+ # locate all elements in a webpage (bbox)
3
+ web_loca_all_bbox_prompt = [
4
+ "In the screenshot of this web page, please give me the coordinates of the element I want to click on according to my instructions (with bbox).",
5
+ "Based on the screenshot of the page, I give a text description and you give its corresponding location (with bbox).",
6
+ "In the image above, I will give a series of descriptions of the elements to be clicked. Please predict where you want to click (with bbox).",
7
+ "I will give textual descriptions of certain elements in the screenshot. Please predict the location of the corresponding element (with bbox).",
8
+ "Please identify the coordinates of the webpage elements I describe based on the provided screenshot (with bbox).",
9
+ "Given a screenshot, I will describe specific elements; your task is to predict their locations (with bbox).",
10
+ "Using the image of this webpage, can you determine the coordinates of the elements I describe (with bbox)?",
11
+ "In this webpage capture, I will describe certain elements. Please locate them for me (with bbox).",
12
+ "I'll provide textual descriptions of elements in this webpage screenshot. Can you find their coordinates (with bbox)?",
13
+ "From the given webpage screenshot, I need you to identify the locations of described elements (with bbox).",
14
+ "Based on this screenshot, I'll describe some elements. Please pinpoint their exact locations (with bbox).",
15
+ "For the elements I describe in this page capture, can you predict their positions (with bbox)?",
16
+ "I will describe elements from a webpage screenshot; your role is to locate them (with bbox).",
17
+ "Using the attached screenshot of a webpage, please find the coordinates of described elements (with bbox).",
18
+ "From the image of this webpage, I will describe elements for you to locate (with bbox).",
19
+ "I'll give descriptions of certain webpage elements; please identify where they are in this screenshot (with bbox).",
20
+ "On this webpage screenshot, I will point out elements; please predict their exact coordinates (with bbox).",
21
+ "In this web page image, please locate the elements as I describe them (with bbox).",
22
+ "Given this screenshot of a webpage, I'll describe some elements; locate them for me (with bbox).",
23
+ "Please use the provided webpage screenshot to locate the elements I describe (with bbox).",
24
+ "In the provided web page image, I'll describe specific elements. Identify their locations, please (with bbox).",
25
+ "With this screenshot of a webpage, can you locate the elements I describe (with bbox)?",
26
+ "I will describe features on this webpage screenshot; please predict their positions (with bbox).",
27
+ "Using the screenshot of this webpage, identify the coordinates of elements I describe (with bbox).",
28
+ "On this webpage capture, I'll point out specific elements for you to locate (with bbox).",
29
+ "Please determine the location of elements I describe in this webpage screenshot (with bbox).",
30
+ "I'll describe certain elements on this webpage image; your task is to find their locations (with bbox).",
31
+ "Using this webpage screenshot, I'll describe some elements. Please locate them (with bbox).",
32
+ "Based on my descriptions, find the locations of elements in this webpage screenshot (with bbox).",
33
+ "In this web page capture, please predict the positions of elements I describe (with bbox).",
34
+ "I'll give textual clues about elements in this webpage screenshot; identify their coordinates (with bbox).",
35
+ "Using the provided screenshot, I'll describe webpage elements for you to locate (with bbox).",
36
+ "From this webpage image, I will describe specific elements. Please predict their exact locations (with bbox)."
37
+ ]
38
+
39
+ # locate all elements in a webpage (point)
40
+ web_loca_all_point_prompt = [
41
+ "In the screenshot of this web page, please give me the coordinates of the element I want to click on according to my instructions (with point).",
42
+ "Based on the screenshot of the page, I give a text description and you give its corresponding location (with point).",
43
+ "In the image above, I will give a series of descriptions of the elements to be clicked. Please predict where you want to click (with point).",
44
+ "I will give textual descriptions of certain elements in the screenshot. Please predict the location of the corresponding element (with point).",
45
+ "Please identify the coordinates of the webpage elements I describe based on the provided screenshot (with point).",
46
+ "Given a screenshot, I will describe specific elements; your task is to predict their locations (with point).",
47
+ "Using the image of this webpage, can you determine the coordinates of the elements I describe (with point)?",
48
+ "In this webpage capture, I will describe certain elements. Please locate them for me (with point).",
49
+ "I'll provide textual descriptions of elements in this webpage screenshot. Can you find their coordinates (with point)?",
50
+ "From the given webpage screenshot, I need you to identify the locations of described elements (with point).",
51
+ "Based on this screenshot, I'll describe some elements. Please pinpoint their exact locations (with point).",
52
+ "For the elements I describe in this page capture, can you predict their positions (with point)?",
53
+ "I will describe elements from a webpage screenshot; your role is to locate them (with point).",
54
+ "Using the attached screenshot of a webpage, please find the coordinates of described elements (with point).",
55
+ "From the image of this webpage, I will describe elements for you to locate (with point).",
56
+ "I'll give descriptions of certain webpage elements; please identify where they are in this screenshot (with point).",
57
+ "On this webpage screenshot, I will point out elements; please predict their exact coordinates (with point).",
58
+ "In this web page image, please locate the elements as I describe them (with point).",
59
+ "Given this screenshot of a webpage, I'll describe some elements; locate them for me (with point).",
60
+ "Please use the provided webpage screenshot to locate the elements I describe (with point).",
61
+ "In the provided web page image, I'll describe specific elements. Identify their locations, please (with point).",
62
+ "With this screenshot of a webpage, can you locate the elements I describe (with point)?",
63
+ "I will describe features on this webpage screenshot; please predict their positions (with point).",
64
+ "Using the screenshot of this webpage, identify the coordinates of elements I describe (with point).",
65
+ "On this webpage capture, I'll point out specific elements for you to locate (with point).",
66
+ "Please determine the location of elements I describe in this webpage screenshot (with point).",
67
+ "I'll describe certain elements on this webpage image; your task is to find their locations (with point).",
68
+ "Using this webpage screenshot, I'll describe some elements. Please locate them (with point).",
69
+ "Based on my descriptions, find the locations of elements in this webpage screenshot (with point).",
70
+ "In this web page capture, please predict the positions of elements I describe (with point).",
71
+ "I'll give textual clues about elements in this webpage screenshot; identify their coordinates (with point).",
72
+ "Using the provided screenshot, I'll describe webpage elements for you to locate (with point).",
73
+ "From this webpage image, I will describe specific elements. Please predict their exact locations (with point)."
74
+ ]
75
+
76
+ # ocr all elements in a webpage (bbox)
77
+ web_ocr_all_bbox_prompt = [
78
+ "Based on the screenshot of the web page, I give you the location to click on and you predict the text content of the corresponding element (with bbox).",
79
+ "In the image above, I give a series of coordinates and ask you to describe the corresponding elements (with bbox).",
80
+ "On this page, I will give you a series of coordinates and ask you to predict the text of the clickable element that corresponds to these coordinates (with bbox).",
81
+ "Given a webpage screenshot, I provide coordinates; predict the text content of the elements at these locations (with bbox).",
82
+ "In this screenshot, I'll give coordinates and ask you to describe the text of the elements there (with bbox).",
83
+ "Using the provided image of the webpage, I'll specify locations; you predict the text content of those elements (with bbox).",
84
+ "With this webpage capture, I provide a series of coordinates; please identify the text content of each element (with bbox).",
85
+ "In this page image, I'll point to specific locations; you need to predict the text of the corresponding elements (with bbox).",
86
+ "From this screenshot, I'll give coordinates; can you describe the text of the elements at these points (with bbox)?",
87
+ "Based on this web page screenshot, I provide coordinates; please predict the textual content at these spots (with bbox).",
88
+ "Using the given image of the webpage, I'll specify certain coordinates; describe the text of the elements there (with bbox).",
89
+ "On this captured webpage, I will give a series of coordinates; your task is to predict the text at these locations (with bbox).",
90
+ "With this webpage image, I provide coordinates; can you tell me the text of the elements at these points (with bbox)?",
91
+ "In the provided webpage screenshot, I'll point out locations; please describe the text of the elements there (with bbox).",
92
+ "From this web page capture, I give specific coordinates; predict the text content of the elements at these locations (with bbox).",
93
+ "Using this screenshot of a webpage, I'll indicate coordinates; can you predict the text of the elements (with bbox)?",
94
+ "On this image of a web page, I provide coordinates; you need to describe the text of the corresponding elements (with bbox).",
95
+ "Given this webpage capture, I'll specify locations; please predict the text content of the elements there (with bbox).",
96
+ "In this screenshot, I give a series of coordinates; your task is to predict the text content of the elements (with bbox).",
97
+ "From the given webpage image, I'll provide coordinates; can you describe the text of the elements at these points (with bbox)?",
98
+ "On this captured webpage, I provide specific coordinates; you need to predict the text of the elements there (with bbox).",
99
+ "Using this web page screenshot, I'll indicate locations; please describe the text content of the elements (with bbox).",
100
+ "With this image of a webpage, I specify coordinates; your task is to predict the text of the corresponding elements (with bbox).",
101
+ "In this webpage capture, I'll give coordinates; can you predict the text content of the elements at these locations (with bbox)?",
102
+ "Based on this screenshot, I provide a series of coordinates; describe the text of the elements there (with bbox).",
103
+ "Using the image of this webpage, I'll specify locations; you need to predict the text of the elements (with bbox).",
104
+ "On this page screenshot, I give coordinates; please predict the text content of the corresponding elements (with bbox).",
105
+ "From this webpage image, I'll indicate specific coordinates; can you describe the text of the elements (with bbox)?",
106
+ "In this web page image, I provide coordinates; your task is to predict the text of the elements at these locations (with bbox).",
107
+ "Given this screenshot of a webpage, I specify locations; please describe the text of the elements there (with bbox).",
108
+ "Using the provided page image, I'll point to locations; you predict the text content of the elements (with bbox).",
109
+ "On this webpage capture, I provide a series of coordinates; can you predict the text of the elements (with bbox)?",
110
+ "With this image of the web page, I give specific coordinates; your task is to describe the text of the elements at these points (with bbox)."
111
+ ]
112
+
113
+ # ocr all elements in a webpage (point)
114
+ web_ocr_all_point_prompt = [
115
+ "Based on the screenshot of the web page, I give you the location to click on and you predict the text content of the corresponding element (with point).",
116
+ "In the image above, I give a series of coordinates and ask you to describe the corresponding elements (with point).",
117
+ "On this page, I will give you a series of coordinates and ask you to predict the text of the clickable element that corresponds to these coordinates (with point).",
118
+ "Given a webpage screenshot, I provide coordinates; predict the text content of the elements at these locations (with point).",
119
+ "In this screenshot, I'll give coordinates and ask you to describe the text of the elements there (with point).",
120
+ "Using the provided image of the webpage, I'll specify locations; you predict the text content of those elements (with point).",
121
+ "With this webpage capture, I provide a series of coordinates; please identify the text content of each element (with point).",
122
+ "In this page image, I'll point to specific locations; you need to predict the text of the corresponding elements (with point).",
123
+ "From this screenshot, I'll give coordinates; can you describe the text of the elements at these points (with point)?",
124
+ "Based on this web page screenshot, I provide coordinates; please predict the textual content at these spots (with point).",
125
+ "Using the given image of the webpage, I'll specify certain coordinates; describe the text of the elements there (with point).",
126
+ "On this captured webpage, I will give a series of coordinates; your task is to predict the text at these locations (with point).",
127
+ "With this webpage image, I provide coordinates; can you tell me the text of the elements at these points (with point)?",
128
+ "In the provided webpage screenshot, I'll point out locations; please describe the text of the elements there (with point).",
129
+ "From this web page capture, I give specific coordinates; predict the text content of the elements at these locations (with point).",
130
+ "Using this screenshot of a webpage, I'll indicate coordinates; can you predict the text of the elements (with point)?",
131
+ "On this image of a web page, I provide coordinates; you need to describe the text of the corresponding elements (with point).",
132
+ "Given this webpage capture, I'll specify locations; please predict the text content of the elements there (with point).",
133
+ "In this screenshot, I give a series of coordinates; your task is to predict the text content of the elements (with point).",
134
+ "From the given webpage image, I'll provide coordinates; can you describe the text of the elements at these points (with point)?",
135
+ "On this captured webpage, I provide specific coordinates; you need to predict the text of the elements there (with point).",
136
+ "Using this web page screenshot, I'll indicate locations; please describe the text content of the elements (with point).",
137
+ "With this image of a webpage, I specify coordinates; your task is to predict the text of the corresponding elements (with point).",
138
+ "In this webpage capture, I'll give coordinates; can you predict the text content of the elements at these locations (with point)?",
139
+ "Based on this screenshot, I provide a series of coordinates; describe the text of the elements there (with point).",
140
+ "Using the image of this webpage, I'll specify locations; you need to predict the text of the elements (with point).",
141
+ "On this page screenshot, I give coordinates; please predict the text content of the corresponding elements (with point).",
142
+ "From this webpage image, I'll indicate specific coordinates; can you describe the text of the elements (with point)?",
143
+ "In this web page image, I provide coordinates; your task is to predict the text of the elements at these locations (with point).",
144
+ "Given this screenshot of a webpage, I specify locations; please describe the text of the elements there (with point).",
145
+ "Using the provided page image, I'll point to locations; you predict the text content of the elements (with point).",
146
+ "On this webpage capture, I provide a series of coordinates; can you predict the text of the elements (with point)?",
147
+ "With this image of the web page, I give specific coordinates; your task is to describe the text of the elements at these points (with point)."
148
+ ]
149
+
150
+ # locate screen element(bbox)
151
+ loca_bbox_prompt = [
152
+ "In this UI screenshot, what is the position of the element corresponding to the command \"{}\" (with bbox)?",
153
+ "In the UI, where should I click if I want to complete instruction \"{}\" (with bbox)?",
154
+ "In this screen, how can I navigate to the section that says \"{}\" (with bbox)?",
155
+ "On this page, what is the location of the button do I press to follow the command \"{}\" (with bbox)?",
156
+ "For the action described as \"{}\", where is the corresponding icon in this UI (with bbox)?",
157
+ "To execute the function \"{}\", which item in the UI should I select (in coordinates) (with bbox)?",
158
+ "In this UI layout, where is the tool that performs the operation \"{}\" (with bbox)?",
159
+ "On this screen, where can I find the feature that allows me to \"{}\" (with bbox)?",
160
+ "In the software interface, which menu item corresponds to the task \"{}\" (in coordinates) (with bbox)?",
161
+ "Within this dashboard, which widget should I interact with to \"{}\" (with bbox)?",
162
+ "In the UI here, I need to {}, what is the coordinates of the element is related to this (with bbox)?",
163
+ "If my goal is to \"{}\", which control in this interface should I use (with bbox)?",
164
+ "On this device screen, to achieve the outcome \"{}\", where do I tap (with bbox)?",
165
+ "Facing this interface, where do I access to \"{}\" (with bbox)?",
166
+ "In this digital interface, to initiate \"{}\", where is my point of interest (with bbox)?",
167
+ "When using this app, for the function \"{}\", where is the command located (with bbox)?",
168
+ "In this UI design, to process the instruction \"{}\", where should I activate (with bbox)?",
169
+ "Within this graphical user interface, to \"{}\", which icon should I be looking for (with bbox)?",
170
+ "On this web page, to perform \"{}\", where is the link or button I will click (with bbox)?",
171
+ "In this interface snapshot, to begin \"{}\", what is the clicking point (with bbox)?",
172
+ "When interacting with this UI, for the operation labeled \"{}\", what is my target (with bbox)?",
173
+ "On this software's interface, to execute the step \"{}\", where do I direct my attention (with bbox)?",
174
+ "In the current UI, I want to {}, where should I click (with bbox)?",
175
+ "In this image, I want to {}, where should I click on (with bbox)?",
176
+ "In the current UI, to {}, where should I click (with bbox)?",
177
+ "In this image, to {}, where should I click on (with bbox)?",
178
+ "On this screen, I need to {}, where do I click (with bbox)?",
179
+ "In the UI right now, to {}, where should I click (with bbox)?",
180
+ "In this layout, I want to {}, where is the upload button (with bbox)?",
181
+ "On this interface, to {}, where should I click (with bbox)?",
182
+ "In this view, I need to {}, which icon do I select (in coordinates) (with bbox)?",
183
+ "On this page, I want to {}, where is the option (with bbox)?",
184
+ "In this webpage, I'm trying to {}, where do I click (with bbox)?",
185
+ "In this software, to {}, where should I navigate (with bbox)?"
186
+ ]
187
+
188
+ # locate screen element(point)
189
+ loca_point_prompt = [
190
+ "In this UI screenshot, what is the position of the element corresponding to the command \"{}\" (with point)?",
191
+ "In the UI, where should I click if I want to complete instruction \"{}\" (with point)?",
192
+ "In this screen, how can I navigate to the section that says \"{}\" (with point)?",
193
+ "On this page, what is the location of the button do I press to follow the command \"{}\" (with point)?",
194
+ "For the action described as \"{}\", where is the corresponding icon in this UI (with point)?",
195
+ "To execute the function \"{}\", which item in the UI should I select (in coordinates) (with point)?",
196
+ "In this UI layout, where is the tool that performs the operation \"{}\" (with point)?",
197
+ "On this screen, where can I find the feature that allows me to \"{}\" (with point)?",
198
+ "In the software interface, which menu item corresponds to the task \"{}\" (in coordinates) (with point)?",
199
+ "Within this dashboard, which widget should I interact with to \"{}\" (with point)?",
200
+ "In the UI here, I need to {}, what is the coordinates of the element is related to this (with point)?",
201
+ "If my goal is to \"{}\", which control in this interface should I use (with point)?",
202
+ "On this device screen, to achieve the outcome \"{}\", where do I tap (with point)?",
203
+ "Facing this interface, where do I access to \"{}\" (with point)?",
204
+ "In this digital interface, to initiate \"{}\", where is my point of interest (with point)?",
205
+ "When using this app, for the function \"{}\", where is the command located (with point)?",
206
+ "In this UI design, to process the instruction \"{}\", where should I activate (with point)?",
207
+ "Within this graphical user interface, to \"{}\", which icon should I be looking for (with point)?",
208
+ "On this web page, to perform \"{}\", where is the link or button I will click (with point)?",
209
+ "In this interface snapshot, to begin \"{}\", what is the clicking point (with point)?",
210
+ "When interacting with this UI, for the operation labeled \"{}\", what is my target (with point)?",
211
+ "On this software's interface, to execute the step \"{}\", where do I direct my attention (with point)?",
212
+ "In the current UI, I want to {}, where should I click (with point)?",
213
+ "In this image, I want to {}, where should I click on (with point)?",
214
+ "In the current UI, to {}, where should I click (with point)?",
215
+ "In this image, to {}, where should I click on (with point)?",
216
+ "On this screen, I need to {}, where do I click (with point)?",
217
+ "In the UI right now, to {}, where should I click (with point)?",
218
+ "In this layout, I want to {}, where is the upload button (with point)?",
219
+ "On this interface, to {}, where should I click (with point)?",
220
+ "In this view, I need to {}, which icon do I select (in coordinates) (with point)?",
221
+ "On this page, I want to {}, where is the option (with point)?",
222
+ "In this webpage, I'm trying to {}, where do I click (with point)?",
223
+ "In this software, to {}, where should I navigate (with point)?"
224
+ ]
225
+
226
+ # screen caption
227
+ screen_caption_prompt = [
228
+ "Can you provide a detailed description of the interface screenshot shown?",
229
+ "Illustrate the details visible in the provided screenshot.",
230
+ "What does the presented screen image depict?",
231
+ "How would you narrate the contents of this screen capture to someone who can't see it?",
232
+ "Please detail the elements shown in the interface screenshot.",
233
+ "Describe the features and information displayed in this screenshot.",
234
+ "Elaborate on what is visible in the screenshot of the interface.",
235
+ "Give a comprehensive description of the screenshot's interface.",
236
+ "What information is conveyed in the screenshot displayed?",
237
+ "Could you depict the content and layout of the screen image provided?",
238
+ "Explain the visual aspects of the screenshot taken from this interface.",
239
+ "How would you verbally depict the interface shown in the screenshot?",
240
+ "What key elements are shown in this interface screenshot?",
241
+ "Provide a verbal representation of the screenshot's content.",
242
+ "Narrate the components and information visible in this interface capture.",
243
+ "What are the main features displayed in the screenshot of this screen?",
244
+ "Outline the specific details shown in the interface image.",
245
+ "How would you describe this screen image to someone who cannot see it?",
246
+ "Enumerate the elements and information present in the provided interface screenshot.",
247
+ "Detail the visual composition of the screen capture you see."
248
+ ]
249
+
250
+ # widget captioning
251
+ widgetcap_prompt = [
252
+ "Please generate a description for the element at {}.",
253
+ "Describe the function of the element at {} on the screen.",
254
+ "What is the function of the element at {} on the UI?",
255
+ "What happens when you tap position {} on the screen?",
256
+ "What happens when you click point {} on the screen?",
257
+ "Can you explain what the user interface element at {} does?",
258
+ "What action is triggered by interacting with the area at {}?",
259
+ "Explain the purpose of the interactive element found at {}.",
260
+ "What feature is accessed by selecting the location at {}?",
261
+ "Identify and describe the component located at {}.",
262
+ "What is the outcome of selecting the element at {}?",
263
+ "Detail the functionality of the UI element positioned at {}.",
264
+ "What is the significance of the element located at {} in the application?",
265
+ "How does the element at {} contribute to the overall user experience?",
266
+ "What kind of input or interaction is expected at the point marked {}?"
267
+ ]