Decoding Tasks in Generative AI: The Building Blocks of Intelligent Systems
Welcome, fellow tech enthusiasts and business leaders! I'm thrilled to kick off a series of blog posts dedicated to a topic that's been creating waves in the world – Generative AI.
As I delve deep into the fascinating world of enterprise generative AI, I'll be exploring not only high-level concepts like governance, security, and auditability, but also offering practical guidance on topics such as aggregating APIs and understanding generative AI architectures.

Whether you're a seasoned veteran in the AI space or a curious newcomer, this series aims to shed light on how large enterprises can harness the power of generative AI to drive innovation, efficiency, and value. I'll be tackling the complex issues, breaking down jargon, and providing actionable insights to help you navigate the AI landscape with confidence. So, buckle up and join me on this exciting journey into the future of business technology. Let's demystify AI together!
Disclaimer: this article provides an overview of architectural concepts that are not specific to Azure, but are at times illustrated using Azure services since I am a Solution Architect at Microsoft.
Tasks
Our first pit-stop is 'Tasks'. In the grand scheme of AI, tasks are the small, yet mighty, cogs that keep the wheel turning. They are well-defined units of work that a Large Language Model (LLM) can perform for you, acting as the building blocks of your broader (AI) system.
When I talk about tasks, I'm referring to actions with specified inputs and outputs. Each task execution is independent, meaning that it doesn't rely on past or future executions. It's a standalone operation that works in its own bubble.
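To make that concrete, here is a minimal sketch of what such a task contract could look like in Python. The names TaskInput, TaskOutput, run_task and call_llm are purely illustrative, not a specific framework; call_llm stands in for whatever model endpoint actually backs the task.

from dataclasses import dataclass

@dataclass
class TaskInput:
    text: str        # the raw input the task operates on

@dataclass
class TaskOutput:
    result: str      # the generated output returned to the caller

def call_llm(prompt: str) -> str:
    """Placeholder for whatever model endpoint actually backs the task."""
    return "<completion>"

def run_task(task_input: TaskInput) -> TaskOutput:
    # Each execution is independent: nothing is shared with past or future calls.
    prompt = f"Summarise the following text:\n\n{task_input.text}"
    return TaskOutput(result=call_llm(prompt))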
A prompt is built into these tasks to guide the LLM. This is a bit like giving a set of instructions to a friend – you need to be clear and concise, so they know exactly what to do. Sometimes this involves adding few-shot examples (sample inputs paired with their expected outputs); there are many ways to construct prompts, but that is outside the scope of this post.
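As a hedged illustration, a prompt with few-shot examples might be assembled along these lines; the examples and wording here are made up for the sketch, and in practice they would be curated by whoever owns the task.

# Hypothetical few-shot examples, curated by the task owner.
FEW_SHOTS = [
    ("order 3 crates of water", '{"product": "Water", "amount": 3, "type": "Crate"}'),
    ("two cans of soda please", '{"product": "Soda", "amount": 2, "type": "Can"}'),
]

def build_prompt(user_text: str) -> str:
    """Combine instructions, few-shot examples and the actual input into one prompt."""
    lines = ["Extract the order from the text below as JSON.", ""]
    for example_input, example_output in FEW_SHOTS:
        lines += [f"Text: {example_input}", f"JSON: {example_output}", ""]
    lines += [f"Text: {user_text}", "JSON:"]
    return "\n".join(lines)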
Some of the other things to consider are how to check outputs before sending them back to the originating user or application, and this is where logging, monitoring, and other regular DevOps processes come into play.
But for generative AI there are also other techniques to consider. Often this comes in the form of a wrapper around the service you're building that catches certain content based on specific filters, for instance using Azure AI Content Safety. It could also include wrapping the execution with a human-in-the-loop setup, where people check some or all generated outputs to ensure they stay within bounds.
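A simplified sketch of such a wrapper is shown below. The passes_content_filter function is a stand-in for a real moderation call (for example to a service like Azure AI Content Safety, whose actual API is not shown here), and the routing logic is deliberately naive.

def passes_content_filter(text: str) -> bool:
    # Stand-in for a real content-safety service call; the blocked terms are illustrative.
    blocked_terms = ["confidential", "internal only"]
    return not any(term in text.lower() for term in blocked_terms)

def guarded_task(prompt: str, call_llm, needs_human_review: bool = False) -> dict:
    output = call_llm(prompt)
    if not passes_content_filter(output):
        return {"status": "rejected", "output": None}
    if needs_human_review:
        # Human-in-the-loop: park the output for review instead of returning it directly.
        return {"status": "pending_review", "output": output}
    return {"status": "ok", "output": output}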
Finally, another approach to consider is to use a tool like Prompt Flow to script different variants of inputs and outputs and continuously test whether they still give the expected outcome. This is useful for evaluating an upgraded model (like when GPT-3.5-Turbo moved from version 0301 to 0613), but it can also be used to validate and choose the right model for a particular task: testing the task with smaller (and therefore cheaper and faster) models, like Meta's Llama models, with larger models like GPT-3.5-Turbo and GPT-4, or even with fine-tuned models, and deciding automatically, based on the metrics, which model this task should use to optimise cost, latency, and accuracy.
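This is not the Prompt Flow API itself, but the underlying idea can be sketched as scoring each candidate model against a fixed set of test cases and picking the cheapest one that is accurate enough. The test cases, model ordering and helper functions below are assumptions for illustration; run_model is whatever function invokes a given model.

# Illustrative test cases; in practice these come from curated evaluation data.
TEST_CASES = [
    {"input": "5 crates of sparkling water", "expected": "Sparkling Water"},
    {"input": "two cartons of soda", "expected": "Soda"},
]

def accuracy(model_name: str, run_model, cases) -> float:
    """Fraction of test cases where the model output contains the expected value."""
    hits = sum(1 for case in cases if case["expected"] in run_model(model_name, case["input"]))
    return hits / len(cases)

def pick_model(candidates, run_model, cases, min_accuracy: float = 0.9) -> str:
    # candidates ordered cheapest/fastest first, e.g. a small Llama model before GPT-4
    for model_name in candidates:
        if accuracy(model_name, run_model, cases) >= min_accuracy:
            return model_name
    return candidates[-1]   # fall back to the largest model if nothing smaller is good enough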
Integration & Examples
The beauty of this all is that it's exposed through a well-documented API. This means that keys, billing, and logging are managed through some kind of API management system, making the process seamless and user-friendly, while providing the power of LLMs to many applications in different ways.
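As a rough sketch, exposing a summarisation task could look like this with FastAPI; the route and field names are invented for this example, and keys, throttling and billing would normally be handled by the API management layer in front of it.

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class SummariseRequest(BaseModel):
    text: str

class SummariseResponse(BaseModel):
    summary: str

def call_llm(prompt: str) -> str:
    return "<completion>"   # placeholder for the actual model call

@app.post("/tasks/summarise", response_model=SummariseResponse)
def summarise(request: SummariseRequest) -> SummariseResponse:
    # The prompt and any output checks live behind the endpoint, invisible to the caller.
    return SummariseResponse(summary=call_llm(f"Summarise the following text:\n\n{request.text}"))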
To bring this concept to life, let's consider a few examples. Imagine you need to extract a sales order from a text, or summarise a text with domain-specific highlights. These are tasks that an LLM can do for you, effectively and efficiently.
Example 1: writing a job posting based on a list of requirements and existing job postings
This task is a solution created by an HR department but potentially used by managers throughout a company, preferably embedded within the application that powers the job posting or career website (and from there integrates with job boards like LinkedIn).
The user input for a task like this is the job title, responsibilities, and qualifications; it also includes things like the department and level.
With this input the task loads relevant comparable postings to serve as examples. It might have language in the prompt like: "always end with this text: …", which serves as a way for HR to enforce mandated language across all job postings.
Finally, it makes the actual call to an LLM to generate the new text and returns that to the user. At this stage, the output might be checked for certain terms or texts to make sure that hallucinations from the model do not get in the way. There might also be some post-processing, for instance translating the text into multiple languages by default.
The returned result is then given to the requestor to validate and approve after making any changes or tweaks. The aforementioned checks and post-processing might also run again after this step, so that the user does not delete the mandatory wording and so that translation is done on the finished product.
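Pulling the steps of this example together, a hedged sketch of the task could look like the code below; the mandated footer, the helper functions find_similar_postings and call_llm, and the field names are all hypothetical.

MANDATED_FOOTER = "Contoso is an equal opportunity employer."   # example of HR-mandated wording

def generate_job_posting(title, department, level, responsibilities, qualifications,
                         find_similar_postings, call_llm):
    examples = find_similar_postings(title, department, level)   # comparable postings as examples
    prompt = (
        "Write a job posting based on the details and example postings below.\n"
        f"Title: {title}\nDepartment: {department}\nLevel: {level}\n"
        f"Responsibilities: {responsibilities}\nQualifications: {qualifications}\n"
        "Example postings:\n" + "\n---\n".join(examples) + "\n"
        f"Always end with this text: {MANDATED_FOOTER}"
    )
    draft = call_llm(prompt)
    # Output check: make sure the mandated wording survived generation (and, later, user edits).
    if MANDATED_FOOTER not in draft:
        draft = draft.rstrip() + "\n\n" + MANDATED_FOOTER
    return draft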

Example 2: parsing incoming orders from email or phone to JSON
This second example focuses on automating tasks that are difficult to program: in this case, sales orders for, let's say, a retailer, which come in by phone or by email.
To get the "technical" description of the sales order out of those free-text formats, manual labour is in most cases the only way to do it. You might be able to cobble together something with complex regexes and lots of code, but dealing with this kind of human-made "mess" is exactly where LLMs excel.
In the case of a phone message, the first step is to transcribe it. There are many APIs and models that can do this, so I won't go into detail here; the output is then treated the same as an email.
The text of the email or transcription is sent to an API endpoint for this task. The task has a prompt containing instructions like "from the text below generate a JSON object, summarising the order with the following fields: customer_name, product, SKU, amount, etc…". For some fields the prompt might include allowed values, like brand or product names, and for other fields default values might be specified; all of this is part of the prompt.
For instance, the prompt could look like this:
You must extract the following information from the user question below:
1. Customer (key: customer)
2. Location (key: location)
3. Products and orders in an array (key: order_items)
Each order item should have the following information:
1. Brand, can be Contoso, AdventureWorks (key: brand)
2. Product, can be Soda, Water, Sparkling Water, default is 'Water' (key: product)
3. Number of items (key: amount)
4. Package type, i.e. bottle, can, crate, carton (key: type)
Make sure fields 1 to 3 are answered very short, e.g. for location just say the location name
Please answer in JSON machine-readable format, using the keys from above.
User question:
{{ email or phone transcription }}
JSON:
The output when using an LLM for the input "Good morning, I work for AW Bar on 1st Avenue South, Seattle, WA, I would like to order 5 crates of Contoso sparkling, 2 cartons of AW soda, and 4 bottles of AdventureWorks. …" could look something like this:
{
  "customer": "AW Bar",
  "location": "Seattle, WA",
  "order_items": [
    {
      "brand": "Contoso",
      "product": "Sparkling",
      "amount": 5,
      "type": "Crate"
    },
    {
      "brand": "AdventureWorks",
      "product": "Soda",
      "amount": 2,
      "type": "Carton"
    },
    {
      "brand": "AdventureWorks",
      "product": "Water",
      "amount": 4,
      "type": "Bottle"
    }
  ]
}
Next, post-processing is done on that JSON object to validate the values: checking the brand and product names, the SKUs, the amounts ordered, etc. If there are mistakes, a second prompt is used, asking the LLM to correct the response for that specific field. If that still fails validation, the whole task is put on some kind of queue for someone in the sales order processing department to manually fix what was missing or to reach out to the customer to validate the order. This can then also be used to provide feedback to the task developer, so that they can catch new misses over time.
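A simplified sketch of that validate-then-retry loop might look like this; the allowed values, helper functions and queue hand-off are assumptions for illustration.

import json

ALLOWED_BRANDS = {"Contoso", "AdventureWorks"}
ALLOWED_PRODUCTS = {"Soda", "Water", "Sparkling Water"}

def validate_order(order: dict) -> list:
    """Return a list of field-level problems; an empty list means the order looks fine."""
    errors = []
    for item in order.get("order_items", []):
        if item.get("brand") not in ALLOWED_BRANDS:
            errors.append(f"unknown brand: {item.get('brand')}")
        if item.get("product") not in ALLOWED_PRODUCTS:
            errors.append(f"unknown product: {item.get('product')}")
        if not isinstance(item.get("amount"), int) or item.get("amount", 0) <= 0:
            errors.append(f"invalid amount: {item.get('amount')}")
    return errors

def process_order_output(raw_output: str, call_llm, send_to_manual_queue):
    order = json.loads(raw_output)
    errors = validate_order(order)
    if errors:
        # Second prompt: ask the model to correct only the fields that failed validation.
        corrected = call_llm("Correct this order JSON. Problems: " + "; ".join(errors)
                             + "\n" + json.dumps(order))
        order = json.loads(corrected)
        if validate_order(order):
            send_to_manual_queue(order)   # still failing: hand it to the sales order team
            return None
    return order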
Finally, if the order is correct, it is entered into the order processing system and, depending on the situation, either gets processed or is sent to the customer to sign off on before moving forward. This can also be order dependent: for instance, the customer is asked to confirm when the total order value is over a certain threshold, and that threshold can be made dynamic by comparing it to previous orders from this customer.
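That dynamic threshold could be as simple as comparing the new order to the customer's history, for example like this (the function name, multiplier and fallback value are made up):

def needs_customer_signoff(order_value: float, previous_order_values: list,
                           multiplier: float = 2.0, fallback_threshold: float = 10_000.0) -> bool:
    """Ask the customer to confirm orders that are unusually large for them."""
    if not previous_order_values:
        return order_value > fallback_threshold   # no history yet: use a fixed threshold
    average = sum(previous_order_values) / len(previous_order_values)
    return order_value > multiplier * average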

In conclusion, wrapping prompts and LLMs in a RESTful API to power specific tasks is a very useful pattern: it lets different teams and applications reuse the smart thinking of other people and teams in the organisation without having to redevelop all the logic, and it allows an enterprise to make sure certain standards are met, whether in the output, as in the first example, or in the structure and field names, as in the second example. By making these APIs available, and adding something like an API aggregation layer on top, enterprises can use and leverage LLMs in scalable, predictable, and valuable ways.
So, that wraps up the first blog post in this series about the different ways generative AI can be used across an enterprise. Some of the other topics I plan to cover are chatbots and governance, and finally I will propose a model for how to combine all these things and make them work across larger enterprises.
Stay tuned for my next post, where I'll dive deeper into the world of enterprise generative AI. Until then, happy tasking! As the next posts are created and published I will update this one with the links.