Agent Pilot is an open-source desktop application for creating, managing, and chatting with AI agents, including managing their voices, personalities, and actions.
Use your own API keys or bring your own model
Hybrid Agents (Coming soon)
A blend of hard-coded actions and a code interpreter allows the assistant to be fast and reliable when it can be, and more powerful when it needs to be.
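As a rough sketch of that idea (the helper names are illustrative, not Agent Pilot's actual API):

```python
def handle_request(request, detect_action, code_interpreter):
    """Hybrid dispatch sketch: prefer a fast hard-coded action,
    fall back to the code interpreter for everything else."""
    action = detect_action(request)        # fast and reliable when a match exists
    if action is not None:
        return action.run()
    return code_interpreter.run(request)   # slower, but can handle arbitrary requests
```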
Easily plug in your own agents. Agent Pilot comes with the following plugins ready to use: MemGPT, OpenInterpreter
A customisable list of context blocks is available to all agents, and blocks can be used within their system messages with placeholders. This is useful for reusability and consistency across multiple Agents.
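For illustration, a context block might be referenced from a system message like this (a minimal sketch; the block name `known-personality` is hypothetical):

```python
# A minimal sketch of the context-block idea (block name is hypothetical).
context_blocks = {
    'known-personality': 'You are helpful, concise and friendly.',
}

system_msg = "Assistant personality: {known-personality}"

# Placeholders are replaced with the block text before the message is sent.
for name, text in context_blocks.items():
    system_msg = system_msg.replace('{' + name + '}', text)

print(system_msg)  # -> Assistant personality: You are helpful, concise and friendly.
```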
For agents where actions are enabled, a task is created when one or more actions are detected, and will remain active until it completes, fails or decays.
Actions can be detected natively or with a function call from an LLM that supports it.
Hard-coded actions are searched and sorted based on semantic similarity to the request.
A group of the most similar actions are then fed to the action decision method.
A single action can be detected and executed on its own without using ReAct; if a request is complex enough, then ReAct is used.
If ReAct fails to find an action, then the request can be passed on to another Agent.
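As a rough sketch of the semantic-similarity search described above (the `embed` callable is an assumption standing in for whatever embedding model the project actually uses):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k_actions(request: str, action_descs: list[str], embed, k: int = 5) -> list[str]:
    """Rank action descriptions by similarity to the request and keep the top k,
    which are then fed to the action decision method."""
    request_vec = embed(request)
    scored = [(desc, cosine_similarity(request_vec, embed(desc))) for desc in action_descs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [desc for desc, _ in scored[:k]]
```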
Open Interpreter is integrated into Agent Pilot, and can either be used standalone as a plugin or only when it needs to be, saving significant costs for a general-use agent.
By default, code runs automatically after 5 seconds and can be stopped, edited and re-run.
Agents support defining character behaviour with a context block, allowing them to reply and sound like a celebrity or a character using TTS services that support this feature. Support for offline TTS models is planned.
Supported TTS services:
- Amazon Polly
- Elevenlabs
- FakeYou (celebrities and characters)
- Uberduck (celebrities and characters) (discontinued)
Agents support DevMode Jailbreak for more unique and creative responses.
To enable this, add "{jailbreak}" to your agent's System Message, then change the following agent setting:
`context > prefix-all-assistant-msgs = (🔓 Developer Mode Output)`
Assistant messages are sent back to the LLM with the prefix "(🔓 Developer Mode Output)" as instructed by the jailbreak, whether the message contained it or not. This helps to keep the jailbreak aligned ;)
Only the main context is jailbroken. Actions, ReAct and the code interpreter are not affected by the jailbreak.
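A rough sketch of that prefixing behaviour (the function name and message format are illustrative, not the project's actual code):

```python
PREFIX = '(🔓 Developer Mode Output) '

def prepare_history(messages: list[dict]) -> list[dict]:
    """Prefix every assistant message before sending the history back to the LLM,
    whether or not the original reply contained the prefix."""
    prepared = []
    for msg in messages:
        if msg['role'] == 'assistant' and not msg['content'].startswith(PREFIX):
            msg = {**msg, 'content': PREFIX + msg['content']}
        prepared.append(msg)
    return prepared
```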
~~Tasks can be recurring or scheduled to run at a later time with requests like "The last weekend of every month", or "Every day at 9am".~~
Still in development, coming soon.
# Example Action
```python
# Project-internal names (BaseAction, BoolFValue, ActionError, ActionSuccess,
# MissingInputs, llm, api) are assumed to be in scope; the imports below were
# omitted from the original example and are added for completeness.
import io
import platform
import subprocess
import tempfile

import replicate
import requests
from PIL import Image


class GenerateImage(BaseAction):
    def __init__(self, agent):
        super().__init__(agent)
        # DEFINE THE ACTION DESCRIPTION
        self.desc_prefix = 'requires me to'
        self.desc = "Do something like Generate/Create/Make/Draw/Design something like an Image/Picture/Photo/Drawing/Illustration etc."
        # DEFINE THE ACTION INPUT PARAMETERS
        self.inputs.add('description-of-what-to-create')
        self.inputs.add('should-assistant-augment-improve-or-enhance-the-user-image-prompt',
                        required=False,
                        hidden=True,
                        fvalue=BoolFValue)

    def run_action(self):
        """
        Starts or resumes the action on every user message.
        Responses can be yielded instead of returned to allow for continuous execution.
        """
        # USE self.add_response() TO SEND A RESPONSE WITHOUT PAUSING THE ACTION
        self.add_response('[SAY] "Ok, give me a moment to generate the image"')

        # GET THE INPUT VALUES
        prompt = self.inputs.get('description-of-what-to-create').value
        augment_prompt = self.inputs.get('should-assistant-augment-improve-or-enhance-the-user-image-prompt').value.lower().strip() == 'true'

        # STABLE DIFFUSION PROMPT GENERATOR
        num_words = len(prompt.split(' '))
        if num_words < 7:
            augment_prompt = True

        if augment_prompt:
            conv_str = self.agent.context.message_history.get_conversation_str(msg_limit=4)
            sd_prompt = llm.get_scalar(f"""
Act as a stable diffusion image prompt augmenter. I will give the base prompt request and you will engineer a prompt for stable diffusion that would yield the best and most desirable image from it. The prompt should be detailed and should build on what I request to generate the best possible image. You must consider and apply what makes a good image prompt.
Here is the requested content to augment: `{prompt}`
This was based on the following conversation:
{conv_str}
Now after I say "GO", write the stable diffusion prompt without any other text. I will then use it to generate the image.
GO: """)
        else:
            sd_prompt = prompt

        # USE REPLICATE API TO GENERATE THE IMAGE
        cl = replicate.Client(api_token=api.apis['replicate']['priv_key'])
        image_paths = cl.run(
            "stability-ai/sdxl:2b017d9b67edd2ee1401238df49d75da53c523f36e363881e057f5dc3ed3c5b2",
            input={"prompt": sd_prompt}
        )
        if len(image_paths) == 0:
            # YIELD AN ActionError() TO STOP THE ACTION AND RETURN AN ERROR RESPONSE
            yield ActionError('There was an error generating the image')

        # DOWNLOAD THE IMAGE
        req_path = image_paths[0]
        file_extension = req_path.split('.')[-1]
        response = requests.get(req_path)
        response.raise_for_status()
        image_bytes = io.BytesIO(response.content)
        img = Image.open(image_bytes)
        img_path = tempfile.NamedTemporaryFile(suffix=f'.{file_extension}').name
        img.save(img_path)

        # ASK THE USER FOR CONFIRMATION TO OPEN THE IMAGE (FOR THE SAKE OF THIS EXAMPLE)
        # 1. ADD A NEW INPUT
        # 2. YIELD MissingInputs(), THIS IS EQUIVALENT TO `ActionResponse('[MI]')`
        open_image = self.inputs.add('do-you-want-to-open-the-image', fvalue=BoolFValue)
        yield MissingInputs()
        # EXECUTION WILL NOT RESUME UNTIL THE INPUT HAS BEEN DETECTED

        # OPEN THE IMAGE
        if open_image.value():
            if platform.system() == 'Darwin':  # MAC
                subprocess.Popen(['open', img_path], stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT)
            elif platform.system() == 'Windows':  # WINDOWS
                subprocess.Popen(['start', img_path], shell=True, stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT)
            else:  # LINUX
                subprocess.Popen(['xdg-open', img_path], stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT)

        # YIELD AN ActionSuccess() TO STOP THE ACTION AND RETURN A RESPONSE
        # PASS ANY OUTPUT VARIABLES IN PARENTHESES "()"
        yield ActionSuccess(f'[SAY] "The image has been successfully generated." (path = {img_path})')
```
Every action must contain the variables:
- `desc_prefix`: A prefix for the description, used when the Agent is detecting actions from the user's message (e.g. 'requires me to')
- `desc`: A description of what the action does (e.g. 'Get the current time')

Any action category (a `.py` file under `agentpilot/operations/actions`) can also contain these variables, but they are optional. If they aren't given, then by default the category will be formatted like this: `The user's request mentions something related to [category]`

Each action must contain a `run_action()` method. This is called when a Task decides to run the Action.
This method can be a generator, meaning `ActionResponses` can be yielded instead of returned, allowing the action logic to continue sequentially from where it left off (after each user message).
This method will not run unless all required inputs have been given. If there are missing inputs, the Agent will ask for them until the task decays.
This is useful for confirmation prompts, or to ask the user additional questions based on programmatic execution flow.
An action input can be declared with the following parameters (see the sketch after this list):
- `input_name`: A descriptive name for the input
- `required`: A Boolean representing whether the input is required before executing
- `time_based`: A Boolean representing whether the input is time based
- `hidden`: A Boolean representing whether the input is hidden and won't be asked for by the agent
- `default`: A default value for the input
- `examples`: A list of example values, unused but may be used in the future
- `fvalue`: Any `FValue` (Default: `TextFValue`)
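For illustration, here's how those parameters might be passed when declaring inputs (a hedged sketch; the action, input names and values are hypothetical, and only the parameters used in the example action above are confirmed usage):

```python
class Get_Weather(BaseAction):
    def __init__(self, agent):
        super().__init__(agent)
        self.desc_prefix = 'requires me to'
        self.desc = 'Get the weather for a location'
        self.inputs.add('location-to-get-weather-for')        # required text input
        self.inputs.add('unit-of-temperature',
                        required=False,                       # optional input
                        default='celsius',                    # used when not provided
                        examples=['celsius', 'fahrenheit'])   # currently unused
        self.inputs.add('is-this-a-recurring-request',
                        hidden=True,                          # never asked for by the agent
                        time_based=True,                      # marks the input as time based
                        fvalue=BoolFValue)                    # parsed as a boolean
```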
When an `ActionResponse` is yielded, it's injected into the main context to guide the agent's next response, unless the Action was created from within a ReAct context, in which case it is usually only used by the ReAct instance.
An `ActionResponse` can contain dialogue placeholders; by default these are available:

```
'[RES]'   = '[WOFA][ITSOC] very briefly respond to the user '
'[INF]'   = '[WOFA][ITSOC] very briefly inform the user '
'[ANS]'   = '[WOFA][ITSOC] very briefly respond to the user considering the following information: '
'[Q]'     = '[ITSOC] Ask the user the following question: '
'[SAY]'   = '[ITSOC] Say: '
'[MI]'    = '[ITSOC] Ask for the following information: '
'[WOFA]'  = 'Without offering any further assistance, '
'[ITSOC]' = 'In the style of {char_name}{verb}, spoken like a genuine dialogue, ' if self.voice_id else ''
'[3S]'    = 'Three sentences'
```
`ActionResponses` from within a ReAct class ignore all dialogue placeholders, so it's important to word the `ActionResponse` properly, for example:

ImageGen response = `f"[SAY] 'The image has been successfully generated.' (path = {img_path})"`

Notice how the dialogue placeholders are only used for instructions that relate to how the response is relayed to the user, and not for the actual response itself.
Also notice that the information in parentheses "( )" contains only output values.
The response is seen by the main context including the dialogue placeholders but not the output values, and is seen by a ReAct context including the output values but not the dialogue placeholders.
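As a rough illustration of that split (the function name and regexes are assumptions, not the project's actual code):

```python
import re

def split_action_response(response: str):
    """Separate dialogue-placeholder instructions from '(name = value)' outputs."""
    # Collect output values wrapped in parentheses, e.g. (path = /tmp/img.png)
    outputs = dict(
        (k.strip(), v.strip())
        for k, v in re.findall(r'\(([^()=]+)=([^()]+)\)', response)
    )
    # The main context sees the response without the output values
    for_main_context = re.sub(r'\s*\([^()=]+=[^()]+\)', '', response).strip()
    # A ReAct context sees the output values but not the dialogue placeholders
    for_react = re.sub(r'\[[A-Z0-9]+\]\s*', '', response)
    return for_main_context, for_react, outputs
```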
Actions can be categorized, allowing many more Actions to be available to the Agent while improving speed.
Categories and Actions are stored in the directory `agentpilot/operations/actions`.
New categories can be made by adding a new file to this directory; the Agent will use the filename as the category name, unless the file contains a `desc` variable.
Creating a new action is straightforward: simply add a new class that inherits `BaseAction` to any category file under the actions directory.
An action can be uncategorized by adding it to the `_Uncategorized.py` file. Categories that begin with an underscore will not be treated as a category, and the actions within these files will always be included in the decision.
Ensure the action makes sense in the context of the category it is being added to, or the Agent will likely have trouble finding it.
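For illustration, a new category file might look like this (the filename, `desc` text, action and helper are all hypothetical; `BaseAction` and `ActionSuccess` are project internals):

```python
# agentpilot/operations/actions/Calculator.py  (hypothetical category file)

# Optional category-level variables; without them the category defaults to
# "The user's request mentions something related to Calculator"
desc_prefix = 'requires me to'
desc = 'Calculate, compute or evaluate a mathematical expression'

class Evaluate_Math_Expression(BaseAction):
    def __init__(self, agent):
        super().__init__(agent)
        self.desc_prefix = 'requires me to'
        self.desc = 'Evaluate a mathematical expression'
        self.inputs.add('expression-to-evaluate')

    def run_action(self):
        expression = self.inputs.get('expression-to-evaluate').value
        result = eval_expression_safely(expression)  # hypothetical helper
        # Finish the action, passing the output value in parentheses
        yield ActionSuccess(f'[ANS] the expression was evaluated (result = {result})')
```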
A Task is created when one or more Actions are detected, and will remain active until it completes, fails or decays.
Actions can be detected by the following methods:
- Native - Native decision prompt that doesn't rely on function calling.
- Function Call - Function call from an LLM that supports it.
Hard-coded actions are searched and sorted based on semantic similarity to the request. A group of the most similar actions is then fed to one of the detection methods above, depending on the config setting `use-function-call`.
If the config setting `try-single-action = true`, a validation prompt is used to determine whether the single action is sufficient, and if not, ReAct is used (if enabled in the config). This validator can be disabled with the config setting `use-validator`.
If `try-single-action = false`, the validator is skipped, since the validator is only used to determine whether the single action is sufficient.
This default behaviour of not always using ReAct is faster for single actions, but introduces a problem where, for complex requests, it may fail to initiate a ReAct. This could be solved by fine-tuning a validator model. A rough sketch of this decision flow is shown below.
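The following sketch is illustrative only, under the settings described above; all the callables are injected stand-ins, not the project's actual functions:

```python
def decide(request, candidates, config, detect_fc, detect_native,
           validate, run_single, run_react):
    """Illustrative decision flow for action detection (all callables injected)."""
    # Detect via an LLM function call or the native decision prompt
    detect = detect_fc if config['use-function-call'] else detect_native
    chosen = detect(request, candidates)

    if config['try-single-action']:
        # A validation prompt checks whether the single action is sufficient
        if not config['use-validator'] or validate(request, chosen):
            return run_single(chosen)

    # Otherwise fall back to ReAct (assuming it's enabled in the config)
    return run_react(request, candidates)
```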
Explicit ReAct is used to separate the different instructions of a user request verbatim, so they can be executed independently. Implicit ReAct is a work in progress.
If ReAct fails to perform an action, then the request can be passed on to the code interpreter.
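For illustration, the 'explicit' splitting step might look like this (the prompt wording and the injected `llm_complete` helper are hypothetical):

```python
def split_instructions(request: str, llm_complete) -> list[str]:
    """Split a request into verbatim sub-instructions to execute independently."""
    prompt = ('Split the following request into its separate instructions, '
              f'verbatim, one per line:\n`{request}`')
    return [line.strip() for line in llm_complete(prompt).splitlines() if line.strip()]
```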
An action will not run until all required inputs have been given, and the parent task will decay if the inputs are not given within a certain number of messages (config setting `decay_at_idle_count`).
This is also true when actions are performed inside a ReAct; the ReAct will hang on the action until the input is given or the task decays.
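A minimal sketch of the decay idea (the class, field and method names are assumptions, not the project's actual implementation):

```python
class Task:
    def __init__(self, decay_at_idle_count: int):
        self.decay_at_idle_count = decay_at_idle_count
        self.idle_count = 0

    def on_user_message(self, provided_missing_input: bool) -> None:
        """Called for each user message while inputs are still missing."""
        if provided_missing_input:
            self.idle_count = 0       # progress was made, reset the counter
        else:
            self.idle_count += 1      # another message without the input

    def is_decayed(self) -> bool:
        return self.idle_count >= self.decay_at_idle_count
```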
The following action categories and actions are currently included:

- Web_Browser_and_Website
  - Open_Websites
  - Search_Site
- Audio_Playback
  - GetNameOfCurrentlyPlayingTrack
  - NextTrack
  - PauseMusic
  - PlayMusic
  - PreviousTrack
  - RepeatTrack
  - SearchPlayMusic
  - SwitchPlaybackToDesktop
  - SwitchPlaybackToSmartphone
  - ToggleShuffle
- Image_And_Video_Production
  - GenerateImage (Replicate API)
  - UpscaleImage (Replicate API)
- Desktop_Management
  - CloseWindow
  - MinimizeWindow
  - Set_Desktop_Background
- Desktop_Software_Apps
  - Open_Desktop_Software
- Email_OR_SMS_OR_Messaging
  - Send_SMS_Or_Text_Message (Twilio API)
- Clipboard_Operations
  - Copy_To_Clipboard
  - Cut_To_Clipboard
  - Paste_From_Clipboard
- RemindersAndEvents
  - ~~Set_Reminder~~
- Lists
  - Add_Item_To_List
  - Create_A_New_List
  - DeleteOrRemove_A_List
  - DeleteOrRemove_Item_From_List
  - ViewOrRead_Existing_List
- Files_and_Directories
  - DeleteFile
  - Open_Directory_Or_File
  - ~~UnzipFile~~
- _Uncategorised
  - Clear_Assistant_Context_Messages
  - Date
  - ~~Modify_Assistant_Responses~~
  - ~~Modify_Assistant_Voice~~
  - ~~MouseClick~~
  - Sync_Available_Voices
  - Time
  - Type_Text
User: "Generate an image of a cat and a dog and set it as my wallpaper"
Assistant: "Ok, give me a moment to generate the image"
Assistant: "Wallpaper set successfully"
User: "Generate an image of a cat and a dog"
Assistant: "Ok, give me a moment to generate the image"
Assistant: "Here is the image"
User: "Set it as my wallpaper"
Assistant: "Wallpaper set successfully"
User: "Generate an image"
Assistant: "Ok, what do you want me to generate?"
User: "A cat and a dog"
Assistant: "Ok, give me a moment to generate the image"
Assistant: "Here is the image"
User: "Set it as my wallpaper"
Assistant: "Wallpaper set successfully"
Some features are not yet implemented in the GUI even though the GUI has options for them, so while the GUI works for basic functionality, it is not stable.
Parts of this readme may be outdated or incorrect as the project is still in development.
Even though Agent Pilot doesn't support local models yet, the architecture supports them and isn't tied to OpenAI's architecture.
~~Each component of the Agent can be fine-tuned independently on top of the zero-shot instructions to improve the accuracy of the Agent.~~
~~Fine-tuning data can be found in utils/finetuning.~~
~~When eval mode is turned on with `-e`, prompts are saved to the 'valid' directory, and a popup will appear for each task with a "Wrong" button. When a task is marked as wrong, you will be asked to specify which prompts were wrong, and to provide the correct response.~~
~~To fine-tune a GPT 3.5 model with the data, use the command `-finetune`, or just ask the Agent to fine-tune itself.~~
~~You will be told how much it will cost to fine-tune a model, and asked to confirm the action.~~
~~Fine tuned model metadata is stored in the database, and each~~
Contributions to AgentPilot are welcome and appreciated. Please feel free to submit a pull request.
Context-specific settings coming with the group chat update:
- Participants
  - [All from Agent settings]