Integrate AI into GNOME
In recent years and months there have been various advancements in the field of AI. I believe it is important to integrate these advancements into GNOME, hence this whiteboard ticket. I have several suggestions for how this integration could be implemented, sorted by feasibility: beginning with what is possible today with open-source software and moving towards medium-term ideas.
1.) Various models have been published that enable very good speech recognition. There is also a KDE-based project called SpeechNote https://github.com/mkiol/dsnote, and I think it would be useful to integrate this kind of functionality into the GNOME system. We would need to clarify questions such as which model to use and how to communicate the capabilities and differences of the various models to users (very important: how much GPU memory does each one need?). Should we support models from different sources, such as DeepSpeech or Vosk? What about specialized models for specific domains, medicine for example? Many of the models integrated into SpeechNote also offer very good text-to-speech, which could be very important for accessibility.
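To make the "which model fits?" question concrete, here is a minimal sketch of how a settings dialog could pick the most capable model that fits the available GPU memory. The model names and memory figures are illustrative assumptions, not a vetted list:

```python
# Illustrative catalogue: (model name, GPU memory needed in MB).
# Entries are ordered from most to least capable; values are assumptions.
MODELS = [
    ("whisper-large", 10000),
    ("whisper-small", 2000),
    ("vosk-small", 0),  # CPU-only fallback, no GPU memory needed
]

def pick_model(available_gpu_mb: int) -> str:
    """Return the most capable model that fits in the given GPU memory."""
    for name, required_mb in MODELS:
        if required_mb <= available_gpu_mb:
            return name
    return MODELS[-1][0]  # always fall back to the CPU model
```

A dialog could run this once at setup time and still let the user override the choice, e.g. to select a domain-specific model.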
2.) Newer models can describe images and understand language very well. I believe this could improve our search. For example, images could be described by a model automatically in the background, which would make manual tagging by the user redundant. I have never tagged my images, partly because I did not have the time. Once the background model has described the local images, one could search for "beaches" and get all beach images.
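The background-indexing idea can be sketched in a few lines: a captioning model describes each image once, and search then matches against those descriptions. Here `describe_image` is a stand-in stub; a real implementation would run a local vision model:

```python
def describe_image(path: str) -> str:
    """Placeholder for a local image-captioning model."""
    stub_captions = {
        "IMG_001.jpg": "a sandy beach with palm trees at sunset",
        "IMG_002.jpg": "a city street in the rain",
    }
    return stub_captions.get(path, "")

def build_index(paths):
    """Run once in the background; maps each image path to its caption."""
    return {p: describe_image(p) for p in paths}

def search(index, query: str):
    """Return all images whose caption mentions the query term."""
    q = query.lower()
    return [p for p, caption in index.items() if q in caption.lower()]
```

With such an index, `search(index, "beach")` returns the beach photos without the user ever having tagged anything; the real version would also need to handle re-indexing when files change.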
3.) In the medium term, the question naturally arises as to how we deal with text commands. Language models are very good at understanding commands phrased in different ways, for example "Show me the map of such and such", "What is the way from A to B?", or "Show all pictures taken in Warsaw". All of this is possible today and could be executed with an appropriate language model in the search or a command bar. How do we design the GNOME search then? And how do we design the command structure?
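One way to frame the command-structure question: whatever interprets the text, the shell ultimately needs an (intent, parameters) pair to act on. A minimal sketch, with simple regular expressions standing in for the language model and with intent names that are purely illustrative:

```python
import re

# Each pattern maps a phrasing to an intent; a language model would replace
# this table and handle many more phrasings of the same intent.
PATTERNS = [
    (re.compile(r"show me the map of (?P<place>.+)", re.I), "open_map"),
    (re.compile(r"what is the way from (?P<a>.+) to (?P<b>.+)", re.I), "route"),
    (re.compile(r"show all pictures taken in (?P<place>.+)", re.I), "photo_search"),
]

def parse_command(text: str):
    """Return (intent, parameters) for a command, or None if nothing matched."""
    for pattern, intent in PATTERNS:
        m = pattern.search(text)
        if m:
            return intent, m.groupdict()
    return None
```

The design question then becomes which intents GNOME exposes and how applications register them, for example via the existing search-provider mechanism.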
4.) There is the idea of integrating a help window into the terminal that helps one find a command when one does not have it at hand, like ChatGPT or Phind offer today.
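As a rough sketch of what such a helper's backend could look like: the terminal sends a natural-language request and gets a command suggestion back. The lookup table here is a stub; a real implementation would query a locally running language model instead:

```python
# Stubbed suggestions; a real backend would generate these with a local model.
STUB_SUGGESTIONS = {
    "find large files": "du -ah . | sort -rh | head -n 20",
    "show listening ports": "ss -tulpn",
}

def suggest(request: str) -> str:
    """Return a shell command for a natural-language request, if known."""
    return STUB_SUGGESTIONS.get(request.lower().strip(), "")
```

Importantly, the helper should only ever *suggest* the command and let the user confirm before anything is executed.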
5.) A new type of interaction with computers has emerged: the chat window. This naturally raises data-protection questions, and there are certainly many people in open source who value that their model runs locally. The question therefore arises as to how a chat window could be integrated. These models can be executed offline and integrated into the GNOME system.
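The privacy-relevant part is the data flow: conversation history stays on the machine and is only ever passed to a locally running model. A minimal sketch of that backend, with `local_model` as a placeholder for a real offline LLM:

```python
def local_model(history):
    """Placeholder for a locally-run language model; echoes the last message."""
    return "You said: " + history[-1]["content"]

def chat_turn(history, user_text):
    """Append the user message, query the local model, record and return its reply.

    Nothing here leaves the machine: history is a plain local list.
    """
    history.append({"role": "user", "content": user_text})
    reply = local_model(history)
    history.append({"role": "assistant", "content": reply})
    return reply
```

A GNOME chat window would sit on top of such a loop; the UI question is separate from the question of where the model runs.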
6.) In the medium term, the question arises whether strictly defined user interfaces are still up to date, or whether user interfaces will become more fluid, since an AI model could generate them. Here too, there are already initial discussions and considerations as to how fluid a user interface should be. This is more of a philosophical question, but I think the design team should be aware of it.
I hope these suggestions are helpful and I look forward to further discussions and feedback.