CLaRA: Cost-effective LLMs Function Calling Based on Vector Database

  • Miloslav Szczypka, Rankacy, Ltd.
  • Lukáš Jochymek, Rankacy, Ltd.
  • Alena Martinková, Rankacy, Ltd.
Keywords: Function Calls, Large Language Models, OpenAI, GPT, Vector Database

Abstract

Since their introduction, function calls have become a widely used feature of the OpenAI API ecosystem. They reliably connect GPT's capabilities to external tools and APIs, and they have quickly found their way into other LLMs as well. One challenge is the number of tokens consumed per request, since the definition of every function is always sent as part of the input. We propose a simple solution that effectively decreases the number of tokens by sending only the functions relevant to the user's question. The solution stores the function definitions in a vector database and uses a similarity score to pick only the functions that need to be sent. Our benchmarks show that, compared to the default function call, our solution decreases average prompt token consumption by 210% and the average prompt (input) price by 244%. Our solution is not limited to a specific LLM: it can be integrated with any LLM that supports function calls, making it a versatile tool for reducing token consumption. This means that even cheaper models with a large number of functions can benefit from our solution.
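The retrieval step described in the abstract can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the bag-of-words embedding and the example function schemas are hypothetical stand-ins for a real embedding model and vector database, chosen only to keep the example self-contained.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words term counts.

    A real deployment would use a learned text-embedding model and
    persist the vectors in a vector database.
    """
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical function schemas in the OpenAI function-calling style.
FUNCTIONS = [
    {"name": "get_weather",
     "description": "Get the current weather for a city"},
    {"name": "convert_currency",
     "description": "Convert an amount between two currencies"},
    {"name": "book_flight",
     "description": "Book a flight between two airports"},
]

# Index the schemas once; only this step embeds every function.
INDEX = [(f, embed(f["description"])) for f in FUNCTIONS]

def select_functions(question: str, k: int = 1) -> list:
    """Return only the k schemas most similar to the user question."""
    q = embed(question)
    ranked = sorted(INDEX, key=lambda item: cosine(q, item[1]), reverse=True)
    return [f for f, _ in ranked[:k]]

# Only this subset would be sent with the request, instead of all
# schemas, which is what shrinks the prompt.
tools = select_functions("What is the weather in Brno?")
```

The key design point is that the full function catalogue is embedded once at indexing time, while each request pays only for the top-k retrieved schemas, so the prompt size stays roughly constant as the catalogue grows.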

References

Function calling and other API updates, available at: https://openai.com/index/function-calling-and-other-api-updates/, 2023.

Pricing, available at: https://openai.com/api/pricing/, 2024.

Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., and Amodei, D. Language models are few-shot learners, 2020.

Cao, H. Recent advances in text embedding: A comprehensive review of top-performing methods on the mteb benchmark, 2024.

Chien, A. A., Lin, L., Nguyen, H., Rao, V., Sharma, T., and Wijayawardana, R. Reducing the carbon impact of generative AI inference (today and in 2035). In Proceedings of the 2nd Workshop on Sustainable Computer Systems (2023), pp. 1–7.

Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., and Harshman, R. Indexing by latent semantic analysis. Journal of the American Society for Information Science 41, 6 (1990), 391–407.

Gionis, A., Indyk, P., and Motwani, R. Similarity search in high dimensions via hashing. In Proceedings of the 25th International Conference on Very Large Data Bases (VLDB '99) (1999), pp. 518–529.

Malkov, Y. A., and Yashunin, D. A. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence 42, 4 (2018), 824–836.

Mikolov, T., Chen, K., Corrado, G., and Dean, J. Efficient estimation of word representations in vector space, 2013.

OpenAI. Gpt-4 technical report, 2023.

Pan, J. J., Wang, J., and Li, G. Vector database management techniques and systems. In SIGMOD Conference Companion (2024), pp. 597–604.

Pennington, J., Socher, R., and Manning, C. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (2014), pp. 1532–1543.

Procopiuc, O., Agarwal, P. K., Arge, L., and Vitter, J. S. Bkd-tree: A dynamic scalable kd-tree. In Advances in Spatial and Temporal Databases: 8th International Symposium, SSTD 2003, Santorini Island, Greece, July 2003. Proceedings 8 (2003), Springer, pp. 46–65.

Samsi, S., Zhao, D., McDonald, J., Li, B., Michaleas, A., Jones, M., Bergeron, W., Kepner, J., Tiwari, D., and Gadepally, V. From words to watts: Benchmarking the energy costs of large language model inference. In 2023 IEEE High Performance Extreme Computing Conference (HPEC) (2023), IEEE, pp. 1–9.

Stata, R., Bharat, K., and Maghoul, F. The term vector database: fast access to indexing terms for web pages. Computer Networks 33, 1-6 (2000), 247–255.

Venables, W. N., Ripley, B. D., Venables, W., and Ripley, B. Tree-based methods. Modern applied statistics with S-Plus (1999), 303–327.

Wang, L., Yang, N., Huang, X., Yang, L., Majumder, R., and Wei, F. Improving text embeddings with large language models, 2024.

Xia, P., Zhang, L., and Li, F. Learning similarity with cosine similarity ensemble. Information Sciences 307 (2015), 39–52.

Published
2024-06-30
How to Cite
Szczypka, M., Jochymek, L. and Martinková, A. 2024. CLaRA: Cost-effective LLMs Function Calling Based on Vector Database. MENDEL. 30, 1 (Jun. 2024), 43-50. DOI:https://doi.org/10.13164/mendel.2024.1.043.
Section
Research articles