ChatGoogleGenerativeAI
You can access Google's gemini and gemini-vision models, as well as other generative models, in LangChain through the ChatGoogleGenerativeAI class in the @langchain/google-genai integration package.
You can also access Google's gemini family of models via the LangChain VertexAI and VertexAI-web integrations.
Click here to read the docs.
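A minimal sketch of that Vertex AI route, assuming the @langchain/google-vertexai package is installed and Google Cloud credentials are already configured for your project:

import { ChatVertexAI } from "@langchain/google-vertexai";

// Sketch only: same gemini model family, served through Vertex AI instead
const vertexModel = new ChatVertexAI({
  model: "gemini-pro",
});

const vertexRes = await vertexModel.invoke("Hello from Vertex AI!");
console.log(vertexRes.content);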
Get an API key here: https://ai.google.dev/tutorials/setup
You'll first need to install the @langchain/google-genai package:
- npm: npm install @langchain/google-genai
- Yarn: yarn add @langchain/google-genai
- pnpm: pnpm add @langchain/google-genai
Usage
We're unifying model params across all packages. We now suggest using model instead of modelName, and apiKey for API keys.
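For example, a minimal sketch passing both params explicitly (the key is read here from the GOOGLE_API_KEY environment variable, which is also where the class looks if apiKey is omitted):

import { ChatGoogleGenerativeAI } from "@langchain/google-genai";

// Sketch only: `model` replaces the older `modelName`, and `apiKey` supplies the API key
const llm = new ChatGoogleGenerativeAI({
  model: "gemini-pro",
  apiKey: process.env.GOOGLE_API_KEY,
});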
import { ChatGoogleGenerativeAI } from "@langchain/google-genai";
import { HarmBlockThreshold, HarmCategory } from "@google/generative-ai";
/*
* Before running this, you should make sure you have created a
* Google Cloud Project that has `generativelanguage` API enabled.
*
* You will also need to generate an API key and set
* an environment variable GOOGLE_API_KEY
*
*/
// Text
const model = new ChatGoogleGenerativeAI({
model: "gemini-pro",
maxOutputTokens: 2048,
safetySettings: [
{
category: HarmCategory.HARM_CATEGORY_HARASSMENT,
threshold: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
},
],
});
// Batch and stream are also supported
const res = await model.invoke([
[
"human",
"What would be a good company name for a company that makes colorful socks?",
],
]);
console.log(res);
/*
AIMessage {
content: '1. Rainbow Soles\n' +
'2. Toe-tally Colorful\n' +
'3. Bright Sock Creations\n' +
'4. Hue Knew Socks\n' +
'5. The Happy Sock Factory\n' +
'6. Color Pop Hosiery\n' +
'7. Sock It to Me!\n' +
'8. Mismatched Masterpieces\n' +
'9. Threads of Joy\n' +
'10. Funky Feet Emporium\n' +
'11. Colorful Threads\n' +
'12. Sole Mates\n' +
'13. Colorful Soles\n' +
'14. Sock Appeal\n' +
'15. Happy Feet Unlimited\n' +
'16. The Sock Stop\n' +
'17. The Sock Drawer\n' +
'18. Sole-diers\n' +
'19. Footloose Footwear\n' +
'20. Step into Color',
name: 'model',
additional_kwargs: {}
}
*/
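As the comment above notes, batch and streaming calls are supported as well. A minimal sketch reusing the same model instance (prompts and logging are illustrative):

// Stream chunks as they are generated
const streamRes = await model.stream(
  "Write a one-line slogan for a colorful sock company."
);
for await (const chunk of streamRes) {
  console.log(chunk.content);
}

// Run several prompts as a batch
const batchRes = await model.batch([
  "Name one color.",
  "Name one fabric.",
]);
console.log(batchRes.map((m) => m.content));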
API Reference:
- ChatGoogleGenerativeAI from @langchain/google-genai
Tool calling
import { StructuredTool } from "@langchain/core/tools";
import { ChatGoogleGenerativeAI } from "@langchain/google-genai";
import { z } from "zod";
const model = new ChatGoogleGenerativeAI({
model: "gemini-pro",
});
// Define your tool
class FakeBrowserTool extends StructuredTool {
schema = z.object({
url: z.string(),
query: z.string().optional(),
});
name = "fake_browser_tool";
description =
"useful for when you need to find something on the web or summarize a webpage.";
async _call(_: z.infer<this["schema"]>): Promise<string> {
return "fake_browser_tool";
}
}
// Bind your tools to the model
const modelWithTools = model.bind({
tools: [new FakeBrowserTool()],
});
// Or, you can use `.bindTools` which works the same under the hood
// const modelWithTools = model.bindTools([new FakeBrowserTool()]);
const res = await modelWithTools.invoke([
[
"human",
"Search the web and tell me what the weather will be like tonight in new york. use a popular weather website",
],
]);
console.log(res.tool_calls);
/*
[
{
name: 'fake_browser_tool',
args: {
query: 'weather in new york',
url: 'https://www.google.com/search?q=weather+in+new+york'
}
}
]
*/
API Reference:
- StructuredTool from @langchain/core/tools
- ChatGoogleGenerativeAI from @langchain/google-genai
See the above run's LangSmith trace here
.withStructuredOutput
import { StructuredTool } from "@langchain/core/tools";
import { ChatGoogleGenerativeAI } from "@langchain/google-genai";
import { z } from "zod";
const model = new ChatGoogleGenerativeAI({
model: "gemini-pro",
});
// Define your tool
class FakeBrowserTool extends StructuredTool {
schema = z.object({
url: z.string(),
query: z.string().optional(),
});
name = "fake_browser_tool";
description =
"useful for when you need to find something on the web or summarize a webpage.";
async _call(_: z.infer<this["schema"]>): Promise<string> {
return "fake_browser_tool";
}
}
const tool = new FakeBrowserTool();
// Bind your tools to the model
const modelWithTools = model.withStructuredOutput(tool.schema, {
name: tool.name, // this is optional
});
// Optionally, you can pass just a Zod schema, or JSONified Zod schema
// const modelWithTools = model.withStructuredOutput(
// zodSchema,
// );
const res = await modelWithTools.invoke([
[
"human",
"Search the web and tell me what the weather will be like tonight in new york. use a popular weather website",
],
]);
console.log(res);
/*
{
url: 'https://www.accuweather.com/en/us/new-york-ny/10007/night-weather-forecast/349014',
query: 'weather tonight'
}
*/
API Reference:
- StructuredTool from @langchain/core/tools
- ChatGoogleGenerativeAI from @langchain/google-genai
See the above run's LangSmith trace here
Multimodal support
To provide an image, pass a human message with a content field set to an array of content objects, where each object contains either an image value (type of image_url) or a text value (type of text). The value of image_url must be a base64-encoded image (e.g., data:image/png;base64,abcd124):
import fs from "fs";
import { ChatGoogleGenerativeAI } from "@langchain/google-genai";
import { HumanMessage } from "@langchain/core/messages";
// Multi-modal
const vision = new ChatGoogleGenerativeAI({
model: "gemini-pro-vision",
maxOutputTokens: 2048,
});
const image = fs.readFileSync("./hotdog.jpg").toString("base64");
const input2 = [
new HumanMessage({
content: [
{
type: "text",
text: "Describe the following image.",
},
{
type: "image_url",
image_url: `data:image/png;base64,${image}`,
},
],
}),
];
const res2 = await vision.invoke(input2);
console.log(res2);
/*
AIMessage {
content: ' The image shows a hot dog in a bun. The hot dog is grilled and has a dark brown color. The bun is toasted and has a light brown color. The hot dog is in the center of the bun.',
name: 'model',
additional_kwargs: {}
}
*/
// Multi-modal streaming
const res3 = await vision.stream(input2);
for await (const chunk of res3) {
console.log(chunk);
}
/*
AIMessageChunk {
content: ' The image shows a hot dog in a bun. The hot dog is grilled and has grill marks on it. The bun is toasted and has a light golden',
name: 'model',
additional_kwargs: {}
}
AIMessageChunk {
content: ' brown color. The hot dog is in the center of the bun.',
name: 'model',
additional_kwargs: {}
}
*/
API Reference:
- ChatGoogleGenerativeAI from @langchain/google-genai
- HumanMessage from @langchain/core/messages
Gemini Prompting FAQs
As of the time this doc was written (2023/12/12), Gemini has some restrictions on the types and structure of prompts it accepts. Specifically:
- When providing multimodal (image) inputs, you are restricted to at most 1 message of "human" (user) type. You cannot pass multiple messages (though the single human message may have multiple content entries)
- System messages are not natively supported, and will be merged with the first human message if present.
- For regular chat conversations, messages must follow the human/ai/human/ai alternating pattern. You may not provide two AI or human messages in sequence (see the sketch after this list).
- Messages may be blocked if they violate the safety checks of the LLM. In this case, the model will return an empty response.
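For example, a minimal sketch of a conversation that satisfies these constraints (message contents are illustrative; the system message is merged into the first human message by the integration, as described above):

import { ChatGoogleGenerativeAI } from "@langchain/google-genai";
import {
  AIMessage,
  HumanMessage,
  SystemMessage,
} from "@langchain/core/messages";

const chat = new ChatGoogleGenerativeAI({ model: "gemini-pro" });

const history = [
  // Merged into the first human message, since system messages aren't natively supported
  new SystemMessage("You are a terse assistant."),
  new HumanMessage("What is the capital of France?"),
  new AIMessage("Paris."),
  // Human and AI messages must alternate; never two of the same type in a row
  new HumanMessage("And of Italy?"),
];

const reply = await chat.invoke(history);
console.log(reply.content);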