We need a modern replacement for FFaker

Welcome Mokkku

FFaker is a great library. I have used it for many years in every Rails project I have worked on to produce test data for unit tests and fill the development and staging environments with production-like records. However, there are a few things that could be improved.

The problems

FFaker provides a general set of values and combinations, which makes it easy to use in most projects. It covers different types of values and offers translations for them. Yet there are cases where this is not enough.

Lack of locked context

Imagine you have the Address record with the city and country columns. You create test data using FFaker::Address.city and FFaker::Address.country and get a set of values similar to this:

#<Address city: "Berlin", country: "France">

You can probably accept this for unit tests, but it makes no sense when you present such data in the staging or local environment. There are more situations like this, for example, when you get a currency that is not used in the given country. We need a consistent context within the boundaries of a single record.
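To make the difference concrete, here is a minimal plain-Ruby sketch. The PLACES list and both helper methods are made up for illustration; they are not part of FFaker or Mokkku. The first helper draws each attribute on its own, the second one picks a whole record first.

```ruby
# Hypothetical sample data, used only for this illustration.
PLACES = [
  { city: "Berlin", country: "Germany", currency: "EUR" },
  { city: "Paris",  country: "France",  currency: "EUR" },
  { city: "Warsaw", country: "Poland",  currency: "PLN" }
].freeze

# Independent sampling: every attribute is drawn separately,
# so the city and the country may not belong together.
def independent_address(rng = Random.new)
  {
    city: PLACES.sample(random: rng)[:city],
    country: PLACES.sample(random: rng)[:country]
  }
end

# Record-level sampling: pick one record and read every attribute from it,
# so the values always stay within the same context.
def consistent_address(rng = Random.new)
  place = PLACES.sample(random: rng)
  { city: place[:city], country: place[:country] }
end

p independent_address # the city and country may not match
p consistent_address  # the city and country always match
```

With only three records, the independent version already mixes cities and countries in most draws.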

The limited set of values

The library supports a limited set of values we can generate for specific fields. That’s understandable, as there are probably unlimited variations of columns we can use in our application. Because of that, we might end up with a random string for less common columns or values that violate the validation rules we have in place.

The limited number of supported languages

Maintainers and various contributors did a great job adding more supported languages, but many possible combinations are still missing, even for the languages that are already supported.

The solution

Let’s use LLMs like GPT, Gemini, or Claude Sonnet. They are large language models, which means they are very good at generating sample values for any type of column we can create, in any language we want.

To verify this idea, I created a library called Mokkku. The concept behind this gem is simple: you provide the model name for which you want to generate test data, and the gem collects the model’s columns and prepares the prompt. The LLM of your choice generates the YAML file with sample values. You get real-world values in the language of your choice and within the same context.
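The prompt-building step could look roughly like the sketch below. It is only an illustration of the idea, not the code Mokkku actually uses: build_prompt and the wording of the prompt are my assumptions. In a Rails app, the column list could come from ActiveRecord, for example Address.column_names.

```ruby
# Hypothetical helper: turns a model name and its column names into a prompt.
# The prompt wording is an assumption, not the one used by the gem.
def build_prompt(model_name, columns, count: 10, language: "English")
  <<~PROMPT
    Generate #{count} realistic sample records for the "#{model_name}" model.
    Columns: #{columns.join(', ')}.
    Use the #{language} language and keep the values within each record consistent with each other.
    Return only YAML: a list of mappings with the column names as keys.
  PROMPT
end

# The column names are passed in by hand here, so the sketch runs without Rails.
puts build_prompt("Address", %w[city country])
```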

The test drive

Let’s start by adding the gem to the project:

bundle add mokkku

Currently, three LLMs are supported: GPT, Gemini, and Claude Sonnet. You can either provide the API key and benefit from full automation or just get the prompt, paste it into the chat window, and then copy the YAML values.

Say you want to use GPT via the API to generate test data for the User and Address models:

mokkku --models=User,Address --api-key=value --llm-model=gpt

The gem will generate two YAML files with test data: spec/mocks/user.yml and spec/mocks/address.yml. You can now get the values the same way you would call attributes on the model:

Mokkku::Address.country # => "France"
Mokkku::Address.city # => "Paris"

By default, you are locked in the context, so each value is related to other attributes from the same model to simulate real-world scenarios. You can reset the context by calling Mokkku::Address.reset_context!
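If you wonder how such a locked context can work, here is a minimal sketch. It is not the gem's implementation: the SampleAddress class and the YAML layout are my assumptions, chosen only to show the idea of picking one record and reading every attribute from it until the context is reset.

```ruby
require "yaml"

# Hypothetical stand-in for a generated file such as spec/mocks/address.yml.
# The layout of the real file may differ.
ADDRESS_YAML = <<~YML
  - city: Paris
    country: France
  - city: Berlin
    country: Germany
YML

class SampleAddress
  RECORDS = YAML.safe_load(ADDRESS_YAML).freeze

  class << self
    def city
      current["city"]
    end

    def country
      current["country"]
    end

    # Drop the locked record so the next attribute call picks a new one.
    def reset_context!
      @current = nil
    end

    private

    # Pick a record once and keep returning it until reset_context! is called.
    def current
      @current ||= RECORDS.sample
    end
  end
end

puts "#{SampleAddress.city}, #{SampleAddress.country}" # values from the same record
SampleAddress.reset_context!
```

Memoizing the picked record is what keeps the city and the country together; resetting simply clears it.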

Usage without the API key

You can still use the gem if you don’t have a paid account with OpenAI, Google, or Anthropic. Just call the command without providing the parameters for the LLM model and API key:

mokkku --models=User,Address

The gem will print the prompt so you can use it right away. Paste it into the chat window of your chosen LLM, copy the generated YAML, and manually create the files in the spec/mocks directory.

Usage with FactoryBot

I often use FFaker with FactoryBot, so if you like this combination as well, you can use Mokkku in a similar way:

FactoryBot.define do
  factory :address do
    country { Mokkku::Address.country }
    city { Mokkku::Address.city }

    # Reset the locked context so the next record gets a new set of values
    after(:build) { |_| Mokkku::Address.reset_context! }
  end
end

Remember to reset the context after building so you can use the next set of values for the next record.

Other useful options

By default, 10 sets of test values are created for each model in English, but you can easily change that with the --count and --language params.

The full explanation of possible configuration options is available in the gem repository: https://github.com/webamm-org/mokkku