FFaker is a great library. I have used it for many years in every Rails project I have worked on to produce test data for unit tests and fill the development and staging environments with production-like records. However, there are a few things that could be improved.
The problems
FFaker provides a very general set of values and combinations to make it easy to use in most projects. We have different types of values and translations for them. Yet there are cases where the library alone is not enough.
Lack of locked context
Imagine you have an Address record with city and country columns. You create test data using FFaker::Address.city and FFaker::Address.country and get a set of values like this:
#<Address city: "Berlin", country: "France">
You can probably accept this for unit tests, but it makes no sense when you present such data in the staging or local environment. There are more situations like this, for example, when the generated currency is not the one used in the generated country. We need a consistent context within the boundaries of a single record.
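The difference between the two approaches can be sketched in plain Ruby. This is only an illustration: the RECORDS list and the helper names are made up for this example and do not come from FFaker or any other library.

```ruby
# Hypothetical sample data: each row is one coherent record.
RECORDS = [
  { city: "Berlin", country: "Germany", currency: "EUR" },
  { city: "Paris",  country: "France",  currency: "EUR" },
  { city: "Tokyo",  country: "Japan",   currency: "JPY" }
].freeze

# Independent sampling: every attribute is drawn on its own,
# so nothing ties the city to the country or the currency.
def independent_address(rng)
  {
    city: RECORDS.sample(random: rng)[:city],
    country: RECORDS.sample(random: rng)[:country],
    currency: RECORDS.sample(random: rng)[:currency]
  }
end

# Context-locked sampling: draw one row and read every attribute from it.
def locked_address(rng)
  RECORDS.sample(random: rng).dup
end

# A generated address is consistent when it matches one of the known rows.
def consistent?(address)
  RECORDS.include?(address)
end

rng = Random.new(42)
p independent_address(rng) # may mix attributes from different rows
p locked_address(rng)      # always a single coherent row
```

Drawing each attribute separately is roughly what happens when two unrelated generator calls fill one record, which is why mismatched pairs show up.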
The limited set of values
The library supports a limited set of values we can generate for specific fields. That’s understandable, as there are probably unlimited variations of columns we can use in our application. Because of that, we might end up with a random string for less common columns or values that violate the validation rules we have in place.
The limited number of supported languages
Maintainers and various contributors did a great job adding more supported languages, but many possible combinations are still missing, even for already supported languages.
The solution
Let’s use LLMs like GPT, Gemini, or Claude Sonnet. They are large language models, which means they are very good at generating sample values for any type of column we can create, in any language we want.
To verify this idea, I created a library called Mokkku. The concept behind this gem is simple: you provide the model name for which you want to generate test data, and the gem collects the model’s columns and prepares the prompt. The LLM of your choice generates the YAML file with sample values. You get real-world values in the language of your choice and within the same context.
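To make the idea more concrete, here is a rough sketch of how a prompt could be assembled from a model name and its columns. The build_prompt helper and the prompt wording are hypothetical; the article does not show the prompt Mokkku actually sends.

```ruby
# Hypothetical helper, not Mokkku's real code: builds a prompt from
# a model name and a hash of column names and types.
def build_prompt(model_name, columns, count: 10, language: "English")
  column_list = columns.map { |name, type| "- #{name} (#{type})" }.join("\n")

  <<~PROMPT
    Generate #{count} realistic sample records for the #{model_name} model.
    Write the values in #{language}.
    All values within one record must be consistent with each other.
    Return only YAML: a list of maps with these keys:
    #{column_list}
  PROMPT
end

puts build_prompt("Address", { "city" => "string", "country" => "string" })
```

In a Rails application, the column hash could plausibly be built from Address.columns, which returns objects that respond to name and type.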
The test drive
Let’s start by adding the gem to the project:
bundle add mokkku
Currently, three LLMs are supported: GPT, Gemini, and Claude Sonnet. You can either provide the API key and benefit from full automation, or just get the prompt, paste it into a chat window, and then copy the YAML values.
Say you want to use GPT via the API to generate test data for the User and Address models:
mokkku --models=User,Address --api-key=value --llm-model=gpt
The gem will generate two YAML files with test data: spec/mocks/user.yml and spec/mocks/address.yml. You can now read the values the same way you would call attributes on the model:
Mokkku::Address.country # => "France"
Mokkku::Address.city # => "Paris"
By default, you are locked in the context, so each value is related to other attributes from the same model to simulate real-world scenarios. You can reset the context by calling Mokkku::Address.reset_context!
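One way to picture the locked context is a small class that keeps pointing at a single record until it is told to move on. The sketch below is an assumption made for illustration: the LockedSample class, the YAML layout, and the sequential wrap-around are not taken from Mokkku's source.

```ruby
require "yaml"

# Hypothetical stand-in for a generated class; not Mokkku's real code.
class LockedSample
  def initialize(records)
    @records = records
    @index = 0
  end

  # Read an attribute from the record that is currently locked.
  def fetch(attribute)
    @records.fetch(@index).fetch(attribute.to_s)
  end

  # Release the current record and lock the next one, wrapping around.
  def reset_context!
    @index = (@index + 1) % @records.size
  end
end

# Assumed YAML layout: a list of attribute maps, one per record.
yaml = <<~YAML
  - country: France
    city: Paris
  - country: Germany
    city: Berlin
YAML

address = LockedSample.new(YAML.safe_load(yaml))
address.fetch(:city)    # => "Paris"
address.fetch(:country) # => "France"
address.reset_context!
address.fetch(:city)    # => "Berlin"
```

Because every read goes through the same index, the city and country always come from the same record until reset_context! is called.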
Usage without the API key
You can still use the gem if you don’t have a paid account from OpenAI, Google, or Anthropic. Just call the command without providing the parameters for the LLM model and API key:
mokkku --models=User,Address
The gem will print the prompt so you can use it right away. Copy the generated YAML code and manually create the files in the spec/mocks directory.
Usage with FactoryBot
I often use FFaker with FactoryBot, so if you like this combination as well, you can use Mokkku in a similar way:
FactoryBot.define do
  factory :address do
    country { Mokkku::Address.country }
    city { Mokkku::Address.city }

    # Move on to the next set of values once the record is built
    after(:build) { |_| Mokkku::Address.reset_context! }
  end
end
Remember to reset the context after building so you can use the next set of values for the next record.
Other useful options
By default, 10 sets of values are created for each model in English, but you can easily change that with the --count and --language params.
The full explanation of the available configuration options is in the gem repository: https://github.com/webamm-org/mokkku