Faker - mock data for testing

Faker is a great Python package for generating mock test data. It’s quick and easy to set up and use, and incredibly flexible.

The first release was in 2010, with 444 releases since then - the latest (37.8.0) came out just today! There are 577 contributors listed.

Using it couldn’t be easier - all that’s needed is an instance of the Faker class:

from faker import Faker

fake = Faker()

This provides access to heaps of bundled providers (you can also create your own or download community providers).

Here are a couple of examples:

fake.name()

'Curtis Clark'

fake.address()

'Unit 7092 Box 7284\nDPO AP 52465'

You can check which providers are available:

for p in fake.providers:
    print(type(p).__module__)

faker.providers.user_agent
faker.providers.ssn.en_US
faker.providers.sbn
faker.providers.python
faker.providers.profile
faker.providers.phone_number.en_US
faker.providers.person.en_US
faker.providers.passport.en_US
faker.providers.misc.en_US
faker.providers.lorem.en_US
faker.providers.job.en_US
faker.providers.isbn.en_US
faker.providers.internet.en_US
faker.providers.geo.en_US
faker.providers.file
faker.providers.emoji
faker.providers.doi
faker.providers.date_time.en_US
faker.providers.currency.en_US
faker.providers.credit_card.en_US
faker.providers.company.en_US
faker.providers.color.en_US
faker.providers.barcode.en_US
faker.providers.bank.en_GB
faker.providers.automotive.en_US
faker.providers.address.en_US

Faker has locale-specific providers, which is awesome for generating test data relevant to wherever you are. It can even handle multiple locales.

To check which locale is being used (in this case the default en_US):

fake.locales

['en_US']

Not all locales have the same options, so you might see an error like this if you try to use something that isn’t available for that locale:

try:
    fake.area_code()
except Exception as e:
    print(type(e).__name__, e)

AttributeError 'Generator' object has no attribute 'area_code'

The unavailable options will still appear in IntelliSense - just something to be aware of.

I’m in New Zealand, so I’ll use the en_NZ locale.

area_code is available in the en_NZ phone number provider.

You can instantiate the Faker class using the locale you want to use:

fake_nz = Faker('en_NZ')

fake_nz.locales

['en_NZ']

fake_nz.area_code()

'22'

You can also use multiple locales:

fake_multi = Faker(['en_GB', 'es_ES'])

fake_multi.locales

['en_GB', 'es_ES']

This will produce a random mix:

for _ in range(10):
    print(fake_multi.name())

Severiano Hervás
Demetrio Meléndez Flores
Borja Giralt
Bethany Rowley
Mr Max Thomas
David Isern Morales
Roberta Guardia-Antón
Cándido Aguilar
Luisa Calderon Toro
Dr Pauline Wright

At this point I got a bit excited and wrote some code to create a dictionary with all the providers and options available for the instantiated Faker class - dependent on Faker version and locale(s).

from inspect import getfullargspec

from faker import Faker
from faker.exceptions import UnsupportedFeature
from faker.providers import BaseProvider

# new faker obj (default locale)
fake = Faker()

# get keys for BaseProvider class
base_keys = set(BaseProvider.__dict__.keys())

# empty dict for storing providers and options
options = {}

# iterate through all available providers
for prov in fake.providers:
    # get provider options that aren't inherited from BaseProvider
    prov_opts = set(dir(prov)) - base_keys
    # sort prov_opts and remove any private names (e.g. __dir__)
    prov_opts = sorted(x for x in prov_opts if not x.startswith('_'))
    # get provider name and add nested dict for options
    prov_name = type(prov).__module__
    options[prov_name] = {}
    # add provider and option names/values to dict
    for po in prov_opts:
        # empty dict for option info
        po_info = {
            'type': None,
            'docs': None,
            'example': None
        }
        # try to get actual object from name
        attr = getattr(prov, po)
        # if obj is a method, add to options
        if hasattr(attr, '__self__') and attr.__self__ is not None:
            arg_count = len(getfullargspec(attr).args or []) - 1
            default_count = len(getfullargspec(attr).defaults or [])
            # only include options that have no or default input args
            if arg_count == default_count:
                # try to get example data (exclude any that are too long)
                exclude = [
                    ('faker.providers.misc.en_US', 'binary'),
                    ('faker.providers.lorem.en_US', 'get_words_list'),
                    ('faker.providers.misc.en_US', 'tar'),
                    ('faker.providers.misc.en_US', 'zip')
                ]
                if (prov_name, po) in exclude:
                    po_info['example'] = 'No example (output too long)'
                else:
                    try:
                        result = attr()
                    except UnsupportedFeature:
                        # fails if a module can't be found
                        po_info['example'] = 'No example (module not found)'
                    else:
                        # don't include very long examples
                        if hasattr(result, '__len__') and len(result) > 1500:
                            po_info['example'] = 'No example (output too long)'
                        else:
                            po_info['example'] = result
                            po_info['type'] = type(result).__name__
            else:
                po_info['example'] = 'No example (method has non-default args)'  
            # add info to options dict
            po_info['docs'] = attr.__doc__
            options[prov_name][po] = po_info
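
The argument-count check in the loop is easier to see in isolation. Here's a sketch using a made-up `Demo` class (a hypothetical stand-in for a provider, nothing to do with Faker) and a helper name of my own, `callable_without_args`:

```python
from inspect import getfullargspec


class Demo:
    """Hypothetical stand-in for a provider; not part of Faker."""

    def no_args(self):
        return 1

    def all_defaults(self, digits=3, prefix='x'):
        return 2

    def needs_input(self, text, upper=False):
        return 3


def callable_without_args(method) -> bool:
    spec = getfullargspec(method)
    # for a bound method, args still includes 'self', hence the -1
    arg_count = len(spec.args) - 1
    default_count = len(spec.defaults or ())
    # equal counts mean every positional argument has a default
    return arg_count == default_count


demo = Demo()
results = {
    name: callable_without_args(getattr(demo, name))
    for name in ('no_args', 'all_defaults', 'needs_input')
}
print(results)
# {'no_args': True, 'all_defaults': True, 'needs_input': False}
```

One caveat: this only looks at positional arguments, so a method with a required keyword-only argument would still pass the check.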

The dictionary is pretty huge, so you can dump it to a text file (here’s my example):

import pprint

with open('faker-options.txt', 'w', encoding='utf8') as fh:
    pprint.pprint(options, fh, indent=4, width=85)
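
If you'd rather have something another tool can load back in, JSON is an option. This is only a sketch: `sample_options` is made-up data in the same shape as the real dictionary, and `default=str` turns anything JSON can't represent (dates, decimals, bytes) into a string, so the round trip is lossy:

```python
import json
from datetime import date
from decimal import Decimal

# made-up sample in the same shape as the options dict
sample_options = {
    'faker.providers.demo': {
        'some_date': {'type': 'date', 'docs': None, 'example': date(2020, 1, 2)},
        'some_decimal': {'type': 'Decimal', 'docs': None, 'example': Decimal('1.50')},
        'some_bytes': {'type': 'bytes', 'docs': None, 'example': b'\x00\x01'},
    }
}

# default=str is called for any value json can't serialise natively
text = json.dumps(sample_options, indent=4, default=str, ensure_ascii=False)

restored = json.loads(text)
print(restored['faker.providers.demo']['some_date']['example'])
# 2020-01-02
```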

And finally - this code converts the dictionary into a markdown file that’s a little easier to read. Check out my example.

from datetime import datetime

with open('faker-options.md', 'w', encoding='utf8') as fh:
    fh.write('# Faker Options\n\n')
    fh.write(f'__Generated__: {datetime.now().isoformat()}\n\n')
    fh.write(f'__Locales__: {str(fake.locales)}\n\n')
    for prov, opts in options.items():
        fh.write(f'## {str(prov)[16:]}\n\n')
        for opt, info in opts.items():
            fh.write(f'### {str(opt)}\n\n')
            # double quotes inside the f-strings keep this valid before Python 3.12
            fh.write(f'__Type__: `{info["type"]}`\n\n')
            fh.write(f'__Docs__:\n\n```\n{info["docs"]}\n```\n\n')
            example = pprint.pformat(info["example"], indent=4, width=85)
            fh.write(f'__Example__:\n\n```\n{example}\n```\n\n')

Banner image by Freepik