Keep in mind a fellow researcher: someone who knows your area of expertise, but not necessarily as well as you do. Are your files understandable as they are, or do they need cleaning or enhancement? Keep the right balance: make sure the metadata is sufficient for reuse, but do not get carried away and spend valuable time on minor issues. Enough is enough.


Start creating a codebook for your spreadsheets, databases, etc.

For spreadsheets, databases, and other complex data files (for instance SPSS), you need to create a codebook. The codebook adds the information that makes your data understandable and usable to people other than yourself. You can start creating codebooks well in advance of archiving, but preferably do so once you have finalized your data. That way you avoid having to update the codebook every time you change the structure of the data file.

Codebooks should be of the following format:

  • For MS Access or MS Excel files, for each table/worksheet in the file, add a table named “codebook_[table name]”.
  • For SPSS, use the codebook function.
  • For other programs, create a separate MS Excel file called “[database file name]_codebook.xlsx”.

Codebooks should contain the following information:

  • Name of the variable
  • Its description
  • List of possible values, if applicable (only when it is a class variable)
  • The scale used, if applicable (only if it is a numeric variable and there is a particular scale)
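
The four codebook fields listed above can be drafted automatically and then completed by hand. The following is a minimal sketch in Python, not a prescribed method. It assumes the data can be exported to CSV, and it writes the codebook as CSV rather than as an MS Excel file in order to rely on the standard library only. The function names, the sample data, and the threshold of ten distinct values for recognizing a class variable are all hypothetical choices made for illustration.

```python
import csv
import io


def _is_number(text):
    """Return True if the cell text can be read as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def build_codebook(rows, descriptions, scales=None, max_classes=10):
    """Draft one codebook entry per variable (column) found in `rows`.

    `descriptions` and `scales` are supplied by the researcher; the
    `max_classes` threshold is an illustrative assumption.
    """
    scales = scales or {}
    codebook = []
    for name in rows[0].keys():
        values = sorted({row[name] for row in rows if row[name] != ""})
        is_numeric = bool(values) and all(_is_number(v) for v in values)
        # Treat a non-numeric column with few distinct values as a class variable.
        is_class = not is_numeric and len(values) <= max_classes
        codebook.append({
            "variable": name,
            "description": descriptions.get(name, ""),
            "possible_values": "; ".join(values) if is_class else "",
            "scale": scales.get(name, "") if is_numeric else "",
        })
    return codebook


def codebook_to_csv(codebook):
    """Serialize the codebook entries to CSV text."""
    buffer = io.StringIO()
    fields = ["variable", "description", "possible_values", "scale"]
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    writer.writerows(codebook)
    return buffer.getvalue()


# Hypothetical sample data standing in for an exported worksheet.
SAMPLE = """site,soil_type,ph
A,clay,6.5
B,sand,7.1
C,clay,5.9
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
codebook = build_codebook(
    rows,
    descriptions={
        "site": "Sampling site identifier",
        "soil_type": "Dominant soil type at the site",
        "ph": "Soil acidity measured in the field",
    },
    scales={"ph": "pH scale, 0-14"},
)
print(codebook_to_csv(codebook))
```

The generated draft still needs a manual check: descriptions and scales come from the researcher, and the automatic detection of class variables is only a heuristic.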

Intellectual property

It is important to determine whether the data you reuse is subject to intellectual property rights. Generally, research data are free from intellectual property restrictions. However, there are many exceptions, and the actual situation is not always clear. Where such rights apply, you must ask the rightsholder for permission to reuse the dataset.

If you obtained data from another party, first check whether a license states what is allowed. If there is no explicit license and you intend to publish the data or share them outside the research project, you have to check copyright law and database law.

Legislation is not always straightforward, and copyright law and database law can conflict. The applicable law is that of the country where the data was created, so the rules differ between countries. In the following cases, however, you will probably need to ask for permission:

  • The data possesses its own original character and bears the personal stamp of the author (e.g. creative wording or drawing, subjective choices in processing, specific formatting) AND the reuse does not take the form of a “citation”.
    • Be especially cautious when you reuse photographic material. Photographs are usually subject to copyright because of the artistic choices involved. Take particular care with photographs that come from heritage institutions, even where no specific license is stated or where you took the photographs yourself.
  • The data is a collection of independent materials that have been arranged systematically AND there has been a “substantial investment” in obtaining, verifying, or presenting the materials AND you reuse a “substantial portion” of that data.

Finally, to stress the obvious: whatever the legal situation, it is scholarly best practice to acknowledge and cite the source of your data.

