
Drupal: how to delete Fields that contain a lot of data

The problem:

In my Drupal production instance, I need to remove some Fields. They are no longer used, but they already hold a lot of data: their DB tables contain millions of records and occupy many GB. What is the best way to clean up my instance and reclaim that space?

The Solution:

The easiest thing to do is to go to Drupal > Administration Pages > Content Types > Manage Fields and remove the Field. I did it. The page started to load, but never finished.

What happens inside Drupal? Fortunately, nothing really dangerous, but it was clear to me that this wasn’t the best way to perform my task.

So I decided to understand better how Drupal purges large amounts of data.

Essentially, Drupal has two phases of deletion:

  1. When you remove a node, a field or something else (using the Administration Pages or Drupal core functions), Drupal only marks the element as “deleted”. In my case, if the name of the Field is “myfield”, the DB column “deleted” is set to 1 in the table field_data_field_myfield.
  2. A cron job (“Purges deleted Field API data”, by the Field module) has the responsibility to physically delete the records marked as deleted.
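
The two phases can be illustrated outside Drupal with a small, self-contained sketch. This is Python with an in-memory SQLite database, not Drupal code: the table name mirrors the example above, but the columns other than "deleted", the helper names (make_table, mark_deleted, purge_batch, count_rows) and the batch size are assumptions made only for illustration.

```python
import sqlite3


def make_table(conn, rows):
    """Create a toy stand-in for field_data_field_myfield with `rows` records."""
    conn.execute(
        "CREATE TABLE field_data_field_myfield ("
        "entity_id INTEGER PRIMARY KEY, "
        "deleted INTEGER NOT NULL DEFAULT 0, "
        "field_myfield_value TEXT)"
    )
    conn.executemany(
        "INSERT INTO field_data_field_myfield (entity_id, field_myfield_value) "
        "VALUES (?, ?)",
        [(i, f"value {i}") for i in range(1, rows + 1)],
    )
    conn.commit()


def mark_deleted(conn):
    """Phase 1: flag every record as deleted; nothing is removed yet."""
    cur = conn.execute("UPDATE field_data_field_myfield SET deleted = 1")
    conn.commit()
    return cur.rowcount


def purge_batch(conn, limit):
    """Phase 2: physically remove at most `limit` flagged records per call."""
    cur = conn.execute(
        "DELETE FROM field_data_field_myfield WHERE entity_id IN ("
        "SELECT entity_id FROM field_data_field_myfield "
        "WHERE deleted = 1 LIMIT ?)",
        (limit,),
    )
    conn.commit()
    return cur.rowcount


def count_rows(conn):
    """Return how many records are still physically present."""
    return conn.execute(
        "SELECT COUNT(*) FROM field_data_field_myfield"
    ).fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    make_table(conn, 25)
    mark_deleted(conn)  # the rows are flagged but still occupy space
    runs = 0
    while purge_batch(conn, 10) > 0:  # each call plays the role of one cron run
        runs += 1
    print(f"purged in {runs} runs, {count_rows(conn)} rows left")
```

The point of the sketch is that space is only reclaimed in the second phase, and that the number of purge runs grows with the number of records divided by the batch limit.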

It’s better to manage each of these phases yourself, in this way:

  1. The first phase is not particularly complicated, but if it has to mark a lot of records as deleted, it can take several seconds or even minutes. So it’s not a good idea to launch it from a web page. It’s better to create a cron job that removes the elements programmatically, and to tune it to your system’s performance (1000 elements per minute, or 10000 elements per minute, etc.).
  2. The cron job is more complicated, and there are several things to change or check:
    1. In modules/field/field.module you can find the hook_cron implementation. As you can see in the code, there is a $limit variable that is set to 10 by default. So this job, which runs every five minutes on my instance, handles only 10 ELEMENTS per run. If you have to delete millions of elements, it will take years to finish… I changed the value to 1000 or 10000. Note that in Drupal 7 the limit is read with variable_get('field_purge_batch_size', 10), so setting that variable should achieve the same result without editing a core file.
    2. The cron job uses the field_config and field_config_instance tables to decide what it has to delete. If the Field that you want to delete is not present in BOTH of these tables, the job ignores it, even if the Field’s data table contains elements marked as deleted. So, check these tables.
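
To see why the default limit is a problem, here is a rough back-of-the-envelope estimate as a short Python sketch. The function name purge_duration_days and the figure of 5 million records are assumptions chosen for illustration (the text only says "millions"); the five-minute cron interval and the limits of 10, 1000 and 10000 come from the description above.

```python
import math


def purge_duration_days(records, limit, cron_interval_minutes=5):
    """Estimate the days needed to purge `records` at `limit` records per cron run."""
    runs = math.ceil(records / limit)
    return runs * cron_interval_minutes / (24 * 60)


if __name__ == "__main__":
    # 5 million records is a hypothetical figure, used only as an example.
    for limit in (10, 1000, 10000):
        days = purge_duration_days(5_000_000, limit)
        print(f"limit={limit}: about {days:.1f} days")
```

With these assumed numbers, the default limit of 10 works out to roughly 1736 days (more than four years), while a limit of 10000 brings it down to under two days, provided each cron run can actually process that many elements within the interval.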

If you have any questions, I’m happy to answer them!