DOCUMENTATION
  • Welcome
  • Releases
    • 2025
      • Release v6.20.2 (2025-05-07)
      • Release v6.20.1 (2025-05-06)
      • Release v6.20.0 (2025-04-30)
      • Release v6.19.2 (2025-04-11)
      • Release v6.19.1 (2025-03-31)
      • Release v6.19.0 (2025-03-27)
      • Release v6.18.2 (2025-03-11)
      • Release v6.18.1 (2025-03-07)
      • Release v6.18.0 (2025-02-26)
      • Release v6.17.3 (2025-02-14)
      • Release v6.17.2 (2025-02-07)
      • Release v6.17.1 (2025-02-06)
      • Release v6.17.0 (2025-01-30)
    • 2024
      • Release v6.16.0 (2024-12-12)
      • Release v6.15.0 (2024-11-27)
      • Release v6.14.2 (2024-11-05)
      • Release v6.14.1 (2024-11-01)
      • Release v6.14.0 (2024-10-31)
      • Release v6.13.3 (2024-10-16)
      • Release v6.13.2 (2024-10-10)
      • Release v6.13.1 (2024-10-02)
      • Release v6.13.0 (2024-09-25)
      • Release v6.12.2 (2024-09-18)
      • Release v6.12.1 (2024-08-01)
      • Release v6.12.0 (2024-07-25)
      • Release v6.11.5 (2024-07-09)
      • Release v6.11.4 (2024-07-05)
      • Release v6.11.3 (2024-07-03)
      • Release v6.11.2 (2024-06-21)
      • Release v6.11.1 (2024-06-14)
      • Release v6.11.0 (2024-06-05)
      • Release v6.10.2 (2024-05-15)
      • Release v6.10.1 (2024-05-08)
      • Release v6.10.0 (2024-04-30)
      • Release v6.9.3 (2024-03-19)
      • Release v6.9.2 (2024-03-15)
      • Release v6.9.1 (2024-03-06)
      • Release v6.9.0 (2024-02-28)
      • Release v6.8.5 (2024-02-02)
      • Release v6.8.4 (2024-02-01)
      • Release v6.8.3 (2024-01-12)
      • Release v6.8.2 (2024-01-05)
    • 2023
      • Release v6.8.1 (2023-12-22)
      • Release v6.8.0 (2023-12-14)
      • Release v6.7.4 (2023-11-15)
      • Release v6.7.3 (2023-11-14)
      • Release v6.7.2 (2023-11-03)
      • Release v6.7.1 (2023-10-17)
      • Release v6.7.0 (2023-10-13)
      • Release v6.6.4 (2023-09-29)
      • Release v6.6.3 (2023-09-28)
      • Release 6.6.2 (2023-09-14)
      • Release v6.6.1 (2023-08-10)
      • Release v6.6.0 (2023-08-03)
      • Release v6.5.1 (2023-06-23)
      • Release v6.5.0 (2023-06-22)
      • Release v6.4.0 (2023-05-31)
      • Release v6.3.1 (2023-04-28)
      • Release v6.3.0 (2023-04-05)
      • Release v6.2.5 (2023-03-16)
      • Release v6.2.4 (2023-02-01)
      • Release v6.2.3 (2023-01-12)
      • Release v6.2.2 (2023-01-12)
      • Release v6.2.1 (2023-01-05)
    • 2022
      • fylr first Production Ready Release 🎉 (2022-12-22)
  • License
  • Help
    • FAQs
    • Tutorials
      • For Users
      • For Administrators
        • Exporting & Importing Hierarchical Lists
        • Regenerating preview images
        • Search Text in images or office files
      • For System Administrators
        • How to setup and use IIIF
        • External access: Sharing collections with anonymous users
    • Glossary
  • FOR USERS
    • Getting Started
    • Asset / Records Management
      • Creating Records
      • Editing Records
        • Input Fields
        • Group Editor
      • Deleting Records
    • Quick Access
      • Collections (& Presentations)
      • Saved Searches (& Lists)
    • Lists
    • Plugins
      • Plugin Overview
  • FOR ADMINISTRATORS
    • Permissions
      • User
      • Groups
      • Object Types
      • Pools
      • Tags & Workflows
      • Presets
    • Tools
      • CSV Importer
        • General Information
        • Options
        • Examples
          • All Data Types
          • Lists
          • Hierarchies
          • Files
      • JSON Importer
        • Step-by-Step Tutorial
          • Write Import Manifest
          • Create Basetype Payloads
          • Create Object Payloads
          • Collection Payloads
          • Optional: Update links between Objects
          • Start Import
      • Permissions Download & Upload
    • Base Configuration
      • General
      • Access
      • User Management
      • Languages
      • Email
      • Export & Deep Links
      • Workflow Webhooks
      • Publications
      • File Worker
        • Preview Configuration
        • Location Defaults
        • Custom .icc Color Profiles
      • Objectstore
      • Services
      • License Management
      • Development
      • Plugins
    • Plugin Manager
    • Location Manager
    • Messages
    • Events
    • Backup Manager
    • Additional Features
      • IIIF
      • Connector
      • Wordpress
      • Zooniverse
      • Protocols
        • OAI/PMH
  • FOR SYSTEM ADMINISTRATORS
    • Installation
      • Linux
        • multiple fylrs in one Linux
        • proxy and fylr
      • Windows
      • Kubernetes
    • Configuration
      • fylr.example.yml
      • fylr.default.yml
      • performance tuning
      • pre-load frontend config
      • Load Custom Plugins
      • HTTP and HTTPS
      • DNS Domains
    • Backups & Restore
    • Migration Tool
      • Create payloads (fylr backup)
      • Insert payloads (fylr restore)
      • Best Practice
      • Using the fylr inspect page
    • Integration
      • Authentication
      • Hotfolder
    • Symptom & Solution
      • Log messages that can be ignored
      • too many clients are connected
      • too many nested clauses
      • context canceled
      • ContainerConfig error
      • Purge objects
    • PostgreSQL versions
  • Tutorials
    • Project Workflow
    • Hotfolder & File System Connect
      • Preparations Before Usage
      • Setting Up An Upload Collection
      • Importing Files
    • PDF Creator
    • Extracting File Metadata Later On
    • Overlay Resource
    • Authentication
      • LDAP
      • SAML
    • Data Model Sync
    • Purge a fylr instance
    • typo3 plugin
    • Use fylr in Google docs via CI HUB
  • FOR DEVELOPERS
    • API
      • OAuth2
      • Endpoints
        • /api/collection
        • /api/config
        • /api/db_info
        • /api/db
        • /api/eas
        • /api/event
        • /api/export
        • /api/group
        • /api/l10n
        • /api/mask
        • /api/message
        • /api/oaipmh
        • /api/objects
        • /api/objecttype
        • /api/plugin
        • /api/pool
        • /api/publish
        • /api/right
        • /api/schema
        • /api/search
        • /api/settings
        • /api/suggest
        • /api/system
        • /api/tags
        • /api/transitions
        • /api/user
        • /api/webdav
        • /api/xmlmapping
        • /api/task
    • System Data Types
      • pool
      • file
      • user
      • group
      • pool
      • collection
      • message
      • publish
      • event
    • User Data Types
      • text, text_oneline
      • string
      • text_l10n, text_l10n_oneline
      • boolean
      • number
      • integer.2
      • double
      • date, datetime
      • daterange
      • geojson
    • Custom Data
    • Emails
    • Export
    • Exec server
    • File versions
    • WebDAV
    • Plugin
    • Collection Pin Code
    • easydb 5
    • Localization
    • Access private Repositories
Powered by GitBook
On this page
  • API
  • Index
  • Sorting
  • Export
  1. FOR DEVELOPERS
  2. User Data Types

text, text_oneline

PreviousUser Data TypesNextstring

Last updated 5 months ago

text and text_oneline store characters in one language. In the server, these types are treated the same, text_oneline is only a hint for frontends on how to display the data. So it is possible (over the API) to use newlines in a text_online column.

Text are stored as received, so a leading or trailing space is preserved. Frontends are requested to trim texts before sending them to the API.

Text must be encoded in and stores in all . There is not limit on the length of storable text.

API

Text looks like this when send and received over the API:

{
    "text": "this is the text"
}

The above example has the value this is the text for column text.

Index

Normalization is performed as part of the indexer documentation creation where all text is run through a icu_normalizer.

In the indexer, text is stored using a custom analyzer icu_text which works as follows:

{
  "analysis": {
   "icu_text": {
     "type": "custom",
     "tokenizer": "custom_icu_tokenizer",
     "filter": [
       "icu_folding"
     ],
     "char_filter": [
       "icu_normalizer"
     ]
  },
  "tokenizer": {
    "custom_icu_tokenizer": {
      "type": "pattern",
      "pattern": "([[^\\p{L}\\p{Digit}\\uD800-\\uDFFF\\u2F00-\\u2FDF]&&[^&%§\\$€]])"
    }
  }
}

What gets included in tokens:

  • All alphabetic letters from any language.

  • All numeric digits.

  • Characters in the Unicode surrogate pair range and Kangxi Radicals.

  • Symbols: &, %, §, $, and €.

What causes token separation:

  • Punctuation marks (except the specified symbols).

  • Whitespace characters.

  • Other symbols and control characters not specified.

Using the API, searches for text can be performed either on the analysed value (matching the terms), or on the unanalysed value, which is stored alongside with all terms. The unanalysed value stores the data as is. There is no normalisation taking place.

All text for indexed documents is split into chunks of 8000 UTF-8 characters. When matching full texts in analysed form, text cannot easily be matched if they exceed 8000 characters.

Sorting

Export

Text is exported as is, keeping spaces & normalisation.

The XML export looks like this for a column names title and a value Title for type text_oneline. The column-api-id in this example 29.

<title type="text_oneline" column-api-id="29">Title</title>

For type text type is text.

Output in CSV is as is, same for JSON.

Text is normalized using the and tokenized into tokens using the above pattern.

Tokens are then turned into terms using the token filter. The filter removes all punctuation and turns all characters into lower case. So the token Bär is stored as bar.

Sort strings are compiled using the Go library . It uses the first configured database language as assumption in what language the text is in. Numbers are recognised so that Car 100 sorts after Car 11. Text is normalised by the collate library. Internally we use the hex representation of that string to work around anomalies in Elasticsearch. Some special replacement is .

UTF-8
normalization forms
icu_normalizer
icu_folding
collate
always done