Digital Itineraries


These digitised itineraries of medieval rulers were made by scanning out-of-copyright texts.

The purpose of this project is to provide greater access to these out-of-copyright documents.

Question marks in the location data are taken verbatim from the sources.

This is the Python code that was used to scan the documents.

From https://nanonets.com/blog/ocr-with-tesseract/

    # For Windows users
    import pytesseract

    # Point pytesseract at the Tesseract executable
    pytesseract.pytesseract.tesseract_cmd = r'pathTo\Tesseract-OCR\tesseract'

    # Output in console
    print(pytesseract.image_to_string(r'pathTo\henry1(1132-1135).png'))
                
    # For Mac users
    import pytesseract

    # Output to console (macOS paths use forward slashes)
    print(pytesseract.image_to_string('pathTo/Henry1.png'))

     

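    # Write to file
    # Assumption: newFile holds the OCR text returned by image_to_string
    newFile = pytesseract.image_to_string(r'pathTo\henry1(1132-1135).png')

    # Replace curly apostrophes with plain ones
    newFileEscape = newFile.replace("’", "'")
    splitLines = newFileEscape.splitlines()  # lines of the text (not used below)

    # repr() writes the text as one Python string literal, with newlines shown as \n
    with open("image1.txt", "w", encoding="utf-8") as file:
        file.write(repr(newFileEscape))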
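
The script above computes splitLines but never uses it, and it writes the text with repr(), so the output file holds one long string literal. As a rough sketch only, and not the code this project actually ran, the same clean-up could instead produce one plain-text line per row. The function names clean_ocr_lines and write_lines and the sample string are hypothetical and used purely for illustration. Question marks are left untouched, in keeping with the note above that they are taken verbatim from the sources.

```python
def clean_ocr_lines(text):
    """Return the non-empty lines of OCR text with curly apostrophes normalised."""
    # Same apostrophe replacement as in the script above (U+2019 -> ').
    escaped = text.replace("\u2019", "'")
    # Trim surrounding whitespace and drop blank lines; question marks are kept.
    return [line.strip() for line in escaped.splitlines() if line.strip()]


def write_lines(lines, path):
    """Write one cleaned line per row as UTF-8 plain text."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")


if __name__ == "__main__":
    # Invented sample text, standing in for the output of image_to_string.
    sample = "King\u2019s court\n\n  Place name ?  \n"
    print(clean_ocr_lines(sample))
```

Writing plain lines rather than a repr() string keeps the file readable in a text editor and easy to load again line by line.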