diff options
author | Mavridis Philippe <mavridisf@gmail.com> | 2022-02-05 17:44:26 +0000 |
---|---|---|
committer | Slávek Banko <slavek.banko@axis.cz> | 2022-04-17 18:35:01 +0200 |
commit | 95279fbf6dfeb43d80590740a9259d7caa614177 (patch) | |
tree | 8c3ff4de77102f1e55357dd81d650e9f22be69a0 /tdemarkdown | |
parent | a291f3a0a3fab073c009f77a36745c5c7bd48e9b (diff) | |
download | tdelibs-95279fbf6dfeb43d80590740a9259d7caa614177.tar.gz tdelibs-95279fbf6dfeb43d80590740a9259d7caa614177.zip |
Add tdemarkdown part - embeddable lightweight markdown viewing component.
TDEMarkdown is based on the md4c library and using TDEHTML for rendering
its output. For enhanced safety, on HTML widget is turned off everything
we don't need for viewing. It integrates nicely into Konqueror and
supports both Commonmark and GitHub markdown syntaxes.
Signed-off-by: Mavridis Philippe <mavridisf@gmail.com>
Prepare to merge tdemarkdown into tdelibs.
Signed-off-by: Slávek Banko <slavek.banko@axis.cz>
Diffstat (limited to 'tdemarkdown')
54 files changed, 29997 insertions, 0 deletions
diff --git a/tdemarkdown/CMakeL10n.txt b/tdemarkdown/CMakeL10n.txt new file mode 100644 index 000000000..89b0e07f9 --- /dev/null +++ b/tdemarkdown/CMakeL10n.txt @@ -0,0 +1,8 @@ +##### create translation templates ############## + +tde_l10n_create_template( + CATALOG "desktop_files/tdemarkdown-desktops/" + SOURCES *.desktop + EXCLUDES "^md4c/" + DESTINATION "${CMAKE_SOURCE_DIR}/translations" +) diff --git a/tdemarkdown/CMakeLists.txt b/tdemarkdown/CMakeLists.txt new file mode 100644 index 000000000..38df3b6e0 --- /dev/null +++ b/tdemarkdown/CMakeLists.txt @@ -0,0 +1,67 @@ +########################################### +# # +# Copyright (C) 2022 Mavridis Philippe # +# <mavridisf@gmail.com> # +# # +# Improvements and feedback are welcome # +# # +# Released under GNU GPL v2 or greater. # +# # +########################################### + +### Header and library directories ######## +# +include_directories( + ${CMAKE_SOURCE_DIR} + ${CMAKE_BINARY_DIR} + ${CMAKE_SOURCE_DIR}/tdecore + ${CMAKE_BINARY_DIR}/tdecore + ${CMAKE_SOURCE_DIR}/tdehtml + ${CMAKE_SOURCE_DIR}/tdeui + ${CMAKE_SOURCE_DIR}/tdeutils + ${CMAKE_CURRENT_SOURCE_DIR} + ${CMAKE_CURRENT_BINARY_DIR} + ${CMAKE_CURRENT_SOURCE_DIR}/md4c/src +) + +include_directories( SYSTEM + ${TQT_INCLUDE_DIRS} + ${TDE_INCLUDE_DIR} +) + +link_directories( + ${TQT_LIBRARY_DIRS} + ${TDE_LIB_DIR} +) + + +### Build libtdemarkdown (kpart) ########### +# +tde_add_kpart( libtdemarkdown + AUTOMOC + SOURCES + ${CMAKE_CURRENT_SOURCE_DIR}/md4c/src/entity.c + ${CMAKE_CURRENT_SOURCE_DIR}/md4c/src/md4c.c + ${CMAKE_CURRENT_SOURCE_DIR}/md4c/src/md4c-html.c + markdown_part.cpp + LINK + tdeparts-shared tdehtml-shared + DESTINATION ${PLUGIN_INSTALL_DIR} +) + + +### Install part .desktop entry ############# +# +tde_create_translated_desktop( + SOURCE markdown_part.desktop + DESTINATION ${SERVICES_INSTALL_DIR} + PO_DIR tdemarkdown-desktops +) + + +### Install XML-GUI ######################### +# +install( + FILES markdown_part.rc + DESTINATION ${DATA_INSTALL_DIR}/tdemarkdown +) diff --git a/tdemarkdown/LICENSE b/tdemarkdown/LICENSE new file mode 100644 index 000000000..17cb28643 --- /dev/null +++ b/tdemarkdown/LICENSE @@ -0,0 +1,117 @@ +GNU GENERAL PUBLIC LICENSE +Version 2, June 1991 + +Copyright (C) 1989, 1991 Free Software Foundation, Inc. +51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA + +Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. + +Preamble + +The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Lesser General Public License instead.) You can apply it to your programs, too. + +When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. + +To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. + +For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. + +We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. + +Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. + +Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. + +The precise terms and conditions for copying, distribution and modification follow. + +TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION + +0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". + +Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. + +1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. + +You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. + +2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: + + a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. + + b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. + + c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) + +These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. + +Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. + +In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. + +3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: + + a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, + + b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, + + c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) + +The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. + +If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. + +4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. + +5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. + +6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. + +7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. + +If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. + +It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. + +This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. + +8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. + +9. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. + +Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. + +10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. + +NO WARRANTY + +11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. + +12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. + +END OF TERMS AND CONDITIONS + +How to Apply These Terms to Your New Programs + +If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. + +To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. + + one line to give the program's name and an idea of what it does. Copyright (C) yyyy name of author + + This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. + + This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. + + You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. Also add information on how to contact you by electronic and paper mail. + +If the program is interactive, make it output a short notice like this when it starts in an interactive mode: + + Gnomovision version 69, Copyright (C) year name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. + +The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program. + +You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names: + + Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (which makes passes at compilers) written by James Hacker. + +signature of Ty Coon, 1 April 1989 Ty Coon, President of Vice diff --git a/tdemarkdown/README.md b/tdemarkdown/README.md new file mode 100644 index 000000000..0b6e811bf --- /dev/null +++ b/tdemarkdown/README.md @@ -0,0 +1,14 @@ +# TDEMarkdown + +TDEMarkdown is a lightweight markdown viewer for Trinity based on the [md4c](https://www.github.com/mity/md4c) library and using TDEHTML for rendering its output. + +It integrates nicely into Konqueror and supports both Commonmark and GitHub markdown syntaxes. + + +## Contributing + +If you wish to contribute to TDEMarkdown: + +- You can **report a bug, request a feature or contribute code** via the [TDE Gitea Workspace (TGW)](https://mirror.git.trinitydesktop.org/gitea) + +- You can **contribute translations** via the [TDE Weblate Translation Workspace (TWTW)](https://mirror.git.trinitydesktop.org/weblate) diff --git a/tdemarkdown/markdown_part.cpp b/tdemarkdown/markdown_part.cpp new file mode 100644 index 000000000..d4b7fb810 --- /dev/null +++ b/tdemarkdown/markdown_part.cpp @@ -0,0 +1,133 @@ +/*************************************************************************** + * Markdown Viewer part * + * Copyright (c) 2022 Mavridis Philippe <mavridisf@gmail.com> * + * * + * This program is free software; you can redistribute it and/or modify * + * it under the terms of the GNU General Public License as published by * + * the Free Software Foundation; either version 2 of the License, or * + * (at your option) any later version. * + ***************************************************************************/ + + +#include <tqbuffer.h> +#include <tqfile.h> + +#include <tdeparts/genericfactory.h> +#include <kstandarddirs.h> + +#include <tdehtmlview.h> + +/* MD4C-HTML */ +#include <md4c-html.h> + +#include "markdown_part.h" + + +typedef KParts::GenericFactory<MarkdownPart> Factory; +K_EXPORT_COMPONENT_FACTORY(libtdemarkdown, Factory) + +MarkdownPart::MarkdownPart(TQWidget* parentWidget, const char* widgetName, + TQObject* parent, const char* name, const TQStringList& args) + : TDEHTMLPart(parentWidget, name = "TDEMarkdown") +{ + setInstance(Factory::instance()); + + /* Features */ + setJScriptEnabled(false); + setJavaEnabled(false); + setMetaRefreshEnabled(false); + setPluginsEnabled(false); + setAutoloadImages(true); + setXMLFile( locate("data", "libtdemarkdown/markdown_part.rc") ); +} + +MarkdownPart::~MarkdownPart() +{ +} + +TDEAboutData* MarkdownPart::createAboutData() +{ + TDEAboutData* aboutData = new TDEAboutData( + "tdemarkdown", I18N_NOOP("TDE Markdown Viewer"), "1.0", + I18N_NOOP("TDEMarkdown is an embeddable viewer for Markdown documents."), + TDEAboutData::License_GPL_V2, "© 2022 Mavridis Philippe" + ); + aboutData->addAuthor("Mavridis Philippe (blu.256)", I18N_NOOP("Developer"), "mavridisf@gmail.com"); + return aboutData; +} + +bool MarkdownPart::openURL(const KURL& u) +{ + if(u.isLocalFile()) + { + TQFile local(u.path()); + + if(!local.open(IO_ReadOnly)) + { + return false; + } + + TQByteArray data = local.readAll(); + + local.close(); + + if(!data.isNull()) + { + begin(u); + TQString parsed(parse((MD_CHAR*) data.data())); + write(parsed); + end(); + } + } + + emit started(0L); + return true; +} + +TQString& MarkdownPart::parse(MD_CHAR* document) +{ + m_buffer = "<!DOCTYPE html>\n"; + m_buffer += "<html>\n"; + m_buffer += " <head>\n"; + m_buffer += " <meta charset='utf-8'>\n"; + m_buffer += " <title>TODO</title>\n"; + m_buffer += " </head>\n"; + m_buffer += " <body>\n"; + + TQByteArray data; + int success = md_html(document, + MD_SIZE(strlen(document)), + &MarkdownPart::processHTML, + &data, + MD_DIALECT_GITHUB | MD_FLAG_PERMISSIVEURLAUTOLINKS | MD_FLAG_PERMISSIVEEMAILAUTOLINKS | MD_FLAG_PERMISSIVEWWWAUTOLINKS + | MD_FLAG_LATEXMATHSPANS | MD_FLAG_PERMISSIVEATXHEADERS | MD_FLAG_UNDERLINE | MD_FLAG_TASKLISTS, + 0); + + if (success == -1) + { + m_buffer += TQString("<b>%1</b>").arg(i18n("Error: malformed document.")); + } + else + { + m_buffer += TQString::fromLocal8Bit(data); + } + + m_buffer += " </body>\n"; + m_buffer += "</html>\n"; + return m_buffer; +} + +void MarkdownPart::processHTML(const MD_CHAR* data, MD_SIZE data_size, void* user_data) +{ + TQByteArray *ud = static_cast<TQByteArray*>(user_data); + TQBuffer buff(*ud); + + if (data_size > 0) + { + buff.open(IO_WriteOnly | IO_Append); + buff.writeBlock(data, (int)data_size); + buff.close(); + } +} + +#include "markdown_part.moc" diff --git a/tdemarkdown/markdown_part.desktop b/tdemarkdown/markdown_part.desktop new file mode 100644 index 000000000..47cff4b03 --- /dev/null +++ b/tdemarkdown/markdown_part.desktop @@ -0,0 +1,10 @@ +[Desktop Entry] +Name=Markdown Viewer + +Comment=Embeddable lightweight markdown viewing component + +Type=Service +Icon=text-x-generic +MimeType=text/markdown +X-TDE-Library=libtdemarkdown +X-TDE-ServiceTypes=KParts/ReadOnlyPart diff --git a/tdemarkdown/markdown_part.h b/tdemarkdown/markdown_part.h new file mode 100644 index 000000000..0131553d3 --- /dev/null +++ b/tdemarkdown/markdown_part.h @@ -0,0 +1,47 @@ +/*************************************************************************** + * Markdown Viewer part * + * Copyright (c) 2022 Mavridis Philippe <mavridisf@gmail.com> * + * * + * This program is free software; you can redistribute it and/or modify * + * it under the terms of the GNU General Public License as published by * + * the Free Software Foundation; either version 2 of the License, or * + * (at your option) any later version. * + ***************************************************************************/ + + +#ifndef __MARKDOWN_PART_H +#define __MARKDOWN_PART_H + +#include <tqwidget.h> + +#include <tdehtml_part.h> + +class TDEHTMLPart; + +class MarkdownPart : public TDEHTMLPart +{ + TQ_OBJECT + + public: + MarkdownPart(TQWidget* parentWidget, const char* widgetName, + TQObject* parent, const char* name, const TQStringList& args); + ~MarkdownPart(); + + /* Create and return About data */ + static TDEAboutData* createAboutData(); + + /* Implemented virtual from TDEHTMLPart */ + bool openURL(const KURL& u); + + /* Parser */ + TQString& parse(MD_CHAR* document); + + private: + TQString m_buffer; + + static void processHTML(const MD_CHAR* data, MD_SIZE data_size, void* userData); + +}; + + +#endif // __MARKDOWN_PART_H diff --git a/tdemarkdown/markdown_part.rc b/tdemarkdown/markdown_part.rc new file mode 100644 index 000000000..437f7814d --- /dev/null +++ b/tdemarkdown/markdown_part.rc @@ -0,0 +1,4 @@ +<!DOCTYPE kpartgui SYSTEM "kpartgui.dtd"> +<kpartgui name="MarkdownPart" version="1"> +<StatusBar/> +</kpartgui>
\ No newline at end of file diff --git a/tdemarkdown/md4c/CHANGELOG.md b/tdemarkdown/md4c/CHANGELOG.md new file mode 100644 index 000000000..63a9c3233 --- /dev/null +++ b/tdemarkdown/md4c/CHANGELOG.md @@ -0,0 +1,442 @@ + +# MD4C Change Log + + +## Next Version (Work in Progress) + +Changes: + + * Changes mandated by CommonMark specification 0.30. + + Actually there are only very minor changes to recognition of HTML blocks: + + - The tag `<textarea>` now triggers HTML block (of type 1 as per the + specification). + + - HTML declaration (HTML block type 4) is not required to begin with an + upper-case ASCII character after the `<!`. Any ASCII character is now + allowed. + + Other than that, the newest specification mainly improves test coverage and + clarifies its wording in some cases, without affecting the implementation. + + Refer to [CommonMark + 0.30 notes](https://github.com/commonmark/commonmark-spec/releases/tag/0.30) + for more info. + +Fixes: + + * [#163](https://github.com/mity/md4c/issues/163): + Make HTML renderer to emit `'\n'` after the root tag when in the XHTML mode. + + * [#165](https://github.com/mity/md4c/issues/165): + Make HTML renderer not to percent-encode `'~'` in URLs. Although it does + work, it's not needed, and it can actually be confusing with URLs such as + `http://www.example.com/~johndoe/`. + + * [#167](https://github.com/mity/md4c/issues/167), + [#168](https://github.com/mity/md4c/issues/168): + Fix multiple instances of various buffer overflow bugs, found mostly using + a fuzz testing. Contributed by [dtldarek](https://github.com/dtldarek) and + [Thierry Coppey](https://github.com/TCKnet). + + * [#169](https://github.com/mity/md4c/issues/169): + Table underline now does not require 3 characters per table column anymore. + One dash (optionally with a leading or tailing `:` appended or prepended) + is now sufficient. This improves compatibility with the GFM. + + * [#172](https://github.com/mity/md4c/issues/172): + Fix quadratic time behavior caused by unnecessary lookup for link reference + definition even if the potential label contains nested brackets. + + * [#173](https://github.com/mity/md4c/issues/173), + [#174](https://github.com/mity/md4c/issues/174): + Multiple bugs identified with [OSS-Fuzz](https://github.com/google/oss-fuzz) + were fixed. + + +## Version 0.4.8 + +Fixes: + + * [#149](https://github.com/mity/md4c/issues/149): + A HTML block started in a container block (and not explicitly finished in + the block) could eat 1 line of actual contents. + + * [#150](https://github.com/mity/md4c/issues/150): + Fix md2html utility to output proper DOCTYPE and HTML tags when `--full-html` + command line options is used, accordingly to the expected output format + (HTML or XHTML). + + * [#152](https://github.com/mity/md4c/issues/152): + Suppress recognition of a permissive autolink if it would otherwise form a + complete body of an outer inline link. + + * [#153](https://github.com/mity/md4c/issues/153), + [#154](https://github.com/mity/md4c/issues/154): + Set `MD_BLOCK_UL_DETAIL::mark` and `MD_BLOCK_OL_DETAIL::mark_delimiter` + correctly, even when the blocks are nested at the same line in a complicated + ways. + + * [#155](https://github.com/mity/md4c/issues/155): + Avoid reading 1 character beyond the input size in some complex cases. + + +## Version 0.4.7 + +Changes: + + * Add `MD_TABLE_DETAIL` structure into the API. The structure describes column + count and row count of the table, and pointer to it is passed into the + application-provided block callback with the `MD_BLOCK_TABLE` block type. + +Fixes: + + * [#131](https://github.com/mity/md4c/issues/131): + Fix handling of a reference image nested in a reference link. + + * [#135](https://github.com/mity/md4c/issues/135): + Handle unmatched parenthesis pairs inside a permissive URL and WWW auto-links + in a way more compatible with the GFM. + + * [#138](https://github.com/mity/md4c/issues/138): + The tag `<tbody></tbody>` is now suppressed whenever the table has zero body + rows. + + * [#139](https://github.com/mity/md4c/issues/139): + Recognize a list item mark even when EOF follows it. + + * [#142](https://github.com/mity/md4c/issues/142): + Fix reference link definition label matching in a case when the label ends + with a Unicode character with non-trivial case folding mapping. + + +## Version 0.4.6 + +Fixes: + + * [#130](https://github.com/mity/md4c/issues/130): + Fix `ISANYOF` macro, which could provide unexpected results when encountering + zero byte in the input text; in some cases leading to broken internal state + of the parser. + + The bug could result in denial of service and possibly also to other security + implications. Applications are advised to update to 0.4.6. + + +## Version 0.4.5 + +Fixes: + + * [#118](https://github.com/mity/md4c/issues/118): + Fix HTML renderer's `MD_HTML_FLAG_VERBATIM_ENTITIES` flag, exposed in the + `md2html` utility via `--fverbatim-entities`. + + * [#124](https://github.com/mity/md4c/issues/124): + Fix handling of indentation of 16 or more spaces in the fenced code blocks. + + +## Version 0.4.4 + +Changes: + + * Make Unicode-specific code compliant to Unicode 13.0. + +New features: + + * The HTML renderer, developed originally as the heart of the `md2html` + utility, is now built as a standalone library, in order to simplify its + reuse in applications. + + * With `MD_HTML_FLAG_SKIP_UTF8_BOM`, the HTML renderer now skips UTF-8 byte + order mark (BOM) if the input begins with it, before passing to the Markdown + parser. + + `md2html` utility automatically enables the flag (unless it is custom-built + with `-DMD4C_USE_ASCII`). + + * With `MD_HTML_FLAG_XHTML`, The HTML renderer generates XHTML instead of + HTML. + + This effectively means `<br />` instead of `<br>`, `<hr />` instead of + `<hr>`, and `<img ... />` instead of `<img ...>`. + + `md2html` utility now understands the command line option `-x` or `--xhtml` + enabling the XHTML mode. + +Fixes: + + * [#113](https://github.com/mity/md4c/issues/113): + Add missing folding info data for the following Unicode characters: + `U+0184`, `U+018a`, `U+01b2`, `U+01b5`, `U+01f4`, `U+0372`, `U+038f`, + `U+1c84`, `U+1fb9`, `U+1fbb`, `U+1fd9`, `U+1fdb`, `U+1fe9`, `U+1feb`, + `U+1ff9`, `U+1ffb`, `U+2c7f`, `U+2ced`, `U+a77b`, `U+a792`, `U+a7c9`. + + Due the bug, the link definition label matching did not work in the case + insensitive way for these characters. + + +## Version 0.4.3 + +New features: + + * With `MD_FLAG_UNDERLINE`, spans enclosed in underscore (`_foo_`) are seen + as underline (`MD_SPAN_UNDERLINE`) rather than an ordinary emphasis or + strong emphasis. + +Changes: + + * The implementation of wiki-links extension (with `MD_FLAG_WIKILINKS`) has + been simplified. + + - A noticeable increase of MD4C's memory footprint introduced by the + extension implementation in 0.4.0 has been removed. + - The priority handling towards other inline elements have been unified. + (This affects an obscure case where syntax of an image was in place of + wiki-link destination made the wiki-link invalid. Now *all* inline spans + in the wiki-link destination, including the images, is suppressed.) + - The length limitation of 100 characters now always applies to wiki-link + destination. + + * Recognition of strike-through spans (with the flag `MD_FLAG_STRIKETHROUGH`) + has become much stricter and, arguably, reasonable. + + - Only single tildes (`~`) and double tildes (`~~`) are recognized as + strike-through marks. Longer ones are not anymore. + - The length of the opener and closer marks have to be the same. + - The tildes cannot open a strike-through span if a whitespace follows. + - The tildes cannot close a strike-through span if a whitespace precedes. + + This change follows the changes of behavior in cmark-gfm some time ago, so + it is also beneficial from compatibility point of view. + + * When building MD4C by hand instead of using its CMake-based build, the UTF-8 + support was by default disabled, unless explicitly asked for by defining + a preprocessor macro `MD4C_USE_UTF8`. + + This has been changed and the UTF-8 mode now becomes the default, no matter + how `md4c.c` is compiled. If you need to disable it and use the ASCII-only + mode, you have explicitly define macro `MD4C_USE_ASCII` when compiling it. + + (The CMake-based build as provided in our repository explicitly asked for + the UTF-8 support with `-DMD4C_USE_UTF8`. I.e. if you are using MD4C library + built with our vanilla `CMakeLists.txt` files, this change should not affect + you.) + +Fixes: + + * Fixed some string length handling in the special `MD4C_USE_UTF16` build. + + (This does not affect you unless you are on Windows and explicitly define + the macro when building MD4C.) + + * [#100](https://github.com/mity/md4c/issues/100): + Fixed an off-by-one error in the maximal length limit of some segments + of e-mail addresses used in autolinks. + + * [#107](https://github.com/mity/md4c/issues/107): + Fix mis-detection of asterisk-encoded emphasis in some corner cases when + length of the opener and closer differs, as in `***foo *bar baz***`. + + +## Version 0.4.2 + +Fixes: + + * [#98](https://github.com/mity/md4c/issues/98): + Fix mis-detection of asterisk-encoded emphasis in some corner cases when + length of the opener and closer differs, as in `**a *b c** d*`. + + +## Version 0.4.1 + +Unfortunately, 0.4.0 has been released with badly updated ChangeLog. Fixing +this is the only change on 0.4.1. + + +## Version 0.4.0 + +New features: + + * With `MD_FLAG_LATEXMATHSPANS`, LaTeX math spans (`$...$`) and LaTeX display + math spans (`$$...$$`) are now recognized. (Note though that the HTML + renderer outputs them verbatim in a custom `<x-equation>` tag.) + + Contributed by [Tilman Roeder](https://github.com/dyedgreen). + + * With `MD_FLAG_WIKILINKS`, Wiki-style links (`[[...]]`) are now recognized. + (Note though that the HTML renderer renders them as a custom `<x-wikilink>` + tag.) + + Contributed by [Nils Blomqvist](https://github.com/niblo). + +Changes: + + * Parsing of tables (with `MD_FLAG_TABLES`) is now closer to the way how + cmark-gfm parses tables as we do not require every row of the table to + contain a pipe `|` anymore. + + As a consequence, paragraphs now cannot interrupt tables. A paragraph which + follows the table has to be delimited with a blank line. + +Fixes: + + * [#94](https://github.com/mity/md4c/issues/94): + `md_build_ref_def_hashtable()`: Do not allocate more memory than strictly + needed. + + * [#95](https://github.com/mity/md4c/issues/95): + `md_is_container_mark()`: Ordered list mark requires at least one digit. + + * [#96](https://github.com/mity/md4c/issues/96): + Some fixes for link label comparison. + + +## Version 0.3.4 + +Changes: + + * Make Unicode-specific code compliant to Unicode 12.1. + + * Structure `MD_BLOCK_CODE_DETAIL` got new member `fenced_char`. Application + can use it to detect character used to form the block fences (`` ` `` or + `~`). In the case of indented code block, it is set to zero. + +Fixes: + + * [#77](https://github.com/mity/md4c/issues/77): + Fix maximal count of digits for numerical character references, as requested + by CommonMark specification 0.29. + + * [#78](https://github.com/mity/md4c/issues/78): + Fix link reference definition label matching for Unicode characters where + the folding mapping leads to multiple codepoints, as e.g. in `ẞ` -> `SS`. + + * [#83](https://github.com/mity/md4c/issues/83): + Fix recognition of an empty blockquote which interrupts a paragraph. + + +## Version 0.3.3 + +Changes: + + * Make permissive URL autolink and permissive WWW autolink extensions stricter. + + This brings the behavior closer to GFM and mitigates risk of false positives. + In particular, the domain has to contain at least one dot and parenthesis + can be part of the link destination only if `(` and `)` are balanced. + +Fixes: + + * [#73](https://github.com/mity/md4c/issues/73): + Some raw HTML inputs could lead to quadratic parsing times. + + * [#74](https://github.com/mity/md4c/issues/74): + Fix input leading to a crash. Found by fuzzing. + + * [#76](https://github.com/mity/md4c/issues/76): + Fix handling of parenthesis in some corner cases of permissive URL autolink + and permissive WWW autolink extensions. + + +## Version 0.3.2 + +Changes: + + * Changes mandated by CommonMark specification 0.29. + + Most importantly, the white-space trimming rules for code spans have changed. + At most one space/newline is trimmed from beginning/end of the code span + (if the code span contains some non-space contents, and if it begins and + ends with space at the same time). In all other cases the spaces in the code + span are now left intact. + + Other changes in behavior are in corner cases only. Refer to [CommonMark + 0.29 notes](https://github.com/commonmark/commonmark-spec/releases/tag/0.29) + for more info. + +Fixes: + + * [#68](https://github.com/mity/md4c/issues/68): + Some specific HTML blocks were not recognized when EOF follows without any + end-of-line character. + + * [#69](https://github.com/mity/md4c/issues/69): + Strike-through span not working correctly when its opener mark is directly + followed by other opener mark; or when other closer mark directly precedes + its closer mark. + + +## Version 0.3.1 + +Fixes: + + * [#58](https://github.com/mity/md4c/issues/58), + [#59](https://github.com/mity/md4c/issues/59), + [#60](https://github.com/mity/md4c/issues/60), + [#63](https://github.com/mity/md4c/issues/63), + [#66](https://github.com/mity/md4c/issues/66): + Some inputs could lead to quadratic parsing times. Thanks to Anders Kaseorg + for finding all those issues. + + * [#61](https://github.com/mity/md4c/issues/59): + Flag `MD_FLAG_NOHTMLSPANS` erroneously affected also recognition of + CommonMark autolinks. + + +## Version 0.3.0 + +New features: + + * Add extension for GitHub-style task lists: + + ``` + * [x] foo + * [x] bar + * [ ] baz + ``` + + (It has to be explicitly enabled with `MD_FLAG_TASKLISTS`.) + + * Added support for building as a shared library. On non-Windows platforms, + this is now default behavior; on Windows static library is still the default. + The CMake option `BUILD_SHARED_LIBS` can be used to request one or the other + explicitly. + + Contributed by Lisandro Damián Nicanor Pérez Meyer. + + * Renamed structure `MD_RENDERER` to `MD_PARSER` and refactorize its contents + a little bit. Note this is source-level incompatible and initialization code + in apps may need to be updated. + + The aim of the change is to be more friendly for long-term ABI compatibility + we shall maintain, starting with this release. + + * Added `CHANGELOG.md` (this file). + + * Make sure `md_process_table_row()` reports the same count of table cells for + all table rows, no matter how broken the input is. The cell count is derived + from table underline line. Bogus cells in other rows are silently ignored. + Missing cells in other rows are reported as empty ones. + +Fixes: + + * CID 1475544: + Calling `md_free_attribute()` on uninitialized data. + + * [#47](https://github.com/mity/md4c/issues/47): + Using bad offsets in `md_is_entity_str()`, in some cases leading to buffer + overflow. + + * [#51](https://github.com/mity/md4c/issues/51): + Segfault in `md_process_table_cell()`. + + * [#53](https://github.com/mity/md4c/issues/53): + With `MD_FLAG_PERMISSIVEURLAUTOLINKS` or `MD_FLAG_PERMISSIVEWWWAUTOLINKS` + we could generate bad output for ordinary Markdown links, if a non-space + character immediately follows like e.g. in `[link](http://github.com)X`. + + +## Version 0.2.7 + +This was the last version before the changelog has been added. diff --git a/tdemarkdown/md4c/CMakeLists.txt b/tdemarkdown/md4c/CMakeLists.txt new file mode 100644 index 000000000..cab797eb9 --- /dev/null +++ b/tdemarkdown/md4c/CMakeLists.txt @@ -0,0 +1,59 @@ + +cmake_minimum_required(VERSION 3.4) +project(MD4C C) + +set(MD_VERSION_MAJOR 0) +set(MD_VERSION_MINOR 4) +set(MD_VERSION_RELEASE 8) +set(MD_VERSION "${MD_VERSION_MAJOR}.${MD_VERSION_MINOR}.${MD_VERSION_RELEASE}") + +set(PROJECT_VERSION "${MD_VERSION}") +set(PROJECT_URL "https://github.com/mity/md4c") + +if(WIN32) + # On Windows, given there is no standard lib install dir etc., we rather + # by default build static lib. + option(BUILD_SHARED_LIBS "help string describing option" OFF) +else() + # On Linux, MD4C is slowly being adding into some distros which prefer + # shared lib. + option(BUILD_SHARED_LIBS "help string describing option" ON) +endif() + +add_definitions( + -DMD_VERSION_MAJOR=${MD_VERSION_MAJOR} + -DMD_VERSION_MINOR=${MD_VERSION_MINOR} + -DMD_VERSION_RELEASE=${MD_VERSION_RELEASE} +) + +set(CMAKE_CONFIGURATION_TYPES Debug Release RelWithDebInfo MinSizeRel) +if("${CMAKE_BUILD_TYPE}" STREQUAL "") + set(CMAKE_BUILD_TYPE $ENV{CMAKE_BUILD_TYPE}) + + if("${CMAKE_BUILD_TYPE}" STREQUAL "") + set(CMAKE_BUILD_TYPE "Release") + endif() +endif() + + +if(${CMAKE_C_COMPILER_ID} MATCHES GNU|Clang) + set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wall") +elseif(MSVC) + # Disable warnings about the so-called unsecured functions: + add_definitions(/D_CRT_SECURE_NO_WARNINGS /W3) + + # Specify proper C runtime library: + string(REGEX REPLACE "/M[DT]d?" "" CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG}") + string(REGEX REPLACE "/M[DT]d?" "" CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE}") + string(REGEX REPLACE "/M[DT]d?" "" CMAKE_C_FLAGS_RELWITHDEBINFO "{$CMAKE_C_FLAGS_RELWITHDEBINFO}") + string(REGEX REPLACE "/M[DT]d?" "" CMAKE_C_FLAGS_MINSIZEREL "${CMAKE_C_FLAGS_MINSIZEREL}") + set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} /MTd") + set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} /MT") + set(CMAKE_C_FLAGS_RELWITHDEBINFO "${CMAKE_C_FLAGS_RELEASE} /MT") + set(CMAKE_C_FLAGS_MINSIZEREL "${CMAKE_C_FLAGS_RELEASE} /MT") +endif() + +include(GNUInstallDirs) + +add_subdirectory(src) +add_subdirectory(md2html) diff --git a/tdemarkdown/md4c/LICENSE.md b/tdemarkdown/md4c/LICENSE.md new file mode 100644 index 000000000..2088ba453 --- /dev/null +++ b/tdemarkdown/md4c/LICENSE.md @@ -0,0 +1,22 @@ + +# The MIT License (MIT) + +Copyright © 2016-2020 Martin Mitáš + +Permission is hereby granted, free of charge, to any person obtaining a +copy of this software and associated documentation files (the “Software”), +to deal in the Software without restriction, including without limitation +the rights to use, copy, modify, merge, publish, distribute, sublicense, +and/or sell copies of the Software, and to permit persons to whom the +Software is furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included +in all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS +OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL +THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING +FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS +IN THE SOFTWARE. diff --git a/tdemarkdown/md4c/README.md b/tdemarkdown/md4c/README.md new file mode 100644 index 000000000..9abe987e9 --- /dev/null +++ b/tdemarkdown/md4c/README.md @@ -0,0 +1,297 @@ +[![Linux Build Status (travis-ci.com)](https://img.shields.io/travis/mity/md4c/master.svg?logo=linux&label=linux%20build)](https://travis-ci.com/mity/md4c) +[![Windows Build Status (appveyor.com)](https://img.shields.io/appveyor/ci/mity/md4c/master.svg?logo=windows&label=windows%20build)](https://ci.appveyor.com/project/mity/md4c/branch/master) +[![Code Coverage Status (codecov.io)](https://img.shields.io/codecov/c/github/mity/md4c/master.svg?logo=codecov&label=code%20coverage)](https://codecov.io/github/mity/md4c) +[![Coverity Scan Status](https://img.shields.io/coverity/scan/mity-md4c.svg?label=coverity%20scan)](https://scan.coverity.com/projects/mity-md4c) + + +# MD4C Readme + +* Home: http://github.com/mity/md4c +* Wiki: http://github.com/mity/md4c/wiki +* Issue tracker: http://github.com/mity/md4c/issues + +MD4C stands for "Markdown for C" and that's exactly what this project is about. + + +## What is Markdown + +In short, Markdown is the markup language this `README.md` file is written in. + +The following resources can explain more if you are unfamiliar with it: +* [Wikipedia article](http://en.wikipedia.org/wiki/Markdown) +* [CommonMark site](http://commonmark.org) + + +## What is MD4C + +MD4C is Markdown parser implementation in C, with the following features: + +* **Compliance:** Generally, MD4C aims to be compliant to the latest version of + [CommonMark specification](http://spec.commonmark.org/). Currently, we are + fully compliant to CommonMark 0.30. + +* **Extensions:** MD4C supports some commonly requested and accepted extensions. + See below. + +* **Performance:** MD4C is [very fast](https://talk.commonmark.org/t/2520). + +* **Compactness:** MD4C parser is implemented in one source file and one header + file. There are no dependencies other than standard C library. + +* **Embedding:** MD4C parser is easy to reuse in other projects, its API is + very straightforward: There is actually just one function, `md_parse()`. + +* **Push model:** MD4C parses the complete document and calls few callback + functions provided by the application to inform it about a start/end of + every block, a start/end of every span, and with any textual contents. + +* **Portability:** MD4C builds and works on Windows and POSIX-compliant OSes. + (It should be simple to make it run also on most other platforms, at least as + long as the platform provides C standard library, including a heap memory + management.) + +* **Encoding:** MD4C by default expects UTF-8 encoding of the input document. + But it can be compiled to recognize ASCII-only control characters (i.e. to + disable all Unicode-specific code), or (on Windows) to expect UTF-16 (i.e. + what is on Windows commonly called just "Unicode"). See more details below. + +* **Permissive license:** MD4C is available under the [MIT license](LICENSE.md). + + +## Using MD4C + +### Parsing Markdown + +If you need just to parse a Markdown document, you need to include `md4c.h` +and link against MD4C library (`-lmd4c`); or alternatively add `md4c.[hc]` +directly to your code base as the parser is only implemented in the single C +source file. + +The main provided function is `md_parse()`. It takes a text in the Markdown +syntax and a pointer to a structure which provides pointers to several callback +functions. + +As `md_parse()` processes the input, it calls the callbacks (when entering or +leaving any Markdown block or span; and when outputting any textual content of +the document), allowing application to convert it into another format or render +it onto the screen. + + +### Converting to HTML + +If you need to convert Markdown to HTML, include `md4c-html.h` and link against +MD4C-HTML library (`-lmd4c-html`); or alternatively add the sources `md4c.[hc]`, +`md4c-html.[hc]` and `entity.[hc]` into your code base. + +To convert a Markdown input, call `md_html()` function. It takes the Markdown +input and calls the provided callback function. The callback is fed with +chunks of the HTML output. Typical callback implementation just appends the +chunks into a buffer or writes them to a file. + + +## Markdown Extensions + +The default behavior is to recognize only Markdown syntax defined by the +[CommonMark specification](http://spec.commonmark.org/). + +However, with appropriate flags, the behavior can be tuned to enable some +extensions: + +* With the flag `MD_FLAG_COLLAPSEWHITESPACE`, a non-trivial whitespace is + collapsed into a single space. + +* With the flag `MD_FLAG_TABLES`, GitHub-style tables are supported. + +* With the flag `MD_FLAG_TASKLISTS`, GitHub-style task lists are supported. + +* With the flag `MD_FLAG_STRIKETHROUGH`, strike-through spans are enabled + (text enclosed in tilde marks, e.g. `~foo bar~`). + +* With the flag `MD_FLAG_PERMISSIVEURLAUTOLINKS` permissive URL autolinks + (not enclosed in `<` and `>`) are supported. + +* With the flag `MD_FLAG_PERMISSIVEEMAILAUTOLINKS`, permissive e-mail + autolinks (not enclosed in `<` and `>`) are supported. + +* With the flag `MD_FLAG_PERMISSIVEWWWAUTOLINKS` permissive WWW autolinks + without any scheme specified (e.g. `www.example.com`) are supported. MD4C + then assumes `http:` scheme. + +* With the flag `MD_FLAG_LATEXMATHSPANS` LaTeX math spans (`$...$`) and + LaTeX display math spans (`$$...$$`) are supported. (Note though that the + HTML renderer outputs them verbatim in a custom tag `<x-equation>`.) + +* With the flag `MD_FLAG_WIKILINKS`, wiki-style links (`[[link label]]` and + `[[target article|link label]]`) are supported. (Note that the HTML renderer + outputs them in a custom tag `<x-wikilink>`.) + +* With the flag `MD_FLAG_UNDERLINE`, underscore (`_`) denotes an underline + instead of an ordinary emphasis or strong emphasis. + +Few features of CommonMark (those some people see as mis-features) may be +disabled with the following flags: + +* With the flag `MD_FLAG_NOHTMLSPANS` or `MD_FLAG_NOHTMLBLOCKS`, raw inline + HTML or raw HTML blocks respectively are disabled. + +* With the flag `MD_FLAG_NOINDENTEDCODEBLOCKS`, indented code blocks are + disabled. + + +## Input/Output Encoding + +The CommonMark specification declares that any sequence of Unicode code points +is a valid CommonMark document. + +But, under a closer inspection, Unicode plays any role in few very specific +situations when parsing Markdown documents: + +1. For detection of word boundaries when processing emphasis and strong + emphasis, some classification of Unicode characters (whether it is + a whitespace or a punctuation) is needed. + +2. For (case-insensitive) matching of a link reference label with the + corresponding link reference definition, Unicode case folding is used. + +3. For translating HTML entities (e.g. `&`) and numeric character + references (e.g. `#` or `ಫ`) into their Unicode equivalents. + + However note MD4C leaves this translation on the renderer/application; as + the renderer is supposed to really know output encoding and whether it + really needs to perform this kind of translation. (For example, when the + renderer outputs HTML, it may leave the entities untranslated and defer the + work to a web browser.) + +MD4C relies on this property of the CommonMark and the implementation is, to +a large degree, encoding-agnostic. Most of MD4C code only assumes that the +encoding of your choice is compatible with ASCII. I.e. that the codepoints +below 128 have the same numeric values as ASCII. + +Any input MD4C does not understand is simply seen as part of the document text +and sent to the renderer's callback functions unchanged. + +The two situations (word boundary detection and link reference matching) where +MD4C has to understand Unicode are handled as specified by the following +preprocessor macros (as specified at the time MD4C is being built): + +* If preprocessor macro `MD4C_USE_UTF8` is defined, MD4C assumes UTF-8 for the + word boundary detection and for the case-insensitive matching of link labels. + + When none of these macros is explicitly used, this is the default behavior. + +* On Windows, if preprocessor macro `MD4C_USE_UTF16` is defined, MD4C uses + `WCHAR` instead of `char` and assumes UTF-16 encoding in those situations. + (UTF-16 is what Windows developers usually call just "Unicode" and what + Win32API generally works with.) + + Note that because this macro affects also the types in `md4c.h`, you have + to define the macro both when building MD4C as well as when including + `md4c.h`. + + Also note this is only supported in the parser (`md4c.[hc]`). The HTML + renderer does not support this and you will have to write your own custom + renderer to use this feature. + +* If preprocessor macro `MD4C_USE_ASCII` is defined, MD4C assumes nothing but + an ASCII input. + + That effectively means that non-ASCII whitespace or punctuation characters + won't be recognized as such and that link reference matching will work in + a case-insensitive way only for ASCII letters (`[a-zA-Z]`). + + +## Documentation + +The API of the parser is quite well documented in the comments in the `md4c.h`. +Similarly, the markdown-to-html API is described in its header `md4c-html.h`. + +There is also [project wiki](http://github.com/mity/md4c/wiki) which provides +some more comprehensive documentation. However note it is incomplete and some +details may be somewhat outdated. + + +## FAQ + +**Q: How does MD4C compare to a parser XY?** + +**A:** Some other implementations combine Markdown parser and HTML generator +into a single entangled code hidden behind an interface which just allows the +conversion from Markdown to HTML. They are often unusable if you want to +process the input in any other way. + +Even when the parsing is available as a standalone feature, most parsers (if +not all of them; at least within the scope of C/C++ language) are full DOM-like +parsers: They construct abstract syntax tree (AST) representation of the whole +Markdown document. That takes time and it leads to bigger memory footprint. + +It's completely fine as long as you really need it. If you don't need the full +AST, there is a very high chance that using MD4C will be substantially faster +and less hungry in terms of memory consumption. + +Last but not least, some Markdown parsers are implemented in a naive way. When +fed with a [smartly crafted input pattern](test/pathological_tests.py), they +may exhibit quadratic (or even worse) parsing times. What MD4C can still parse +in a fraction of second may turn into long minutes or possibly hours with them. +Hence, when such a naive parser is used to process an input from an untrusted +source, the possibility of denial-of-service attacks becomes a real danger. + +A lot of our effort went into providing linear parsing times no matter what +kind of crazy input MD4C parser is fed with. (If you encounter an input pattern +which leads to a sub-linear parsing times, please do not hesitate and report it +as a bug.) + +**Q: Does MD4C perform any input validation?** + +**A:** No. And we are proud of it. :-) + +CommonMark specification states that any sequence of Unicode characters is +a valid Markdown document. (In practice, this more or less always means UTF-8 +encoding.) + +In other words, according to the specification, it does not matter whether some +Markdown syntax construction is in some way broken or not. If it is broken, it +will simply not be recognized and the parser should see it just as a verbatim +text. + +MD4C takes this a step further: It sees any sequence of bytes as a valid input, +following completely the GIGO philosophy (garbage in, garbage out). I.e. any +ill-formed UTF-8 byte sequence will propagate to the respective callback as +a part of the text. + +If you need to validate that the input is, say, a well-formed UTF-8 document, +you have to do it on your own. The easiest way how to do this is to simply +validate the whole document before passing it to the MD4C parser. + + +## License + +MD4C is covered with MIT license, see the file `LICENSE.md`. + + +## Links to Related Projects + +Ports and bindings to other languages: + +* [commonmark-d](https://github.com/AuburnSounds/commonmark-d): + Port of MD4C to D language. + +* [markdown-wasm](https://github.com/rsms/markdown-wasm): + Port of MD4C to WebAssembly. + +* [PyMD4C](https://github.com/dominickpastore/pymd4c): + Python bindings for MD4C + +Software using MD4C: + +* [QOwnNotes](https://www.qownnotes.org/): + A plain-text file notepad and todo-list manager with markdown support and + ownCloud / Nextcloud integration. + +* [Qt](https://www.qt.io/): + Cross-platform C++ GUI framework. + +* [Textosaurus](https://github.com/martinrotter/textosaurus): + Cross-platform text editor based on Qt and Scintilla. + +* [8th](https://8th-dev.com/): + Cross-platform concatenative programming language. diff --git a/tdemarkdown/md4c/md2html/CMakeLists.txt b/tdemarkdown/md4c/md2html/CMakeLists.txt new file mode 100644 index 000000000..14de6712e --- /dev/null +++ b/tdemarkdown/md4c/md2html/CMakeLists.txt @@ -0,0 +1,22 @@ + +set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} -DDEBUG") + + +# Build rules for md2html command line utility + +include_directories("${PROJECT_SOURCE_DIR}/src") +add_executable(md2html cmdline.c cmdline.h md2html.c) +target_link_libraries(md2html md4c-html) + + +# Install rules + +install( + TARGETS md2html + ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR} + LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR} + RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} + PUBLIC_HEADER DESTINATION ${CMAKE_INSTALL_INCLUDEDIR} +) +install(FILES "md2html.1" DESTINATION "${CMAKE_INSTALL_MANDIR}/man1") + diff --git a/tdemarkdown/md4c/md2html/cmdline.c b/tdemarkdown/md4c/md2html/cmdline.c new file mode 100644 index 000000000..c3fddfaa4 --- /dev/null +++ b/tdemarkdown/md4c/md2html/cmdline.c @@ -0,0 +1,205 @@ +/* + * C Reusables + * <http://github.com/mity/c-reusables> + * + * Copyright (c) 2017-2020 Martin Mitas + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#include "cmdline.h" + +#include <stdio.h> +#include <string.h> + + +#ifdef _WIN32 + #define snprintf _snprintf +#endif + + +#define CMDLINE_AUXBUF_SIZE 32 + + + +static int +cmdline_handle_short_opt_group(const CMDLINE_OPTION* options, const char* arggroup, + int (*callback)(int /*optval*/, const char* /*arg*/, void* /*userdata*/), + void* userdata) +{ + const CMDLINE_OPTION* opt; + int i; + int ret = 0; + + for(i = 0; arggroup[i] != '\0'; i++) { + for(opt = options; opt->id != 0; opt++) { + if(arggroup[i] == opt->shortname) + break; + } + + if(opt->id != 0 && !(opt->flags & CMDLINE_OPTFLAG_REQUIREDARG)) { + ret = callback(opt->id, NULL, userdata); + } else { + /* Unknown option. */ + char badoptname[3]; + badoptname[0] = '-'; + badoptname[1] = arggroup[i]; + badoptname[2] = '\0'; + ret = callback((opt->id != 0 ? CMDLINE_OPTID_MISSINGARG : CMDLINE_OPTID_UNKNOWN), + badoptname, userdata); + } + + if(ret != 0) + break; + } + + return ret; +} + +int +cmdline_read(const CMDLINE_OPTION* options, int argc, char** argv, + int (*callback)(int /*optval*/, const char* /*arg*/, void* /*userdata*/), + void* userdata) +{ + const CMDLINE_OPTION* opt; + char auxbuf[CMDLINE_AUXBUF_SIZE+1]; + int fast_optarg_decision = 1; + int after_doubledash = 0; + int i = 1; + int ret = 0; + + auxbuf[CMDLINE_AUXBUF_SIZE] = '\0'; + + /* Check whether there is any CMDLINE_OPTFLAG_COMPILERLIKE option with + * a name not starting with '-'. That would imply we can to check for + * non-option arguments only after refusing all such options. */ + for(opt = options; opt->id != 0; opt++) { + if((opt->flags & CMDLINE_OPTFLAG_COMPILERLIKE) && opt->longname[0] != '-') + fast_optarg_decision = 0; + } + + while(i < argc) { + if(after_doubledash || strcmp(argv[i], "-") == 0) { + /* Non-option argument. + * Standalone "-" usually means "read from stdin" or "write to + * stdout" so treat it always as a non-option. */ + ret = callback(CMDLINE_OPTID_NONE, argv[i], userdata); + } else if(strcmp(argv[i], "--") == 0) { + /* End of options. All the remaining tokens are non-options + * even if they start with a dash. */ + after_doubledash = 1; + } else if(fast_optarg_decision && argv[i][0] != '-') { + /* Non-option argument. */ + ret = callback(CMDLINE_OPTID_NONE, argv[i], userdata); + } else { + for(opt = options; opt->id != 0; opt++) { + if(opt->flags & CMDLINE_OPTFLAG_COMPILERLIKE) { + size_t len = strlen(opt->longname); + if(strncmp(argv[i], opt->longname, len) == 0) { + /* Compiler-like option. */ + if(argv[i][len] != '\0') + ret = callback(opt->id, argv[i] + len, userdata); + else if(i+1 < argc) + ret = callback(opt->id, argv[++i], userdata); + else + ret = callback(CMDLINE_OPTID_MISSINGARG, opt->longname, userdata); + break; + } + } else if(opt->longname != NULL && strncmp(argv[i], "--", 2) == 0) { + size_t len = strlen(opt->longname); + if(strncmp(argv[i]+2, opt->longname, len) == 0) { + /* Regular long option. */ + if(argv[i][2+len] == '\0') { + /* with no argument provided. */ + if(!(opt->flags & CMDLINE_OPTFLAG_REQUIREDARG)) + ret = callback(opt->id, NULL, userdata); + else + ret = callback(CMDLINE_OPTID_MISSINGARG, argv[i], userdata); + break; + } else if(argv[i][2+len] == '=') { + /* with an argument provided. */ + if(opt->flags & (CMDLINE_OPTFLAG_OPTIONALARG | CMDLINE_OPTFLAG_REQUIREDARG)) { + ret = callback(opt->id, argv[i]+2+len+1, userdata); + } else { + snprintf(auxbuf, CMDLINE_AUXBUF_SIZE, "--%s", opt->longname); + ret = callback(CMDLINE_OPTID_BOGUSARG, auxbuf, userdata); + } + break; + } else { + continue; + } + } + } else if(opt->shortname != '\0' && argv[i][0] == '-') { + if(argv[i][1] == opt->shortname) { + /* Regular short option. */ + if(opt->flags & CMDLINE_OPTFLAG_REQUIREDARG) { + if(argv[i][2] != '\0') + ret = callback(opt->id, argv[i]+2, userdata); + else if(i+1 < argc) + ret = callback(opt->id, argv[++i], userdata); + else + ret = callback(CMDLINE_OPTID_MISSINGARG, argv[i], userdata); + break; + } else { + ret = callback(opt->id, NULL, userdata); + + /* There might be more (argument-less) short options + * grouped together. */ + if(ret == 0 && argv[i][2] != '\0') + ret = cmdline_handle_short_opt_group(options, argv[i]+2, callback, userdata); + break; + } + } + } + } + + if(opt->id == 0) { /* still not handled? */ + if(argv[i][0] != '-') { + /* Non-option argument. */ + ret = callback(CMDLINE_OPTID_NONE, argv[i], userdata); + } else { + /* Unknown option. */ + char* badoptname = argv[i]; + + if(strncmp(badoptname, "--", 2) == 0) { + /* Strip any argument from the long option. */ + char* assignment = strchr(badoptname, '='); + if(assignment != NULL) { + size_t len = assignment - badoptname; + if(len > CMDLINE_AUXBUF_SIZE) + len = CMDLINE_AUXBUF_SIZE; + strncpy(auxbuf, badoptname, len); + auxbuf[len] = '\0'; + badoptname = auxbuf; + } + } + + ret = callback(CMDLINE_OPTID_UNKNOWN, badoptname, userdata); + } + } + } + + if(ret != 0) + return ret; + i++; + } + + return ret; +} + diff --git a/tdemarkdown/md4c/md2html/cmdline.h b/tdemarkdown/md4c/md2html/cmdline.h new file mode 100644 index 000000000..5bbde47eb --- /dev/null +++ b/tdemarkdown/md4c/md2html/cmdline.h @@ -0,0 +1,153 @@ +/* + * C Reusables + * <http://github.com/mity/c-reusables> + * + * Copyright (c) 2017 Martin Mitas + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#ifndef CRE_CMDLINE_H +#define CRE_CMDLINE_H + +#ifdef __cplusplus +extern "C" { +#endif + + +/* The option may have an argument. (Affects only long option.) */ +#define CMDLINE_OPTFLAG_OPTIONALARG 0x0001 + +/* The option must have an argument. + * Such short option cannot be grouped within single '-abc'. */ +#define CMDLINE_OPTFLAG_REQUIREDARG 0x0002 + +/* Enable special compiler-like mode for the long option. + * + * Note ::shortname is not supported with this flag. CMDLINE_OPTION::shortname + * is silently ignored if the flag is used. + * + * With this flag, CMDLINE_OPTION::longname is treated differently as follows: + * + * 1. The option matches if the CMDLINE_OPTION::longname is the exact prefix + * of the argv[i] from commandline. + * + * 2. Double dash ("--") is not automatically prepended to + * CMDLINE_OPTION::longname. (If you desire any leading dash, include it + * explicitly in CMDLINE_OPTION initialization.) + * + * 3. An argument (optionally after a whitespace) is required (the flag + * CMDLINE_OPTFLAG_COMPILERLIKE implicitly implies also the flag + * CMDLINE_OPTFLAG_REQUIREDARG). + * + * But there is no delimiter expected (no "=" between the option and its + * argument). Whitespace is optional between the option and its argument. + * + * Intended use is for options similar to what many compilers accept. + * For example: + * -DDEBUG=0 (-D is the option, DEBUG=0 is the argument). + * -Isrc/include (-I is the option, src/include is the argument). + * -isystem /usr/include (-isystem is the option, /usr/include is the argument). + * -lmath (-l is the option, math is the argument). + */ +#define CMDLINE_OPTFLAG_COMPILERLIKE 0x0004 + + +/* Special (reserved) option IDs. Do not use these for any CMDLINE_OPTION::id. + * See documentation of cmdline_read() to get info about their meaning. + */ +#define CMDLINE_OPTID_NONE 0 +#define CMDLINE_OPTID_UNKNOWN (-0x7fffffff + 0) +#define CMDLINE_OPTID_MISSINGARG (-0x7fffffff + 1) +#define CMDLINE_OPTID_BOGUSARG (-0x7fffffff + 2) + + +typedef struct CMDLINE_OPTION { + char shortname; /* Short (single char) option or 0. */ + const char* longname; /* Long name (after "--") or NULL. */ + int id; /* Non-zero ID to identify the option in the callback; or zero to denote end of options list. */ + unsigned flags; /* Bitmask of CMDLINE_OPTFLAG_xxxx flags. */ +} CMDLINE_OPTION; + + +/* Parses all options and their arguments as specified by argc, argv accordingly + * with the given options (except argv[0] which is ignored). + * + * The caller must specify the list of supported options in the 1st parameter + * of the function. The array must end with a record whose CMDLINE_OPTION::id + * is zero to zero. + * + * The provided callback function is called for each option on the command + * line so that: + * + * -- the "id" refers to the id of the option as specified in options[]. + * + * -- the "arg" specifies an argument of the option or NULL if none is + * provided. + * + * -- the "userdata" just allows to pass in some caller's context into + * the callback. + * + * Special cases (recognized via special "id" value) are reported to the + * callback as follows: + * + * -- If id is CMDLINE_OPTID_NONE, the callback informs about a non-option + * also known as a positional argument. + * + * All argv[] tokens which are not interpreted as an options or an argument + * of any option fall into this category. + * + * Usually, programs interpret these as paths to file to process. + * + * -- If id is CMDLINE_OPTID_UNKNOWN, the corresponding argv[] looks like an + * option but it is not found in the options[] passed to cmdline_read(). + * + * The callback's parameter arg specifies the guilty command line token. + * Usually, program writes down an error message and exits. + * + * -- If id is CMDLINE_OPTID_MISSINGARG, the given option is valid but its + * flag in options[] requires an argument; yet there is none on the + * command line. + * + * The callback's parameter arg specifies the guilty option name. + * Usually, program writes down an error message and exits. + * + * -- If id is CMDLINE_OPTID_BOGUSARG, the given option is valid but its + * flag in options[] does not expect an argument; yet the command line + * does provide one. + * + * The callback's parameter arg specifies the guilty option name. + * Usually, program writes down an error message and exits. + * + * On success, zero is returned. + * + * If the callback returns a non-zero, cmdline_read() aborts immediately and + * cmdline_read() propagates the same return value to the caller. + */ + +int cmdline_read(const CMDLINE_OPTION* options, int argc, char** argv, + int (*callback)(int /*id*/, const char* /*arg*/, void* /*userdata*/), + void* userdata); + + +#ifdef __cplusplus +} /* extern "C" { */ +#endif + +#endif /* CRE_CMDLINE_H */ diff --git a/tdemarkdown/md4c/md2html/md2html.1 b/tdemarkdown/md4c/md2html/md2html.1 new file mode 100644 index 000000000..cffaee8b8 --- /dev/null +++ b/tdemarkdown/md4c/md2html/md2html.1 @@ -0,0 +1,113 @@ +.TH MD2HTML 1 "June 2019" "" "General Commands Manual" +.nh +.ad l +. +.SH NAME +. +md2html \- convert Markdown to HTML +. +.SH SYNOPSIS +. +.B md2html +.RI [ OPTION ]...\& +.RI [ FILE ] +. +.SH OPTIONS +. +.SS General options: +. +.TP +.BR -o ", " --output= \fIOUTFILE\fR +Write output to \fIOUTFILE\fR instead of \fBstdout\fR(3) +. +.TP +.BR -f ", " --full-html +Generate full HTML document, including header +. +.TP +.BR -s ", " --stat +Measure time of input parsing +. +.TP +.BR -h ", " --help +Display help and exit +. +.TP +.BR -v ", " --version +Display version and exit +. +.SS Markdown dialect options: +. +.TP +.B --commonmark +CommonMark (the default) +. +.TP +.B --github +Github Flavored Markdown +. +.PP +Note: dialect options are equivalent to some combination of flags below. +. +.SS Markdown extension options: +. +.TP +.B --fcollapse-whitespace +Collapse non-trivial whitespace +. +.TP +.B --fverbatim-entities +Do not translate entities +. +.TP +.B --fpermissive-atx-headers +Allow ATX headers without delimiting space +. +.TP +.B --fpermissive-url-autolinks +Allow URL autolinks without "<" and ">" delimiters +. +.TP +.B --fpermissive-www-autolinks +Allow WWW autolinks without any scheme (e.g. "www.example.com") +. +.TP +.B --fpermissive-email-autolinks +Allow e-mail autolinks without "<", ">" and "mailto:" +. +.TP +.B --fpermissive-autolinks +Enable all 3 of the above permissive autolinks options +. +.TP +.B --fno-indented-code +Disable indented code blocks +. +.TP +.B --fno-html-blocks +Disable raw HTML blocks +. +.TP +.B --fno-html-spans +Disable raw HTML spans +. +.TP +.B --fno-html +Same as \fB--fno-html-blocks --fno-html-spans\fR +. +.TP +.B --ftables +Enable tables +. +.TP +.B --fstrikethrough +Enable strikethrough spans +. +.TP +.B --ftasklists +Enable task lists +. +.SH SEE ALSO +. +https://github.com/mity/md4c +. diff --git a/tdemarkdown/md4c/md2html/md2html.c b/tdemarkdown/md4c/md2html/md2html.c new file mode 100644 index 000000000..06b2b74b1 --- /dev/null +++ b/tdemarkdown/md4c/md2html/md2html.c @@ -0,0 +1,383 @@ +/* + * MD4C: Markdown parser for C + * (http://github.com/mity/md4c) + * + * Copyright (c) 2016-2020 Martin Mitas + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <time.h> + +#include "md4c-html.h" +#include "cmdline.h" + + + +/* Global options. */ +static unsigned parser_flags = 0; +#ifndef MD4C_USE_ASCII + static unsigned renderer_flags = MD_HTML_FLAG_DEBUG | MD_HTML_FLAG_SKIP_UTF8_BOM; +#else + static unsigned renderer_flags = MD_HTML_FLAG_DEBUG; +#endif +static int want_fullhtml = 0; +static int want_xhtml = 0; +static int want_stat = 0; + + +/********************************* + *** Simple grow-able buffer *** + *********************************/ + +/* We render to a memory buffer instead of directly outputting the rendered + * documents, as this allows using this utility for evaluating performance + * of MD4C (--stat option). This allows us to measure just time of the parser, + * without the I/O. + */ + +struct membuffer { + char* data; + size_t asize; + size_t size; +}; + +static void +membuf_init(struct membuffer* buf, MD_SIZE new_asize) +{ + buf->size = 0; + buf->asize = new_asize; + buf->data = malloc(buf->asize); + if(buf->data == NULL) { + fprintf(stderr, "membuf_init: malloc() failed.\n"); + exit(1); + } +} + +static void +membuf_fini(struct membuffer* buf) +{ + if(buf->data) + free(buf->data); +} + +static void +membuf_grow(struct membuffer* buf, size_t new_asize) +{ + buf->data = realloc(buf->data, new_asize); + if(buf->data == NULL) { + fprintf(stderr, "membuf_grow: realloc() failed.\n"); + exit(1); + } + buf->asize = new_asize; +} + +static void +membuf_append(struct membuffer* buf, const char* data, MD_SIZE size) +{ + if(buf->asize < buf->size + size) + membuf_grow(buf, buf->size + buf->size / 2 + size); + memcpy(buf->data + buf->size, data, size); + buf->size += size; +} + + +/********************** + *** Main program *** + **********************/ + +static void +process_output(const MD_CHAR* text, MD_SIZE size, void* userdata) +{ + membuf_append((struct membuffer*) userdata, text, size); +} + +static int +process_file(FILE* in, FILE* out) +{ + size_t n; + struct membuffer buf_in = {0}; + struct membuffer buf_out = {0}; + int ret = -1; + clock_t t0, t1; + + membuf_init(&buf_in, 32 * 1024); + + /* Read the input file into a buffer. */ + while(1) { + if(buf_in.size >= buf_in.asize) + membuf_grow(&buf_in, buf_in.asize + buf_in.asize / 2); + + n = fread(buf_in.data + buf_in.size, 1, buf_in.asize - buf_in.size, in); + if(n == 0) + break; + buf_in.size += n; + } + + /* Input size is good estimation of output size. Add some more reserve to + * deal with the HTML header/footer and tags. */ + membuf_init(&buf_out, (MD_SIZE)(buf_in.size + buf_in.size/8 + 64)); + + /* Parse the document. This shall call our callbacks provided via the + * md_renderer_t structure. */ + t0 = clock(); + + ret = md_html(buf_in.data, (MD_SIZE)buf_in.size, process_output, (void*) &buf_out, + parser_flags, renderer_flags); + + t1 = clock(); + if(ret != 0) { + fprintf(stderr, "Parsing failed.\n"); + goto out; + } + + /* Write down the document in the HTML format. */ + if(want_fullhtml) { + if(want_xhtml) { + fprintf(out, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"); + fprintf(out, "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" " + "\"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">\n"); + fprintf(out, "<html xmlns=\"http://www.w3.org/1999/xhtml\">\n"); + } else { + fprintf(out, "<!DOCTYPE html>\n"); + fprintf(out, "<html>\n"); + } + fprintf(out, "<head>\n"); + fprintf(out, "<title></title>\n"); + fprintf(out, "<meta name=\"generator\" content=\"md2html\"%s>\n", want_xhtml ? " /" : ""); + fprintf(out, "</head>\n"); + fprintf(out, "<body>\n"); + } + + fwrite(buf_out.data, 1, buf_out.size, out); + + if(want_fullhtml) { + fprintf(out, "</body>\n"); + fprintf(out, "</html>\n"); + } + + if(want_stat) { + if(t0 != (clock_t)-1 && t1 != (clock_t)-1) { + double elapsed = (double)(t1 - t0) / CLOCKS_PER_SEC; + if (elapsed < 1) + fprintf(stderr, "Time spent on parsing: %7.2f ms.\n", elapsed*1e3); + else + fprintf(stderr, "Time spent on parsing: %6.3f s.\n", elapsed); + } + } + + /* Success if we have reached here. */ + ret = 0; + +out: + membuf_fini(&buf_in); + membuf_fini(&buf_out); + + return ret; +} + + +static const CMDLINE_OPTION cmdline_options[] = { + { 'o', "output", 'o', CMDLINE_OPTFLAG_REQUIREDARG }, + { 'f', "full-html", 'f', 0 }, + { 'x', "xhtml", 'x', 0 }, + { 's', "stat", 's', 0 }, + { 'h', "help", 'h', 0 }, + { 'v', "version", 'v', 0 }, + + { 0, "commonmark", 'c', 0 }, + { 0, "github", 'g', 0 }, + + { 0, "fcollapse-whitespace", 'W', 0 }, + { 0, "flatex-math", 'L', 0 }, + { 0, "fpermissive-atx-headers", 'A', 0 }, + { 0, "fpermissive-autolinks", 'V', 0 }, + { 0, "fpermissive-email-autolinks", '@', 0 }, + { 0, "fpermissive-url-autolinks", 'U', 0 }, + { 0, "fpermissive-www-autolinks", '.', 0 }, + { 0, "fstrikethrough", 'S', 0 }, + { 0, "ftables", 'T', 0 }, + { 0, "ftasklists", 'X', 0 }, + { 0, "funderline", '_', 0 }, + { 0, "fverbatim-entities", 'E', 0 }, + { 0, "fwiki-links", 'K', 0 }, + + { 0, "fno-html-blocks", 'F', 0 }, + { 0, "fno-html-spans", 'G', 0 }, + { 0, "fno-html", 'H', 0 }, + { 0, "fno-indented-code", 'I', 0 }, + + { 0, NULL, 0, 0 } +}; + +static void +usage(void) +{ + printf( + "Usage: md2html [OPTION]... [FILE]\n" + "Convert input FILE (or standard input) in Markdown format to HTML.\n" + "\n" + "General options:\n" + " -o --output=FILE Output file (default is standard output)\n" + " -f, --full-html Generate full HTML document, including header\n" + " -x, --xhtml Generate XHTML instead of HTML\n" + " -s, --stat Measure time of input parsing\n" + " -h, --help Display this help and exit\n" + " -v, --version Display version and exit\n" + "\n" + "Markdown dialect options:\n" + "(note these are equivalent to some combinations of the flags below)\n" + " --commonmark CommonMark (this is default)\n" + " --github Github Flavored Markdown\n" + "\n" + "Markdown extension options:\n" + " --fcollapse-whitespace\n" + " Collapse non-trivial whitespace\n" + " --flatex-math Enable LaTeX style mathematics spans\n" + " --fpermissive-atx-headers\n" + " Allow ATX headers without delimiting space\n" + " --fpermissive-url-autolinks\n" + " Allow URL autolinks without '<', '>'\n" + " --fpermissive-www-autolinks\n" + " Allow WWW autolinks without any scheme (e.g. 'www.example.com')\n" + " --fpermissive-email-autolinks \n" + " Allow e-mail autolinks without '<', '>' and 'mailto:'\n" + " --fpermissive-autolinks\n" + " Same as --fpermissive-url-autolinks --fpermissive-www-autolinks\n" + " --fpermissive-email-autolinks\n" + " --fstrikethrough Enable strike-through spans\n" + " --ftables Enable tables\n" + " --ftasklists Enable task lists\n" + " --funderline Enable underline spans\n" + " --fwiki-links Enable wiki links\n" + "\n" + "Markdown suppression options:\n" + " --fno-html-blocks\n" + " Disable raw HTML blocks\n" + " --fno-html-spans\n" + " Disable raw HTML spans\n" + " --fno-html Same as --fno-html-blocks --fno-html-spans\n" + " --fno-indented-code\n" + " Disable indented code blocks\n" + "\n" + "HTML generator options:\n" + " --fverbatim-entities\n" + " Do not translate entities\n" + "\n" + ); +} + +static void +version(void) +{ + printf("%d.%d.%d\n", MD_VERSION_MAJOR, MD_VERSION_MINOR, MD_VERSION_RELEASE); +} + +static const char* input_path = NULL; +static const char* output_path = NULL; + +static int +cmdline_callback(int opt, char const* value, void* data) +{ + switch(opt) { + case 0: + if(input_path) { + fprintf(stderr, "Too many arguments. Only one input file can be specified.\n"); + fprintf(stderr, "Use --help for more info.\n"); + exit(1); + } + input_path = value; + break; + + case 'o': output_path = value; break; + case 'f': want_fullhtml = 1; break; + case 'x': want_xhtml = 1; renderer_flags |= MD_HTML_FLAG_XHTML; break; + case 's': want_stat = 1; break; + case 'h': usage(); exit(0); break; + case 'v': version(); exit(0); break; + + case 'c': parser_flags |= MD_DIALECT_COMMONMARK; break; + case 'g': parser_flags |= MD_DIALECT_GITHUB; break; + + case 'E': renderer_flags |= MD_HTML_FLAG_VERBATIM_ENTITIES; break; + case 'A': parser_flags |= MD_FLAG_PERMISSIVEATXHEADERS; break; + case 'I': parser_flags |= MD_FLAG_NOINDENTEDCODEBLOCKS; break; + case 'F': parser_flags |= MD_FLAG_NOHTMLBLOCKS; break; + case 'G': parser_flags |= MD_FLAG_NOHTMLSPANS; break; + case 'H': parser_flags |= MD_FLAG_NOHTML; break; + case 'W': parser_flags |= MD_FLAG_COLLAPSEWHITESPACE; break; + case 'U': parser_flags |= MD_FLAG_PERMISSIVEURLAUTOLINKS; break; + case '.': parser_flags |= MD_FLAG_PERMISSIVEWWWAUTOLINKS; break; + case '@': parser_flags |= MD_FLAG_PERMISSIVEEMAILAUTOLINKS; break; + case 'V': parser_flags |= MD_FLAG_PERMISSIVEAUTOLINKS; break; + case 'T': parser_flags |= MD_FLAG_TABLES; break; + case 'S': parser_flags |= MD_FLAG_STRIKETHROUGH; break; + case 'L': parser_flags |= MD_FLAG_LATEXMATHSPANS; break; + case 'K': parser_flags |= MD_FLAG_WIKILINKS; break; + case 'X': parser_flags |= MD_FLAG_TASKLISTS; break; + case '_': parser_flags |= MD_FLAG_UNDERLINE; break; + + default: + fprintf(stderr, "Illegal option: %s\n", value); + fprintf(stderr, "Use --help for more info.\n"); + exit(1); + break; + } + + return 0; +} + +int +main(int argc, char** argv) +{ + FILE* in = stdin; + FILE* out = stdout; + int ret = 0; + + if(cmdline_read(cmdline_options, argc, argv, cmdline_callback, NULL) != 0) { + usage(); + exit(1); + } + + if(input_path != NULL && strcmp(input_path, "-") != 0) { + in = fopen(input_path, "rb"); + if(in == NULL) { + fprintf(stderr, "Cannot open %s.\n", input_path); + exit(1); + } + } + if(output_path != NULL && strcmp(output_path, "-") != 0) { + out = fopen(output_path, "wt"); + if(out == NULL) { + fprintf(stderr, "Cannot open %s.\n", output_path); + exit(1); + } + } + + ret = process_file(in, out); + if(in != stdin) + fclose(in); + if(out != stdout) + fclose(out); + + return ret; +} diff --git a/tdemarkdown/md4c/scripts/build_folding_map.py b/tdemarkdown/md4c/scripts/build_folding_map.py new file mode 100644 index 000000000..b401775f5 --- /dev/null +++ b/tdemarkdown/md4c/scripts/build_folding_map.py @@ -0,0 +1,120 @@ +#!/usr/bin/env python3 + +import os +import sys +import textwrap + + +self_path = os.path.dirname(os.path.realpath(__file__)); +f = open(self_path + "/unicode/CaseFolding.txt", "r") + +status_list = [ "C", "F" ] + +folding_list = [ dict(), dict(), dict() ] + +# Filter the foldings for "full" folding. +for line in f: + comment_off = line.find("#") + if comment_off >= 0: + line = line[:comment_off] + line = line.strip() + if not line: + continue + + raw_codepoint, status, raw_mapping, ignored_tail = line.split(";", 3) + if not status.strip() in status_list: + continue + codepoint = int(raw_codepoint.strip(), 16) + mapping = [int(it, 16) for it in raw_mapping.strip().split(" ")] + mapping_len = len(mapping) + + if mapping_len in range(1, 4): + folding_list[mapping_len-1][codepoint] = mapping + else: + assert(False) +f.close() + + +# If we assume that (index0 ... index-1) makes a range (as defined below), +# check that the newly provided index is compatible with the range too; i.e. +# verify that the range can be extended without breaking its properties. +# +# Currently, we can handle ranges which: +# +# (1) either form consecutive sequence of codepoints and which map that range +# to other consecutive range of codepoints (of the same length); +# +# (2) or a consecutive sequence of codepoints with step 2 where each codepoint +# CP is mapped to the codepoint CP+1 +# (e.g. 0x1234 -> 0x1235; 0x1236 -> 0x1237; 0x1238 -> 0x1239; ...). +# +# Note: When the codepoints in the range are mapped to multiple codepoints, +# only the 1st mapped codepoint is considered. All the other ones have to be +# shared by all the mappings covered by the range. +def is_range_compatible(folding, codepoint_list, index0, index): + N = index - index0 + codepoint0 = codepoint_list[index0] + codepoint1 = codepoint_list[index0+1] + codepointN = codepoint_list[index] + mapping0 = folding[codepoint0] + mapping1 = folding[codepoint1] + mappingN = folding[codepointN] + + # Check the range type (1): + if codepoint1 - codepoint0 == 1 and codepointN - codepoint0 == N \ + and mapping1[0] - mapping0[0] == 1 and mapping1[1:] == mapping0[1:] \ + and mappingN[0] - mapping0[0] == N and mappingN[1:] == mapping0[1:]: + return True + + # Check the range type (2): + if codepoint1 - codepoint0 == 2 and codepointN - codepoint0 == 2 * N \ + and mapping0[0] - codepoint0 == 1 \ + and mapping1[0] - codepoint1 == 1 and mapping1[1:] == mapping0[1:] \ + and mappingN[0] - codepointN == 1 and mappingN[1:] == mapping0[1:]: + return True + + return False + + +def mapping_str(list, mapping): + return ",".join("0x{:04x}".format(x) for x in mapping) + +for mapping_len in range(1, 4): + folding = folding_list[mapping_len-1] + codepoint_list = list(folding) + + index0 = 0 + count = len(folding) + + records = list() + data_records = list() + + while index0 < count: + index1 = index0 + 1 + while index1 < count and is_range_compatible(folding, codepoint_list, index0, index1): + index1 += 1 + + if index1 - index0 > 2: + # Range of codepoints + records.append("R(0x{:04x},0x{:04x})".format(codepoint_list[index0], codepoint_list[index1-1])) + data_records.append(mapping_str(data_records, folding[codepoint_list[index0]])) + data_records.append(mapping_str(data_records, folding[codepoint_list[index1-1]])) + index0 = index1 + else: + # Single codepoint + records.append("S(0x{:04x})".format(codepoint_list[index0])) + data_records.append(mapping_str(data_records, folding[codepoint_list[index0]])) + index0 += 1 + + sys.stdout.write("static const unsigned FOLD_MAP_{}[] = {{\n".format(mapping_len)) + sys.stdout.write("\n".join(textwrap.wrap(", ".join(records), 110, + initial_indent = " ", subsequent_indent=" "))) + sys.stdout.write("\n};\n") + + sys.stdout.write("static const unsigned FOLD_MAP_{}_DATA[] = {{\n".format(mapping_len)) + sys.stdout.write("\n".join(textwrap.wrap(", ".join(data_records), 110, + initial_indent = " ", subsequent_indent=" "))) + sys.stdout.write("\n};\n") + + + diff --git a/tdemarkdown/md4c/scripts/build_punct_map.py b/tdemarkdown/md4c/scripts/build_punct_map.py new file mode 100644 index 000000000..13102f26f --- /dev/null +++ b/tdemarkdown/md4c/scripts/build_punct_map.py @@ -0,0 +1,66 @@ +#!/usr/bin/env python3 + +import os +import sys +import textwrap + + +self_path = os.path.dirname(os.path.realpath(__file__)); +f = open(self_path + "/unicode/DerivedGeneralCategory.txt", "r") + +codepoint_list = [] +category_list = [ "Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps" ] + +# Filter codepoints falling in the right category: +for line in f: + comment_off = line.find("#") + if comment_off >= 0: + line = line[:comment_off] + line = line.strip() + if not line: + continue + + char_range, category = line.split(";") + char_range = char_range.strip() + category = category.strip() + + if not category in category_list: + continue + + delim_off = char_range.find("..") + if delim_off >= 0: + codepoint0 = int(char_range[:delim_off], 16) + codepoint1 = int(char_range[delim_off+2:], 16) + for codepoint in range(codepoint0, codepoint1 + 1): + codepoint_list.append(codepoint) + else: + codepoint = int(char_range, 16) + codepoint_list.append(codepoint) +f.close() + + +codepoint_list.sort() + + +index0 = 0 +count = len(codepoint_list) + +records = list() +while index0 < count: + index1 = index0 + 1 + while index1 < count and codepoint_list[index1] == codepoint_list[index1-1] + 1: + index1 += 1 + + if index1 - index0 > 1: + # Range of codepoints + records.append("R(0x{:04x},0x{:04x})".format(codepoint_list[index0], codepoint_list[index1-1])) + else: + # Single codepoint + records.append("S(0x{:04x})".format(codepoint_list[index0])) + + index0 = index1 + +sys.stdout.write("static const unsigned PUNCT_MAP[] = {\n") +sys.stdout.write("\n".join(textwrap.wrap(", ".join(records), 110, + initial_indent = " ", subsequent_indent=" "))) +sys.stdout.write("\n};\n\n") diff --git a/tdemarkdown/md4c/scripts/build_whitespace_map.py b/tdemarkdown/md4c/scripts/build_whitespace_map.py new file mode 100644 index 000000000..932b5716e --- /dev/null +++ b/tdemarkdown/md4c/scripts/build_whitespace_map.py @@ -0,0 +1,66 @@ +#!/usr/bin/env python3 + +import os +import sys +import textwrap + + +self_path = os.path.dirname(os.path.realpath(__file__)); +f = open(self_path + "/unicode/DerivedGeneralCategory.txt", "r") + +codepoint_list = [] +category_list = [ "Zs" ] + +# Filter codepoints falling in the right category: +for line in f: + comment_off = line.find("#") + if comment_off >= 0: + line = line[:comment_off] + line = line.strip() + if not line: + continue + + char_range, category = line.split(";") + char_range = char_range.strip() + category = category.strip() + + if not category in category_list: + continue + + delim_off = char_range.find("..") + if delim_off >= 0: + codepoint0 = int(char_range[:delim_off], 16) + codepoint1 = int(char_range[delim_off+2:], 16) + for codepoint in range(codepoint0, codepoint1 + 1): + codepoint_list.append(codepoint) + else: + codepoint = int(char_range, 16) + codepoint_list.append(codepoint) +f.close() + + +codepoint_list.sort() + + +index0 = 0 +count = len(codepoint_list) + +records = list() +while index0 < count: + index1 = index0 + 1 + while index1 < count and codepoint_list[index1] == codepoint_list[index1-1] + 1: + index1 += 1 + + if index1 - index0 > 1: + # Range of codepoints + records.append("R(0x{:04x},0x{:04x})".format(codepoint_list[index0], codepoint_list[index1-1])) + else: + # Single codepoint + records.append("S(0x{:04x})".format(codepoint_list[index0])) + + index0 = index1 + +sys.stdout.write("static const unsigned WHITESPACE_MAP[] = {\n") +sys.stdout.write("\n".join(textwrap.wrap(", ".join(records), 110, + initial_indent = " ", subsequent_indent=" "))) +sys.stdout.write("\n};\n\n") diff --git a/tdemarkdown/md4c/scripts/coverity.sh b/tdemarkdown/md4c/scripts/coverity.sh new file mode 100755 index 000000000..bcf7f140b --- /dev/null +++ b/tdemarkdown/md4c/scripts/coverity.sh @@ -0,0 +1,70 @@ +#!/bin/sh +# +# This scripts attempts to build the project via cov-build utility, and prepare +# a package for uploading to the coverity scan service. +# +# (See http://scan.coverity.com for more info.) + +set -e + +# Check presence of coverity static analyzer. +if ! which cov-build; then + echo "Utility cov-build not found in PATH." + exit 1 +fi + +# Choose a build system (ninja or GNU make). +if which ninja; then + BUILD_TOOL=ninja + GENERATOR=Ninja +elif which make; then + BUILD_TOOL=make + GENERATOR="MSYS Makefiles" +else + echo "No suitable build system found." + exit 1 +fi + +# Choose a zip tool. +if which 7za; then + MKZIP="7za a -r -mx9" +elif which 7z; then + MKZIP="7z a -r -mx9" +elif which zip; then + MKZIP="zip -r" +else + echo "No suitable zip utility found" + exit 1 +fi + +# Change dir to project root. +cd `dirname "$0"`/.. + +CWD=`pwd` +ROOT_DIR="$CWD" +BUILD_DIR="$CWD/coverity" +OUTPUT="$CWD/cov-int.zip" + +# Sanity checks. +if [ ! -x "$ROOT_DIR/scripts/coverity.sh" ]; then + echo "There is some path mismatch." + exit 1 +fi +if [ -e "$BUILD_DIR" ]; then + echo "Path $BUILD_DIR already exists. Delete it and retry." + exit 1 +fi +if [ -e "$OUTPUT" ]; then + echo "Path $OUTPUT already exists. Delete it and retry." + exit 1 +fi + +# Build the project with the Coverity analyzes enabled. +mkdir -p "$BUILD_DIR" +cd "$BUILD_DIR" +cmake -G "$GENERATOR" "$ROOT_DIR" +cov-build --dir cov-int "$BUILD_TOOL" +$MKZIP "$OUTPUT" "cov-int" +cd "$ROOT_DIR" +rm -rf "$BUILD_DIR" + diff --git a/tdemarkdown/md4c/scripts/run-tests.sh b/tdemarkdown/md4c/scripts/run-tests.sh new file mode 100755 index 000000000..c00b36a92 --- /dev/null +++ b/tdemarkdown/md4c/scripts/run-tests.sh @@ -0,0 +1,75 @@ +#!/bin/sh +# +# Run this script from build directory. + +#set -e + +SELF_DIR=`dirname $0` +PROJECT_DIR="$SELF_DIR/.." +TEST_DIR="$PROJECT_DIR/test" + + +PROGRAM="md2html/md2html" +if [ ! -x "$PROGRAM" ]; then + echo "Cannot find the $PROGRAM." >&2 + echo "You have to run this script from the build directory." >&2 + exit 1 +fi + +if which py >>/dev/null 2>&1; then + PYTHON=py +elif which python3 >>/dev/null 2>&1; then + PYTHON=python3 +elif which python >>/dev/null 2>&1; then + if [ `python --version | awk '{print $2}' | cut -d. -f1` -ge 3 ]; then + PYTHON=python + fi +fi + +echo +echo "CommonMark specification:" +$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/spec.txt" -p "$PROGRAM" + +echo +echo "Code coverage & regressions:" +$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/coverage.txt" -p "$PROGRAM" + +echo +echo "Permissive e-mail autolinks extension:" +$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/permissive-email-autolinks.txt" -p "$PROGRAM --fpermissive-email-autolinks" + +echo +echo "Permissive URL autolinks extension:" +$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/permissive-url-autolinks.txt" -p "$PROGRAM --fpermissive-url-autolinks" + +echo +echo "WWW autolinks extension:" +$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/permissive-www-autolinks.txt" -p "$PROGRAM --fpermissive-www-autolinks" + +echo +echo "Tables extension:" +$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/tables.txt" -p "$PROGRAM --ftables" + +echo +echo "Strikethrough extension:" +$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/strikethrough.txt" -p "$PROGRAM --fstrikethrough" + +echo +echo "Task lists extension:" +$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/tasklists.txt" -p "$PROGRAM --ftasklists" + +echo +echo "LaTeX extension:" +$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/latex-math.txt" -p "$PROGRAM --flatex-math" + +echo +echo "Wiki links extension:" +$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/wiki-links.txt" -p "$PROGRAM --fwiki-links --ftables" + +echo +echo "Underline extension:" +$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/underline.txt" -p "$PROGRAM --funderline" + +echo +echo "Pathological input:" +$PYTHON "$TEST_DIR/pathological_tests.py" -p "$PROGRAM" diff --git a/tdemarkdown/md4c/scripts/unicode/CaseFolding.txt b/tdemarkdown/md4c/scripts/unicode/CaseFolding.txt new file mode 100644 index 000000000..033788b25 --- /dev/null +++ b/tdemarkdown/md4c/scripts/unicode/CaseFolding.txt @@ -0,0 +1,1584 @@ +# CaseFolding-13.0.0.txt +# Date: 2019-09-08, 23:30:59 GMT +# © 2019 Unicode®, Inc. +# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries. +# For terms of use, see http://www.unicode.org/terms_of_use.html +# +# Unicode Character Database +# For documentation, see http://www.unicode.org/reports/tr44/ +# +# Case Folding Properties +# +# This file is a supplement to the UnicodeData file. +# It provides a case folding mapping generated from the Unicode Character Database. +# If all characters are mapped according to the full mapping below, then +# case differences (according to UnicodeData.txt and SpecialCasing.txt) +# are eliminated. +# +# The data supports both implementations that require simple case foldings +# (where string lengths don't change), and implementations that allow full case folding +# (where string lengths may grow). Note that where they can be supported, the +# full case foldings are superior: for example, they allow "MASSE" and "Maße" to match. +# +# All code points not listed in this file map to themselves. +# +# NOTE: case folding does not preserve normalization formats! +# +# For information on case folding, including how to have case folding +# preserve normalization formats, see Section 3.13 Default Case Algorithms in +# The Unicode Standard. +# +# ================================================================================ +# Format +# ================================================================================ +# The entries in this file are in the following machine-readable format: +# +# <code>; <status>; <mapping>; # <name> +# +# The status field is: +# C: common case folding, common mappings shared by both simple and full mappings. +# F: full case folding, mappings that cause strings to grow in length. Multiple characters are separated by spaces. +# S: simple case folding, mappings to single characters where different from F. +# T: special case for uppercase I and dotted uppercase I +# - For non-Turkic languages, this mapping is normally not used. +# - For Turkic languages (tr, az), this mapping can be used instead of the normal mapping for these characters. +# Note that the Turkic mappings do not maintain canonical equivalence without additional processing. +# See the discussions of case mapping in the Unicode Standard for more information. +# +# Usage: +# A. To do a simple case folding, use the mappings with status C + S. +# B. To do a full case folding, use the mappings with status C + F. +# +# The mappings with status T can be used or omitted depending on the desired case-folding +# behavior. (The default option is to exclude them.) +# +# ================================================================= + +# Property: Case_Folding + +# All code points not explicitly listed for Case_Folding +# have the value C for the status field, and the code point itself for the mapping field. + +# ================================================================= +0041; C; 0061; # LATIN CAPITAL LETTER A +0042; C; 0062; # LATIN CAPITAL LETTER B +0043; C; 0063; # LATIN CAPITAL LETTER C +0044; C; 0064; # LATIN CAPITAL LETTER D +0045; C; 0065; # LATIN CAPITAL LETTER E +0046; C; 0066; # LATIN CAPITAL LETTER F +0047; C; 0067; # LATIN CAPITAL LETTER G +0048; C; 0068; # LATIN CAPITAL LETTER H +0049; C; 0069; # LATIN CAPITAL LETTER I +0049; T; 0131; # LATIN CAPITAL LETTER I +004A; C; 006A; # LATIN CAPITAL LETTER J +004B; C; 006B; # LATIN CAPITAL LETTER K +004C; C; 006C; # LATIN CAPITAL LETTER L +004D; C; 006D; # LATIN CAPITAL LETTER M +004E; C; 006E; # LATIN CAPITAL LETTER N +004F; C; 006F; # LATIN CAPITAL LETTER O +0050; C; 0070; # LATIN CAPITAL LETTER P +0051; C; 0071; # LATIN CAPITAL LETTER Q +0052; C; 0072; # LATIN CAPITAL LETTER R +0053; C; 0073; # LATIN CAPITAL LETTER S +0054; C; 0074; # LATIN CAPITAL LETTER T +0055; C; 0075; # LATIN CAPITAL LETTER U +0056; C; 0076; # LATIN CAPITAL LETTER V +0057; C; 0077; # LATIN CAPITAL LETTER W +0058; C; 0078; # LATIN CAPITAL LETTER X +0059; C; 0079; # LATIN CAPITAL LETTER Y +005A; C; 007A; # LATIN CAPITAL LETTER Z +00B5; C; 03BC; # MICRO SIGN +00C0; C; 00E0; # LATIN CAPITAL LETTER A WITH GRAVE +00C1; C; 00E1; # LATIN CAPITAL LETTER A WITH ACUTE +00C2; C; 00E2; # LATIN CAPITAL LETTER A WITH CIRCUMFLEX +00C3; C; 00E3; # LATIN CAPITAL LETTER A WITH TILDE +00C4; C; 00E4; # LATIN CAPITAL LETTER A WITH DIAERESIS +00C5; C; 00E5; # LATIN CAPITAL LETTER A WITH RING ABOVE +00C6; C; 00E6; # LATIN CAPITAL LETTER AE +00C7; C; 00E7; # LATIN CAPITAL LETTER C WITH CEDILLA +00C8; C; 00E8; # LATIN CAPITAL LETTER E WITH GRAVE +00C9; C; 00E9; # LATIN CAPITAL LETTER E WITH ACUTE +00CA; C; 00EA; # LATIN CAPITAL LETTER E WITH CIRCUMFLEX +00CB; C; 00EB; # LATIN CAPITAL LETTER E WITH DIAERESIS +00CC; C; 00EC; # LATIN CAPITAL LETTER I WITH GRAVE +00CD; C; 00ED; # LATIN CAPITAL LETTER I WITH ACUTE +00CE; C; 00EE; # LATIN CAPITAL LETTER I WITH CIRCUMFLEX +00CF; C; 00EF; # LATIN CAPITAL LETTER I WITH DIAERESIS +00D0; C; 00F0; # LATIN CAPITAL LETTER ETH +00D1; C; 00F1; # LATIN CAPITAL LETTER N WITH TILDE +00D2; C; 00F2; # LATIN CAPITAL LETTER O WITH GRAVE +00D3; C; 00F3; # LATIN CAPITAL LETTER O WITH ACUTE +00D4; C; 00F4; # LATIN CAPITAL LETTER O WITH CIRCUMFLEX +00D5; C; 00F5; # LATIN CAPITAL LETTER O WITH TILDE +00D6; C; 00F6; # LATIN CAPITAL LETTER O WITH DIAERESIS +00D8; C; 00F8; # LATIN CAPITAL LETTER O WITH STROKE +00D9; C; 00F9; # LATIN CAPITAL LETTER U WITH GRAVE +00DA; C; 00FA; # LATIN CAPITAL LETTER U WITH ACUTE +00DB; C; 00FB; # LATIN CAPITAL LETTER U WITH CIRCUMFLEX +00DC; C; 00FC; # LATIN CAPITAL LETTER U WITH DIAERESIS +00DD; C; 00FD; # LATIN CAPITAL LETTER Y WITH ACUTE +00DE; C; 00FE; # LATIN CAPITAL LETTER THORN +00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S +0100; C; 0101; # LATIN CAPITAL LETTER A WITH MACRON +0102; C; 0103; # LATIN CAPITAL LETTER A WITH BREVE +0104; C; 0105; # LATIN CAPITAL LETTER A WITH OGONEK +0106; C; 0107; # LATIN CAPITAL LETTER C WITH ACUTE +0108; C; 0109; # LATIN CAPITAL LETTER C WITH CIRCUMFLEX +010A; C; 010B; # LATIN CAPITAL LETTER C WITH DOT ABOVE +010C; C; 010D; # LATIN CAPITAL LETTER C WITH CARON +010E; C; 010F; # LATIN CAPITAL LETTER D WITH CARON +0110; C; 0111; # LATIN CAPITAL LETTER D WITH STROKE +0112; C; 0113; # LATIN CAPITAL LETTER E WITH MACRON +0114; C; 0115; # LATIN CAPITAL LETTER E WITH BREVE +0116; C; 0117; # LATIN CAPITAL LETTER E WITH DOT ABOVE +0118; C; 0119; # LATIN CAPITAL LETTER E WITH OGONEK +011A; C; 011B; # LATIN CAPITAL LETTER E WITH CARON +011C; C; 011D; # LATIN CAPITAL LETTER G WITH CIRCUMFLEX +011E; C; 011F; # LATIN CAPITAL LETTER G WITH BREVE +0120; C; 0121; # LATIN CAPITAL LETTER G WITH DOT ABOVE +0122; C; 0123; # LATIN CAPITAL LETTER G WITH CEDILLA +0124; C; 0125; # LATIN CAPITAL LETTER H WITH CIRCUMFLEX +0126; C; 0127; # LATIN CAPITAL LETTER H WITH STROKE +0128; C; 0129; # LATIN CAPITAL LETTER I WITH TILDE +012A; C; 012B; # LATIN CAPITAL LETTER I WITH MACRON +012C; C; 012D; # LATIN CAPITAL LETTER I WITH BREVE +012E; C; 012F; # LATIN CAPITAL LETTER I WITH OGONEK +0130; F; 0069 0307; # LATIN CAPITAL LETTER I WITH DOT ABOVE +0130; T; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE +0132; C; 0133; # LATIN CAPITAL LIGATURE IJ +0134; C; 0135; # LATIN CAPITAL LETTER J WITH CIRCUMFLEX +0136; C; 0137; # LATIN CAPITAL LETTER K WITH CEDILLA +0139; C; 013A; # LATIN CAPITAL LETTER L WITH ACUTE +013B; C; 013C; # LATIN CAPITAL LETTER L WITH CEDILLA +013D; C; 013E; # LATIN CAPITAL LETTER L WITH CARON +013F; C; 0140; # LATIN CAPITAL LETTER L WITH MIDDLE DOT +0141; C; 0142; # LATIN CAPITAL LETTER L WITH STROKE +0143; C; 0144; # LATIN CAPITAL LETTER N WITH ACUTE +0145; C; 0146; # LATIN CAPITAL LETTER N WITH CEDILLA +0147; C; 0148; # LATIN CAPITAL LETTER N WITH CARON +0149; F; 02BC 006E; # LATIN SMALL LETTER N PRECEDED BY APOSTROPHE +014A; C; 014B; # LATIN CAPITAL LETTER ENG +014C; C; 014D; # LATIN CAPITAL LETTER O WITH MACRON +014E; C; 014F; # LATIN CAPITAL LETTER O WITH BREVE +0150; C; 0151; # LATIN CAPITAL LETTER O WITH DOUBLE ACUTE +0152; C; 0153; # LATIN CAPITAL LIGATURE OE +0154; C; 0155; # LATIN CAPITAL LETTER R WITH ACUTE +0156; C; 0157; # LATIN CAPITAL LETTER R WITH CEDILLA +0158; C; 0159; # LATIN CAPITAL LETTER R WITH CARON +015A; C; 015B; # LATIN CAPITAL LETTER S WITH ACUTE +015C; C; 015D; # LATIN CAPITAL LETTER S WITH CIRCUMFLEX +015E; C; 015F; # LATIN CAPITAL LETTER S WITH CEDILLA +0160; C; 0161; # LATIN CAPITAL LETTER S WITH CARON +0162; C; 0163; # LATIN CAPITAL LETTER T WITH CEDILLA +0164; C; 0165; # LATIN CAPITAL LETTER T WITH CARON +0166; C; 0167; # LATIN CAPITAL LETTER T WITH STROKE +0168; C; 0169; # LATIN CAPITAL LETTER U WITH TILDE +016A; C; 016B; # LATIN CAPITAL LETTER U WITH MACRON +016C; C; 016D; # LATIN CAPITAL LETTER U WITH BREVE +016E; C; 016F; # LATIN CAPITAL LETTER U WITH RING ABOVE +0170; C; 0171; # LATIN CAPITAL LETTER U WITH DOUBLE ACUTE +0172; C; 0173; # LATIN CAPITAL LETTER U WITH OGONEK +0174; C; 0175; # LATIN CAPITAL LETTER W WITH CIRCUMFLEX +0176; C; 0177; # LATIN CAPITAL LETTER Y WITH CIRCUMFLEX +0178; C; 00FF; # LATIN CAPITAL LETTER Y WITH DIAERESIS +0179; C; 017A; # LATIN CAPITAL LETTER Z WITH ACUTE +017B; C; 017C; # LATIN CAPITAL LETTER Z WITH DOT ABOVE +017D; C; 017E; # LATIN CAPITAL LETTER Z WITH CARON +017F; C; 0073; # LATIN SMALL LETTER LONG S +0181; C; 0253; # LATIN CAPITAL LETTER B WITH HOOK +0182; C; 0183; # LATIN CAPITAL LETTER B WITH TOPBAR +0184; C; 0185; # LATIN CAPITAL LETTER TONE SIX +0186; C; 0254; # LATIN CAPITAL LETTER OPEN O +0187; C; 0188; # LATIN CAPITAL LETTER C WITH HOOK +0189; C; 0256; # LATIN CAPITAL LETTER AFRICAN D +018A; C; 0257; # LATIN CAPITAL LETTER D WITH HOOK +018B; C; 018C; # LATIN CAPITAL LETTER D WITH TOPBAR +018E; C; 01DD; # LATIN CAPITAL LETTER REVERSED E +018F; C; 0259; # LATIN CAPITAL LETTER SCHWA +0190; C; 025B; # LATIN CAPITAL LETTER OPEN E +0191; C; 0192; # LATIN CAPITAL LETTER F WITH HOOK +0193; C; 0260; # LATIN CAPITAL LETTER G WITH HOOK +0194; C; 0263; # LATIN CAPITAL LETTER GAMMA +0196; C; 0269; # LATIN CAPITAL LETTER IOTA +0197; C; 0268; # LATIN CAPITAL LETTER I WITH STROKE +0198; C; 0199; # LATIN CAPITAL LETTER K WITH HOOK +019C; C; 026F; # LATIN CAPITAL LETTER TURNED M +019D; C; 0272; # LATIN CAPITAL LETTER N WITH LEFT HOOK +019F; C; 0275; # LATIN CAPITAL LETTER O WITH MIDDLE TILDE +01A0; C; 01A1; # LATIN CAPITAL LETTER O WITH HORN +01A2; C; 01A3; # LATIN CAPITAL LETTER OI +01A4; C; 01A5; # LATIN CAPITAL LETTER P WITH HOOK +01A6; C; 0280; # LATIN LETTER YR +01A7; C; 01A8; # LATIN CAPITAL LETTER TONE TWO +01A9; C; 0283; # LATIN CAPITAL LETTER ESH +01AC; C; 01AD; # LATIN CAPITAL LETTER T WITH HOOK +01AE; C; 0288; # LATIN CAPITAL LETTER T WITH RETROFLEX HOOK +01AF; C; 01B0; # LATIN CAPITAL LETTER U WITH HORN +01B1; C; 028A; # LATIN CAPITAL LETTER UPSILON +01B2; C; 028B; # LATIN CAPITAL LETTER V WITH HOOK +01B3; C; 01B4; # LATIN CAPITAL LETTER Y WITH HOOK +01B5; C; 01B6; # LATIN CAPITAL LETTER Z WITH STROKE +01B7; C; 0292; # LATIN CAPITAL LETTER EZH +01B8; C; 01B9; # LATIN CAPITAL LETTER EZH REVERSED +01BC; C; 01BD; # LATIN CAPITAL LETTER TONE FIVE +01C4; C; 01C6; # LATIN CAPITAL LETTER DZ WITH CARON +01C5; C; 01C6; # LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON +01C7; C; 01C9; # LATIN CAPITAL LETTER LJ +01C8; C; 01C9; # LATIN CAPITAL LETTER L WITH SMALL LETTER J +01CA; C; 01CC; # LATIN CAPITAL LETTER NJ +01CB; C; 01CC; # LATIN CAPITAL LETTER N WITH SMALL LETTER J +01CD; C; 01CE; # LATIN CAPITAL LETTER A WITH CARON +01CF; C; 01D0; # LATIN CAPITAL LETTER I WITH CARON +01D1; C; 01D2; # LATIN CAPITAL LETTER O WITH CARON +01D3; C; 01D4; # LATIN CAPITAL LETTER U WITH CARON +01D5; C; 01D6; # LATIN CAPITAL LETTER U WITH DIAERESIS AND MACRON +01D7; C; 01D8; # LATIN CAPITAL LETTER U WITH DIAERESIS AND ACUTE +01D9; C; 01DA; # LATIN CAPITAL LETTER U WITH DIAERESIS AND CARON +01DB; C; 01DC; # LATIN CAPITAL LETTER U WITH DIAERESIS AND GRAVE +01DE; C; 01DF; # LATIN CAPITAL LETTER A WITH DIAERESIS AND MACRON +01E0; C; 01E1; # LATIN CAPITAL LETTER A WITH DOT ABOVE AND MACRON +01E2; C; 01E3; # LATIN CAPITAL LETTER AE WITH MACRON +01E4; C; 01E5; # LATIN CAPITAL LETTER G WITH STROKE +01E6; C; 01E7; # LATIN CAPITAL LETTER G WITH CARON +01E8; C; 01E9; # LATIN CAPITAL LETTER K WITH CARON +01EA; C; 01EB; # LATIN CAPITAL LETTER O WITH OGONEK +01EC; C; 01ED; # LATIN CAPITAL LETTER O WITH OGONEK AND MACRON +01EE; C; 01EF; # LATIN CAPITAL LETTER EZH WITH CARON +01F0; F; 006A 030C; # LATIN SMALL LETTER J WITH CARON +01F1; C; 01F3; # LATIN CAPITAL LETTER DZ +01F2; C; 01F3; # LATIN CAPITAL LETTER D WITH SMALL LETTER Z +01F4; C; 01F5; # LATIN CAPITAL LETTER G WITH ACUTE +01F6; C; 0195; # LATIN CAPITAL LETTER HWAIR +01F7; C; 01BF; # LATIN CAPITAL LETTER WYNN +01F8; C; 01F9; # LATIN CAPITAL LETTER N WITH GRAVE +01FA; C; 01FB; # LATIN CAPITAL LETTER A WITH RING ABOVE AND ACUTE +01FC; C; 01FD; # LATIN CAPITAL LETTER AE WITH ACUTE +01FE; C; 01FF; # LATIN CAPITAL LETTER O WITH STROKE AND ACUTE +0200; C; 0201; # LATIN CAPITAL LETTER A WITH DOUBLE GRAVE +0202; C; 0203; # LATIN CAPITAL LETTER A WITH INVERTED BREVE +0204; C; 0205; # LATIN CAPITAL LETTER E WITH DOUBLE GRAVE +0206; C; 0207; # LATIN CAPITAL LETTER E WITH INVERTED BREVE +0208; C; 0209; # LATIN CAPITAL LETTER I WITH DOUBLE GRAVE +020A; C; 020B; # LATIN CAPITAL LETTER I WITH INVERTED BREVE +020C; C; 020D; # LATIN CAPITAL LETTER O WITH DOUBLE GRAVE +020E; C; 020F; # LATIN CAPITAL LETTER O WITH INVERTED BREVE +0210; C; 0211; # LATIN CAPITAL LETTER R WITH DOUBLE GRAVE +0212; C; 0213; # LATIN CAPITAL LETTER R WITH INVERTED BREVE +0214; C; 0215; # LATIN CAPITAL LETTER U WITH DOUBLE GRAVE +0216; C; 0217; # LATIN CAPITAL LETTER U WITH INVERTED BREVE +0218; C; 0219; # LATIN CAPITAL LETTER S WITH COMMA BELOW +021A; C; 021B; # LATIN CAPITAL LETTER T WITH COMMA BELOW +021C; C; 021D; # LATIN CAPITAL LETTER YOGH +021E; C; 021F; # LATIN CAPITAL LETTER H WITH CARON +0220; C; 019E; # LATIN CAPITAL LETTER N WITH LONG RIGHT LEG +0222; C; 0223; # LATIN CAPITAL LETTER OU +0224; C; 0225; # LATIN CAPITAL LETTER Z WITH HOOK +0226; C; 0227; # LATIN CAPITAL LETTER A WITH DOT ABOVE +0228; C; 0229; # LATIN CAPITAL LETTER E WITH CEDILLA +022A; C; 022B; # LATIN CAPITAL LETTER O WITH DIAERESIS AND MACRON +022C; C; 022D; # LATIN CAPITAL LETTER O WITH TILDE AND MACRON +022E; C; 022F; # LATIN CAPITAL LETTER O WITH DOT ABOVE +0230; C; 0231; # LATIN CAPITAL LETTER O WITH DOT ABOVE AND MACRON +0232; C; 0233; # LATIN CAPITAL LETTER Y WITH MACRON +023A; C; 2C65; # LATIN CAPITAL LETTER A WITH STROKE +023B; C; 023C; # LATIN CAPITAL LETTER C WITH STROKE +023D; C; 019A; # LATIN CAPITAL LETTER L WITH BAR +023E; C; 2C66; # LATIN CAPITAL LETTER T WITH DIAGONAL STROKE +0241; C; 0242; # LATIN CAPITAL LETTER GLOTTAL STOP +0243; C; 0180; # LATIN CAPITAL LETTER B WITH STROKE +0244; C; 0289; # LATIN CAPITAL LETTER U BAR +0245; C; 028C; # LATIN CAPITAL LETTER TURNED V +0246; C; 0247; # LATIN CAPITAL LETTER E WITH STROKE +0248; C; 0249; # LATIN CAPITAL LETTER J WITH STROKE +024A; C; 024B; # LATIN CAPITAL LETTER SMALL Q WITH HOOK TAIL +024C; C; 024D; # LATIN CAPITAL LETTER R WITH STROKE +024E; C; 024F; # LATIN CAPITAL LETTER Y WITH STROKE +0345; C; 03B9; # COMBINING GREEK YPOGEGRAMMENI +0370; C; 0371; # GREEK CAPITAL LETTER HETA +0372; C; 0373; # GREEK CAPITAL LETTER ARCHAIC SAMPI +0376; C; 0377; # GREEK CAPITAL LETTER PAMPHYLIAN DIGAMMA +037F; C; 03F3; # GREEK CAPITAL LETTER YOT +0386; C; 03AC; # GREEK CAPITAL LETTER ALPHA WITH TONOS +0388; C; 03AD; # GREEK CAPITAL LETTER EPSILON WITH TONOS +0389; C; 03AE; # GREEK CAPITAL LETTER ETA WITH TONOS +038A; C; 03AF; # GREEK CAPITAL LETTER IOTA WITH TONOS +038C; C; 03CC; # GREEK CAPITAL LETTER OMICRON WITH TONOS +038E; C; 03CD; # GREEK CAPITAL LETTER UPSILON WITH TONOS +038F; C; 03CE; # GREEK CAPITAL LETTER OMEGA WITH TONOS +0390; F; 03B9 0308 0301; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS +0391; C; 03B1; # GREEK CAPITAL LETTER ALPHA +0392; C; 03B2; # GREEK CAPITAL LETTER BETA +0393; C; 03B3; # GREEK CAPITAL LETTER GAMMA +0394; C; 03B4; # GREEK CAPITAL LETTER DELTA +0395; C; 03B5; # GREEK CAPITAL LETTER EPSILON +0396; C; 03B6; # GREEK CAPITAL LETTER ZETA +0397; C; 03B7; # GREEK CAPITAL LETTER ETA +0398; C; 03B8; # GREEK CAPITAL LETTER THETA +0399; C; 03B9; # GREEK CAPITAL LETTER IOTA +039A; C; 03BA; # GREEK CAPITAL LETTER KAPPA +039B; C; 03BB; # GREEK CAPITAL LETTER LAMDA +039C; C; 03BC; # GREEK CAPITAL LETTER MU +039D; C; 03BD; # GREEK CAPITAL LETTER NU +039E; C; 03BE; # GREEK CAPITAL LETTER XI +039F; C; 03BF; # GREEK CAPITAL LETTER OMICRON +03A0; C; 03C0; # GREEK CAPITAL LETTER PI +03A1; C; 03C1; # GREEK CAPITAL LETTER RHO +03A3; C; 03C3; # GREEK CAPITAL LETTER SIGMA +03A4; C; 03C4; # GREEK CAPITAL LETTER TAU +03A5; C; 03C5; # GREEK CAPITAL LETTER UPSILON +03A6; C; 03C6; # GREEK CAPITAL LETTER PHI +03A7; C; 03C7; # GREEK CAPITAL LETTER CHI +03A8; C; 03C8; # GREEK CAPITAL LETTER PSI +03A9; C; 03C9; # GREEK CAPITAL LETTER OMEGA +03AA; C; 03CA; # GREEK CAPITAL LETTER IOTA WITH DIALYTIKA +03AB; C; 03CB; # GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA +03B0; F; 03C5 0308 0301; # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS +03C2; C; 03C3; # GREEK SMALL LETTER FINAL SIGMA +03CF; C; 03D7; # GREEK CAPITAL KAI SYMBOL +03D0; C; 03B2; # GREEK BETA SYMBOL +03D1; C; 03B8; # GREEK THETA SYMBOL +03D5; C; 03C6; # GREEK PHI SYMBOL +03D6; C; 03C0; # GREEK PI SYMBOL +03D8; C; 03D9; # GREEK LETTER ARCHAIC KOPPA +03DA; C; 03DB; # GREEK LETTER STIGMA +03DC; C; 03DD; # GREEK LETTER DIGAMMA +03DE; C; 03DF; # GREEK LETTER KOPPA +03E0; C; 03E1; # GREEK LETTER SAMPI +03E2; C; 03E3; # COPTIC CAPITAL LETTER SHEI +03E4; C; 03E5; # COPTIC CAPITAL LETTER FEI +03E6; C; 03E7; # COPTIC CAPITAL LETTER KHEI +03E8; C; 03E9; # COPTIC CAPITAL LETTER HORI +03EA; C; 03EB; # COPTIC CAPITAL LETTER GANGIA +03EC; C; 03ED; # COPTIC CAPITAL LETTER SHIMA +03EE; C; 03EF; # COPTIC CAPITAL LETTER DEI +03F0; C; 03BA; # GREEK KAPPA SYMBOL +03F1; C; 03C1; # GREEK RHO SYMBOL +03F4; C; 03B8; # GREEK CAPITAL THETA SYMBOL +03F5; C; 03B5; # GREEK LUNATE EPSILON SYMBOL +03F7; C; 03F8; # GREEK CAPITAL LETTER SHO +03F9; C; 03F2; # GREEK CAPITAL LUNATE SIGMA SYMBOL +03FA; C; 03FB; # GREEK CAPITAL LETTER SAN +03FD; C; 037B; # GREEK CAPITAL REVERSED LUNATE SIGMA SYMBOL +03FE; C; 037C; # GREEK CAPITAL DOTTED LUNATE SIGMA SYMBOL +03FF; C; 037D; # GREEK CAPITAL REVERSED DOTTED LUNATE SIGMA SYMBOL +0400; C; 0450; # CYRILLIC CAPITAL LETTER IE WITH GRAVE +0401; C; 0451; # CYRILLIC CAPITAL LETTER IO +0402; C; 0452; # CYRILLIC CAPITAL LETTER DJE +0403; C; 0453; # CYRILLIC CAPITAL LETTER GJE +0404; C; 0454; # CYRILLIC CAPITAL LETTER UKRAINIAN IE +0405; C; 0455; # CYRILLIC CAPITAL LETTER DZE +0406; C; 0456; # CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I +0407; C; 0457; # CYRILLIC CAPITAL LETTER YI +0408; C; 0458; # CYRILLIC CAPITAL LETTER JE +0409; C; 0459; # CYRILLIC CAPITAL LETTER LJE +040A; C; 045A; # CYRILLIC CAPITAL LETTER NJE +040B; C; 045B; # CYRILLIC CAPITAL LETTER TSHE +040C; C; 045C; # CYRILLIC CAPITAL LETTER KJE +040D; C; 045D; # CYRILLIC CAPITAL LETTER I WITH GRAVE +040E; C; 045E; # CYRILLIC CAPITAL LETTER SHORT U +040F; C; 045F; # CYRILLIC CAPITAL LETTER DZHE +0410; C; 0430; # CYRILLIC CAPITAL LETTER A +0411; C; 0431; # CYRILLIC CAPITAL LETTER BE +0412; C; 0432; # CYRILLIC CAPITAL LETTER VE +0413; C; 0433; # CYRILLIC CAPITAL LETTER GHE +0414; C; 0434; # CYRILLIC CAPITAL LETTER DE +0415; C; 0435; # CYRILLIC CAPITAL LETTER IE +0416; C; 0436; # CYRILLIC CAPITAL LETTER ZHE +0417; C; 0437; # CYRILLIC CAPITAL LETTER ZE +0418; C; 0438; # CYRILLIC CAPITAL LETTER I +0419; C; 0439; # CYRILLIC CAPITAL LETTER SHORT I +041A; C; 043A; # CYRILLIC CAPITAL LETTER KA +041B; C; 043B; # CYRILLIC CAPITAL LETTER EL +041C; C; 043C; # CYRILLIC CAPITAL LETTER EM +041D; C; 043D; # CYRILLIC CAPITAL LETTER EN +041E; C; 043E; # CYRILLIC CAPITAL LETTER O +041F; C; 043F; # CYRILLIC CAPITAL LETTER PE +0420; C; 0440; # CYRILLIC CAPITAL LETTER ER +0421; C; 0441; # CYRILLIC CAPITAL LETTER ES +0422; C; 0442; # CYRILLIC CAPITAL LETTER TE +0423; C; 0443; # CYRILLIC CAPITAL LETTER U +0424; C; 0444; # CYRILLIC CAPITAL LETTER EF +0425; C; 0445; # CYRILLIC CAPITAL LETTER HA +0426; C; 0446; # CYRILLIC CAPITAL LETTER TSE +0427; C; 0447; # CYRILLIC CAPITAL LETTER CHE +0428; C; 0448; # CYRILLIC CAPITAL LETTER SHA +0429; C; 0449; # CYRILLIC CAPITAL LETTER SHCHA +042A; C; 044A; # CYRILLIC CAPITAL LETTER HARD SIGN +042B; C; 044B; # CYRILLIC CAPITAL LETTER YERU +042C; C; 044C; # CYRILLIC CAPITAL LETTER SOFT SIGN +042D; C; 044D; # CYRILLIC CAPITAL LETTER E +042E; C; 044E; # CYRILLIC CAPITAL LETTER YU +042F; C; 044F; # CYRILLIC CAPITAL LETTER YA +0460; C; 0461; # CYRILLIC CAPITAL LETTER OMEGA +0462; C; 0463; # CYRILLIC CAPITAL LETTER YAT +0464; C; 0465; # CYRILLIC CAPITAL LETTER IOTIFIED E +0466; C; 0467; # CYRILLIC CAPITAL LETTER LITTLE YUS +0468; C; 0469; # CYRILLIC CAPITAL LETTER IOTIFIED LITTLE YUS +046A; C; 046B; # CYRILLIC CAPITAL LETTER BIG YUS +046C; C; 046D; # CYRILLIC CAPITAL LETTER IOTIFIED BIG YUS +046E; C; 046F; # CYRILLIC CAPITAL LETTER KSI +0470; C; 0471; # CYRILLIC CAPITAL LETTER PSI +0472; C; 0473; # CYRILLIC CAPITAL LETTER FITA +0474; C; 0475; # CYRILLIC CAPITAL LETTER IZHITSA +0476; C; 0477; # CYRILLIC CAPITAL LETTER IZHITSA WITH DOUBLE GRAVE ACCENT +0478; C; 0479; # CYRILLIC CAPITAL LETTER UK +047A; C; 047B; # CYRILLIC CAPITAL LETTER ROUND OMEGA +047C; C; 047D; # CYRILLIC CAPITAL LETTER OMEGA WITH TITLO +047E; C; 047F; # CYRILLIC CAPITAL LETTER OT +0480; C; 0481; # CYRILLIC CAPITAL LETTER KOPPA +048A; C; 048B; # CYRILLIC CAPITAL LETTER SHORT I WITH TAIL +048C; C; 048D; # CYRILLIC CAPITAL LETTER SEMISOFT SIGN +048E; C; 048F; # CYRILLIC CAPITAL LETTER ER WITH TICK +0490; C; 0491; # CYRILLIC CAPITAL LETTER GHE WITH UPTURN +0492; C; 0493; # CYRILLIC CAPITAL LETTER GHE WITH STROKE +0494; C; 0495; # CYRILLIC CAPITAL LETTER GHE WITH MIDDLE HOOK +0496; C; 0497; # CYRILLIC CAPITAL LETTER ZHE WITH DESCENDER +0498; C; 0499; # CYRILLIC CAPITAL LETTER ZE WITH DESCENDER +049A; C; 049B; # CYRILLIC CAPITAL LETTER KA WITH DESCENDER +049C; C; 049D; # CYRILLIC CAPITAL LETTER KA WITH VERTICAL STROKE +049E; C; 049F; # CYRILLIC CAPITAL LETTER KA WITH STROKE +04A0; C; 04A1; # CYRILLIC CAPITAL LETTER BASHKIR KA +04A2; C; 04A3; # CYRILLIC CAPITAL LETTER EN WITH DESCENDER +04A4; C; 04A5; # CYRILLIC CAPITAL LIGATURE EN GHE +04A6; C; 04A7; # CYRILLIC CAPITAL LETTER PE WITH MIDDLE HOOK +04A8; C; 04A9; # CYRILLIC CAPITAL LETTER ABKHASIAN HA +04AA; C; 04AB; # CYRILLIC CAPITAL LETTER ES WITH DESCENDER +04AC; C; 04AD; # CYRILLIC CAPITAL LETTER TE WITH DESCENDER +04AE; C; 04AF; # CYRILLIC CAPITAL LETTER STRAIGHT U +04B0; C; 04B1; # CYRILLIC CAPITAL LETTER STRAIGHT U WITH STROKE +04B2; C; 04B3; # CYRILLIC CAPITAL LETTER HA WITH DESCENDER +04B4; C; 04B5; # CYRILLIC CAPITAL LIGATURE TE TSE +04B6; C; 04B7; # CYRILLIC CAPITAL LETTER CHE WITH DESCENDER +04B8; C; 04B9; # CYRILLIC CAPITAL LETTER CHE WITH VERTICAL STROKE +04BA; C; 04BB; # CYRILLIC CAPITAL LETTER SHHA +04BC; C; 04BD; # CYRILLIC CAPITAL LETTER ABKHASIAN CHE +04BE; C; 04BF; # CYRILLIC CAPITAL LETTER ABKHASIAN CHE WITH DESCENDER +04C0; C; 04CF; # CYRILLIC LETTER PALOCHKA +04C1; C; 04C2; # CYRILLIC CAPITAL LETTER ZHE WITH BREVE +04C3; C; 04C4; # CYRILLIC CAPITAL LETTER KA WITH HOOK +04C5; C; 04C6; # CYRILLIC CAPITAL LETTER EL WITH TAIL +04C7; C; 04C8; # CYRILLIC CAPITAL LETTER EN WITH HOOK +04C9; C; 04CA; # CYRILLIC CAPITAL LETTER EN WITH TAIL +04CB; C; 04CC; # CYRILLIC CAPITAL LETTER KHAKASSIAN CHE +04CD; C; 04CE; # CYRILLIC CAPITAL LETTER EM WITH TAIL +04D0; C; 04D1; # CYRILLIC CAPITAL LETTER A WITH BREVE +04D2; C; 04D3; # CYRILLIC CAPITAL LETTER A WITH DIAERESIS +04D4; C; 04D5; # CYRILLIC CAPITAL LIGATURE A IE +04D6; C; 04D7; # CYRILLIC CAPITAL LETTER IE WITH BREVE +04D8; C; 04D9; # CYRILLIC CAPITAL LETTER SCHWA +04DA; C; 04DB; # CYRILLIC CAPITAL LETTER SCHWA WITH DIAERESIS +04DC; C; 04DD; # CYRILLIC CAPITAL LETTER ZHE WITH DIAERESIS +04DE; C; 04DF; # CYRILLIC CAPITAL LETTER ZE WITH DIAERESIS +04E0; C; 04E1; # CYRILLIC CAPITAL LETTER ABKHASIAN DZE +04E2; C; 04E3; # CYRILLIC CAPITAL LETTER I WITH MACRON +04E4; C; 04E5; # CYRILLIC CAPITAL LETTER I WITH DIAERESIS +04E6; C; 04E7; # CYRILLIC CAPITAL LETTER O WITH DIAERESIS +04E8; C; 04E9; # CYRILLIC CAPITAL LETTER BARRED O +04EA; C; 04EB; # CYRILLIC CAPITAL LETTER BARRED O WITH DIAERESIS +04EC; C; 04ED; # CYRILLIC CAPITAL LETTER E WITH DIAERESIS +04EE; C; 04EF; # CYRILLIC CAPITAL LETTER U WITH MACRON +04F0; C; 04F1; # CYRILLIC CAPITAL LETTER U WITH DIAERESIS +04F2; C; 04F3; # CYRILLIC CAPITAL LETTER U WITH DOUBLE ACUTE +04F4; C; 04F5; # CYRILLIC CAPITAL LETTER CHE WITH DIAERESIS +04F6; C; 04F7; # CYRILLIC CAPITAL LETTER GHE WITH DESCENDER +04F8; C; 04F9; # CYRILLIC CAPITAL LETTER YERU WITH DIAERESIS +04FA; C; 04FB; # CYRILLIC CAPITAL LETTER GHE WITH STROKE AND HOOK +04FC; C; 04FD; # CYRILLIC CAPITAL LETTER HA WITH HOOK +04FE; C; 04FF; # CYRILLIC CAPITAL LETTER HA WITH STROKE +0500; C; 0501; # CYRILLIC CAPITAL LETTER KOMI DE +0502; C; 0503; # CYRILLIC CAPITAL LETTER KOMI DJE +0504; C; 0505; # CYRILLIC CAPITAL LETTER KOMI ZJE +0506; C; 0507; # CYRILLIC CAPITAL LETTER KOMI DZJE +0508; C; 0509; # CYRILLIC CAPITAL LETTER KOMI LJE +050A; C; 050B; # CYRILLIC CAPITAL LETTER KOMI NJE +050C; C; 050D; # CYRILLIC CAPITAL LETTER KOMI SJE +050E; C; 050F; # CYRILLIC CAPITAL LETTER KOMI TJE +0510; C; 0511; # CYRILLIC CAPITAL LETTER REVERSED ZE +0512; C; 0513; # CYRILLIC CAPITAL LETTER EL WITH HOOK +0514; C; 0515; # CYRILLIC CAPITAL LETTER LHA +0516; C; 0517; # CYRILLIC CAPITAL LETTER RHA +0518; C; 0519; # CYRILLIC CAPITAL LETTER YAE +051A; C; 051B; # CYRILLIC CAPITAL LETTER QA +051C; C; 051D; # CYRILLIC CAPITAL LETTER WE +051E; C; 051F; # CYRILLIC CAPITAL LETTER ALEUT KA +0520; C; 0521; # CYRILLIC CAPITAL LETTER EL WITH MIDDLE HOOK +0522; C; 0523; # CYRILLIC CAPITAL LETTER EN WITH MIDDLE HOOK +0524; C; 0525; # CYRILLIC CAPITAL LETTER PE WITH DESCENDER +0526; C; 0527; # CYRILLIC CAPITAL LETTER SHHA WITH DESCENDER +0528; C; 0529; # CYRILLIC CAPITAL LETTER EN WITH LEFT HOOK +052A; C; 052B; # CYRILLIC CAPITAL LETTER DZZHE +052C; C; 052D; # CYRILLIC CAPITAL LETTER DCHE +052E; C; 052F; # CYRILLIC CAPITAL LETTER EL WITH DESCENDER +0531; C; 0561; # ARMENIAN CAPITAL LETTER AYB +0532; C; 0562; # ARMENIAN CAPITAL LETTER BEN +0533; C; 0563; # ARMENIAN CAPITAL LETTER GIM +0534; C; 0564; # ARMENIAN CAPITAL LETTER DA +0535; C; 0565; # ARMENIAN CAPITAL LETTER ECH +0536; C; 0566; # ARMENIAN CAPITAL LETTER ZA +0537; C; 0567; # ARMENIAN CAPITAL LETTER EH +0538; C; 0568; # ARMENIAN CAPITAL LETTER ET +0539; C; 0569; # ARMENIAN CAPITAL LETTER TO +053A; C; 056A; # ARMENIAN CAPITAL LETTER ZHE +053B; C; 056B; # ARMENIAN CAPITAL LETTER INI +053C; C; 056C; # ARMENIAN CAPITAL LETTER LIWN +053D; C; 056D; # ARMENIAN CAPITAL LETTER XEH +053E; C; 056E; # ARMENIAN CAPITAL LETTER CA +053F; C; 056F; # ARMENIAN CAPITAL LETTER KEN +0540; C; 0570; # ARMENIAN CAPITAL LETTER HO +0541; C; 0571; # ARMENIAN CAPITAL LETTER JA +0542; C; 0572; # ARMENIAN CAPITAL LETTER GHAD +0543; C; 0573; # ARMENIAN CAPITAL LETTER CHEH +0544; C; 0574; # ARMENIAN CAPITAL LETTER MEN +0545; C; 0575; # ARMENIAN CAPITAL LETTER YI +0546; C; 0576; # ARMENIAN CAPITAL LETTER NOW +0547; C; 0577; # ARMENIAN CAPITAL LETTER SHA +0548; C; 0578; # ARMENIAN CAPITAL LETTER VO +0549; C; 0579; # ARMENIAN CAPITAL LETTER CHA +054A; C; 057A; # ARMENIAN CAPITAL LETTER PEH +054B; C; 057B; # ARMENIAN CAPITAL LETTER JHEH +054C; C; 057C; # ARMENIAN CAPITAL LETTER RA +054D; C; 057D; # ARMENIAN CAPITAL LETTER SEH +054E; C; 057E; # ARMENIAN CAPITAL LETTER VEW +054F; C; 057F; # ARMENIAN CAPITAL LETTER TIWN +0550; C; 0580; # ARMENIAN CAPITAL LETTER REH +0551; C; 0581; # ARMENIAN CAPITAL LETTER CO +0552; C; 0582; # ARMENIAN CAPITAL LETTER YIWN +0553; C; 0583; # ARMENIAN CAPITAL LETTER PIWR +0554; C; 0584; # ARMENIAN CAPITAL LETTER KEH +0555; C; 0585; # ARMENIAN CAPITAL LETTER OH +0556; C; 0586; # ARMENIAN CAPITAL LETTER FEH +0587; F; 0565 0582; # ARMENIAN SMALL LIGATURE ECH YIWN +10A0; C; 2D00; # GEORGIAN CAPITAL LETTER AN +10A1; C; 2D01; # GEORGIAN CAPITAL LETTER BAN +10A2; C; 2D02; # GEORGIAN CAPITAL LETTER GAN +10A3; C; 2D03; # GEORGIAN CAPITAL LETTER DON +10A4; C; 2D04; # GEORGIAN CAPITAL LETTER EN +10A5; C; 2D05; # GEORGIAN CAPITAL LETTER VIN +10A6; C; 2D06; # GEORGIAN CAPITAL LETTER ZEN +10A7; C; 2D07; # GEORGIAN CAPITAL LETTER TAN +10A8; C; 2D08; # GEORGIAN CAPITAL LETTER IN +10A9; C; 2D09; # GEORGIAN CAPITAL LETTER KAN +10AA; C; 2D0A; # GEORGIAN CAPITAL LETTER LAS +10AB; C; 2D0B; # GEORGIAN CAPITAL LETTER MAN +10AC; C; 2D0C; # GEORGIAN CAPITAL LETTER NAR +10AD; C; 2D0D; # GEORGIAN CAPITAL LETTER ON +10AE; C; 2D0E; # GEORGIAN CAPITAL LETTER PAR +10AF; C; 2D0F; # GEORGIAN CAPITAL LETTER ZHAR +10B0; C; 2D10; # GEORGIAN CAPITAL LETTER RAE +10B1; C; 2D11; # GEORGIAN CAPITAL LETTER SAN +10B2; C; 2D12; # GEORGIAN CAPITAL LETTER TAR +10B3; C; 2D13; # GEORGIAN CAPITAL LETTER UN +10B4; C; 2D14; # GEORGIAN CAPITAL LETTER PHAR +10B5; C; 2D15; # GEORGIAN CAPITAL LETTER KHAR +10B6; C; 2D16; # GEORGIAN CAPITAL LETTER GHAN +10B7; C; 2D17; # GEORGIAN CAPITAL LETTER QAR +10B8; C; 2D18; # GEORGIAN CAPITAL LETTER SHIN +10B9; C; 2D19; # GEORGIAN CAPITAL LETTER CHIN +10BA; C; 2D1A; # GEORGIAN CAPITAL LETTER CAN +10BB; C; 2D1B; # GEORGIAN CAPITAL LETTER JIL +10BC; C; 2D1C; # GEORGIAN CAPITAL LETTER CIL +10BD; C; 2D1D; # GEORGIAN CAPITAL LETTER CHAR +10BE; C; 2D1E; # GEORGIAN CAPITAL LETTER XAN +10BF; C; 2D1F; # GEORGIAN CAPITAL LETTER JHAN +10C0; C; 2D20; # GEORGIAN CAPITAL LETTER HAE +10C1; C; 2D21; # GEORGIAN CAPITAL LETTER HE +10C2; C; 2D22; # GEORGIAN CAPITAL LETTER HIE +10C3; C; 2D23; # GEORGIAN CAPITAL LETTER WE +10C4; C; 2D24; # GEORGIAN CAPITAL LETTER HAR +10C5; C; 2D25; # GEORGIAN CAPITAL LETTER HOE +10C7; C; 2D27; # GEORGIAN CAPITAL LETTER YN +10CD; C; 2D2D; # GEORGIAN CAPITAL LETTER AEN +13F8; C; 13F0; # CHEROKEE SMALL LETTER YE +13F9; C; 13F1; # CHEROKEE SMALL LETTER YI +13FA; C; 13F2; # CHEROKEE SMALL LETTER YO +13FB; C; 13F3; # CHEROKEE SMALL LETTER YU +13FC; C; 13F4; # CHEROKEE SMALL LETTER YV +13FD; C; 13F5; # CHEROKEE SMALL LETTER MV +1C80; C; 0432; # CYRILLIC SMALL LETTER ROUNDED VE +1C81; C; 0434; # CYRILLIC SMALL LETTER LONG-LEGGED DE +1C82; C; 043E; # CYRILLIC SMALL LETTER NARROW O +1C83; C; 0441; # CYRILLIC SMALL LETTER WIDE ES +1C84; C; 0442; # CYRILLIC SMALL LETTER TALL TE +1C85; C; 0442; # CYRILLIC SMALL LETTER THREE-LEGGED TE +1C86; C; 044A; # CYRILLIC SMALL LETTER TALL HARD SIGN +1C87; C; 0463; # CYRILLIC SMALL LETTER TALL YAT +1C88; C; A64B; # CYRILLIC SMALL LETTER UNBLENDED UK +1C90; C; 10D0; # GEORGIAN MTAVRULI CAPITAL LETTER AN +1C91; C; 10D1; # GEORGIAN MTAVRULI CAPITAL LETTER BAN +1C92; C; 10D2; # GEORGIAN MTAVRULI CAPITAL LETTER GAN +1C93; C; 10D3; # GEORGIAN MTAVRULI CAPITAL LETTER DON +1C94; C; 10D4; # GEORGIAN MTAVRULI CAPITAL LETTER EN +1C95; C; 10D5; # GEORGIAN MTAVRULI CAPITAL LETTER VIN +1C96; C; 10D6; # GEORGIAN MTAVRULI CAPITAL LETTER ZEN +1C97; C; 10D7; # GEORGIAN MTAVRULI CAPITAL LETTER TAN +1C98; C; 10D8; # GEORGIAN MTAVRULI CAPITAL LETTER IN +1C99; C; 10D9; # GEORGIAN MTAVRULI CAPITAL LETTER KAN +1C9A; C; 10DA; # GEORGIAN MTAVRULI CAPITAL LETTER LAS +1C9B; C; 10DB; # GEORGIAN MTAVRULI CAPITAL LETTER MAN +1C9C; C; 10DC; # GEORGIAN MTAVRULI CAPITAL LETTER NAR +1C9D; C; 10DD; # GEORGIAN MTAVRULI CAPITAL LETTER ON +1C9E; C; 10DE; # GEORGIAN MTAVRULI CAPITAL LETTER PAR +1C9F; C; 10DF; # GEORGIAN MTAVRULI CAPITAL LETTER ZHAR +1CA0; C; 10E0; # GEORGIAN MTAVRULI CAPITAL LETTER RAE +1CA1; C; 10E1; # GEORGIAN MTAVRULI CAPITAL LETTER SAN +1CA2; C; 10E2; # GEORGIAN MTAVRULI CAPITAL LETTER TAR +1CA3; C; 10E3; # GEORGIAN MTAVRULI CAPITAL LETTER UN +1CA4; C; 10E4; # GEORGIAN MTAVRULI CAPITAL LETTER PHAR +1CA5; C; 10E5; # GEORGIAN MTAVRULI CAPITAL LETTER KHAR +1CA6; C; 10E6; # GEORGIAN MTAVRULI CAPITAL LETTER GHAN +1CA7; C; 10E7; # GEORGIAN MTAVRULI CAPITAL LETTER QAR +1CA8; C; 10E8; # GEORGIAN MTAVRULI CAPITAL LETTER SHIN +1CA9; C; 10E9; # GEORGIAN MTAVRULI CAPITAL LETTER CHIN +1CAA; C; 10EA; # GEORGIAN MTAVRULI CAPITAL LETTER CAN +1CAB; C; 10EB; # GEORGIAN MTAVRULI CAPITAL LETTER JIL +1CAC; C; 10EC; # GEORGIAN MTAVRULI CAPITAL LETTER CIL +1CAD; C; 10ED; # GEORGIAN MTAVRULI CAPITAL LETTER CHAR +1CAE; C; 10EE; # GEORGIAN MTAVRULI CAPITAL LETTER XAN +1CAF; C; 10EF; # GEORGIAN MTAVRULI CAPITAL LETTER JHAN +1CB0; C; 10F0; # GEORGIAN MTAVRULI CAPITAL LETTER HAE +1CB1; C; 10F1; # GEORGIAN MTAVRULI CAPITAL LETTER HE +1CB2; C; 10F2; # GEORGIAN MTAVRULI CAPITAL LETTER HIE +1CB3; C; 10F3; # GEORGIAN MTAVRULI CAPITAL LETTER WE +1CB4; C; 10F4; # GEORGIAN MTAVRULI CAPITAL LETTER HAR +1CB5; C; 10F5; # GEORGIAN MTAVRULI CAPITAL LETTER HOE +1CB6; C; 10F6; # GEORGIAN MTAVRULI CAPITAL LETTER FI +1CB7; C; 10F7; # GEORGIAN MTAVRULI CAPITAL LETTER YN +1CB8; C; 10F8; # GEORGIAN MTAVRULI CAPITAL LETTER ELIFI +1CB9; C; 10F9; # GEORGIAN MTAVRULI CAPITAL LETTER TURNED GAN +1CBA; C; 10FA; # GEORGIAN MTAVRULI CAPITAL LETTER AIN +1CBD; C; 10FD; # GEORGIAN MTAVRULI CAPITAL LETTER AEN +1CBE; C; 10FE; # GEORGIAN MTAVRULI CAPITAL LETTER HARD SIGN +1CBF; C; 10FF; # GEORGIAN MTAVRULI CAPITAL LETTER LABIAL SIGN +1E00; C; 1E01; # LATIN CAPITAL LETTER A WITH RING BELOW +1E02; C; 1E03; # LATIN CAPITAL LETTER B WITH DOT ABOVE +1E04; C; 1E05; # LATIN CAPITAL LETTER B WITH DOT BELOW +1E06; C; 1E07; # LATIN CAPITAL LETTER B WITH LINE BELOW +1E08; C; 1E09; # LATIN CAPITAL LETTER C WITH CEDILLA AND ACUTE +1E0A; C; 1E0B; # LATIN CAPITAL LETTER D WITH DOT ABOVE +1E0C; C; 1E0D; # LATIN CAPITAL LETTER D WITH DOT BELOW +1E0E; C; 1E0F; # LATIN CAPITAL LETTER D WITH LINE BELOW +1E10; C; 1E11; # LATIN CAPITAL LETTER D WITH CEDILLA +1E12; C; 1E13; # LATIN CAPITAL LETTER D WITH CIRCUMFLEX BELOW +1E14; C; 1E15; # LATIN CAPITAL LETTER E WITH MACRON AND GRAVE +1E16; C; 1E17; # LATIN CAPITAL LETTER E WITH MACRON AND ACUTE +1E18; C; 1E19; # LATIN CAPITAL LETTER E WITH CIRCUMFLEX BELOW +1E1A; C; 1E1B; # LATIN CAPITAL LETTER E WITH TILDE BELOW +1E1C; C; 1E1D; # LATIN CAPITAL LETTER E WITH CEDILLA AND BREVE +1E1E; C; 1E1F; # LATIN CAPITAL LETTER F WITH DOT ABOVE +1E20; C; 1E21; # LATIN CAPITAL LETTER G WITH MACRON +1E22; C; 1E23; # LATIN CAPITAL LETTER H WITH DOT ABOVE +1E24; C; 1E25; # LATIN CAPITAL LETTER H WITH DOT BELOW +1E26; C; 1E27; # LATIN CAPITAL LETTER H WITH DIAERESIS +1E28; C; 1E29; # LATIN CAPITAL LETTER H WITH CEDILLA +1E2A; C; 1E2B; # LATIN CAPITAL LETTER H WITH BREVE BELOW +1E2C; C; 1E2D; # LATIN CAPITAL LETTER I WITH TILDE BELOW +1E2E; C; 1E2F; # LATIN CAPITAL LETTER I WITH DIAERESIS AND ACUTE +1E30; C; 1E31; # LATIN CAPITAL LETTER K WITH ACUTE +1E32; C; 1E33; # LATIN CAPITAL LETTER K WITH DOT BELOW +1E34; C; 1E35; # LATIN CAPITAL LETTER K WITH LINE BELOW +1E36; C; 1E37; # LATIN CAPITAL LETTER L WITH DOT BELOW +1E38; C; 1E39; # LATIN CAPITAL LETTER L WITH DOT BELOW AND MACRON +1E3A; C; 1E3B; # LATIN CAPITAL LETTER L WITH LINE BELOW +1E3C; C; 1E3D; # LATIN CAPITAL LETTER L WITH CIRCUMFLEX BELOW +1E3E; C; 1E3F; # LATIN CAPITAL LETTER M WITH ACUTE +1E40; C; 1E41; # LATIN CAPITAL LETTER M WITH DOT ABOVE +1E42; C; 1E43; # LATIN CAPITAL LETTER M WITH DOT BELOW +1E44; C; 1E45; # LATIN CAPITAL LETTER N WITH DOT ABOVE +1E46; C; 1E47; # LATIN CAPITAL LETTER N WITH DOT BELOW +1E48; C; 1E49; # LATIN CAPITAL LETTER N WITH LINE BELOW +1E4A; C; 1E4B; # LATIN CAPITAL LETTER N WITH CIRCUMFLEX BELOW +1E4C; C; 1E4D; # LATIN CAPITAL LETTER O WITH TILDE AND ACUTE +1E4E; C; 1E4F; # LATIN CAPITAL LETTER O WITH TILDE AND DIAERESIS +1E50; C; 1E51; # LATIN CAPITAL LETTER O WITH MACRON AND GRAVE +1E52; C; 1E53; # LATIN CAPITAL LETTER O WITH MACRON AND ACUTE +1E54; C; 1E55; # LATIN CAPITAL LETTER P WITH ACUTE +1E56; C; 1E57; # LATIN CAPITAL LETTER P WITH DOT ABOVE +1E58; C; 1E59; # LATIN CAPITAL LETTER R WITH DOT ABOVE +1E5A; C; 1E5B; # LATIN CAPITAL LETTER R WITH DOT BELOW +1E5C; C; 1E5D; # LATIN CAPITAL LETTER R WITH DOT BELOW AND MACRON +1E5E; C; 1E5F; # LATIN CAPITAL LETTER R WITH LINE BELOW +1E60; C; 1E61; # LATIN CAPITAL LETTER S WITH DOT ABOVE +1E62; C; 1E63; # LATIN CAPITAL LETTER S WITH DOT BELOW +1E64; C; 1E65; # LATIN CAPITAL LETTER S WITH ACUTE AND DOT ABOVE +1E66; C; 1E67; # LATIN CAPITAL LETTER S WITH CARON AND DOT ABOVE +1E68; C; 1E69; # LATIN CAPITAL LETTER S WITH DOT BELOW AND DOT ABOVE +1E6A; C; 1E6B; # LATIN CAPITAL LETTER T WITH DOT ABOVE +1E6C; C; 1E6D; # LATIN CAPITAL LETTER T WITH DOT BELOW +1E6E; C; 1E6F; # LATIN CAPITAL LETTER T WITH LINE BELOW +1E70; C; 1E71; # LATIN CAPITAL LETTER T WITH CIRCUMFLEX BELOW +1E72; C; 1E73; # LATIN CAPITAL LETTER U WITH DIAERESIS BELOW +1E74; C; 1E75; # LATIN CAPITAL LETTER U WITH TILDE BELOW +1E76; C; 1E77; # LATIN CAPITAL LETTER U WITH CIRCUMFLEX BELOW +1E78; C; 1E79; # LATIN CAPITAL LETTER U WITH TILDE AND ACUTE +1E7A; C; 1E7B; # LATIN CAPITAL LETTER U WITH MACRON AND DIAERESIS +1E7C; C; 1E7D; # LATIN CAPITAL LETTER V WITH TILDE +1E7E; C; 1E7F; # LATIN CAPITAL LETTER V WITH DOT BELOW +1E80; C; 1E81; # LATIN CAPITAL LETTER W WITH GRAVE +1E82; C; 1E83; # LATIN CAPITAL LETTER W WITH ACUTE +1E84; C; 1E85; # LATIN CAPITAL LETTER W WITH DIAERESIS +1E86; C; 1E87; # LATIN CAPITAL LETTER W WITH DOT ABOVE +1E88; C; 1E89; # LATIN CAPITAL LETTER W WITH DOT BELOW +1E8A; C; 1E8B; # LATIN CAPITAL LETTER X WITH DOT ABOVE +1E8C; C; 1E8D; # LATIN CAPITAL LETTER X WITH DIAERESIS +1E8E; C; 1E8F; # LATIN CAPITAL LETTER Y WITH DOT ABOVE +1E90; C; 1E91; # LATIN CAPITAL LETTER Z WITH CIRCUMFLEX +1E92; C; 1E93; # LATIN CAPITAL LETTER Z WITH DOT BELOW +1E94; C; 1E95; # LATIN CAPITAL LETTER Z WITH LINE BELOW +1E96; F; 0068 0331; # LATIN SMALL LETTER H WITH LINE BELOW +1E97; F; 0074 0308; # LATIN SMALL LETTER T WITH DIAERESIS +1E98; F; 0077 030A; # LATIN SMALL LETTER W WITH RING ABOVE +1E99; F; 0079 030A; # LATIN SMALL LETTER Y WITH RING ABOVE +1E9A; F; 0061 02BE; # LATIN SMALL LETTER A WITH RIGHT HALF RING +1E9B; C; 1E61; # LATIN SMALL LETTER LONG S WITH DOT ABOVE +1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S +1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S +1EA0; C; 1EA1; # LATIN CAPITAL LETTER A WITH DOT BELOW +1EA2; C; 1EA3; # LATIN CAPITAL LETTER A WITH HOOK ABOVE +1EA4; C; 1EA5; # LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND ACUTE +1EA6; C; 1EA7; # LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND GRAVE +1EA8; C; 1EA9; # LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND HOOK ABOVE +1EAA; C; 1EAB; # LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND TILDE +1EAC; C; 1EAD; # LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND DOT BELOW +1EAE; C; 1EAF; # LATIN CAPITAL LETTER A WITH BREVE AND ACUTE +1EB0; C; 1EB1; # LATIN CAPITAL LETTER A WITH BREVE AND GRAVE +1EB2; C; 1EB3; # LATIN CAPITAL LETTER A WITH BREVE AND HOOK ABOVE +1EB4; C; 1EB5; # LATIN CAPITAL LETTER A WITH BREVE AND TILDE +1EB6; C; 1EB7; # LATIN CAPITAL LETTER A WITH BREVE AND DOT BELOW +1EB8; C; 1EB9; # LATIN CAPITAL LETTER E WITH DOT BELOW +1EBA; C; 1EBB; # LATIN CAPITAL LETTER E WITH HOOK ABOVE +1EBC; C; 1EBD; # LATIN CAPITAL LETTER E WITH TILDE +1EBE; C; 1EBF; # LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND ACUTE +1EC0; C; 1EC1; # LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND GRAVE +1EC2; C; 1EC3; # LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND HOOK ABOVE +1EC4; C; 1EC5; # LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND TILDE +1EC6; C; 1EC7; # LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND DOT BELOW +1EC8; C; 1EC9; # LATIN CAPITAL LETTER I WITH HOOK ABOVE +1ECA; C; 1ECB; # LATIN CAPITAL LETTER I WITH DOT BELOW +1ECC; C; 1ECD; # LATIN CAPITAL LETTER O WITH DOT BELOW +1ECE; C; 1ECF; # LATIN CAPITAL LETTER O WITH HOOK ABOVE +1ED0; C; 1ED1; # LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND ACUTE +1ED2; C; 1ED3; # LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND GRAVE +1ED4; C; 1ED5; # LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND HOOK ABOVE +1ED6; C; 1ED7; # LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND TILDE +1ED8; C; 1ED9; # LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND DOT BELOW +1EDA; C; 1EDB; # LATIN CAPITAL LETTER O WITH HORN AND ACUTE +1EDC; C; 1EDD; # LATIN CAPITAL LETTER O WITH HORN AND GRAVE +1EDE; C; 1EDF; # LATIN CAPITAL LETTER O WITH HORN AND HOOK ABOVE +1EE0; C; 1EE1; # LATIN CAPITAL LETTER O WITH HORN AND TILDE +1EE2; C; 1EE3; # LATIN CAPITAL LETTER O WITH HORN AND DOT BELOW +1EE4; C; 1EE5; # LATIN CAPITAL LETTER U WITH DOT BELOW +1EE6; C; 1EE7; # LATIN CAPITAL LETTER U WITH HOOK ABOVE +1EE8; C; 1EE9; # LATIN CAPITAL LETTER U WITH HORN AND ACUTE +1EEA; C; 1EEB; # LATIN CAPITAL LETTER U WITH HORN AND GRAVE +1EEC; C; 1EED; # LATIN CAPITAL LETTER U WITH HORN AND HOOK ABOVE +1EEE; C; 1EEF; # LATIN CAPITAL LETTER U WITH HORN AND TILDE +1EF0; C; 1EF1; # LATIN CAPITAL LETTER U WITH HORN AND DOT BELOW +1EF2; C; 1EF3; # LATIN CAPITAL LETTER Y WITH GRAVE +1EF4; C; 1EF5; # LATIN CAPITAL LETTER Y WITH DOT BELOW +1EF6; C; 1EF7; # LATIN CAPITAL LETTER Y WITH HOOK ABOVE +1EF8; C; 1EF9; # LATIN CAPITAL LETTER Y WITH TILDE +1EFA; C; 1EFB; # LATIN CAPITAL LETTER MIDDLE-WELSH LL +1EFC; C; 1EFD; # LATIN CAPITAL LETTER MIDDLE-WELSH V +1EFE; C; 1EFF; # LATIN CAPITAL LETTER Y WITH LOOP +1F08; C; 1F00; # GREEK CAPITAL LETTER ALPHA WITH PSILI +1F09; C; 1F01; # GREEK CAPITAL LETTER ALPHA WITH DASIA +1F0A; C; 1F02; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND VARIA +1F0B; C; 1F03; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND VARIA +1F0C; C; 1F04; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND OXIA +1F0D; C; 1F05; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND OXIA +1F0E; C; 1F06; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND PERISPOMENI +1F0F; C; 1F07; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND PERISPOMENI +1F18; C; 1F10; # GREEK CAPITAL LETTER EPSILON WITH PSILI +1F19; C; 1F11; # GREEK CAPITAL LETTER EPSILON WITH DASIA +1F1A; C; 1F12; # GREEK CAPITAL LETTER EPSILON WITH PSILI AND VARIA +1F1B; C; 1F13; # GREEK CAPITAL LETTER EPSILON WITH DASIA AND VARIA +1F1C; C; 1F14; # GREEK CAPITAL LETTER EPSILON WITH PSILI AND OXIA +1F1D; C; 1F15; # GREEK CAPITAL LETTER EPSILON WITH DASIA AND OXIA +1F28; C; 1F20; # GREEK CAPITAL LETTER ETA WITH PSILI +1F29; C; 1F21; # GREEK CAPITAL LETTER ETA WITH DASIA +1F2A; C; 1F22; # GREEK CAPITAL LETTER ETA WITH PSILI AND VARIA +1F2B; C; 1F23; # GREEK CAPITAL LETTER ETA WITH DASIA AND VARIA +1F2C; C; 1F24; # GREEK CAPITAL LETTER ETA WITH PSILI AND OXIA +1F2D; C; 1F25; # GREEK CAPITAL LETTER ETA WITH DASIA AND OXIA +1F2E; C; 1F26; # GREEK CAPITAL LETTER ETA WITH PSILI AND PERISPOMENI +1F2F; C; 1F27; # GREEK CAPITAL LETTER ETA WITH DASIA AND PERISPOMENI +1F38; C; 1F30; # GREEK CAPITAL LETTER IOTA WITH PSILI +1F39; C; 1F31; # GREEK CAPITAL LETTER IOTA WITH DASIA +1F3A; C; 1F32; # GREEK CAPITAL LETTER IOTA WITH PSILI AND VARIA +1F3B; C; 1F33; # GREEK CAPITAL LETTER IOTA WITH DASIA AND VARIA +1F3C; C; 1F34; # GREEK CAPITAL LETTER IOTA WITH PSILI AND OXIA +1F3D; C; 1F35; # GREEK CAPITAL LETTER IOTA WITH DASIA AND OXIA +1F3E; C; 1F36; # GREEK CAPITAL LETTER IOTA WITH PSILI AND PERISPOMENI +1F3F; C; 1F37; # GREEK CAPITAL LETTER IOTA WITH DASIA AND PERISPOMENI +1F48; C; 1F40; # GREEK CAPITAL LETTER OMICRON WITH PSILI +1F49; C; 1F41; # GREEK CAPITAL LETTER OMICRON WITH DASIA +1F4A; C; 1F42; # GREEK CAPITAL LETTER OMICRON WITH PSILI AND VARIA +1F4B; C; 1F43; # GREEK CAPITAL LETTER OMICRON WITH DASIA AND VARIA +1F4C; C; 1F44; # GREEK CAPITAL LETTER OMICRON WITH PSILI AND OXIA +1F4D; C; 1F45; # GREEK CAPITAL LETTER OMICRON WITH DASIA AND OXIA +1F50; F; 03C5 0313; # GREEK SMALL LETTER UPSILON WITH PSILI +1F52; F; 03C5 0313 0300; # GREEK SMALL LETTER UPSILON WITH PSILI AND VARIA +1F54; F; 03C5 0313 0301; # GREEK SMALL LETTER UPSILON WITH PSILI AND OXIA +1F56; F; 03C5 0313 0342; # GREEK SMALL LETTER UPSILON WITH PSILI AND PERISPOMENI +1F59; C; 1F51; # GREEK CAPITAL LETTER UPSILON WITH DASIA +1F5B; C; 1F53; # GREEK CAPITAL LETTER UPSILON WITH DASIA AND VARIA +1F5D; C; 1F55; # GREEK CAPITAL LETTER UPSILON WITH DASIA AND OXIA +1F5F; C; 1F57; # GREEK CAPITAL LETTER UPSILON WITH DASIA AND PERISPOMENI +1F68; C; 1F60; # GREEK CAPITAL LETTER OMEGA WITH PSILI +1F69; C; 1F61; # GREEK CAPITAL LETTER OMEGA WITH DASIA +1F6A; C; 1F62; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND VARIA +1F6B; C; 1F63; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND VARIA +1F6C; C; 1F64; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND OXIA +1F6D; C; 1F65; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND OXIA +1F6E; C; 1F66; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND PERISPOMENI +1F6F; C; 1F67; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND PERISPOMENI +1F80; F; 1F00 03B9; # GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI +1F81; F; 1F01 03B9; # GREEK SMALL LETTER ALPHA WITH DASIA AND YPOGEGRAMMENI +1F82; F; 1F02 03B9; # GREEK SMALL LETTER ALPHA WITH PSILI AND VARIA AND YPOGEGRAMMENI +1F83; F; 1F03 03B9; # GREEK SMALL LETTER ALPHA WITH DASIA AND VARIA AND YPOGEGRAMMENI +1F84; F; 1F04 03B9; # GREEK SMALL LETTER ALPHA WITH PSILI AND OXIA AND YPOGEGRAMMENI +1F85; F; 1F05 03B9; # GREEK SMALL LETTER ALPHA WITH DASIA AND OXIA AND YPOGEGRAMMENI +1F86; F; 1F06 03B9; # GREEK SMALL LETTER ALPHA WITH PSILI AND PERISPOMENI AND YPOGEGRAMMENI +1F87; F; 1F07 03B9; # GREEK SMALL LETTER ALPHA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI +1F88; F; 1F00 03B9; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI +1F88; S; 1F80; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI +1F89; F; 1F01 03B9; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND PROSGEGRAMMENI +1F89; S; 1F81; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND PROSGEGRAMMENI +1F8A; F; 1F02 03B9; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND VARIA AND PROSGEGRAMMENI +1F8A; S; 1F82; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND VARIA AND PROSGEGRAMMENI +1F8B; F; 1F03 03B9; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND VARIA AND PROSGEGRAMMENI +1F8B; S; 1F83; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND VARIA AND PROSGEGRAMMENI +1F8C; F; 1F04 03B9; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND OXIA AND PROSGEGRAMMENI +1F8C; S; 1F84; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND OXIA AND PROSGEGRAMMENI +1F8D; F; 1F05 03B9; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND OXIA AND PROSGEGRAMMENI +1F8D; S; 1F85; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND OXIA AND PROSGEGRAMMENI +1F8E; F; 1F06 03B9; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI +1F8E; S; 1F86; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI +1F8F; F; 1F07 03B9; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI +1F8F; S; 1F87; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI +1F90; F; 1F20 03B9; # GREEK SMALL LETTER ETA WITH PSILI AND YPOGEGRAMMENI +1F91; F; 1F21 03B9; # GREEK SMALL LETTER ETA WITH DASIA AND YPOGEGRAMMENI +1F92; F; 1F22 03B9; # GREEK SMALL LETTER ETA WITH PSILI AND VARIA AND YPOGEGRAMMENI +1F93; F; 1F23 03B9; # GREEK SMALL LETTER ETA WITH DASIA AND VARIA AND YPOGEGRAMMENI +1F94; F; 1F24 03B9; # GREEK SMALL LETTER ETA WITH PSILI AND OXIA AND YPOGEGRAMMENI +1F95; F; 1F25 03B9; # GREEK SMALL LETTER ETA WITH DASIA AND OXIA AND YPOGEGRAMMENI +1F96; F; 1F26 03B9; # GREEK SMALL LETTER ETA WITH PSILI AND PERISPOMENI AND YPOGEGRAMMENI +1F97; F; 1F27 03B9; # GREEK SMALL LETTER ETA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI +1F98; F; 1F20 03B9; # GREEK CAPITAL LETTER ETA WITH PSILI AND PROSGEGRAMMENI +1F98; S; 1F90; # GREEK CAPITAL LETTER ETA WITH PSILI AND PROSGEGRAMMENI +1F99; F; 1F21 03B9; # GREEK CAPITAL LETTER ETA WITH DASIA AND PROSGEGRAMMENI +1F99; S; 1F91; # GREEK CAPITAL LETTER ETA WITH DASIA AND PROSGEGRAMMENI +1F9A; F; 1F22 03B9; # GREEK CAPITAL LETTER ETA WITH PSILI AND VARIA AND PROSGEGRAMMENI +1F9A; S; 1F92; # GREEK CAPITAL LETTER ETA WITH PSILI AND VARIA AND PROSGEGRAMMENI +1F9B; F; 1F23 03B9; # GREEK CAPITAL LETTER ETA WITH DASIA AND VARIA AND PROSGEGRAMMENI +1F9B; S; 1F93; # GREEK CAPITAL LETTER ETA WITH DASIA AND VARIA AND PROSGEGRAMMENI +1F9C; F; 1F24 03B9; # GREEK CAPITAL LETTER ETA WITH PSILI AND OXIA AND PROSGEGRAMMENI +1F9C; S; 1F94; # GREEK CAPITAL LETTER ETA WITH PSILI AND OXIA AND PROSGEGRAMMENI +1F9D; F; 1F25 03B9; # GREEK CAPITAL LETTER ETA WITH DASIA AND OXIA AND PROSGEGRAMMENI +1F9D; S; 1F95; # GREEK CAPITAL LETTER ETA WITH DASIA AND OXIA AND PROSGEGRAMMENI +1F9E; F; 1F26 03B9; # GREEK CAPITAL LETTER ETA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI +1F9E; S; 1F96; # GREEK CAPITAL LETTER ETA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI +1F9F; F; 1F27 03B9; # GREEK CAPITAL LETTER ETA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI +1F9F; S; 1F97; # GREEK CAPITAL LETTER ETA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI +1FA0; F; 1F60 03B9; # GREEK SMALL LETTER OMEGA WITH PSILI AND YPOGEGRAMMENI +1FA1; F; 1F61 03B9; # GREEK SMALL LETTER OMEGA WITH DASIA AND YPOGEGRAMMENI +1FA2; F; 1F62 03B9; # GREEK SMALL LETTER OMEGA WITH PSILI AND VARIA AND YPOGEGRAMMENI +1FA3; F; 1F63 03B9; # GREEK SMALL LETTER OMEGA WITH DASIA AND VARIA AND YPOGEGRAMMENI +1FA4; F; 1F64 03B9; # GREEK SMALL LETTER OMEGA WITH PSILI AND OXIA AND YPOGEGRAMMENI +1FA5; F; 1F65 03B9; # GREEK SMALL LETTER OMEGA WITH DASIA AND OXIA AND YPOGEGRAMMENI +1FA6; F; 1F66 03B9; # GREEK SMALL LETTER OMEGA WITH PSILI AND PERISPOMENI AND YPOGEGRAMMENI +1FA7; F; 1F67 03B9; # GREEK SMALL LETTER OMEGA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI +1FA8; F; 1F60 03B9; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND PROSGEGRAMMENI +1FA8; S; 1FA0; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND PROSGEGRAMMENI +1FA9; F; 1F61 03B9; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND PROSGEGRAMMENI +1FA9; S; 1FA1; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND PROSGEGRAMMENI +1FAA; F; 1F62 03B9; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND VARIA AND PROSGEGRAMMENI +1FAA; S; 1FA2; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND VARIA AND PROSGEGRAMMENI +1FAB; F; 1F63 03B9; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND VARIA AND PROSGEGRAMMENI +1FAB; S; 1FA3; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND VARIA AND PROSGEGRAMMENI +1FAC; F; 1F64 03B9; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND OXIA AND PROSGEGRAMMENI +1FAC; S; 1FA4; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND OXIA AND PROSGEGRAMMENI +1FAD; F; 1F65 03B9; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND OXIA AND PROSGEGRAMMENI +1FAD; S; 1FA5; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND OXIA AND PROSGEGRAMMENI +1FAE; F; 1F66 03B9; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI +1FAE; S; 1FA6; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI +1FAF; F; 1F67 03B9; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI +1FAF; S; 1FA7; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI +1FB2; F; 1F70 03B9; # GREEK SMALL LETTER ALPHA WITH VARIA AND YPOGEGRAMMENI +1FB3; F; 03B1 03B9; # GREEK SMALL LETTER ALPHA WITH YPOGEGRAMMENI +1FB4; F; 03AC 03B9; # GREEK SMALL LETTER ALPHA WITH OXIA AND YPOGEGRAMMENI +1FB6; F; 03B1 0342; # GREEK SMALL LETTER ALPHA WITH PERISPOMENI +1FB7; F; 03B1 0342 03B9; # GREEK SMALL LETTER ALPHA WITH PERISPOMENI AND YPOGEGRAMMENI +1FB8; C; 1FB0; # GREEK CAPITAL LETTER ALPHA WITH VRACHY +1FB9; C; 1FB1; # GREEK CAPITAL LETTER ALPHA WITH MACRON +1FBA; C; 1F70; # GREEK CAPITAL LETTER ALPHA WITH VARIA +1FBB; C; 1F71; # GREEK CAPITAL LETTER ALPHA WITH OXIA +1FBC; F; 03B1 03B9; # GREEK CAPITAL LETTER ALPHA WITH PROSGEGRAMMENI +1FBC; S; 1FB3; # GREEK CAPITAL LETTER ALPHA WITH PROSGEGRAMMENI +1FBE; C; 03B9; # GREEK PROSGEGRAMMENI +1FC2; F; 1F74 03B9; # GREEK SMALL LETTER ETA WITH VARIA AND YPOGEGRAMMENI +1FC3; F; 03B7 03B9; # GREEK SMALL LETTER ETA WITH YPOGEGRAMMENI +1FC4; F; 03AE 03B9; # GREEK SMALL LETTER ETA WITH OXIA AND YPOGEGRAMMENI +1FC6; F; 03B7 0342; # GREEK SMALL LETTER ETA WITH PERISPOMENI +1FC7; F; 03B7 0342 03B9; # GREEK SMALL LETTER ETA WITH PERISPOMENI AND YPOGEGRAMMENI +1FC8; C; 1F72; # GREEK CAPITAL LETTER EPSILON WITH VARIA +1FC9; C; 1F73; # GREEK CAPITAL LETTER EPSILON WITH OXIA +1FCA; C; 1F74; # GREEK CAPITAL LETTER ETA WITH VARIA +1FCB; C; 1F75; # GREEK CAPITAL LETTER ETA WITH OXIA +1FCC; F; 03B7 03B9; # GREEK CAPITAL LETTER ETA WITH PROSGEGRAMMENI +1FCC; S; 1FC3; # GREEK CAPITAL LETTER ETA WITH PROSGEGRAMMENI +1FD2; F; 03B9 0308 0300; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND VARIA +1FD3; F; 03B9 0308 0301; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND OXIA +1FD6; F; 03B9 0342; # GREEK SMALL LETTER IOTA WITH PERISPOMENI +1FD7; F; 03B9 0308 0342; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND PERISPOMENI +1FD8; C; 1FD0; # GREEK CAPITAL LETTER IOTA WITH VRACHY +1FD9; C; 1FD1; # GREEK CAPITAL LETTER IOTA WITH MACRON +1FDA; C; 1F76; # GREEK CAPITAL LETTER IOTA WITH VARIA +1FDB; C; 1F77; # GREEK CAPITAL LETTER IOTA WITH OXIA +1FE2; F; 03C5 0308 0300; # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND VARIA +1FE3; F; 03C5 0308 0301; # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND OXIA +1FE4; F; 03C1 0313; # GREEK SMALL LETTER RHO WITH PSILI +1FE6; F; 03C5 0342; # GREEK SMALL LETTER UPSILON WITH PERISPOMENI +1FE7; F; 03C5 0308 0342; # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND PERISPOMENI +1FE8; C; 1FE0; # GREEK CAPITAL LETTER UPSILON WITH VRACHY +1FE9; C; 1FE1; # GREEK CAPITAL LETTER UPSILON WITH MACRON +1FEA; C; 1F7A; # GREEK CAPITAL LETTER UPSILON WITH VARIA +1FEB; C; 1F7B; # GREEK CAPITAL LETTER UPSILON WITH OXIA +1FEC; C; 1FE5; # GREEK CAPITAL LETTER RHO WITH DASIA +1FF2; F; 1F7C 03B9; # GREEK SMALL LETTER OMEGA WITH VARIA AND YPOGEGRAMMENI +1FF3; F; 03C9 03B9; # GREEK SMALL LETTER OMEGA WITH YPOGEGRAMMENI +1FF4; F; 03CE 03B9; # GREEK SMALL LETTER OMEGA WITH OXIA AND YPOGEGRAMMENI +1FF6; F; 03C9 0342; # GREEK SMALL LETTER OMEGA WITH PERISPOMENI +1FF7; F; 03C9 0342 03B9; # GREEK SMALL LETTER OMEGA WITH PERISPOMENI AND YPOGEGRAMMENI +1FF8; C; 1F78; # GREEK CAPITAL LETTER OMICRON WITH VARIA +1FF9; C; 1F79; # GREEK CAPITAL LETTER OMICRON WITH OXIA +1FFA; C; 1F7C; # GREEK CAPITAL LETTER OMEGA WITH VARIA +1FFB; C; 1F7D; # GREEK CAPITAL LETTER OMEGA WITH OXIA +1FFC; F; 03C9 03B9; # GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI +1FFC; S; 1FF3; # GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI +2126; C; 03C9; # OHM SIGN +212A; C; 006B; # KELVIN SIGN +212B; C; 00E5; # ANGSTROM SIGN +2132; C; 214E; # TURNED CAPITAL F +2160; C; 2170; # ROMAN NUMERAL ONE +2161; C; 2171; # ROMAN NUMERAL TWO +2162; C; 2172; # ROMAN NUMERAL THREE +2163; C; 2173; # ROMAN NUMERAL FOUR +2164; C; 2174; # ROMAN NUMERAL FIVE +2165; C; 2175; # ROMAN NUMERAL SIX +2166; C; 2176; # ROMAN NUMERAL SEVEN +2167; C; 2177; # ROMAN NUMERAL EIGHT +2168; C; 2178; # ROMAN NUMERAL NINE +2169; C; 2179; # ROMAN NUMERAL TEN +216A; C; 217A; # ROMAN NUMERAL ELEVEN +216B; C; 217B; # ROMAN NUMERAL TWELVE +216C; C; 217C; # ROMAN NUMERAL FIFTY +216D; C; 217D; # ROMAN NUMERAL ONE HUNDRED +216E; C; 217E; # ROMAN NUMERAL FIVE HUNDRED +216F; C; 217F; # ROMAN NUMERAL ONE THOUSAND +2183; C; 2184; # ROMAN NUMERAL REVERSED ONE HUNDRED +24B6; C; 24D0; # CIRCLED LATIN CAPITAL LETTER A +24B7; C; 24D1; # CIRCLED LATIN CAPITAL LETTER B +24B8; C; 24D2; # CIRCLED LATIN CAPITAL LETTER C +24B9; C; 24D3; # CIRCLED LATIN CAPITAL LETTER D +24BA; C; 24D4; # CIRCLED LATIN CAPITAL LETTER E +24BB; C; 24D5; # CIRCLED LATIN CAPITAL LETTER F +24BC; C; 24D6; # CIRCLED LATIN CAPITAL LETTER G +24BD; C; 24D7; # CIRCLED LATIN CAPITAL LETTER H +24BE; C; 24D8; # CIRCLED LATIN CAPITAL LETTER I +24BF; C; 24D9; # CIRCLED LATIN CAPITAL LETTER J +24C0; C; 24DA; # CIRCLED LATIN CAPITAL LETTER K +24C1; C; 24DB; # CIRCLED LATIN CAPITAL LETTER L +24C2; C; 24DC; # CIRCLED LATIN CAPITAL LETTER M +24C3; C; 24DD; # CIRCLED LATIN CAPITAL LETTER N +24C4; C; 24DE; # CIRCLED LATIN CAPITAL LETTER O +24C5; C; 24DF; # CIRCLED LATIN CAPITAL LETTER P +24C6; C; 24E0; # CIRCLED LATIN CAPITAL LETTER Q +24C7; C; 24E1; # CIRCLED LATIN CAPITAL LETTER R +24C8; C; 24E2; # CIRCLED LATIN CAPITAL LETTER S +24C9; C; 24E3; # CIRCLED LATIN CAPITAL LETTER T +24CA; C; 24E4; # CIRCLED LATIN CAPITAL LETTER U +24CB; C; 24E5; # CIRCLED LATIN CAPITAL LETTER V +24CC; C; 24E6; # CIRCLED LATIN CAPITAL LETTER W +24CD; C; 24E7; # CIRCLED LATIN CAPITAL LETTER X +24CE; C; 24E8; # CIRCLED LATIN CAPITAL LETTER Y +24CF; C; 24E9; # CIRCLED LATIN CAPITAL LETTER Z +2C00; C; 2C30; # GLAGOLITIC CAPITAL LETTER AZU +2C01; C; 2C31; # GLAGOLITIC CAPITAL LETTER BUKY +2C02; C; 2C32; # GLAGOLITIC CAPITAL LETTER VEDE +2C03; C; 2C33; # GLAGOLITIC CAPITAL LETTER GLAGOLI +2C04; C; 2C34; # GLAGOLITIC CAPITAL LETTER DOBRO +2C05; C; 2C35; # GLAGOLITIC CAPITAL LETTER YESTU +2C06; C; 2C36; # GLAGOLITIC CAPITAL LETTER ZHIVETE +2C07; C; 2C37; # GLAGOLITIC CAPITAL LETTER DZELO +2C08; C; 2C38; # GLAGOLITIC CAPITAL LETTER ZEMLJA +2C09; C; 2C39; # GLAGOLITIC CAPITAL LETTER IZHE +2C0A; C; 2C3A; # GLAGOLITIC CAPITAL LETTER INITIAL IZHE +2C0B; C; 2C3B; # GLAGOLITIC CAPITAL LETTER I +2C0C; C; 2C3C; # GLAGOLITIC CAPITAL LETTER DJERVI +2C0D; C; 2C3D; # GLAGOLITIC CAPITAL LETTER KAKO +2C0E; C; 2C3E; # GLAGOLITIC CAPITAL LETTER LJUDIJE +2C0F; C; 2C3F; # GLAGOLITIC CAPITAL LETTER MYSLITE +2C10; C; 2C40; # GLAGOLITIC CAPITAL LETTER NASHI +2C11; C; 2C41; # GLAGOLITIC CAPITAL LETTER ONU +2C12; C; 2C42; # GLAGOLITIC CAPITAL LETTER POKOJI +2C13; C; 2C43; # GLAGOLITIC CAPITAL LETTER RITSI +2C14; C; 2C44; # GLAGOLITIC CAPITAL LETTER SLOVO +2C15; C; 2C45; # GLAGOLITIC CAPITAL LETTER TVRIDO +2C16; C; 2C46; # GLAGOLITIC CAPITAL LETTER UKU +2C17; C; 2C47; # GLAGOLITIC CAPITAL LETTER FRITU +2C18; C; 2C48; # GLAGOLITIC CAPITAL LETTER HERU +2C19; C; 2C49; # GLAGOLITIC CAPITAL LETTER OTU +2C1A; C; 2C4A; # GLAGOLITIC CAPITAL LETTER PE +2C1B; C; 2C4B; # GLAGOLITIC CAPITAL LETTER SHTA +2C1C; C; 2C4C; # GLAGOLITIC CAPITAL LETTER TSI +2C1D; C; 2C4D; # GLAGOLITIC CAPITAL LETTER CHRIVI +2C1E; C; 2C4E; # GLAGOLITIC CAPITAL LETTER SHA +2C1F; C; 2C4F; # GLAGOLITIC CAPITAL LETTER YERU +2C20; C; 2C50; # GLAGOLITIC CAPITAL LETTER YERI +2C21; C; 2C51; # GLAGOLITIC CAPITAL LETTER YATI +2C22; C; 2C52; # GLAGOLITIC CAPITAL LETTER SPIDERY HA +2C23; C; 2C53; # GLAGOLITIC CAPITAL LETTER YU +2C24; C; 2C54; # GLAGOLITIC CAPITAL LETTER SMALL YUS +2C25; C; 2C55; # GLAGOLITIC CAPITAL LETTER SMALL YUS WITH TAIL +2C26; C; 2C56; # GLAGOLITIC CAPITAL LETTER YO +2C27; C; 2C57; # GLAGOLITIC CAPITAL LETTER IOTATED SMALL YUS +2C28; C; 2C58; # GLAGOLITIC CAPITAL LETTER BIG YUS +2C29; C; 2C59; # GLAGOLITIC CAPITAL LETTER IOTATED BIG YUS +2C2A; C; 2C5A; # GLAGOLITIC CAPITAL LETTER FITA +2C2B; C; 2C5B; # GLAGOLITIC CAPITAL LETTER IZHITSA +2C2C; C; 2C5C; # GLAGOLITIC CAPITAL LETTER SHTAPIC +2C2D; C; 2C5D; # GLAGOLITIC CAPITAL LETTER TROKUTASTI A +2C2E; C; 2C5E; # GLAGOLITIC CAPITAL LETTER LATINATE MYSLITE +2C60; C; 2C61; # LATIN CAPITAL LETTER L WITH DOUBLE BAR +2C62; C; 026B; # LATIN CAPITAL LETTER L WITH MIDDLE TILDE +2C63; C; 1D7D; # LATIN CAPITAL LETTER P WITH STROKE +2C64; C; 027D; # LATIN CAPITAL LETTER R WITH TAIL +2C67; C; 2C68; # LATIN CAPITAL LETTER H WITH DESCENDER +2C69; C; 2C6A; # LATIN CAPITAL LETTER K WITH DESCENDER +2C6B; C; 2C6C; # LATIN CAPITAL LETTER Z WITH DESCENDER +2C6D; C; 0251; # LATIN CAPITAL LETTER ALPHA +2C6E; C; 0271; # LATIN CAPITAL LETTER M WITH HOOK +2C6F; C; 0250; # LATIN CAPITAL LETTER TURNED A +2C70; C; 0252; # LATIN CAPITAL LETTER TURNED ALPHA +2C72; C; 2C73; # LATIN CAPITAL LETTER W WITH HOOK +2C75; C; 2C76; # LATIN CAPITAL LETTER HALF H +2C7E; C; 023F; # LATIN CAPITAL LETTER S WITH SWASH TAIL +2C7F; C; 0240; # LATIN CAPITAL LETTER Z WITH SWASH TAIL +2C80; C; 2C81; # COPTIC CAPITAL LETTER ALFA +2C82; C; 2C83; # COPTIC CAPITAL LETTER VIDA +2C84; C; 2C85; # COPTIC CAPITAL LETTER GAMMA +2C86; C; 2C87; # COPTIC CAPITAL LETTER DALDA +2C88; C; 2C89; # COPTIC CAPITAL LETTER EIE +2C8A; C; 2C8B; # COPTIC CAPITAL LETTER SOU +2C8C; C; 2C8D; # COPTIC CAPITAL LETTER ZATA +2C8E; C; 2C8F; # COPTIC CAPITAL LETTER HATE +2C90; C; 2C91; # COPTIC CAPITAL LETTER THETHE +2C92; C; 2C93; # COPTIC CAPITAL LETTER IAUDA +2C94; C; 2C95; # COPTIC CAPITAL LETTER KAPA +2C96; C; 2C97; # COPTIC CAPITAL LETTER LAULA +2C98; C; 2C99; # COPTIC CAPITAL LETTER MI +2C9A; C; 2C9B; # COPTIC CAPITAL LETTER NI +2C9C; C; 2C9D; # COPTIC CAPITAL LETTER KSI +2C9E; C; 2C9F; # COPTIC CAPITAL LETTER O +2CA0; C; 2CA1; # COPTIC CAPITAL LETTER PI +2CA2; C; 2CA3; # COPTIC CAPITAL LETTER RO +2CA4; C; 2CA5; # COPTIC CAPITAL LETTER SIMA +2CA6; C; 2CA7; # COPTIC CAPITAL LETTER TAU +2CA8; C; 2CA9; # COPTIC CAPITAL LETTER UA +2CAA; C; 2CAB; # COPTIC CAPITAL LETTER FI +2CAC; C; 2CAD; # COPTIC CAPITAL LETTER KHI +2CAE; C; 2CAF; # COPTIC CAPITAL LETTER PSI +2CB0; C; 2CB1; # COPTIC CAPITAL LETTER OOU +2CB2; C; 2CB3; # COPTIC CAPITAL LETTER DIALECT-P ALEF +2CB4; C; 2CB5; # COPTIC CAPITAL LETTER OLD COPTIC AIN +2CB6; C; 2CB7; # COPTIC CAPITAL LETTER CRYPTOGRAMMIC EIE +2CB8; C; 2CB9; # COPTIC CAPITAL LETTER DIALECT-P KAPA +2CBA; C; 2CBB; # COPTIC CAPITAL LETTER DIALECT-P NI +2CBC; C; 2CBD; # COPTIC CAPITAL LETTER CRYPTOGRAMMIC NI +2CBE; C; 2CBF; # COPTIC CAPITAL LETTER OLD COPTIC OOU +2CC0; C; 2CC1; # COPTIC CAPITAL LETTER SAMPI +2CC2; C; 2CC3; # COPTIC CAPITAL LETTER CROSSED SHEI +2CC4; C; 2CC5; # COPTIC CAPITAL LETTER OLD COPTIC SHEI +2CC6; C; 2CC7; # COPTIC CAPITAL LETTER OLD COPTIC ESH +2CC8; C; 2CC9; # COPTIC CAPITAL LETTER AKHMIMIC KHEI +2CCA; C; 2CCB; # COPTIC CAPITAL LETTER DIALECT-P HORI +2CCC; C; 2CCD; # COPTIC CAPITAL LETTER OLD COPTIC HORI +2CCE; C; 2CCF; # COPTIC CAPITAL LETTER OLD COPTIC HA +2CD0; C; 2CD1; # COPTIC CAPITAL LETTER L-SHAPED HA +2CD2; C; 2CD3; # COPTIC CAPITAL LETTER OLD COPTIC HEI +2CD4; C; 2CD5; # COPTIC CAPITAL LETTER OLD COPTIC HAT +2CD6; C; 2CD7; # COPTIC CAPITAL LETTER OLD COPTIC GANGIA +2CD8; C; 2CD9; # COPTIC CAPITAL LETTER OLD COPTIC DJA +2CDA; C; 2CDB; # COPTIC CAPITAL LETTER OLD COPTIC SHIMA +2CDC; C; 2CDD; # COPTIC CAPITAL LETTER OLD NUBIAN SHIMA +2CDE; C; 2CDF; # COPTIC CAPITAL LETTER OLD NUBIAN NGI +2CE0; C; 2CE1; # COPTIC CAPITAL LETTER OLD NUBIAN NYI +2CE2; C; 2CE3; # COPTIC CAPITAL LETTER OLD NUBIAN WAU +2CEB; C; 2CEC; # COPTIC CAPITAL LETTER CRYPTOGRAMMIC SHEI +2CED; C; 2CEE; # COPTIC CAPITAL LETTER CRYPTOGRAMMIC GANGIA +2CF2; C; 2CF3; # COPTIC CAPITAL LETTER BOHAIRIC KHEI +A640; C; A641; # CYRILLIC CAPITAL LETTER ZEMLYA +A642; C; A643; # CYRILLIC CAPITAL LETTER DZELO +A644; C; A645; # CYRILLIC CAPITAL LETTER REVERSED DZE +A646; C; A647; # CYRILLIC CAPITAL LETTER IOTA +A648; C; A649; # CYRILLIC CAPITAL LETTER DJERV +A64A; C; A64B; # CYRILLIC CAPITAL LETTER MONOGRAPH UK +A64C; C; A64D; # CYRILLIC CAPITAL LETTER BROAD OMEGA +A64E; C; A64F; # CYRILLIC CAPITAL LETTER NEUTRAL YER +A650; C; A651; # CYRILLIC CAPITAL LETTER YERU WITH BACK YER +A652; C; A653; # CYRILLIC CAPITAL LETTER IOTIFIED YAT +A654; C; A655; # CYRILLIC CAPITAL LETTER REVERSED YU +A656; C; A657; # CYRILLIC CAPITAL LETTER IOTIFIED A +A658; C; A659; # CYRILLIC CAPITAL LETTER CLOSED LITTLE YUS +A65A; C; A65B; # CYRILLIC CAPITAL LETTER BLENDED YUS +A65C; C; A65D; # CYRILLIC CAPITAL LETTER IOTIFIED CLOSED LITTLE YUS +A65E; C; A65F; # CYRILLIC CAPITAL LETTER YN +A660; C; A661; # CYRILLIC CAPITAL LETTER REVERSED TSE +A662; C; A663; # CYRILLIC CAPITAL LETTER SOFT DE +A664; C; A665; # CYRILLIC CAPITAL LETTER SOFT EL +A666; C; A667; # CYRILLIC CAPITAL LETTER SOFT EM +A668; C; A669; # CYRILLIC CAPITAL LETTER MONOCULAR O +A66A; C; A66B; # CYRILLIC CAPITAL LETTER BINOCULAR O +A66C; C; A66D; # CYRILLIC CAPITAL LETTER DOUBLE MONOCULAR O +A680; C; A681; # CYRILLIC CAPITAL LETTER DWE +A682; C; A683; # CYRILLIC CAPITAL LETTER DZWE +A684; C; A685; # CYRILLIC CAPITAL LETTER ZHWE +A686; C; A687; # CYRILLIC CAPITAL LETTER CCHE +A688; C; A689; # CYRILLIC CAPITAL LETTER DZZE +A68A; C; A68B; # CYRILLIC CAPITAL LETTER TE WITH MIDDLE HOOK +A68C; C; A68D; # CYRILLIC CAPITAL LETTER TWE +A68E; C; A68F; # CYRILLIC CAPITAL LETTER TSWE +A690; C; A691; # CYRILLIC CAPITAL LETTER TSSE +A692; C; A693; # CYRILLIC CAPITAL LETTER TCHE +A694; C; A695; # CYRILLIC CAPITAL LETTER HWE +A696; C; A697; # CYRILLIC CAPITAL LETTER SHWE +A698; C; A699; # CYRILLIC CAPITAL LETTER DOUBLE O +A69A; C; A69B; # CYRILLIC CAPITAL LETTER CROSSED O +A722; C; A723; # LATIN CAPITAL LETTER EGYPTOLOGICAL ALEF +A724; C; A725; # LATIN CAPITAL LETTER EGYPTOLOGICAL AIN +A726; C; A727; # LATIN CAPITAL LETTER HENG +A728; C; A729; # LATIN CAPITAL LETTER TZ +A72A; C; A72B; # LATIN CAPITAL LETTER TRESILLO +A72C; C; A72D; # LATIN CAPITAL LETTER CUATRILLO +A72E; C; A72F; # LATIN CAPITAL LETTER CUATRILLO WITH COMMA +A732; C; A733; # LATIN CAPITAL LETTER AA +A734; C; A735; # LATIN CAPITAL LETTER AO +A736; C; A737; # LATIN CAPITAL LETTER AU +A738; C; A739; # LATIN CAPITAL LETTER AV +A73A; C; A73B; # LATIN CAPITAL LETTER AV WITH HORIZONTAL BAR +A73C; C; A73D; # LATIN CAPITAL LETTER AY +A73E; C; A73F; # LATIN CAPITAL LETTER REVERSED C WITH DOT +A740; C; A741; # LATIN CAPITAL LETTER K WITH STROKE +A742; C; A743; # LATIN CAPITAL LETTER K WITH DIAGONAL STROKE +A744; C; A745; # LATIN CAPITAL LETTER K WITH STROKE AND DIAGONAL STROKE +A746; C; A747; # LATIN CAPITAL LETTER BROKEN L +A748; C; A749; # LATIN CAPITAL LETTER L WITH HIGH STROKE +A74A; C; A74B; # LATIN CAPITAL LETTER O WITH LONG STROKE OVERLAY +A74C; C; A74D; # LATIN CAPITAL LETTER O WITH LOOP +A74E; C; A74F; # LATIN CAPITAL LETTER OO +A750; C; A751; # LATIN CAPITAL LETTER P WITH STROKE THROUGH DESCENDER +A752; C; A753; # LATIN CAPITAL LETTER P WITH FLOURISH +A754; C; A755; # LATIN CAPITAL LETTER P WITH SQUIRREL TAIL +A756; C; A757; # LATIN CAPITAL LETTER Q WITH STROKE THROUGH DESCENDER +A758; C; A759; # LATIN CAPITAL LETTER Q WITH DIAGONAL STROKE +A75A; C; A75B; # LATIN CAPITAL LETTER R ROTUNDA +A75C; C; A75D; # LATIN CAPITAL LETTER RUM ROTUNDA +A75E; C; A75F; # LATIN CAPITAL LETTER V WITH DIAGONAL STROKE +A760; C; A761; # LATIN CAPITAL LETTER VY +A762; C; A763; # LATIN CAPITAL LETTER VISIGOTHIC Z +A764; C; A765; # LATIN CAPITAL LETTER THORN WITH STROKE +A766; C; A767; # LATIN CAPITAL LETTER THORN WITH STROKE THROUGH DESCENDER +A768; C; A769; # LATIN CAPITAL LETTER VEND +A76A; C; A76B; # LATIN CAPITAL LETTER ET +A76C; C; A76D; # LATIN CAPITAL LETTER IS +A76E; C; A76F; # LATIN CAPITAL LETTER CON +A779; C; A77A; # LATIN CAPITAL LETTER INSULAR D +A77B; C; A77C; # LATIN CAPITAL LETTER INSULAR F +A77D; C; 1D79; # LATIN CAPITAL LETTER INSULAR G +A77E; C; A77F; # LATIN CAPITAL LETTER TURNED INSULAR G +A780; C; A781; # LATIN CAPITAL LETTER TURNED L +A782; C; A783; # LATIN CAPITAL LETTER INSULAR R +A784; C; A785; # LATIN CAPITAL LETTER INSULAR S +A786; C; A787; # LATIN CAPITAL LETTER INSULAR T +A78B; C; A78C; # LATIN CAPITAL LETTER SALTILLO +A78D; C; 0265; # LATIN CAPITAL LETTER TURNED H +A790; C; A791; # LATIN CAPITAL LETTER N WITH DESCENDER +A792; C; A793; # LATIN CAPITAL LETTER C WITH BAR +A796; C; A797; # LATIN CAPITAL LETTER B WITH FLOURISH +A798; C; A799; # LATIN CAPITAL LETTER F WITH STROKE +A79A; C; A79B; # LATIN CAPITAL LETTER VOLAPUK AE +A79C; C; A79D; # LATIN CAPITAL LETTER VOLAPUK OE +A79E; C; A79F; # LATIN CAPITAL LETTER VOLAPUK UE +A7A0; C; A7A1; # LATIN CAPITAL LETTER G WITH OBLIQUE STROKE +A7A2; C; A7A3; # LATIN CAPITAL LETTER K WITH OBLIQUE STROKE +A7A4; C; A7A5; # LATIN CAPITAL LETTER N WITH OBLIQUE STROKE +A7A6; C; A7A7; # LATIN CAPITAL LETTER R WITH OBLIQUE STROKE +A7A8; C; A7A9; # LATIN CAPITAL LETTER S WITH OBLIQUE STROKE +A7AA; C; 0266; # LATIN CAPITAL LETTER H WITH HOOK +A7AB; C; 025C; # LATIN CAPITAL LETTER REVERSED OPEN E +A7AC; C; 0261; # LATIN CAPITAL LETTER SCRIPT G +A7AD; C; 026C; # LATIN CAPITAL LETTER L WITH BELT +A7AE; C; 026A; # LATIN CAPITAL LETTER SMALL CAPITAL I +A7B0; C; 029E; # LATIN CAPITAL LETTER TURNED K +A7B1; C; 0287; # LATIN CAPITAL LETTER TURNED T +A7B2; C; 029D; # LATIN CAPITAL LETTER J WITH CROSSED-TAIL +A7B3; C; AB53; # LATIN CAPITAL LETTER CHI +A7B4; C; A7B5; # LATIN CAPITAL LETTER BETA +A7B6; C; A7B7; # LATIN CAPITAL LETTER OMEGA +A7B8; C; A7B9; # LATIN CAPITAL LETTER U WITH STROKE +A7BA; C; A7BB; # LATIN CAPITAL LETTER GLOTTAL A +A7BC; C; A7BD; # LATIN CAPITAL LETTER GLOTTAL I +A7BE; C; A7BF; # LATIN CAPITAL LETTER GLOTTAL U +A7C2; C; A7C3; # LATIN CAPITAL LETTER ANGLICANA W +A7C4; C; A794; # LATIN CAPITAL LETTER C WITH PALATAL HOOK +A7C5; C; 0282; # LATIN CAPITAL LETTER S WITH HOOK +A7C6; C; 1D8E; # LATIN CAPITAL LETTER Z WITH PALATAL HOOK +A7C7; C; A7C8; # LATIN CAPITAL LETTER D WITH SHORT STROKE OVERLAY +A7C9; C; A7CA; # LATIN CAPITAL LETTER S WITH SHORT STROKE OVERLAY +A7F5; C; A7F6; # LATIN CAPITAL LETTER REVERSED HALF H +AB70; C; 13A0; # CHEROKEE SMALL LETTER A +AB71; C; 13A1; # CHEROKEE SMALL LETTER E +AB72; C; 13A2; # CHEROKEE SMALL LETTER I +AB73; C; 13A3; # CHEROKEE SMALL LETTER O +AB74; C; 13A4; # CHEROKEE SMALL LETTER U +AB75; C; 13A5; # CHEROKEE SMALL LETTER V +AB76; C; 13A6; # CHEROKEE SMALL LETTER GA +AB77; C; 13A7; # CHEROKEE SMALL LETTER KA +AB78; C; 13A8; # CHEROKEE SMALL LETTER GE +AB79; C; 13A9; # CHEROKEE SMALL LETTER GI +AB7A; C; 13AA; # CHEROKEE SMALL LETTER GO +AB7B; C; 13AB; # CHEROKEE SMALL LETTER GU +AB7C; C; 13AC; # CHEROKEE SMALL LETTER GV +AB7D; C; 13AD; # CHEROKEE SMALL LETTER HA +AB7E; C; 13AE; # CHEROKEE SMALL LETTER HE +AB7F; C; 13AF; # CHEROKEE SMALL LETTER HI +AB80; C; 13B0; # CHEROKEE SMALL LETTER HO +AB81; C; 13B1; # CHEROKEE SMALL LETTER HU +AB82; C; 13B2; # CHEROKEE SMALL LETTER HV +AB83; C; 13B3; # CHEROKEE SMALL LETTER LA +AB84; C; 13B4; # CHEROKEE SMALL LETTER LE +AB85; C; 13B5; # CHEROKEE SMALL LETTER LI +AB86; C; 13B6; # CHEROKEE SMALL LETTER LO +AB87; C; 13B7; # CHEROKEE SMALL LETTER LU +AB88; C; 13B8; # CHEROKEE SMALL LETTER LV +AB89; C; 13B9; # CHEROKEE SMALL LETTER MA +AB8A; C; 13BA; # CHEROKEE SMALL LETTER ME +AB8B; C; 13BB; # CHEROKEE SMALL LETTER MI +AB8C; C; 13BC; # CHEROKEE SMALL LETTER MO +AB8D; C; 13BD; # CHEROKEE SMALL LETTER MU +AB8E; C; 13BE; # CHEROKEE SMALL LETTER NA +AB8F; C; 13BF; # CHEROKEE SMALL LETTER HNA +AB90; C; 13C0; # CHEROKEE SMALL LETTER NAH +AB91; C; 13C1; # CHEROKEE SMALL LETTER NE +AB92; C; 13C2; # CHEROKEE SMALL LETTER NI +AB93; C; 13C3; # CHEROKEE SMALL LETTER NO +AB94; C; 13C4; # CHEROKEE SMALL LETTER NU +AB95; C; 13C5; # CHEROKEE SMALL LETTER NV +AB96; C; 13C6; # CHEROKEE SMALL LETTER QUA +AB97; C; 13C7; # CHEROKEE SMALL LETTER QUE +AB98; C; 13C8; # CHEROKEE SMALL LETTER QUI +AB99; C; 13C9; # CHEROKEE SMALL LETTER QUO +AB9A; C; 13CA; # CHEROKEE SMALL LETTER QUU +AB9B; C; 13CB; # CHEROKEE SMALL LETTER QUV +AB9C; C; 13CC; # CHEROKEE SMALL LETTER SA +AB9D; C; 13CD; # CHEROKEE SMALL LETTER S +AB9E; C; 13CE; # CHEROKEE SMALL LETTER SE +AB9F; C; 13CF; # CHEROKEE SMALL LETTER SI +ABA0; C; 13D0; # CHEROKEE SMALL LETTER SO +ABA1; C; 13D1; # CHEROKEE SMALL LETTER SU +ABA2; C; 13D2; # CHEROKEE SMALL LETTER SV +ABA3; C; 13D3; # CHEROKEE SMALL LETTER DA +ABA4; C; 13D4; # CHEROKEE SMALL LETTER TA +ABA5; C; 13D5; # CHEROKEE SMALL LETTER DE +ABA6; C; 13D6; # CHEROKEE SMALL LETTER TE +ABA7; C; 13D7; # CHEROKEE SMALL LETTER DI +ABA8; C; 13D8; # CHEROKEE SMALL LETTER TI +ABA9; C; 13D9; # CHEROKEE SMALL LETTER DO +ABAA; C; 13DA; # CHEROKEE SMALL LETTER DU +ABAB; C; 13DB; # CHEROKEE SMALL LETTER DV +ABAC; C; 13DC; # CHEROKEE SMALL LETTER DLA +ABAD; C; 13DD; # CHEROKEE SMALL LETTER TLA +ABAE; C; 13DE; # CHEROKEE SMALL LETTER TLE +ABAF; C; 13DF; # CHEROKEE SMALL LETTER TLI +ABB0; C; 13E0; # CHEROKEE SMALL LETTER TLO +ABB1; C; 13E1; # CHEROKEE SMALL LETTER TLU +ABB2; C; 13E2; # CHEROKEE SMALL LETTER TLV +ABB3; C; 13E3; # CHEROKEE SMALL LETTER TSA +ABB4; C; 13E4; # CHEROKEE SMALL LETTER TSE +ABB5; C; 13E5; # CHEROKEE SMALL LETTER TSI +ABB6; C; 13E6; # CHEROKEE SMALL LETTER TSO +ABB7; C; 13E7; # CHEROKEE SMALL LETTER TSU +ABB8; C; 13E8; # CHEROKEE SMALL LETTER TSV +ABB9; C; 13E9; # CHEROKEE SMALL LETTER WA +ABBA; C; 13EA; # CHEROKEE SMALL LETTER WE +ABBB; C; 13EB; # CHEROKEE SMALL LETTER WI +ABBC; C; 13EC; # CHEROKEE SMALL LETTER WO +ABBD; C; 13ED; # CHEROKEE SMALL LETTER WU +ABBE; C; 13EE; # CHEROKEE SMALL LETTER WV +ABBF; C; 13EF; # CHEROKEE SMALL LETTER YA +FB00; F; 0066 0066; # LATIN SMALL LIGATURE FF +FB01; F; 0066 0069; # LATIN SMALL LIGATURE FI +FB02; F; 0066 006C; # LATIN SMALL LIGATURE FL +FB03; F; 0066 0066 0069; # LATIN SMALL LIGATURE FFI +FB04; F; 0066 0066 006C; # LATIN SMALL LIGATURE FFL +FB05; F; 0073 0074; # LATIN SMALL LIGATURE LONG S T +FB06; F; 0073 0074; # LATIN SMALL LIGATURE ST +FB13; F; 0574 0576; # ARMENIAN SMALL LIGATURE MEN NOW +FB14; F; 0574 0565; # ARMENIAN SMALL LIGATURE MEN ECH +FB15; F; 0574 056B; # ARMENIAN SMALL LIGATURE MEN INI +FB16; F; 057E 0576; # ARMENIAN SMALL LIGATURE VEW NOW +FB17; F; 0574 056D; # ARMENIAN SMALL LIGATURE MEN XEH +FF21; C; FF41; # FULLWIDTH LATIN CAPITAL LETTER A +FF22; C; FF42; # FULLWIDTH LATIN CAPITAL LETTER B +FF23; C; FF43; # FULLWIDTH LATIN CAPITAL LETTER C +FF24; C; FF44; # FULLWIDTH LATIN CAPITAL LETTER D +FF25; C; FF45; # FULLWIDTH LATIN CAPITAL LETTER E +FF26; C; FF46; # FULLWIDTH LATIN CAPITAL LETTER F +FF27; C; FF47; # FULLWIDTH LATIN CAPITAL LETTER G +FF28; C; FF48; # FULLWIDTH LATIN CAPITAL LETTER H +FF29; C; FF49; # FULLWIDTH LATIN CAPITAL LETTER I +FF2A; C; FF4A; # FULLWIDTH LATIN CAPITAL LETTER J +FF2B; C; FF4B; # FULLWIDTH LATIN CAPITAL LETTER K +FF2C; C; FF4C; # FULLWIDTH LATIN CAPITAL LETTER L +FF2D; C; FF4D; # FULLWIDTH LATIN CAPITAL LETTER M +FF2E; C; FF4E; # FULLWIDTH LATIN CAPITAL LETTER N +FF2F; C; FF4F; # FULLWIDTH LATIN CAPITAL LETTER O +FF30; C; FF50; # FULLWIDTH LATIN CAPITAL LETTER P +FF31; C; FF51; # FULLWIDTH LATIN CAPITAL LETTER Q +FF32; C; FF52; # FULLWIDTH LATIN CAPITAL LETTER R +FF33; C; FF53; # FULLWIDTH LATIN CAPITAL LETTER S +FF34; C; FF54; # FULLWIDTH LATIN CAPITAL LETTER T +FF35; C; FF55; # FULLWIDTH LATIN CAPITAL LETTER U +FF36; C; FF56; # FULLWIDTH LATIN CAPITAL LETTER V +FF37; C; FF57; # FULLWIDTH LATIN CAPITAL LETTER W +FF38; C; FF58; # FULLWIDTH LATIN CAPITAL LETTER X +FF39; C; FF59; # FULLWIDTH LATIN CAPITAL LETTER Y +FF3A; C; FF5A; # FULLWIDTH LATIN CAPITAL LETTER Z +10400; C; 10428; # DESERET CAPITAL LETTER LONG I +10401; C; 10429; # DESERET CAPITAL LETTER LONG E +10402; C; 1042A; # DESERET CAPITAL LETTER LONG A +10403; C; 1042B; # DESERET CAPITAL LETTER LONG AH +10404; C; 1042C; # DESERET CAPITAL LETTER LONG O +10405; C; 1042D; # DESERET CAPITAL LETTER LONG OO +10406; C; 1042E; # DESERET CAPITAL LETTER SHORT I +10407; C; 1042F; # DESERET CAPITAL LETTER SHORT E +10408; C; 10430; # DESERET CAPITAL LETTER SHORT A +10409; C; 10431; # DESERET CAPITAL LETTER SHORT AH +1040A; C; 10432; # DESERET CAPITAL LETTER SHORT O +1040B; C; 10433; # DESERET CAPITAL LETTER SHORT OO +1040C; C; 10434; # DESERET CAPITAL LETTER AY +1040D; C; 10435; # DESERET CAPITAL LETTER OW +1040E; C; 10436; # DESERET CAPITAL LETTER WU +1040F; C; 10437; # DESERET CAPITAL LETTER YEE +10410; C; 10438; # DESERET CAPITAL LETTER H +10411; C; 10439; # DESERET CAPITAL LETTER PEE +10412; C; 1043A; # DESERET CAPITAL LETTER BEE +10413; C; 1043B; # DESERET CAPITAL LETTER TEE +10414; C; 1043C; # DESERET CAPITAL LETTER DEE +10415; C; 1043D; # DESERET CAPITAL LETTER CHEE +10416; C; 1043E; # DESERET CAPITAL LETTER JEE +10417; C; 1043F; # DESERET CAPITAL LETTER KAY +10418; C; 10440; # DESERET CAPITAL LETTER GAY +10419; C; 10441; # DESERET CAPITAL LETTER EF +1041A; C; 10442; # DESERET CAPITAL LETTER VEE +1041B; C; 10443; # DESERET CAPITAL LETTER ETH +1041C; C; 10444; # DESERET CAPITAL LETTER THEE +1041D; C; 10445; # DESERET CAPITAL LETTER ES +1041E; C; 10446; # DESERET CAPITAL LETTER ZEE +1041F; C; 10447; # DESERET CAPITAL LETTER ESH +10420; C; 10448; # DESERET CAPITAL LETTER ZHEE +10421; C; 10449; # DESERET CAPITAL LETTER ER +10422; C; 1044A; # DESERET CAPITAL LETTER EL +10423; C; 1044B; # DESERET CAPITAL LETTER EM +10424; C; 1044C; # DESERET CAPITAL LETTER EN +10425; C; 1044D; # DESERET CAPITAL LETTER ENG +10426; C; 1044E; # DESERET CAPITAL LETTER OI +10427; C; 1044F; # DESERET CAPITAL LETTER EW +104B0; C; 104D8; # OSAGE CAPITAL LETTER A +104B1; C; 104D9; # OSAGE CAPITAL LETTER AI +104B2; C; 104DA; # OSAGE CAPITAL LETTER AIN +104B3; C; 104DB; # OSAGE CAPITAL LETTER AH +104B4; C; 104DC; # OSAGE CAPITAL LETTER BRA +104B5; C; 104DD; # OSAGE CAPITAL LETTER CHA +104B6; C; 104DE; # OSAGE CAPITAL LETTER EHCHA +104B7; C; 104DF; # OSAGE CAPITAL LETTER E +104B8; C; 104E0; # OSAGE CAPITAL LETTER EIN +104B9; C; 104E1; # OSAGE CAPITAL LETTER HA +104BA; C; 104E2; # OSAGE CAPITAL LETTER HYA +104BB; C; 104E3; # OSAGE CAPITAL LETTER I +104BC; C; 104E4; # OSAGE CAPITAL LETTER KA +104BD; C; 104E5; # OSAGE CAPITAL LETTER EHKA +104BE; C; 104E6; # OSAGE CAPITAL LETTER KYA +104BF; C; 104E7; # OSAGE CAPITAL LETTER LA +104C0; C; 104E8; # OSAGE CAPITAL LETTER MA +104C1; C; 104E9; # OSAGE CAPITAL LETTER NA +104C2; C; 104EA; # OSAGE CAPITAL LETTER O +104C3; C; 104EB; # OSAGE CAPITAL LETTER OIN +104C4; C; 104EC; # OSAGE CAPITAL LETTER PA +104C5; C; 104ED; # OSAGE CAPITAL LETTER EHPA +104C6; C; 104EE; # OSAGE CAPITAL LETTER SA +104C7; C; 104EF; # OSAGE CAPITAL LETTER SHA +104C8; C; 104F0; # OSAGE CAPITAL LETTER TA +104C9; C; 104F1; # OSAGE CAPITAL LETTER EHTA +104CA; C; 104F2; # OSAGE CAPITAL LETTER TSA +104CB; C; 104F3; # OSAGE CAPITAL LETTER EHTSA +104CC; C; 104F4; # OSAGE CAPITAL LETTER TSHA +104CD; C; 104F5; # OSAGE CAPITAL LETTER DHA +104CE; C; 104F6; # OSAGE CAPITAL LETTER U +104CF; C; 104F7; # OSAGE CAPITAL LETTER WA +104D0; C; 104F8; # OSAGE CAPITAL LETTER KHA +104D1; C; 104F9; # OSAGE CAPITAL LETTER GHA +104D2; C; 104FA; # OSAGE CAPITAL LETTER ZA +104D3; C; 104FB; # OSAGE CAPITAL LETTER ZHA +10C80; C; 10CC0; # OLD HUNGARIAN CAPITAL LETTER A +10C81; C; 10CC1; # OLD HUNGARIAN CAPITAL LETTER AA +10C82; C; 10CC2; # OLD HUNGARIAN CAPITAL LETTER EB +10C83; C; 10CC3; # OLD HUNGARIAN CAPITAL LETTER AMB +10C84; C; 10CC4; # OLD HUNGARIAN CAPITAL LETTER EC +10C85; C; 10CC5; # OLD HUNGARIAN CAPITAL LETTER ENC +10C86; C; 10CC6; # OLD HUNGARIAN CAPITAL LETTER ECS +10C87; C; 10CC7; # OLD HUNGARIAN CAPITAL LETTER ED +10C88; C; 10CC8; # OLD HUNGARIAN CAPITAL LETTER AND +10C89; C; 10CC9; # OLD HUNGARIAN CAPITAL LETTER E +10C8A; C; 10CCA; # OLD HUNGARIAN CAPITAL LETTER CLOSE E +10C8B; C; 10CCB; # OLD HUNGARIAN CAPITAL LETTER EE +10C8C; C; 10CCC; # OLD HUNGARIAN CAPITAL LETTER EF +10C8D; C; 10CCD; # OLD HUNGARIAN CAPITAL LETTER EG +10C8E; C; 10CCE; # OLD HUNGARIAN CAPITAL LETTER EGY +10C8F; C; 10CCF; # OLD HUNGARIAN CAPITAL LETTER EH +10C90; C; 10CD0; # OLD HUNGARIAN CAPITAL LETTER I +10C91; C; 10CD1; # OLD HUNGARIAN CAPITAL LETTER II +10C92; C; 10CD2; # OLD HUNGARIAN CAPITAL LETTER EJ +10C93; C; 10CD3; # OLD HUNGARIAN CAPITAL LETTER EK +10C94; C; 10CD4; # OLD HUNGARIAN CAPITAL LETTER AK +10C95; C; 10CD5; # OLD HUNGARIAN CAPITAL LETTER UNK +10C96; C; 10CD6; # OLD HUNGARIAN CAPITAL LETTER EL +10C97; C; 10CD7; # OLD HUNGARIAN CAPITAL LETTER ELY +10C98; C; 10CD8; # OLD HUNGARIAN CAPITAL LETTER EM +10C99; C; 10CD9; # OLD HUNGARIAN CAPITAL LETTER EN +10C9A; C; 10CDA; # OLD HUNGARIAN CAPITAL LETTER ENY +10C9B; C; 10CDB; # OLD HUNGARIAN CAPITAL LETTER O +10C9C; C; 10CDC; # OLD HUNGARIAN CAPITAL LETTER OO +10C9D; C; 10CDD; # OLD HUNGARIAN CAPITAL LETTER NIKOLSBURG OE +10C9E; C; 10CDE; # OLD HUNGARIAN CAPITAL LETTER RUDIMENTA OE +10C9F; C; 10CDF; # OLD HUNGARIAN CAPITAL LETTER OEE +10CA0; C; 10CE0; # OLD HUNGARIAN CAPITAL LETTER EP +10CA1; C; 10CE1; # OLD HUNGARIAN CAPITAL LETTER EMP +10CA2; C; 10CE2; # OLD HUNGARIAN CAPITAL LETTER ER +10CA3; C; 10CE3; # OLD HUNGARIAN CAPITAL LETTER SHORT ER +10CA4; C; 10CE4; # OLD HUNGARIAN CAPITAL LETTER ES +10CA5; C; 10CE5; # OLD HUNGARIAN CAPITAL LETTER ESZ +10CA6; C; 10CE6; # OLD HUNGARIAN CAPITAL LETTER ET +10CA7; C; 10CE7; # OLD HUNGARIAN CAPITAL LETTER ENT +10CA8; C; 10CE8; # OLD HUNGARIAN CAPITAL LETTER ETY +10CA9; C; 10CE9; # OLD HUNGARIAN CAPITAL LETTER ECH +10CAA; C; 10CEA; # OLD HUNGARIAN CAPITAL LETTER U +10CAB; C; 10CEB; # OLD HUNGARIAN CAPITAL LETTER UU +10CAC; C; 10CEC; # OLD HUNGARIAN CAPITAL LETTER NIKOLSBURG UE +10CAD; C; 10CED; # OLD HUNGARIAN CAPITAL LETTER RUDIMENTA UE +10CAE; C; 10CEE; # OLD HUNGARIAN CAPITAL LETTER EV +10CAF; C; 10CEF; # OLD HUNGARIAN CAPITAL LETTER EZ +10CB0; C; 10CF0; # OLD HUNGARIAN CAPITAL LETTER EZS +10CB1; C; 10CF1; # OLD HUNGARIAN CAPITAL LETTER ENT-SHAPED SIGN +10CB2; C; 10CF2; # OLD HUNGARIAN CAPITAL LETTER US +118A0; C; 118C0; # WARANG CITI CAPITAL LETTER NGAA +118A1; C; 118C1; # WARANG CITI CAPITAL LETTER A +118A2; C; 118C2; # WARANG CITI CAPITAL LETTER WI +118A3; C; 118C3; # WARANG CITI CAPITAL LETTER YU +118A4; C; 118C4; # WARANG CITI CAPITAL LETTER YA +118A5; C; 118C5; # WARANG CITI CAPITAL LETTER YO +118A6; C; 118C6; # WARANG CITI CAPITAL LETTER II +118A7; C; 118C7; # WARANG CITI CAPITAL LETTER UU +118A8; C; 118C8; # WARANG CITI CAPITAL LETTER E +118A9; C; 118C9; # WARANG CITI CAPITAL LETTER O +118AA; C; 118CA; # WARANG CITI CAPITAL LETTER ANG +118AB; C; 118CB; # WARANG CITI CAPITAL LETTER GA +118AC; C; 118CC; # WARANG CITI CAPITAL LETTER KO +118AD; C; 118CD; # WARANG CITI CAPITAL LETTER ENY +118AE; C; 118CE; # WARANG CITI CAPITAL LETTER YUJ +118AF; C; 118CF; # WARANG CITI CAPITAL LETTER UC +118B0; C; 118D0; # WARANG CITI CAPITAL LETTER ENN +118B1; C; 118D1; # WARANG CITI CAPITAL LETTER ODD +118B2; C; 118D2; # WARANG CITI CAPITAL LETTER TTE +118B3; C; 118D3; # WARANG CITI CAPITAL LETTER NUNG +118B4; C; 118D4; # WARANG CITI CAPITAL LETTER DA +118B5; C; 118D5; # WARANG CITI CAPITAL LETTER AT +118B6; C; 118D6; # WARANG CITI CAPITAL LETTER AM +118B7; C; 118D7; # WARANG CITI CAPITAL LETTER BU +118B8; C; 118D8; # WARANG CITI CAPITAL LETTER PU +118B9; C; 118D9; # WARANG CITI CAPITAL LETTER HIYO +118BA; C; 118DA; # WARANG CITI CAPITAL LETTER HOLO +118BB; C; 118DB; # WARANG CITI CAPITAL LETTER HORR +118BC; C; 118DC; # WARANG CITI CAPITAL LETTER HAR +118BD; C; 118DD; # WARANG CITI CAPITAL LETTER SSUU +118BE; C; 118DE; # WARANG CITI CAPITAL LETTER SII +118BF; C; 118DF; # WARANG CITI CAPITAL LETTER VIYO +16E40; C; 16E60; # MEDEFAIDRIN CAPITAL LETTER M +16E41; C; 16E61; # MEDEFAIDRIN CAPITAL LETTER S +16E42; C; 16E62; # MEDEFAIDRIN CAPITAL LETTER V +16E43; C; 16E63; # MEDEFAIDRIN CAPITAL LETTER W +16E44; C; 16E64; # MEDEFAIDRIN CAPITAL LETTER ATIU +16E45; C; 16E65; # MEDEFAIDRIN CAPITAL LETTER Z +16E46; C; 16E66; # MEDEFAIDRIN CAPITAL LETTER KP +16E47; C; 16E67; # MEDEFAIDRIN CAPITAL LETTER P +16E48; C; 16E68; # MEDEFAIDRIN CAPITAL LETTER T +16E49; C; 16E69; # MEDEFAIDRIN CAPITAL LETTER G +16E4A; C; 16E6A; # MEDEFAIDRIN CAPITAL LETTER F +16E4B; C; 16E6B; # MEDEFAIDRIN CAPITAL LETTER I +16E4C; C; 16E6C; # MEDEFAIDRIN CAPITAL LETTER K +16E4D; C; 16E6D; # MEDEFAIDRIN CAPITAL LETTER A +16E4E; C; 16E6E; # MEDEFAIDRIN CAPITAL LETTER J +16E4F; C; 16E6F; # MEDEFAIDRIN CAPITAL LETTER E +16E50; C; 16E70; # MEDEFAIDRIN CAPITAL LETTER B +16E51; C; 16E71; # MEDEFAIDRIN CAPITAL LETTER C +16E52; C; 16E72; # MEDEFAIDRIN CAPITAL LETTER U +16E53; C; 16E73; # MEDEFAIDRIN CAPITAL LETTER YU +16E54; C; 16E74; # MEDEFAIDRIN CAPITAL LETTER L +16E55; C; 16E75; # MEDEFAIDRIN CAPITAL LETTER Q +16E56; C; 16E76; # MEDEFAIDRIN CAPITAL LETTER HP +16E57; C; 16E77; # MEDEFAIDRIN CAPITAL LETTER NY +16E58; C; 16E78; # MEDEFAIDRIN CAPITAL LETTER X +16E59; C; 16E79; # MEDEFAIDRIN CAPITAL LETTER D +16E5A; C; 16E7A; # MEDEFAIDRIN CAPITAL LETTER OE +16E5B; C; 16E7B; # MEDEFAIDRIN CAPITAL LETTER N +16E5C; C; 16E7C; # MEDEFAIDRIN CAPITAL LETTER R +16E5D; C; 16E7D; # MEDEFAIDRIN CAPITAL LETTER O +16E5E; C; 16E7E; # MEDEFAIDRIN CAPITAL LETTER AI +16E5F; C; 16E7F; # MEDEFAIDRIN CAPITAL LETTER Y +1E900; C; 1E922; # ADLAM CAPITAL LETTER ALIF +1E901; C; 1E923; # ADLAM CAPITAL LETTER DAALI +1E902; C; 1E924; # ADLAM CAPITAL LETTER LAAM +1E903; C; 1E925; # ADLAM CAPITAL LETTER MIIM +1E904; C; 1E926; # ADLAM CAPITAL LETTER BA +1E905; C; 1E927; # ADLAM CAPITAL LETTER SINNYIIYHE +1E906; C; 1E928; # ADLAM CAPITAL LETTER PE +1E907; C; 1E929; # ADLAM CAPITAL LETTER BHE +1E908; C; 1E92A; # ADLAM CAPITAL LETTER RA +1E909; C; 1E92B; # ADLAM CAPITAL LETTER E +1E90A; C; 1E92C; # ADLAM CAPITAL LETTER FA +1E90B; C; 1E92D; # ADLAM CAPITAL LETTER I +1E90C; C; 1E92E; # ADLAM CAPITAL LETTER O +1E90D; C; 1E92F; # ADLAM CAPITAL LETTER DHA +1E90E; C; 1E930; # ADLAM CAPITAL LETTER YHE +1E90F; C; 1E931; # ADLAM CAPITAL LETTER WAW +1E910; C; 1E932; # ADLAM CAPITAL LETTER NUN +1E911; C; 1E933; # ADLAM CAPITAL LETTER KAF +1E912; C; 1E934; # ADLAM CAPITAL LETTER YA +1E913; C; 1E935; # ADLAM CAPITAL LETTER U +1E914; C; 1E936; # ADLAM CAPITAL LETTER JIIM +1E915; C; 1E937; # ADLAM CAPITAL LETTER CHI +1E916; C; 1E938; # ADLAM CAPITAL LETTER HA +1E917; C; 1E939; # ADLAM CAPITAL LETTER QAAF +1E918; C; 1E93A; # ADLAM CAPITAL LETTER GA +1E919; C; 1E93B; # ADLAM CAPITAL LETTER NYA +1E91A; C; 1E93C; # ADLAM CAPITAL LETTER TU +1E91B; C; 1E93D; # ADLAM CAPITAL LETTER NHA +1E91C; C; 1E93E; # ADLAM CAPITAL LETTER VA +1E91D; C; 1E93F; # ADLAM CAPITAL LETTER KHA +1E91E; C; 1E940; # ADLAM CAPITAL LETTER GBE +1E91F; C; 1E941; # ADLAM CAPITAL LETTER ZAL +1E920; C; 1E942; # ADLAM CAPITAL LETTER KPO +1E921; C; 1E943; # ADLAM CAPITAL LETTER SHA +# +# EOF diff --git a/tdemarkdown/md4c/scripts/unicode/DerivedGeneralCategory.txt b/tdemarkdown/md4c/scripts/unicode/DerivedGeneralCategory.txt new file mode 100644 index 000000000..3e82c7fc5 --- /dev/null +++ b/tdemarkdown/md4c/scripts/unicode/DerivedGeneralCategory.txt @@ -0,0 +1,4100 @@ +# DerivedGeneralCategory-13.0.0.txt +# Date: 2019-10-21, 14:30:32 GMT +# © 2019 Unicode®, Inc. +# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries. +# For terms of use, see http://www.unicode.org/terms_of_use.html +# +# Unicode Character Database +# For documentation, see http://www.unicode.org/reports/tr44/ + +# ================================================ + +# Property: General_Category + +# ================================================ + +# General_Category=Unassigned + +0378..0379 ; Cn # [2] <reserved-0378>..<reserved-0379> +0380..0383 ; Cn # [4] <reserved-0380>..<reserved-0383> +038B ; Cn # <reserved-038B> +038D ; Cn # <reserved-038D> +03A2 ; Cn # <reserved-03A2> +0530 ; Cn # <reserved-0530> +0557..0558 ; Cn # [2] <reserved-0557>..<reserved-0558> +058B..058C ; Cn # [2] <reserved-058B>..<reserved-058C> +0590 ; Cn # <reserved-0590> +05C8..05CF ; Cn # [8] <reserved-05C8>..<reserved-05CF> +05EB..05EE ; Cn # [4] <reserved-05EB>..<reserved-05EE> +05F5..05FF ; Cn # [11] <reserved-05F5>..<reserved-05FF> +061D ; Cn # <reserved-061D> +070E ; Cn # <reserved-070E> +074B..074C ; Cn # [2] <reserved-074B>..<reserved-074C> +07B2..07BF ; Cn # [14] <reserved-07B2>..<reserved-07BF> +07FB..07FC ; Cn # [2] <reserved-07FB>..<reserved-07FC> +082E..082F ; Cn # [2] <reserved-082E>..<reserved-082F> +083F ; Cn # <reserved-083F> +085C..085D ; Cn # [2] <reserved-085C>..<reserved-085D> +085F ; Cn # <reserved-085F> +086B..089F ; Cn # [53] <reserved-086B>..<reserved-089F> +08B5 ; Cn # <reserved-08B5> +08C8..08D2 ; Cn # [11] <reserved-08C8>..<reserved-08D2> +0984 ; Cn # <reserved-0984> +098D..098E ; Cn # [2] <reserved-098D>..<reserved-098E> +0991..0992 ; Cn # [2] <reserved-0991>..<reserved-0992> +09A9 ; Cn # <reserved-09A9> +09B1 ; Cn # <reserved-09B1> +09B3..09B5 ; Cn # [3] <reserved-09B3>..<reserved-09B5> +09BA..09BB ; Cn # [2] <reserved-09BA>..<reserved-09BB> +09C5..09C6 ; Cn # [2] <reserved-09C5>..<reserved-09C6> +09C9..09CA ; Cn # [2] <reserved-09C9>..<reserved-09CA> +09CF..09D6 ; Cn # [8] <reserved-09CF>..<reserved-09D6> +09D8..09DB ; Cn # [4] <reserved-09D8>..<reserved-09DB> +09DE ; Cn # <reserved-09DE> +09E4..09E5 ; Cn # [2] <reserved-09E4>..<reserved-09E5> +09FF..0A00 ; Cn # [2] <reserved-09FF>..<reserved-0A00> +0A04 ; Cn # <reserved-0A04> +0A0B..0A0E ; Cn # [4] <reserved-0A0B>..<reserved-0A0E> +0A11..0A12 ; Cn # [2] <reserved-0A11>..<reserved-0A12> +0A29 ; Cn # <reserved-0A29> +0A31 ; Cn # <reserved-0A31> +0A34 ; Cn # <reserved-0A34> +0A37 ; Cn # <reserved-0A37> +0A3A..0A3B ; Cn # [2] <reserved-0A3A>..<reserved-0A3B> +0A3D ; Cn # <reserved-0A3D> +0A43..0A46 ; Cn # [4] <reserved-0A43>..<reserved-0A46> +0A49..0A4A ; Cn # [2] <reserved-0A49>..<reserved-0A4A> +0A4E..0A50 ; Cn # [3] <reserved-0A4E>..<reserved-0A50> +0A52..0A58 ; Cn # [7] <reserved-0A52>..<reserved-0A58> +0A5D ; Cn # <reserved-0A5D> +0A5F..0A65 ; Cn # [7] <reserved-0A5F>..<reserved-0A65> +0A77..0A80 ; Cn # [10] <reserved-0A77>..<reserved-0A80> +0A84 ; Cn # <reserved-0A84> +0A8E ; Cn # <reserved-0A8E> +0A92 ; Cn # <reserved-0A92> +0AA9 ; Cn # <reserved-0AA9> +0AB1 ; Cn # <reserved-0AB1> +0AB4 ; Cn # <reserved-0AB4> +0ABA..0ABB ; Cn # [2] <reserved-0ABA>..<reserved-0ABB> +0AC6 ; Cn # <reserved-0AC6> +0ACA ; Cn # <reserved-0ACA> +0ACE..0ACF ; Cn # [2] <reserved-0ACE>..<reserved-0ACF> +0AD1..0ADF ; Cn # [15] <reserved-0AD1>..<reserved-0ADF> +0AE4..0AE5 ; Cn # [2] <reserved-0AE4>..<reserved-0AE5> +0AF2..0AF8 ; Cn # [7] <reserved-0AF2>..<reserved-0AF8> +0B00 ; Cn # <reserved-0B00> +0B04 ; Cn # <reserved-0B04> +0B0D..0B0E ; Cn # [2] <reserved-0B0D>..<reserved-0B0E> +0B11..0B12 ; Cn # [2] <reserved-0B11>..<reserved-0B12> +0B29 ; Cn # <reserved-0B29> +0B31 ; Cn # <reserved-0B31> +0B34 ; Cn # <reserved-0B34> +0B3A..0B3B ; Cn # [2] <reserved-0B3A>..<reserved-0B3B> +0B45..0B46 ; Cn # [2] <reserved-0B45>..<reserved-0B46> +0B49..0B4A ; Cn # [2] <reserved-0B49>..<reserved-0B4A> +0B4E..0B54 ; Cn # [7] <reserved-0B4E>..<reserved-0B54> +0B58..0B5B ; Cn # [4] <reserved-0B58>..<reserved-0B5B> +0B5E ; Cn # <reserved-0B5E> +0B64..0B65 ; Cn # [2] <reserved-0B64>..<reserved-0B65> +0B78..0B81 ; Cn # [10] <reserved-0B78>..<reserved-0B81> +0B84 ; Cn # <reserved-0B84> +0B8B..0B8D ; Cn # [3] <reserved-0B8B>..<reserved-0B8D> +0B91 ; Cn # <reserved-0B91> +0B96..0B98 ; Cn # [3] <reserved-0B96>..<reserved-0B98> +0B9B ; Cn # <reserved-0B9B> +0B9D ; Cn # <reserved-0B9D> +0BA0..0BA2 ; Cn # [3] <reserved-0BA0>..<reserved-0BA2> +0BA5..0BA7 ; Cn # [3] <reserved-0BA5>..<reserved-0BA7> +0BAB..0BAD ; Cn # [3] <reserved-0BAB>..<reserved-0BAD> +0BBA..0BBD ; Cn # [4] <reserved-0BBA>..<reserved-0BBD> +0BC3..0BC5 ; Cn # [3] <reserved-0BC3>..<reserved-0BC5> +0BC9 ; Cn # <reserved-0BC9> +0BCE..0BCF ; Cn # [2] <reserved-0BCE>..<reserved-0BCF> +0BD1..0BD6 ; Cn # [6] <reserved-0BD1>..<reserved-0BD6> +0BD8..0BE5 ; Cn # [14] <reserved-0BD8>..<reserved-0BE5> +0BFB..0BFF ; Cn # [5] <reserved-0BFB>..<reserved-0BFF> +0C0D ; Cn # <reserved-0C0D> +0C11 ; Cn # <reserved-0C11> +0C29 ; Cn # <reserved-0C29> +0C3A..0C3C ; Cn # [3] <reserved-0C3A>..<reserved-0C3C> +0C45 ; Cn # <reserved-0C45> +0C49 ; Cn # <reserved-0C49> +0C4E..0C54 ; Cn # [7] <reserved-0C4E>..<reserved-0C54> +0C57 ; Cn # <reserved-0C57> +0C5B..0C5F ; Cn # [5] <reserved-0C5B>..<reserved-0C5F> +0C64..0C65 ; Cn # [2] <reserved-0C64>..<reserved-0C65> +0C70..0C76 ; Cn # [7] <reserved-0C70>..<reserved-0C76> +0C8D ; Cn # <reserved-0C8D> +0C91 ; Cn # <reserved-0C91> +0CA9 ; Cn # <reserved-0CA9> +0CB4 ; Cn # <reserved-0CB4> +0CBA..0CBB ; Cn # [2] <reserved-0CBA>..<reserved-0CBB> +0CC5 ; Cn # <reserved-0CC5> +0CC9 ; Cn # <reserved-0CC9> +0CCE..0CD4 ; Cn # [7] <reserved-0CCE>..<reserved-0CD4> +0CD7..0CDD ; Cn # [7] <reserved-0CD7>..<reserved-0CDD> +0CDF ; Cn # <reserved-0CDF> +0CE4..0CE5 ; Cn # [2] <reserved-0CE4>..<reserved-0CE5> +0CF0 ; Cn # <reserved-0CF0> +0CF3..0CFF ; Cn # [13] <reserved-0CF3>..<reserved-0CFF> +0D0D ; Cn # <reserved-0D0D> +0D11 ; Cn # <reserved-0D11> +0D45 ; Cn # <reserved-0D45> +0D49 ; Cn # <reserved-0D49> +0D50..0D53 ; Cn # [4] <reserved-0D50>..<reserved-0D53> +0D64..0D65 ; Cn # [2] <reserved-0D64>..<reserved-0D65> +0D80 ; Cn # <reserved-0D80> +0D84 ; Cn # <reserved-0D84> +0D97..0D99 ; Cn # [3] <reserved-0D97>..<reserved-0D99> +0DB2 ; Cn # <reserved-0DB2> +0DBC ; Cn # <reserved-0DBC> +0DBE..0DBF ; Cn # [2] <reserved-0DBE>..<reserved-0DBF> +0DC7..0DC9 ; Cn # [3] <reserved-0DC7>..<reserved-0DC9> +0DCB..0DCE ; Cn # [4] <reserved-0DCB>..<reserved-0DCE> +0DD5 ; Cn # <reserved-0DD5> +0DD7 ; Cn # <reserved-0DD7> +0DE0..0DE5 ; Cn # [6] <reserved-0DE0>..<reserved-0DE5> +0DF0..0DF1 ; Cn # [2] <reserved-0DF0>..<reserved-0DF1> +0DF5..0E00 ; Cn # [12] <reserved-0DF5>..<reserved-0E00> +0E3B..0E3E ; Cn # [4] <reserved-0E3B>..<reserved-0E3E> +0E5C..0E80 ; Cn # [37] <reserved-0E5C>..<reserved-0E80> +0E83 ; Cn # <reserved-0E83> +0E85 ; Cn # <reserved-0E85> +0E8B ; Cn # <reserved-0E8B> +0EA4 ; Cn # <reserved-0EA4> +0EA6 ; Cn # <reserved-0EA6> +0EBE..0EBF ; Cn # [2] <reserved-0EBE>..<reserved-0EBF> +0EC5 ; Cn # <reserved-0EC5> +0EC7 ; Cn # <reserved-0EC7> +0ECE..0ECF ; Cn # [2] <reserved-0ECE>..<reserved-0ECF> +0EDA..0EDB ; Cn # [2] <reserved-0EDA>..<reserved-0EDB> +0EE0..0EFF ; Cn # [32] <reserved-0EE0>..<reserved-0EFF> +0F48 ; Cn # <reserved-0F48> +0F6D..0F70 ; Cn # [4] <reserved-0F6D>..<reserved-0F70> +0F98 ; Cn # <reserved-0F98> +0FBD ; Cn # <reserved-0FBD> +0FCD ; Cn # <reserved-0FCD> +0FDB..0FFF ; Cn # [37] <reserved-0FDB>..<reserved-0FFF> +10C6 ; Cn # <reserved-10C6> +10C8..10CC ; Cn # [5] <reserved-10C8>..<reserved-10CC> +10CE..10CF ; Cn # [2] <reserved-10CE>..<reserved-10CF> +1249 ; Cn # <reserved-1249> +124E..124F ; Cn # [2] <reserved-124E>..<reserved-124F> +1257 ; Cn # <reserved-1257> +1259 ; Cn # <reserved-1259> +125E..125F ; Cn # [2] <reserved-125E>..<reserved-125F> +1289 ; Cn # <reserved-1289> +128E..128F ; Cn # [2] <reserved-128E>..<reserved-128F> +12B1 ; Cn # <reserved-12B1> +12B6..12B7 ; Cn # [2] <reserved-12B6>..<reserved-12B7> +12BF ; Cn # <reserved-12BF> +12C1 ; Cn # <reserved-12C1> +12C6..12C7 ; Cn # [2] <reserved-12C6>..<reserved-12C7> +12D7 ; Cn # <reserved-12D7> +1311 ; Cn # <reserved-1311> +1316..1317 ; Cn # [2] <reserved-1316>..<reserved-1317> +135B..135C ; Cn # [2] <reserved-135B>..<reserved-135C> +137D..137F ; Cn # [3] <reserved-137D>..<reserved-137F> +139A..139F ; Cn # [6] <reserved-139A>..<reserved-139F> +13F6..13F7 ; Cn # [2] <reserved-13F6>..<reserved-13F7> +13FE..13FF ; Cn # [2] <reserved-13FE>..<reserved-13FF> +169D..169F ; Cn # [3] <reserved-169D>..<reserved-169F> +16F9..16FF ; Cn # [7] <reserved-16F9>..<reserved-16FF> +170D ; Cn # <reserved-170D> +1715..171F ; Cn # [11] <reserved-1715>..<reserved-171F> +1737..173F ; Cn # [9] <reserved-1737>..<reserved-173F> +1754..175F ; Cn # [12] <reserved-1754>..<reserved-175F> +176D ; Cn # <reserved-176D> +1771 ; Cn # <reserved-1771> +1774..177F ; Cn # [12] <reserved-1774>..<reserved-177F> +17DE..17DF ; Cn # [2] <reserved-17DE>..<reserved-17DF> +17EA..17EF ; Cn # [6] <reserved-17EA>..<reserved-17EF> +17FA..17FF ; Cn # [6] <reserved-17FA>..<reserved-17FF> +180F ; Cn # <reserved-180F> +181A..181F ; Cn # [6] <reserved-181A>..<reserved-181F> +1879..187F ; Cn # [7] <reserved-1879>..<reserved-187F> +18AB..18AF ; Cn # [5] <reserved-18AB>..<reserved-18AF> +18F6..18FF ; Cn # [10] <reserved-18F6>..<reserved-18FF> +191F ; Cn # <reserved-191F> +192C..192F ; Cn # [4] <reserved-192C>..<reserved-192F> +193C..193F ; Cn # [4] <reserved-193C>..<reserved-193F> +1941..1943 ; Cn # [3] <reserved-1941>..<reserved-1943> +196E..196F ; Cn # [2] <reserved-196E>..<reserved-196F> +1975..197F ; Cn # [11] <reserved-1975>..<reserved-197F> +19AC..19AF ; Cn # [4] <reserved-19AC>..<reserved-19AF> +19CA..19CF ; Cn # [6] <reserved-19CA>..<reserved-19CF> +19DB..19DD ; Cn # [3] <reserved-19DB>..<reserved-19DD> +1A1C..1A1D ; Cn # [2] <reserved-1A1C>..<reserved-1A1D> +1A5F ; Cn # <reserved-1A5F> +1A7D..1A7E ; Cn # [2] <reserved-1A7D>..<reserved-1A7E> +1A8A..1A8F ; Cn # [6] <reserved-1A8A>..<reserved-1A8F> +1A9A..1A9F ; Cn # [6] <reserved-1A9A>..<reserved-1A9F> +1AAE..1AAF ; Cn # [2] <reserved-1AAE>..<reserved-1AAF> +1AC1..1AFF ; Cn # [63] <reserved-1AC1>..<reserved-1AFF> +1B4C..1B4F ; Cn # [4] <reserved-1B4C>..<reserved-1B4F> +1B7D..1B7F ; Cn # [3] <reserved-1B7D>..<reserved-1B7F> +1BF4..1BFB ; Cn # [8] <reserved-1BF4>..<reserved-1BFB> +1C38..1C3A ; Cn # [3] <reserved-1C38>..<reserved-1C3A> +1C4A..1C4C ; Cn # [3] <reserved-1C4A>..<reserved-1C4C> +1C89..1C8F ; Cn # [7] <reserved-1C89>..<reserved-1C8F> +1CBB..1CBC ; Cn # [2] <reserved-1CBB>..<reserved-1CBC> +1CC8..1CCF ; Cn # [8] <reserved-1CC8>..<reserved-1CCF> +1CFB..1CFF ; Cn # [5] <reserved-1CFB>..<reserved-1CFF> +1DFA ; Cn # <reserved-1DFA> +1F16..1F17 ; Cn # [2] <reserved-1F16>..<reserved-1F17> +1F1E..1F1F ; Cn # [2] <reserved-1F1E>..<reserved-1F1F> +1F46..1F47 ; Cn # [2] <reserved-1F46>..<reserved-1F47> +1F4E..1F4F ; Cn # [2] <reserved-1F4E>..<reserved-1F4F> +1F58 ; Cn # <reserved-1F58> +1F5A ; Cn # <reserved-1F5A> +1F5C ; Cn # <reserved-1F5C> +1F5E ; Cn # <reserved-1F5E> +1F7E..1F7F ; Cn # [2] <reserved-1F7E>..<reserved-1F7F> +1FB5 ; Cn # <reserved-1FB5> +1FC5 ; Cn # <reserved-1FC5> +1FD4..1FD5 ; Cn # [2] <reserved-1FD4>..<reserved-1FD5> +1FDC ; Cn # <reserved-1FDC> +1FF0..1FF1 ; Cn # [2] <reserved-1FF0>..<reserved-1FF1> +1FF5 ; Cn # <reserved-1FF5> +1FFF ; Cn # <reserved-1FFF> +2065 ; Cn # <reserved-2065> +2072..2073 ; Cn # [2] <reserved-2072>..<reserved-2073> +208F ; Cn # <reserved-208F> +209D..209F ; Cn # [3] <reserved-209D>..<reserved-209F> +20C0..20CF ; Cn # [16] <reserved-20C0>..<reserved-20CF> +20F1..20FF ; Cn # [15] <reserved-20F1>..<reserved-20FF> +218C..218F ; Cn # [4] <reserved-218C>..<reserved-218F> +2427..243F ; Cn # [25] <reserved-2427>..<reserved-243F> +244B..245F ; Cn # [21] <reserved-244B>..<reserved-245F> +2B74..2B75 ; Cn # [2] <reserved-2B74>..<reserved-2B75> +2B96 ; Cn # <reserved-2B96> +2C2F ; Cn # <reserved-2C2F> +2C5F ; Cn # <reserved-2C5F> +2CF4..2CF8 ; Cn # [5] <reserved-2CF4>..<reserved-2CF8> +2D26 ; Cn # <reserved-2D26> +2D28..2D2C ; Cn # [5] <reserved-2D28>..<reserved-2D2C> +2D2E..2D2F ; Cn # [2] <reserved-2D2E>..<reserved-2D2F> +2D68..2D6E ; Cn # [7] <reserved-2D68>..<reserved-2D6E> +2D71..2D7E ; Cn # [14] <reserved-2D71>..<reserved-2D7E> +2D97..2D9F ; Cn # [9] <reserved-2D97>..<reserved-2D9F> +2DA7 ; Cn # <reserved-2DA7> +2DAF ; Cn # <reserved-2DAF> +2DB7 ; Cn # <reserved-2DB7> +2DBF ; Cn # <reserved-2DBF> +2DC7 ; Cn # <reserved-2DC7> +2DCF ; Cn # <reserved-2DCF> +2DD7 ; Cn # <reserved-2DD7> +2DDF ; Cn # <reserved-2DDF> +2E53..2E7F ; Cn # [45] <reserved-2E53>..<reserved-2E7F> +2E9A ; Cn # <reserved-2E9A> +2EF4..2EFF ; Cn # [12] <reserved-2EF4>..<reserved-2EFF> +2FD6..2FEF ; Cn # [26] <reserved-2FD6>..<reserved-2FEF> +2FFC..2FFF ; Cn # [4] <reserved-2FFC>..<reserved-2FFF> +3040 ; Cn # <reserved-3040> +3097..3098 ; Cn # [2] <reserved-3097>..<reserved-3098> +3100..3104 ; Cn # [5] <reserved-3100>..<reserved-3104> +3130 ; Cn # <reserved-3130> +318F ; Cn # <reserved-318F> +31E4..31EF ; Cn # [12] <reserved-31E4>..<reserved-31EF> +321F ; Cn # <reserved-321F> +9FFD..9FFF ; Cn # [3] <reserved-9FFD>..<reserved-9FFF> +A48D..A48F ; Cn # [3] <reserved-A48D>..<reserved-A48F> +A4C7..A4CF ; Cn # [9] <reserved-A4C7>..<reserved-A4CF> +A62C..A63F ; Cn # [20] <reserved-A62C>..<reserved-A63F> +A6F8..A6FF ; Cn # [8] <reserved-A6F8>..<reserved-A6FF> +A7C0..A7C1 ; Cn # [2] <reserved-A7C0>..<reserved-A7C1> +A7CB..A7F4 ; Cn # [42] <reserved-A7CB>..<reserved-A7F4> +A82D..A82F ; Cn # [3] <reserved-A82D>..<reserved-A82F> +A83A..A83F ; Cn # [6] <reserved-A83A>..<reserved-A83F> +A878..A87F ; Cn # [8] <reserved-A878>..<reserved-A87F> +A8C6..A8CD ; Cn # [8] <reserved-A8C6>..<reserved-A8CD> +A8DA..A8DF ; Cn # [6] <reserved-A8DA>..<reserved-A8DF> +A954..A95E ; Cn # [11] <reserved-A954>..<reserved-A95E> +A97D..A97F ; Cn # [3] <reserved-A97D>..<reserved-A97F> +A9CE ; Cn # <reserved-A9CE> +A9DA..A9DD ; Cn # [4] <reserved-A9DA>..<reserved-A9DD> +A9FF ; Cn # <reserved-A9FF> +AA37..AA3F ; Cn # [9] <reserved-AA37>..<reserved-AA3F> +AA4E..AA4F ; Cn # [2] <reserved-AA4E>..<reserved-AA4F> +AA5A..AA5B ; Cn # [2] <reserved-AA5A>..<reserved-AA5B> +AAC3..AADA ; Cn # [24] <reserved-AAC3>..<reserved-AADA> +AAF7..AB00 ; Cn # [10] <reserved-AAF7>..<reserved-AB00> +AB07..AB08 ; Cn # [2] <reserved-AB07>..<reserved-AB08> +AB0F..AB10 ; Cn # [2] <reserved-AB0F>..<reserved-AB10> +AB17..AB1F ; Cn # [9] <reserved-AB17>..<reserved-AB1F> +AB27 ; Cn # <reserved-AB27> +AB2F ; Cn # <reserved-AB2F> +AB6C..AB6F ; Cn # [4] <reserved-AB6C>..<reserved-AB6F> +ABEE..ABEF ; Cn # [2] <reserved-ABEE>..<reserved-ABEF> +ABFA..ABFF ; Cn # [6] <reserved-ABFA>..<reserved-ABFF> +D7A4..D7AF ; Cn # [12] <reserved-D7A4>..<reserved-D7AF> +D7C7..D7CA ; Cn # [4] <reserved-D7C7>..<reserved-D7CA> +D7FC..D7FF ; Cn # [4] <reserved-D7FC>..<reserved-D7FF> +FA6E..FA6F ; Cn # [2] <reserved-FA6E>..<reserved-FA6F> +FADA..FAFF ; Cn # [38] <reserved-FADA>..<reserved-FAFF> +FB07..FB12 ; Cn # [12] <reserved-FB07>..<reserved-FB12> +FB18..FB1C ; Cn # [5] <reserved-FB18>..<reserved-FB1C> +FB37 ; Cn # <reserved-FB37> +FB3D ; Cn # <reserved-FB3D> +FB3F ; Cn # <reserved-FB3F> +FB42 ; Cn # <reserved-FB42> +FB45 ; Cn # <reserved-FB45> +FBC2..FBD2 ; Cn # [17] <reserved-FBC2>..<reserved-FBD2> +FD40..FD4F ; Cn # [16] <reserved-FD40>..<reserved-FD4F> +FD90..FD91 ; Cn # [2] <reserved-FD90>..<reserved-FD91> +FDC8..FDEF ; Cn # [40] <reserved-FDC8>..<noncharacter-FDEF> +FDFE..FDFF ; Cn # [2] <reserved-FDFE>..<reserved-FDFF> +FE1A..FE1F ; Cn # [6] <reserved-FE1A>..<reserved-FE1F> +FE53 ; Cn # <reserved-FE53> +FE67 ; Cn # <reserved-FE67> +FE6C..FE6F ; Cn # [4] <reserved-FE6C>..<reserved-FE6F> +FE75 ; Cn # <reserved-FE75> +FEFD..FEFE ; Cn # [2] <reserved-FEFD>..<reserved-FEFE> +FF00 ; Cn # <reserved-FF00> +FFBF..FFC1 ; Cn # [3] <reserved-FFBF>..<reserved-FFC1> +FFC8..FFC9 ; Cn # [2] <reserved-FFC8>..<reserved-FFC9> +FFD0..FFD1 ; Cn # [2] <reserved-FFD0>..<reserved-FFD1> +FFD8..FFD9 ; Cn # [2] <reserved-FFD8>..<reserved-FFD9> +FFDD..FFDF ; Cn # [3] <reserved-FFDD>..<reserved-FFDF> +FFE7 ; Cn # <reserved-FFE7> +FFEF..FFF8 ; Cn # [10] <reserved-FFEF>..<reserved-FFF8> +FFFE..FFFF ; Cn # [2] <noncharacter-FFFE>..<noncharacter-FFFF> +1000C ; Cn # <reserved-1000C> +10027 ; Cn # <reserved-10027> +1003B ; Cn # <reserved-1003B> +1003E ; Cn # <reserved-1003E> +1004E..1004F ; Cn # [2] <reserved-1004E>..<reserved-1004F> +1005E..1007F ; Cn # [34] <reserved-1005E>..<reserved-1007F> +100FB..100FF ; Cn # [5] <reserved-100FB>..<reserved-100FF> +10103..10106 ; Cn # [4] <reserved-10103>..<reserved-10106> +10134..10136 ; Cn # [3] <reserved-10134>..<reserved-10136> +1018F ; Cn # <reserved-1018F> +1019D..1019F ; Cn # [3] <reserved-1019D>..<reserved-1019F> +101A1..101CF ; Cn # [47] <reserved-101A1>..<reserved-101CF> +101FE..1027F ; Cn # [130] <reserved-101FE>..<reserved-1027F> +1029D..1029F ; Cn # [3] <reserved-1029D>..<reserved-1029F> +102D1..102DF ; Cn # [15] <reserved-102D1>..<reserved-102DF> +102FC..102FF ; Cn # [4] <reserved-102FC>..<reserved-102FF> +10324..1032C ; Cn # [9] <reserved-10324>..<reserved-1032C> +1034B..1034F ; Cn # [5] <reserved-1034B>..<reserved-1034F> +1037B..1037F ; Cn # [5] <reserved-1037B>..<reserved-1037F> +1039E ; Cn # <reserved-1039E> +103C4..103C7 ; Cn # [4] <reserved-103C4>..<reserved-103C7> +103D6..103FF ; Cn # [42] <reserved-103D6>..<reserved-103FF> +1049E..1049F ; Cn # [2] <reserved-1049E>..<reserved-1049F> +104AA..104AF ; Cn # [6] <reserved-104AA>..<reserved-104AF> +104D4..104D7 ; Cn # [4] <reserved-104D4>..<reserved-104D7> +104FC..104FF ; Cn # [4] <reserved-104FC>..<reserved-104FF> +10528..1052F ; Cn # [8] <reserved-10528>..<reserved-1052F> +10564..1056E ; Cn # [11] <reserved-10564>..<reserved-1056E> +10570..105FF ; Cn # [144] <reserved-10570>..<reserved-105FF> +10737..1073F ; Cn # [9] <reserved-10737>..<reserved-1073F> +10756..1075F ; Cn # [10] <reserved-10756>..<reserved-1075F> +10768..107FF ; Cn # [152] <reserved-10768>..<reserved-107FF> +10806..10807 ; Cn # [2] <reserved-10806>..<reserved-10807> +10809 ; Cn # <reserved-10809> +10836 ; Cn # <reserved-10836> +10839..1083B ; Cn # [3] <reserved-10839>..<reserved-1083B> +1083D..1083E ; Cn # [2] <reserved-1083D>..<reserved-1083E> +10856 ; Cn # <reserved-10856> +1089F..108A6 ; Cn # [8] <reserved-1089F>..<reserved-108A6> +108B0..108DF ; Cn # [48] <reserved-108B0>..<reserved-108DF> +108F3 ; Cn # <reserved-108F3> +108F6..108FA ; Cn # [5] <reserved-108F6>..<reserved-108FA> +1091C..1091E ; Cn # [3] <reserved-1091C>..<reserved-1091E> +1093A..1093E ; Cn # [5] <reserved-1093A>..<reserved-1093E> +10940..1097F ; Cn # [64] <reserved-10940>..<reserved-1097F> +109B8..109BB ; Cn # [4] <reserved-109B8>..<reserved-109BB> +109D0..109D1 ; Cn # [2] <reserved-109D0>..<reserved-109D1> +10A04 ; Cn # <reserved-10A04> +10A07..10A0B ; Cn # [5] <reserved-10A07>..<reserved-10A0B> +10A14 ; Cn # <reserved-10A14> +10A18 ; Cn # <reserved-10A18> +10A36..10A37 ; Cn # [2] <reserved-10A36>..<reserved-10A37> +10A3B..10A3E ; Cn # [4] <reserved-10A3B>..<reserved-10A3E> +10A49..10A4F ; Cn # [7] <reserved-10A49>..<reserved-10A4F> +10A59..10A5F ; Cn # [7] <reserved-10A59>..<reserved-10A5F> +10AA0..10ABF ; Cn # [32] <reserved-10AA0>..<reserved-10ABF> +10AE7..10AEA ; Cn # [4] <reserved-10AE7>..<reserved-10AEA> +10AF7..10AFF ; Cn # [9] <reserved-10AF7>..<reserved-10AFF> +10B36..10B38 ; Cn # [3] <reserved-10B36>..<reserved-10B38> +10B56..10B57 ; Cn # [2] <reserved-10B56>..<reserved-10B57> +10B73..10B77 ; Cn # [5] <reserved-10B73>..<reserved-10B77> +10B92..10B98 ; Cn # [7] <reserved-10B92>..<reserved-10B98> +10B9D..10BA8 ; Cn # [12] <reserved-10B9D>..<reserved-10BA8> +10BB0..10BFF ; Cn # [80] <reserved-10BB0>..<reserved-10BFF> +10C49..10C7F ; Cn # [55] <reserved-10C49>..<reserved-10C7F> +10CB3..10CBF ; Cn # [13] <reserved-10CB3>..<reserved-10CBF> +10CF3..10CF9 ; Cn # [7] <reserved-10CF3>..<reserved-10CF9> +10D28..10D2F ; Cn # [8] <reserved-10D28>..<reserved-10D2F> +10D3A..10E5F ; Cn # [294] <reserved-10D3A>..<reserved-10E5F> +10E7F ; Cn # <reserved-10E7F> +10EAA ; Cn # <reserved-10EAA> +10EAE..10EAF ; Cn # [2] <reserved-10EAE>..<reserved-10EAF> +10EB2..10EFF ; Cn # [78] <reserved-10EB2>..<reserved-10EFF> +10F28..10F2F ; Cn # [8] <reserved-10F28>..<reserved-10F2F> +10F5A..10FAF ; Cn # [86] <reserved-10F5A>..<reserved-10FAF> +10FCC..10FDF ; Cn # [20] <reserved-10FCC>..<reserved-10FDF> +10FF7..10FFF ; Cn # [9] <reserved-10FF7>..<reserved-10FFF> +1104E..11051 ; Cn # [4] <reserved-1104E>..<reserved-11051> +11070..1107E ; Cn # [15] <reserved-11070>..<reserved-1107E> +110C2..110CC ; Cn # [11] <reserved-110C2>..<reserved-110CC> +110CE..110CF ; Cn # [2] <reserved-110CE>..<reserved-110CF> +110E9..110EF ; Cn # [7] <reserved-110E9>..<reserved-110EF> +110FA..110FF ; Cn # [6] <reserved-110FA>..<reserved-110FF> +11135 ; Cn # <reserved-11135> +11148..1114F ; Cn # [8] <reserved-11148>..<reserved-1114F> +11177..1117F ; Cn # [9] <reserved-11177>..<reserved-1117F> +111E0 ; Cn # <reserved-111E0> +111F5..111FF ; Cn # [11] <reserved-111F5>..<reserved-111FF> +11212 ; Cn # <reserved-11212> +1123F..1127F ; Cn # [65] <reserved-1123F>..<reserved-1127F> +11287 ; Cn # <reserved-11287> +11289 ; Cn # <reserved-11289> +1128E ; Cn # <reserved-1128E> +1129E ; Cn # <reserved-1129E> +112AA..112AF ; Cn # [6] <reserved-112AA>..<reserved-112AF> +112EB..112EF ; Cn # [5] <reserved-112EB>..<reserved-112EF> +112FA..112FF ; Cn # [6] <reserved-112FA>..<reserved-112FF> +11304 ; Cn # <reserved-11304> +1130D..1130E ; Cn # [2] <reserved-1130D>..<reserved-1130E> +11311..11312 ; Cn # [2] <reserved-11311>..<reserved-11312> +11329 ; Cn # <reserved-11329> +11331 ; Cn # <reserved-11331> +11334 ; Cn # <reserved-11334> +1133A ; Cn # <reserved-1133A> +11345..11346 ; Cn # [2] <reserved-11345>..<reserved-11346> +11349..1134A ; Cn # [2] <reserved-11349>..<reserved-1134A> +1134E..1134F ; Cn # [2] <reserved-1134E>..<reserved-1134F> +11351..11356 ; Cn # [6] <reserved-11351>..<reserved-11356> +11358..1135C ; Cn # [5] <reserved-11358>..<reserved-1135C> +11364..11365 ; Cn # [2] <reserved-11364>..<reserved-11365> +1136D..1136F ; Cn # [3] <reserved-1136D>..<reserved-1136F> +11375..113FF ; Cn # [139] <reserved-11375>..<reserved-113FF> +1145C ; Cn # <reserved-1145C> +11462..1147F ; Cn # [30] <reserved-11462>..<reserved-1147F> +114C8..114CF ; Cn # [8] <reserved-114C8>..<reserved-114CF> +114DA..1157F ; Cn # [166] <reserved-114DA>..<reserved-1157F> +115B6..115B7 ; Cn # [2] <reserved-115B6>..<reserved-115B7> +115DE..115FF ; Cn # [34] <reserved-115DE>..<reserved-115FF> +11645..1164F ; Cn # [11] <reserved-11645>..<reserved-1164F> +1165A..1165F ; Cn # [6] <reserved-1165A>..<reserved-1165F> +1166D..1167F ; Cn # [19] <reserved-1166D>..<reserved-1167F> +116B9..116BF ; Cn # [7] <reserved-116B9>..<reserved-116BF> +116CA..116FF ; Cn # [54] <reserved-116CA>..<reserved-116FF> +1171B..1171C ; Cn # [2] <reserved-1171B>..<reserved-1171C> +1172C..1172F ; Cn # [4] <reserved-1172C>..<reserved-1172F> +11740..117FF ; Cn # [192] <reserved-11740>..<reserved-117FF> +1183C..1189F ; Cn # [100] <reserved-1183C>..<reserved-1189F> +118F3..118FE ; Cn # [12] <reserved-118F3>..<reserved-118FE> +11907..11908 ; Cn # [2] <reserved-11907>..<reserved-11908> +1190A..1190B ; Cn # [2] <reserved-1190A>..<reserved-1190B> +11914 ; Cn # <reserved-11914> +11917 ; Cn # <reserved-11917> +11936 ; Cn # <reserved-11936> +11939..1193A ; Cn # [2] <reserved-11939>..<reserved-1193A> +11947..1194F ; Cn # [9] <reserved-11947>..<reserved-1194F> +1195A..1199F ; Cn # [70] <reserved-1195A>..<reserved-1199F> +119A8..119A9 ; Cn # [2] <reserved-119A8>..<reserved-119A9> +119D8..119D9 ; Cn # [2] <reserved-119D8>..<reserved-119D9> +119E5..119FF ; Cn # [27] <reserved-119E5>..<reserved-119FF> +11A48..11A4F ; Cn # [8] <reserved-11A48>..<reserved-11A4F> +11AA3..11ABF ; Cn # [29] <reserved-11AA3>..<reserved-11ABF> +11AF9..11BFF ; Cn # [263] <reserved-11AF9>..<reserved-11BFF> +11C09 ; Cn # <reserved-11C09> +11C37 ; Cn # <reserved-11C37> +11C46..11C4F ; Cn # [10] <reserved-11C46>..<reserved-11C4F> +11C6D..11C6F ; Cn # [3] <reserved-11C6D>..<reserved-11C6F> +11C90..11C91 ; Cn # [2] <reserved-11C90>..<reserved-11C91> +11CA8 ; Cn # <reserved-11CA8> +11CB7..11CFF ; Cn # [73] <reserved-11CB7>..<reserved-11CFF> +11D07 ; Cn # <reserved-11D07> +11D0A ; Cn # <reserved-11D0A> +11D37..11D39 ; Cn # [3] <reserved-11D37>..<reserved-11D39> +11D3B ; Cn # <reserved-11D3B> +11D3E ; Cn # <reserved-11D3E> +11D48..11D4F ; Cn # [8] <reserved-11D48>..<reserved-11D4F> +11D5A..11D5F ; Cn # [6] <reserved-11D5A>..<reserved-11D5F> +11D66 ; Cn # <reserved-11D66> +11D69 ; Cn # <reserved-11D69> +11D8F ; Cn # <reserved-11D8F> +11D92 ; Cn # <reserved-11D92> +11D99..11D9F ; Cn # [7] <reserved-11D99>..<reserved-11D9F> +11DAA..11EDF ; Cn # [310] <reserved-11DAA>..<reserved-11EDF> +11EF9..11FAF ; Cn # [183] <reserved-11EF9>..<reserved-11FAF> +11FB1..11FBF ; Cn # [15] <reserved-11FB1>..<reserved-11FBF> +11FF2..11FFE ; Cn # [13] <reserved-11FF2>..<reserved-11FFE> +1239A..123FF ; Cn # [102] <reserved-1239A>..<reserved-123FF> +1246F ; Cn # <reserved-1246F> +12475..1247F ; Cn # [11] <reserved-12475>..<reserved-1247F> +12544..12FFF ; Cn # [2748] <reserved-12544>..<reserved-12FFF> +1342F ; Cn # <reserved-1342F> +13439..143FF ; Cn # [4039] <reserved-13439>..<reserved-143FF> +14647..167FF ; Cn # [8633] <reserved-14647>..<reserved-167FF> +16A39..16A3F ; Cn # [7] <reserved-16A39>..<reserved-16A3F> +16A5F ; Cn # <reserved-16A5F> +16A6A..16A6D ; Cn # [4] <reserved-16A6A>..<reserved-16A6D> +16A70..16ACF ; Cn # [96] <reserved-16A70>..<reserved-16ACF> +16AEE..16AEF ; Cn # [2] <reserved-16AEE>..<reserved-16AEF> +16AF6..16AFF ; Cn # [10] <reserved-16AF6>..<reserved-16AFF> +16B46..16B4F ; Cn # [10] <reserved-16B46>..<reserved-16B4F> +16B5A ; Cn # <reserved-16B5A> +16B62 ; Cn # <reserved-16B62> +16B78..16B7C ; Cn # [5] <reserved-16B78>..<reserved-16B7C> +16B90..16E3F ; Cn # [688] <reserved-16B90>..<reserved-16E3F> +16E9B..16EFF ; Cn # [101] <reserved-16E9B>..<reserved-16EFF> +16F4B..16F4E ; Cn # [4] <reserved-16F4B>..<reserved-16F4E> +16F88..16F8E ; Cn # [7] <reserved-16F88>..<reserved-16F8E> +16FA0..16FDF ; Cn # [64] <reserved-16FA0>..<reserved-16FDF> +16FE5..16FEF ; Cn # [11] <reserved-16FE5>..<reserved-16FEF> +16FF2..16FFF ; Cn # [14] <reserved-16FF2>..<reserved-16FFF> +187F8..187FF ; Cn # [8] <reserved-187F8>..<reserved-187FF> +18CD6..18CFF ; Cn # [42] <reserved-18CD6>..<reserved-18CFF> +18D09..1AFFF ; Cn # [8951] <reserved-18D09>..<reserved-1AFFF> +1B11F..1B14F ; Cn # [49] <reserved-1B11F>..<reserved-1B14F> +1B153..1B163 ; Cn # [17] <reserved-1B153>..<reserved-1B163> +1B168..1B16F ; Cn # [8] <reserved-1B168>..<reserved-1B16F> +1B2FC..1BBFF ; Cn # [2308] <reserved-1B2FC>..<reserved-1BBFF> +1BC6B..1BC6F ; Cn # [5] <reserved-1BC6B>..<reserved-1BC6F> +1BC7D..1BC7F ; Cn # [3] <reserved-1BC7D>..<reserved-1BC7F> +1BC89..1BC8F ; Cn # [7] <reserved-1BC89>..<reserved-1BC8F> +1BC9A..1BC9B ; Cn # [2] <reserved-1BC9A>..<reserved-1BC9B> +1BCA4..1CFFF ; Cn # [4956] <reserved-1BCA4>..<reserved-1CFFF> +1D0F6..1D0FF ; Cn # [10] <reserved-1D0F6>..<reserved-1D0FF> +1D127..1D128 ; Cn # [2] <reserved-1D127>..<reserved-1D128> +1D1E9..1D1FF ; Cn # [23] <reserved-1D1E9>..<reserved-1D1FF> +1D246..1D2DF ; Cn # [154] <reserved-1D246>..<reserved-1D2DF> +1D2F4..1D2FF ; Cn # [12] <reserved-1D2F4>..<reserved-1D2FF> +1D357..1D35F ; Cn # [9] <reserved-1D357>..<reserved-1D35F> +1D379..1D3FF ; Cn # [135] <reserved-1D379>..<reserved-1D3FF> +1D455 ; Cn # <reserved-1D455> +1D49D ; Cn # <reserved-1D49D> +1D4A0..1D4A1 ; Cn # [2] <reserved-1D4A0>..<reserved-1D4A1> +1D4A3..1D4A4 ; Cn # [2] <reserved-1D4A3>..<reserved-1D4A4> +1D4A7..1D4A8 ; Cn # [2] <reserved-1D4A7>..<reserved-1D4A8> +1D4AD ; Cn # <reserved-1D4AD> +1D4BA ; Cn # <reserved-1D4BA> +1D4BC ; Cn # <reserved-1D4BC> +1D4C4 ; Cn # <reserved-1D4C4> +1D506 ; Cn # <reserved-1D506> +1D50B..1D50C ; Cn # [2] <reserved-1D50B>..<reserved-1D50C> +1D515 ; Cn # <reserved-1D515> +1D51D ; Cn # <reserved-1D51D> +1D53A ; Cn # <reserved-1D53A> +1D53F ; Cn # <reserved-1D53F> +1D545 ; Cn # <reserved-1D545> +1D547..1D549 ; Cn # [3] <reserved-1D547>..<reserved-1D549> +1D551 ; Cn # <reserved-1D551> +1D6A6..1D6A7 ; Cn # [2] <reserved-1D6A6>..<reserved-1D6A7> +1D7CC..1D7CD ; Cn # [2] <reserved-1D7CC>..<reserved-1D7CD> +1DA8C..1DA9A ; Cn # [15] <reserved-1DA8C>..<reserved-1DA9A> +1DAA0 ; Cn # <reserved-1DAA0> +1DAB0..1DFFF ; Cn # [1360] <reserved-1DAB0>..<reserved-1DFFF> +1E007 ; Cn # <reserved-1E007> +1E019..1E01A ; Cn # [2] <reserved-1E019>..<reserved-1E01A> +1E022 ; Cn # <reserved-1E022> +1E025 ; Cn # <reserved-1E025> +1E02B..1E0FF ; Cn # [213] <reserved-1E02B>..<reserved-1E0FF> +1E12D..1E12F ; Cn # [3] <reserved-1E12D>..<reserved-1E12F> +1E13E..1E13F ; Cn # [2] <reserved-1E13E>..<reserved-1E13F> +1E14A..1E14D ; Cn # [4] <reserved-1E14A>..<reserved-1E14D> +1E150..1E2BF ; Cn # [368] <reserved-1E150>..<reserved-1E2BF> +1E2FA..1E2FE ; Cn # [5] <reserved-1E2FA>..<reserved-1E2FE> +1E300..1E7FF ; Cn # [1280] <reserved-1E300>..<reserved-1E7FF> +1E8C5..1E8C6 ; Cn # [2] <reserved-1E8C5>..<reserved-1E8C6> +1E8D7..1E8FF ; Cn # [41] <reserved-1E8D7>..<reserved-1E8FF> +1E94C..1E94F ; Cn # [4] <reserved-1E94C>..<reserved-1E94F> +1E95A..1E95D ; Cn # [4] <reserved-1E95A>..<reserved-1E95D> +1E960..1EC70 ; Cn # [785] <reserved-1E960>..<reserved-1EC70> +1ECB5..1ED00 ; Cn # [76] <reserved-1ECB5>..<reserved-1ED00> +1ED3E..1EDFF ; Cn # [194] <reserved-1ED3E>..<reserved-1EDFF> +1EE04 ; Cn # <reserved-1EE04> +1EE20 ; Cn # <reserved-1EE20> +1EE23 ; Cn # <reserved-1EE23> +1EE25..1EE26 ; Cn # [2] <reserved-1EE25>..<reserved-1EE26> +1EE28 ; Cn # <reserved-1EE28> +1EE33 ; Cn # <reserved-1EE33> +1EE38 ; Cn # <reserved-1EE38> +1EE3A ; Cn # <reserved-1EE3A> +1EE3C..1EE41 ; Cn # [6] <reserved-1EE3C>..<reserved-1EE41> +1EE43..1EE46 ; Cn # [4] <reserved-1EE43>..<reserved-1EE46> +1EE48 ; Cn # <reserved-1EE48> +1EE4A ; Cn # <reserved-1EE4A> +1EE4C ; Cn # <reserved-1EE4C> +1EE50 ; Cn # <reserved-1EE50> +1EE53 ; Cn # <reserved-1EE53> +1EE55..1EE56 ; Cn # [2] <reserved-1EE55>..<reserved-1EE56> +1EE58 ; Cn # <reserved-1EE58> +1EE5A ; Cn # <reserved-1EE5A> +1EE5C ; Cn # <reserved-1EE5C> +1EE5E ; Cn # <reserved-1EE5E> +1EE60 ; Cn # <reserved-1EE60> +1EE63 ; Cn # <reserved-1EE63> +1EE65..1EE66 ; Cn # [2] <reserved-1EE65>..<reserved-1EE66> +1EE6B ; Cn # <reserved-1EE6B> +1EE73 ; Cn # <reserved-1EE73> +1EE78 ; Cn # <reserved-1EE78> +1EE7D ; Cn # <reserved-1EE7D> +1EE7F ; Cn # <reserved-1EE7F> +1EE8A ; Cn # <reserved-1EE8A> +1EE9C..1EEA0 ; Cn # [5] <reserved-1EE9C>..<reserved-1EEA0> +1EEA4 ; Cn # <reserved-1EEA4> +1EEAA ; Cn # <reserved-1EEAA> +1EEBC..1EEEF ; Cn # [52] <reserved-1EEBC>..<reserved-1EEEF> +1EEF2..1EFFF ; Cn # [270] <reserved-1EEF2>..<reserved-1EFFF> +1F02C..1F02F ; Cn # [4] <reserved-1F02C>..<reserved-1F02F> +1F094..1F09F ; Cn # [12] <reserved-1F094>..<reserved-1F09F> +1F0AF..1F0B0 ; Cn # [2] <reserved-1F0AF>..<reserved-1F0B0> +1F0C0 ; Cn # <reserved-1F0C0> +1F0D0 ; Cn # <reserved-1F0D0> +1F0F6..1F0FF ; Cn # [10] <reserved-1F0F6>..<reserved-1F0FF> +1F1AE..1F1E5 ; Cn # [56] <reserved-1F1AE>..<reserved-1F1E5> +1F203..1F20F ; Cn # [13] <reserved-1F203>..<reserved-1F20F> +1F23C..1F23F ; Cn # [4] <reserved-1F23C>..<reserved-1F23F> +1F249..1F24F ; Cn # [7] <reserved-1F249>..<reserved-1F24F> +1F252..1F25F ; Cn # [14] <reserved-1F252>..<reserved-1F25F> +1F266..1F2FF ; Cn # [154] <reserved-1F266>..<reserved-1F2FF> +1F6D8..1F6DF ; Cn # [8] <reserved-1F6D8>..<reserved-1F6DF> +1F6ED..1F6EF ; Cn # [3] <reserved-1F6ED>..<reserved-1F6EF> +1F6FD..1F6FF ; Cn # [3] <reserved-1F6FD>..<reserved-1F6FF> +1F774..1F77F ; Cn # [12] <reserved-1F774>..<reserved-1F77F> +1F7D9..1F7DF ; Cn # [7] <reserved-1F7D9>..<reserved-1F7DF> +1F7EC..1F7FF ; Cn # [20] <reserved-1F7EC>..<reserved-1F7FF> +1F80C..1F80F ; Cn # [4] <reserved-1F80C>..<reserved-1F80F> +1F848..1F84F ; Cn # [8] <reserved-1F848>..<reserved-1F84F> +1F85A..1F85F ; Cn # [6] <reserved-1F85A>..<reserved-1F85F> +1F888..1F88F ; Cn # [8] <reserved-1F888>..<reserved-1F88F> +1F8AE..1F8AF ; Cn # [2] <reserved-1F8AE>..<reserved-1F8AF> +1F8B2..1F8FF ; Cn # [78] <reserved-1F8B2>..<reserved-1F8FF> +1F979 ; Cn # <reserved-1F979> +1F9CC ; Cn # <reserved-1F9CC> +1FA54..1FA5F ; Cn # [12] <reserved-1FA54>..<reserved-1FA5F> +1FA6E..1FA6F ; Cn # [2] <reserved-1FA6E>..<reserved-1FA6F> +1FA75..1FA77 ; Cn # [3] <reserved-1FA75>..<reserved-1FA77> +1FA7B..1FA7F ; Cn # [5] <reserved-1FA7B>..<reserved-1FA7F> +1FA87..1FA8F ; Cn # [9] <reserved-1FA87>..<reserved-1FA8F> +1FAA9..1FAAF ; Cn # [7] <reserved-1FAA9>..<reserved-1FAAF> +1FAB7..1FABF ; Cn # [9] <reserved-1FAB7>..<reserved-1FABF> +1FAC3..1FACF ; Cn # [13] <reserved-1FAC3>..<reserved-1FACF> +1FAD7..1FAFF ; Cn # [41] <reserved-1FAD7>..<reserved-1FAFF> +1FB93 ; Cn # <reserved-1FB93> +1FBCB..1FBEF ; Cn # [37] <reserved-1FBCB>..<reserved-1FBEF> +1FBFA..1FFFF ; Cn # [1030] <reserved-1FBFA>..<noncharacter-1FFFF> +2A6DE..2A6FF ; Cn # [34] <reserved-2A6DE>..<reserved-2A6FF> +2B735..2B73F ; Cn # [11] <reserved-2B735>..<reserved-2B73F> +2B81E..2B81F ; Cn # [2] <reserved-2B81E>..<reserved-2B81F> +2CEA2..2CEAF ; Cn # [14] <reserved-2CEA2>..<reserved-2CEAF> +2EBE1..2F7FF ; Cn # [3103] <reserved-2EBE1>..<reserved-2F7FF> +2FA1E..2FFFF ; Cn # [1506] <reserved-2FA1E>..<noncharacter-2FFFF> +3134B..E0000 ; Cn # [715958] <reserved-3134B>..<reserved-E0000> +E0002..E001F ; Cn # [30] <reserved-E0002>..<reserved-E001F> +E0080..E00FF ; Cn # [128] <reserved-E0080>..<reserved-E00FF> +E01F0..EFFFF ; Cn # [65040] <reserved-E01F0>..<noncharacter-EFFFF> +FFFFE..FFFFF ; Cn # [2] <noncharacter-FFFFE>..<noncharacter-FFFFF> +10FFFE..10FFFF; Cn # [2] <noncharacter-10FFFE>..<noncharacter-10FFFF> + +# Total code points: 830672 + +# ================================================ + +# General_Category=Uppercase_Letter + +0041..005A ; Lu # [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z +00C0..00D6 ; Lu # [23] LATIN CAPITAL LETTER A WITH GRAVE..LATIN CAPITAL LETTER O WITH DIAERESIS +00D8..00DE ; Lu # [7] LATIN CAPITAL LETTER O WITH STROKE..LATIN CAPITAL LETTER THORN +0100 ; Lu # LATIN CAPITAL LETTER A WITH MACRON +0102 ; Lu # LATIN CAPITAL LETTER A WITH BREVE +0104 ; Lu # LATIN CAPITAL LETTER A WITH OGONEK +0106 ; Lu # LATIN CAPITAL LETTER C WITH ACUTE +0108 ; Lu # LATIN CAPITAL LETTER C WITH CIRCUMFLEX +010A ; Lu # LATIN CAPITAL LETTER C WITH DOT ABOVE +010C ; Lu # LATIN CAPITAL LETTER C WITH CARON +010E ; Lu # LATIN CAPITAL LETTER D WITH CARON +0110 ; Lu # LATIN CAPITAL LETTER D WITH STROKE +0112 ; Lu # LATIN CAPITAL LETTER E WITH MACRON +0114 ; Lu # LATIN CAPITAL LETTER E WITH BREVE +0116 ; Lu # LATIN CAPITAL LETTER E WITH DOT ABOVE +0118 ; Lu # LATIN CAPITAL LETTER E WITH OGONEK +011A ; Lu # LATIN CAPITAL LETTER E WITH CARON +011C ; Lu # LATIN CAPITAL LETTER G WITH CIRCUMFLEX +011E ; Lu # LATIN CAPITAL LETTER G WITH BREVE +0120 ; Lu # LATIN CAPITAL LETTER G WITH DOT ABOVE +0122 ; Lu # LATIN CAPITAL LETTER G WITH CEDILLA +0124 ; Lu # LATIN CAPITAL LETTER H WITH CIRCUMFLEX +0126 ; Lu # LATIN CAPITAL LETTER H WITH STROKE +0128 ; Lu # LATIN CAPITAL LETTER I WITH TILDE +012A ; Lu # LATIN CAPITAL LETTER I WITH MACRON +012C ; Lu # LATIN CAPITAL LETTER I WITH BREVE +012E ; Lu # LATIN CAPITAL LETTER I WITH OGONEK +0130 ; Lu # LATIN CAPITAL LETTER I WITH DOT ABOVE +0132 ; Lu # LATIN CAPITAL LIGATURE IJ +0134 ; Lu # LATIN CAPITAL LETTER J WITH CIRCUMFLEX +0136 ; Lu # LATIN CAPITAL LETTER K WITH CEDILLA +0139 ; Lu # LATIN CAPITAL LETTER L WITH ACUTE +013B ; Lu # LATIN CAPITAL LETTER L WITH CEDILLA +013D ; Lu # LATIN CAPITAL LETTER L WITH CARON +013F ; Lu # LATIN CAPITAL LETTER L WITH MIDDLE DOT +0141 ; Lu # LATIN CAPITAL LETTER L WITH STROKE +0143 ; Lu # LATIN CAPITAL LETTER N WITH ACUTE +0145 ; Lu # LATIN CAPITAL LETTER N WITH CEDILLA +0147 ; Lu # LATIN CAPITAL LETTER N WITH CARON +014A ; Lu # LATIN CAPITAL LETTER ENG +014C ; Lu # LATIN CAPITAL LETTER O WITH MACRON +014E ; Lu # LATIN CAPITAL LETTER O WITH BREVE +0150 ; Lu # LATIN CAPITAL LETTER O WITH DOUBLE ACUTE +0152 ; Lu # LATIN CAPITAL LIGATURE OE +0154 ; Lu # LATIN CAPITAL LETTER R WITH ACUTE +0156 ; Lu # LATIN CAPITAL LETTER R WITH CEDILLA +0158 ; Lu # LATIN CAPITAL LETTER R WITH CARON +015A ; Lu # LATIN CAPITAL LETTER S WITH ACUTE +015C ; Lu # LATIN CAPITAL LETTER S WITH CIRCUMFLEX +015E ; Lu # LATIN CAPITAL LETTER S WITH CEDILLA +0160 ; Lu # LATIN CAPITAL LETTER S WITH CARON +0162 ; Lu # LATIN CAPITAL LETTER T WITH CEDILLA +0164 ; Lu # LATIN CAPITAL LETTER T WITH CARON +0166 ; Lu # LATIN CAPITAL LETTER T WITH STROKE +0168 ; Lu # LATIN CAPITAL LETTER U WITH TILDE +016A ; Lu # LATIN CAPITAL LETTER U WITH MACRON +016C ; Lu # LATIN CAPITAL LETTER U WITH BREVE +016E ; Lu # LATIN CAPITAL LETTER U WITH RING ABOVE +0170 ; Lu # LATIN CAPITAL LETTER U WITH DOUBLE ACUTE +0172 ; Lu # LATIN CAPITAL LETTER U WITH OGONEK +0174 ; Lu # LATIN CAPITAL LETTER W WITH CIRCUMFLEX +0176 ; Lu # LATIN CAPITAL LETTER Y WITH CIRCUMFLEX +0178..0179 ; Lu # [2] LATIN CAPITAL LETTER Y WITH DIAERESIS..LATIN CAPITAL LETTER Z WITH ACUTE +017B ; Lu # LATIN CAPITAL LETTER Z WITH DOT ABOVE +017D ; Lu # LATIN CAPITAL LETTER Z WITH CARON +0181..0182 ; Lu # [2] LATIN CAPITAL LETTER B WITH HOOK..LATIN CAPITAL LETTER B WITH TOPBAR +0184 ; Lu # LATIN CAPITAL LETTER TONE SIX +0186..0187 ; Lu # [2] LATIN CAPITAL LETTER OPEN O..LATIN CAPITAL LETTER C WITH HOOK +0189..018B ; Lu # [3] LATIN CAPITAL LETTER AFRICAN D..LATIN CAPITAL LETTER D WITH TOPBAR +018E..0191 ; Lu # [4] LATIN CAPITAL LETTER REVERSED E..LATIN CAPITAL LETTER F WITH HOOK +0193..0194 ; Lu # [2] LATIN CAPITAL LETTER G WITH HOOK..LATIN CAPITAL LETTER GAMMA +0196..0198 ; Lu # [3] LATIN CAPITAL LETTER IOTA..LATIN CAPITAL LETTER K WITH HOOK +019C..019D ; Lu # [2] LATIN CAPITAL LETTER TURNED M..LATIN CAPITAL LETTER N WITH LEFT HOOK +019F..01A0 ; Lu # [2] LATIN CAPITAL LETTER O WITH MIDDLE TILDE..LATIN CAPITAL LETTER O WITH HORN +01A2 ; Lu # LATIN CAPITAL LETTER OI +01A4 ; Lu # LATIN CAPITAL LETTER P WITH HOOK +01A6..01A7 ; Lu # [2] LATIN LETTER YR..LATIN CAPITAL LETTER TONE TWO +01A9 ; Lu # LATIN CAPITAL LETTER ESH +01AC ; Lu # LATIN CAPITAL LETTER T WITH HOOK +01AE..01AF ; Lu # [2] LATIN CAPITAL LETTER T WITH RETROFLEX HOOK..LATIN CAPITAL LETTER U WITH HORN +01B1..01B3 ; Lu # [3] LATIN CAPITAL LETTER UPSILON..LATIN CAPITAL LETTER Y WITH HOOK +01B5 ; Lu # LATIN CAPITAL LETTER Z WITH STROKE +01B7..01B8 ; Lu # [2] LATIN CAPITAL LETTER EZH..LATIN CAPITAL LETTER EZH REVERSED +01BC ; Lu # LATIN CAPITAL LETTER TONE FIVE +01C4 ; Lu # LATIN CAPITAL LETTER DZ WITH CARON +01C7 ; Lu # LATIN CAPITAL LETTER LJ +01CA ; Lu # LATIN CAPITAL LETTER NJ +01CD ; Lu # LATIN CAPITAL LETTER A WITH CARON +01CF ; Lu # LATIN CAPITAL LETTER I WITH CARON +01D1 ; Lu # LATIN CAPITAL LETTER O WITH CARON +01D3 ; Lu # LATIN CAPITAL LETTER U WITH CARON +01D5 ; Lu # LATIN CAPITAL LETTER U WITH DIAERESIS AND MACRON +01D7 ; Lu # LATIN CAPITAL LETTER U WITH DIAERESIS AND ACUTE +01D9 ; Lu # LATIN CAPITAL LETTER U WITH DIAERESIS AND CARON +01DB ; Lu # LATIN CAPITAL LETTER U WITH DIAERESIS AND GRAVE +01DE ; Lu # LATIN CAPITAL LETTER A WITH DIAERESIS AND MACRON +01E0 ; Lu # LATIN CAPITAL LETTER A WITH DOT ABOVE AND MACRON +01E2 ; Lu # LATIN CAPITAL LETTER AE WITH MACRON +01E4 ; Lu # LATIN CAPITAL LETTER G WITH STROKE +01E6 ; Lu # LATIN CAPITAL LETTER G WITH CARON +01E8 ; Lu # LATIN CAPITAL LETTER K WITH CARON +01EA ; Lu # LATIN CAPITAL LETTER O WITH OGONEK +01EC ; Lu # LATIN CAPITAL LETTER O WITH OGONEK AND MACRON +01EE ; Lu # LATIN CAPITAL LETTER EZH WITH CARON +01F1 ; Lu # LATIN CAPITAL LETTER DZ +01F4 ; Lu # LATIN CAPITAL LETTER G WITH ACUTE +01F6..01F8 ; Lu # [3] LATIN CAPITAL LETTER HWAIR..LATIN CAPITAL LETTER N WITH GRAVE +01FA ; Lu # LATIN CAPITAL LETTER A WITH RING ABOVE AND ACUTE +01FC ; Lu # LATIN CAPITAL LETTER AE WITH ACUTE +01FE ; Lu # LATIN CAPITAL LETTER O WITH STROKE AND ACUTE +0200 ; Lu # LATIN CAPITAL LETTER A WITH DOUBLE GRAVE +0202 ; Lu # LATIN CAPITAL LETTER A WITH INVERTED BREVE +0204 ; Lu # LATIN CAPITAL LETTER E WITH DOUBLE GRAVE +0206 ; Lu # LATIN CAPITAL LETTER E WITH INVERTED BREVE +0208 ; Lu # LATIN CAPITAL LETTER I WITH DOUBLE GRAVE +020A ; Lu # LATIN CAPITAL LETTER I WITH INVERTED BREVE +020C ; Lu # LATIN CAPITAL LETTER O WITH DOUBLE GRAVE +020E ; Lu # LATIN CAPITAL LETTER O WITH INVERTED BREVE +0210 ; Lu # LATIN CAPITAL LETTER R WITH DOUBLE GRAVE +0212 ; Lu # LATIN CAPITAL LETTER R WITH INVERTED BREVE +0214 ; Lu # LATIN CAPITAL LETTER U WITH DOUBLE GRAVE +0216 ; Lu # LATIN CAPITAL LETTER U WITH INVERTED BREVE +0218 ; Lu # LATIN CAPITAL LETTER S WITH COMMA BELOW +021A ; Lu # LATIN CAPITAL LETTER T WITH COMMA BELOW +021C ; Lu # LATIN CAPITAL LETTER YOGH +021E ; Lu # LATIN CAPITAL LETTER H WITH CARON +0220 ; Lu # LATIN CAPITAL LETTER N WITH LONG RIGHT LEG +0222 ; Lu # LATIN CAPITAL LETTER OU +0224 ; Lu # LATIN CAPITAL LETTER Z WITH HOOK +0226 ; Lu # LATIN CAPITAL LETTER A WITH DOT ABOVE +0228 ; Lu # LATIN CAPITAL LETTER E WITH CEDILLA +022A ; Lu # LATIN CAPITAL LETTER O WITH DIAERESIS AND MACRON +022C ; Lu # LATIN CAPITAL LETTER O WITH TILDE AND MACRON +022E ; Lu # LATIN CAPITAL LETTER O WITH DOT ABOVE +0230 ; Lu # LATIN CAPITAL LETTER O WITH DOT ABOVE AND MACRON +0232 ; Lu # LATIN CAPITAL LETTER Y WITH MACRON +023A..023B ; Lu # [2] LATIN CAPITAL LETTER A WITH STROKE..LATIN CAPITAL LETTER C WITH STROKE +023D..023E ; Lu # [2] LATIN CAPITAL LETTER L WITH BAR..LATIN CAPITAL LETTER T WITH DIAGONAL STROKE +0241 ; Lu # LATIN CAPITAL LETTER GLOTTAL STOP +0243..0246 ; Lu # [4] LATIN CAPITAL LETTER B WITH STROKE..LATIN CAPITAL LETTER E WITH STROKE +0248 ; Lu # LATIN CAPITAL LETTER J WITH STROKE +024A ; Lu # LATIN CAPITAL LETTER SMALL Q WITH HOOK TAIL +024C ; Lu # LATIN CAPITAL LETTER R WITH STROKE +024E ; Lu # LATIN CAPITAL LETTER Y WITH STROKE +0370 ; Lu # GREEK CAPITAL LETTER HETA +0372 ; Lu # GREEK CAPITAL LETTER ARCHAIC SAMPI +0376 ; Lu # GREEK CAPITAL LETTER PAMPHYLIAN DIGAMMA +037F ; Lu # GREEK CAPITAL LETTER YOT +0386 ; Lu # GREEK CAPITAL LETTER ALPHA WITH TONOS +0388..038A ; Lu # [3] GREEK CAPITAL LETTER EPSILON WITH TONOS..GREEK CAPITAL LETTER IOTA WITH TONOS +038C ; Lu # GREEK CAPITAL LETTER OMICRON WITH TONOS +038E..038F ; Lu # [2] GREEK CAPITAL LETTER UPSILON WITH TONOS..GREEK CAPITAL LETTER OMEGA WITH TONOS +0391..03A1 ; Lu # [17] GREEK CAPITAL LETTER ALPHA..GREEK CAPITAL LETTER RHO +03A3..03AB ; Lu # [9] GREEK CAPITAL LETTER SIGMA..GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA +03CF ; Lu # GREEK CAPITAL KAI SYMBOL +03D2..03D4 ; Lu # [3] GREEK UPSILON WITH HOOK SYMBOL..GREEK UPSILON WITH DIAERESIS AND HOOK SYMBOL +03D8 ; Lu # GREEK LETTER ARCHAIC KOPPA +03DA ; Lu # GREEK LETTER STIGMA +03DC ; Lu # GREEK LETTER DIGAMMA +03DE ; Lu # GREEK LETTER KOPPA +03E0 ; Lu # GREEK LETTER SAMPI +03E2 ; Lu # COPTIC CAPITAL LETTER SHEI +03E4 ; Lu # COPTIC CAPITAL LETTER FEI +03E6 ; Lu # COPTIC CAPITAL LETTER KHEI +03E8 ; Lu # COPTIC CAPITAL LETTER HORI +03EA ; Lu # COPTIC CAPITAL LETTER GANGIA +03EC ; Lu # COPTIC CAPITAL LETTER SHIMA +03EE ; Lu # COPTIC CAPITAL LETTER DEI +03F4 ; Lu # GREEK CAPITAL THETA SYMBOL +03F7 ; Lu # GREEK CAPITAL LETTER SHO +03F9..03FA ; Lu # [2] GREEK CAPITAL LUNATE SIGMA SYMBOL..GREEK CAPITAL LETTER SAN +03FD..042F ; Lu # [51] GREEK CAPITAL REVERSED LUNATE SIGMA SYMBOL..CYRILLIC CAPITAL LETTER YA +0460 ; Lu # CYRILLIC CAPITAL LETTER OMEGA +0462 ; Lu # CYRILLIC CAPITAL LETTER YAT +0464 ; Lu # CYRILLIC CAPITAL LETTER IOTIFIED E +0466 ; Lu # CYRILLIC CAPITAL LETTER LITTLE YUS +0468 ; Lu # CYRILLIC CAPITAL LETTER IOTIFIED LITTLE YUS +046A ; Lu # CYRILLIC CAPITAL LETTER BIG YUS +046C ; Lu # CYRILLIC CAPITAL LETTER IOTIFIED BIG YUS +046E ; Lu # CYRILLIC CAPITAL LETTER KSI +0470 ; Lu # CYRILLIC CAPITAL LETTER PSI +0472 ; Lu # CYRILLIC CAPITAL LETTER FITA +0474 ; Lu # CYRILLIC CAPITAL LETTER IZHITSA +0476 ; Lu # CYRILLIC CAPITAL LETTER IZHITSA WITH DOUBLE GRAVE ACCENT +0478 ; Lu # CYRILLIC CAPITAL LETTER UK +047A ; Lu # CYRILLIC CAPITAL LETTER ROUND OMEGA +047C ; Lu # CYRILLIC CAPITAL LETTER OMEGA WITH TITLO +047E ; Lu # CYRILLIC CAPITAL LETTER OT +0480 ; Lu # CYRILLIC CAPITAL LETTER KOPPA +048A ; Lu # CYRILLIC CAPITAL LETTER SHORT I WITH TAIL +048C ; Lu # CYRILLIC CAPITAL LETTER SEMISOFT SIGN +048E ; Lu # CYRILLIC CAPITAL LETTER ER WITH TICK +0490 ; Lu # CYRILLIC CAPITAL LETTER GHE WITH UPTURN +0492 ; Lu # CYRILLIC CAPITAL LETTER GHE WITH STROKE +0494 ; Lu # CYRILLIC CAPITAL LETTER GHE WITH MIDDLE HOOK +0496 ; Lu # CYRILLIC CAPITAL LETTER ZHE WITH DESCENDER +0498 ; Lu # CYRILLIC CAPITAL LETTER ZE WITH DESCENDER +049A ; Lu # CYRILLIC CAPITAL LETTER KA WITH DESCENDER +049C ; Lu # CYRILLIC CAPITAL LETTER KA WITH VERTICAL STROKE +049E ; Lu # CYRILLIC CAPITAL LETTER KA WITH STROKE +04A0 ; Lu # CYRILLIC CAPITAL LETTER BASHKIR KA +04A2 ; Lu # CYRILLIC CAPITAL LETTER EN WITH DESCENDER +04A4 ; Lu # CYRILLIC CAPITAL LIGATURE EN GHE +04A6 ; Lu # CYRILLIC CAPITAL LETTER PE WITH MIDDLE HOOK +04A8 ; Lu # CYRILLIC CAPITAL LETTER ABKHASIAN HA +04AA ; Lu # CYRILLIC CAPITAL LETTER ES WITH DESCENDER +04AC ; Lu # CYRILLIC CAPITAL LETTER TE WITH DESCENDER +04AE ; Lu # CYRILLIC CAPITAL LETTER STRAIGHT U +04B0 ; Lu # CYRILLIC CAPITAL LETTER STRAIGHT U WITH STROKE +04B2 ; Lu # CYRILLIC CAPITAL LETTER HA WITH DESCENDER +04B4 ; Lu # CYRILLIC CAPITAL LIGATURE TE TSE +04B6 ; Lu # CYRILLIC CAPITAL LETTER CHE WITH DESCENDER +04B8 ; Lu # CYRILLIC CAPITAL LETTER CHE WITH VERTICAL STROKE +04BA ; Lu # CYRILLIC CAPITAL LETTER SHHA +04BC ; Lu # CYRILLIC CAPITAL LETTER ABKHASIAN CHE +04BE ; Lu # CYRILLIC CAPITAL LETTER ABKHASIAN CHE WITH DESCENDER +04C0..04C1 ; Lu # [2] CYRILLIC LETTER PALOCHKA..CYRILLIC CAPITAL LETTER ZHE WITH BREVE +04C3 ; Lu # CYRILLIC CAPITAL LETTER KA WITH HOOK +04C5 ; Lu # CYRILLIC CAPITAL LETTER EL WITH TAIL +04C7 ; Lu # CYRILLIC CAPITAL LETTER EN WITH HOOK +04C9 ; Lu # CYRILLIC CAPITAL LETTER EN WITH TAIL +04CB ; Lu # CYRILLIC CAPITAL LETTER KHAKASSIAN CHE +04CD ; Lu # CYRILLIC CAPITAL LETTER EM WITH TAIL +04D0 ; Lu # CYRILLIC CAPITAL LETTER A WITH BREVE +04D2 ; Lu # CYRILLIC CAPITAL LETTER A WITH DIAERESIS +04D4 ; Lu # CYRILLIC CAPITAL LIGATURE A IE +04D6 ; Lu # CYRILLIC CAPITAL LETTER IE WITH BREVE +04D8 ; Lu # CYRILLIC CAPITAL LETTER SCHWA +04DA ; Lu # CYRILLIC CAPITAL LETTER SCHWA WITH DIAERESIS +04DC ; Lu # CYRILLIC CAPITAL LETTER ZHE WITH DIAERESIS +04DE ; Lu # CYRILLIC CAPITAL LETTER ZE WITH DIAERESIS +04E0 ; Lu # CYRILLIC CAPITAL LETTER ABKHASIAN DZE +04E2 ; Lu # CYRILLIC CAPITAL LETTER I WITH MACRON +04E4 ; Lu # CYRILLIC CAPITAL LETTER I WITH DIAERESIS +04E6 ; Lu # CYRILLIC CAPITAL LETTER O WITH DIAERESIS +04E8 ; Lu # CYRILLIC CAPITAL LETTER BARRED O +04EA ; Lu # CYRILLIC CAPITAL LETTER BARRED O WITH DIAERESIS +04EC ; Lu # CYRILLIC CAPITAL LETTER E WITH DIAERESIS +04EE ; Lu # CYRILLIC CAPITAL LETTER U WITH MACRON +04F0 ; Lu # CYRILLIC CAPITAL LETTER U WITH DIAERESIS +04F2 ; Lu # CYRILLIC CAPITAL LETTER U WITH DOUBLE ACUTE +04F4 ; Lu # CYRILLIC CAPITAL LETTER CHE WITH DIAERESIS +04F6 ; Lu # CYRILLIC CAPITAL LETTER GHE WITH DESCENDER +04F8 ; Lu # CYRILLIC CAPITAL LETTER YERU WITH DIAERESIS +04FA ; Lu # CYRILLIC CAPITAL LETTER GHE WITH STROKE AND HOOK +04FC ; Lu # CYRILLIC CAPITAL LETTER HA WITH HOOK +04FE ; Lu # CYRILLIC CAPITAL LETTER HA WITH STROKE +0500 ; Lu # CYRILLIC CAPITAL LETTER KOMI DE +0502 ; Lu # CYRILLIC CAPITAL LETTER KOMI DJE +0504 ; Lu # CYRILLIC CAPITAL LETTER KOMI ZJE +0506 ; Lu # CYRILLIC CAPITAL LETTER KOMI DZJE +0508 ; Lu # CYRILLIC CAPITAL LETTER KOMI LJE +050A ; Lu # CYRILLIC CAPITAL LETTER KOMI NJE +050C ; Lu # CYRILLIC CAPITAL LETTER KOMI SJE +050E ; Lu # CYRILLIC CAPITAL LETTER KOMI TJE +0510 ; Lu # CYRILLIC CAPITAL LETTER REVERSED ZE +0512 ; Lu # CYRILLIC CAPITAL LETTER EL WITH HOOK +0514 ; Lu # CYRILLIC CAPITAL LETTER LHA +0516 ; Lu # CYRILLIC CAPITAL LETTER RHA +0518 ; Lu # CYRILLIC CAPITAL LETTER YAE +051A ; Lu # CYRILLIC CAPITAL LETTER QA +051C ; Lu # CYRILLIC CAPITAL LETTER WE +051E ; Lu # CYRILLIC CAPITAL LETTER ALEUT KA +0520 ; Lu # CYRILLIC CAPITAL LETTER EL WITH MIDDLE HOOK +0522 ; Lu # CYRILLIC CAPITAL LETTER EN WITH MIDDLE HOOK +0524 ; Lu # CYRILLIC CAPITAL LETTER PE WITH DESCENDER +0526 ; Lu # CYRILLIC CAPITAL LETTER SHHA WITH DESCENDER +0528 ; Lu # CYRILLIC CAPITAL LETTER EN WITH LEFT HOOK +052A ; Lu # CYRILLIC CAPITAL LETTER DZZHE +052C ; Lu # CYRILLIC CAPITAL LETTER DCHE +052E ; Lu # CYRILLIC CAPITAL LETTER EL WITH DESCENDER +0531..0556 ; Lu # [38] ARMENIAN CAPITAL LETTER AYB..ARMENIAN CAPITAL LETTER FEH +10A0..10C5 ; Lu # [38] GEORGIAN CAPITAL LETTER AN..GEORGIAN CAPITAL LETTER HOE +10C7 ; Lu # GEORGIAN CAPITAL LETTER YN +10CD ; Lu # GEORGIAN CAPITAL LETTER AEN +13A0..13F5 ; Lu # [86] CHEROKEE LETTER A..CHEROKEE LETTER MV +1C90..1CBA ; Lu # [43] GEORGIAN MTAVRULI CAPITAL LETTER AN..GEORGIAN MTAVRULI CAPITAL LETTER AIN +1CBD..1CBF ; Lu # [3] GEORGIAN MTAVRULI CAPITAL LETTER AEN..GEORGIAN MTAVRULI CAPITAL LETTER LABIAL SIGN +1E00 ; Lu # LATIN CAPITAL LETTER A WITH RING BELOW +1E02 ; Lu # LATIN CAPITAL LETTER B WITH DOT ABOVE +1E04 ; Lu # LATIN CAPITAL LETTER B WITH DOT BELOW +1E06 ; Lu # LATIN CAPITAL LETTER B WITH LINE BELOW +1E08 ; Lu # LATIN CAPITAL LETTER C WITH CEDILLA AND ACUTE +1E0A ; Lu # LATIN CAPITAL LETTER D WITH DOT ABOVE +1E0C ; Lu # LATIN CAPITAL LETTER D WITH DOT BELOW +1E0E ; Lu # LATIN CAPITAL LETTER D WITH LINE BELOW +1E10 ; Lu # LATIN CAPITAL LETTER D WITH CEDILLA +1E12 ; Lu # LATIN CAPITAL LETTER D WITH CIRCUMFLEX BELOW +1E14 ; Lu # LATIN CAPITAL LETTER E WITH MACRON AND GRAVE +1E16 ; Lu # LATIN CAPITAL LETTER E WITH MACRON AND ACUTE +1E18 ; Lu # LATIN CAPITAL LETTER E WITH CIRCUMFLEX BELOW +1E1A ; Lu # LATIN CAPITAL LETTER E WITH TILDE BELOW +1E1C ; Lu # LATIN CAPITAL LETTER E WITH CEDILLA AND BREVE +1E1E ; Lu # LATIN CAPITAL LETTER F WITH DOT ABOVE +1E20 ; Lu # LATIN CAPITAL LETTER G WITH MACRON +1E22 ; Lu # LATIN CAPITAL LETTER H WITH DOT ABOVE +1E24 ; Lu # LATIN CAPITAL LETTER H WITH DOT BELOW +1E26 ; Lu # LATIN CAPITAL LETTER H WITH DIAERESIS +1E28 ; Lu # LATIN CAPITAL LETTER H WITH CEDILLA +1E2A ; Lu # LATIN CAPITAL LETTER H WITH BREVE BELOW +1E2C ; Lu # LATIN CAPITAL LETTER I WITH TILDE BELOW +1E2E ; Lu # LATIN CAPITAL LETTER I WITH DIAERESIS AND ACUTE +1E30 ; Lu # LATIN CAPITAL LETTER K WITH ACUTE +1E32 ; Lu # LATIN CAPITAL LETTER K WITH DOT BELOW +1E34 ; Lu # LATIN CAPITAL LETTER K WITH LINE BELOW +1E36 ; Lu # LATIN CAPITAL LETTER L WITH DOT BELOW +1E38 ; Lu # LATIN CAPITAL LETTER L WITH DOT BELOW AND MACRON +1E3A ; Lu # LATIN CAPITAL LETTER L WITH LINE BELOW +1E3C ; Lu # LATIN CAPITAL LETTER L WITH CIRCUMFLEX BELOW +1E3E ; Lu # LATIN CAPITAL LETTER M WITH ACUTE +1E40 ; Lu # LATIN CAPITAL LETTER M WITH DOT ABOVE +1E42 ; Lu # LATIN CAPITAL LETTER M WITH DOT BELOW +1E44 ; Lu # LATIN CAPITAL LETTER N WITH DOT ABOVE +1E46 ; Lu # LATIN CAPITAL LETTER N WITH DOT BELOW +1E48 ; Lu # LATIN CAPITAL LETTER N WITH LINE BELOW +1E4A ; Lu # LATIN CAPITAL LETTER N WITH CIRCUMFLEX BELOW +1E4C ; Lu # LATIN CAPITAL LETTER O WITH TILDE AND ACUTE +1E4E ; Lu # LATIN CAPITAL LETTER O WITH TILDE AND DIAERESIS +1E50 ; Lu # LATIN CAPITAL LETTER O WITH MACRON AND GRAVE +1E52 ; Lu # LATIN CAPITAL LETTER O WITH MACRON AND ACUTE +1E54 ; Lu # LATIN CAPITAL LETTER P WITH ACUTE +1E56 ; Lu # LATIN CAPITAL LETTER P WITH DOT ABOVE +1E58 ; Lu # LATIN CAPITAL LETTER R WITH DOT ABOVE +1E5A ; Lu # LATIN CAPITAL LETTER R WITH DOT BELOW +1E5C ; Lu # LATIN CAPITAL LETTER R WITH DOT BELOW AND MACRON +1E5E ; Lu # LATIN CAPITAL LETTER R WITH LINE BELOW +1E60 ; Lu # LATIN CAPITAL LETTER S WITH DOT ABOVE +1E62 ; Lu # LATIN CAPITAL LETTER S WITH DOT BELOW +1E64 ; Lu # LATIN CAPITAL LETTER S WITH ACUTE AND DOT ABOVE +1E66 ; Lu # LATIN CAPITAL LETTER S WITH CARON AND DOT ABOVE +1E68 ; Lu # LATIN CAPITAL LETTER S WITH DOT BELOW AND DOT ABOVE +1E6A ; Lu # LATIN CAPITAL LETTER T WITH DOT ABOVE +1E6C ; Lu # LATIN CAPITAL LETTER T WITH DOT BELOW +1E6E ; Lu # LATIN CAPITAL LETTER T WITH LINE BELOW +1E70 ; Lu # LATIN CAPITAL LETTER T WITH CIRCUMFLEX BELOW +1E72 ; Lu # LATIN CAPITAL LETTER U WITH DIAERESIS BELOW +1E74 ; Lu # LATIN CAPITAL LETTER U WITH TILDE BELOW +1E76 ; Lu # LATIN CAPITAL LETTER U WITH CIRCUMFLEX BELOW +1E78 ; Lu # LATIN CAPITAL LETTER U WITH TILDE AND ACUTE +1E7A ; Lu # LATIN CAPITAL LETTER U WITH MACRON AND DIAERESIS +1E7C ; Lu # LATIN CAPITAL LETTER V WITH TILDE +1E7E ; Lu # LATIN CAPITAL LETTER V WITH DOT BELOW +1E80 ; Lu # LATIN CAPITAL LETTER W WITH GRAVE +1E82 ; Lu # LATIN CAPITAL LETTER W WITH ACUTE +1E84 ; Lu # LATIN CAPITAL LETTER W WITH DIAERESIS +1E86 ; Lu # LATIN CAPITAL LETTER W WITH DOT ABOVE +1E88 ; Lu # LATIN CAPITAL LETTER W WITH DOT BELOW +1E8A ; Lu # LATIN CAPITAL LETTER X WITH DOT ABOVE +1E8C ; Lu # LATIN CAPITAL LETTER X WITH DIAERESIS +1E8E ; Lu # LATIN CAPITAL LETTER Y WITH DOT ABOVE +1E90 ; Lu # LATIN CAPITAL LETTER Z WITH CIRCUMFLEX +1E92 ; Lu # LATIN CAPITAL LETTER Z WITH DOT BELOW +1E94 ; Lu # LATIN CAPITAL LETTER Z WITH LINE BELOW +1E9E ; Lu # LATIN CAPITAL LETTER SHARP S +1EA0 ; Lu # LATIN CAPITAL LETTER A WITH DOT BELOW +1EA2 ; Lu # LATIN CAPITAL LETTER A WITH HOOK ABOVE +1EA4 ; Lu # LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND ACUTE +1EA6 ; Lu # LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND GRAVE +1EA8 ; Lu # LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND HOOK ABOVE +1EAA ; Lu # LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND TILDE +1EAC ; Lu # LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND DOT BELOW +1EAE ; Lu # LATIN CAPITAL LETTER A WITH BREVE AND ACUTE +1EB0 ; Lu # LATIN CAPITAL LETTER A WITH BREVE AND GRAVE +1EB2 ; Lu # LATIN CAPITAL LETTER A WITH BREVE AND HOOK ABOVE +1EB4 ; Lu # LATIN CAPITAL LETTER A WITH BREVE AND TILDE +1EB6 ; Lu # LATIN CAPITAL LETTER A WITH BREVE AND DOT BELOW +1EB8 ; Lu # LATIN CAPITAL LETTER E WITH DOT BELOW +1EBA ; Lu # LATIN CAPITAL LETTER E WITH HOOK ABOVE +1EBC ; Lu # LATIN CAPITAL LETTER E WITH TILDE +1EBE ; Lu # LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND ACUTE +1EC0 ; Lu # LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND GRAVE +1EC2 ; Lu # LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND HOOK ABOVE +1EC4 ; Lu # LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND TILDE +1EC6 ; Lu # LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND DOT BELOW +1EC8 ; Lu # LATIN CAPITAL LETTER I WITH HOOK ABOVE +1ECA ; Lu # LATIN CAPITAL LETTER I WITH DOT BELOW +1ECC ; Lu # LATIN CAPITAL LETTER O WITH DOT BELOW +1ECE ; Lu # LATIN CAPITAL LETTER O WITH HOOK ABOVE +1ED0 ; Lu # LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND ACUTE +1ED2 ; Lu # LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND GRAVE +1ED4 ; Lu # LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND HOOK ABOVE +1ED6 ; Lu # LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND TILDE +1ED8 ; Lu # LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND DOT BELOW +1EDA ; Lu # LATIN CAPITAL LETTER O WITH HORN AND ACUTE +1EDC ; Lu # LATIN CAPITAL LETTER O WITH HORN AND GRAVE +1EDE ; Lu # LATIN CAPITAL LETTER O WITH HORN AND HOOK ABOVE +1EE0 ; Lu # LATIN CAPITAL LETTER O WITH HORN AND TILDE +1EE2 ; Lu # LATIN CAPITAL LETTER O WITH HORN AND DOT BELOW +1EE4 ; Lu # LATIN CAPITAL LETTER U WITH DOT BELOW +1EE6 ; Lu # LATIN CAPITAL LETTER U WITH HOOK ABOVE +1EE8 ; Lu # LATIN CAPITAL LETTER U WITH HORN AND ACUTE +1EEA ; Lu # LATIN CAPITAL LETTER U WITH HORN AND GRAVE +1EEC ; Lu # LATIN CAPITAL LETTER U WITH HORN AND HOOK ABOVE +1EEE ; Lu # LATIN CAPITAL LETTER U WITH HORN AND TILDE +1EF0 ; Lu # LATIN CAPITAL LETTER U WITH HORN AND DOT BELOW +1EF2 ; Lu # LATIN CAPITAL LETTER Y WITH GRAVE +1EF4 ; Lu # LATIN CAPITAL LETTER Y WITH DOT BELOW +1EF6 ; Lu # LATIN CAPITAL LETTER Y WITH HOOK ABOVE +1EF8 ; Lu # LATIN CAPITAL LETTER Y WITH TILDE +1EFA ; Lu # LATIN CAPITAL LETTER MIDDLE-WELSH LL +1EFC ; Lu # LATIN CAPITAL LETTER MIDDLE-WELSH V +1EFE ; Lu # LATIN CAPITAL LETTER Y WITH LOOP +1F08..1F0F ; Lu # [8] GREEK CAPITAL LETTER ALPHA WITH PSILI..GREEK CAPITAL LETTER ALPHA WITH DASIA AND PERISPOMENI +1F18..1F1D ; Lu # [6] GREEK CAPITAL LETTER EPSILON WITH PSILI..GREEK CAPITAL LETTER EPSILON WITH DASIA AND OXIA +1F28..1F2F ; Lu # [8] GREEK CAPITAL LETTER ETA WITH PSILI..GREEK CAPITAL LETTER ETA WITH DASIA AND PERISPOMENI +1F38..1F3F ; Lu # [8] GREEK CAPITAL LETTER IOTA WITH PSILI..GREEK CAPITAL LETTER IOTA WITH DASIA AND PERISPOMENI +1F48..1F4D ; Lu # [6] GREEK CAPITAL LETTER OMICRON WITH PSILI..GREEK CAPITAL LETTER OMICRON WITH DASIA AND OXIA +1F59 ; Lu # GREEK CAPITAL LETTER UPSILON WITH DASIA +1F5B ; Lu # GREEK CAPITAL LETTER UPSILON WITH DASIA AND VARIA +1F5D ; Lu # GREEK CAPITAL LETTER UPSILON WITH DASIA AND OXIA +1F5F ; Lu # GREEK CAPITAL LETTER UPSILON WITH DASIA AND PERISPOMENI +1F68..1F6F ; Lu # [8] GREEK CAPITAL LETTER OMEGA WITH PSILI..GREEK CAPITAL LETTER OMEGA WITH DASIA AND PERISPOMENI +1FB8..1FBB ; Lu # [4] GREEK CAPITAL LETTER ALPHA WITH VRACHY..GREEK CAPITAL LETTER ALPHA WITH OXIA +1FC8..1FCB ; Lu # [4] GREEK CAPITAL LETTER EPSILON WITH VARIA..GREEK CAPITAL LETTER ETA WITH OXIA +1FD8..1FDB ; Lu # [4] GREEK CAPITAL LETTER IOTA WITH VRACHY..GREEK CAPITAL LETTER IOTA WITH OXIA +1FE8..1FEC ; Lu # [5] GREEK CAPITAL LETTER UPSILON WITH VRACHY..GREEK CAPITAL LETTER RHO WITH DASIA +1FF8..1FFB ; Lu # [4] GREEK CAPITAL LETTER OMICRON WITH VARIA..GREEK CAPITAL LETTER OMEGA WITH OXIA +2102 ; Lu # DOUBLE-STRUCK CAPITAL C +2107 ; Lu # EULER CONSTANT +210B..210D ; Lu # [3] SCRIPT CAPITAL H..DOUBLE-STRUCK CAPITAL H +2110..2112 ; Lu # [3] SCRIPT CAPITAL I..SCRIPT CAPITAL L +2115 ; Lu # DOUBLE-STRUCK CAPITAL N +2119..211D ; Lu # [5] DOUBLE-STRUCK CAPITAL P..DOUBLE-STRUCK CAPITAL R +2124 ; Lu # DOUBLE-STRUCK CAPITAL Z +2126 ; Lu # OHM SIGN +2128 ; Lu # BLACK-LETTER CAPITAL Z +212A..212D ; Lu # [4] KELVIN SIGN..BLACK-LETTER CAPITAL C +2130..2133 ; Lu # [4] SCRIPT CAPITAL E..SCRIPT CAPITAL M +213E..213F ; Lu # [2] DOUBLE-STRUCK CAPITAL GAMMA..DOUBLE-STRUCK CAPITAL PI +2145 ; Lu # DOUBLE-STRUCK ITALIC CAPITAL D +2183 ; Lu # ROMAN NUMERAL REVERSED ONE HUNDRED +2C00..2C2E ; Lu # [47] GLAGOLITIC CAPITAL LETTER AZU..GLAGOLITIC CAPITAL LETTER LATINATE MYSLITE +2C60 ; Lu # LATIN CAPITAL LETTER L WITH DOUBLE BAR +2C62..2C64 ; Lu # [3] LATIN CAPITAL LETTER L WITH MIDDLE TILDE..LATIN CAPITAL LETTER R WITH TAIL +2C67 ; Lu # LATIN CAPITAL LETTER H WITH DESCENDER +2C69 ; Lu # LATIN CAPITAL LETTER K WITH DESCENDER +2C6B ; Lu # LATIN CAPITAL LETTER Z WITH DESCENDER +2C6D..2C70 ; Lu # [4] LATIN CAPITAL LETTER ALPHA..LATIN CAPITAL LETTER TURNED ALPHA +2C72 ; Lu # LATIN CAPITAL LETTER W WITH HOOK +2C75 ; Lu # LATIN CAPITAL LETTER HALF H +2C7E..2C80 ; Lu # [3] LATIN CAPITAL LETTER S WITH SWASH TAIL..COPTIC CAPITAL LETTER ALFA +2C82 ; Lu # COPTIC CAPITAL LETTER VIDA +2C84 ; Lu # COPTIC CAPITAL LETTER GAMMA +2C86 ; Lu # COPTIC CAPITAL LETTER DALDA +2C88 ; Lu # COPTIC CAPITAL LETTER EIE +2C8A ; Lu # COPTIC CAPITAL LETTER SOU +2C8C ; Lu # COPTIC CAPITAL LETTER ZATA +2C8E ; Lu # COPTIC CAPITAL LETTER HATE +2C90 ; Lu # COPTIC CAPITAL LETTER THETHE +2C92 ; Lu # COPTIC CAPITAL LETTER IAUDA +2C94 ; Lu # COPTIC CAPITAL LETTER KAPA +2C96 ; Lu # COPTIC CAPITAL LETTER LAULA +2C98 ; Lu # COPTIC CAPITAL LETTER MI +2C9A ; Lu # COPTIC CAPITAL LETTER NI +2C9C ; Lu # COPTIC CAPITAL LETTER KSI +2C9E ; Lu # COPTIC CAPITAL LETTER O +2CA0 ; Lu # COPTIC CAPITAL LETTER PI +2CA2 ; Lu # COPTIC CAPITAL LETTER RO +2CA4 ; Lu # COPTIC CAPITAL LETTER SIMA +2CA6 ; Lu # COPTIC CAPITAL LETTER TAU +2CA8 ; Lu # COPTIC CAPITAL LETTER UA +2CAA ; Lu # COPTIC CAPITAL LETTER FI +2CAC ; Lu # COPTIC CAPITAL LETTER KHI +2CAE ; Lu # COPTIC CAPITAL LETTER PSI +2CB0 ; Lu # COPTIC CAPITAL LETTER OOU +2CB2 ; Lu # COPTIC CAPITAL LETTER DIALECT-P ALEF +2CB4 ; Lu # COPTIC CAPITAL LETTER OLD COPTIC AIN +2CB6 ; Lu # COPTIC CAPITAL LETTER CRYPTOGRAMMIC EIE +2CB8 ; Lu # COPTIC CAPITAL LETTER DIALECT-P KAPA +2CBA ; Lu # COPTIC CAPITAL LETTER DIALECT-P NI +2CBC ; Lu # COPTIC CAPITAL LETTER CRYPTOGRAMMIC NI +2CBE ; Lu # COPTIC CAPITAL LETTER OLD COPTIC OOU +2CC0 ; Lu # COPTIC CAPITAL LETTER SAMPI +2CC2 ; Lu # COPTIC CAPITAL LETTER CROSSED SHEI +2CC4 ; Lu # COPTIC CAPITAL LETTER OLD COPTIC SHEI +2CC6 ; Lu # COPTIC CAPITAL LETTER OLD COPTIC ESH +2CC8 ; Lu # COPTIC CAPITAL LETTER AKHMIMIC KHEI +2CCA ; Lu # COPTIC CAPITAL LETTER DIALECT-P HORI +2CCC ; Lu # COPTIC CAPITAL LETTER OLD COPTIC HORI +2CCE ; Lu # COPTIC CAPITAL LETTER OLD COPTIC HA +2CD0 ; Lu # COPTIC CAPITAL LETTER L-SHAPED HA +2CD2 ; Lu # COPTIC CAPITAL LETTER OLD COPTIC HEI +2CD4 ; Lu # COPTIC CAPITAL LETTER OLD COPTIC HAT +2CD6 ; Lu # COPTIC CAPITAL LETTER OLD COPTIC GANGIA +2CD8 ; Lu # COPTIC CAPITAL LETTER OLD COPTIC DJA +2CDA ; Lu # COPTIC CAPITAL LETTER OLD COPTIC SHIMA +2CDC ; Lu # COPTIC CAPITAL LETTER OLD NUBIAN SHIMA +2CDE ; Lu # COPTIC CAPITAL LETTER OLD NUBIAN NGI +2CE0 ; Lu # COPTIC CAPITAL LETTER OLD NUBIAN NYI +2CE2 ; Lu # COPTIC CAPITAL LETTER OLD NUBIAN WAU +2CEB ; Lu # COPTIC CAPITAL LETTER CRYPTOGRAMMIC SHEI +2CED ; Lu # COPTIC CAPITAL LETTER CRYPTOGRAMMIC GANGIA +2CF2 ; Lu # COPTIC CAPITAL LETTER BOHAIRIC KHEI +A640 ; Lu # CYRILLIC CAPITAL LETTER ZEMLYA +A642 ; Lu # CYRILLIC CAPITAL LETTER DZELO +A644 ; Lu # CYRILLIC CAPITAL LETTER REVERSED DZE +A646 ; Lu # CYRILLIC CAPITAL LETTER IOTA +A648 ; Lu # CYRILLIC CAPITAL LETTER DJERV +A64A ; Lu # CYRILLIC CAPITAL LETTER MONOGRAPH UK +A64C ; Lu # CYRILLIC CAPITAL LETTER BROAD OMEGA +A64E ; Lu # CYRILLIC CAPITAL LETTER NEUTRAL YER +A650 ; Lu # CYRILLIC CAPITAL LETTER YERU WITH BACK YER +A652 ; Lu # CYRILLIC CAPITAL LETTER IOTIFIED YAT +A654 ; Lu # CYRILLIC CAPITAL LETTER REVERSED YU +A656 ; Lu # CYRILLIC CAPITAL LETTER IOTIFIED A +A658 ; Lu # CYRILLIC CAPITAL LETTER CLOSED LITTLE YUS +A65A ; Lu # CYRILLIC CAPITAL LETTER BLENDED YUS +A65C ; Lu # CYRILLIC CAPITAL LETTER IOTIFIED CLOSED LITTLE YUS +A65E ; Lu # CYRILLIC CAPITAL LETTER YN +A660 ; Lu # CYRILLIC CAPITAL LETTER REVERSED TSE +A662 ; Lu # CYRILLIC CAPITAL LETTER SOFT DE +A664 ; Lu # CYRILLIC CAPITAL LETTER SOFT EL +A666 ; Lu # CYRILLIC CAPITAL LETTER SOFT EM +A668 ; Lu # CYRILLIC CAPITAL LETTER MONOCULAR O +A66A ; Lu # CYRILLIC CAPITAL LETTER BINOCULAR O +A66C ; Lu # CYRILLIC CAPITAL LETTER DOUBLE MONOCULAR O +A680 ; Lu # CYRILLIC CAPITAL LETTER DWE +A682 ; Lu # CYRILLIC CAPITAL LETTER DZWE +A684 ; Lu # CYRILLIC CAPITAL LETTER ZHWE +A686 ; Lu # CYRILLIC CAPITAL LETTER CCHE +A688 ; Lu # CYRILLIC CAPITAL LETTER DZZE +A68A ; Lu # CYRILLIC CAPITAL LETTER TE WITH MIDDLE HOOK +A68C ; Lu # CYRILLIC CAPITAL LETTER TWE +A68E ; Lu # CYRILLIC CAPITAL LETTER TSWE +A690 ; Lu # CYRILLIC CAPITAL LETTER TSSE +A692 ; Lu # CYRILLIC CAPITAL LETTER TCHE +A694 ; Lu # CYRILLIC CAPITAL LETTER HWE +A696 ; Lu # CYRILLIC CAPITAL LETTER SHWE +A698 ; Lu # CYRILLIC CAPITAL LETTER DOUBLE O +A69A ; Lu # CYRILLIC CAPITAL LETTER CROSSED O +A722 ; Lu # LATIN CAPITAL LETTER EGYPTOLOGICAL ALEF +A724 ; Lu # LATIN CAPITAL LETTER EGYPTOLOGICAL AIN +A726 ; Lu # LATIN CAPITAL LETTER HENG +A728 ; Lu # LATIN CAPITAL LETTER TZ +A72A ; Lu # LATIN CAPITAL LETTER TRESILLO +A72C ; Lu # LATIN CAPITAL LETTER CUATRILLO +A72E ; Lu # LATIN CAPITAL LETTER CUATRILLO WITH COMMA +A732 ; Lu # LATIN CAPITAL LETTER AA +A734 ; Lu # LATIN CAPITAL LETTER AO +A736 ; Lu # LATIN CAPITAL LETTER AU +A738 ; Lu # LATIN CAPITAL LETTER AV +A73A ; Lu # LATIN CAPITAL LETTER AV WITH HORIZONTAL BAR +A73C ; Lu # LATIN CAPITAL LETTER AY +A73E ; Lu # LATIN CAPITAL LETTER REVERSED C WITH DOT +A740 ; Lu # LATIN CAPITAL LETTER K WITH STROKE +A742 ; Lu # LATIN CAPITAL LETTER K WITH DIAGONAL STROKE +A744 ; Lu # LATIN CAPITAL LETTER K WITH STROKE AND DIAGONAL STROKE +A746 ; Lu # LATIN CAPITAL LETTER BROKEN L +A748 ; Lu # LATIN CAPITAL LETTER L WITH HIGH STROKE +A74A ; Lu # LATIN CAPITAL LETTER O WITH LONG STROKE OVERLAY +A74C ; Lu # LATIN CAPITAL LETTER O WITH LOOP +A74E ; Lu # LATIN CAPITAL LETTER OO +A750 ; Lu # LATIN CAPITAL LETTER P WITH STROKE THROUGH DESCENDER +A752 ; Lu # LATIN CAPITAL LETTER P WITH FLOURISH +A754 ; Lu # LATIN CAPITAL LETTER P WITH SQUIRREL TAIL +A756 ; Lu # LATIN CAPITAL LETTER Q WITH STROKE THROUGH DESCENDER +A758 ; Lu # LATIN CAPITAL LETTER Q WITH DIAGONAL STROKE +A75A ; Lu # LATIN CAPITAL LETTER R ROTUNDA +A75C ; Lu # LATIN CAPITAL LETTER RUM ROTUNDA +A75E ; Lu # LATIN CAPITAL LETTER V WITH DIAGONAL STROKE +A760 ; Lu # LATIN CAPITAL LETTER VY +A762 ; Lu # LATIN CAPITAL LETTER VISIGOTHIC Z +A764 ; Lu # LATIN CAPITAL LETTER THORN WITH STROKE +A766 ; Lu # LATIN CAPITAL LETTER THORN WITH STROKE THROUGH DESCENDER +A768 ; Lu # LATIN CAPITAL LETTER VEND +A76A ; Lu # LATIN CAPITAL LETTER ET +A76C ; Lu # LATIN CAPITAL LETTER IS +A76E ; Lu # LATIN CAPITAL LETTER CON +A779 ; Lu # LATIN CAPITAL LETTER INSULAR D +A77B ; Lu # LATIN CAPITAL LETTER INSULAR F +A77D..A77E ; Lu # [2] LATIN CAPITAL LETTER INSULAR G..LATIN CAPITAL LETTER TURNED INSULAR G +A780 ; Lu # LATIN CAPITAL LETTER TURNED L +A782 ; Lu # LATIN CAPITAL LETTER INSULAR R +A784 ; Lu # LATIN CAPITAL LETTER INSULAR S +A786 ; Lu # LATIN CAPITAL LETTER INSULAR T +A78B ; Lu # LATIN CAPITAL LETTER SALTILLO +A78D ; Lu # LATIN CAPITAL LETTER TURNED H +A790 ; Lu # LATIN CAPITAL LETTER N WITH DESCENDER +A792 ; Lu # LATIN CAPITAL LETTER C WITH BAR +A796 ; Lu # LATIN CAPITAL LETTER B WITH FLOURISH +A798 ; Lu # LATIN CAPITAL LETTER F WITH STROKE +A79A ; Lu # LATIN CAPITAL LETTER VOLAPUK AE +A79C ; Lu # LATIN CAPITAL LETTER VOLAPUK OE +A79E ; Lu # LATIN CAPITAL LETTER VOLAPUK UE +A7A0 ; Lu # LATIN CAPITAL LETTER G WITH OBLIQUE STROKE +A7A2 ; Lu # LATIN CAPITAL LETTER K WITH OBLIQUE STROKE +A7A4 ; Lu # LATIN CAPITAL LETTER N WITH OBLIQUE STROKE +A7A6 ; Lu # LATIN CAPITAL LETTER R WITH OBLIQUE STROKE +A7A8 ; Lu # LATIN CAPITAL LETTER S WITH OBLIQUE STROKE +A7AA..A7AE ; Lu # [5] LATIN CAPITAL LETTER H WITH HOOK..LATIN CAPITAL LETTER SMALL CAPITAL I +A7B0..A7B4 ; Lu # [5] LATIN CAPITAL LETTER TURNED K..LATIN CAPITAL LETTER BETA +A7B6 ; Lu # LATIN CAPITAL LETTER OMEGA +A7B8 ; Lu # LATIN CAPITAL LETTER U WITH STROKE +A7BA ; Lu # LATIN CAPITAL LETTER GLOTTAL A +A7BC ; Lu # LATIN CAPITAL LETTER GLOTTAL I +A7BE ; Lu # LATIN CAPITAL LETTER GLOTTAL U +A7C2 ; Lu # LATIN CAPITAL LETTER ANGLICANA W +A7C4..A7C7 ; Lu # [4] LATIN CAPITAL LETTER C WITH PALATAL HOOK..LATIN CAPITAL LETTER D WITH SHORT STROKE OVERLAY +A7C9 ; Lu # LATIN CAPITAL LETTER S WITH SHORT STROKE OVERLAY +A7F5 ; Lu # LATIN CAPITAL LETTER REVERSED HALF H +FF21..FF3A ; Lu # [26] FULLWIDTH LATIN CAPITAL LETTER A..FULLWIDTH LATIN CAPITAL LETTER Z +10400..10427 ; Lu # [40] DESERET CAPITAL LETTER LONG I..DESERET CAPITAL LETTER EW +104B0..104D3 ; Lu # [36] OSAGE CAPITAL LETTER A..OSAGE CAPITAL LETTER ZHA +10C80..10CB2 ; Lu # [51] OLD HUNGARIAN CAPITAL LETTER A..OLD HUNGARIAN CAPITAL LETTER US +118A0..118BF ; Lu # [32] WARANG CITI CAPITAL LETTER NGAA..WARANG CITI CAPITAL LETTER VIYO +16E40..16E5F ; Lu # [32] MEDEFAIDRIN CAPITAL LETTER M..MEDEFAIDRIN CAPITAL LETTER Y +1D400..1D419 ; Lu # [26] MATHEMATICAL BOLD CAPITAL A..MATHEMATICAL BOLD CAPITAL Z +1D434..1D44D ; Lu # [26] MATHEMATICAL ITALIC CAPITAL A..MATHEMATICAL ITALIC CAPITAL Z +1D468..1D481 ; Lu # [26] MATHEMATICAL BOLD ITALIC CAPITAL A..MATHEMATICAL BOLD ITALIC CAPITAL Z +1D49C ; Lu # MATHEMATICAL SCRIPT CAPITAL A +1D49E..1D49F ; Lu # [2] MATHEMATICAL SCRIPT CAPITAL C..MATHEMATICAL SCRIPT CAPITAL D +1D4A2 ; Lu # MATHEMATICAL SCRIPT CAPITAL G +1D4A5..1D4A6 ; Lu # [2] MATHEMATICAL SCRIPT CAPITAL J..MATHEMATICAL SCRIPT CAPITAL K +1D4A9..1D4AC ; Lu # [4] MATHEMATICAL SCRIPT CAPITAL N..MATHEMATICAL SCRIPT CAPITAL Q +1D4AE..1D4B5 ; Lu # [8] MATHEMATICAL SCRIPT CAPITAL S..MATHEMATICAL SCRIPT CAPITAL Z +1D4D0..1D4E9 ; Lu # [26] MATHEMATICAL BOLD SCRIPT CAPITAL A..MATHEMATICAL BOLD SCRIPT CAPITAL Z +1D504..1D505 ; Lu # [2] MATHEMATICAL FRAKTUR CAPITAL A..MATHEMATICAL FRAKTUR CAPITAL B +1D507..1D50A ; Lu # [4] MATHEMATICAL FRAKTUR CAPITAL D..MATHEMATICAL FRAKTUR CAPITAL G +1D50D..1D514 ; Lu # [8] MATHEMATICAL FRAKTUR CAPITAL J..MATHEMATICAL FRAKTUR CAPITAL Q +1D516..1D51C ; Lu # [7] MATHEMATICAL FRAKTUR CAPITAL S..MATHEMATICAL FRAKTUR CAPITAL Y +1D538..1D539 ; Lu # [2] MATHEMATICAL DOUBLE-STRUCK CAPITAL A..MATHEMATICAL DOUBLE-STRUCK CAPITAL B +1D53B..1D53E ; Lu # [4] MATHEMATICAL DOUBLE-STRUCK CAPITAL D..MATHEMATICAL DOUBLE-STRUCK CAPITAL G +1D540..1D544 ; Lu # [5] MATHEMATICAL DOUBLE-STRUCK CAPITAL I..MATHEMATICAL DOUBLE-STRUCK CAPITAL M +1D546 ; Lu # MATHEMATICAL DOUBLE-STRUCK CAPITAL O +1D54A..1D550 ; Lu # [7] MATHEMATICAL DOUBLE-STRUCK CAPITAL S..MATHEMATICAL DOUBLE-STRUCK CAPITAL Y +1D56C..1D585 ; Lu # [26] MATHEMATICAL BOLD FRAKTUR CAPITAL A..MATHEMATICAL BOLD FRAKTUR CAPITAL Z +1D5A0..1D5B9 ; Lu # [26] MATHEMATICAL SANS-SERIF CAPITAL A..MATHEMATICAL SANS-SERIF CAPITAL Z +1D5D4..1D5ED ; Lu # [26] MATHEMATICAL SANS-SERIF BOLD CAPITAL A..MATHEMATICAL SANS-SERIF BOLD CAPITAL Z +1D608..1D621 ; Lu # [26] MATHEMATICAL SANS-SERIF ITALIC CAPITAL A..MATHEMATICAL SANS-SERIF ITALIC CAPITAL Z +1D63C..1D655 ; Lu # [26] MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL A..MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL Z +1D670..1D689 ; Lu # [26] MATHEMATICAL MONOSPACE CAPITAL A..MATHEMATICAL MONOSPACE CAPITAL Z +1D6A8..1D6C0 ; Lu # [25] MATHEMATICAL BOLD CAPITAL ALPHA..MATHEMATICAL BOLD CAPITAL OMEGA +1D6E2..1D6FA ; Lu # [25] MATHEMATICAL ITALIC CAPITAL ALPHA..MATHEMATICAL ITALIC CAPITAL OMEGA +1D71C..1D734 ; Lu # [25] MATHEMATICAL BOLD ITALIC CAPITAL ALPHA..MATHEMATICAL BOLD ITALIC CAPITAL OMEGA +1D756..1D76E ; Lu # [25] MATHEMATICAL SANS-SERIF BOLD CAPITAL ALPHA..MATHEMATICAL SANS-SERIF BOLD CAPITAL OMEGA +1D790..1D7A8 ; Lu # [25] MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL ALPHA..MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL OMEGA +1D7CA ; Lu # MATHEMATICAL BOLD CAPITAL DIGAMMA +1E900..1E921 ; Lu # [34] ADLAM CAPITAL LETTER ALIF..ADLAM CAPITAL LETTER SHA + +# Total code points: 1791 + +# ================================================ + +# General_Category=Lowercase_Letter + +0061..007A ; Ll # [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z +00B5 ; Ll # MICRO SIGN +00DF..00F6 ; Ll # [24] LATIN SMALL LETTER SHARP S..LATIN SMALL LETTER O WITH DIAERESIS +00F8..00FF ; Ll # [8] LATIN SMALL LETTER O WITH STROKE..LATIN SMALL LETTER Y WITH DIAERESIS +0101 ; Ll # LATIN SMALL LETTER A WITH MACRON +0103 ; Ll # LATIN SMALL LETTER A WITH BREVE +0105 ; Ll # LATIN SMALL LETTER A WITH OGONEK +0107 ; Ll # LATIN SMALL LETTER C WITH ACUTE +0109 ; Ll # LATIN SMALL LETTER C WITH CIRCUMFLEX +010B ; Ll # LATIN SMALL LETTER C WITH DOT ABOVE +010D ; Ll # LATIN SMALL LETTER C WITH CARON +010F ; Ll # LATIN SMALL LETTER D WITH CARON +0111 ; Ll # LATIN SMALL LETTER D WITH STROKE +0113 ; Ll # LATIN SMALL LETTER E WITH MACRON +0115 ; Ll # LATIN SMALL LETTER E WITH BREVE +0117 ; Ll # LATIN SMALL LETTER E WITH DOT ABOVE +0119 ; Ll # LATIN SMALL LETTER E WITH OGONEK +011B ; Ll # LATIN SMALL LETTER E WITH CARON +011D ; Ll # LATIN SMALL LETTER G WITH CIRCUMFLEX +011F ; Ll # LATIN SMALL LETTER G WITH BREVE +0121 ; Ll # LATIN SMALL LETTER G WITH DOT ABOVE +0123 ; Ll # LATIN SMALL LETTER G WITH CEDILLA +0125 ; Ll # LATIN SMALL LETTER H WITH CIRCUMFLEX +0127 ; Ll # LATIN SMALL LETTER H WITH STROKE +0129 ; Ll # LATIN SMALL LETTER I WITH TILDE +012B ; Ll # LATIN SMALL LETTER I WITH MACRON +012D ; Ll # LATIN SMALL LETTER I WITH BREVE +012F ; Ll # LATIN SMALL LETTER I WITH OGONEK +0131 ; Ll # LATIN SMALL LETTER DOTLESS I +0133 ; Ll # LATIN SMALL LIGATURE IJ +0135 ; Ll # LATIN SMALL LETTER J WITH CIRCUMFLEX +0137..0138 ; Ll # [2] LATIN SMALL LETTER K WITH CEDILLA..LATIN SMALL LETTER KRA +013A ; Ll # LATIN SMALL LETTER L WITH ACUTE +013C ; Ll # LATIN SMALL LETTER L WITH CEDILLA +013E ; Ll # LATIN SMALL LETTER L WITH CARON +0140 ; Ll # LATIN SMALL LETTER L WITH MIDDLE DOT +0142 ; Ll # LATIN SMALL LETTER L WITH STROKE +0144 ; Ll # LATIN SMALL LETTER N WITH ACUTE +0146 ; Ll # LATIN SMALL LETTER N WITH CEDILLA +0148..0149 ; Ll # [2] LATIN SMALL LETTER N WITH CARON..LATIN SMALL LETTER N PRECEDED BY APOSTROPHE +014B ; Ll # LATIN SMALL LETTER ENG +014D ; Ll # LATIN SMALL LETTER O WITH MACRON +014F ; Ll # LATIN SMALL LETTER O WITH BREVE +0151 ; Ll # LATIN SMALL LETTER O WITH DOUBLE ACUTE +0153 ; Ll # LATIN SMALL LIGATURE OE +0155 ; Ll # LATIN SMALL LETTER R WITH ACUTE +0157 ; Ll # LATIN SMALL LETTER R WITH CEDILLA +0159 ; Ll # LATIN SMALL LETTER R WITH CARON +015B ; Ll # LATIN SMALL LETTER S WITH ACUTE +015D ; Ll # LATIN SMALL LETTER S WITH CIRCUMFLEX +015F ; Ll # LATIN SMALL LETTER S WITH CEDILLA +0161 ; Ll # LATIN SMALL LETTER S WITH CARON +0163 ; Ll # LATIN SMALL LETTER T WITH CEDILLA +0165 ; Ll # LATIN SMALL LETTER T WITH CARON +0167 ; Ll # LATIN SMALL LETTER T WITH STROKE +0169 ; Ll # LATIN SMALL LETTER U WITH TILDE +016B ; Ll # LATIN SMALL LETTER U WITH MACRON +016D ; Ll # LATIN SMALL LETTER U WITH BREVE +016F ; Ll # LATIN SMALL LETTER U WITH RING ABOVE +0171 ; Ll # LATIN SMALL LETTER U WITH DOUBLE ACUTE +0173 ; Ll # LATIN SMALL LETTER U WITH OGONEK +0175 ; Ll # LATIN SMALL LETTER W WITH CIRCUMFLEX +0177 ; Ll # LATIN SMALL LETTER Y WITH CIRCUMFLEX +017A ; Ll # LATIN SMALL LETTER Z WITH ACUTE +017C ; Ll # LATIN SMALL LETTER Z WITH DOT ABOVE +017E..0180 ; Ll # [3] LATIN SMALL LETTER Z WITH CARON..LATIN SMALL LETTER B WITH STROKE +0183 ; Ll # LATIN SMALL LETTER B WITH TOPBAR +0185 ; Ll # LATIN SMALL LETTER TONE SIX +0188 ; Ll # LATIN SMALL LETTER C WITH HOOK +018C..018D ; Ll # [2] LATIN SMALL LETTER D WITH TOPBAR..LATIN SMALL LETTER TURNED DELTA +0192 ; Ll # LATIN SMALL LETTER F WITH HOOK +0195 ; Ll # LATIN SMALL LETTER HV +0199..019B ; Ll # [3] LATIN SMALL LETTER K WITH HOOK..LATIN SMALL LETTER LAMBDA WITH STROKE +019E ; Ll # LATIN SMALL LETTER N WITH LONG RIGHT LEG +01A1 ; Ll # LATIN SMALL LETTER O WITH HORN +01A3 ; Ll # LATIN SMALL LETTER OI +01A5 ; Ll # LATIN SMALL LETTER P WITH HOOK +01A8 ; Ll # LATIN SMALL LETTER TONE TWO +01AA..01AB ; Ll # [2] LATIN LETTER REVERSED ESH LOOP..LATIN SMALL LETTER T WITH PALATAL HOOK +01AD ; Ll # LATIN SMALL LETTER T WITH HOOK +01B0 ; Ll # LATIN SMALL LETTER U WITH HORN +01B4 ; Ll # LATIN SMALL LETTER Y WITH HOOK +01B6 ; Ll # LATIN SMALL LETTER Z WITH STROKE +01B9..01BA ; Ll # [2] LATIN SMALL LETTER EZH REVERSED..LATIN SMALL LETTER EZH WITH TAIL +01BD..01BF ; Ll # [3] LATIN SMALL LETTER TONE FIVE..LATIN LETTER WYNN +01C6 ; Ll # LATIN SMALL LETTER DZ WITH CARON +01C9 ; Ll # LATIN SMALL LETTER LJ +01CC ; Ll # LATIN SMALL LETTER NJ +01CE ; Ll # LATIN SMALL LETTER A WITH CARON +01D0 ; Ll # LATIN SMALL LETTER I WITH CARON +01D2 ; Ll # LATIN SMALL LETTER O WITH CARON +01D4 ; Ll # LATIN SMALL LETTER U WITH CARON +01D6 ; Ll # LATIN SMALL LETTER U WITH DIAERESIS AND MACRON +01D8 ; Ll # LATIN SMALL LETTER U WITH DIAERESIS AND ACUTE +01DA ; Ll # LATIN SMALL LETTER U WITH DIAERESIS AND CARON +01DC..01DD ; Ll # [2] LATIN SMALL LETTER U WITH DIAERESIS AND GRAVE..LATIN SMALL LETTER TURNED E +01DF ; Ll # LATIN SMALL LETTER A WITH DIAERESIS AND MACRON +01E1 ; Ll # LATIN SMALL LETTER A WITH DOT ABOVE AND MACRON +01E3 ; Ll # LATIN SMALL LETTER AE WITH MACRON +01E5 ; Ll # LATIN SMALL LETTER G WITH STROKE +01E7 ; Ll # LATIN SMALL LETTER G WITH CARON +01E9 ; Ll # LATIN SMALL LETTER K WITH CARON +01EB ; Ll # LATIN SMALL LETTER O WITH OGONEK +01ED ; Ll # LATIN SMALL LETTER O WITH OGONEK AND MACRON +01EF..01F0 ; Ll # [2] LATIN SMALL LETTER EZH WITH CARON..LATIN SMALL LETTER J WITH CARON +01F3 ; Ll # LATIN SMALL LETTER DZ +01F5 ; Ll # LATIN SMALL LETTER G WITH ACUTE +01F9 ; Ll # LATIN SMALL LETTER N WITH GRAVE +01FB ; Ll # LATIN SMALL LETTER A WITH RING ABOVE AND ACUTE +01FD ; Ll # LATIN SMALL LETTER AE WITH ACUTE +01FF ; Ll # LATIN SMALL LETTER O WITH STROKE AND ACUTE +0201 ; Ll # LATIN SMALL LETTER A WITH DOUBLE GRAVE +0203 ; Ll # LATIN SMALL LETTER A WITH INVERTED BREVE +0205 ; Ll # LATIN SMALL LETTER E WITH DOUBLE GRAVE +0207 ; Ll # LATIN SMALL LETTER E WITH INVERTED BREVE +0209 ; Ll # LATIN SMALL LETTER I WITH DOUBLE GRAVE +020B ; Ll # LATIN SMALL LETTER I WITH INVERTED BREVE +020D ; Ll # LATIN SMALL LETTER O WITH DOUBLE GRAVE +020F ; Ll # LATIN SMALL LETTER O WITH INVERTED BREVE +0211 ; Ll # LATIN SMALL LETTER R WITH DOUBLE GRAVE +0213 ; Ll # LATIN SMALL LETTER R WITH INVERTED BREVE +0215 ; Ll # LATIN SMALL LETTER U WITH DOUBLE GRAVE +0217 ; Ll # LATIN SMALL LETTER U WITH INVERTED BREVE +0219 ; Ll # LATIN SMALL LETTER S WITH COMMA BELOW +021B ; Ll # LATIN SMALL LETTER T WITH COMMA BELOW +021D ; Ll # LATIN SMALL LETTER YOGH +021F ; Ll # LATIN SMALL LETTER H WITH CARON +0221 ; Ll # LATIN SMALL LETTER D WITH CURL +0223 ; Ll # LATIN SMALL LETTER OU +0225 ; Ll # LATIN SMALL LETTER Z WITH HOOK +0227 ; Ll # LATIN SMALL LETTER A WITH DOT ABOVE +0229 ; Ll # LATIN SMALL LETTER E WITH CEDILLA +022B ; Ll # LATIN SMALL LETTER O WITH DIAERESIS AND MACRON +022D ; Ll # LATIN SMALL LETTER O WITH TILDE AND MACRON +022F ; Ll # LATIN SMALL LETTER O WITH DOT ABOVE +0231 ; Ll # LATIN SMALL LETTER O WITH DOT ABOVE AND MACRON +0233..0239 ; Ll # [7] LATIN SMALL LETTER Y WITH MACRON..LATIN SMALL LETTER QP DIGRAPH +023C ; Ll # LATIN SMALL LETTER C WITH STROKE +023F..0240 ; Ll # [2] LATIN SMALL LETTER S WITH SWASH TAIL..LATIN SMALL LETTER Z WITH SWASH TAIL +0242 ; Ll # LATIN SMALL LETTER GLOTTAL STOP +0247 ; Ll # LATIN SMALL LETTER E WITH STROKE +0249 ; Ll # LATIN SMALL LETTER J WITH STROKE +024B ; Ll # LATIN SMALL LETTER Q WITH HOOK TAIL +024D ; Ll # LATIN SMALL LETTER R WITH STROKE +024F..0293 ; Ll # [69] LATIN SMALL LETTER Y WITH STROKE..LATIN SMALL LETTER EZH WITH CURL +0295..02AF ; Ll # [27] LATIN LETTER PHARYNGEAL VOICED FRICATIVE..LATIN SMALL LETTER TURNED H WITH FISHHOOK AND TAIL +0371 ; Ll # GREEK SMALL LETTER HETA +0373 ; Ll # GREEK SMALL LETTER ARCHAIC SAMPI +0377 ; Ll # GREEK SMALL LETTER PAMPHYLIAN DIGAMMA +037B..037D ; Ll # [3] GREEK SMALL REVERSED LUNATE SIGMA SYMBOL..GREEK SMALL REVERSED DOTTED LUNATE SIGMA SYMBOL +0390 ; Ll # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS +03AC..03CE ; Ll # [35] GREEK SMALL LETTER ALPHA WITH TONOS..GREEK SMALL LETTER OMEGA WITH TONOS +03D0..03D1 ; Ll # [2] GREEK BETA SYMBOL..GREEK THETA SYMBOL +03D5..03D7 ; Ll # [3] GREEK PHI SYMBOL..GREEK KAI SYMBOL +03D9 ; Ll # GREEK SMALL LETTER ARCHAIC KOPPA +03DB ; Ll # GREEK SMALL LETTER STIGMA +03DD ; Ll # GREEK SMALL LETTER DIGAMMA +03DF ; Ll # GREEK SMALL LETTER KOPPA +03E1 ; Ll # GREEK SMALL LETTER SAMPI +03E3 ; Ll # COPTIC SMALL LETTER SHEI +03E5 ; Ll # COPTIC SMALL LETTER FEI +03E7 ; Ll # COPTIC SMALL LETTER KHEI +03E9 ; Ll # COPTIC SMALL LETTER HORI +03EB ; Ll # COPTIC SMALL LETTER GANGIA +03ED ; Ll # COPTIC SMALL LETTER SHIMA +03EF..03F3 ; Ll # [5] COPTIC SMALL LETTER DEI..GREEK LETTER YOT +03F5 ; Ll # GREEK LUNATE EPSILON SYMBOL +03F8 ; Ll # GREEK SMALL LETTER SHO +03FB..03FC ; Ll # [2] GREEK SMALL LETTER SAN..GREEK RHO WITH STROKE SYMBOL +0430..045F ; Ll # [48] CYRILLIC SMALL LETTER A..CYRILLIC SMALL LETTER DZHE +0461 ; Ll # CYRILLIC SMALL LETTER OMEGA +0463 ; Ll # CYRILLIC SMALL LETTER YAT +0465 ; Ll # CYRILLIC SMALL LETTER IOTIFIED E +0467 ; Ll # CYRILLIC SMALL LETTER LITTLE YUS +0469 ; Ll # CYRILLIC SMALL LETTER IOTIFIED LITTLE YUS +046B ; Ll # CYRILLIC SMALL LETTER BIG YUS +046D ; Ll # CYRILLIC SMALL LETTER IOTIFIED BIG YUS +046F ; Ll # CYRILLIC SMALL LETTER KSI +0471 ; Ll # CYRILLIC SMALL LETTER PSI +0473 ; Ll # CYRILLIC SMALL LETTER FITA +0475 ; Ll # CYRILLIC SMALL LETTER IZHITSA +0477 ; Ll # CYRILLIC SMALL LETTER IZHITSA WITH DOUBLE GRAVE ACCENT +0479 ; Ll # CYRILLIC SMALL LETTER UK +047B ; Ll # CYRILLIC SMALL LETTER ROUND OMEGA +047D ; Ll # CYRILLIC SMALL LETTER OMEGA WITH TITLO +047F ; Ll # CYRILLIC SMALL LETTER OT +0481 ; Ll # CYRILLIC SMALL LETTER KOPPA +048B ; Ll # CYRILLIC SMALL LETTER SHORT I WITH TAIL +048D ; Ll # CYRILLIC SMALL LETTER SEMISOFT SIGN +048F ; Ll # CYRILLIC SMALL LETTER ER WITH TICK +0491 ; Ll # CYRILLIC SMALL LETTER GHE WITH UPTURN +0493 ; Ll # CYRILLIC SMALL LETTER GHE WITH STROKE +0495 ; Ll # CYRILLIC SMALL LETTER GHE WITH MIDDLE HOOK +0497 ; Ll # CYRILLIC SMALL LETTER ZHE WITH DESCENDER +0499 ; Ll # CYRILLIC SMALL LETTER ZE WITH DESCENDER +049B ; Ll # CYRILLIC SMALL LETTER KA WITH DESCENDER +049D ; Ll # CYRILLIC SMALL LETTER KA WITH VERTICAL STROKE +049F ; Ll # CYRILLIC SMALL LETTER KA WITH STROKE +04A1 ; Ll # CYRILLIC SMALL LETTER BASHKIR KA +04A3 ; Ll # CYRILLIC SMALL LETTER EN WITH DESCENDER +04A5 ; Ll # CYRILLIC SMALL LIGATURE EN GHE +04A7 ; Ll # CYRILLIC SMALL LETTER PE WITH MIDDLE HOOK +04A9 ; Ll # CYRILLIC SMALL LETTER ABKHASIAN HA +04AB ; Ll # CYRILLIC SMALL LETTER ES WITH DESCENDER +04AD ; Ll # CYRILLIC SMALL LETTER TE WITH DESCENDER +04AF ; Ll # CYRILLIC SMALL LETTER STRAIGHT U +04B1 ; Ll # CYRILLIC SMALL LETTER STRAIGHT U WITH STROKE +04B3 ; Ll # CYRILLIC SMALL LETTER HA WITH DESCENDER +04B5 ; Ll # CYRILLIC SMALL LIGATURE TE TSE +04B7 ; Ll # CYRILLIC SMALL LETTER CHE WITH DESCENDER +04B9 ; Ll # CYRILLIC SMALL LETTER CHE WITH VERTICAL STROKE +04BB ; Ll # CYRILLIC SMALL LETTER SHHA +04BD ; Ll # CYRILLIC SMALL LETTER ABKHASIAN CHE +04BF ; Ll # CYRILLIC SMALL LETTER ABKHASIAN CHE WITH DESCENDER +04C2 ; Ll # CYRILLIC SMALL LETTER ZHE WITH BREVE +04C4 ; Ll # CYRILLIC SMALL LETTER KA WITH HOOK +04C6 ; Ll # CYRILLIC SMALL LETTER EL WITH TAIL +04C8 ; Ll # CYRILLIC SMALL LETTER EN WITH HOOK +04CA ; Ll # CYRILLIC SMALL LETTER EN WITH TAIL +04CC ; Ll # CYRILLIC SMALL LETTER KHAKASSIAN CHE +04CE..04CF ; Ll # [2] CYRILLIC SMALL LETTER EM WITH TAIL..CYRILLIC SMALL LETTER PALOCHKA +04D1 ; Ll # CYRILLIC SMALL LETTER A WITH BREVE +04D3 ; Ll # CYRILLIC SMALL LETTER A WITH DIAERESIS +04D5 ; Ll # CYRILLIC SMALL LIGATURE A IE +04D7 ; Ll # CYRILLIC SMALL LETTER IE WITH BREVE +04D9 ; Ll # CYRILLIC SMALL LETTER SCHWA +04DB ; Ll # CYRILLIC SMALL LETTER SCHWA WITH DIAERESIS +04DD ; Ll # CYRILLIC SMALL LETTER ZHE WITH DIAERESIS +04DF ; Ll # CYRILLIC SMALL LETTER ZE WITH DIAERESIS +04E1 ; Ll # CYRILLIC SMALL LETTER ABKHASIAN DZE +04E3 ; Ll # CYRILLIC SMALL LETTER I WITH MACRON +04E5 ; Ll # CYRILLIC SMALL LETTER I WITH DIAERESIS +04E7 ; Ll # CYRILLIC SMALL LETTER O WITH DIAERESIS +04E9 ; Ll # CYRILLIC SMALL LETTER BARRED O +04EB ; Ll # CYRILLIC SMALL LETTER BARRED O WITH DIAERESIS +04ED ; Ll # CYRILLIC SMALL LETTER E WITH DIAERESIS +04EF ; Ll # CYRILLIC SMALL LETTER U WITH MACRON +04F1 ; Ll # CYRILLIC SMALL LETTER U WITH DIAERESIS +04F3 ; Ll # CYRILLIC SMALL LETTER U WITH DOUBLE ACUTE +04F5 ; Ll # CYRILLIC SMALL LETTER CHE WITH DIAERESIS +04F7 ; Ll # CYRILLIC SMALL LETTER GHE WITH DESCENDER +04F9 ; Ll # CYRILLIC SMALL LETTER YERU WITH DIAERESIS +04FB ; Ll # CYRILLIC SMALL LETTER GHE WITH STROKE AND HOOK +04FD ; Ll # CYRILLIC SMALL LETTER HA WITH HOOK +04FF ; Ll # CYRILLIC SMALL LETTER HA WITH STROKE +0501 ; Ll # CYRILLIC SMALL LETTER KOMI DE +0503 ; Ll # CYRILLIC SMALL LETTER KOMI DJE +0505 ; Ll # CYRILLIC SMALL LETTER KOMI ZJE +0507 ; Ll # CYRILLIC SMALL LETTER KOMI DZJE +0509 ; Ll # CYRILLIC SMALL LETTER KOMI LJE +050B ; Ll # CYRILLIC SMALL LETTER KOMI NJE +050D ; Ll # CYRILLIC SMALL LETTER KOMI SJE +050F ; Ll # CYRILLIC SMALL LETTER KOMI TJE +0511 ; Ll # CYRILLIC SMALL LETTER REVERSED ZE +0513 ; Ll # CYRILLIC SMALL LETTER EL WITH HOOK +0515 ; Ll # CYRILLIC SMALL LETTER LHA +0517 ; Ll # CYRILLIC SMALL LETTER RHA +0519 ; Ll # CYRILLIC SMALL LETTER YAE +051B ; Ll # CYRILLIC SMALL LETTER QA +051D ; Ll # CYRILLIC SMALL LETTER WE +051F ; Ll # CYRILLIC SMALL LETTER ALEUT KA +0521 ; Ll # CYRILLIC SMALL LETTER EL WITH MIDDLE HOOK +0523 ; Ll # CYRILLIC SMALL LETTER EN WITH MIDDLE HOOK +0525 ; Ll # CYRILLIC SMALL LETTER PE WITH DESCENDER +0527 ; Ll # CYRILLIC SMALL LETTER SHHA WITH DESCENDER +0529 ; Ll # CYRILLIC SMALL LETTER EN WITH LEFT HOOK +052B ; Ll # CYRILLIC SMALL LETTER DZZHE +052D ; Ll # CYRILLIC SMALL LETTER DCHE +052F ; Ll # CYRILLIC SMALL LETTER EL WITH DESCENDER +0560..0588 ; Ll # [41] ARMENIAN SMALL LETTER TURNED AYB..ARMENIAN SMALL LETTER YI WITH STROKE +10D0..10FA ; Ll # [43] GEORGIAN LETTER AN..GEORGIAN LETTER AIN +10FD..10FF ; Ll # [3] GEORGIAN LETTER AEN..GEORGIAN LETTER LABIAL SIGN +13F8..13FD ; Ll # [6] CHEROKEE SMALL LETTER YE..CHEROKEE SMALL LETTER MV +1C80..1C88 ; Ll # [9] CYRILLIC SMALL LETTER ROUNDED VE..CYRILLIC SMALL LETTER UNBLENDED UK +1D00..1D2B ; Ll # [44] LATIN LETTER SMALL CAPITAL A..CYRILLIC LETTER SMALL CAPITAL EL +1D6B..1D77 ; Ll # [13] LATIN SMALL LETTER UE..LATIN SMALL LETTER TURNED G +1D79..1D9A ; Ll # [34] LATIN SMALL LETTER INSULAR G..LATIN SMALL LETTER EZH WITH RETROFLEX HOOK +1E01 ; Ll # LATIN SMALL LETTER A WITH RING BELOW +1E03 ; Ll # LATIN SMALL LETTER B WITH DOT ABOVE +1E05 ; Ll # LATIN SMALL LETTER B WITH DOT BELOW +1E07 ; Ll # LATIN SMALL LETTER B WITH LINE BELOW +1E09 ; Ll # LATIN SMALL LETTER C WITH CEDILLA AND ACUTE +1E0B ; Ll # LATIN SMALL LETTER D WITH DOT ABOVE +1E0D ; Ll # LATIN SMALL LETTER D WITH DOT BELOW +1E0F ; Ll # LATIN SMALL LETTER D WITH LINE BELOW +1E11 ; Ll # LATIN SMALL LETTER D WITH CEDILLA +1E13 ; Ll # LATIN SMALL LETTER D WITH CIRCUMFLEX BELOW +1E15 ; Ll # LATIN SMALL LETTER E WITH MACRON AND GRAVE +1E17 ; Ll # LATIN SMALL LETTER E WITH MACRON AND ACUTE +1E19 ; Ll # LATIN SMALL LETTER E WITH CIRCUMFLEX BELOW +1E1B ; Ll # LATIN SMALL LETTER E WITH TILDE BELOW +1E1D ; Ll # LATIN SMALL LETTER E WITH CEDILLA AND BREVE +1E1F ; Ll # LATIN SMALL LETTER F WITH DOT ABOVE +1E21 ; Ll # LATIN SMALL LETTER G WITH MACRON +1E23 ; Ll # LATIN SMALL LETTER H WITH DOT ABOVE +1E25 ; Ll # LATIN SMALL LETTER H WITH DOT BELOW +1E27 ; Ll # LATIN SMALL LETTER H WITH DIAERESIS +1E29 ; Ll # LATIN SMALL LETTER H WITH CEDILLA +1E2B ; Ll # LATIN SMALL LETTER H WITH BREVE BELOW +1E2D ; Ll # LATIN SMALL LETTER I WITH TILDE BELOW +1E2F ; Ll # LATIN SMALL LETTER I WITH DIAERESIS AND ACUTE +1E31 ; Ll # LATIN SMALL LETTER K WITH ACUTE +1E33 ; Ll # LATIN SMALL LETTER K WITH DOT BELOW +1E35 ; Ll # LATIN SMALL LETTER K WITH LINE BELOW +1E37 ; Ll # LATIN SMALL LETTER L WITH DOT BELOW +1E39 ; Ll # LATIN SMALL LETTER L WITH DOT BELOW AND MACRON +1E3B ; Ll # LATIN SMALL LETTER L WITH LINE BELOW +1E3D ; Ll # LATIN SMALL LETTER L WITH CIRCUMFLEX BELOW +1E3F ; Ll # LATIN SMALL LETTER M WITH ACUTE +1E41 ; Ll # LATIN SMALL LETTER M WITH DOT ABOVE +1E43 ; Ll # LATIN SMALL LETTER M WITH DOT BELOW +1E45 ; Ll # LATIN SMALL LETTER N WITH DOT ABOVE +1E47 ; Ll # LATIN SMALL LETTER N WITH DOT BELOW +1E49 ; Ll # LATIN SMALL LETTER N WITH LINE BELOW +1E4B ; Ll # LATIN SMALL LETTER N WITH CIRCUMFLEX BELOW +1E4D ; Ll # LATIN SMALL LETTER O WITH TILDE AND ACUTE +1E4F ; Ll # LATIN SMALL LETTER O WITH TILDE AND DIAERESIS +1E51 ; Ll # LATIN SMALL LETTER O WITH MACRON AND GRAVE +1E53 ; Ll # LATIN SMALL LETTER O WITH MACRON AND ACUTE +1E55 ; Ll # LATIN SMALL LETTER P WITH ACUTE +1E57 ; Ll # LATIN SMALL LETTER P WITH DOT ABOVE +1E59 ; Ll # LATIN SMALL LETTER R WITH DOT ABOVE +1E5B ; Ll # LATIN SMALL LETTER R WITH DOT BELOW +1E5D ; Ll # LATIN SMALL LETTER R WITH DOT BELOW AND MACRON +1E5F ; Ll # LATIN SMALL LETTER R WITH LINE BELOW +1E61 ; Ll # LATIN SMALL LETTER S WITH DOT ABOVE +1E63 ; Ll # LATIN SMALL LETTER S WITH DOT BELOW +1E65 ; Ll # LATIN SMALL LETTER S WITH ACUTE AND DOT ABOVE +1E67 ; Ll # LATIN SMALL LETTER S WITH CARON AND DOT ABOVE +1E69 ; Ll # LATIN SMALL LETTER S WITH DOT BELOW AND DOT ABOVE +1E6B ; Ll # LATIN SMALL LETTER T WITH DOT ABOVE +1E6D ; Ll # LATIN SMALL LETTER T WITH DOT BELOW +1E6F ; Ll # LATIN SMALL LETTER T WITH LINE BELOW +1E71 ; Ll # LATIN SMALL LETTER T WITH CIRCUMFLEX BELOW +1E73 ; Ll # LATIN SMALL LETTER U WITH DIAERESIS BELOW +1E75 ; Ll # LATIN SMALL LETTER U WITH TILDE BELOW +1E77 ; Ll # LATIN SMALL LETTER U WITH CIRCUMFLEX BELOW +1E79 ; Ll # LATIN SMALL LETTER U WITH TILDE AND ACUTE +1E7B ; Ll # LATIN SMALL LETTER U WITH MACRON AND DIAERESIS +1E7D ; Ll # LATIN SMALL LETTER V WITH TILDE +1E7F ; Ll # LATIN SMALL LETTER V WITH DOT BELOW +1E81 ; Ll # LATIN SMALL LETTER W WITH GRAVE +1E83 ; Ll # LATIN SMALL LETTER W WITH ACUTE +1E85 ; Ll # LATIN SMALL LETTER W WITH DIAERESIS +1E87 ; Ll # LATIN SMALL LETTER W WITH DOT ABOVE +1E89 ; Ll # LATIN SMALL LETTER W WITH DOT BELOW +1E8B ; Ll # LATIN SMALL LETTER X WITH DOT ABOVE +1E8D ; Ll # LATIN SMALL LETTER X WITH DIAERESIS +1E8F ; Ll # LATIN SMALL LETTER Y WITH DOT ABOVE +1E91 ; Ll # LATIN SMALL LETTER Z WITH CIRCUMFLEX +1E93 ; Ll # LATIN SMALL LETTER Z WITH DOT BELOW +1E95..1E9D ; Ll # [9] LATIN SMALL LETTER Z WITH LINE BELOW..LATIN SMALL LETTER LONG S WITH HIGH STROKE +1E9F ; Ll # LATIN SMALL LETTER DELTA +1EA1 ; Ll # LATIN SMALL LETTER A WITH DOT BELOW +1EA3 ; Ll # LATIN SMALL LETTER A WITH HOOK ABOVE +1EA5 ; Ll # LATIN SMALL LETTER A WITH CIRCUMFLEX AND ACUTE +1EA7 ; Ll # LATIN SMALL LETTER A WITH CIRCUMFLEX AND GRAVE +1EA9 ; Ll # LATIN SMALL LETTER A WITH CIRCUMFLEX AND HOOK ABOVE +1EAB ; Ll # LATIN SMALL LETTER A WITH CIRCUMFLEX AND TILDE +1EAD ; Ll # LATIN SMALL LETTER A WITH CIRCUMFLEX AND DOT BELOW +1EAF ; Ll # LATIN SMALL LETTER A WITH BREVE AND ACUTE +1EB1 ; Ll # LATIN SMALL LETTER A WITH BREVE AND GRAVE +1EB3 ; Ll # LATIN SMALL LETTER A WITH BREVE AND HOOK ABOVE +1EB5 ; Ll # LATIN SMALL LETTER A WITH BREVE AND TILDE +1EB7 ; Ll # LATIN SMALL LETTER A WITH BREVE AND DOT BELOW +1EB9 ; Ll # LATIN SMALL LETTER E WITH DOT BELOW +1EBB ; Ll # LATIN SMALL LETTER E WITH HOOK ABOVE +1EBD ; Ll # LATIN SMALL LETTER E WITH TILDE +1EBF ; Ll # LATIN SMALL LETTER E WITH CIRCUMFLEX AND ACUTE +1EC1 ; Ll # LATIN SMALL LETTER E WITH CIRCUMFLEX AND GRAVE +1EC3 ; Ll # LATIN SMALL LETTER E WITH CIRCUMFLEX AND HOOK ABOVE +1EC5 ; Ll # LATIN SMALL LETTER E WITH CIRCUMFLEX AND TILDE +1EC7 ; Ll # LATIN SMALL LETTER E WITH CIRCUMFLEX AND DOT BELOW +1EC9 ; Ll # LATIN SMALL LETTER I WITH HOOK ABOVE +1ECB ; Ll # LATIN SMALL LETTER I WITH DOT BELOW +1ECD ; Ll # LATIN SMALL LETTER O WITH DOT BELOW +1ECF ; Ll # LATIN SMALL LETTER O WITH HOOK ABOVE +1ED1 ; Ll # LATIN SMALL LETTER O WITH CIRCUMFLEX AND ACUTE +1ED3 ; Ll # LATIN SMALL LETTER O WITH CIRCUMFLEX AND GRAVE +1ED5 ; Ll # LATIN SMALL LETTER O WITH CIRCUMFLEX AND HOOK ABOVE +1ED7 ; Ll # LATIN SMALL LETTER O WITH CIRCUMFLEX AND TILDE +1ED9 ; Ll # LATIN SMALL LETTER O WITH CIRCUMFLEX AND DOT BELOW +1EDB ; Ll # LATIN SMALL LETTER O WITH HORN AND ACUTE +1EDD ; Ll # LATIN SMALL LETTER O WITH HORN AND GRAVE +1EDF ; Ll # LATIN SMALL LETTER O WITH HORN AND HOOK ABOVE +1EE1 ; Ll # LATIN SMALL LETTER O WITH HORN AND TILDE +1EE3 ; Ll # LATIN SMALL LETTER O WITH HORN AND DOT BELOW +1EE5 ; Ll # LATIN SMALL LETTER U WITH DOT BELOW +1EE7 ; Ll # LATIN SMALL LETTER U WITH HOOK ABOVE +1EE9 ; Ll # LATIN SMALL LETTER U WITH HORN AND ACUTE +1EEB ; Ll # LATIN SMALL LETTER U WITH HORN AND GRAVE +1EED ; Ll # LATIN SMALL LETTER U WITH HORN AND HOOK ABOVE +1EEF ; Ll # LATIN SMALL LETTER U WITH HORN AND TILDE +1EF1 ; Ll # LATIN SMALL LETTER U WITH HORN AND DOT BELOW +1EF3 ; Ll # LATIN SMALL LETTER Y WITH GRAVE +1EF5 ; Ll # LATIN SMALL LETTER Y WITH DOT BELOW +1EF7 ; Ll # LATIN SMALL LETTER Y WITH HOOK ABOVE +1EF9 ; Ll # LATIN SMALL LETTER Y WITH TILDE +1EFB ; Ll # LATIN SMALL LETTER MIDDLE-WELSH LL +1EFD ; Ll # LATIN SMALL LETTER MIDDLE-WELSH V +1EFF..1F07 ; Ll # [9] LATIN SMALL LETTER Y WITH LOOP..GREEK SMALL LETTER ALPHA WITH DASIA AND PERISPOMENI +1F10..1F15 ; Ll # [6] GREEK SMALL LETTER EPSILON WITH PSILI..GREEK SMALL LETTER EPSILON WITH DASIA AND OXIA +1F20..1F27 ; Ll # [8] GREEK SMALL LETTER ETA WITH PSILI..GREEK SMALL LETTER ETA WITH DASIA AND PERISPOMENI +1F30..1F37 ; Ll # [8] GREEK SMALL LETTER IOTA WITH PSILI..GREEK SMALL LETTER IOTA WITH DASIA AND PERISPOMENI +1F40..1F45 ; Ll # [6] GREEK SMALL LETTER OMICRON WITH PSILI..GREEK SMALL LETTER OMICRON WITH DASIA AND OXIA +1F50..1F57 ; Ll # [8] GREEK SMALL LETTER UPSILON WITH PSILI..GREEK SMALL LETTER UPSILON WITH DASIA AND PERISPOMENI +1F60..1F67 ; Ll # [8] GREEK SMALL LETTER OMEGA WITH PSILI..GREEK SMALL LETTER OMEGA WITH DASIA AND PERISPOMENI +1F70..1F7D ; Ll # [14] GREEK SMALL LETTER ALPHA WITH VARIA..GREEK SMALL LETTER OMEGA WITH OXIA +1F80..1F87 ; Ll # [8] GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI..GREEK SMALL LETTER ALPHA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI +1F90..1F97 ; Ll # [8] GREEK SMALL LETTER ETA WITH PSILI AND YPOGEGRAMMENI..GREEK SMALL LETTER ETA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI +1FA0..1FA7 ; Ll # [8] GREEK SMALL LETTER OMEGA WITH PSILI AND YPOGEGRAMMENI..GREEK SMALL LETTER OMEGA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI +1FB0..1FB4 ; Ll # [5] GREEK SMALL LETTER ALPHA WITH VRACHY..GREEK SMALL LETTER ALPHA WITH OXIA AND YPOGEGRAMMENI +1FB6..1FB7 ; Ll # [2] GREEK SMALL LETTER ALPHA WITH PERISPOMENI..GREEK SMALL LETTER ALPHA WITH PERISPOMENI AND YPOGEGRAMMENI +1FBE ; Ll # GREEK PROSGEGRAMMENI +1FC2..1FC4 ; Ll # [3] GREEK SMALL LETTER ETA WITH VARIA AND YPOGEGRAMMENI..GREEK SMALL LETTER ETA WITH OXIA AND YPOGEGRAMMENI +1FC6..1FC7 ; Ll # [2] GREEK SMALL LETTER ETA WITH PERISPOMENI..GREEK SMALL LETTER ETA WITH PERISPOMENI AND YPOGEGRAMMENI +1FD0..1FD3 ; Ll # [4] GREEK SMALL LETTER IOTA WITH VRACHY..GREEK SMALL LETTER IOTA WITH DIALYTIKA AND OXIA +1FD6..1FD7 ; Ll # [2] GREEK SMALL LETTER IOTA WITH PERISPOMENI..GREEK SMALL LETTER IOTA WITH DIALYTIKA AND PERISPOMENI +1FE0..1FE7 ; Ll # [8] GREEK SMALL LETTER UPSILON WITH VRACHY..GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND PERISPOMENI +1FF2..1FF4 ; Ll # [3] GREEK SMALL LETTER OMEGA WITH VARIA AND YPOGEGRAMMENI..GREEK SMALL LETTER OMEGA WITH OXIA AND YPOGEGRAMMENI +1FF6..1FF7 ; Ll # [2] GREEK SMALL LETTER OMEGA WITH PERISPOMENI..GREEK SMALL LETTER OMEGA WITH PERISPOMENI AND YPOGEGRAMMENI +210A ; Ll # SCRIPT SMALL G +210E..210F ; Ll # [2] PLANCK CONSTANT..PLANCK CONSTANT OVER TWO PI +2113 ; Ll # SCRIPT SMALL L +212F ; Ll # SCRIPT SMALL E +2134 ; Ll # SCRIPT SMALL O +2139 ; Ll # INFORMATION SOURCE +213C..213D ; Ll # [2] DOUBLE-STRUCK SMALL PI..DOUBLE-STRUCK SMALL GAMMA +2146..2149 ; Ll # [4] DOUBLE-STRUCK ITALIC SMALL D..DOUBLE-STRUCK ITALIC SMALL J +214E ; Ll # TURNED SMALL F +2184 ; Ll # LATIN SMALL LETTER REVERSED C +2C30..2C5E ; Ll # [47] GLAGOLITIC SMALL LETTER AZU..GLAGOLITIC SMALL LETTER LATINATE MYSLITE +2C61 ; Ll # LATIN SMALL LETTER L WITH DOUBLE BAR +2C65..2C66 ; Ll # [2] LATIN SMALL LETTER A WITH STROKE..LATIN SMALL LETTER T WITH DIAGONAL STROKE +2C68 ; Ll # LATIN SMALL LETTER H WITH DESCENDER +2C6A ; Ll # LATIN SMALL LETTER K WITH DESCENDER +2C6C ; Ll # LATIN SMALL LETTER Z WITH DESCENDER +2C71 ; Ll # LATIN SMALL LETTER V WITH RIGHT HOOK +2C73..2C74 ; Ll # [2] LATIN SMALL LETTER W WITH HOOK..LATIN SMALL LETTER V WITH CURL +2C76..2C7B ; Ll # [6] LATIN SMALL LETTER HALF H..LATIN LETTER SMALL CAPITAL TURNED E +2C81 ; Ll # COPTIC SMALL LETTER ALFA +2C83 ; Ll # COPTIC SMALL LETTER VIDA +2C85 ; Ll # COPTIC SMALL LETTER GAMMA +2C87 ; Ll # COPTIC SMALL LETTER DALDA +2C89 ; Ll # COPTIC SMALL LETTER EIE +2C8B ; Ll # COPTIC SMALL LETTER SOU +2C8D ; Ll # COPTIC SMALL LETTER ZATA +2C8F ; Ll # COPTIC SMALL LETTER HATE +2C91 ; Ll # COPTIC SMALL LETTER THETHE +2C93 ; Ll # COPTIC SMALL LETTER IAUDA +2C95 ; Ll # COPTIC SMALL LETTER KAPA +2C97 ; Ll # COPTIC SMALL LETTER LAULA +2C99 ; Ll # COPTIC SMALL LETTER MI +2C9B ; Ll # COPTIC SMALL LETTER NI +2C9D ; Ll # COPTIC SMALL LETTER KSI +2C9F ; Ll # COPTIC SMALL LETTER O +2CA1 ; Ll # COPTIC SMALL LETTER PI +2CA3 ; Ll # COPTIC SMALL LETTER RO +2CA5 ; Ll # COPTIC SMALL LETTER SIMA +2CA7 ; Ll # COPTIC SMALL LETTER TAU +2CA9 ; Ll # COPTIC SMALL LETTER UA +2CAB ; Ll # COPTIC SMALL LETTER FI +2CAD ; Ll # COPTIC SMALL LETTER KHI +2CAF ; Ll # COPTIC SMALL LETTER PSI +2CB1 ; Ll # COPTIC SMALL LETTER OOU +2CB3 ; Ll # COPTIC SMALL LETTER DIALECT-P ALEF +2CB5 ; Ll # COPTIC SMALL LETTER OLD COPTIC AIN +2CB7 ; Ll # COPTIC SMALL LETTER CRYPTOGRAMMIC EIE +2CB9 ; Ll # COPTIC SMALL LETTER DIALECT-P KAPA +2CBB ; Ll # COPTIC SMALL LETTER DIALECT-P NI +2CBD ; Ll # COPTIC SMALL LETTER CRYPTOGRAMMIC NI +2CBF ; Ll # COPTIC SMALL LETTER OLD COPTIC OOU +2CC1 ; Ll # COPTIC SMALL LETTER SAMPI +2CC3 ; Ll # COPTIC SMALL LETTER CROSSED SHEI +2CC5 ; Ll # COPTIC SMALL LETTER OLD COPTIC SHEI +2CC7 ; Ll # COPTIC SMALL LETTER OLD COPTIC ESH +2CC9 ; Ll # COPTIC SMALL LETTER AKHMIMIC KHEI +2CCB ; Ll # COPTIC SMALL LETTER DIALECT-P HORI +2CCD ; Ll # COPTIC SMALL LETTER OLD COPTIC HORI +2CCF ; Ll # COPTIC SMALL LETTER OLD COPTIC HA +2CD1 ; Ll # COPTIC SMALL LETTER L-SHAPED HA +2CD3 ; Ll # COPTIC SMALL LETTER OLD COPTIC HEI +2CD5 ; Ll # COPTIC SMALL LETTER OLD COPTIC HAT +2CD7 ; Ll # COPTIC SMALL LETTER OLD COPTIC GANGIA +2CD9 ; Ll # COPTIC SMALL LETTER OLD COPTIC DJA +2CDB ; Ll # COPTIC SMALL LETTER OLD COPTIC SHIMA +2CDD ; Ll # COPTIC SMALL LETTER OLD NUBIAN SHIMA +2CDF ; Ll # COPTIC SMALL LETTER OLD NUBIAN NGI +2CE1 ; Ll # COPTIC SMALL LETTER OLD NUBIAN NYI +2CE3..2CE4 ; Ll # [2] COPTIC SMALL LETTER OLD NUBIAN WAU..COPTIC SYMBOL KAI +2CEC ; Ll # COPTIC SMALL LETTER CRYPTOGRAMMIC SHEI +2CEE ; Ll # COPTIC SMALL LETTER CRYPTOGRAMMIC GANGIA +2CF3 ; Ll # COPTIC SMALL LETTER BOHAIRIC KHEI +2D00..2D25 ; Ll # [38] GEORGIAN SMALL LETTER AN..GEORGIAN SMALL LETTER HOE +2D27 ; Ll # GEORGIAN SMALL LETTER YN +2D2D ; Ll # GEORGIAN SMALL LETTER AEN +A641 ; Ll # CYRILLIC SMALL LETTER ZEMLYA +A643 ; Ll # CYRILLIC SMALL LETTER DZELO +A645 ; Ll # CYRILLIC SMALL LETTER REVERSED DZE +A647 ; Ll # CYRILLIC SMALL LETTER IOTA +A649 ; Ll # CYRILLIC SMALL LETTER DJERV +A64B ; Ll # CYRILLIC SMALL LETTER MONOGRAPH UK +A64D ; Ll # CYRILLIC SMALL LETTER BROAD OMEGA +A64F ; Ll # CYRILLIC SMALL LETTER NEUTRAL YER +A651 ; Ll # CYRILLIC SMALL LETTER YERU WITH BACK YER +A653 ; Ll # CYRILLIC SMALL LETTER IOTIFIED YAT +A655 ; Ll # CYRILLIC SMALL LETTER REVERSED YU +A657 ; Ll # CYRILLIC SMALL LETTER IOTIFIED A +A659 ; Ll # CYRILLIC SMALL LETTER CLOSED LITTLE YUS +A65B ; Ll # CYRILLIC SMALL LETTER BLENDED YUS +A65D ; Ll # CYRILLIC SMALL LETTER IOTIFIED CLOSED LITTLE YUS +A65F ; Ll # CYRILLIC SMALL LETTER YN +A661 ; Ll # CYRILLIC SMALL LETTER REVERSED TSE +A663 ; Ll # CYRILLIC SMALL LETTER SOFT DE +A665 ; Ll # CYRILLIC SMALL LETTER SOFT EL +A667 ; Ll # CYRILLIC SMALL LETTER SOFT EM +A669 ; Ll # CYRILLIC SMALL LETTER MONOCULAR O +A66B ; Ll # CYRILLIC SMALL LETTER BINOCULAR O +A66D ; Ll # CYRILLIC SMALL LETTER DOUBLE MONOCULAR O +A681 ; Ll # CYRILLIC SMALL LETTER DWE +A683 ; Ll # CYRILLIC SMALL LETTER DZWE +A685 ; Ll # CYRILLIC SMALL LETTER ZHWE +A687 ; Ll # CYRILLIC SMALL LETTER CCHE +A689 ; Ll # CYRILLIC SMALL LETTER DZZE +A68B ; Ll # CYRILLIC SMALL LETTER TE WITH MIDDLE HOOK +A68D ; Ll # CYRILLIC SMALL LETTER TWE +A68F ; Ll # CYRILLIC SMALL LETTER TSWE +A691 ; Ll # CYRILLIC SMALL LETTER TSSE +A693 ; Ll # CYRILLIC SMALL LETTER TCHE +A695 ; Ll # CYRILLIC SMALL LETTER HWE +A697 ; Ll # CYRILLIC SMALL LETTER SHWE +A699 ; Ll # CYRILLIC SMALL LETTER DOUBLE O +A69B ; Ll # CYRILLIC SMALL LETTER CROSSED O +A723 ; Ll # LATIN SMALL LETTER EGYPTOLOGICAL ALEF +A725 ; Ll # LATIN SMALL LETTER EGYPTOLOGICAL AIN +A727 ; Ll # LATIN SMALL LETTER HENG +A729 ; Ll # LATIN SMALL LETTER TZ +A72B ; Ll # LATIN SMALL LETTER TRESILLO +A72D ; Ll # LATIN SMALL LETTER CUATRILLO +A72F..A731 ; Ll # [3] LATIN SMALL LETTER CUATRILLO WITH COMMA..LATIN LETTER SMALL CAPITAL S +A733 ; Ll # LATIN SMALL LETTER AA +A735 ; Ll # LATIN SMALL LETTER AO +A737 ; Ll # LATIN SMALL LETTER AU +A739 ; Ll # LATIN SMALL LETTER AV +A73B ; Ll # LATIN SMALL LETTER AV WITH HORIZONTAL BAR +A73D ; Ll # LATIN SMALL LETTER AY +A73F ; Ll # LATIN SMALL LETTER REVERSED C WITH DOT +A741 ; Ll # LATIN SMALL LETTER K WITH STROKE +A743 ; Ll # LATIN SMALL LETTER K WITH DIAGONAL STROKE +A745 ; Ll # LATIN SMALL LETTER K WITH STROKE AND DIAGONAL STROKE +A747 ; Ll # LATIN SMALL LETTER BROKEN L +A749 ; Ll # LATIN SMALL LETTER L WITH HIGH STROKE +A74B ; Ll # LATIN SMALL LETTER O WITH LONG STROKE OVERLAY +A74D ; Ll # LATIN SMALL LETTER O WITH LOOP +A74F ; Ll # LATIN SMALL LETTER OO +A751 ; Ll # LATIN SMALL LETTER P WITH STROKE THROUGH DESCENDER +A753 ; Ll # LATIN SMALL LETTER P WITH FLOURISH +A755 ; Ll # LATIN SMALL LETTER P WITH SQUIRREL TAIL +A757 ; Ll # LATIN SMALL LETTER Q WITH STROKE THROUGH DESCENDER +A759 ; Ll # LATIN SMALL LETTER Q WITH DIAGONAL STROKE +A75B ; Ll # LATIN SMALL LETTER R ROTUNDA +A75D ; Ll # LATIN SMALL LETTER RUM ROTUNDA +A75F ; Ll # LATIN SMALL LETTER V WITH DIAGONAL STROKE +A761 ; Ll # LATIN SMALL LETTER VY +A763 ; Ll # LATIN SMALL LETTER VISIGOTHIC Z +A765 ; Ll # LATIN SMALL LETTER THORN WITH STROKE +A767 ; Ll # LATIN SMALL LETTER THORN WITH STROKE THROUGH DESCENDER +A769 ; Ll # LATIN SMALL LETTER VEND +A76B ; Ll # LATIN SMALL LETTER ET +A76D ; Ll # LATIN SMALL LETTER IS +A76F ; Ll # LATIN SMALL LETTER CON +A771..A778 ; Ll # [8] LATIN SMALL LETTER DUM..LATIN SMALL LETTER UM +A77A ; Ll # LATIN SMALL LETTER INSULAR D +A77C ; Ll # LATIN SMALL LETTER INSULAR F +A77F ; Ll # LATIN SMALL LETTER TURNED INSULAR G +A781 ; Ll # LATIN SMALL LETTER TURNED L +A783 ; Ll # LATIN SMALL LETTER INSULAR R +A785 ; Ll # LATIN SMALL LETTER INSULAR S +A787 ; Ll # LATIN SMALL LETTER INSULAR T +A78C ; Ll # LATIN SMALL LETTER SALTILLO +A78E ; Ll # LATIN SMALL LETTER L WITH RETROFLEX HOOK AND BELT +A791 ; Ll # LATIN SMALL LETTER N WITH DESCENDER +A793..A795 ; Ll # [3] LATIN SMALL LETTER C WITH BAR..LATIN SMALL LETTER H WITH PALATAL HOOK +A797 ; Ll # LATIN SMALL LETTER B WITH FLOURISH +A799 ; Ll # LATIN SMALL LETTER F WITH STROKE +A79B ; Ll # LATIN SMALL LETTER VOLAPUK AE +A79D ; Ll # LATIN SMALL LETTER VOLAPUK OE +A79F ; Ll # LATIN SMALL LETTER VOLAPUK UE +A7A1 ; Ll # LATIN SMALL LETTER G WITH OBLIQUE STROKE +A7A3 ; Ll # LATIN SMALL LETTER K WITH OBLIQUE STROKE +A7A5 ; Ll # LATIN SMALL LETTER N WITH OBLIQUE STROKE +A7A7 ; Ll # LATIN SMALL LETTER R WITH OBLIQUE STROKE +A7A9 ; Ll # LATIN SMALL LETTER S WITH OBLIQUE STROKE +A7AF ; Ll # LATIN LETTER SMALL CAPITAL Q +A7B5 ; Ll # LATIN SMALL LETTER BETA +A7B7 ; Ll # LATIN SMALL LETTER OMEGA +A7B9 ; Ll # LATIN SMALL LETTER U WITH STROKE +A7BB ; Ll # LATIN SMALL LETTER GLOTTAL A +A7BD ; Ll # LATIN SMALL LETTER GLOTTAL I +A7BF ; Ll # LATIN SMALL LETTER GLOTTAL U +A7C3 ; Ll # LATIN SMALL LETTER ANGLICANA W +A7C8 ; Ll # LATIN SMALL LETTER D WITH SHORT STROKE OVERLAY +A7CA ; Ll # LATIN SMALL LETTER S WITH SHORT STROKE OVERLAY +A7F6 ; Ll # LATIN SMALL LETTER REVERSED HALF H +A7FA ; Ll # LATIN LETTER SMALL CAPITAL TURNED M +AB30..AB5A ; Ll # [43] LATIN SMALL LETTER BARRED ALPHA..LATIN SMALL LETTER Y WITH SHORT RIGHT LEG +AB60..AB68 ; Ll # [9] LATIN SMALL LETTER SAKHA YAT..LATIN SMALL LETTER TURNED R WITH MIDDLE TILDE +AB70..ABBF ; Ll # [80] CHEROKEE SMALL LETTER A..CHEROKEE SMALL LETTER YA +FB00..FB06 ; Ll # [7] LATIN SMALL LIGATURE FF..LATIN SMALL LIGATURE ST +FB13..FB17 ; Ll # [5] ARMENIAN SMALL LIGATURE MEN NOW..ARMENIAN SMALL LIGATURE MEN XEH +FF41..FF5A ; Ll # [26] FULLWIDTH LATIN SMALL LETTER A..FULLWIDTH LATIN SMALL LETTER Z +10428..1044F ; Ll # [40] DESERET SMALL LETTER LONG I..DESERET SMALL LETTER EW +104D8..104FB ; Ll # [36] OSAGE SMALL LETTER A..OSAGE SMALL LETTER ZHA +10CC0..10CF2 ; Ll # [51] OLD HUNGARIAN SMALL LETTER A..OLD HUNGARIAN SMALL LETTER US +118C0..118DF ; Ll # [32] WARANG CITI SMALL LETTER NGAA..WARANG CITI SMALL LETTER VIYO +16E60..16E7F ; Ll # [32] MEDEFAIDRIN SMALL LETTER M..MEDEFAIDRIN SMALL LETTER Y +1D41A..1D433 ; Ll # [26] MATHEMATICAL BOLD SMALL A..MATHEMATICAL BOLD SMALL Z +1D44E..1D454 ; Ll # [7] MATHEMATICAL ITALIC SMALL A..MATHEMATICAL ITALIC SMALL G +1D456..1D467 ; Ll # [18] MATHEMATICAL ITALIC SMALL I..MATHEMATICAL ITALIC SMALL Z +1D482..1D49B ; Ll # [26] MATHEMATICAL BOLD ITALIC SMALL A..MATHEMATICAL BOLD ITALIC SMALL Z +1D4B6..1D4B9 ; Ll # [4] MATHEMATICAL SCRIPT SMALL A..MATHEMATICAL SCRIPT SMALL D +1D4BB ; Ll # MATHEMATICAL SCRIPT SMALL F +1D4BD..1D4C3 ; Ll # [7] MATHEMATICAL SCRIPT SMALL H..MATHEMATICAL SCRIPT SMALL N +1D4C5..1D4CF ; Ll # [11] MATHEMATICAL SCRIPT SMALL P..MATHEMATICAL SCRIPT SMALL Z +1D4EA..1D503 ; Ll # [26] MATHEMATICAL BOLD SCRIPT SMALL A..MATHEMATICAL BOLD SCRIPT SMALL Z +1D51E..1D537 ; Ll # [26] MATHEMATICAL FRAKTUR SMALL A..MATHEMATICAL FRAKTUR SMALL Z +1D552..1D56B ; Ll # [26] MATHEMATICAL DOUBLE-STRUCK SMALL A..MATHEMATICAL DOUBLE-STRUCK SMALL Z +1D586..1D59F ; Ll # [26] MATHEMATICAL BOLD FRAKTUR SMALL A..MATHEMATICAL BOLD FRAKTUR SMALL Z +1D5BA..1D5D3 ; Ll # [26] MATHEMATICAL SANS-SERIF SMALL A..MATHEMATICAL SANS-SERIF SMALL Z +1D5EE..1D607 ; Ll # [26] MATHEMATICAL SANS-SERIF BOLD SMALL A..MATHEMATICAL SANS-SERIF BOLD SMALL Z +1D622..1D63B ; Ll # [26] MATHEMATICAL SANS-SERIF ITALIC SMALL A..MATHEMATICAL SANS-SERIF ITALIC SMALL Z +1D656..1D66F ; Ll # [26] MATHEMATICAL SANS-SERIF BOLD ITALIC SMALL A..MATHEMATICAL SANS-SERIF BOLD ITALIC SMALL Z +1D68A..1D6A5 ; Ll # [28] MATHEMATICAL MONOSPACE SMALL A..MATHEMATICAL ITALIC SMALL DOTLESS J +1D6C2..1D6DA ; Ll # [25] MATHEMATICAL BOLD SMALL ALPHA..MATHEMATICAL BOLD SMALL OMEGA +1D6DC..1D6E1 ; Ll # [6] MATHEMATICAL BOLD EPSILON SYMBOL..MATHEMATICAL BOLD PI SYMBOL +1D6FC..1D714 ; Ll # [25] MATHEMATICAL ITALIC SMALL ALPHA..MATHEMATICAL ITALIC SMALL OMEGA +1D716..1D71B ; Ll # [6] MATHEMATICAL ITALIC EPSILON SYMBOL..MATHEMATICAL ITALIC PI SYMBOL +1D736..1D74E ; Ll # [25] MATHEMATICAL BOLD ITALIC SMALL ALPHA..MATHEMATICAL BOLD ITALIC SMALL OMEGA +1D750..1D755 ; Ll # [6] MATHEMATICAL BOLD ITALIC EPSILON SYMBOL..MATHEMATICAL BOLD ITALIC PI SYMBOL +1D770..1D788 ; Ll # [25] MATHEMATICAL SANS-SERIF BOLD SMALL ALPHA..MATHEMATICAL SANS-SERIF BOLD SMALL OMEGA +1D78A..1D78F ; Ll # [6] MATHEMATICAL SANS-SERIF BOLD EPSILON SYMBOL..MATHEMATICAL SANS-SERIF BOLD PI SYMBOL +1D7AA..1D7C2 ; Ll # [25] MATHEMATICAL SANS-SERIF BOLD ITALIC SMALL ALPHA..MATHEMATICAL SANS-SERIF BOLD ITALIC SMALL OMEGA +1D7C4..1D7C9 ; Ll # [6] MATHEMATICAL SANS-SERIF BOLD ITALIC EPSILON SYMBOL..MATHEMATICAL SANS-SERIF BOLD ITALIC PI SYMBOL +1D7CB ; Ll # MATHEMATICAL BOLD SMALL DIGAMMA +1E922..1E943 ; Ll # [34] ADLAM SMALL LETTER ALIF..ADLAM SMALL LETTER SHA + +# Total code points: 2155 + +# ================================================ + +# General_Category=Titlecase_Letter + +01C5 ; Lt # LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON +01C8 ; Lt # LATIN CAPITAL LETTER L WITH SMALL LETTER J +01CB ; Lt # LATIN CAPITAL LETTER N WITH SMALL LETTER J +01F2 ; Lt # LATIN CAPITAL LETTER D WITH SMALL LETTER Z +1F88..1F8F ; Lt # [8] GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI..GREEK CAPITAL LETTER ALPHA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI +1F98..1F9F ; Lt # [8] GREEK CAPITAL LETTER ETA WITH PSILI AND PROSGEGRAMMENI..GREEK CAPITAL LETTER ETA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI +1FA8..1FAF ; Lt # [8] GREEK CAPITAL LETTER OMEGA WITH PSILI AND PROSGEGRAMMENI..GREEK CAPITAL LETTER OMEGA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI +1FBC ; Lt # GREEK CAPITAL LETTER ALPHA WITH PROSGEGRAMMENI +1FCC ; Lt # GREEK CAPITAL LETTER ETA WITH PROSGEGRAMMENI +1FFC ; Lt # GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI + +# Total code points: 31 + +# ================================================ + +# General_Category=Modifier_Letter + +02B0..02C1 ; Lm # [18] MODIFIER LETTER SMALL H..MODIFIER LETTER REVERSED GLOTTAL STOP +02C6..02D1 ; Lm # [12] MODIFIER LETTER CIRCUMFLEX ACCENT..MODIFIER LETTER HALF TRIANGULAR COLON +02E0..02E4 ; Lm # [5] MODIFIER LETTER SMALL GAMMA..MODIFIER LETTER SMALL REVERSED GLOTTAL STOP +02EC ; Lm # MODIFIER LETTER VOICING +02EE ; Lm # MODIFIER LETTER DOUBLE APOSTROPHE +0374 ; Lm # GREEK NUMERAL SIGN +037A ; Lm # GREEK YPOGEGRAMMENI +0559 ; Lm # ARMENIAN MODIFIER LETTER LEFT HALF RING +0640 ; Lm # ARABIC TATWEEL +06E5..06E6 ; Lm # [2] ARABIC SMALL WAW..ARABIC SMALL YEH +07F4..07F5 ; Lm # [2] NKO HIGH TONE APOSTROPHE..NKO LOW TONE APOSTROPHE +07FA ; Lm # NKO LAJANYALAN +081A ; Lm # SAMARITAN MODIFIER LETTER EPENTHETIC YUT +0824 ; Lm # SAMARITAN MODIFIER LETTER SHORT A +0828 ; Lm # SAMARITAN MODIFIER LETTER I +0971 ; Lm # DEVANAGARI SIGN HIGH SPACING DOT +0E46 ; Lm # THAI CHARACTER MAIYAMOK +0EC6 ; Lm # LAO KO LA +10FC ; Lm # MODIFIER LETTER GEORGIAN NAR +17D7 ; Lm # KHMER SIGN LEK TOO +1843 ; Lm # MONGOLIAN LETTER TODO LONG VOWEL SIGN +1AA7 ; Lm # TAI THAM SIGN MAI YAMOK +1C78..1C7D ; Lm # [6] OL CHIKI MU TTUDDAG..OL CHIKI AHAD +1D2C..1D6A ; Lm # [63] MODIFIER LETTER CAPITAL A..GREEK SUBSCRIPT SMALL LETTER CHI +1D78 ; Lm # MODIFIER LETTER CYRILLIC EN +1D9B..1DBF ; Lm # [37] MODIFIER LETTER SMALL TURNED ALPHA..MODIFIER LETTER SMALL THETA +2071 ; Lm # SUPERSCRIPT LATIN SMALL LETTER I +207F ; Lm # SUPERSCRIPT LATIN SMALL LETTER N +2090..209C ; Lm # [13] LATIN SUBSCRIPT SMALL LETTER A..LATIN SUBSCRIPT SMALL LETTER T +2C7C..2C7D ; Lm # [2] LATIN SUBSCRIPT SMALL LETTER J..MODIFIER LETTER CAPITAL V +2D6F ; Lm # TIFINAGH MODIFIER LETTER LABIALIZATION MARK +2E2F ; Lm # VERTICAL TILDE +3005 ; Lm # IDEOGRAPHIC ITERATION MARK +3031..3035 ; Lm # [5] VERTICAL KANA REPEAT MARK..VERTICAL KANA REPEAT MARK LOWER HALF +303B ; Lm # VERTICAL IDEOGRAPHIC ITERATION MARK +309D..309E ; Lm # [2] HIRAGANA ITERATION MARK..HIRAGANA VOICED ITERATION MARK +30FC..30FE ; Lm # [3] KATAKANA-HIRAGANA PROLONGED SOUND MARK..KATAKANA VOICED ITERATION MARK +A015 ; Lm # YI SYLLABLE WU +A4F8..A4FD ; Lm # [6] LISU LETTER TONE MYA TI..LISU LETTER TONE MYA JEU +A60C ; Lm # VAI SYLLABLE LENGTHENER +A67F ; Lm # CYRILLIC PAYEROK +A69C..A69D ; Lm # [2] MODIFIER LETTER CYRILLIC HARD SIGN..MODIFIER LETTER CYRILLIC SOFT SIGN +A717..A71F ; Lm # [9] MODIFIER LETTER DOT VERTICAL BAR..MODIFIER LETTER LOW INVERTED EXCLAMATION MARK +A770 ; Lm # MODIFIER LETTER US +A788 ; Lm # MODIFIER LETTER LOW CIRCUMFLEX ACCENT +A7F8..A7F9 ; Lm # [2] MODIFIER LETTER CAPITAL H WITH STROKE..MODIFIER LETTER SMALL LIGATURE OE +A9CF ; Lm # JAVANESE PANGRANGKEP +A9E6 ; Lm # MYANMAR MODIFIER LETTER SHAN REDUPLICATION +AA70 ; Lm # MYANMAR MODIFIER LETTER KHAMTI REDUPLICATION +AADD ; Lm # TAI VIET SYMBOL SAM +AAF3..AAF4 ; Lm # [2] MEETEI MAYEK SYLLABLE REPETITION MARK..MEETEI MAYEK WORD REPETITION MARK +AB5C..AB5F ; Lm # [4] MODIFIER LETTER SMALL HENG..MODIFIER LETTER SMALL U WITH LEFT HOOK +AB69 ; Lm # MODIFIER LETTER SMALL TURNED W +FF70 ; Lm # HALFWIDTH KATAKANA-HIRAGANA PROLONGED SOUND MARK +FF9E..FF9F ; Lm # [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK +16B40..16B43 ; Lm # [4] PAHAWH HMONG SIGN VOS SEEV..PAHAWH HMONG SIGN IB YAM +16F93..16F9F ; Lm # [13] MIAO LETTER TONE-2..MIAO LETTER REFORMED TONE-8 +16FE0..16FE1 ; Lm # [2] TANGUT ITERATION MARK..NUSHU ITERATION MARK +16FE3 ; Lm # OLD CHINESE ITERATION MARK +1E137..1E13D ; Lm # [7] NYIAKENG PUACHUE HMONG SIGN FOR PERSON..NYIAKENG PUACHUE HMONG SYLLABLE LENGTHENER +1E94B ; Lm # ADLAM NASALIZATION MARK + +# Total code points: 260 + +# ================================================ + +# General_Category=Other_Letter + +00AA ; Lo # FEMININE ORDINAL INDICATOR +00BA ; Lo # MASCULINE ORDINAL INDICATOR +01BB ; Lo # LATIN LETTER TWO WITH STROKE +01C0..01C3 ; Lo # [4] LATIN LETTER DENTAL CLICK..LATIN LETTER RETROFLEX CLICK +0294 ; Lo # LATIN LETTER GLOTTAL STOP +05D0..05EA ; Lo # [27] HEBREW LETTER ALEF..HEBREW LETTER TAV +05EF..05F2 ; Lo # [4] HEBREW YOD TRIANGLE..HEBREW LIGATURE YIDDISH DOUBLE YOD +0620..063F ; Lo # [32] ARABIC LETTER KASHMIRI YEH..ARABIC LETTER FARSI YEH WITH THREE DOTS ABOVE +0641..064A ; Lo # [10] ARABIC LETTER FEH..ARABIC LETTER YEH +066E..066F ; Lo # [2] ARABIC LETTER DOTLESS BEH..ARABIC LETTER DOTLESS QAF +0671..06D3 ; Lo # [99] ARABIC LETTER ALEF WASLA..ARABIC LETTER YEH BARREE WITH HAMZA ABOVE +06D5 ; Lo # ARABIC LETTER AE +06EE..06EF ; Lo # [2] ARABIC LETTER DAL WITH INVERTED V..ARABIC LETTER REH WITH INVERTED V +06FA..06FC ; Lo # [3] ARABIC LETTER SHEEN WITH DOT BELOW..ARABIC LETTER GHAIN WITH DOT BELOW +06FF ; Lo # ARABIC LETTER HEH WITH INVERTED V +0710 ; Lo # SYRIAC LETTER ALAPH +0712..072F ; Lo # [30] SYRIAC LETTER BETH..SYRIAC LETTER PERSIAN DHALATH +074D..07A5 ; Lo # [89] SYRIAC LETTER SOGDIAN ZHAIN..THAANA LETTER WAAVU +07B1 ; Lo # THAANA LETTER NAA +07CA..07EA ; Lo # [33] NKO LETTER A..NKO LETTER JONA RA +0800..0815 ; Lo # [22] SAMARITAN LETTER ALAF..SAMARITAN LETTER TAAF +0840..0858 ; Lo # [25] MANDAIC LETTER HALQA..MANDAIC LETTER AIN +0860..086A ; Lo # [11] SYRIAC LETTER MALAYALAM NGA..SYRIAC LETTER MALAYALAM SSA +08A0..08B4 ; Lo # [21] ARABIC LETTER BEH WITH SMALL V BELOW..ARABIC LETTER KAF WITH DOT BELOW +08B6..08C7 ; Lo # [18] ARABIC LETTER BEH WITH SMALL MEEM ABOVE..ARABIC LETTER LAM WITH SMALL ARABIC LETTER TAH ABOVE +0904..0939 ; Lo # [54] DEVANAGARI LETTER SHORT A..DEVANAGARI LETTER HA +093D ; Lo # DEVANAGARI SIGN AVAGRAHA +0950 ; Lo # DEVANAGARI OM +0958..0961 ; Lo # [10] DEVANAGARI LETTER QA..DEVANAGARI LETTER VOCALIC LL +0972..0980 ; Lo # [15] DEVANAGARI LETTER CANDRA A..BENGALI ANJI +0985..098C ; Lo # [8] BENGALI LETTER A..BENGALI LETTER VOCALIC L +098F..0990 ; Lo # [2] BENGALI LETTER E..BENGALI LETTER AI +0993..09A8 ; Lo # [22] BENGALI LETTER O..BENGALI LETTER NA +09AA..09B0 ; Lo # [7] BENGALI LETTER PA..BENGALI LETTER RA +09B2 ; Lo # BENGALI LETTER LA +09B6..09B9 ; Lo # [4] BENGALI LETTER SHA..BENGALI LETTER HA +09BD ; Lo # BENGALI SIGN AVAGRAHA +09CE ; Lo # BENGALI LETTER KHANDA TA +09DC..09DD ; Lo # [2] BENGALI LETTER RRA..BENGALI LETTER RHA +09DF..09E1 ; Lo # [3] BENGALI LETTER YYA..BENGALI LETTER VOCALIC LL +09F0..09F1 ; Lo # [2] BENGALI LETTER RA WITH MIDDLE DIAGONAL..BENGALI LETTER RA WITH LOWER DIAGONAL +09FC ; Lo # BENGALI LETTER VEDIC ANUSVARA +0A05..0A0A ; Lo # [6] GURMUKHI LETTER A..GURMUKHI LETTER UU +0A0F..0A10 ; Lo # [2] GURMUKHI LETTER EE..GURMUKHI LETTER AI +0A13..0A28 ; Lo # [22] GURMUKHI LETTER OO..GURMUKHI LETTER NA +0A2A..0A30 ; Lo # [7] GURMUKHI LETTER PA..GURMUKHI LETTER RA +0A32..0A33 ; Lo # [2] GURMUKHI LETTER LA..GURMUKHI LETTER LLA +0A35..0A36 ; Lo # [2] GURMUKHI LETTER VA..GURMUKHI LETTER SHA +0A38..0A39 ; Lo # [2] GURMUKHI LETTER SA..GURMUKHI LETTER HA +0A59..0A5C ; Lo # [4] GURMUKHI LETTER KHHA..GURMUKHI LETTER RRA +0A5E ; Lo # GURMUKHI LETTER FA +0A72..0A74 ; Lo # [3] GURMUKHI IRI..GURMUKHI EK ONKAR +0A85..0A8D ; Lo # [9] GUJARATI LETTER A..GUJARATI VOWEL CANDRA E +0A8F..0A91 ; Lo # [3] GUJARATI LETTER E..GUJARATI VOWEL CANDRA O +0A93..0AA8 ; Lo # [22] GUJARATI LETTER O..GUJARATI LETTER NA +0AAA..0AB0 ; Lo # [7] GUJARATI LETTER PA..GUJARATI LETTER RA +0AB2..0AB3 ; Lo # [2] GUJARATI LETTER LA..GUJARATI LETTER LLA +0AB5..0AB9 ; Lo # [5] GUJARATI LETTER VA..GUJARATI LETTER HA +0ABD ; Lo # GUJARATI SIGN AVAGRAHA +0AD0 ; Lo # GUJARATI OM +0AE0..0AE1 ; Lo # [2] GUJARATI LETTER VOCALIC RR..GUJARATI LETTER VOCALIC LL +0AF9 ; Lo # GUJARATI LETTER ZHA +0B05..0B0C ; Lo # [8] ORIYA LETTER A..ORIYA LETTER VOCALIC L +0B0F..0B10 ; Lo # [2] ORIYA LETTER E..ORIYA LETTER AI +0B13..0B28 ; Lo # [22] ORIYA LETTER O..ORIYA LETTER NA +0B2A..0B30 ; Lo # [7] ORIYA LETTER PA..ORIYA LETTER RA +0B32..0B33 ; Lo # [2] ORIYA LETTER LA..ORIYA LETTER LLA +0B35..0B39 ; Lo # [5] ORIYA LETTER VA..ORIYA LETTER HA +0B3D ; Lo # ORIYA SIGN AVAGRAHA +0B5C..0B5D ; Lo # [2] ORIYA LETTER RRA..ORIYA LETTER RHA +0B5F..0B61 ; Lo # [3] ORIYA LETTER YYA..ORIYA LETTER VOCALIC LL +0B71 ; Lo # ORIYA LETTER WA +0B83 ; Lo # TAMIL SIGN VISARGA +0B85..0B8A ; Lo # [6] TAMIL LETTER A..TAMIL LETTER UU +0B8E..0B90 ; Lo # [3] TAMIL LETTER E..TAMIL LETTER AI +0B92..0B95 ; Lo # [4] TAMIL LETTER O..TAMIL LETTER KA +0B99..0B9A ; Lo # [2] TAMIL LETTER NGA..TAMIL LETTER CA +0B9C ; Lo # TAMIL LETTER JA +0B9E..0B9F ; Lo # [2] TAMIL LETTER NYA..TAMIL LETTER TTA +0BA3..0BA4 ; Lo # [2] TAMIL LETTER NNA..TAMIL LETTER TA +0BA8..0BAA ; Lo # [3] TAMIL LETTER NA..TAMIL LETTER PA +0BAE..0BB9 ; Lo # [12] TAMIL LETTER MA..TAMIL LETTER HA +0BD0 ; Lo # TAMIL OM +0C05..0C0C ; Lo # [8] TELUGU LETTER A..TELUGU LETTER VOCALIC L +0C0E..0C10 ; Lo # [3] TELUGU LETTER E..TELUGU LETTER AI +0C12..0C28 ; Lo # [23] TELUGU LETTER O..TELUGU LETTER NA +0C2A..0C39 ; Lo # [16] TELUGU LETTER PA..TELUGU LETTER HA +0C3D ; Lo # TELUGU SIGN AVAGRAHA +0C58..0C5A ; Lo # [3] TELUGU LETTER TSA..TELUGU LETTER RRRA +0C60..0C61 ; Lo # [2] TELUGU LETTER VOCALIC RR..TELUGU LETTER VOCALIC LL +0C80 ; Lo # KANNADA SIGN SPACING CANDRABINDU +0C85..0C8C ; Lo # [8] KANNADA LETTER A..KANNADA LETTER VOCALIC L +0C8E..0C90 ; Lo # [3] KANNADA LETTER E..KANNADA LETTER AI +0C92..0CA8 ; Lo # [23] KANNADA LETTER O..KANNADA LETTER NA +0CAA..0CB3 ; Lo # [10] KANNADA LETTER PA..KANNADA LETTER LLA +0CB5..0CB9 ; Lo # [5] KANNADA LETTER VA..KANNADA LETTER HA +0CBD ; Lo # KANNADA SIGN AVAGRAHA +0CDE ; Lo # KANNADA LETTER FA +0CE0..0CE1 ; Lo # [2] KANNADA LETTER VOCALIC RR..KANNADA LETTER VOCALIC LL +0CF1..0CF2 ; Lo # [2] KANNADA SIGN JIHVAMULIYA..KANNADA SIGN UPADHMANIYA +0D04..0D0C ; Lo # [9] MALAYALAM LETTER VEDIC ANUSVARA..MALAYALAM LETTER VOCALIC L +0D0E..0D10 ; Lo # [3] MALAYALAM LETTER E..MALAYALAM LETTER AI +0D12..0D3A ; Lo # [41] MALAYALAM LETTER O..MALAYALAM LETTER TTTA +0D3D ; Lo # MALAYALAM SIGN AVAGRAHA +0D4E ; Lo # MALAYALAM LETTER DOT REPH +0D54..0D56 ; Lo # [3] MALAYALAM LETTER CHILLU M..MALAYALAM LETTER CHILLU LLL +0D5F..0D61 ; Lo # [3] MALAYALAM LETTER ARCHAIC II..MALAYALAM LETTER VOCALIC LL +0D7A..0D7F ; Lo # [6] MALAYALAM LETTER CHILLU NN..MALAYALAM LETTER CHILLU K +0D85..0D96 ; Lo # [18] SINHALA LETTER AYANNA..SINHALA LETTER AUYANNA +0D9A..0DB1 ; Lo # [24] SINHALA LETTER ALPAPRAANA KAYANNA..SINHALA LETTER DANTAJA NAYANNA +0DB3..0DBB ; Lo # [9] SINHALA LETTER SANYAKA DAYANNA..SINHALA LETTER RAYANNA +0DBD ; Lo # SINHALA LETTER DANTAJA LAYANNA +0DC0..0DC6 ; Lo # [7] SINHALA LETTER VAYANNA..SINHALA LETTER FAYANNA +0E01..0E30 ; Lo # [48] THAI CHARACTER KO KAI..THAI CHARACTER SARA A +0E32..0E33 ; Lo # [2] THAI CHARACTER SARA AA..THAI CHARACTER SARA AM +0E40..0E45 ; Lo # [6] THAI CHARACTER SARA E..THAI CHARACTER LAKKHANGYAO +0E81..0E82 ; Lo # [2] LAO LETTER KO..LAO LETTER KHO SUNG +0E84 ; Lo # LAO LETTER KHO TAM +0E86..0E8A ; Lo # [5] LAO LETTER PALI GHA..LAO LETTER SO TAM +0E8C..0EA3 ; Lo # [24] LAO LETTER PALI JHA..LAO LETTER LO LING +0EA5 ; Lo # LAO LETTER LO LOOT +0EA7..0EB0 ; Lo # [10] LAO LETTER WO..LAO VOWEL SIGN A +0EB2..0EB3 ; Lo # [2] LAO VOWEL SIGN AA..LAO VOWEL SIGN AM +0EBD ; Lo # LAO SEMIVOWEL SIGN NYO +0EC0..0EC4 ; Lo # [5] LAO VOWEL SIGN E..LAO VOWEL SIGN AI +0EDC..0EDF ; Lo # [4] LAO HO NO..LAO LETTER KHMU NYO +0F00 ; Lo # TIBETAN SYLLABLE OM +0F40..0F47 ; Lo # [8] TIBETAN LETTER KA..TIBETAN LETTER JA +0F49..0F6C ; Lo # [36] TIBETAN LETTER NYA..TIBETAN LETTER RRA +0F88..0F8C ; Lo # [5] TIBETAN SIGN LCE TSA CAN..TIBETAN SIGN INVERTED MCHU CAN +1000..102A ; Lo # [43] MYANMAR LETTER KA..MYANMAR LETTER AU +103F ; Lo # MYANMAR LETTER GREAT SA +1050..1055 ; Lo # [6] MYANMAR LETTER SHA..MYANMAR LETTER VOCALIC LL +105A..105D ; Lo # [4] MYANMAR LETTER MON NGA..MYANMAR LETTER MON BBE +1061 ; Lo # MYANMAR LETTER SGAW KAREN SHA +1065..1066 ; Lo # [2] MYANMAR LETTER WESTERN PWO KAREN THA..MYANMAR LETTER WESTERN PWO KAREN PWA +106E..1070 ; Lo # [3] MYANMAR LETTER EASTERN PWO KAREN NNA..MYANMAR LETTER EASTERN PWO KAREN GHWA +1075..1081 ; Lo # [13] MYANMAR LETTER SHAN KA..MYANMAR LETTER SHAN HA +108E ; Lo # MYANMAR LETTER RUMAI PALAUNG FA +1100..1248 ; Lo # [329] HANGUL CHOSEONG KIYEOK..ETHIOPIC SYLLABLE QWA +124A..124D ; Lo # [4] ETHIOPIC SYLLABLE QWI..ETHIOPIC SYLLABLE QWE +1250..1256 ; Lo # [7] ETHIOPIC SYLLABLE QHA..ETHIOPIC SYLLABLE QHO +1258 ; Lo # ETHIOPIC SYLLABLE QHWA +125A..125D ; Lo # [4] ETHIOPIC SYLLABLE QHWI..ETHIOPIC SYLLABLE QHWE +1260..1288 ; Lo # [41] ETHIOPIC SYLLABLE BA..ETHIOPIC SYLLABLE XWA +128A..128D ; Lo # [4] ETHIOPIC SYLLABLE XWI..ETHIOPIC SYLLABLE XWE +1290..12B0 ; Lo # [33] ETHIOPIC SYLLABLE NA..ETHIOPIC SYLLABLE KWA +12B2..12B5 ; Lo # [4] ETHIOPIC SYLLABLE KWI..ETHIOPIC SYLLABLE KWE +12B8..12BE ; Lo # [7] ETHIOPIC SYLLABLE KXA..ETHIOPIC SYLLABLE KXO +12C0 ; Lo # ETHIOPIC SYLLABLE KXWA +12C2..12C5 ; Lo # [4] ETHIOPIC SYLLABLE KXWI..ETHIOPIC SYLLABLE KXWE +12C8..12D6 ; Lo # [15] ETHIOPIC SYLLABLE WA..ETHIOPIC SYLLABLE PHARYNGEAL O +12D8..1310 ; Lo # [57] ETHIOPIC SYLLABLE ZA..ETHIOPIC SYLLABLE GWA +1312..1315 ; Lo # [4] ETHIOPIC SYLLABLE GWI..ETHIOPIC SYLLABLE GWE +1318..135A ; Lo # [67] ETHIOPIC SYLLABLE GGA..ETHIOPIC SYLLABLE FYA +1380..138F ; Lo # [16] ETHIOPIC SYLLABLE SEBATBEIT MWA..ETHIOPIC SYLLABLE PWE +1401..166C ; Lo # [620] CANADIAN SYLLABICS E..CANADIAN SYLLABICS CARRIER TTSA +166F..167F ; Lo # [17] CANADIAN SYLLABICS QAI..CANADIAN SYLLABICS BLACKFOOT W +1681..169A ; Lo # [26] OGHAM LETTER BEITH..OGHAM LETTER PEITH +16A0..16EA ; Lo # [75] RUNIC LETTER FEHU FEOH FE F..RUNIC LETTER X +16F1..16F8 ; Lo # [8] RUNIC LETTER K..RUNIC LETTER FRANKS CASKET AESC +1700..170C ; Lo # [13] TAGALOG LETTER A..TAGALOG LETTER YA +170E..1711 ; Lo # [4] TAGALOG LETTER LA..TAGALOG LETTER HA +1720..1731 ; Lo # [18] HANUNOO LETTER A..HANUNOO LETTER HA +1740..1751 ; Lo # [18] BUHID LETTER A..BUHID LETTER HA +1760..176C ; Lo # [13] TAGBANWA LETTER A..TAGBANWA LETTER YA +176E..1770 ; Lo # [3] TAGBANWA LETTER LA..TAGBANWA LETTER SA +1780..17B3 ; Lo # [52] KHMER LETTER KA..KHMER INDEPENDENT VOWEL QAU +17DC ; Lo # KHMER SIGN AVAKRAHASANYA +1820..1842 ; Lo # [35] MONGOLIAN LETTER A..MONGOLIAN LETTER CHI +1844..1878 ; Lo # [53] MONGOLIAN LETTER TODO E..MONGOLIAN LETTER CHA WITH TWO DOTS +1880..1884 ; Lo # [5] MONGOLIAN LETTER ALI GALI ANUSVARA ONE..MONGOLIAN LETTER ALI GALI INVERTED UBADAMA +1887..18A8 ; Lo # [34] MONGOLIAN LETTER ALI GALI A..MONGOLIAN LETTER MANCHU ALI GALI BHA +18AA ; Lo # MONGOLIAN LETTER MANCHU ALI GALI LHA +18B0..18F5 ; Lo # [70] CANADIAN SYLLABICS OY..CANADIAN SYLLABICS CARRIER DENTAL S +1900..191E ; Lo # [31] LIMBU VOWEL-CARRIER LETTER..LIMBU LETTER TRA +1950..196D ; Lo # [30] TAI LE LETTER KA..TAI LE LETTER AI +1970..1974 ; Lo # [5] TAI LE LETTER TONE-2..TAI LE LETTER TONE-6 +1980..19AB ; Lo # [44] NEW TAI LUE LETTER HIGH QA..NEW TAI LUE LETTER LOW SUA +19B0..19C9 ; Lo # [26] NEW TAI LUE VOWEL SIGN VOWEL SHORTENER..NEW TAI LUE TONE MARK-2 +1A00..1A16 ; Lo # [23] BUGINESE LETTER KA..BUGINESE LETTER HA +1A20..1A54 ; Lo # [53] TAI THAM LETTER HIGH KA..TAI THAM LETTER GREAT SA +1B05..1B33 ; Lo # [47] BALINESE LETTER AKARA..BALINESE LETTER HA +1B45..1B4B ; Lo # [7] BALINESE LETTER KAF SASAK..BALINESE LETTER ASYURA SASAK +1B83..1BA0 ; Lo # [30] SUNDANESE LETTER A..SUNDANESE LETTER HA +1BAE..1BAF ; Lo # [2] SUNDANESE LETTER KHA..SUNDANESE LETTER SYA +1BBA..1BE5 ; Lo # [44] SUNDANESE AVAGRAHA..BATAK LETTER U +1C00..1C23 ; Lo # [36] LEPCHA LETTER KA..LEPCHA LETTER A +1C4D..1C4F ; Lo # [3] LEPCHA LETTER TTA..LEPCHA LETTER DDA +1C5A..1C77 ; Lo # [30] OL CHIKI LETTER LA..OL CHIKI LETTER OH +1CE9..1CEC ; Lo # [4] VEDIC SIGN ANUSVARA ANTARGOMUKHA..VEDIC SIGN ANUSVARA VAMAGOMUKHA WITH TAIL +1CEE..1CF3 ; Lo # [6] VEDIC SIGN HEXIFORM LONG ANUSVARA..VEDIC SIGN ROTATED ARDHAVISARGA +1CF5..1CF6 ; Lo # [2] VEDIC SIGN JIHVAMULIYA..VEDIC SIGN UPADHMANIYA +1CFA ; Lo # VEDIC SIGN DOUBLE ANUSVARA ANTARGOMUKHA +2135..2138 ; Lo # [4] ALEF SYMBOL..DALET SYMBOL +2D30..2D67 ; Lo # [56] TIFINAGH LETTER YA..TIFINAGH LETTER YO +2D80..2D96 ; Lo # [23] ETHIOPIC SYLLABLE LOA..ETHIOPIC SYLLABLE GGWE +2DA0..2DA6 ; Lo # [7] ETHIOPIC SYLLABLE SSA..ETHIOPIC SYLLABLE SSO +2DA8..2DAE ; Lo # [7] ETHIOPIC SYLLABLE CCA..ETHIOPIC SYLLABLE CCO +2DB0..2DB6 ; Lo # [7] ETHIOPIC SYLLABLE ZZA..ETHIOPIC SYLLABLE ZZO +2DB8..2DBE ; Lo # [7] ETHIOPIC SYLLABLE CCHA..ETHIOPIC SYLLABLE CCHO +2DC0..2DC6 ; Lo # [7] ETHIOPIC SYLLABLE QYA..ETHIOPIC SYLLABLE QYO +2DC8..2DCE ; Lo # [7] ETHIOPIC SYLLABLE KYA..ETHIOPIC SYLLABLE KYO +2DD0..2DD6 ; Lo # [7] ETHIOPIC SYLLABLE XYA..ETHIOPIC SYLLABLE XYO +2DD8..2DDE ; Lo # [7] ETHIOPIC SYLLABLE GYA..ETHIOPIC SYLLABLE GYO +3006 ; Lo # IDEOGRAPHIC CLOSING MARK +303C ; Lo # MASU MARK +3041..3096 ; Lo # [86] HIRAGANA LETTER SMALL A..HIRAGANA LETTER SMALL KE +309F ; Lo # HIRAGANA DIGRAPH YORI +30A1..30FA ; Lo # [90] KATAKANA LETTER SMALL A..KATAKANA LETTER VO +30FF ; Lo # KATAKANA DIGRAPH KOTO +3105..312F ; Lo # [43] BOPOMOFO LETTER B..BOPOMOFO LETTER NN +3131..318E ; Lo # [94] HANGUL LETTER KIYEOK..HANGUL LETTER ARAEAE +31A0..31BF ; Lo # [32] BOPOMOFO LETTER BU..BOPOMOFO LETTER AH +31F0..31FF ; Lo # [16] KATAKANA LETTER SMALL KU..KATAKANA LETTER SMALL RO +3400..4DBF ; Lo # [6592] CJK UNIFIED IDEOGRAPH-3400..CJK UNIFIED IDEOGRAPH-4DBF +4E00..9FFC ; Lo # [20989] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FFC +A000..A014 ; Lo # [21] YI SYLLABLE IT..YI SYLLABLE E +A016..A48C ; Lo # [1143] YI SYLLABLE BIT..YI SYLLABLE YYR +A4D0..A4F7 ; Lo # [40] LISU LETTER BA..LISU LETTER OE +A500..A60B ; Lo # [268] VAI SYLLABLE EE..VAI SYLLABLE NG +A610..A61F ; Lo # [16] VAI SYLLABLE NDOLE FA..VAI SYMBOL JONG +A62A..A62B ; Lo # [2] VAI SYLLABLE NDOLE MA..VAI SYLLABLE NDOLE DO +A66E ; Lo # CYRILLIC LETTER MULTIOCULAR O +A6A0..A6E5 ; Lo # [70] BAMUM LETTER A..BAMUM LETTER KI +A78F ; Lo # LATIN LETTER SINOLOGICAL DOT +A7F7 ; Lo # LATIN EPIGRAPHIC LETTER SIDEWAYS I +A7FB..A801 ; Lo # [7] LATIN EPIGRAPHIC LETTER REVERSED F..SYLOTI NAGRI LETTER I +A803..A805 ; Lo # [3] SYLOTI NAGRI LETTER U..SYLOTI NAGRI LETTER O +A807..A80A ; Lo # [4] SYLOTI NAGRI LETTER KO..SYLOTI NAGRI LETTER GHO +A80C..A822 ; Lo # [23] SYLOTI NAGRI LETTER CO..SYLOTI NAGRI LETTER HO +A840..A873 ; Lo # [52] PHAGS-PA LETTER KA..PHAGS-PA LETTER CANDRABINDU +A882..A8B3 ; Lo # [50] SAURASHTRA LETTER A..SAURASHTRA LETTER LLA +A8F2..A8F7 ; Lo # [6] DEVANAGARI SIGN SPACING CANDRABINDU..DEVANAGARI SIGN CANDRABINDU AVAGRAHA +A8FB ; Lo # DEVANAGARI HEADSTROKE +A8FD..A8FE ; Lo # [2] DEVANAGARI JAIN OM..DEVANAGARI LETTER AY +A90A..A925 ; Lo # [28] KAYAH LI LETTER KA..KAYAH LI LETTER OO +A930..A946 ; Lo # [23] REJANG LETTER KA..REJANG LETTER A +A960..A97C ; Lo # [29] HANGUL CHOSEONG TIKEUT-MIEUM..HANGUL CHOSEONG SSANGYEORINHIEUH +A984..A9B2 ; Lo # [47] JAVANESE LETTER A..JAVANESE LETTER HA +A9E0..A9E4 ; Lo # [5] MYANMAR LETTER SHAN GHA..MYANMAR LETTER SHAN BHA +A9E7..A9EF ; Lo # [9] MYANMAR LETTER TAI LAING NYA..MYANMAR LETTER TAI LAING NNA +A9FA..A9FE ; Lo # [5] MYANMAR LETTER TAI LAING LLA..MYANMAR LETTER TAI LAING BHA +AA00..AA28 ; Lo # [41] CHAM LETTER A..CHAM LETTER HA +AA40..AA42 ; Lo # [3] CHAM LETTER FINAL K..CHAM LETTER FINAL NG +AA44..AA4B ; Lo # [8] CHAM LETTER FINAL CH..CHAM LETTER FINAL SS +AA60..AA6F ; Lo # [16] MYANMAR LETTER KHAMTI GA..MYANMAR LETTER KHAMTI FA +AA71..AA76 ; Lo # [6] MYANMAR LETTER KHAMTI XA..MYANMAR LOGOGRAM KHAMTI HM +AA7A ; Lo # MYANMAR LETTER AITON RA +AA7E..AAAF ; Lo # [50] MYANMAR LETTER SHWE PALAUNG CHA..TAI VIET LETTER HIGH O +AAB1 ; Lo # TAI VIET VOWEL AA +AAB5..AAB6 ; Lo # [2] TAI VIET VOWEL E..TAI VIET VOWEL O +AAB9..AABD ; Lo # [5] TAI VIET VOWEL UEA..TAI VIET VOWEL AN +AAC0 ; Lo # TAI VIET TONE MAI NUENG +AAC2 ; Lo # TAI VIET TONE MAI SONG +AADB..AADC ; Lo # [2] TAI VIET SYMBOL KON..TAI VIET SYMBOL NUENG +AAE0..AAEA ; Lo # [11] MEETEI MAYEK LETTER E..MEETEI MAYEK LETTER SSA +AAF2 ; Lo # MEETEI MAYEK ANJI +AB01..AB06 ; Lo # [6] ETHIOPIC SYLLABLE TTHU..ETHIOPIC SYLLABLE TTHO +AB09..AB0E ; Lo # [6] ETHIOPIC SYLLABLE DDHU..ETHIOPIC SYLLABLE DDHO +AB11..AB16 ; Lo # [6] ETHIOPIC SYLLABLE DZU..ETHIOPIC SYLLABLE DZO +AB20..AB26 ; Lo # [7] ETHIOPIC SYLLABLE CCHHA..ETHIOPIC SYLLABLE CCHHO +AB28..AB2E ; Lo # [7] ETHIOPIC SYLLABLE BBA..ETHIOPIC SYLLABLE BBO +ABC0..ABE2 ; Lo # [35] MEETEI MAYEK LETTER KOK..MEETEI MAYEK LETTER I LONSUM +AC00..D7A3 ; Lo # [11172] HANGUL SYLLABLE GA..HANGUL SYLLABLE HIH +D7B0..D7C6 ; Lo # [23] HANGUL JUNGSEONG O-YEO..HANGUL JUNGSEONG ARAEA-E +D7CB..D7FB ; Lo # [49] HANGUL JONGSEONG NIEUN-RIEUL..HANGUL JONGSEONG PHIEUPH-THIEUTH +F900..FA6D ; Lo # [366] CJK COMPATIBILITY IDEOGRAPH-F900..CJK COMPATIBILITY IDEOGRAPH-FA6D +FA70..FAD9 ; Lo # [106] CJK COMPATIBILITY IDEOGRAPH-FA70..CJK COMPATIBILITY IDEOGRAPH-FAD9 +FB1D ; Lo # HEBREW LETTER YOD WITH HIRIQ +FB1F..FB28 ; Lo # [10] HEBREW LIGATURE YIDDISH YOD YOD PATAH..HEBREW LETTER WIDE TAV +FB2A..FB36 ; Lo # [13] HEBREW LETTER SHIN WITH SHIN DOT..HEBREW LETTER ZAYIN WITH DAGESH +FB38..FB3C ; Lo # [5] HEBREW LETTER TET WITH DAGESH..HEBREW LETTER LAMED WITH DAGESH +FB3E ; Lo # HEBREW LETTER MEM WITH DAGESH +FB40..FB41 ; Lo # [2] HEBREW LETTER NUN WITH DAGESH..HEBREW LETTER SAMEKH WITH DAGESH +FB43..FB44 ; Lo # [2] HEBREW LETTER FINAL PE WITH DAGESH..HEBREW LETTER PE WITH DAGESH +FB46..FBB1 ; Lo # [108] HEBREW LETTER TSADI WITH DAGESH..ARABIC LETTER YEH BARREE WITH HAMZA ABOVE FINAL FORM +FBD3..FD3D ; Lo # [363] ARABIC LETTER NG ISOLATED FORM..ARABIC LIGATURE ALEF WITH FATHATAN ISOLATED FORM +FD50..FD8F ; Lo # [64] ARABIC LIGATURE TEH WITH JEEM WITH MEEM INITIAL FORM..ARABIC LIGATURE MEEM WITH KHAH WITH MEEM INITIAL FORM +FD92..FDC7 ; Lo # [54] ARABIC LIGATURE MEEM WITH JEEM WITH KHAH INITIAL FORM..ARABIC LIGATURE NOON WITH JEEM WITH YEH FINAL FORM +FDF0..FDFB ; Lo # [12] ARABIC LIGATURE SALLA USED AS KORANIC STOP SIGN ISOLATED FORM..ARABIC LIGATURE JALLAJALALOUHOU +FE70..FE74 ; Lo # [5] ARABIC FATHATAN ISOLATED FORM..ARABIC KASRATAN ISOLATED FORM +FE76..FEFC ; Lo # [135] ARABIC FATHA ISOLATED FORM..ARABIC LIGATURE LAM WITH ALEF FINAL FORM +FF66..FF6F ; Lo # [10] HALFWIDTH KATAKANA LETTER WO..HALFWIDTH KATAKANA LETTER SMALL TU +FF71..FF9D ; Lo # [45] HALFWIDTH KATAKANA LETTER A..HALFWIDTH KATAKANA LETTER N +FFA0..FFBE ; Lo # [31] HALFWIDTH HANGUL FILLER..HALFWIDTH HANGUL LETTER HIEUH +FFC2..FFC7 ; Lo # [6] HALFWIDTH HANGUL LETTER A..HALFWIDTH HANGUL LETTER E +FFCA..FFCF ; Lo # [6] HALFWIDTH HANGUL LETTER YEO..HALFWIDTH HANGUL LETTER OE +FFD2..FFD7 ; Lo # [6] HALFWIDTH HANGUL LETTER YO..HALFWIDTH HANGUL LETTER YU +FFDA..FFDC ; Lo # [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL LETTER I +10000..1000B ; Lo # [12] LINEAR B SYLLABLE B008 A..LINEAR B SYLLABLE B046 JE +1000D..10026 ; Lo # [26] LINEAR B SYLLABLE B036 JO..LINEAR B SYLLABLE B032 QO +10028..1003A ; Lo # [19] LINEAR B SYLLABLE B060 RA..LINEAR B SYLLABLE B042 WO +1003C..1003D ; Lo # [2] LINEAR B SYLLABLE B017 ZA..LINEAR B SYLLABLE B074 ZE +1003F..1004D ; Lo # [15] LINEAR B SYLLABLE B020 ZO..LINEAR B SYLLABLE B091 TWO +10050..1005D ; Lo # [14] LINEAR B SYMBOL B018..LINEAR B SYMBOL B089 +10080..100FA ; Lo # [123] LINEAR B IDEOGRAM B100 MAN..LINEAR B IDEOGRAM VESSEL B305 +10280..1029C ; Lo # [29] LYCIAN LETTER A..LYCIAN LETTER X +102A0..102D0 ; Lo # [49] CARIAN LETTER A..CARIAN LETTER UUU3 +10300..1031F ; Lo # [32] OLD ITALIC LETTER A..OLD ITALIC LETTER ESS +1032D..10340 ; Lo # [20] OLD ITALIC LETTER YE..GOTHIC LETTER PAIRTHRA +10342..10349 ; Lo # [8] GOTHIC LETTER RAIDA..GOTHIC LETTER OTHAL +10350..10375 ; Lo # [38] OLD PERMIC LETTER AN..OLD PERMIC LETTER IA +10380..1039D ; Lo # [30] UGARITIC LETTER ALPA..UGARITIC LETTER SSU +103A0..103C3 ; Lo # [36] OLD PERSIAN SIGN A..OLD PERSIAN SIGN HA +103C8..103CF ; Lo # [8] OLD PERSIAN SIGN AURAMAZDAA..OLD PERSIAN SIGN BUUMISH +10450..1049D ; Lo # [78] SHAVIAN LETTER PEEP..OSMANYA LETTER OO +10500..10527 ; Lo # [40] ELBASAN LETTER A..ELBASAN LETTER KHE +10530..10563 ; Lo # [52] CAUCASIAN ALBANIAN LETTER ALT..CAUCASIAN ALBANIAN LETTER KIW +10600..10736 ; Lo # [311] LINEAR A SIGN AB001..LINEAR A SIGN A664 +10740..10755 ; Lo # [22] LINEAR A SIGN A701 A..LINEAR A SIGN A732 JE +10760..10767 ; Lo # [8] LINEAR A SIGN A800..LINEAR A SIGN A807 +10800..10805 ; Lo # [6] CYPRIOT SYLLABLE A..CYPRIOT SYLLABLE JA +10808 ; Lo # CYPRIOT SYLLABLE JO +1080A..10835 ; Lo # [44] CYPRIOT SYLLABLE KA..CYPRIOT SYLLABLE WO +10837..10838 ; Lo # [2] CYPRIOT SYLLABLE XA..CYPRIOT SYLLABLE XE +1083C ; Lo # CYPRIOT SYLLABLE ZA +1083F..10855 ; Lo # [23] CYPRIOT SYLLABLE ZO..IMPERIAL ARAMAIC LETTER TAW +10860..10876 ; Lo # [23] PALMYRENE LETTER ALEPH..PALMYRENE LETTER TAW +10880..1089E ; Lo # [31] NABATAEAN LETTER FINAL ALEPH..NABATAEAN LETTER TAW +108E0..108F2 ; Lo # [19] HATRAN LETTER ALEPH..HATRAN LETTER QOPH +108F4..108F5 ; Lo # [2] HATRAN LETTER SHIN..HATRAN LETTER TAW +10900..10915 ; Lo # [22] PHOENICIAN LETTER ALF..PHOENICIAN LETTER TAU +10920..10939 ; Lo # [26] LYDIAN LETTER A..LYDIAN LETTER C +10980..109B7 ; Lo # [56] MEROITIC HIEROGLYPHIC LETTER A..MEROITIC CURSIVE LETTER DA +109BE..109BF ; Lo # [2] MEROITIC CURSIVE LOGOGRAM RMT..MEROITIC CURSIVE LOGOGRAM IMN +10A00 ; Lo # KHAROSHTHI LETTER A +10A10..10A13 ; Lo # [4] KHAROSHTHI LETTER KA..KHAROSHTHI LETTER GHA +10A15..10A17 ; Lo # [3] KHAROSHTHI LETTER CA..KHAROSHTHI LETTER JA +10A19..10A35 ; Lo # [29] KHAROSHTHI LETTER NYA..KHAROSHTHI LETTER VHA +10A60..10A7C ; Lo # [29] OLD SOUTH ARABIAN LETTER HE..OLD SOUTH ARABIAN LETTER THETH +10A80..10A9C ; Lo # [29] OLD NORTH ARABIAN LETTER HEH..OLD NORTH ARABIAN LETTER ZAH +10AC0..10AC7 ; Lo # [8] MANICHAEAN LETTER ALEPH..MANICHAEAN LETTER WAW +10AC9..10AE4 ; Lo # [28] MANICHAEAN LETTER ZAYIN..MANICHAEAN LETTER TAW +10B00..10B35 ; Lo # [54] AVESTAN LETTER A..AVESTAN LETTER HE +10B40..10B55 ; Lo # [22] INSCRIPTIONAL PARTHIAN LETTER ALEPH..INSCRIPTIONAL PARTHIAN LETTER TAW +10B60..10B72 ; Lo # [19] INSCRIPTIONAL PAHLAVI LETTER ALEPH..INSCRIPTIONAL PAHLAVI LETTER TAW +10B80..10B91 ; Lo # [18] PSALTER PAHLAVI LETTER ALEPH..PSALTER PAHLAVI LETTER TAW +10C00..10C48 ; Lo # [73] OLD TURKIC LETTER ORKHON A..OLD TURKIC LETTER ORKHON BASH +10D00..10D23 ; Lo # [36] HANIFI ROHINGYA LETTER A..HANIFI ROHINGYA MARK NA KHONNA +10E80..10EA9 ; Lo # [42] YEZIDI LETTER ELIF..YEZIDI LETTER ET +10EB0..10EB1 ; Lo # [2] YEZIDI LETTER LAM WITH DOT ABOVE..YEZIDI LETTER YOT WITH CIRCUMFLEX ABOVE +10F00..10F1C ; Lo # [29] OLD SOGDIAN LETTER ALEPH..OLD SOGDIAN LETTER FINAL TAW WITH VERTICAL TAIL +10F27 ; Lo # OLD SOGDIAN LIGATURE AYIN-DALETH +10F30..10F45 ; Lo # [22] SOGDIAN LETTER ALEPH..SOGDIAN INDEPENDENT SHIN +10FB0..10FC4 ; Lo # [21] CHORASMIAN LETTER ALEPH..CHORASMIAN LETTER TAW +10FE0..10FF6 ; Lo # [23] ELYMAIC LETTER ALEPH..ELYMAIC LIGATURE ZAYIN-YODH +11003..11037 ; Lo # [53] BRAHMI SIGN JIHVAMULIYA..BRAHMI LETTER OLD TAMIL NNNA +11083..110AF ; Lo # [45] KAITHI LETTER A..KAITHI LETTER HA +110D0..110E8 ; Lo # [25] SORA SOMPENG LETTER SAH..SORA SOMPENG LETTER MAE +11103..11126 ; Lo # [36] CHAKMA LETTER AA..CHAKMA LETTER HAA +11144 ; Lo # CHAKMA LETTER LHAA +11147 ; Lo # CHAKMA LETTER VAA +11150..11172 ; Lo # [35] MAHAJANI LETTER A..MAHAJANI LETTER RRA +11176 ; Lo # MAHAJANI LIGATURE SHRI +11183..111B2 ; Lo # [48] SHARADA LETTER A..SHARADA LETTER HA +111C1..111C4 ; Lo # [4] SHARADA SIGN AVAGRAHA..SHARADA OM +111DA ; Lo # SHARADA EKAM +111DC ; Lo # SHARADA HEADSTROKE +11200..11211 ; Lo # [18] KHOJKI LETTER A..KHOJKI LETTER JJA +11213..1122B ; Lo # [25] KHOJKI LETTER NYA..KHOJKI LETTER LLA +11280..11286 ; Lo # [7] MULTANI LETTER A..MULTANI LETTER GA +11288 ; Lo # MULTANI LETTER GHA +1128A..1128D ; Lo # [4] MULTANI LETTER CA..MULTANI LETTER JJA +1128F..1129D ; Lo # [15] MULTANI LETTER NYA..MULTANI LETTER BA +1129F..112A8 ; Lo # [10] MULTANI LETTER BHA..MULTANI LETTER RHA +112B0..112DE ; Lo # [47] KHUDAWADI LETTER A..KHUDAWADI LETTER HA +11305..1130C ; Lo # [8] GRANTHA LETTER A..GRANTHA LETTER VOCALIC L +1130F..11310 ; Lo # [2] GRANTHA LETTER EE..GRANTHA LETTER AI +11313..11328 ; Lo # [22] GRANTHA LETTER OO..GRANTHA LETTER NA +1132A..11330 ; Lo # [7] GRANTHA LETTER PA..GRANTHA LETTER RA +11332..11333 ; Lo # [2] GRANTHA LETTER LA..GRANTHA LETTER LLA +11335..11339 ; Lo # [5] GRANTHA LETTER VA..GRANTHA LETTER HA +1133D ; Lo # GRANTHA SIGN AVAGRAHA +11350 ; Lo # GRANTHA OM +1135D..11361 ; Lo # [5] GRANTHA SIGN PLUTA..GRANTHA LETTER VOCALIC LL +11400..11434 ; Lo # [53] NEWA LETTER A..NEWA LETTER HA +11447..1144A ; Lo # [4] NEWA SIGN AVAGRAHA..NEWA SIDDHI +1145F..11461 ; Lo # [3] NEWA LETTER VEDIC ANUSVARA..NEWA SIGN UPADHMANIYA +11480..114AF ; Lo # [48] TIRHUTA ANJI..TIRHUTA LETTER HA +114C4..114C5 ; Lo # [2] TIRHUTA SIGN AVAGRAHA..TIRHUTA GVANG +114C7 ; Lo # TIRHUTA OM +11580..115AE ; Lo # [47] SIDDHAM LETTER A..SIDDHAM LETTER HA +115D8..115DB ; Lo # [4] SIDDHAM LETTER THREE-CIRCLE ALTERNATE I..SIDDHAM LETTER ALTERNATE U +11600..1162F ; Lo # [48] MODI LETTER A..MODI LETTER LLA +11644 ; Lo # MODI SIGN HUVA +11680..116AA ; Lo # [43] TAKRI LETTER A..TAKRI LETTER RRA +116B8 ; Lo # TAKRI LETTER ARCHAIC KHA +11700..1171A ; Lo # [27] AHOM LETTER KA..AHOM LETTER ALTERNATE BA +11800..1182B ; Lo # [44] DOGRA LETTER A..DOGRA LETTER RRA +118FF..11906 ; Lo # [8] WARANG CITI OM..DIVES AKURU LETTER E +11909 ; Lo # DIVES AKURU LETTER O +1190C..11913 ; Lo # [8] DIVES AKURU LETTER KA..DIVES AKURU LETTER JA +11915..11916 ; Lo # [2] DIVES AKURU LETTER NYA..DIVES AKURU LETTER TTA +11918..1192F ; Lo # [24] DIVES AKURU LETTER DDA..DIVES AKURU LETTER ZA +1193F ; Lo # DIVES AKURU PREFIXED NASAL SIGN +11941 ; Lo # DIVES AKURU INITIAL RA +119A0..119A7 ; Lo # [8] NANDINAGARI LETTER A..NANDINAGARI LETTER VOCALIC RR +119AA..119D0 ; Lo # [39] NANDINAGARI LETTER E..NANDINAGARI LETTER RRA +119E1 ; Lo # NANDINAGARI SIGN AVAGRAHA +119E3 ; Lo # NANDINAGARI HEADSTROKE +11A00 ; Lo # ZANABAZAR SQUARE LETTER A +11A0B..11A32 ; Lo # [40] ZANABAZAR SQUARE LETTER KA..ZANABAZAR SQUARE LETTER KSSA +11A3A ; Lo # ZANABAZAR SQUARE CLUSTER-INITIAL LETTER RA +11A50 ; Lo # SOYOMBO LETTER A +11A5C..11A89 ; Lo # [46] SOYOMBO LETTER KA..SOYOMBO CLUSTER-INITIAL LETTER SA +11A9D ; Lo # SOYOMBO MARK PLUTA +11AC0..11AF8 ; Lo # [57] PAU CIN HAU LETTER PA..PAU CIN HAU GLOTTAL STOP FINAL +11C00..11C08 ; Lo # [9] BHAIKSUKI LETTER A..BHAIKSUKI LETTER VOCALIC L +11C0A..11C2E ; Lo # [37] BHAIKSUKI LETTER E..BHAIKSUKI LETTER HA +11C40 ; Lo # BHAIKSUKI SIGN AVAGRAHA +11C72..11C8F ; Lo # [30] MARCHEN LETTER KA..MARCHEN LETTER A +11D00..11D06 ; Lo # [7] MASARAM GONDI LETTER A..MASARAM GONDI LETTER E +11D08..11D09 ; Lo # [2] MASARAM GONDI LETTER AI..MASARAM GONDI LETTER O +11D0B..11D30 ; Lo # [38] MASARAM GONDI LETTER AU..MASARAM GONDI LETTER TRA +11D46 ; Lo # MASARAM GONDI REPHA +11D60..11D65 ; Lo # [6] GUNJALA GONDI LETTER A..GUNJALA GONDI LETTER UU +11D67..11D68 ; Lo # [2] GUNJALA GONDI LETTER EE..GUNJALA GONDI LETTER AI +11D6A..11D89 ; Lo # [32] GUNJALA GONDI LETTER OO..GUNJALA GONDI LETTER SA +11D98 ; Lo # GUNJALA GONDI OM +11EE0..11EF2 ; Lo # [19] MAKASAR LETTER KA..MAKASAR ANGKA +11FB0 ; Lo # LISU LETTER YHA +12000..12399 ; Lo # [922] CUNEIFORM SIGN A..CUNEIFORM SIGN U U +12480..12543 ; Lo # [196] CUNEIFORM SIGN AB TIMES NUN TENU..CUNEIFORM SIGN ZU5 TIMES THREE DISH TENU +13000..1342E ; Lo # [1071] EGYPTIAN HIEROGLYPH A001..EGYPTIAN HIEROGLYPH AA032 +14400..14646 ; Lo # [583] ANATOLIAN HIEROGLYPH A001..ANATOLIAN HIEROGLYPH A530 +16800..16A38 ; Lo # [569] BAMUM LETTER PHASE-A NGKUE MFON..BAMUM LETTER PHASE-F VUEQ +16A40..16A5E ; Lo # [31] MRO LETTER TA..MRO LETTER TEK +16AD0..16AED ; Lo # [30] BASSA VAH LETTER ENNI..BASSA VAH LETTER I +16B00..16B2F ; Lo # [48] PAHAWH HMONG VOWEL KEEB..PAHAWH HMONG CONSONANT CAU +16B63..16B77 ; Lo # [21] PAHAWH HMONG SIGN VOS LUB..PAHAWH HMONG SIGN CIM NRES TOS +16B7D..16B8F ; Lo # [19] PAHAWH HMONG CLAN SIGN TSHEEJ..PAHAWH HMONG CLAN SIGN VWJ +16F00..16F4A ; Lo # [75] MIAO LETTER PA..MIAO LETTER RTE +16F50 ; Lo # MIAO LETTER NASALIZATION +17000..187F7 ; Lo # [6136] TANGUT IDEOGRAPH-17000..TANGUT IDEOGRAPH-187F7 +18800..18CD5 ; Lo # [1238] TANGUT COMPONENT-001..KHITAN SMALL SCRIPT CHARACTER-18CD5 +18D00..18D08 ; Lo # [9] TANGUT IDEOGRAPH-18D00..TANGUT IDEOGRAPH-18D08 +1B000..1B11E ; Lo # [287] KATAKANA LETTER ARCHAIC E..HENTAIGANA LETTER N-MU-MO-2 +1B150..1B152 ; Lo # [3] HIRAGANA LETTER SMALL WI..HIRAGANA LETTER SMALL WO +1B164..1B167 ; Lo # [4] KATAKANA LETTER SMALL WI..KATAKANA LETTER SMALL N +1B170..1B2FB ; Lo # [396] NUSHU CHARACTER-1B170..NUSHU CHARACTER-1B2FB +1BC00..1BC6A ; Lo # [107] DUPLOYAN LETTER H..DUPLOYAN LETTER VOCALIC M +1BC70..1BC7C ; Lo # [13] DUPLOYAN AFFIX LEFT HORIZONTAL SECANT..DUPLOYAN AFFIX ATTACHED TANGENT HOOK +1BC80..1BC88 ; Lo # [9] DUPLOYAN AFFIX HIGH ACUTE..DUPLOYAN AFFIX HIGH VERTICAL +1BC90..1BC99 ; Lo # [10] DUPLOYAN AFFIX LOW ACUTE..DUPLOYAN AFFIX LOW ARROW +1E100..1E12C ; Lo # [45] NYIAKENG PUACHUE HMONG LETTER MA..NYIAKENG PUACHUE HMONG LETTER W +1E14E ; Lo # NYIAKENG PUACHUE HMONG LOGOGRAM NYAJ +1E2C0..1E2EB ; Lo # [44] WANCHO LETTER AA..WANCHO LETTER YIH +1E800..1E8C4 ; Lo # [197] MENDE KIKAKUI SYLLABLE M001 KI..MENDE KIKAKUI SYLLABLE M060 NYON +1EE00..1EE03 ; Lo # [4] ARABIC MATHEMATICAL ALEF..ARABIC MATHEMATICAL DAL +1EE05..1EE1F ; Lo # [27] ARABIC MATHEMATICAL WAW..ARABIC MATHEMATICAL DOTLESS QAF +1EE21..1EE22 ; Lo # [2] ARABIC MATHEMATICAL INITIAL BEH..ARABIC MATHEMATICAL INITIAL JEEM +1EE24 ; Lo # ARABIC MATHEMATICAL INITIAL HEH +1EE27 ; Lo # ARABIC MATHEMATICAL INITIAL HAH +1EE29..1EE32 ; Lo # [10] ARABIC MATHEMATICAL INITIAL YEH..ARABIC MATHEMATICAL INITIAL QAF +1EE34..1EE37 ; Lo # [4] ARABIC MATHEMATICAL INITIAL SHEEN..ARABIC MATHEMATICAL INITIAL KHAH +1EE39 ; Lo # ARABIC MATHEMATICAL INITIAL DAD +1EE3B ; Lo # ARABIC MATHEMATICAL INITIAL GHAIN +1EE42 ; Lo # ARABIC MATHEMATICAL TAILED JEEM +1EE47 ; Lo # ARABIC MATHEMATICAL TAILED HAH +1EE49 ; Lo # ARABIC MATHEMATICAL TAILED YEH +1EE4B ; Lo # ARABIC MATHEMATICAL TAILED LAM +1EE4D..1EE4F ; Lo # [3] ARABIC MATHEMATICAL TAILED NOON..ARABIC MATHEMATICAL TAILED AIN +1EE51..1EE52 ; Lo # [2] ARABIC MATHEMATICAL TAILED SAD..ARABIC MATHEMATICAL TAILED QAF +1EE54 ; Lo # ARABIC MATHEMATICAL TAILED SHEEN +1EE57 ; Lo # ARABIC MATHEMATICAL TAILED KHAH +1EE59 ; Lo # ARABIC MATHEMATICAL TAILED DAD +1EE5B ; Lo # ARABIC MATHEMATICAL TAILED GHAIN +1EE5D ; Lo # ARABIC MATHEMATICAL TAILED DOTLESS NOON +1EE5F ; Lo # ARABIC MATHEMATICAL TAILED DOTLESS QAF +1EE61..1EE62 ; Lo # [2] ARABIC MATHEMATICAL STRETCHED BEH..ARABIC MATHEMATICAL STRETCHED JEEM +1EE64 ; Lo # ARABIC MATHEMATICAL STRETCHED HEH +1EE67..1EE6A ; Lo # [4] ARABIC MATHEMATICAL STRETCHED HAH..ARABIC MATHEMATICAL STRETCHED KAF +1EE6C..1EE72 ; Lo # [7] ARABIC MATHEMATICAL STRETCHED MEEM..ARABIC MATHEMATICAL STRETCHED QAF +1EE74..1EE77 ; Lo # [4] ARABIC MATHEMATICAL STRETCHED SHEEN..ARABIC MATHEMATICAL STRETCHED KHAH +1EE79..1EE7C ; Lo # [4] ARABIC MATHEMATICAL STRETCHED DAD..ARABIC MATHEMATICAL STRETCHED DOTLESS BEH +1EE7E ; Lo # ARABIC MATHEMATICAL STRETCHED DOTLESS FEH +1EE80..1EE89 ; Lo # [10] ARABIC MATHEMATICAL LOOPED ALEF..ARABIC MATHEMATICAL LOOPED YEH +1EE8B..1EE9B ; Lo # [17] ARABIC MATHEMATICAL LOOPED LAM..ARABIC MATHEMATICAL LOOPED GHAIN +1EEA1..1EEA3 ; Lo # [3] ARABIC MATHEMATICAL DOUBLE-STRUCK BEH..ARABIC MATHEMATICAL DOUBLE-STRUCK DAL +1EEA5..1EEA9 ; Lo # [5] ARABIC MATHEMATICAL DOUBLE-STRUCK WAW..ARABIC MATHEMATICAL DOUBLE-STRUCK YEH +1EEAB..1EEBB ; Lo # [17] ARABIC MATHEMATICAL DOUBLE-STRUCK LAM..ARABIC MATHEMATICAL DOUBLE-STRUCK GHAIN +20000..2A6DD ; Lo # [42718] CJK UNIFIED IDEOGRAPH-20000..CJK UNIFIED IDEOGRAPH-2A6DD +2A700..2B734 ; Lo # [4149] CJK UNIFIED IDEOGRAPH-2A700..CJK UNIFIED IDEOGRAPH-2B734 +2B740..2B81D ; Lo # [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D +2B820..2CEA1 ; Lo # [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1 +2CEB0..2EBE0 ; Lo # [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0 +2F800..2FA1D ; Lo # [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D +30000..3134A ; Lo # [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A + +# Total code points: 127004 + +# ================================================ + +# General_Category=Nonspacing_Mark + +0300..036F ; Mn # [112] COMBINING GRAVE ACCENT..COMBINING LATIN SMALL LETTER X +0483..0487 ; Mn # [5] COMBINING CYRILLIC TITLO..COMBINING CYRILLIC POKRYTIE +0591..05BD ; Mn # [45] HEBREW ACCENT ETNAHTA..HEBREW POINT METEG +05BF ; Mn # HEBREW POINT RAFE +05C1..05C2 ; Mn # [2] HEBREW POINT SHIN DOT..HEBREW POINT SIN DOT +05C4..05C5 ; Mn # [2] HEBREW MARK UPPER DOT..HEBREW MARK LOWER DOT +05C7 ; Mn # HEBREW POINT QAMATS QATAN +0610..061A ; Mn # [11] ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM..ARABIC SMALL KASRA +064B..065F ; Mn # [21] ARABIC FATHATAN..ARABIC WAVY HAMZA BELOW +0670 ; Mn # ARABIC LETTER SUPERSCRIPT ALEF +06D6..06DC ; Mn # [7] ARABIC SMALL HIGH LIGATURE SAD WITH LAM WITH ALEF MAKSURA..ARABIC SMALL HIGH SEEN +06DF..06E4 ; Mn # [6] ARABIC SMALL HIGH ROUNDED ZERO..ARABIC SMALL HIGH MADDA +06E7..06E8 ; Mn # [2] ARABIC SMALL HIGH YEH..ARABIC SMALL HIGH NOON +06EA..06ED ; Mn # [4] ARABIC EMPTY CENTRE LOW STOP..ARABIC SMALL LOW MEEM +0711 ; Mn # SYRIAC LETTER SUPERSCRIPT ALAPH +0730..074A ; Mn # [27] SYRIAC PTHAHA ABOVE..SYRIAC BARREKH +07A6..07B0 ; Mn # [11] THAANA ABAFILI..THAANA SUKUN +07EB..07F3 ; Mn # [9] NKO COMBINING SHORT HIGH TONE..NKO COMBINING DOUBLE DOT ABOVE +07FD ; Mn # NKO DANTAYALAN +0816..0819 ; Mn # [4] SAMARITAN MARK IN..SAMARITAN MARK DAGESH +081B..0823 ; Mn # [9] SAMARITAN MARK EPENTHETIC YUT..SAMARITAN VOWEL SIGN A +0825..0827 ; Mn # [3] SAMARITAN VOWEL SIGN SHORT A..SAMARITAN VOWEL SIGN U +0829..082D ; Mn # [5] SAMARITAN VOWEL SIGN LONG I..SAMARITAN MARK NEQUDAA +0859..085B ; Mn # [3] MANDAIC AFFRICATION MARK..MANDAIC GEMINATION MARK +08D3..08E1 ; Mn # [15] ARABIC SMALL LOW WAW..ARABIC SMALL HIGH SIGN SAFHA +08E3..0902 ; Mn # [32] ARABIC TURNED DAMMA BELOW..DEVANAGARI SIGN ANUSVARA +093A ; Mn # DEVANAGARI VOWEL SIGN OE +093C ; Mn # DEVANAGARI SIGN NUKTA +0941..0948 ; Mn # [8] DEVANAGARI VOWEL SIGN U..DEVANAGARI VOWEL SIGN AI +094D ; Mn # DEVANAGARI SIGN VIRAMA +0951..0957 ; Mn # [7] DEVANAGARI STRESS SIGN UDATTA..DEVANAGARI VOWEL SIGN UUE +0962..0963 ; Mn # [2] DEVANAGARI VOWEL SIGN VOCALIC L..DEVANAGARI VOWEL SIGN VOCALIC LL +0981 ; Mn # BENGALI SIGN CANDRABINDU +09BC ; Mn # BENGALI SIGN NUKTA +09C1..09C4 ; Mn # [4] BENGALI VOWEL SIGN U..BENGALI VOWEL SIGN VOCALIC RR +09CD ; Mn # BENGALI SIGN VIRAMA +09E2..09E3 ; Mn # [2] BENGALI VOWEL SIGN VOCALIC L..BENGALI VOWEL SIGN VOCALIC LL +09FE ; Mn # BENGALI SANDHI MARK +0A01..0A02 ; Mn # [2] GURMUKHI SIGN ADAK BINDI..GURMUKHI SIGN BINDI +0A3C ; Mn # GURMUKHI SIGN NUKTA +0A41..0A42 ; Mn # [2] GURMUKHI VOWEL SIGN U..GURMUKHI VOWEL SIGN UU +0A47..0A48 ; Mn # [2] GURMUKHI VOWEL SIGN EE..GURMUKHI VOWEL SIGN AI +0A4B..0A4D ; Mn # [3] GURMUKHI VOWEL SIGN OO..GURMUKHI SIGN VIRAMA +0A51 ; Mn # GURMUKHI SIGN UDAAT +0A70..0A71 ; Mn # [2] GURMUKHI TIPPI..GURMUKHI ADDAK +0A75 ; Mn # GURMUKHI SIGN YAKASH +0A81..0A82 ; Mn # [2] GUJARATI SIGN CANDRABINDU..GUJARATI SIGN ANUSVARA +0ABC ; Mn # GUJARATI SIGN NUKTA +0AC1..0AC5 ; Mn # [5] GUJARATI VOWEL SIGN U..GUJARATI VOWEL SIGN CANDRA E +0AC7..0AC8 ; Mn # [2] GUJARATI VOWEL SIGN E..GUJARATI VOWEL SIGN AI +0ACD ; Mn # GUJARATI SIGN VIRAMA +0AE2..0AE3 ; Mn # [2] GUJARATI VOWEL SIGN VOCALIC L..GUJARATI VOWEL SIGN VOCALIC LL +0AFA..0AFF ; Mn # [6] GUJARATI SIGN SUKUN..GUJARATI SIGN TWO-CIRCLE NUKTA ABOVE +0B01 ; Mn # ORIYA SIGN CANDRABINDU +0B3C ; Mn # ORIYA SIGN NUKTA +0B3F ; Mn # ORIYA VOWEL SIGN I +0B41..0B44 ; Mn # [4] ORIYA VOWEL SIGN U..ORIYA VOWEL SIGN VOCALIC RR +0B4D ; Mn # ORIYA SIGN VIRAMA +0B55..0B56 ; Mn # [2] ORIYA SIGN OVERLINE..ORIYA AI LENGTH MARK +0B62..0B63 ; Mn # [2] ORIYA VOWEL SIGN VOCALIC L..ORIYA VOWEL SIGN VOCALIC LL +0B82 ; Mn # TAMIL SIGN ANUSVARA +0BC0 ; Mn # TAMIL VOWEL SIGN II +0BCD ; Mn # TAMIL SIGN VIRAMA +0C00 ; Mn # TELUGU SIGN COMBINING CANDRABINDU ABOVE +0C04 ; Mn # TELUGU SIGN COMBINING ANUSVARA ABOVE +0C3E..0C40 ; Mn # [3] TELUGU VOWEL SIGN AA..TELUGU VOWEL SIGN II +0C46..0C48 ; Mn # [3] TELUGU VOWEL SIGN E..TELUGU VOWEL SIGN AI +0C4A..0C4D ; Mn # [4] TELUGU VOWEL SIGN O..TELUGU SIGN VIRAMA +0C55..0C56 ; Mn # [2] TELUGU LENGTH MARK..TELUGU AI LENGTH MARK +0C62..0C63 ; Mn # [2] TELUGU VOWEL SIGN VOCALIC L..TELUGU VOWEL SIGN VOCALIC LL +0C81 ; Mn # KANNADA SIGN CANDRABINDU +0CBC ; Mn # KANNADA SIGN NUKTA +0CBF ; Mn # KANNADA VOWEL SIGN I +0CC6 ; Mn # KANNADA VOWEL SIGN E +0CCC..0CCD ; Mn # [2] KANNADA VOWEL SIGN AU..KANNADA SIGN VIRAMA +0CE2..0CE3 ; Mn # [2] KANNADA VOWEL SIGN VOCALIC L..KANNADA VOWEL SIGN VOCALIC LL +0D00..0D01 ; Mn # [2] MALAYALAM SIGN COMBINING ANUSVARA ABOVE..MALAYALAM SIGN CANDRABINDU +0D3B..0D3C ; Mn # [2] MALAYALAM SIGN VERTICAL BAR VIRAMA..MALAYALAM SIGN CIRCULAR VIRAMA +0D41..0D44 ; Mn # [4] MALAYALAM VOWEL SIGN U..MALAYALAM VOWEL SIGN VOCALIC RR +0D4D ; Mn # MALAYALAM SIGN VIRAMA +0D62..0D63 ; Mn # [2] MALAYALAM VOWEL SIGN VOCALIC L..MALAYALAM VOWEL SIGN VOCALIC LL +0D81 ; Mn # SINHALA SIGN CANDRABINDU +0DCA ; Mn # SINHALA SIGN AL-LAKUNA +0DD2..0DD4 ; Mn # [3] SINHALA VOWEL SIGN KETTI IS-PILLA..SINHALA VOWEL SIGN KETTI PAA-PILLA +0DD6 ; Mn # SINHALA VOWEL SIGN DIGA PAA-PILLA +0E31 ; Mn # THAI CHARACTER MAI HAN-AKAT +0E34..0E3A ; Mn # [7] THAI CHARACTER SARA I..THAI CHARACTER PHINTHU +0E47..0E4E ; Mn # [8] THAI CHARACTER MAITAIKHU..THAI CHARACTER YAMAKKAN +0EB1 ; Mn # LAO VOWEL SIGN MAI KAN +0EB4..0EBC ; Mn # [9] LAO VOWEL SIGN I..LAO SEMIVOWEL SIGN LO +0EC8..0ECD ; Mn # [6] LAO TONE MAI EK..LAO NIGGAHITA +0F18..0F19 ; Mn # [2] TIBETAN ASTROLOGICAL SIGN -KHYUD PA..TIBETAN ASTROLOGICAL SIGN SDONG TSHUGS +0F35 ; Mn # TIBETAN MARK NGAS BZUNG NYI ZLA +0F37 ; Mn # TIBETAN MARK NGAS BZUNG SGOR RTAGS +0F39 ; Mn # TIBETAN MARK TSA -PHRU +0F71..0F7E ; Mn # [14] TIBETAN VOWEL SIGN AA..TIBETAN SIGN RJES SU NGA RO +0F80..0F84 ; Mn # [5] TIBETAN VOWEL SIGN REVERSED I..TIBETAN MARK HALANTA +0F86..0F87 ; Mn # [2] TIBETAN SIGN LCI RTAGS..TIBETAN SIGN YANG RTAGS +0F8D..0F97 ; Mn # [11] TIBETAN SUBJOINED SIGN LCE TSA CAN..TIBETAN SUBJOINED LETTER JA +0F99..0FBC ; Mn # [36] TIBETAN SUBJOINED LETTER NYA..TIBETAN SUBJOINED LETTER FIXED-FORM RA +0FC6 ; Mn # TIBETAN SYMBOL PADMA GDAN +102D..1030 ; Mn # [4] MYANMAR VOWEL SIGN I..MYANMAR VOWEL SIGN UU +1032..1037 ; Mn # [6] MYANMAR VOWEL SIGN AI..MYANMAR SIGN DOT BELOW +1039..103A ; Mn # [2] MYANMAR SIGN VIRAMA..MYANMAR SIGN ASAT +103D..103E ; Mn # [2] MYANMAR CONSONANT SIGN MEDIAL WA..MYANMAR CONSONANT SIGN MEDIAL HA +1058..1059 ; Mn # [2] MYANMAR VOWEL SIGN VOCALIC L..MYANMAR VOWEL SIGN VOCALIC LL +105E..1060 ; Mn # [3] MYANMAR CONSONANT SIGN MON MEDIAL NA..MYANMAR CONSONANT SIGN MON MEDIAL LA +1071..1074 ; Mn # [4] MYANMAR VOWEL SIGN GEBA KAREN I..MYANMAR VOWEL SIGN KAYAH EE +1082 ; Mn # MYANMAR CONSONANT SIGN SHAN MEDIAL WA +1085..1086 ; Mn # [2] MYANMAR VOWEL SIGN SHAN E ABOVE..MYANMAR VOWEL SIGN SHAN FINAL Y +108D ; Mn # MYANMAR SIGN SHAN COUNCIL EMPHATIC TONE +109D ; Mn # MYANMAR VOWEL SIGN AITON AI +135D..135F ; Mn # [3] ETHIOPIC COMBINING GEMINATION AND VOWEL LENGTH MARK..ETHIOPIC COMBINING GEMINATION MARK +1712..1714 ; Mn # [3] TAGALOG VOWEL SIGN I..TAGALOG SIGN VIRAMA +1732..1734 ; Mn # [3] HANUNOO VOWEL SIGN I..HANUNOO SIGN PAMUDPOD +1752..1753 ; Mn # [2] BUHID VOWEL SIGN I..BUHID VOWEL SIGN U +1772..1773 ; Mn # [2] TAGBANWA VOWEL SIGN I..TAGBANWA VOWEL SIGN U +17B4..17B5 ; Mn # [2] KHMER VOWEL INHERENT AQ..KHMER VOWEL INHERENT AA +17B7..17BD ; Mn # [7] KHMER VOWEL SIGN I..KHMER VOWEL SIGN UA +17C6 ; Mn # KHMER SIGN NIKAHIT +17C9..17D3 ; Mn # [11] KHMER SIGN MUUSIKATOAN..KHMER SIGN BATHAMASAT +17DD ; Mn # KHMER SIGN ATTHACAN +180B..180D ; Mn # [3] MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE +1885..1886 ; Mn # [2] MONGOLIAN LETTER ALI GALI BALUDA..MONGOLIAN LETTER ALI GALI THREE BALUDA +18A9 ; Mn # MONGOLIAN LETTER ALI GALI DAGALGA +1920..1922 ; Mn # [3] LIMBU VOWEL SIGN A..LIMBU VOWEL SIGN U +1927..1928 ; Mn # [2] LIMBU VOWEL SIGN E..LIMBU VOWEL SIGN O +1932 ; Mn # LIMBU SMALL LETTER ANUSVARA +1939..193B ; Mn # [3] LIMBU SIGN MUKPHRENG..LIMBU SIGN SA-I +1A17..1A18 ; Mn # [2] BUGINESE VOWEL SIGN I..BUGINESE VOWEL SIGN U +1A1B ; Mn # BUGINESE VOWEL SIGN AE +1A56 ; Mn # TAI THAM CONSONANT SIGN MEDIAL LA +1A58..1A5E ; Mn # [7] TAI THAM SIGN MAI KANG LAI..TAI THAM CONSONANT SIGN SA +1A60 ; Mn # TAI THAM SIGN SAKOT +1A62 ; Mn # TAI THAM VOWEL SIGN MAI SAT +1A65..1A6C ; Mn # [8] TAI THAM VOWEL SIGN I..TAI THAM VOWEL SIGN OA BELOW +1A73..1A7C ; Mn # [10] TAI THAM VOWEL SIGN OA ABOVE..TAI THAM SIGN KHUEN-LUE KARAN +1A7F ; Mn # TAI THAM COMBINING CRYPTOGRAMMIC DOT +1AB0..1ABD ; Mn # [14] COMBINING DOUBLED CIRCUMFLEX ACCENT..COMBINING PARENTHESES BELOW +1ABF..1AC0 ; Mn # [2] COMBINING LATIN SMALL LETTER W BELOW..COMBINING LATIN SMALL LETTER TURNED W BELOW +1B00..1B03 ; Mn # [4] BALINESE SIGN ULU RICEM..BALINESE SIGN SURANG +1B34 ; Mn # BALINESE SIGN REREKAN +1B36..1B3A ; Mn # [5] BALINESE VOWEL SIGN ULU..BALINESE VOWEL SIGN RA REPA +1B3C ; Mn # BALINESE VOWEL SIGN LA LENGA +1B42 ; Mn # BALINESE VOWEL SIGN PEPET +1B6B..1B73 ; Mn # [9] BALINESE MUSICAL SYMBOL COMBINING TEGEH..BALINESE MUSICAL SYMBOL COMBINING GONG +1B80..1B81 ; Mn # [2] SUNDANESE SIGN PANYECEK..SUNDANESE SIGN PANGLAYAR +1BA2..1BA5 ; Mn # [4] SUNDANESE CONSONANT SIGN PANYAKRA..SUNDANESE VOWEL SIGN PANYUKU +1BA8..1BA9 ; Mn # [2] SUNDANESE VOWEL SIGN PAMEPET..SUNDANESE VOWEL SIGN PANEULEUNG +1BAB..1BAD ; Mn # [3] SUNDANESE SIGN VIRAMA..SUNDANESE CONSONANT SIGN PASANGAN WA +1BE6 ; Mn # BATAK SIGN TOMPI +1BE8..1BE9 ; Mn # [2] BATAK VOWEL SIGN PAKPAK E..BATAK VOWEL SIGN EE +1BED ; Mn # BATAK VOWEL SIGN KARO O +1BEF..1BF1 ; Mn # [3] BATAK VOWEL SIGN U FOR SIMALUNGUN SA..BATAK CONSONANT SIGN H +1C2C..1C33 ; Mn # [8] LEPCHA VOWEL SIGN E..LEPCHA CONSONANT SIGN T +1C36..1C37 ; Mn # [2] LEPCHA SIGN RAN..LEPCHA SIGN NUKTA +1CD0..1CD2 ; Mn # [3] VEDIC TONE KARSHANA..VEDIC TONE PRENKHA +1CD4..1CE0 ; Mn # [13] VEDIC SIGN YAJURVEDIC MIDLINE SVARITA..VEDIC TONE RIGVEDIC KASHMIRI INDEPENDENT SVARITA +1CE2..1CE8 ; Mn # [7] VEDIC SIGN VISARGA SVARITA..VEDIC SIGN VISARGA ANUDATTA WITH TAIL +1CED ; Mn # VEDIC SIGN TIRYAK +1CF4 ; Mn # VEDIC TONE CANDRA ABOVE +1CF8..1CF9 ; Mn # [2] VEDIC TONE RING ABOVE..VEDIC TONE DOUBLE RING ABOVE +1DC0..1DF9 ; Mn # [58] COMBINING DOTTED GRAVE ACCENT..COMBINING WIDE INVERTED BRIDGE BELOW +1DFB..1DFF ; Mn # [5] COMBINING DELETION MARK..COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW +20D0..20DC ; Mn # [13] COMBINING LEFT HARPOON ABOVE..COMBINING FOUR DOTS ABOVE +20E1 ; Mn # COMBINING LEFT RIGHT ARROW ABOVE +20E5..20F0 ; Mn # [12] COMBINING REVERSE SOLIDUS OVERLAY..COMBINING ASTERISK ABOVE +2CEF..2CF1 ; Mn # [3] COPTIC COMBINING NI ABOVE..COPTIC COMBINING SPIRITUS LENIS +2D7F ; Mn # TIFINAGH CONSONANT JOINER +2DE0..2DFF ; Mn # [32] COMBINING CYRILLIC LETTER BE..COMBINING CYRILLIC LETTER IOTIFIED BIG YUS +302A..302D ; Mn # [4] IDEOGRAPHIC LEVEL TONE MARK..IDEOGRAPHIC ENTERING TONE MARK +3099..309A ; Mn # [2] COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK..COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK +A66F ; Mn # COMBINING CYRILLIC VZMET +A674..A67D ; Mn # [10] COMBINING CYRILLIC LETTER UKRAINIAN IE..COMBINING CYRILLIC PAYEROK +A69E..A69F ; Mn # [2] COMBINING CYRILLIC LETTER EF..COMBINING CYRILLIC LETTER IOTIFIED E +A6F0..A6F1 ; Mn # [2] BAMUM COMBINING MARK KOQNDON..BAMUM COMBINING MARK TUKWENTIS +A802 ; Mn # SYLOTI NAGRI SIGN DVISVARA +A806 ; Mn # SYLOTI NAGRI SIGN HASANTA +A80B ; Mn # SYLOTI NAGRI SIGN ANUSVARA +A825..A826 ; Mn # [2] SYLOTI NAGRI VOWEL SIGN U..SYLOTI NAGRI VOWEL SIGN E +A82C ; Mn # SYLOTI NAGRI SIGN ALTERNATE HASANTA +A8C4..A8C5 ; Mn # [2] SAURASHTRA SIGN VIRAMA..SAURASHTRA SIGN CANDRABINDU +A8E0..A8F1 ; Mn # [18] COMBINING DEVANAGARI DIGIT ZERO..COMBINING DEVANAGARI SIGN AVAGRAHA +A8FF ; Mn # DEVANAGARI VOWEL SIGN AY +A926..A92D ; Mn # [8] KAYAH LI VOWEL UE..KAYAH LI TONE CALYA PLOPHU +A947..A951 ; Mn # [11] REJANG VOWEL SIGN I..REJANG CONSONANT SIGN R +A980..A982 ; Mn # [3] JAVANESE SIGN PANYANGGA..JAVANESE SIGN LAYAR +A9B3 ; Mn # JAVANESE SIGN CECAK TELU +A9B6..A9B9 ; Mn # [4] JAVANESE VOWEL SIGN WULU..JAVANESE VOWEL SIGN SUKU MENDUT +A9BC..A9BD ; Mn # [2] JAVANESE VOWEL SIGN PEPET..JAVANESE CONSONANT SIGN KERET +A9E5 ; Mn # MYANMAR SIGN SHAN SAW +AA29..AA2E ; Mn # [6] CHAM VOWEL SIGN AA..CHAM VOWEL SIGN OE +AA31..AA32 ; Mn # [2] CHAM VOWEL SIGN AU..CHAM VOWEL SIGN UE +AA35..AA36 ; Mn # [2] CHAM CONSONANT SIGN LA..CHAM CONSONANT SIGN WA +AA43 ; Mn # CHAM CONSONANT SIGN FINAL NG +AA4C ; Mn # CHAM CONSONANT SIGN FINAL M +AA7C ; Mn # MYANMAR SIGN TAI LAING TONE-2 +AAB0 ; Mn # TAI VIET MAI KANG +AAB2..AAB4 ; Mn # [3] TAI VIET VOWEL I..TAI VIET VOWEL U +AAB7..AAB8 ; Mn # [2] TAI VIET MAI KHIT..TAI VIET VOWEL IA +AABE..AABF ; Mn # [2] TAI VIET VOWEL AM..TAI VIET TONE MAI EK +AAC1 ; Mn # TAI VIET TONE MAI THO +AAEC..AAED ; Mn # [2] MEETEI MAYEK VOWEL SIGN UU..MEETEI MAYEK VOWEL SIGN AAI +AAF6 ; Mn # MEETEI MAYEK VIRAMA +ABE5 ; Mn # MEETEI MAYEK VOWEL SIGN ANAP +ABE8 ; Mn # MEETEI MAYEK VOWEL SIGN UNAP +ABED ; Mn # MEETEI MAYEK APUN IYEK +FB1E ; Mn # HEBREW POINT JUDEO-SPANISH VARIKA +FE00..FE0F ; Mn # [16] VARIATION SELECTOR-1..VARIATION SELECTOR-16 +FE20..FE2F ; Mn # [16] COMBINING LIGATURE LEFT HALF..COMBINING CYRILLIC TITLO RIGHT HALF +101FD ; Mn # PHAISTOS DISC SIGN COMBINING OBLIQUE STROKE +102E0 ; Mn # COPTIC EPACT THOUSANDS MARK +10376..1037A ; Mn # [5] COMBINING OLD PERMIC LETTER AN..COMBINING OLD PERMIC LETTER SII +10A01..10A03 ; Mn # [3] KHAROSHTHI VOWEL SIGN I..KHAROSHTHI VOWEL SIGN VOCALIC R +10A05..10A06 ; Mn # [2] KHAROSHTHI VOWEL SIGN E..KHAROSHTHI VOWEL SIGN O +10A0C..10A0F ; Mn # [4] KHAROSHTHI VOWEL LENGTH MARK..KHAROSHTHI SIGN VISARGA +10A38..10A3A ; Mn # [3] KHAROSHTHI SIGN BAR ABOVE..KHAROSHTHI SIGN DOT BELOW +10A3F ; Mn # KHAROSHTHI VIRAMA +10AE5..10AE6 ; Mn # [2] MANICHAEAN ABBREVIATION MARK ABOVE..MANICHAEAN ABBREVIATION MARK BELOW +10D24..10D27 ; Mn # [4] HANIFI ROHINGYA SIGN HARBAHAY..HANIFI ROHINGYA SIGN TASSI +10EAB..10EAC ; Mn # [2] YEZIDI COMBINING HAMZA MARK..YEZIDI COMBINING MADDA MARK +10F46..10F50 ; Mn # [11] SOGDIAN COMBINING DOT BELOW..SOGDIAN COMBINING STROKE BELOW +11001 ; Mn # BRAHMI SIGN ANUSVARA +11038..11046 ; Mn # [15] BRAHMI VOWEL SIGN AA..BRAHMI VIRAMA +1107F..11081 ; Mn # [3] BRAHMI NUMBER JOINER..KAITHI SIGN ANUSVARA +110B3..110B6 ; Mn # [4] KAITHI VOWEL SIGN U..KAITHI VOWEL SIGN AI +110B9..110BA ; Mn # [2] KAITHI SIGN VIRAMA..KAITHI SIGN NUKTA +11100..11102 ; Mn # [3] CHAKMA SIGN CANDRABINDU..CHAKMA SIGN VISARGA +11127..1112B ; Mn # [5] CHAKMA VOWEL SIGN A..CHAKMA VOWEL SIGN UU +1112D..11134 ; Mn # [8] CHAKMA VOWEL SIGN AI..CHAKMA MAAYYAA +11173 ; Mn # MAHAJANI SIGN NUKTA +11180..11181 ; Mn # [2] SHARADA SIGN CANDRABINDU..SHARADA SIGN ANUSVARA +111B6..111BE ; Mn # [9] SHARADA VOWEL SIGN U..SHARADA VOWEL SIGN O +111C9..111CC ; Mn # [4] SHARADA SANDHI MARK..SHARADA EXTRA SHORT VOWEL MARK +111CF ; Mn # SHARADA SIGN INVERTED CANDRABINDU +1122F..11231 ; Mn # [3] KHOJKI VOWEL SIGN U..KHOJKI VOWEL SIGN AI +11234 ; Mn # KHOJKI SIGN ANUSVARA +11236..11237 ; Mn # [2] KHOJKI SIGN NUKTA..KHOJKI SIGN SHADDA +1123E ; Mn # KHOJKI SIGN SUKUN +112DF ; Mn # KHUDAWADI SIGN ANUSVARA +112E3..112EA ; Mn # [8] KHUDAWADI VOWEL SIGN U..KHUDAWADI SIGN VIRAMA +11300..11301 ; Mn # [2] GRANTHA SIGN COMBINING ANUSVARA ABOVE..GRANTHA SIGN CANDRABINDU +1133B..1133C ; Mn # [2] COMBINING BINDU BELOW..GRANTHA SIGN NUKTA +11340 ; Mn # GRANTHA VOWEL SIGN II +11366..1136C ; Mn # [7] COMBINING GRANTHA DIGIT ZERO..COMBINING GRANTHA DIGIT SIX +11370..11374 ; Mn # [5] COMBINING GRANTHA LETTER A..COMBINING GRANTHA LETTER PA +11438..1143F ; Mn # [8] NEWA VOWEL SIGN U..NEWA VOWEL SIGN AI +11442..11444 ; Mn # [3] NEWA SIGN VIRAMA..NEWA SIGN ANUSVARA +11446 ; Mn # NEWA SIGN NUKTA +1145E ; Mn # NEWA SANDHI MARK +114B3..114B8 ; Mn # [6] TIRHUTA VOWEL SIGN U..TIRHUTA VOWEL SIGN VOCALIC LL +114BA ; Mn # TIRHUTA VOWEL SIGN SHORT E +114BF..114C0 ; Mn # [2] TIRHUTA SIGN CANDRABINDU..TIRHUTA SIGN ANUSVARA +114C2..114C3 ; Mn # [2] TIRHUTA SIGN VIRAMA..TIRHUTA SIGN NUKTA +115B2..115B5 ; Mn # [4] SIDDHAM VOWEL SIGN U..SIDDHAM VOWEL SIGN VOCALIC RR +115BC..115BD ; Mn # [2] SIDDHAM SIGN CANDRABINDU..SIDDHAM SIGN ANUSVARA +115BF..115C0 ; Mn # [2] SIDDHAM SIGN VIRAMA..SIDDHAM SIGN NUKTA +115DC..115DD ; Mn # [2] SIDDHAM VOWEL SIGN ALTERNATE U..SIDDHAM VOWEL SIGN ALTERNATE UU +11633..1163A ; Mn # [8] MODI VOWEL SIGN U..MODI VOWEL SIGN AI +1163D ; Mn # MODI SIGN ANUSVARA +1163F..11640 ; Mn # [2] MODI SIGN VIRAMA..MODI SIGN ARDHACANDRA +116AB ; Mn # TAKRI SIGN ANUSVARA +116AD ; Mn # TAKRI VOWEL SIGN AA +116B0..116B5 ; Mn # [6] TAKRI VOWEL SIGN U..TAKRI VOWEL SIGN AU +116B7 ; Mn # TAKRI SIGN NUKTA +1171D..1171F ; Mn # [3] AHOM CONSONANT SIGN MEDIAL LA..AHOM CONSONANT SIGN MEDIAL LIGATING RA +11722..11725 ; Mn # [4] AHOM VOWEL SIGN I..AHOM VOWEL SIGN UU +11727..1172B ; Mn # [5] AHOM VOWEL SIGN AW..AHOM SIGN KILLER +1182F..11837 ; Mn # [9] DOGRA VOWEL SIGN U..DOGRA SIGN ANUSVARA +11839..1183A ; Mn # [2] DOGRA SIGN VIRAMA..DOGRA SIGN NUKTA +1193B..1193C ; Mn # [2] DIVES AKURU SIGN ANUSVARA..DIVES AKURU SIGN CANDRABINDU +1193E ; Mn # DIVES AKURU VIRAMA +11943 ; Mn # DIVES AKURU SIGN NUKTA +119D4..119D7 ; Mn # [4] NANDINAGARI VOWEL SIGN U..NANDINAGARI VOWEL SIGN VOCALIC RR +119DA..119DB ; Mn # [2] NANDINAGARI VOWEL SIGN E..NANDINAGARI VOWEL SIGN AI +119E0 ; Mn # NANDINAGARI SIGN VIRAMA +11A01..11A0A ; Mn # [10] ZANABAZAR SQUARE VOWEL SIGN I..ZANABAZAR SQUARE VOWEL LENGTH MARK +11A33..11A38 ; Mn # [6] ZANABAZAR SQUARE FINAL CONSONANT MARK..ZANABAZAR SQUARE SIGN ANUSVARA +11A3B..11A3E ; Mn # [4] ZANABAZAR SQUARE CLUSTER-FINAL LETTER YA..ZANABAZAR SQUARE CLUSTER-FINAL LETTER VA +11A47 ; Mn # ZANABAZAR SQUARE SUBJOINER +11A51..11A56 ; Mn # [6] SOYOMBO VOWEL SIGN I..SOYOMBO VOWEL SIGN OE +11A59..11A5B ; Mn # [3] SOYOMBO VOWEL SIGN VOCALIC R..SOYOMBO VOWEL LENGTH MARK +11A8A..11A96 ; Mn # [13] SOYOMBO FINAL CONSONANT SIGN G..SOYOMBO SIGN ANUSVARA +11A98..11A99 ; Mn # [2] SOYOMBO GEMINATION MARK..SOYOMBO SUBJOINER +11C30..11C36 ; Mn # [7] BHAIKSUKI VOWEL SIGN I..BHAIKSUKI VOWEL SIGN VOCALIC L +11C38..11C3D ; Mn # [6] BHAIKSUKI VOWEL SIGN E..BHAIKSUKI SIGN ANUSVARA +11C3F ; Mn # BHAIKSUKI SIGN VIRAMA +11C92..11CA7 ; Mn # [22] MARCHEN SUBJOINED LETTER KA..MARCHEN SUBJOINED LETTER ZA +11CAA..11CB0 ; Mn # [7] MARCHEN SUBJOINED LETTER RA..MARCHEN VOWEL SIGN AA +11CB2..11CB3 ; Mn # [2] MARCHEN VOWEL SIGN U..MARCHEN VOWEL SIGN E +11CB5..11CB6 ; Mn # [2] MARCHEN SIGN ANUSVARA..MARCHEN SIGN CANDRABINDU +11D31..11D36 ; Mn # [6] MASARAM GONDI VOWEL SIGN AA..MASARAM GONDI VOWEL SIGN VOCALIC R +11D3A ; Mn # MASARAM GONDI VOWEL SIGN E +11D3C..11D3D ; Mn # [2] MASARAM GONDI VOWEL SIGN AI..MASARAM GONDI VOWEL SIGN O +11D3F..11D45 ; Mn # [7] MASARAM GONDI VOWEL SIGN AU..MASARAM GONDI VIRAMA +11D47 ; Mn # MASARAM GONDI RA-KARA +11D90..11D91 ; Mn # [2] GUNJALA GONDI VOWEL SIGN EE..GUNJALA GONDI VOWEL SIGN AI +11D95 ; Mn # GUNJALA GONDI SIGN ANUSVARA +11D97 ; Mn # GUNJALA GONDI VIRAMA +11EF3..11EF4 ; Mn # [2] MAKASAR VOWEL SIGN I..MAKASAR VOWEL SIGN U +16AF0..16AF4 ; Mn # [5] BASSA VAH COMBINING HIGH TONE..BASSA VAH COMBINING HIGH-LOW TONE +16B30..16B36 ; Mn # [7] PAHAWH HMONG MARK CIM TUB..PAHAWH HMONG MARK CIM TAUM +16F4F ; Mn # MIAO SIGN CONSONANT MODIFIER BAR +16F8F..16F92 ; Mn # [4] MIAO TONE RIGHT..MIAO TONE BELOW +16FE4 ; Mn # KHITAN SMALL SCRIPT FILLER +1BC9D..1BC9E ; Mn # [2] DUPLOYAN THICK LETTER SELECTOR..DUPLOYAN DOUBLE MARK +1D167..1D169 ; Mn # [3] MUSICAL SYMBOL COMBINING TREMOLO-1..MUSICAL SYMBOL COMBINING TREMOLO-3 +1D17B..1D182 ; Mn # [8] MUSICAL SYMBOL COMBINING ACCENT..MUSICAL SYMBOL COMBINING LOURE +1D185..1D18B ; Mn # [7] MUSICAL SYMBOL COMBINING DOIT..MUSICAL SYMBOL COMBINING TRIPLE TONGUE +1D1AA..1D1AD ; Mn # [4] MUSICAL SYMBOL COMBINING DOWN BOW..MUSICAL SYMBOL COMBINING SNAP PIZZICATO +1D242..1D244 ; Mn # [3] COMBINING GREEK MUSICAL TRISEME..COMBINING GREEK MUSICAL PENTASEME +1DA00..1DA36 ; Mn # [55] SIGNWRITING HEAD RIM..SIGNWRITING AIR SUCKING IN +1DA3B..1DA6C ; Mn # [50] SIGNWRITING MOUTH CLOSED NEUTRAL..SIGNWRITING EXCITEMENT +1DA75 ; Mn # SIGNWRITING UPPER BODY TILTING FROM HIP JOINTS +1DA84 ; Mn # SIGNWRITING LOCATION HEAD NECK +1DA9B..1DA9F ; Mn # [5] SIGNWRITING FILL MODIFIER-2..SIGNWRITING FILL MODIFIER-6 +1DAA1..1DAAF ; Mn # [15] SIGNWRITING ROTATION MODIFIER-2..SIGNWRITING ROTATION MODIFIER-16 +1E000..1E006 ; Mn # [7] COMBINING GLAGOLITIC LETTER AZU..COMBINING GLAGOLITIC LETTER ZHIVETE +1E008..1E018 ; Mn # [17] COMBINING GLAGOLITIC LETTER ZEMLJA..COMBINING GLAGOLITIC LETTER HERU +1E01B..1E021 ; Mn # [7] COMBINING GLAGOLITIC LETTER SHTA..COMBINING GLAGOLITIC LETTER YATI +1E023..1E024 ; Mn # [2] COMBINING GLAGOLITIC LETTER YU..COMBINING GLAGOLITIC LETTER SMALL YUS +1E026..1E02A ; Mn # [5] COMBINING GLAGOLITIC LETTER YO..COMBINING GLAGOLITIC LETTER FITA +1E130..1E136 ; Mn # [7] NYIAKENG PUACHUE HMONG TONE-B..NYIAKENG PUACHUE HMONG TONE-D +1E2EC..1E2EF ; Mn # [4] WANCHO TONE TUP..WANCHO TONE KOINI +1E8D0..1E8D6 ; Mn # [7] MENDE KIKAKUI COMBINING NUMBER TEENS..MENDE KIKAKUI COMBINING NUMBER MILLIONS +1E944..1E94A ; Mn # [7] ADLAM ALIF LENGTHENER..ADLAM NUKTA +E0100..E01EF ; Mn # [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256 + +# Total code points: 1839 + +# ================================================ + +# General_Category=Enclosing_Mark + +0488..0489 ; Me # [2] COMBINING CYRILLIC HUNDRED THOUSANDS SIGN..COMBINING CYRILLIC MILLIONS SIGN +1ABE ; Me # COMBINING PARENTHESES OVERLAY +20DD..20E0 ; Me # [4] COMBINING ENCLOSING CIRCLE..COMBINING ENCLOSING CIRCLE BACKSLASH +20E2..20E4 ; Me # [3] COMBINING ENCLOSING SCREEN..COMBINING ENCLOSING UPWARD POINTING TRIANGLE +A670..A672 ; Me # [3] COMBINING CYRILLIC TEN MILLIONS SIGN..COMBINING CYRILLIC THOUSAND MILLIONS SIGN + +# Total code points: 13 + +# ================================================ + +# General_Category=Spacing_Mark + +0903 ; Mc # DEVANAGARI SIGN VISARGA +093B ; Mc # DEVANAGARI VOWEL SIGN OOE +093E..0940 ; Mc # [3] DEVANAGARI VOWEL SIGN AA..DEVANAGARI VOWEL SIGN II +0949..094C ; Mc # [4] DEVANAGARI VOWEL SIGN CANDRA O..DEVANAGARI VOWEL SIGN AU +094E..094F ; Mc # [2] DEVANAGARI VOWEL SIGN PRISHTHAMATRA E..DEVANAGARI VOWEL SIGN AW +0982..0983 ; Mc # [2] BENGALI SIGN ANUSVARA..BENGALI SIGN VISARGA +09BE..09C0 ; Mc # [3] BENGALI VOWEL SIGN AA..BENGALI VOWEL SIGN II +09C7..09C8 ; Mc # [2] BENGALI VOWEL SIGN E..BENGALI VOWEL SIGN AI +09CB..09CC ; Mc # [2] BENGALI VOWEL SIGN O..BENGALI VOWEL SIGN AU +09D7 ; Mc # BENGALI AU LENGTH MARK +0A03 ; Mc # GURMUKHI SIGN VISARGA +0A3E..0A40 ; Mc # [3] GURMUKHI VOWEL SIGN AA..GURMUKHI VOWEL SIGN II +0A83 ; Mc # GUJARATI SIGN VISARGA +0ABE..0AC0 ; Mc # [3] GUJARATI VOWEL SIGN AA..GUJARATI VOWEL SIGN II +0AC9 ; Mc # GUJARATI VOWEL SIGN CANDRA O +0ACB..0ACC ; Mc # [2] GUJARATI VOWEL SIGN O..GUJARATI VOWEL SIGN AU +0B02..0B03 ; Mc # [2] ORIYA SIGN ANUSVARA..ORIYA SIGN VISARGA +0B3E ; Mc # ORIYA VOWEL SIGN AA +0B40 ; Mc # ORIYA VOWEL SIGN II +0B47..0B48 ; Mc # [2] ORIYA VOWEL SIGN E..ORIYA VOWEL SIGN AI +0B4B..0B4C ; Mc # [2] ORIYA VOWEL SIGN O..ORIYA VOWEL SIGN AU +0B57 ; Mc # ORIYA AU LENGTH MARK +0BBE..0BBF ; Mc # [2] TAMIL VOWEL SIGN AA..TAMIL VOWEL SIGN I +0BC1..0BC2 ; Mc # [2] TAMIL VOWEL SIGN U..TAMIL VOWEL SIGN UU +0BC6..0BC8 ; Mc # [3] TAMIL VOWEL SIGN E..TAMIL VOWEL SIGN AI +0BCA..0BCC ; Mc # [3] TAMIL VOWEL SIGN O..TAMIL VOWEL SIGN AU +0BD7 ; Mc # TAMIL AU LENGTH MARK +0C01..0C03 ; Mc # [3] TELUGU SIGN CANDRABINDU..TELUGU SIGN VISARGA +0C41..0C44 ; Mc # [4] TELUGU VOWEL SIGN U..TELUGU VOWEL SIGN VOCALIC RR +0C82..0C83 ; Mc # [2] KANNADA SIGN ANUSVARA..KANNADA SIGN VISARGA +0CBE ; Mc # KANNADA VOWEL SIGN AA +0CC0..0CC4 ; Mc # [5] KANNADA VOWEL SIGN II..KANNADA VOWEL SIGN VOCALIC RR +0CC7..0CC8 ; Mc # [2] KANNADA VOWEL SIGN EE..KANNADA VOWEL SIGN AI +0CCA..0CCB ; Mc # [2] KANNADA VOWEL SIGN O..KANNADA VOWEL SIGN OO +0CD5..0CD6 ; Mc # [2] KANNADA LENGTH MARK..KANNADA AI LENGTH MARK +0D02..0D03 ; Mc # [2] MALAYALAM SIGN ANUSVARA..MALAYALAM SIGN VISARGA +0D3E..0D40 ; Mc # [3] MALAYALAM VOWEL SIGN AA..MALAYALAM VOWEL SIGN II +0D46..0D48 ; Mc # [3] MALAYALAM VOWEL SIGN E..MALAYALAM VOWEL SIGN AI +0D4A..0D4C ; Mc # [3] MALAYALAM VOWEL SIGN O..MALAYALAM VOWEL SIGN AU +0D57 ; Mc # MALAYALAM AU LENGTH MARK +0D82..0D83 ; Mc # [2] SINHALA SIGN ANUSVARAYA..SINHALA SIGN VISARGAYA +0DCF..0DD1 ; Mc # [3] SINHALA VOWEL SIGN AELA-PILLA..SINHALA VOWEL SIGN DIGA AEDA-PILLA +0DD8..0DDF ; Mc # [8] SINHALA VOWEL SIGN GAETTA-PILLA..SINHALA VOWEL SIGN GAYANUKITTA +0DF2..0DF3 ; Mc # [2] SINHALA VOWEL SIGN DIGA GAETTA-PILLA..SINHALA VOWEL SIGN DIGA GAYANUKITTA +0F3E..0F3F ; Mc # [2] TIBETAN SIGN YAR TSHES..TIBETAN SIGN MAR TSHES +0F7F ; Mc # TIBETAN SIGN RNAM BCAD +102B..102C ; Mc # [2] MYANMAR VOWEL SIGN TALL AA..MYANMAR VOWEL SIGN AA +1031 ; Mc # MYANMAR VOWEL SIGN E +1038 ; Mc # MYANMAR SIGN VISARGA +103B..103C ; Mc # [2] MYANMAR CONSONANT SIGN MEDIAL YA..MYANMAR CONSONANT SIGN MEDIAL RA +1056..1057 ; Mc # [2] MYANMAR VOWEL SIGN VOCALIC R..MYANMAR VOWEL SIGN VOCALIC RR +1062..1064 ; Mc # [3] MYANMAR VOWEL SIGN SGAW KAREN EU..MYANMAR TONE MARK SGAW KAREN KE PHO +1067..106D ; Mc # [7] MYANMAR VOWEL SIGN WESTERN PWO KAREN EU..MYANMAR SIGN WESTERN PWO KAREN TONE-5 +1083..1084 ; Mc # [2] MYANMAR VOWEL SIGN SHAN AA..MYANMAR VOWEL SIGN SHAN E +1087..108C ; Mc # [6] MYANMAR SIGN SHAN TONE-2..MYANMAR SIGN SHAN COUNCIL TONE-3 +108F ; Mc # MYANMAR SIGN RUMAI PALAUNG TONE-5 +109A..109C ; Mc # [3] MYANMAR SIGN KHAMTI TONE-1..MYANMAR VOWEL SIGN AITON A +17B6 ; Mc # KHMER VOWEL SIGN AA +17BE..17C5 ; Mc # [8] KHMER VOWEL SIGN OE..KHMER VOWEL SIGN AU +17C7..17C8 ; Mc # [2] KHMER SIGN REAHMUK..KHMER SIGN YUUKALEAPINTU +1923..1926 ; Mc # [4] LIMBU VOWEL SIGN EE..LIMBU VOWEL SIGN AU +1929..192B ; Mc # [3] LIMBU SUBJOINED LETTER YA..LIMBU SUBJOINED LETTER WA +1930..1931 ; Mc # [2] LIMBU SMALL LETTER KA..LIMBU SMALL LETTER NGA +1933..1938 ; Mc # [6] LIMBU SMALL LETTER TA..LIMBU SMALL LETTER LA +1A19..1A1A ; Mc # [2] BUGINESE VOWEL SIGN E..BUGINESE VOWEL SIGN O +1A55 ; Mc # TAI THAM CONSONANT SIGN MEDIAL RA +1A57 ; Mc # TAI THAM CONSONANT SIGN LA TANG LAI +1A61 ; Mc # TAI THAM VOWEL SIGN A +1A63..1A64 ; Mc # [2] TAI THAM VOWEL SIGN AA..TAI THAM VOWEL SIGN TALL AA +1A6D..1A72 ; Mc # [6] TAI THAM VOWEL SIGN OY..TAI THAM VOWEL SIGN THAM AI +1B04 ; Mc # BALINESE SIGN BISAH +1B35 ; Mc # BALINESE VOWEL SIGN TEDUNG +1B3B ; Mc # BALINESE VOWEL SIGN RA REPA TEDUNG +1B3D..1B41 ; Mc # [5] BALINESE VOWEL SIGN LA LENGA TEDUNG..BALINESE VOWEL SIGN TALING REPA TEDUNG +1B43..1B44 ; Mc # [2] BALINESE VOWEL SIGN PEPET TEDUNG..BALINESE ADEG ADEG +1B82 ; Mc # SUNDANESE SIGN PANGWISAD +1BA1 ; Mc # SUNDANESE CONSONANT SIGN PAMINGKAL +1BA6..1BA7 ; Mc # [2] SUNDANESE VOWEL SIGN PANAELAENG..SUNDANESE VOWEL SIGN PANOLONG +1BAA ; Mc # SUNDANESE SIGN PAMAAEH +1BE7 ; Mc # BATAK VOWEL SIGN E +1BEA..1BEC ; Mc # [3] BATAK VOWEL SIGN I..BATAK VOWEL SIGN O +1BEE ; Mc # BATAK VOWEL SIGN U +1BF2..1BF3 ; Mc # [2] BATAK PANGOLAT..BATAK PANONGONAN +1C24..1C2B ; Mc # [8] LEPCHA SUBJOINED LETTER YA..LEPCHA VOWEL SIGN UU +1C34..1C35 ; Mc # [2] LEPCHA CONSONANT SIGN NYIN-DO..LEPCHA CONSONANT SIGN KANG +1CE1 ; Mc # VEDIC TONE ATHARVAVEDIC INDEPENDENT SVARITA +1CF7 ; Mc # VEDIC SIGN ATIKRAMA +302E..302F ; Mc # [2] HANGUL SINGLE DOT TONE MARK..HANGUL DOUBLE DOT TONE MARK +A823..A824 ; Mc # [2] SYLOTI NAGRI VOWEL SIGN A..SYLOTI NAGRI VOWEL SIGN I +A827 ; Mc # SYLOTI NAGRI VOWEL SIGN OO +A880..A881 ; Mc # [2] SAURASHTRA SIGN ANUSVARA..SAURASHTRA SIGN VISARGA +A8B4..A8C3 ; Mc # [16] SAURASHTRA CONSONANT SIGN HAARU..SAURASHTRA VOWEL SIGN AU +A952..A953 ; Mc # [2] REJANG CONSONANT SIGN H..REJANG VIRAMA +A983 ; Mc # JAVANESE SIGN WIGNYAN +A9B4..A9B5 ; Mc # [2] JAVANESE VOWEL SIGN TARUNG..JAVANESE VOWEL SIGN TOLONG +A9BA..A9BB ; Mc # [2] JAVANESE VOWEL SIGN TALING..JAVANESE VOWEL SIGN DIRGA MURE +A9BE..A9C0 ; Mc # [3] JAVANESE CONSONANT SIGN PENGKAL..JAVANESE PANGKON +AA2F..AA30 ; Mc # [2] CHAM VOWEL SIGN O..CHAM VOWEL SIGN AI +AA33..AA34 ; Mc # [2] CHAM CONSONANT SIGN YA..CHAM CONSONANT SIGN RA +AA4D ; Mc # CHAM CONSONANT SIGN FINAL H +AA7B ; Mc # MYANMAR SIGN PAO KAREN TONE +AA7D ; Mc # MYANMAR SIGN TAI LAING TONE-5 +AAEB ; Mc # MEETEI MAYEK VOWEL SIGN II +AAEE..AAEF ; Mc # [2] MEETEI MAYEK VOWEL SIGN AU..MEETEI MAYEK VOWEL SIGN AAU +AAF5 ; Mc # MEETEI MAYEK VOWEL SIGN VISARGA +ABE3..ABE4 ; Mc # [2] MEETEI MAYEK VOWEL SIGN ONAP..MEETEI MAYEK VOWEL SIGN INAP +ABE6..ABE7 ; Mc # [2] MEETEI MAYEK VOWEL SIGN YENAP..MEETEI MAYEK VOWEL SIGN SOUNAP +ABE9..ABEA ; Mc # [2] MEETEI MAYEK VOWEL SIGN CHEINAP..MEETEI MAYEK VOWEL SIGN NUNG +ABEC ; Mc # MEETEI MAYEK LUM IYEK +11000 ; Mc # BRAHMI SIGN CANDRABINDU +11002 ; Mc # BRAHMI SIGN VISARGA +11082 ; Mc # KAITHI SIGN VISARGA +110B0..110B2 ; Mc # [3] KAITHI VOWEL SIGN AA..KAITHI VOWEL SIGN II +110B7..110B8 ; Mc # [2] KAITHI VOWEL SIGN O..KAITHI VOWEL SIGN AU +1112C ; Mc # CHAKMA VOWEL SIGN E +11145..11146 ; Mc # [2] CHAKMA VOWEL SIGN AA..CHAKMA VOWEL SIGN EI +11182 ; Mc # SHARADA SIGN VISARGA +111B3..111B5 ; Mc # [3] SHARADA VOWEL SIGN AA..SHARADA VOWEL SIGN II +111BF..111C0 ; Mc # [2] SHARADA VOWEL SIGN AU..SHARADA SIGN VIRAMA +111CE ; Mc # SHARADA VOWEL SIGN PRISHTHAMATRA E +1122C..1122E ; Mc # [3] KHOJKI VOWEL SIGN AA..KHOJKI VOWEL SIGN II +11232..11233 ; Mc # [2] KHOJKI VOWEL SIGN O..KHOJKI VOWEL SIGN AU +11235 ; Mc # KHOJKI SIGN VIRAMA +112E0..112E2 ; Mc # [3] KHUDAWADI VOWEL SIGN AA..KHUDAWADI VOWEL SIGN II +11302..11303 ; Mc # [2] GRANTHA SIGN ANUSVARA..GRANTHA SIGN VISARGA +1133E..1133F ; Mc # [2] GRANTHA VOWEL SIGN AA..GRANTHA VOWEL SIGN I +11341..11344 ; Mc # [4] GRANTHA VOWEL SIGN U..GRANTHA VOWEL SIGN VOCALIC RR +11347..11348 ; Mc # [2] GRANTHA VOWEL SIGN EE..GRANTHA VOWEL SIGN AI +1134B..1134D ; Mc # [3] GRANTHA VOWEL SIGN OO..GRANTHA SIGN VIRAMA +11357 ; Mc # GRANTHA AU LENGTH MARK +11362..11363 ; Mc # [2] GRANTHA VOWEL SIGN VOCALIC L..GRANTHA VOWEL SIGN VOCALIC LL +11435..11437 ; Mc # [3] NEWA VOWEL SIGN AA..NEWA VOWEL SIGN II +11440..11441 ; Mc # [2] NEWA VOWEL SIGN O..NEWA VOWEL SIGN AU +11445 ; Mc # NEWA SIGN VISARGA +114B0..114B2 ; Mc # [3] TIRHUTA VOWEL SIGN AA..TIRHUTA VOWEL SIGN II +114B9 ; Mc # TIRHUTA VOWEL SIGN E +114BB..114BE ; Mc # [4] TIRHUTA VOWEL SIGN AI..TIRHUTA VOWEL SIGN AU +114C1 ; Mc # TIRHUTA SIGN VISARGA +115AF..115B1 ; Mc # [3] SIDDHAM VOWEL SIGN AA..SIDDHAM VOWEL SIGN II +115B8..115BB ; Mc # [4] SIDDHAM VOWEL SIGN E..SIDDHAM VOWEL SIGN AU +115BE ; Mc # SIDDHAM SIGN VISARGA +11630..11632 ; Mc # [3] MODI VOWEL SIGN AA..MODI VOWEL SIGN II +1163B..1163C ; Mc # [2] MODI VOWEL SIGN O..MODI VOWEL SIGN AU +1163E ; Mc # MODI SIGN VISARGA +116AC ; Mc # TAKRI SIGN VISARGA +116AE..116AF ; Mc # [2] TAKRI VOWEL SIGN I..TAKRI VOWEL SIGN II +116B6 ; Mc # TAKRI SIGN VIRAMA +11720..11721 ; Mc # [2] AHOM VOWEL SIGN A..AHOM VOWEL SIGN AA +11726 ; Mc # AHOM VOWEL SIGN E +1182C..1182E ; Mc # [3] DOGRA VOWEL SIGN AA..DOGRA VOWEL SIGN II +11838 ; Mc # DOGRA SIGN VISARGA +11930..11935 ; Mc # [6] DIVES AKURU VOWEL SIGN AA..DIVES AKURU VOWEL SIGN E +11937..11938 ; Mc # [2] DIVES AKURU VOWEL SIGN AI..DIVES AKURU VOWEL SIGN O +1193D ; Mc # DIVES AKURU SIGN HALANTA +11940 ; Mc # DIVES AKURU MEDIAL YA +11942 ; Mc # DIVES AKURU MEDIAL RA +119D1..119D3 ; Mc # [3] NANDINAGARI VOWEL SIGN AA..NANDINAGARI VOWEL SIGN II +119DC..119DF ; Mc # [4] NANDINAGARI VOWEL SIGN O..NANDINAGARI SIGN VISARGA +119E4 ; Mc # NANDINAGARI VOWEL SIGN PRISHTHAMATRA E +11A39 ; Mc # ZANABAZAR SQUARE SIGN VISARGA +11A57..11A58 ; Mc # [2] SOYOMBO VOWEL SIGN AI..SOYOMBO VOWEL SIGN AU +11A97 ; Mc # SOYOMBO SIGN VISARGA +11C2F ; Mc # BHAIKSUKI VOWEL SIGN AA +11C3E ; Mc # BHAIKSUKI SIGN VISARGA +11CA9 ; Mc # MARCHEN SUBJOINED LETTER YA +11CB1 ; Mc # MARCHEN VOWEL SIGN I +11CB4 ; Mc # MARCHEN VOWEL SIGN O +11D8A..11D8E ; Mc # [5] GUNJALA GONDI VOWEL SIGN AA..GUNJALA GONDI VOWEL SIGN UU +11D93..11D94 ; Mc # [2] GUNJALA GONDI VOWEL SIGN OO..GUNJALA GONDI VOWEL SIGN AU +11D96 ; Mc # GUNJALA GONDI SIGN VISARGA +11EF5..11EF6 ; Mc # [2] MAKASAR VOWEL SIGN E..MAKASAR VOWEL SIGN O +16F51..16F87 ; Mc # [55] MIAO SIGN ASPIRATION..MIAO VOWEL SIGN UI +16FF0..16FF1 ; Mc # [2] VIETNAMESE ALTERNATE READING MARK CA..VIETNAMESE ALTERNATE READING MARK NHAY +1D165..1D166 ; Mc # [2] MUSICAL SYMBOL COMBINING STEM..MUSICAL SYMBOL COMBINING SPRECHGESANG STEM +1D16D..1D172 ; Mc # [6] MUSICAL SYMBOL COMBINING AUGMENTATION DOT..MUSICAL SYMBOL COMBINING FLAG-5 + +# Total code points: 443 + +# ================================================ + +# General_Category=Decimal_Number + +0030..0039 ; Nd # [10] DIGIT ZERO..DIGIT NINE +0660..0669 ; Nd # [10] ARABIC-INDIC DIGIT ZERO..ARABIC-INDIC DIGIT NINE +06F0..06F9 ; Nd # [10] EXTENDED ARABIC-INDIC DIGIT ZERO..EXTENDED ARABIC-INDIC DIGIT NINE +07C0..07C9 ; Nd # [10] NKO DIGIT ZERO..NKO DIGIT NINE +0966..096F ; Nd # [10] DEVANAGARI DIGIT ZERO..DEVANAGARI DIGIT NINE +09E6..09EF ; Nd # [10] BENGALI DIGIT ZERO..BENGALI DIGIT NINE +0A66..0A6F ; Nd # [10] GURMUKHI DIGIT ZERO..GURMUKHI DIGIT NINE +0AE6..0AEF ; Nd # [10] GUJARATI DIGIT ZERO..GUJARATI DIGIT NINE +0B66..0B6F ; Nd # [10] ORIYA DIGIT ZERO..ORIYA DIGIT NINE +0BE6..0BEF ; Nd # [10] TAMIL DIGIT ZERO..TAMIL DIGIT NINE +0C66..0C6F ; Nd # [10] TELUGU DIGIT ZERO..TELUGU DIGIT NINE +0CE6..0CEF ; Nd # [10] KANNADA DIGIT ZERO..KANNADA DIGIT NINE +0D66..0D6F ; Nd # [10] MALAYALAM DIGIT ZERO..MALAYALAM DIGIT NINE +0DE6..0DEF ; Nd # [10] SINHALA LITH DIGIT ZERO..SINHALA LITH DIGIT NINE +0E50..0E59 ; Nd # [10] THAI DIGIT ZERO..THAI DIGIT NINE +0ED0..0ED9 ; Nd # [10] LAO DIGIT ZERO..LAO DIGIT NINE +0F20..0F29 ; Nd # [10] TIBETAN DIGIT ZERO..TIBETAN DIGIT NINE +1040..1049 ; Nd # [10] MYANMAR DIGIT ZERO..MYANMAR DIGIT NINE +1090..1099 ; Nd # [10] MYANMAR SHAN DIGIT ZERO..MYANMAR SHAN DIGIT NINE +17E0..17E9 ; Nd # [10] KHMER DIGIT ZERO..KHMER DIGIT NINE +1810..1819 ; Nd # [10] MONGOLIAN DIGIT ZERO..MONGOLIAN DIGIT NINE +1946..194F ; Nd # [10] LIMBU DIGIT ZERO..LIMBU DIGIT NINE +19D0..19D9 ; Nd # [10] NEW TAI LUE DIGIT ZERO..NEW TAI LUE DIGIT NINE +1A80..1A89 ; Nd # [10] TAI THAM HORA DIGIT ZERO..TAI THAM HORA DIGIT NINE +1A90..1A99 ; Nd # [10] TAI THAM THAM DIGIT ZERO..TAI THAM THAM DIGIT NINE +1B50..1B59 ; Nd # [10] BALINESE DIGIT ZERO..BALINESE DIGIT NINE +1BB0..1BB9 ; Nd # [10] SUNDANESE DIGIT ZERO..SUNDANESE DIGIT NINE +1C40..1C49 ; Nd # [10] LEPCHA DIGIT ZERO..LEPCHA DIGIT NINE +1C50..1C59 ; Nd # [10] OL CHIKI DIGIT ZERO..OL CHIKI DIGIT NINE +A620..A629 ; Nd # [10] VAI DIGIT ZERO..VAI DIGIT NINE +A8D0..A8D9 ; Nd # [10] SAURASHTRA DIGIT ZERO..SAURASHTRA DIGIT NINE +A900..A909 ; Nd # [10] KAYAH LI DIGIT ZERO..KAYAH LI DIGIT NINE +A9D0..A9D9 ; Nd # [10] JAVANESE DIGIT ZERO..JAVANESE DIGIT NINE +A9F0..A9F9 ; Nd # [10] MYANMAR TAI LAING DIGIT ZERO..MYANMAR TAI LAING DIGIT NINE +AA50..AA59 ; Nd # [10] CHAM DIGIT ZERO..CHAM DIGIT NINE +ABF0..ABF9 ; Nd # [10] MEETEI MAYEK DIGIT ZERO..MEETEI MAYEK DIGIT NINE +FF10..FF19 ; Nd # [10] FULLWIDTH DIGIT ZERO..FULLWIDTH DIGIT NINE +104A0..104A9 ; Nd # [10] OSMANYA DIGIT ZERO..OSMANYA DIGIT NINE +10D30..10D39 ; Nd # [10] HANIFI ROHINGYA DIGIT ZERO..HANIFI ROHINGYA DIGIT NINE +11066..1106F ; Nd # [10] BRAHMI DIGIT ZERO..BRAHMI DIGIT NINE +110F0..110F9 ; Nd # [10] SORA SOMPENG DIGIT ZERO..SORA SOMPENG DIGIT NINE +11136..1113F ; Nd # [10] CHAKMA DIGIT ZERO..CHAKMA DIGIT NINE +111D0..111D9 ; Nd # [10] SHARADA DIGIT ZERO..SHARADA DIGIT NINE +112F0..112F9 ; Nd # [10] KHUDAWADI DIGIT ZERO..KHUDAWADI DIGIT NINE +11450..11459 ; Nd # [10] NEWA DIGIT ZERO..NEWA DIGIT NINE +114D0..114D9 ; Nd # [10] TIRHUTA DIGIT ZERO..TIRHUTA DIGIT NINE +11650..11659 ; Nd # [10] MODI DIGIT ZERO..MODI DIGIT NINE +116C0..116C9 ; Nd # [10] TAKRI DIGIT ZERO..TAKRI DIGIT NINE +11730..11739 ; Nd # [10] AHOM DIGIT ZERO..AHOM DIGIT NINE +118E0..118E9 ; Nd # [10] WARANG CITI DIGIT ZERO..WARANG CITI DIGIT NINE +11950..11959 ; Nd # [10] DIVES AKURU DIGIT ZERO..DIVES AKURU DIGIT NINE +11C50..11C59 ; Nd # [10] BHAIKSUKI DIGIT ZERO..BHAIKSUKI DIGIT NINE +11D50..11D59 ; Nd # [10] MASARAM GONDI DIGIT ZERO..MASARAM GONDI DIGIT NINE +11DA0..11DA9 ; Nd # [10] GUNJALA GONDI DIGIT ZERO..GUNJALA GONDI DIGIT NINE +16A60..16A69 ; Nd # [10] MRO DIGIT ZERO..MRO DIGIT NINE +16B50..16B59 ; Nd # [10] PAHAWH HMONG DIGIT ZERO..PAHAWH HMONG DIGIT NINE +1D7CE..1D7FF ; Nd # [50] MATHEMATICAL BOLD DIGIT ZERO..MATHEMATICAL MONOSPACE DIGIT NINE +1E140..1E149 ; Nd # [10] NYIAKENG PUACHUE HMONG DIGIT ZERO..NYIAKENG PUACHUE HMONG DIGIT NINE +1E2F0..1E2F9 ; Nd # [10] WANCHO DIGIT ZERO..WANCHO DIGIT NINE +1E950..1E959 ; Nd # [10] ADLAM DIGIT ZERO..ADLAM DIGIT NINE +1FBF0..1FBF9 ; Nd # [10] SEGMENTED DIGIT ZERO..SEGMENTED DIGIT NINE + +# Total code points: 650 + +# ================================================ + +# General_Category=Letter_Number + +16EE..16F0 ; Nl # [3] RUNIC ARLAUG SYMBOL..RUNIC BELGTHOR SYMBOL +2160..2182 ; Nl # [35] ROMAN NUMERAL ONE..ROMAN NUMERAL TEN THOUSAND +2185..2188 ; Nl # [4] ROMAN NUMERAL SIX LATE FORM..ROMAN NUMERAL ONE HUNDRED THOUSAND +3007 ; Nl # IDEOGRAPHIC NUMBER ZERO +3021..3029 ; Nl # [9] HANGZHOU NUMERAL ONE..HANGZHOU NUMERAL NINE +3038..303A ; Nl # [3] HANGZHOU NUMERAL TEN..HANGZHOU NUMERAL THIRTY +A6E6..A6EF ; Nl # [10] BAMUM LETTER MO..BAMUM LETTER KOGHOM +10140..10174 ; Nl # [53] GREEK ACROPHONIC ATTIC ONE QUARTER..GREEK ACROPHONIC STRATIAN FIFTY MNAS +10341 ; Nl # GOTHIC LETTER NINETY +1034A ; Nl # GOTHIC LETTER NINE HUNDRED +103D1..103D5 ; Nl # [5] OLD PERSIAN NUMBER ONE..OLD PERSIAN NUMBER HUNDRED +12400..1246E ; Nl # [111] CUNEIFORM NUMERIC SIGN TWO ASH..CUNEIFORM NUMERIC SIGN NINE U VARIANT FORM + +# Total code points: 236 + +# ================================================ + +# General_Category=Other_Number + +00B2..00B3 ; No # [2] SUPERSCRIPT TWO..SUPERSCRIPT THREE +00B9 ; No # SUPERSCRIPT ONE +00BC..00BE ; No # [3] VULGAR FRACTION ONE QUARTER..VULGAR FRACTION THREE QUARTERS +09F4..09F9 ; No # [6] BENGALI CURRENCY NUMERATOR ONE..BENGALI CURRENCY DENOMINATOR SIXTEEN +0B72..0B77 ; No # [6] ORIYA FRACTION ONE QUARTER..ORIYA FRACTION THREE SIXTEENTHS +0BF0..0BF2 ; No # [3] TAMIL NUMBER TEN..TAMIL NUMBER ONE THOUSAND +0C78..0C7E ; No # [7] TELUGU FRACTION DIGIT ZERO FOR ODD POWERS OF FOUR..TELUGU FRACTION DIGIT THREE FOR EVEN POWERS OF FOUR +0D58..0D5E ; No # [7] MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH..MALAYALAM FRACTION ONE FIFTH +0D70..0D78 ; No # [9] MALAYALAM NUMBER TEN..MALAYALAM FRACTION THREE SIXTEENTHS +0F2A..0F33 ; No # [10] TIBETAN DIGIT HALF ONE..TIBETAN DIGIT HALF ZERO +1369..137C ; No # [20] ETHIOPIC DIGIT ONE..ETHIOPIC NUMBER TEN THOUSAND +17F0..17F9 ; No # [10] KHMER SYMBOL LEK ATTAK SON..KHMER SYMBOL LEK ATTAK PRAM-BUON +19DA ; No # NEW TAI LUE THAM DIGIT ONE +2070 ; No # SUPERSCRIPT ZERO +2074..2079 ; No # [6] SUPERSCRIPT FOUR..SUPERSCRIPT NINE +2080..2089 ; No # [10] SUBSCRIPT ZERO..SUBSCRIPT NINE +2150..215F ; No # [16] VULGAR FRACTION ONE SEVENTH..FRACTION NUMERATOR ONE +2189 ; No # VULGAR FRACTION ZERO THIRDS +2460..249B ; No # [60] CIRCLED DIGIT ONE..NUMBER TWENTY FULL STOP +24EA..24FF ; No # [22] CIRCLED DIGIT ZERO..NEGATIVE CIRCLED DIGIT ZERO +2776..2793 ; No # [30] DINGBAT NEGATIVE CIRCLED DIGIT ONE..DINGBAT NEGATIVE CIRCLED SANS-SERIF NUMBER TEN +2CFD ; No # COPTIC FRACTION ONE HALF +3192..3195 ; No # [4] IDEOGRAPHIC ANNOTATION ONE MARK..IDEOGRAPHIC ANNOTATION FOUR MARK +3220..3229 ; No # [10] PARENTHESIZED IDEOGRAPH ONE..PARENTHESIZED IDEOGRAPH TEN +3248..324F ; No # [8] CIRCLED NUMBER TEN ON BLACK SQUARE..CIRCLED NUMBER EIGHTY ON BLACK SQUARE +3251..325F ; No # [15] CIRCLED NUMBER TWENTY ONE..CIRCLED NUMBER THIRTY FIVE +3280..3289 ; No # [10] CIRCLED IDEOGRAPH ONE..CIRCLED IDEOGRAPH TEN +32B1..32BF ; No # [15] CIRCLED NUMBER THIRTY SIX..CIRCLED NUMBER FIFTY +A830..A835 ; No # [6] NORTH INDIC FRACTION ONE QUARTER..NORTH INDIC FRACTION THREE SIXTEENTHS +10107..10133 ; No # [45] AEGEAN NUMBER ONE..AEGEAN NUMBER NINETY THOUSAND +10175..10178 ; No # [4] GREEK ONE HALF SIGN..GREEK THREE QUARTERS SIGN +1018A..1018B ; No # [2] GREEK ZERO SIGN..GREEK ONE QUARTER SIGN +102E1..102FB ; No # [27] COPTIC EPACT DIGIT ONE..COPTIC EPACT NUMBER NINE HUNDRED +10320..10323 ; No # [4] OLD ITALIC NUMERAL ONE..OLD ITALIC NUMERAL FIFTY +10858..1085F ; No # [8] IMPERIAL ARAMAIC NUMBER ONE..IMPERIAL ARAMAIC NUMBER TEN THOUSAND +10879..1087F ; No # [7] PALMYRENE NUMBER ONE..PALMYRENE NUMBER TWENTY +108A7..108AF ; No # [9] NABATAEAN NUMBER ONE..NABATAEAN NUMBER ONE HUNDRED +108FB..108FF ; No # [5] HATRAN NUMBER ONE..HATRAN NUMBER ONE HUNDRED +10916..1091B ; No # [6] PHOENICIAN NUMBER ONE..PHOENICIAN NUMBER THREE +109BC..109BD ; No # [2] MEROITIC CURSIVE FRACTION ELEVEN TWELFTHS..MEROITIC CURSIVE FRACTION ONE HALF +109C0..109CF ; No # [16] MEROITIC CURSIVE NUMBER ONE..MEROITIC CURSIVE NUMBER SEVENTY +109D2..109FF ; No # [46] MEROITIC CURSIVE NUMBER ONE HUNDRED..MEROITIC CURSIVE FRACTION TEN TWELFTHS +10A40..10A48 ; No # [9] KHAROSHTHI DIGIT ONE..KHAROSHTHI FRACTION ONE HALF +10A7D..10A7E ; No # [2] OLD SOUTH ARABIAN NUMBER ONE..OLD SOUTH ARABIAN NUMBER FIFTY +10A9D..10A9F ; No # [3] OLD NORTH ARABIAN NUMBER ONE..OLD NORTH ARABIAN NUMBER TWENTY +10AEB..10AEF ; No # [5] MANICHAEAN NUMBER ONE..MANICHAEAN NUMBER ONE HUNDRED +10B58..10B5F ; No # [8] INSCRIPTIONAL PARTHIAN NUMBER ONE..INSCRIPTIONAL PARTHIAN NUMBER ONE THOUSAND +10B78..10B7F ; No # [8] INSCRIPTIONAL PAHLAVI NUMBER ONE..INSCRIPTIONAL PAHLAVI NUMBER ONE THOUSAND +10BA9..10BAF ; No # [7] PSALTER PAHLAVI NUMBER ONE..PSALTER PAHLAVI NUMBER ONE HUNDRED +10CFA..10CFF ; No # [6] OLD HUNGARIAN NUMBER ONE..OLD HUNGARIAN NUMBER ONE THOUSAND +10E60..10E7E ; No # [31] RUMI DIGIT ONE..RUMI FRACTION TWO THIRDS +10F1D..10F26 ; No # [10] OLD SOGDIAN NUMBER ONE..OLD SOGDIAN FRACTION ONE HALF +10F51..10F54 ; No # [4] SOGDIAN NUMBER ONE..SOGDIAN NUMBER ONE HUNDRED +10FC5..10FCB ; No # [7] CHORASMIAN NUMBER ONE..CHORASMIAN NUMBER ONE HUNDRED +11052..11065 ; No # [20] BRAHMI NUMBER ONE..BRAHMI NUMBER ONE THOUSAND +111E1..111F4 ; No # [20] SINHALA ARCHAIC DIGIT ONE..SINHALA ARCHAIC NUMBER ONE THOUSAND +1173A..1173B ; No # [2] AHOM NUMBER TEN..AHOM NUMBER TWENTY +118EA..118F2 ; No # [9] WARANG CITI NUMBER TEN..WARANG CITI NUMBER NINETY +11C5A..11C6C ; No # [19] BHAIKSUKI NUMBER ONE..BHAIKSUKI HUNDREDS UNIT MARK +11FC0..11FD4 ; No # [21] TAMIL FRACTION ONE THREE-HUNDRED-AND-TWENTIETH..TAMIL FRACTION DOWNSCALING FACTOR KIIZH +16B5B..16B61 ; No # [7] PAHAWH HMONG NUMBER TENS..PAHAWH HMONG NUMBER TRILLIONS +16E80..16E96 ; No # [23] MEDEFAIDRIN DIGIT ZERO..MEDEFAIDRIN DIGIT THREE ALTERNATE FORM +1D2E0..1D2F3 ; No # [20] MAYAN NUMERAL ZERO..MAYAN NUMERAL NINETEEN +1D360..1D378 ; No # [25] COUNTING ROD UNIT DIGIT ONE..TALLY MARK FIVE +1E8C7..1E8CF ; No # [9] MENDE KIKAKUI DIGIT ONE..MENDE KIKAKUI DIGIT NINE +1EC71..1ECAB ; No # [59] INDIC SIYAQ NUMBER ONE..INDIC SIYAQ NUMBER PREFIXED NINE +1ECAD..1ECAF ; No # [3] INDIC SIYAQ FRACTION ONE QUARTER..INDIC SIYAQ FRACTION THREE QUARTERS +1ECB1..1ECB4 ; No # [4] INDIC SIYAQ NUMBER ALTERNATE ONE..INDIC SIYAQ ALTERNATE LAKH MARK +1ED01..1ED2D ; No # [45] OTTOMAN SIYAQ NUMBER ONE..OTTOMAN SIYAQ NUMBER NINETY THOUSAND +1ED2F..1ED3D ; No # [15] OTTOMAN SIYAQ ALTERNATE NUMBER TWO..OTTOMAN SIYAQ FRACTION ONE SIXTH +1F100..1F10C ; No # [13] DIGIT ZERO FULL STOP..DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO + +# Total code points: 895 + +# ================================================ + +# General_Category=Space_Separator + +0020 ; Zs # SPACE +00A0 ; Zs # NO-BREAK SPACE +1680 ; Zs # OGHAM SPACE MARK +2000..200A ; Zs # [11] EN QUAD..HAIR SPACE +202F ; Zs # NARROW NO-BREAK SPACE +205F ; Zs # MEDIUM MATHEMATICAL SPACE +3000 ; Zs # IDEOGRAPHIC SPACE + +# Total code points: 17 + +# ================================================ + +# General_Category=Line_Separator + +2028 ; Zl # LINE SEPARATOR + +# Total code points: 1 + +# ================================================ + +# General_Category=Paragraph_Separator + +2029 ; Zp # PARAGRAPH SEPARATOR + +# Total code points: 1 + +# ================================================ + +# General_Category=Control + +0000..001F ; Cc # [32] <control-0000>..<control-001F> +007F..009F ; Cc # [33] <control-007F>..<control-009F> + +# Total code points: 65 + +# ================================================ + +# General_Category=Format + +00AD ; Cf # SOFT HYPHEN +0600..0605 ; Cf # [6] ARABIC NUMBER SIGN..ARABIC NUMBER MARK ABOVE +061C ; Cf # ARABIC LETTER MARK +06DD ; Cf # ARABIC END OF AYAH +070F ; Cf # SYRIAC ABBREVIATION MARK +08E2 ; Cf # ARABIC DISPUTED END OF AYAH +180E ; Cf # MONGOLIAN VOWEL SEPARATOR +200B..200F ; Cf # [5] ZERO WIDTH SPACE..RIGHT-TO-LEFT MARK +202A..202E ; Cf # [5] LEFT-TO-RIGHT EMBEDDING..RIGHT-TO-LEFT OVERRIDE +2060..2064 ; Cf # [5] WORD JOINER..INVISIBLE PLUS +2066..206F ; Cf # [10] LEFT-TO-RIGHT ISOLATE..NOMINAL DIGIT SHAPES +FEFF ; Cf # ZERO WIDTH NO-BREAK SPACE +FFF9..FFFB ; Cf # [3] INTERLINEAR ANNOTATION ANCHOR..INTERLINEAR ANNOTATION TERMINATOR +110BD ; Cf # KAITHI NUMBER SIGN +110CD ; Cf # KAITHI NUMBER SIGN ABOVE +13430..13438 ; Cf # [9] EGYPTIAN HIEROGLYPH VERTICAL JOINER..EGYPTIAN HIEROGLYPH END SEGMENT +1BCA0..1BCA3 ; Cf # [4] SHORTHAND FORMAT LETTER OVERLAP..SHORTHAND FORMAT UP STEP +1D173..1D17A ; Cf # [8] MUSICAL SYMBOL BEGIN BEAM..MUSICAL SYMBOL END PHRASE +E0001 ; Cf # LANGUAGE TAG +E0020..E007F ; Cf # [96] TAG SPACE..CANCEL TAG + +# Total code points: 161 + +# ================================================ + +# General_Category=Private_Use + +E000..F8FF ; Co # [6400] <private-use-E000>..<private-use-F8FF> +F0000..FFFFD ; Co # [65534] <private-use-F0000>..<private-use-FFFFD> +100000..10FFFD; Co # [65534] <private-use-100000>..<private-use-10FFFD> + +# Total code points: 137468 + +# ================================================ + +# General_Category=Surrogate + +D800..DFFF ; Cs # [2048] <surrogate-D800>..<surrogate-DFFF> + +# Total code points: 2048 + +# ================================================ + +# General_Category=Dash_Punctuation + +002D ; Pd # HYPHEN-MINUS +058A ; Pd # ARMENIAN HYPHEN +05BE ; Pd # HEBREW PUNCTUATION MAQAF +1400 ; Pd # CANADIAN SYLLABICS HYPHEN +1806 ; Pd # MONGOLIAN TODO SOFT HYPHEN +2010..2015 ; Pd # [6] HYPHEN..HORIZONTAL BAR +2E17 ; Pd # DOUBLE OBLIQUE HYPHEN +2E1A ; Pd # HYPHEN WITH DIAERESIS +2E3A..2E3B ; Pd # [2] TWO-EM DASH..THREE-EM DASH +2E40 ; Pd # DOUBLE HYPHEN +301C ; Pd # WAVE DASH +3030 ; Pd # WAVY DASH +30A0 ; Pd # KATAKANA-HIRAGANA DOUBLE HYPHEN +FE31..FE32 ; Pd # [2] PRESENTATION FORM FOR VERTICAL EM DASH..PRESENTATION FORM FOR VERTICAL EN DASH +FE58 ; Pd # SMALL EM DASH +FE63 ; Pd # SMALL HYPHEN-MINUS +FF0D ; Pd # FULLWIDTH HYPHEN-MINUS +10EAD ; Pd # YEZIDI HYPHENATION MARK + +# Total code points: 25 + +# ================================================ + +# General_Category=Open_Punctuation + +0028 ; Ps # LEFT PARENTHESIS +005B ; Ps # LEFT SQUARE BRACKET +007B ; Ps # LEFT CURLY BRACKET +0F3A ; Ps # TIBETAN MARK GUG RTAGS GYON +0F3C ; Ps # TIBETAN MARK ANG KHANG GYON +169B ; Ps # OGHAM FEATHER MARK +201A ; Ps # SINGLE LOW-9 QUOTATION MARK +201E ; Ps # DOUBLE LOW-9 QUOTATION MARK +2045 ; Ps # LEFT SQUARE BRACKET WITH QUILL +207D ; Ps # SUPERSCRIPT LEFT PARENTHESIS +208D ; Ps # SUBSCRIPT LEFT PARENTHESIS +2308 ; Ps # LEFT CEILING +230A ; Ps # LEFT FLOOR +2329 ; Ps # LEFT-POINTING ANGLE BRACKET +2768 ; Ps # MEDIUM LEFT PARENTHESIS ORNAMENT +276A ; Ps # MEDIUM FLATTENED LEFT PARENTHESIS ORNAMENT +276C ; Ps # MEDIUM LEFT-POINTING ANGLE BRACKET ORNAMENT +276E ; Ps # HEAVY LEFT-POINTING ANGLE QUOTATION MARK ORNAMENT +2770 ; Ps # HEAVY LEFT-POINTING ANGLE BRACKET ORNAMENT +2772 ; Ps # LIGHT LEFT TORTOISE SHELL BRACKET ORNAMENT +2774 ; Ps # MEDIUM LEFT CURLY BRACKET ORNAMENT +27C5 ; Ps # LEFT S-SHAPED BAG DELIMITER +27E6 ; Ps # MATHEMATICAL LEFT WHITE SQUARE BRACKET +27E8 ; Ps # MATHEMATICAL LEFT ANGLE BRACKET +27EA ; Ps # MATHEMATICAL LEFT DOUBLE ANGLE BRACKET +27EC ; Ps # MATHEMATICAL LEFT WHITE TORTOISE SHELL BRACKET +27EE ; Ps # MATHEMATICAL LEFT FLATTENED PARENTHESIS +2983 ; Ps # LEFT WHITE CURLY BRACKET +2985 ; Ps # LEFT WHITE PARENTHESIS +2987 ; Ps # Z NOTATION LEFT IMAGE BRACKET +2989 ; Ps # Z NOTATION LEFT BINDING BRACKET +298B ; Ps # LEFT SQUARE BRACKET WITH UNDERBAR +298D ; Ps # LEFT SQUARE BRACKET WITH TICK IN TOP CORNER +298F ; Ps # LEFT SQUARE BRACKET WITH TICK IN BOTTOM CORNER +2991 ; Ps # LEFT ANGLE BRACKET WITH DOT +2993 ; Ps # LEFT ARC LESS-THAN BRACKET +2995 ; Ps # DOUBLE LEFT ARC GREATER-THAN BRACKET +2997 ; Ps # LEFT BLACK TORTOISE SHELL BRACKET +29D8 ; Ps # LEFT WIGGLY FENCE +29DA ; Ps # LEFT DOUBLE WIGGLY FENCE +29FC ; Ps # LEFT-POINTING CURVED ANGLE BRACKET +2E22 ; Ps # TOP LEFT HALF BRACKET +2E24 ; Ps # BOTTOM LEFT HALF BRACKET +2E26 ; Ps # LEFT SIDEWAYS U BRACKET +2E28 ; Ps # LEFT DOUBLE PARENTHESIS +2E42 ; Ps # DOUBLE LOW-REVERSED-9 QUOTATION MARK +3008 ; Ps # LEFT ANGLE BRACKET +300A ; Ps # LEFT DOUBLE ANGLE BRACKET +300C ; Ps # LEFT CORNER BRACKET +300E ; Ps # LEFT WHITE CORNER BRACKET +3010 ; Ps # LEFT BLACK LENTICULAR BRACKET +3014 ; Ps # LEFT TORTOISE SHELL BRACKET +3016 ; Ps # LEFT WHITE LENTICULAR BRACKET +3018 ; Ps # LEFT WHITE TORTOISE SHELL BRACKET +301A ; Ps # LEFT WHITE SQUARE BRACKET +301D ; Ps # REVERSED DOUBLE PRIME QUOTATION MARK +FD3F ; Ps # ORNATE RIGHT PARENTHESIS +FE17 ; Ps # PRESENTATION FORM FOR VERTICAL LEFT WHITE LENTICULAR BRACKET +FE35 ; Ps # PRESENTATION FORM FOR VERTICAL LEFT PARENTHESIS +FE37 ; Ps # PRESENTATION FORM FOR VERTICAL LEFT CURLY BRACKET +FE39 ; Ps # PRESENTATION FORM FOR VERTICAL LEFT TORTOISE SHELL BRACKET +FE3B ; Ps # PRESENTATION FORM FOR VERTICAL LEFT BLACK LENTICULAR BRACKET +FE3D ; Ps # PRESENTATION FORM FOR VERTICAL LEFT DOUBLE ANGLE BRACKET +FE3F ; Ps # PRESENTATION FORM FOR VERTICAL LEFT ANGLE BRACKET +FE41 ; Ps # PRESENTATION FORM FOR VERTICAL LEFT CORNER BRACKET +FE43 ; Ps # PRESENTATION FORM FOR VERTICAL LEFT WHITE CORNER BRACKET +FE47 ; Ps # PRESENTATION FORM FOR VERTICAL LEFT SQUARE BRACKET +FE59 ; Ps # SMALL LEFT PARENTHESIS +FE5B ; Ps # SMALL LEFT CURLY BRACKET +FE5D ; Ps # SMALL LEFT TORTOISE SHELL BRACKET +FF08 ; Ps # FULLWIDTH LEFT PARENTHESIS +FF3B ; Ps # FULLWIDTH LEFT SQUARE BRACKET +FF5B ; Ps # FULLWIDTH LEFT CURLY BRACKET +FF5F ; Ps # FULLWIDTH LEFT WHITE PARENTHESIS +FF62 ; Ps # HALFWIDTH LEFT CORNER BRACKET + +# Total code points: 75 + +# ================================================ + +# General_Category=Close_Punctuation + +0029 ; Pe # RIGHT PARENTHESIS +005D ; Pe # RIGHT SQUARE BRACKET +007D ; Pe # RIGHT CURLY BRACKET +0F3B ; Pe # TIBETAN MARK GUG RTAGS GYAS +0F3D ; Pe # TIBETAN MARK ANG KHANG GYAS +169C ; Pe # OGHAM REVERSED FEATHER MARK +2046 ; Pe # RIGHT SQUARE BRACKET WITH QUILL +207E ; Pe # SUPERSCRIPT RIGHT PARENTHESIS +208E ; Pe # SUBSCRIPT RIGHT PARENTHESIS +2309 ; Pe # RIGHT CEILING +230B ; Pe # RIGHT FLOOR +232A ; Pe # RIGHT-POINTING ANGLE BRACKET +2769 ; Pe # MEDIUM RIGHT PARENTHESIS ORNAMENT +276B ; Pe # MEDIUM FLATTENED RIGHT PARENTHESIS ORNAMENT +276D ; Pe # MEDIUM RIGHT-POINTING ANGLE BRACKET ORNAMENT +276F ; Pe # HEAVY RIGHT-POINTING ANGLE QUOTATION MARK ORNAMENT +2771 ; Pe # HEAVY RIGHT-POINTING ANGLE BRACKET ORNAMENT +2773 ; Pe # LIGHT RIGHT TORTOISE SHELL BRACKET ORNAMENT +2775 ; Pe # MEDIUM RIGHT CURLY BRACKET ORNAMENT +27C6 ; Pe # RIGHT S-SHAPED BAG DELIMITER +27E7 ; Pe # MATHEMATICAL RIGHT WHITE SQUARE BRACKET +27E9 ; Pe # MATHEMATICAL RIGHT ANGLE BRACKET +27EB ; Pe # MATHEMATICAL RIGHT DOUBLE ANGLE BRACKET +27ED ; Pe # MATHEMATICAL RIGHT WHITE TORTOISE SHELL BRACKET +27EF ; Pe # MATHEMATICAL RIGHT FLATTENED PARENTHESIS +2984 ; Pe # RIGHT WHITE CURLY BRACKET +2986 ; Pe # RIGHT WHITE PARENTHESIS +2988 ; Pe # Z NOTATION RIGHT IMAGE BRACKET +298A ; Pe # Z NOTATION RIGHT BINDING BRACKET +298C ; Pe # RIGHT SQUARE BRACKET WITH UNDERBAR +298E ; Pe # RIGHT SQUARE BRACKET WITH TICK IN BOTTOM CORNER +2990 ; Pe # RIGHT SQUARE BRACKET WITH TICK IN TOP CORNER +2992 ; Pe # RIGHT ANGLE BRACKET WITH DOT +2994 ; Pe # RIGHT ARC GREATER-THAN BRACKET +2996 ; Pe # DOUBLE RIGHT ARC LESS-THAN BRACKET +2998 ; Pe # RIGHT BLACK TORTOISE SHELL BRACKET +29D9 ; Pe # RIGHT WIGGLY FENCE +29DB ; Pe # RIGHT DOUBLE WIGGLY FENCE +29FD ; Pe # RIGHT-POINTING CURVED ANGLE BRACKET +2E23 ; Pe # TOP RIGHT HALF BRACKET +2E25 ; Pe # BOTTOM RIGHT HALF BRACKET +2E27 ; Pe # RIGHT SIDEWAYS U BRACKET +2E29 ; Pe # RIGHT DOUBLE PARENTHESIS +3009 ; Pe # RIGHT ANGLE BRACKET +300B ; Pe # RIGHT DOUBLE ANGLE BRACKET +300D ; Pe # RIGHT CORNER BRACKET +300F ; Pe # RIGHT WHITE CORNER BRACKET +3011 ; Pe # RIGHT BLACK LENTICULAR BRACKET +3015 ; Pe # RIGHT TORTOISE SHELL BRACKET +3017 ; Pe # RIGHT WHITE LENTICULAR BRACKET +3019 ; Pe # RIGHT WHITE TORTOISE SHELL BRACKET +301B ; Pe # RIGHT WHITE SQUARE BRACKET +301E..301F ; Pe # [2] DOUBLE PRIME QUOTATION MARK..LOW DOUBLE PRIME QUOTATION MARK +FD3E ; Pe # ORNATE LEFT PARENTHESIS +FE18 ; Pe # PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET +FE36 ; Pe # PRESENTATION FORM FOR VERTICAL RIGHT PARENTHESIS +FE38 ; Pe # PRESENTATION FORM FOR VERTICAL RIGHT CURLY BRACKET +FE3A ; Pe # PRESENTATION FORM FOR VERTICAL RIGHT TORTOISE SHELL BRACKET +FE3C ; Pe # PRESENTATION FORM FOR VERTICAL RIGHT BLACK LENTICULAR BRACKET +FE3E ; Pe # PRESENTATION FORM FOR VERTICAL RIGHT DOUBLE ANGLE BRACKET +FE40 ; Pe # PRESENTATION FORM FOR VERTICAL RIGHT ANGLE BRACKET +FE42 ; Pe # PRESENTATION FORM FOR VERTICAL RIGHT CORNER BRACKET +FE44 ; Pe # PRESENTATION FORM FOR VERTICAL RIGHT WHITE CORNER BRACKET +FE48 ; Pe # PRESENTATION FORM FOR VERTICAL RIGHT SQUARE BRACKET +FE5A ; Pe # SMALL RIGHT PARENTHESIS +FE5C ; Pe # SMALL RIGHT CURLY BRACKET +FE5E ; Pe # SMALL RIGHT TORTOISE SHELL BRACKET +FF09 ; Pe # FULLWIDTH RIGHT PARENTHESIS +FF3D ; Pe # FULLWIDTH RIGHT SQUARE BRACKET +FF5D ; Pe # FULLWIDTH RIGHT CURLY BRACKET +FF60 ; Pe # FULLWIDTH RIGHT WHITE PARENTHESIS +FF63 ; Pe # HALFWIDTH RIGHT CORNER BRACKET + +# Total code points: 73 + +# ================================================ + +# General_Category=Connector_Punctuation + +005F ; Pc # LOW LINE +203F..2040 ; Pc # [2] UNDERTIE..CHARACTER TIE +2054 ; Pc # INVERTED UNDERTIE +FE33..FE34 ; Pc # [2] PRESENTATION FORM FOR VERTICAL LOW LINE..PRESENTATION FORM FOR VERTICAL WAVY LOW LINE +FE4D..FE4F ; Pc # [3] DASHED LOW LINE..WAVY LOW LINE +FF3F ; Pc # FULLWIDTH LOW LINE + +# Total code points: 10 + +# ================================================ + +# General_Category=Other_Punctuation + +0021..0023 ; Po # [3] EXCLAMATION MARK..NUMBER SIGN +0025..0027 ; Po # [3] PERCENT SIGN..APOSTROPHE +002A ; Po # ASTERISK +002C ; Po # COMMA +002E..002F ; Po # [2] FULL STOP..SOLIDUS +003A..003B ; Po # [2] COLON..SEMICOLON +003F..0040 ; Po # [2] QUESTION MARK..COMMERCIAL AT +005C ; Po # REVERSE SOLIDUS +00A1 ; Po # INVERTED EXCLAMATION MARK +00A7 ; Po # SECTION SIGN +00B6..00B7 ; Po # [2] PILCROW SIGN..MIDDLE DOT +00BF ; Po # INVERTED QUESTION MARK +037E ; Po # GREEK QUESTION MARK +0387 ; Po # GREEK ANO TELEIA +055A..055F ; Po # [6] ARMENIAN APOSTROPHE..ARMENIAN ABBREVIATION MARK +0589 ; Po # ARMENIAN FULL STOP +05C0 ; Po # HEBREW PUNCTUATION PASEQ +05C3 ; Po # HEBREW PUNCTUATION SOF PASUQ +05C6 ; Po # HEBREW PUNCTUATION NUN HAFUKHA +05F3..05F4 ; Po # [2] HEBREW PUNCTUATION GERESH..HEBREW PUNCTUATION GERSHAYIM +0609..060A ; Po # [2] ARABIC-INDIC PER MILLE SIGN..ARABIC-INDIC PER TEN THOUSAND SIGN +060C..060D ; Po # [2] ARABIC COMMA..ARABIC DATE SEPARATOR +061B ; Po # ARABIC SEMICOLON +061E..061F ; Po # [2] ARABIC TRIPLE DOT PUNCTUATION MARK..ARABIC QUESTION MARK +066A..066D ; Po # [4] ARABIC PERCENT SIGN..ARABIC FIVE POINTED STAR +06D4 ; Po # ARABIC FULL STOP +0700..070D ; Po # [14] SYRIAC END OF PARAGRAPH..SYRIAC HARKLEAN ASTERISCUS +07F7..07F9 ; Po # [3] NKO SYMBOL GBAKURUNEN..NKO EXCLAMATION MARK +0830..083E ; Po # [15] SAMARITAN PUNCTUATION NEQUDAA..SAMARITAN PUNCTUATION ANNAAU +085E ; Po # MANDAIC PUNCTUATION +0964..0965 ; Po # [2] DEVANAGARI DANDA..DEVANAGARI DOUBLE DANDA +0970 ; Po # DEVANAGARI ABBREVIATION SIGN +09FD ; Po # BENGALI ABBREVIATION SIGN +0A76 ; Po # GURMUKHI ABBREVIATION SIGN +0AF0 ; Po # GUJARATI ABBREVIATION SIGN +0C77 ; Po # TELUGU SIGN SIDDHAM +0C84 ; Po # KANNADA SIGN SIDDHAM +0DF4 ; Po # SINHALA PUNCTUATION KUNDDALIYA +0E4F ; Po # THAI CHARACTER FONGMAN +0E5A..0E5B ; Po # [2] THAI CHARACTER ANGKHANKHU..THAI CHARACTER KHOMUT +0F04..0F12 ; Po # [15] TIBETAN MARK INITIAL YIG MGO MDUN MA..TIBETAN MARK RGYA GRAM SHAD +0F14 ; Po # TIBETAN MARK GTER TSHEG +0F85 ; Po # TIBETAN MARK PALUTA +0FD0..0FD4 ; Po # [5] TIBETAN MARK BSKA- SHOG GI MGO RGYAN..TIBETAN MARK CLOSING BRDA RNYING YIG MGO SGAB MA +0FD9..0FDA ; Po # [2] TIBETAN MARK LEADING MCHAN RTAGS..TIBETAN MARK TRAILING MCHAN RTAGS +104A..104F ; Po # [6] MYANMAR SIGN LITTLE SECTION..MYANMAR SYMBOL GENITIVE +10FB ; Po # GEORGIAN PARAGRAPH SEPARATOR +1360..1368 ; Po # [9] ETHIOPIC SECTION MARK..ETHIOPIC PARAGRAPH SEPARATOR +166E ; Po # CANADIAN SYLLABICS FULL STOP +16EB..16ED ; Po # [3] RUNIC SINGLE PUNCTUATION..RUNIC CROSS PUNCTUATION +1735..1736 ; Po # [2] PHILIPPINE SINGLE PUNCTUATION..PHILIPPINE DOUBLE PUNCTUATION +17D4..17D6 ; Po # [3] KHMER SIGN KHAN..KHMER SIGN CAMNUC PII KUUH +17D8..17DA ; Po # [3] KHMER SIGN BEYYAL..KHMER SIGN KOOMUUT +1800..1805 ; Po # [6] MONGOLIAN BIRGA..MONGOLIAN FOUR DOTS +1807..180A ; Po # [4] MONGOLIAN SIBE SYLLABLE BOUNDARY MARKER..MONGOLIAN NIRUGU +1944..1945 ; Po # [2] LIMBU EXCLAMATION MARK..LIMBU QUESTION MARK +1A1E..1A1F ; Po # [2] BUGINESE PALLAWA..BUGINESE END OF SECTION +1AA0..1AA6 ; Po # [7] TAI THAM SIGN WIANG..TAI THAM SIGN REVERSED ROTATED RANA +1AA8..1AAD ; Po # [6] TAI THAM SIGN KAAN..TAI THAM SIGN CAANG +1B5A..1B60 ; Po # [7] BALINESE PANTI..BALINESE PAMENENG +1BFC..1BFF ; Po # [4] BATAK SYMBOL BINDU NA METEK..BATAK SYMBOL BINDU PANGOLAT +1C3B..1C3F ; Po # [5] LEPCHA PUNCTUATION TA-ROL..LEPCHA PUNCTUATION TSHOOK +1C7E..1C7F ; Po # [2] OL CHIKI PUNCTUATION MUCAAD..OL CHIKI PUNCTUATION DOUBLE MUCAAD +1CC0..1CC7 ; Po # [8] SUNDANESE PUNCTUATION BINDU SURYA..SUNDANESE PUNCTUATION BINDU BA SATANGA +1CD3 ; Po # VEDIC SIGN NIHSHVASA +2016..2017 ; Po # [2] DOUBLE VERTICAL LINE..DOUBLE LOW LINE +2020..2027 ; Po # [8] DAGGER..HYPHENATION POINT +2030..2038 ; Po # [9] PER MILLE SIGN..CARET +203B..203E ; Po # [4] REFERENCE MARK..OVERLINE +2041..2043 ; Po # [3] CARET INSERTION POINT..HYPHEN BULLET +2047..2051 ; Po # [11] DOUBLE QUESTION MARK..TWO ASTERISKS ALIGNED VERTICALLY +2053 ; Po # SWUNG DASH +2055..205E ; Po # [10] FLOWER PUNCTUATION MARK..VERTICAL FOUR DOTS +2CF9..2CFC ; Po # [4] COPTIC OLD NUBIAN FULL STOP..COPTIC OLD NUBIAN VERSE DIVIDER +2CFE..2CFF ; Po # [2] COPTIC FULL STOP..COPTIC MORPHOLOGICAL DIVIDER +2D70 ; Po # TIFINAGH SEPARATOR MARK +2E00..2E01 ; Po # [2] RIGHT ANGLE SUBSTITUTION MARKER..RIGHT ANGLE DOTTED SUBSTITUTION MARKER +2E06..2E08 ; Po # [3] RAISED INTERPOLATION MARKER..DOTTED TRANSPOSITION MARKER +2E0B ; Po # RAISED SQUARE +2E0E..2E16 ; Po # [9] EDITORIAL CORONIS..DOTTED RIGHT-POINTING ANGLE +2E18..2E19 ; Po # [2] INVERTED INTERROBANG..PALM BRANCH +2E1B ; Po # TILDE WITH RING ABOVE +2E1E..2E1F ; Po # [2] TILDE WITH DOT ABOVE..TILDE WITH DOT BELOW +2E2A..2E2E ; Po # [5] TWO DOTS OVER ONE DOT PUNCTUATION..REVERSED QUESTION MARK +2E30..2E39 ; Po # [10] RING POINT..TOP HALF SECTION SIGN +2E3C..2E3F ; Po # [4] STENOGRAPHIC FULL STOP..CAPITULUM +2E41 ; Po # REVERSED COMMA +2E43..2E4F ; Po # [13] DASH WITH LEFT UPTURN..CORNISH VERSE DIVIDER +2E52 ; Po # TIRONIAN SIGN CAPITAL ET +3001..3003 ; Po # [3] IDEOGRAPHIC COMMA..DITTO MARK +303D ; Po # PART ALTERNATION MARK +30FB ; Po # KATAKANA MIDDLE DOT +A4FE..A4FF ; Po # [2] LISU PUNCTUATION COMMA..LISU PUNCTUATION FULL STOP +A60D..A60F ; Po # [3] VAI COMMA..VAI QUESTION MARK +A673 ; Po # SLAVONIC ASTERISK +A67E ; Po # CYRILLIC KAVYKA +A6F2..A6F7 ; Po # [6] BAMUM NJAEMLI..BAMUM QUESTION MARK +A874..A877 ; Po # [4] PHAGS-PA SINGLE HEAD MARK..PHAGS-PA MARK DOUBLE SHAD +A8CE..A8CF ; Po # [2] SAURASHTRA DANDA..SAURASHTRA DOUBLE DANDA +A8F8..A8FA ; Po # [3] DEVANAGARI SIGN PUSHPIKA..DEVANAGARI CARET +A8FC ; Po # DEVANAGARI SIGN SIDDHAM +A92E..A92F ; Po # [2] KAYAH LI SIGN CWI..KAYAH LI SIGN SHYA +A95F ; Po # REJANG SECTION MARK +A9C1..A9CD ; Po # [13] JAVANESE LEFT RERENGGAN..JAVANESE TURNED PADA PISELEH +A9DE..A9DF ; Po # [2] JAVANESE PADA TIRTA TUMETES..JAVANESE PADA ISEN-ISEN +AA5C..AA5F ; Po # [4] CHAM PUNCTUATION SPIRAL..CHAM PUNCTUATION TRIPLE DANDA +AADE..AADF ; Po # [2] TAI VIET SYMBOL HO HOI..TAI VIET SYMBOL KOI KOI +AAF0..AAF1 ; Po # [2] MEETEI MAYEK CHEIKHAN..MEETEI MAYEK AHANG KHUDAM +ABEB ; Po # MEETEI MAYEK CHEIKHEI +FE10..FE16 ; Po # [7] PRESENTATION FORM FOR VERTICAL COMMA..PRESENTATION FORM FOR VERTICAL QUESTION MARK +FE19 ; Po # PRESENTATION FORM FOR VERTICAL HORIZONTAL ELLIPSIS +FE30 ; Po # PRESENTATION FORM FOR VERTICAL TWO DOT LEADER +FE45..FE46 ; Po # [2] SESAME DOT..WHITE SESAME DOT +FE49..FE4C ; Po # [4] DASHED OVERLINE..DOUBLE WAVY OVERLINE +FE50..FE52 ; Po # [3] SMALL COMMA..SMALL FULL STOP +FE54..FE57 ; Po # [4] SMALL SEMICOLON..SMALL EXCLAMATION MARK +FE5F..FE61 ; Po # [3] SMALL NUMBER SIGN..SMALL ASTERISK +FE68 ; Po # SMALL REVERSE SOLIDUS +FE6A..FE6B ; Po # [2] SMALL PERCENT SIGN..SMALL COMMERCIAL AT +FF01..FF03 ; Po # [3] FULLWIDTH EXCLAMATION MARK..FULLWIDTH NUMBER SIGN +FF05..FF07 ; Po # [3] FULLWIDTH PERCENT SIGN..FULLWIDTH APOSTROPHE +FF0A ; Po # FULLWIDTH ASTERISK +FF0C ; Po # FULLWIDTH COMMA +FF0E..FF0F ; Po # [2] FULLWIDTH FULL STOP..FULLWIDTH SOLIDUS +FF1A..FF1B ; Po # [2] FULLWIDTH COLON..FULLWIDTH SEMICOLON +FF1F..FF20 ; Po # [2] FULLWIDTH QUESTION MARK..FULLWIDTH COMMERCIAL AT +FF3C ; Po # FULLWIDTH REVERSE SOLIDUS +FF61 ; Po # HALFWIDTH IDEOGRAPHIC FULL STOP +FF64..FF65 ; Po # [2] HALFWIDTH IDEOGRAPHIC COMMA..HALFWIDTH KATAKANA MIDDLE DOT +10100..10102 ; Po # [3] AEGEAN WORD SEPARATOR LINE..AEGEAN CHECK MARK +1039F ; Po # UGARITIC WORD DIVIDER +103D0 ; Po # OLD PERSIAN WORD DIVIDER +1056F ; Po # CAUCASIAN ALBANIAN CITATION MARK +10857 ; Po # IMPERIAL ARAMAIC SECTION SIGN +1091F ; Po # PHOENICIAN WORD SEPARATOR +1093F ; Po # LYDIAN TRIANGULAR MARK +10A50..10A58 ; Po # [9] KHAROSHTHI PUNCTUATION DOT..KHAROSHTHI PUNCTUATION LINES +10A7F ; Po # OLD SOUTH ARABIAN NUMERIC INDICATOR +10AF0..10AF6 ; Po # [7] MANICHAEAN PUNCTUATION STAR..MANICHAEAN PUNCTUATION LINE FILLER +10B39..10B3F ; Po # [7] AVESTAN ABBREVIATION MARK..LARGE ONE RING OVER TWO RINGS PUNCTUATION +10B99..10B9C ; Po # [4] PSALTER PAHLAVI SECTION MARK..PSALTER PAHLAVI FOUR DOTS WITH DOT +10F55..10F59 ; Po # [5] SOGDIAN PUNCTUATION TWO VERTICAL BARS..SOGDIAN PUNCTUATION HALF CIRCLE WITH DOT +11047..1104D ; Po # [7] BRAHMI DANDA..BRAHMI PUNCTUATION LOTUS +110BB..110BC ; Po # [2] KAITHI ABBREVIATION SIGN..KAITHI ENUMERATION SIGN +110BE..110C1 ; Po # [4] KAITHI SECTION MARK..KAITHI DOUBLE DANDA +11140..11143 ; Po # [4] CHAKMA SECTION MARK..CHAKMA QUESTION MARK +11174..11175 ; Po # [2] MAHAJANI ABBREVIATION SIGN..MAHAJANI SECTION MARK +111C5..111C8 ; Po # [4] SHARADA DANDA..SHARADA SEPARATOR +111CD ; Po # SHARADA SUTRA MARK +111DB ; Po # SHARADA SIGN SIDDHAM +111DD..111DF ; Po # [3] SHARADA CONTINUATION SIGN..SHARADA SECTION MARK-2 +11238..1123D ; Po # [6] KHOJKI DANDA..KHOJKI ABBREVIATION SIGN +112A9 ; Po # MULTANI SECTION MARK +1144B..1144F ; Po # [5] NEWA DANDA..NEWA ABBREVIATION SIGN +1145A..1145B ; Po # [2] NEWA DOUBLE COMMA..NEWA PLACEHOLDER MARK +1145D ; Po # NEWA INSERTION SIGN +114C6 ; Po # TIRHUTA ABBREVIATION SIGN +115C1..115D7 ; Po # [23] SIDDHAM SIGN SIDDHAM..SIDDHAM SECTION MARK WITH CIRCLES AND FOUR ENCLOSURES +11641..11643 ; Po # [3] MODI DANDA..MODI ABBREVIATION SIGN +11660..1166C ; Po # [13] MONGOLIAN BIRGA WITH ORNAMENT..MONGOLIAN TURNED SWIRL BIRGA WITH DOUBLE ORNAMENT +1173C..1173E ; Po # [3] AHOM SIGN SMALL SECTION..AHOM SIGN RULAI +1183B ; Po # DOGRA ABBREVIATION SIGN +11944..11946 ; Po # [3] DIVES AKURU DOUBLE DANDA..DIVES AKURU END OF TEXT MARK +119E2 ; Po # NANDINAGARI SIGN SIDDHAM +11A3F..11A46 ; Po # [8] ZANABAZAR SQUARE INITIAL HEAD MARK..ZANABAZAR SQUARE CLOSING DOUBLE-LINED HEAD MARK +11A9A..11A9C ; Po # [3] SOYOMBO MARK TSHEG..SOYOMBO MARK DOUBLE SHAD +11A9E..11AA2 ; Po # [5] SOYOMBO HEAD MARK WITH MOON AND SUN AND TRIPLE FLAME..SOYOMBO TERMINAL MARK-2 +11C41..11C45 ; Po # [5] BHAIKSUKI DANDA..BHAIKSUKI GAP FILLER-2 +11C70..11C71 ; Po # [2] MARCHEN HEAD MARK..MARCHEN MARK SHAD +11EF7..11EF8 ; Po # [2] MAKASAR PASSIMBANG..MAKASAR END OF SECTION +11FFF ; Po # TAMIL PUNCTUATION END OF TEXT +12470..12474 ; Po # [5] CUNEIFORM PUNCTUATION SIGN OLD ASSYRIAN WORD DIVIDER..CUNEIFORM PUNCTUATION SIGN DIAGONAL QUADCOLON +16A6E..16A6F ; Po # [2] MRO DANDA..MRO DOUBLE DANDA +16AF5 ; Po # BASSA VAH FULL STOP +16B37..16B3B ; Po # [5] PAHAWH HMONG SIGN VOS THOM..PAHAWH HMONG SIGN VOS FEEM +16B44 ; Po # PAHAWH HMONG SIGN XAUS +16E97..16E9A ; Po # [4] MEDEFAIDRIN COMMA..MEDEFAIDRIN EXCLAMATION OH +16FE2 ; Po # OLD CHINESE HOOK MARK +1BC9F ; Po # DUPLOYAN PUNCTUATION CHINOOK FULL STOP +1DA87..1DA8B ; Po # [5] SIGNWRITING COMMA..SIGNWRITING PARENTHESIS +1E95E..1E95F ; Po # [2] ADLAM INITIAL EXCLAMATION MARK..ADLAM INITIAL QUESTION MARK + +# Total code points: 593 + +# ================================================ + +# General_Category=Math_Symbol + +002B ; Sm # PLUS SIGN +003C..003E ; Sm # [3] LESS-THAN SIGN..GREATER-THAN SIGN +007C ; Sm # VERTICAL LINE +007E ; Sm # TILDE +00AC ; Sm # NOT SIGN +00B1 ; Sm # PLUS-MINUS SIGN +00D7 ; Sm # MULTIPLICATION SIGN +00F7 ; Sm # DIVISION SIGN +03F6 ; Sm # GREEK REVERSED LUNATE EPSILON SYMBOL +0606..0608 ; Sm # [3] ARABIC-INDIC CUBE ROOT..ARABIC RAY +2044 ; Sm # FRACTION SLASH +2052 ; Sm # COMMERCIAL MINUS SIGN +207A..207C ; Sm # [3] SUPERSCRIPT PLUS SIGN..SUPERSCRIPT EQUALS SIGN +208A..208C ; Sm # [3] SUBSCRIPT PLUS SIGN..SUBSCRIPT EQUALS SIGN +2118 ; Sm # SCRIPT CAPITAL P +2140..2144 ; Sm # [5] DOUBLE-STRUCK N-ARY SUMMATION..TURNED SANS-SERIF CAPITAL Y +214B ; Sm # TURNED AMPERSAND +2190..2194 ; Sm # [5] LEFTWARDS ARROW..LEFT RIGHT ARROW +219A..219B ; Sm # [2] LEFTWARDS ARROW WITH STROKE..RIGHTWARDS ARROW WITH STROKE +21A0 ; Sm # RIGHTWARDS TWO HEADED ARROW +21A3 ; Sm # RIGHTWARDS ARROW WITH TAIL +21A6 ; Sm # RIGHTWARDS ARROW FROM BAR +21AE ; Sm # LEFT RIGHT ARROW WITH STROKE +21CE..21CF ; Sm # [2] LEFT RIGHT DOUBLE ARROW WITH STROKE..RIGHTWARDS DOUBLE ARROW WITH STROKE +21D2 ; Sm # RIGHTWARDS DOUBLE ARROW +21D4 ; Sm # LEFT RIGHT DOUBLE ARROW +21F4..22FF ; Sm # [268] RIGHT ARROW WITH SMALL CIRCLE..Z NOTATION BAG MEMBERSHIP +2320..2321 ; Sm # [2] TOP HALF INTEGRAL..BOTTOM HALF INTEGRAL +237C ; Sm # RIGHT ANGLE WITH DOWNWARDS ZIGZAG ARROW +239B..23B3 ; Sm # [25] LEFT PARENTHESIS UPPER HOOK..SUMMATION BOTTOM +23DC..23E1 ; Sm # [6] TOP PARENTHESIS..BOTTOM TORTOISE SHELL BRACKET +25B7 ; Sm # WHITE RIGHT-POINTING TRIANGLE +25C1 ; Sm # WHITE LEFT-POINTING TRIANGLE +25F8..25FF ; Sm # [8] UPPER LEFT TRIANGLE..LOWER RIGHT TRIANGLE +266F ; Sm # MUSIC SHARP SIGN +27C0..27C4 ; Sm # [5] THREE DIMENSIONAL ANGLE..OPEN SUPERSET +27C7..27E5 ; Sm # [31] OR WITH DOT INSIDE..WHITE SQUARE WITH RIGHTWARDS TICK +27F0..27FF ; Sm # [16] UPWARDS QUADRUPLE ARROW..LONG RIGHTWARDS SQUIGGLE ARROW +2900..2982 ; Sm # [131] RIGHTWARDS TWO-HEADED ARROW WITH VERTICAL STROKE..Z NOTATION TYPE COLON +2999..29D7 ; Sm # [63] DOTTED FENCE..BLACK HOURGLASS +29DC..29FB ; Sm # [32] INCOMPLETE INFINITY..TRIPLE PLUS +29FE..2AFF ; Sm # [258] TINY..N-ARY WHITE VERTICAL BAR +2B30..2B44 ; Sm # [21] LEFT ARROW WITH SMALL CIRCLE..RIGHTWARDS ARROW THROUGH SUPERSET +2B47..2B4C ; Sm # [6] REVERSE TILDE OPERATOR ABOVE RIGHTWARDS ARROW..RIGHTWARDS ARROW ABOVE REVERSE TILDE OPERATOR +FB29 ; Sm # HEBREW LETTER ALTERNATIVE PLUS SIGN +FE62 ; Sm # SMALL PLUS SIGN +FE64..FE66 ; Sm # [3] SMALL LESS-THAN SIGN..SMALL EQUALS SIGN +FF0B ; Sm # FULLWIDTH PLUS SIGN +FF1C..FF1E ; Sm # [3] FULLWIDTH LESS-THAN SIGN..FULLWIDTH GREATER-THAN SIGN +FF5C ; Sm # FULLWIDTH VERTICAL LINE +FF5E ; Sm # FULLWIDTH TILDE +FFE2 ; Sm # FULLWIDTH NOT SIGN +FFE9..FFEC ; Sm # [4] HALFWIDTH LEFTWARDS ARROW..HALFWIDTH DOWNWARDS ARROW +1D6C1 ; Sm # MATHEMATICAL BOLD NABLA +1D6DB ; Sm # MATHEMATICAL BOLD PARTIAL DIFFERENTIAL +1D6FB ; Sm # MATHEMATICAL ITALIC NABLA +1D715 ; Sm # MATHEMATICAL ITALIC PARTIAL DIFFERENTIAL +1D735 ; Sm # MATHEMATICAL BOLD ITALIC NABLA +1D74F ; Sm # MATHEMATICAL BOLD ITALIC PARTIAL DIFFERENTIAL +1D76F ; Sm # MATHEMATICAL SANS-SERIF BOLD NABLA +1D789 ; Sm # MATHEMATICAL SANS-SERIF BOLD PARTIAL DIFFERENTIAL +1D7A9 ; Sm # MATHEMATICAL SANS-SERIF BOLD ITALIC NABLA +1D7C3 ; Sm # MATHEMATICAL SANS-SERIF BOLD ITALIC PARTIAL DIFFERENTIAL +1EEF0..1EEF1 ; Sm # [2] ARABIC MATHEMATICAL OPERATOR MEEM WITH HAH WITH TATWEEL..ARABIC MATHEMATICAL OPERATOR HAH WITH DAL + +# Total code points: 948 + +# ================================================ + +# General_Category=Currency_Symbol + +0024 ; Sc # DOLLAR SIGN +00A2..00A5 ; Sc # [4] CENT SIGN..YEN SIGN +058F ; Sc # ARMENIAN DRAM SIGN +060B ; Sc # AFGHANI SIGN +07FE..07FF ; Sc # [2] NKO DOROME SIGN..NKO TAMAN SIGN +09F2..09F3 ; Sc # [2] BENGALI RUPEE MARK..BENGALI RUPEE SIGN +09FB ; Sc # BENGALI GANDA MARK +0AF1 ; Sc # GUJARATI RUPEE SIGN +0BF9 ; Sc # TAMIL RUPEE SIGN +0E3F ; Sc # THAI CURRENCY SYMBOL BAHT +17DB ; Sc # KHMER CURRENCY SYMBOL RIEL +20A0..20BF ; Sc # [32] EURO-CURRENCY SIGN..BITCOIN SIGN +A838 ; Sc # NORTH INDIC RUPEE MARK +FDFC ; Sc # RIAL SIGN +FE69 ; Sc # SMALL DOLLAR SIGN +FF04 ; Sc # FULLWIDTH DOLLAR SIGN +FFE0..FFE1 ; Sc # [2] FULLWIDTH CENT SIGN..FULLWIDTH POUND SIGN +FFE5..FFE6 ; Sc # [2] FULLWIDTH YEN SIGN..FULLWIDTH WON SIGN +11FDD..11FE0 ; Sc # [4] TAMIL SIGN KAACU..TAMIL SIGN VARAAKAN +1E2FF ; Sc # WANCHO NGUN SIGN +1ECB0 ; Sc # INDIC SIYAQ RUPEE MARK + +# Total code points: 62 + +# ================================================ + +# General_Category=Modifier_Symbol + +005E ; Sk # CIRCUMFLEX ACCENT +0060 ; Sk # GRAVE ACCENT +00A8 ; Sk # DIAERESIS +00AF ; Sk # MACRON +00B4 ; Sk # ACUTE ACCENT +00B8 ; Sk # CEDILLA +02C2..02C5 ; Sk # [4] MODIFIER LETTER LEFT ARROWHEAD..MODIFIER LETTER DOWN ARROWHEAD +02D2..02DF ; Sk # [14] MODIFIER LETTER CENTRED RIGHT HALF RING..MODIFIER LETTER CROSS ACCENT +02E5..02EB ; Sk # [7] MODIFIER LETTER EXTRA-HIGH TONE BAR..MODIFIER LETTER YANG DEPARTING TONE MARK +02ED ; Sk # MODIFIER LETTER UNASPIRATED +02EF..02FF ; Sk # [17] MODIFIER LETTER LOW DOWN ARROWHEAD..MODIFIER LETTER LOW LEFT ARROW +0375 ; Sk # GREEK LOWER NUMERAL SIGN +0384..0385 ; Sk # [2] GREEK TONOS..GREEK DIALYTIKA TONOS +1FBD ; Sk # GREEK KORONIS +1FBF..1FC1 ; Sk # [3] GREEK PSILI..GREEK DIALYTIKA AND PERISPOMENI +1FCD..1FCF ; Sk # [3] GREEK PSILI AND VARIA..GREEK PSILI AND PERISPOMENI +1FDD..1FDF ; Sk # [3] GREEK DASIA AND VARIA..GREEK DASIA AND PERISPOMENI +1FED..1FEF ; Sk # [3] GREEK DIALYTIKA AND VARIA..GREEK VARIA +1FFD..1FFE ; Sk # [2] GREEK OXIA..GREEK DASIA +309B..309C ; Sk # [2] KATAKANA-HIRAGANA VOICED SOUND MARK..KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK +A700..A716 ; Sk # [23] MODIFIER LETTER CHINESE TONE YIN PING..MODIFIER LETTER EXTRA-LOW LEFT-STEM TONE BAR +A720..A721 ; Sk # [2] MODIFIER LETTER STRESS AND HIGH TONE..MODIFIER LETTER STRESS AND LOW TONE +A789..A78A ; Sk # [2] MODIFIER LETTER COLON..MODIFIER LETTER SHORT EQUALS SIGN +AB5B ; Sk # MODIFIER BREVE WITH INVERTED BREVE +AB6A..AB6B ; Sk # [2] MODIFIER LETTER LEFT TACK..MODIFIER LETTER RIGHT TACK +FBB2..FBC1 ; Sk # [16] ARABIC SYMBOL DOT ABOVE..ARABIC SYMBOL SMALL TAH BELOW +FF3E ; Sk # FULLWIDTH CIRCUMFLEX ACCENT +FF40 ; Sk # FULLWIDTH GRAVE ACCENT +FFE3 ; Sk # FULLWIDTH MACRON +1F3FB..1F3FF ; Sk # [5] EMOJI MODIFIER FITZPATRICK TYPE-1-2..EMOJI MODIFIER FITZPATRICK TYPE-6 + +# Total code points: 123 + +# ================================================ + +# General_Category=Other_Symbol + +00A6 ; So # BROKEN BAR +00A9 ; So # COPYRIGHT SIGN +00AE ; So # REGISTERED SIGN +00B0 ; So # DEGREE SIGN +0482 ; So # CYRILLIC THOUSANDS SIGN +058D..058E ; So # [2] RIGHT-FACING ARMENIAN ETERNITY SIGN..LEFT-FACING ARMENIAN ETERNITY SIGN +060E..060F ; So # [2] ARABIC POETIC VERSE SIGN..ARABIC SIGN MISRA +06DE ; So # ARABIC START OF RUB EL HIZB +06E9 ; So # ARABIC PLACE OF SAJDAH +06FD..06FE ; So # [2] ARABIC SIGN SINDHI AMPERSAND..ARABIC SIGN SINDHI POSTPOSITION MEN +07F6 ; So # NKO SYMBOL OO DENNEN +09FA ; So # BENGALI ISSHAR +0B70 ; So # ORIYA ISSHAR +0BF3..0BF8 ; So # [6] TAMIL DAY SIGN..TAMIL AS ABOVE SIGN +0BFA ; So # TAMIL NUMBER SIGN +0C7F ; So # TELUGU SIGN TUUMU +0D4F ; So # MALAYALAM SIGN PARA +0D79 ; So # MALAYALAM DATE MARK +0F01..0F03 ; So # [3] TIBETAN MARK GTER YIG MGO TRUNCATED A..TIBETAN MARK GTER YIG MGO -UM GTER TSHEG MA +0F13 ; So # TIBETAN MARK CARET -DZUD RTAGS ME LONG CAN +0F15..0F17 ; So # [3] TIBETAN LOGOTYPE SIGN CHAD RTAGS..TIBETAN ASTROLOGICAL SIGN SGRA GCAN -CHAR RTAGS +0F1A..0F1F ; So # [6] TIBETAN SIGN RDEL DKAR GCIG..TIBETAN SIGN RDEL DKAR RDEL NAG +0F34 ; So # TIBETAN MARK BSDUS RTAGS +0F36 ; So # TIBETAN MARK CARET -DZUD RTAGS BZHI MIG CAN +0F38 ; So # TIBETAN MARK CHE MGO +0FBE..0FC5 ; So # [8] TIBETAN KU RU KHA..TIBETAN SYMBOL RDO RJE +0FC7..0FCC ; So # [6] TIBETAN SYMBOL RDO RJE RGYA GRAM..TIBETAN SYMBOL NOR BU BZHI -KHYIL +0FCE..0FCF ; So # [2] TIBETAN SIGN RDEL NAG RDEL DKAR..TIBETAN SIGN RDEL NAG GSUM +0FD5..0FD8 ; So # [4] RIGHT-FACING SVASTI SIGN..LEFT-FACING SVASTI SIGN WITH DOTS +109E..109F ; So # [2] MYANMAR SYMBOL SHAN ONE..MYANMAR SYMBOL SHAN EXCLAMATION +1390..1399 ; So # [10] ETHIOPIC TONAL MARK YIZET..ETHIOPIC TONAL MARK KURT +166D ; So # CANADIAN SYLLABICS CHI SIGN +1940 ; So # LIMBU SIGN LOO +19DE..19FF ; So # [34] NEW TAI LUE SIGN LAE..KHMER SYMBOL DAP-PRAM ROC +1B61..1B6A ; So # [10] BALINESE MUSICAL SYMBOL DONG..BALINESE MUSICAL SYMBOL DANG GEDE +1B74..1B7C ; So # [9] BALINESE MUSICAL SYMBOL RIGHT-HAND OPEN DUG..BALINESE MUSICAL SYMBOL LEFT-HAND OPEN PING +2100..2101 ; So # [2] ACCOUNT OF..ADDRESSED TO THE SUBJECT +2103..2106 ; So # [4] DEGREE CELSIUS..CADA UNA +2108..2109 ; So # [2] SCRUPLE..DEGREE FAHRENHEIT +2114 ; So # L B BAR SYMBOL +2116..2117 ; So # [2] NUMERO SIGN..SOUND RECORDING COPYRIGHT +211E..2123 ; So # [6] PRESCRIPTION TAKE..VERSICLE +2125 ; So # OUNCE SIGN +2127 ; So # INVERTED OHM SIGN +2129 ; So # TURNED GREEK SMALL LETTER IOTA +212E ; So # ESTIMATED SYMBOL +213A..213B ; So # [2] ROTATED CAPITAL Q..FACSIMILE SIGN +214A ; So # PROPERTY LINE +214C..214D ; So # [2] PER SIGN..AKTIESELSKAB +214F ; So # SYMBOL FOR SAMARITAN SOURCE +218A..218B ; So # [2] TURNED DIGIT TWO..TURNED DIGIT THREE +2195..2199 ; So # [5] UP DOWN ARROW..SOUTH WEST ARROW +219C..219F ; So # [4] LEFTWARDS WAVE ARROW..UPWARDS TWO HEADED ARROW +21A1..21A2 ; So # [2] DOWNWARDS TWO HEADED ARROW..LEFTWARDS ARROW WITH TAIL +21A4..21A5 ; So # [2] LEFTWARDS ARROW FROM BAR..UPWARDS ARROW FROM BAR +21A7..21AD ; So # [7] DOWNWARDS ARROW FROM BAR..LEFT RIGHT WAVE ARROW +21AF..21CD ; So # [31] DOWNWARDS ZIGZAG ARROW..LEFTWARDS DOUBLE ARROW WITH STROKE +21D0..21D1 ; So # [2] LEFTWARDS DOUBLE ARROW..UPWARDS DOUBLE ARROW +21D3 ; So # DOWNWARDS DOUBLE ARROW +21D5..21F3 ; So # [31] UP DOWN DOUBLE ARROW..UP DOWN WHITE ARROW +2300..2307 ; So # [8] DIAMETER SIGN..WAVY LINE +230C..231F ; So # [20] BOTTOM RIGHT CROP..BOTTOM RIGHT CORNER +2322..2328 ; So # [7] FROWN..KEYBOARD +232B..237B ; So # [81] ERASE TO THE LEFT..NOT CHECK MARK +237D..239A ; So # [30] SHOULDERED OPEN BOX..CLEAR SCREEN SYMBOL +23B4..23DB ; So # [40] TOP SQUARE BRACKET..FUSE +23E2..2426 ; So # [69] WHITE TRAPEZIUM..SYMBOL FOR SUBSTITUTE FORM TWO +2440..244A ; So # [11] OCR HOOK..OCR DOUBLE BACKSLASH +249C..24E9 ; So # [78] PARENTHESIZED LATIN SMALL LETTER A..CIRCLED LATIN SMALL LETTER Z +2500..25B6 ; So # [183] BOX DRAWINGS LIGHT HORIZONTAL..BLACK RIGHT-POINTING TRIANGLE +25B8..25C0 ; So # [9] BLACK RIGHT-POINTING SMALL TRIANGLE..BLACK LEFT-POINTING TRIANGLE +25C2..25F7 ; So # [54] BLACK LEFT-POINTING SMALL TRIANGLE..WHITE CIRCLE WITH UPPER RIGHT QUADRANT +2600..266E ; So # [111] BLACK SUN WITH RAYS..MUSIC NATURAL SIGN +2670..2767 ; So # [248] WEST SYRIAC CROSS..ROTATED FLORAL HEART BULLET +2794..27BF ; So # [44] HEAVY WIDE-HEADED RIGHTWARDS ARROW..DOUBLE CURLY LOOP +2800..28FF ; So # [256] BRAILLE PATTERN BLANK..BRAILLE PATTERN DOTS-12345678 +2B00..2B2F ; So # [48] NORTH EAST WHITE ARROW..WHITE VERTICAL ELLIPSE +2B45..2B46 ; So # [2] LEFTWARDS QUADRUPLE ARROW..RIGHTWARDS QUADRUPLE ARROW +2B4D..2B73 ; So # [39] DOWNWARDS TRIANGLE-HEADED ZIGZAG ARROW..DOWNWARDS TRIANGLE-HEADED ARROW TO BAR +2B76..2B95 ; So # [32] NORTH WEST TRIANGLE-HEADED ARROW TO BAR..RIGHTWARDS BLACK ARROW +2B97..2BFF ; So # [105] SYMBOL FOR TYPE A ELECTRONICS..HELLSCHREIBER PAUSE SYMBOL +2CE5..2CEA ; So # [6] COPTIC SYMBOL MI RO..COPTIC SYMBOL SHIMA SIMA +2E50..2E51 ; So # [2] CROSS PATTY WITH RIGHT CROSSBAR..CROSS PATTY WITH LEFT CROSSBAR +2E80..2E99 ; So # [26] CJK RADICAL REPEAT..CJK RADICAL RAP +2E9B..2EF3 ; So # [89] CJK RADICAL CHOKE..CJK RADICAL C-SIMPLIFIED TURTLE +2F00..2FD5 ; So # [214] KANGXI RADICAL ONE..KANGXI RADICAL FLUTE +2FF0..2FFB ; So # [12] IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT..IDEOGRAPHIC DESCRIPTION CHARACTER OVERLAID +3004 ; So # JAPANESE INDUSTRIAL STANDARD SYMBOL +3012..3013 ; So # [2] POSTAL MARK..GETA MARK +3020 ; So # POSTAL MARK FACE +3036..3037 ; So # [2] CIRCLED POSTAL MARK..IDEOGRAPHIC TELEGRAPH LINE FEED SEPARATOR SYMBOL +303E..303F ; So # [2] IDEOGRAPHIC VARIATION INDICATOR..IDEOGRAPHIC HALF FILL SPACE +3190..3191 ; So # [2] IDEOGRAPHIC ANNOTATION LINKING MARK..IDEOGRAPHIC ANNOTATION REVERSE MARK +3196..319F ; So # [10] IDEOGRAPHIC ANNOTATION TOP MARK..IDEOGRAPHIC ANNOTATION MAN MARK +31C0..31E3 ; So # [36] CJK STROKE T..CJK STROKE Q +3200..321E ; So # [31] PARENTHESIZED HANGUL KIYEOK..PARENTHESIZED KOREAN CHARACTER O HU +322A..3247 ; So # [30] PARENTHESIZED IDEOGRAPH MOON..CIRCLED IDEOGRAPH KOTO +3250 ; So # PARTNERSHIP SIGN +3260..327F ; So # [32] CIRCLED HANGUL KIYEOK..KOREAN STANDARD SYMBOL +328A..32B0 ; So # [39] CIRCLED IDEOGRAPH MOON..CIRCLED IDEOGRAPH NIGHT +32C0..33FF ; So # [320] IDEOGRAPHIC TELEGRAPH SYMBOL FOR JANUARY..SQUARE GAL +4DC0..4DFF ; So # [64] HEXAGRAM FOR THE CREATIVE HEAVEN..HEXAGRAM FOR BEFORE COMPLETION +A490..A4C6 ; So # [55] YI RADICAL QOT..YI RADICAL KE +A828..A82B ; So # [4] SYLOTI NAGRI POETRY MARK-1..SYLOTI NAGRI POETRY MARK-4 +A836..A837 ; So # [2] NORTH INDIC QUARTER MARK..NORTH INDIC PLACEHOLDER MARK +A839 ; So # NORTH INDIC QUANTITY MARK +AA77..AA79 ; So # [3] MYANMAR SYMBOL AITON EXCLAMATION..MYANMAR SYMBOL AITON TWO +FDFD ; So # ARABIC LIGATURE BISMILLAH AR-RAHMAN AR-RAHEEM +FFE4 ; So # FULLWIDTH BROKEN BAR +FFE8 ; So # HALFWIDTH FORMS LIGHT VERTICAL +FFED..FFEE ; So # [2] HALFWIDTH BLACK SQUARE..HALFWIDTH WHITE CIRCLE +FFFC..FFFD ; So # [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHARACTER +10137..1013F ; So # [9] AEGEAN WEIGHT BASE UNIT..AEGEAN MEASURE THIRD SUBUNIT +10179..10189 ; So # [17] GREEK YEAR SIGN..GREEK TRYBLION BASE SIGN +1018C..1018E ; So # [3] GREEK SINUSOID SIGN..NOMISMA SIGN +10190..1019C ; So # [13] ROMAN SEXTANS SIGN..ASCIA SYMBOL +101A0 ; So # GREEK SYMBOL TAU RHO +101D0..101FC ; So # [45] PHAISTOS DISC SIGN PEDESTRIAN..PHAISTOS DISC SIGN WAVY BAND +10877..10878 ; So # [2] PALMYRENE LEFT-POINTING FLEURON..PALMYRENE RIGHT-POINTING FLEURON +10AC8 ; So # MANICHAEAN SIGN UD +1173F ; So # AHOM SYMBOL VI +11FD5..11FDC ; So # [8] TAMIL SIGN NEL..TAMIL SIGN MUKKURUNI +11FE1..11FF1 ; So # [17] TAMIL SIGN PAARAM..TAMIL SIGN VAKAIYARAA +16B3C..16B3F ; So # [4] PAHAWH HMONG SIGN XYEEM NTXIV..PAHAWH HMONG SIGN XYEEM FAIB +16B45 ; So # PAHAWH HMONG SIGN CIM TSOV ROG +1BC9C ; So # DUPLOYAN SIGN O WITH CROSS +1D000..1D0F5 ; So # [246] BYZANTINE MUSICAL SYMBOL PSILI..BYZANTINE MUSICAL SYMBOL GORGON NEO KATO +1D100..1D126 ; So # [39] MUSICAL SYMBOL SINGLE BARLINE..MUSICAL SYMBOL DRUM CLEF-2 +1D129..1D164 ; So # [60] MUSICAL SYMBOL MULTIPLE MEASURE REST..MUSICAL SYMBOL ONE HUNDRED TWENTY-EIGHTH NOTE +1D16A..1D16C ; So # [3] MUSICAL SYMBOL FINGERED TREMOLO-1..MUSICAL SYMBOL FINGERED TREMOLO-3 +1D183..1D184 ; So # [2] MUSICAL SYMBOL ARPEGGIATO UP..MUSICAL SYMBOL ARPEGGIATO DOWN +1D18C..1D1A9 ; So # [30] MUSICAL SYMBOL RINFORZANDO..MUSICAL SYMBOL DEGREE SLASH +1D1AE..1D1E8 ; So # [59] MUSICAL SYMBOL PEDAL MARK..MUSICAL SYMBOL KIEVAN FLAT SIGN +1D200..1D241 ; So # [66] GREEK VOCAL NOTATION SYMBOL-1..GREEK INSTRUMENTAL NOTATION SYMBOL-54 +1D245 ; So # GREEK MUSICAL LEIMMA +1D300..1D356 ; So # [87] MONOGRAM FOR EARTH..TETRAGRAM FOR FOSTERING +1D800..1D9FF ; So # [512] SIGNWRITING HAND-FIST INDEX..SIGNWRITING HEAD +1DA37..1DA3A ; So # [4] SIGNWRITING AIR BLOW SMALL ROTATIONS..SIGNWRITING BREATH EXHALE +1DA6D..1DA74 ; So # [8] SIGNWRITING SHOULDER HIP SPINE..SIGNWRITING TORSO-FLOORPLANE TWISTING +1DA76..1DA83 ; So # [14] SIGNWRITING LIMB COMBINATION..SIGNWRITING LOCATION DEPTH +1DA85..1DA86 ; So # [2] SIGNWRITING LOCATION TORSO..SIGNWRITING LOCATION LIMBS DIGITS +1E14F ; So # NYIAKENG PUACHUE HMONG CIRCLED CA +1ECAC ; So # INDIC SIYAQ PLACEHOLDER +1ED2E ; So # OTTOMAN SIYAQ MARRATAN +1F000..1F02B ; So # [44] MAHJONG TILE EAST WIND..MAHJONG TILE BACK +1F030..1F093 ; So # [100] DOMINO TILE HORIZONTAL BACK..DOMINO TILE VERTICAL-06-06 +1F0A0..1F0AE ; So # [15] PLAYING CARD BACK..PLAYING CARD KING OF SPADES +1F0B1..1F0BF ; So # [15] PLAYING CARD ACE OF HEARTS..PLAYING CARD RED JOKER +1F0C1..1F0CF ; So # [15] PLAYING CARD ACE OF DIAMONDS..PLAYING CARD BLACK JOKER +1F0D1..1F0F5 ; So # [37] PLAYING CARD ACE OF CLUBS..PLAYING CARD TRUMP-21 +1F10D..1F1AD ; So # [161] CIRCLED ZERO WITH SLASH..MASK WORK SYMBOL +1F1E6..1F202 ; So # [29] REGIONAL INDICATOR SYMBOL LETTER A..SQUARED KATAKANA SA +1F210..1F23B ; So # [44] SQUARED CJK UNIFIED IDEOGRAPH-624B..SQUARED CJK UNIFIED IDEOGRAPH-914D +1F240..1F248 ; So # [9] TORTOISE SHELL BRACKETED CJK UNIFIED IDEOGRAPH-672C..TORTOISE SHELL BRACKETED CJK UNIFIED IDEOGRAPH-6557 +1F250..1F251 ; So # [2] CIRCLED IDEOGRAPH ADVANTAGE..CIRCLED IDEOGRAPH ACCEPT +1F260..1F265 ; So # [6] ROUNDED SYMBOL FOR FU..ROUNDED SYMBOL FOR CAI +1F300..1F3FA ; So # [251] CYCLONE..AMPHORA +1F400..1F6D7 ; So # [728] RAT..ELEVATOR +1F6E0..1F6EC ; So # [13] HAMMER AND WRENCH..AIRPLANE ARRIVING +1F6F0..1F6FC ; So # [13] SATELLITE..ROLLER SKATE +1F700..1F773 ; So # [116] ALCHEMICAL SYMBOL FOR QUINTESSENCE..ALCHEMICAL SYMBOL FOR HALF OUNCE +1F780..1F7D8 ; So # [89] BLACK LEFT-POINTING ISOSCELES RIGHT TRIANGLE..NEGATIVE CIRCLED SQUARE +1F7E0..1F7EB ; So # [12] LARGE ORANGE CIRCLE..LARGE BROWN SQUARE +1F800..1F80B ; So # [12] LEFTWARDS ARROW WITH SMALL TRIANGLE ARROWHEAD..DOWNWARDS ARROW WITH LARGE TRIANGLE ARROWHEAD +1F810..1F847 ; So # [56] LEFTWARDS ARROW WITH SMALL EQUILATERAL ARROWHEAD..DOWNWARDS HEAVY ARROW +1F850..1F859 ; So # [10] LEFTWARDS SANS-SERIF ARROW..UP DOWN SANS-SERIF ARROW +1F860..1F887 ; So # [40] WIDE-HEADED LEFTWARDS LIGHT BARB ARROW..WIDE-HEADED SOUTH WEST VERY HEAVY BARB ARROW +1F890..1F8AD ; So # [30] LEFTWARDS TRIANGLE ARROWHEAD..WHITE ARROW SHAFT WIDTH TWO THIRDS +1F8B0..1F8B1 ; So # [2] ARROW POINTING UPWARDS THEN NORTH WEST..ARROW POINTING RIGHTWARDS THEN CURVING SOUTH WEST +1F900..1F978 ; So # [121] CIRCLED CROSS FORMEE WITH FOUR DOTS..DISGUISED FACE +1F97A..1F9CB ; So # [82] FACE WITH PLEADING EYES..BUBBLE TEA +1F9CD..1FA53 ; So # [135] STANDING PERSON..BLACK CHESS KNIGHT-BISHOP +1FA60..1FA6D ; So # [14] XIANGQI RED GENERAL..XIANGQI BLACK SOLDIER +1FA70..1FA74 ; So # [5] BALLET SHOES..THONG SANDAL +1FA78..1FA7A ; So # [3] DROP OF BLOOD..STETHOSCOPE +1FA80..1FA86 ; So # [7] YO-YO..NESTING DOLLS +1FA90..1FAA8 ; So # [25] RINGED PLANET..ROCK +1FAB0..1FAB6 ; So # [7] FLY..FEATHER +1FAC0..1FAC2 ; So # [3] ANATOMICAL HEART..PEOPLE HUGGING +1FAD0..1FAD6 ; So # [7] BLUEBERRIES..TEAPOT +1FB00..1FB92 ; So # [147] BLOCK SEXTANT-1..UPPER HALF INVERSE MEDIUM SHADE AND LOWER HALF BLOCK +1FB94..1FBCA ; So # [55] LEFT HALF INVERSE MEDIUM SHADE AND RIGHT HALF BLOCK..WHITE UP-POINTING CHEVRON + +# Total code points: 6431 + +# ================================================ + +# General_Category=Initial_Punctuation + +00AB ; Pi # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK +2018 ; Pi # LEFT SINGLE QUOTATION MARK +201B..201C ; Pi # [2] SINGLE HIGH-REVERSED-9 QUOTATION MARK..LEFT DOUBLE QUOTATION MARK +201F ; Pi # DOUBLE HIGH-REVERSED-9 QUOTATION MARK +2039 ; Pi # SINGLE LEFT-POINTING ANGLE QUOTATION MARK +2E02 ; Pi # LEFT SUBSTITUTION BRACKET +2E04 ; Pi # LEFT DOTTED SUBSTITUTION BRACKET +2E09 ; Pi # LEFT TRANSPOSITION BRACKET +2E0C ; Pi # LEFT RAISED OMISSION BRACKET +2E1C ; Pi # LEFT LOW PARAPHRASE BRACKET +2E20 ; Pi # LEFT VERTICAL BAR WITH QUILL + +# Total code points: 12 + +# ================================================ + +# General_Category=Final_Punctuation + +00BB ; Pf # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK +2019 ; Pf # RIGHT SINGLE QUOTATION MARK +201D ; Pf # RIGHT DOUBLE QUOTATION MARK +203A ; Pf # SINGLE RIGHT-POINTING ANGLE QUOTATION MARK +2E03 ; Pf # RIGHT SUBSTITUTION BRACKET +2E05 ; Pf # RIGHT DOTTED SUBSTITUTION BRACKET +2E0A ; Pf # RIGHT TRANSPOSITION BRACKET +2E0D ; Pf # RIGHT RAISED OMISSION BRACKET +2E1D ; Pf # RIGHT LOW PARAPHRASE BRACKET +2E21 ; Pf # RIGHT VERTICAL BAR WITH QUILL + +# Total code points: 10 + +# EOF diff --git a/tdemarkdown/md4c/src/CMakeLists.txt b/tdemarkdown/md4c/src/CMakeLists.txt new file mode 100644 index 000000000..66f2f5013 --- /dev/null +++ b/tdemarkdown/md4c/src/CMakeLists.txt @@ -0,0 +1,56 @@ + +set(CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS 1) +set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} -DDEBUG") + + +# Build rules for MD4C parser library + +configure_file(md4c.pc.in md4c.pc @ONLY) +add_library(md4c md4c.c md4c.h) +if(CMAKE_C_COMPILER_ID MATCHES "Clang|GNU") + target_compile_options(md4c PRIVATE -Wall -Wextra) +endif() +set_target_properties(md4c PROPERTIES + COMPILE_FLAGS "-DMD4C_USE_UTF8" + VERSION ${MD_VERSION} + SOVERSION ${MD_VERSION_MAJOR} + PUBLIC_HEADER md4c.h +) + +# Build rules for HTML renderer library + +configure_file(md4c-html.pc.in md4c-html.pc @ONLY) +add_library(md4c-html md4c-html.c md4c-html.h entity.c entity.h) +set_target_properties(md4c-html PROPERTIES + VERSION ${MD_VERSION} + SOVERSION ${MD_VERSION_MAJOR} + PUBLIC_HEADER md4c-html.h +) +target_link_libraries(md4c-html md4c) + + +# Install rules + +install( + TARGETS md4c + EXPORT md4cConfig + ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR} + LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR} + RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} + PUBLIC_HEADER DESTINATION ${CMAKE_INSTALL_INCLUDEDIR} + INCLUDES DESTINATION ${CMAKE_INSTALL_INCLUDEDIR} +) +install(FILES ${CMAKE_BINARY_DIR}/src/md4c.pc DESTINATION ${CMAKE_INSTALL_LIBDIR}/pkgconfig) + +install( + TARGETS md4c-html + EXPORT md4cConfig + ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR} + LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR} + RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} + PUBLIC_HEADER DESTINATION ${CMAKE_INSTALL_INCLUDEDIR} +) +install(FILES ${CMAKE_BINARY_DIR}/src/md4c-html.pc DESTINATION ${CMAKE_INSTALL_LIBDIR}/pkgconfig) + +install(EXPORT md4cConfig DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/md4c/ NAMESPACE md4c::) + diff --git a/tdemarkdown/md4c/src/entity.c b/tdemarkdown/md4c/src/entity.c new file mode 100644 index 000000000..9991ca17a --- /dev/null +++ b/tdemarkdown/md4c/src/entity.c @@ -0,0 +1,2190 @@ +/* + * MD4C: Markdown parser for C + * (http://github.com/mity/md4c) + * + * Copyright (c) 2016-2017 Martin Mitas + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#include "entity.h" +#include <string.h> + + +/* The table is generated from https://html.spec.whatwg.org/entities.json */ +static const struct entity entity_table[] = { + { "Æ", { 198, 0 } }, + { "&", { 38, 0 } }, + { "Á", { 193, 0 } }, + { "Ă", { 258, 0 } }, + { "Â", { 194, 0 } }, + { "А", { 1040, 0 } }, + { "𝔄", { 120068, 0 } }, + { "À", { 192, 0 } }, + { "Α", { 913, 0 } }, + { "Ā", { 256, 0 } }, + { "⩓", { 10835, 0 } }, + { "Ą", { 260, 0 } }, + { "𝔸", { 120120, 0 } }, + { "⁡", { 8289, 0 } }, + { "Å", { 197, 0 } }, + { "𝒜", { 119964, 0 } }, + { "≔", { 8788, 0 } }, + { "Ã", { 195, 0 } }, + { "Ä", { 196, 0 } }, + { "∖", { 8726, 0 } }, + { "⫧", { 10983, 0 } }, + { "⌆", { 8966, 0 } }, + { "Б", { 1041, 0 } }, + { "∵", { 8757, 0 } }, + { "ℬ", { 8492, 0 } }, + { "Β", { 914, 0 } }, + { "𝔅", { 120069, 0 } }, + { "𝔹", { 120121, 0 } }, + { "˘", { 728, 0 } }, + { "ℬ", { 8492, 0 } }, + { "≎", { 8782, 0 } }, + { "Ч", { 1063, 0 } }, + { "©", { 169, 0 } }, + { "Ć", { 262, 0 } }, + { "⋒", { 8914, 0 } }, + { "ⅅ", { 8517, 0 } }, + { "ℭ", { 8493, 0 } }, + { "Č", { 268, 0 } }, + { "Ç", { 199, 0 } }, + { "Ĉ", { 264, 0 } }, + { "∰", { 8752, 0 } }, + { "Ċ", { 266, 0 } }, + { "¸", { 184, 0 } }, + { "·", { 183, 0 } }, + { "ℭ", { 8493, 0 } }, + { "Χ", { 935, 0 } }, + { "⊙", { 8857, 0 } }, + { "⊖", { 8854, 0 } }, + { "⊕", { 8853, 0 } }, + { "⊗", { 8855, 0 } }, + { "∲", { 8754, 0 } }, + { "”", { 8221, 0 } }, + { "’", { 8217, 0 } }, + { "∷", { 8759, 0 } }, + { "⩴", { 10868, 0 } }, + { "≡", { 8801, 0 } }, + { "∯", { 8751, 0 } }, + { "∮", { 8750, 0 } }, + { "ℂ", { 8450, 0 } }, + { "∐", { 8720, 0 } }, + { "∳", { 8755, 0 } }, + { "⨯", { 10799, 0 } }, + { "𝒞", { 119966, 0 } }, + { "⋓", { 8915, 0 } }, + { "≍", { 8781, 0 } }, + { "ⅅ", { 8517, 0 } }, + { "⤑", { 10513, 0 } }, + { "Ђ", { 1026, 0 } }, + { "Ѕ", { 1029, 0 } }, + { "Џ", { 1039, 0 } }, + { "‡", { 8225, 0 } }, + { "↡", { 8609, 0 } }, + { "⫤", { 10980, 0 } }, + { "Ď", { 270, 0 } }, + { "Д", { 1044, 0 } }, + { "∇", { 8711, 0 } }, + { "Δ", { 916, 0 } }, + { "𝔇", { 120071, 0 } }, + { "´", { 180, 0 } }, + { "˙", { 729, 0 } }, + { "˝", { 733, 0 } }, + { "`", { 96, 0 } }, + { "˜", { 732, 0 } }, + { "⋄", { 8900, 0 } }, + { "ⅆ", { 8518, 0 } }, + { "𝔻", { 120123, 0 } }, + { "¨", { 168, 0 } }, + { "⃜", { 8412, 0 } }, + { "≐", { 8784, 0 } }, + { "∯", { 8751, 0 } }, + { "¨", { 168, 0 } }, + { "⇓", { 8659, 0 } }, + { "⇐", { 8656, 0 } }, + { "⇔", { 8660, 0 } }, + { "⫤", { 10980, 0 } }, + { "⟸", { 10232, 0 } }, + { "⟺", { 10234, 0 } }, + { "⟹", { 10233, 0 } }, + { "⇒", { 8658, 0 } }, + { "⊨", { 8872, 0 } }, + { "⇑", { 8657, 0 } }, + { "⇕", { 8661, 0 } }, + { "∥", { 8741, 0 } }, + { "↓", { 8595, 0 } }, + { "⤓", { 10515, 0 } }, + { "⇵", { 8693, 0 } }, + { "̑", { 785, 0 } }, + { "⥐", { 10576, 0 } }, + { "⥞", { 10590, 0 } }, + { "↽", { 8637, 0 } }, + { "⥖", { 10582, 0 } }, + { "⥟", { 10591, 0 } }, + { "⇁", { 8641, 0 } }, + { "⥗", { 10583, 0 } }, + { "⊤", { 8868, 0 } }, + { "↧", { 8615, 0 } }, + { "⇓", { 8659, 0 } }, + { "𝒟", { 119967, 0 } }, + { "Đ", { 272, 0 } }, + { "Ŋ", { 330, 0 } }, + { "Ð", { 208, 0 } }, + { "É", { 201, 0 } }, + { "Ě", { 282, 0 } }, + { "Ê", { 202, 0 } }, + { "Э", { 1069, 0 } }, + { "Ė", { 278, 0 } }, + { "𝔈", { 120072, 0 } }, + { "È", { 200, 0 } }, + { "∈", { 8712, 0 } }, + { "Ē", { 274, 0 } }, + { "◻", { 9723, 0 } }, + { "▫", { 9643, 0 } }, + { "Ę", { 280, 0 } }, + { "𝔼", { 120124, 0 } }, + { "Ε", { 917, 0 } }, + { "⩵", { 10869, 0 } }, + { "≂", { 8770, 0 } }, + { "⇌", { 8652, 0 } }, + { "ℰ", { 8496, 0 } }, + { "⩳", { 10867, 0 } }, + { "Η", { 919, 0 } }, + { "Ë", { 203, 0 } }, + { "∃", { 8707, 0 } }, + { "ⅇ", { 8519, 0 } }, + { "Ф", { 1060, 0 } }, + { "𝔉", { 120073, 0 } }, + { "◼", { 9724, 0 } }, + { "▪", { 9642, 0 } }, + { "𝔽", { 120125, 0 } }, + { "∀", { 8704, 0 } }, + { "ℱ", { 8497, 0 } }, + { "ℱ", { 8497, 0 } }, + { "Ѓ", { 1027, 0 } }, + { ">", { 62, 0 } }, + { "Γ", { 915, 0 } }, + { "Ϝ", { 988, 0 } }, + { "Ğ", { 286, 0 } }, + { "Ģ", { 290, 0 } }, + { "Ĝ", { 284, 0 } }, + { "Г", { 1043, 0 } }, + { "Ġ", { 288, 0 } }, + { "𝔊", { 120074, 0 } }, + { "⋙", { 8921, 0 } }, + { "𝔾", { 120126, 0 } }, + { "≥", { 8805, 0 } }, + { "⋛", { 8923, 0 } }, + { "≧", { 8807, 0 } }, + { "⪢", { 10914, 0 } }, + { "≷", { 8823, 0 } }, + { "⩾", { 10878, 0 } }, + { "≳", { 8819, 0 } }, + { "𝒢", { 119970, 0 } }, + { "≫", { 8811, 0 } }, + { "Ъ", { 1066, 0 } }, + { "ˇ", { 711, 0 } }, + { "^", { 94, 0 } }, + { "Ĥ", { 292, 0 } }, + { "ℌ", { 8460, 0 } }, + { "ℋ", { 8459, 0 } }, + { "ℍ", { 8461, 0 } }, + { "─", { 9472, 0 } }, + { "ℋ", { 8459, 0 } }, + { "Ħ", { 294, 0 } }, + { "≎", { 8782, 0 } }, + { "≏", { 8783, 0 } }, + { "Е", { 1045, 0 } }, + { "IJ", { 306, 0 } }, + { "Ё", { 1025, 0 } }, + { "Í", { 205, 0 } }, + { "Î", { 206, 0 } }, + { "И", { 1048, 0 } }, + { "İ", { 304, 0 } }, + { "ℑ", { 8465, 0 } }, + { "Ì", { 204, 0 } }, + { "ℑ", { 8465, 0 } }, + { "Ī", { 298, 0 } }, + { "ⅈ", { 8520, 0 } }, + { "⇒", { 8658, 0 } }, + { "∬", { 8748, 0 } }, + { "∫", { 8747, 0 } }, + { "⋂", { 8898, 0 } }, + { "⁣", { 8291, 0 } }, + { "⁢", { 8290, 0 } }, + { "Į", { 302, 0 } }, + { "𝕀", { 120128, 0 } }, + { "Ι", { 921, 0 } }, + { "ℐ", { 8464, 0 } }, + { "Ĩ", { 296, 0 } }, + { "І", { 1030, 0 } }, + { "Ï", { 207, 0 } }, + { "Ĵ", { 308, 0 } }, + { "Й", { 1049, 0 } }, + { "𝔍", { 120077, 0 } }, + { "𝕁", { 120129, 0 } }, + { "𝒥", { 119973, 0 } }, + { "Ј", { 1032, 0 } }, + { "Є", { 1028, 0 } }, + { "Х", { 1061, 0 } }, + { "Ќ", { 1036, 0 } }, + { "Κ", { 922, 0 } }, + { "Ķ", { 310, 0 } }, + { "К", { 1050, 0 } }, + { "𝔎", { 120078, 0 } }, + { "𝕂", { 120130, 0 } }, + { "𝒦", { 119974, 0 } }, + { "Љ", { 1033, 0 } }, + { "<", { 60, 0 } }, + { "Ĺ", { 313, 0 } }, + { "Λ", { 923, 0 } }, + { "⟪", { 10218, 0 } }, + { "ℒ", { 8466, 0 } }, + { "↞", { 8606, 0 } }, + { "Ľ", { 317, 0 } }, + { "Ļ", { 315, 0 } }, + { "Л", { 1051, 0 } }, + { "⟨", { 10216, 0 } }, + { "←", { 8592, 0 } }, + { "⇤", { 8676, 0 } }, + { "⇆", { 8646, 0 } }, + { "⌈", { 8968, 0 } }, + { "⟦", { 10214, 0 } }, + { "⥡", { 10593, 0 } }, + { "⇃", { 8643, 0 } }, + { "⥙", { 10585, 0 } }, + { "⌊", { 8970, 0 } }, + { "↔", { 8596, 0 } }, + { "⥎", { 10574, 0 } }, + { "⊣", { 8867, 0 } }, + { "↤", { 8612, 0 } }, + { "⥚", { 10586, 0 } }, + { "⊲", { 8882, 0 } }, + { "⧏", { 10703, 0 } }, + { "⊴", { 8884, 0 } }, + { "⥑", { 10577, 0 } }, + { "⥠", { 10592, 0 } }, + { "↿", { 8639, 0 } }, + { "⥘", { 10584, 0 } }, + { "↼", { 8636, 0 } }, + { "⥒", { 10578, 0 } }, + { "⇐", { 8656, 0 } }, + { "⇔", { 8660, 0 } }, + { "⋚", { 8922, 0 } }, + { "≦", { 8806, 0 } }, + { "≶", { 8822, 0 } }, + { "⪡", { 10913, 0 } }, + { "⩽", { 10877, 0 } }, + { "≲", { 8818, 0 } }, + { "𝔏", { 120079, 0 } }, + { "⋘", { 8920, 0 } }, + { "⇚", { 8666, 0 } }, + { "Ŀ", { 319, 0 } }, + { "⟵", { 10229, 0 } }, + { "⟷", { 10231, 0 } }, + { "⟶", { 10230, 0 } }, + { "⟸", { 10232, 0 } }, + { "⟺", { 10234, 0 } }, + { "⟹", { 10233, 0 } }, + { "𝕃", { 120131, 0 } }, + { "↙", { 8601, 0 } }, + { "↘", { 8600, 0 } }, + { "ℒ", { 8466, 0 } }, + { "↰", { 8624, 0 } }, + { "Ł", { 321, 0 } }, + { "≪", { 8810, 0 } }, + { "⤅", { 10501, 0 } }, + { "М", { 1052, 0 } }, + { " ", { 8287, 0 } }, + { "ℳ", { 8499, 0 } }, + { "𝔐", { 120080, 0 } }, + { "∓", { 8723, 0 } }, + { "𝕄", { 120132, 0 } }, + { "ℳ", { 8499, 0 } }, + { "Μ", { 924, 0 } }, + { "Њ", { 1034, 0 } }, + { "Ń", { 323, 0 } }, + { "Ň", { 327, 0 } }, + { "Ņ", { 325, 0 } }, + { "Н", { 1053, 0 } }, + { "​", { 8203, 0 } }, + { "​", { 8203, 0 } }, + { "​", { 8203, 0 } }, + { "​", { 8203, 0 } }, + { "≫", { 8811, 0 } }, + { "≪", { 8810, 0 } }, + { "
", { 10, 0 } }, + { "𝔑", { 120081, 0 } }, + { "⁠", { 8288, 0 } }, + { " ", { 160, 0 } }, + { "ℕ", { 8469, 0 } }, + { "⫬", { 10988, 0 } }, + { "≢", { 8802, 0 } }, + { "≭", { 8813, 0 } }, + { "∦", { 8742, 0 } }, + { "∉", { 8713, 0 } }, + { "≠", { 8800, 0 } }, + { "≂̸", { 8770, 824 } }, + { "∄", { 8708, 0 } }, + { "≯", { 8815, 0 } }, + { "≱", { 8817, 0 } }, + { "≧̸", { 8807, 824 } }, + { "≫̸", { 8811, 824 } }, + { "≹", { 8825, 0 } }, + { "⩾̸", { 10878, 824 } }, + { "≵", { 8821, 0 } }, + { "≎̸", { 8782, 824 } }, + { "≏̸", { 8783, 824 } }, + { "⋪", { 8938, 0 } }, + { "⧏̸", { 10703, 824 } }, + { "⋬", { 8940, 0 } }, + { "≮", { 8814, 0 } }, + { "≰", { 8816, 0 } }, + { "≸", { 8824, 0 } }, + { "≪̸", { 8810, 824 } }, + { "⩽̸", { 10877, 824 } }, + { "≴", { 8820, 0 } }, + { "⪢̸", { 10914, 824 } }, + { "⪡̸", { 10913, 824 } }, + { "⊀", { 8832, 0 } }, + { "⪯̸", { 10927, 824 } }, + { "⋠", { 8928, 0 } }, + { "∌", { 8716, 0 } }, + { "⋫", { 8939, 0 } }, + { "⧐̸", { 10704, 824 } }, + { "⋭", { 8941, 0 } }, + { "⊏̸", { 8847, 824 } }, + { "⋢", { 8930, 0 } }, + { "⊐̸", { 8848, 824 } }, + { "⋣", { 8931, 0 } }, + { "⊂⃒", { 8834, 8402 } }, + { "⊈", { 8840, 0 } }, + { "⊁", { 8833, 0 } }, + { "⪰̸", { 10928, 824 } }, + { "⋡", { 8929, 0 } }, + { "≿̸", { 8831, 824 } }, + { "⊃⃒", { 8835, 8402 } }, + { "⊉", { 8841, 0 } }, + { "≁", { 8769, 0 } }, + { "≄", { 8772, 0 } }, + { "≇", { 8775, 0 } }, + { "≉", { 8777, 0 } }, + { "∤", { 8740, 0 } }, + { "𝒩", { 119977, 0 } }, + { "Ñ", { 209, 0 } }, + { "Ν", { 925, 0 } }, + { "Œ", { 338, 0 } }, + { "Ó", { 211, 0 } }, + { "Ô", { 212, 0 } }, + { "О", { 1054, 0 } }, + { "Ő", { 336, 0 } }, + { "𝔒", { 120082, 0 } }, + { "Ò", { 210, 0 } }, + { "Ō", { 332, 0 } }, + { "Ω", { 937, 0 } }, + { "Ο", { 927, 0 } }, + { "𝕆", { 120134, 0 } }, + { "“", { 8220, 0 } }, + { "‘", { 8216, 0 } }, + { "⩔", { 10836, 0 } }, + { "𝒪", { 119978, 0 } }, + { "Ø", { 216, 0 } }, + { "Õ", { 213, 0 } }, + { "⨷", { 10807, 0 } }, + { "Ö", { 214, 0 } }, + { "‾", { 8254, 0 } }, + { "⏞", { 9182, 0 } }, + { "⎴", { 9140, 0 } }, + { "⏜", { 9180, 0 } }, + { "∂", { 8706, 0 } }, + { "П", { 1055, 0 } }, + { "𝔓", { 120083, 0 } }, + { "Φ", { 934, 0 } }, + { "Π", { 928, 0 } }, + { "±", { 177, 0 } }, + { "ℌ", { 8460, 0 } }, + { "ℙ", { 8473, 0 } }, + { "⪻", { 10939, 0 } }, + { "≺", { 8826, 0 } }, + { "⪯", { 10927, 0 } }, + { "≼", { 8828, 0 } }, + { "≾", { 8830, 0 } }, + { "″", { 8243, 0 } }, + { "∏", { 8719, 0 } }, + { "∷", { 8759, 0 } }, + { "∝", { 8733, 0 } }, + { "𝒫", { 119979, 0 } }, + { "Ψ", { 936, 0 } }, + { """, { 34, 0 } }, + { "𝔔", { 120084, 0 } }, + { "ℚ", { 8474, 0 } }, + { "𝒬", { 119980, 0 } }, + { "⤐", { 10512, 0 } }, + { "®", { 174, 0 } }, + { "Ŕ", { 340, 0 } }, + { "⟫", { 10219, 0 } }, + { "↠", { 8608, 0 } }, + { "⤖", { 10518, 0 } }, + { "Ř", { 344, 0 } }, + { "Ŗ", { 342, 0 } }, + { "Р", { 1056, 0 } }, + { "ℜ", { 8476, 0 } }, + { "∋", { 8715, 0 } }, + { "⇋", { 8651, 0 } }, + { "⥯", { 10607, 0 } }, + { "ℜ", { 8476, 0 } }, + { "Ρ", { 929, 0 } }, + { "⟩", { 10217, 0 } }, + { "→", { 8594, 0 } }, + { "⇥", { 8677, 0 } }, + { "⇄", { 8644, 0 } }, + { "⌉", { 8969, 0 } }, + { "⟧", { 10215, 0 } }, + { "⥝", { 10589, 0 } }, + { "⇂", { 8642, 0 } }, + { "⥕", { 10581, 0 } }, + { "⌋", { 8971, 0 } }, + { "⊢", { 8866, 0 } }, + { "↦", { 8614, 0 } }, + { "⥛", { 10587, 0 } }, + { "⊳", { 8883, 0 } }, + { "⧐", { 10704, 0 } }, + { "⊵", { 8885, 0 } }, + { "⥏", { 10575, 0 } }, + { "⥜", { 10588, 0 } }, + { "↾", { 8638, 0 } }, + { "⥔", { 10580, 0 } }, + { "⇀", { 8640, 0 } }, + { "⥓", { 10579, 0 } }, + { "⇒", { 8658, 0 } }, + { "ℝ", { 8477, 0 } }, + { "⥰", { 10608, 0 } }, + { "⇛", { 8667, 0 } }, + { "ℛ", { 8475, 0 } }, + { "↱", { 8625, 0 } }, + { "⧴", { 10740, 0 } }, + { "Щ", { 1065, 0 } }, + { "Ш", { 1064, 0 } }, + { "Ь", { 1068, 0 } }, + { "Ś", { 346, 0 } }, + { "⪼", { 10940, 0 } }, + { "Š", { 352, 0 } }, + { "Ş", { 350, 0 } }, + { "Ŝ", { 348, 0 } }, + { "С", { 1057, 0 } }, + { "𝔖", { 120086, 0 } }, + { "↓", { 8595, 0 } }, + { "←", { 8592, 0 } }, + { "→", { 8594, 0 } }, + { "↑", { 8593, 0 } }, + { "Σ", { 931, 0 } }, + { "∘", { 8728, 0 } }, + { "𝕊", { 120138, 0 } }, + { "√", { 8730, 0 } }, + { "□", { 9633, 0 } }, + { "⊓", { 8851, 0 } }, + { "⊏", { 8847, 0 } }, + { "⊑", { 8849, 0 } }, + { "⊐", { 8848, 0 } }, + { "⊒", { 8850, 0 } }, + { "⊔", { 8852, 0 } }, + { "𝒮", { 119982, 0 } }, + { "⋆", { 8902, 0 } }, + { "⋐", { 8912, 0 } }, + { "⋐", { 8912, 0 } }, + { "⊆", { 8838, 0 } }, + { "≻", { 8827, 0 } }, + { "⪰", { 10928, 0 } }, + { "≽", { 8829, 0 } }, + { "≿", { 8831, 0 } }, + { "∋", { 8715, 0 } }, + { "∑", { 8721, 0 } }, + { "⋑", { 8913, 0 } }, + { "⊃", { 8835, 0 } }, + { "⊇", { 8839, 0 } }, + { "⋑", { 8913, 0 } }, + { "Þ", { 222, 0 } }, + { "™", { 8482, 0 } }, + { "Ћ", { 1035, 0 } }, + { "Ц", { 1062, 0 } }, + { "	", { 9, 0 } }, + { "Τ", { 932, 0 } }, + { "Ť", { 356, 0 } }, + { "Ţ", { 354, 0 } }, + { "Т", { 1058, 0 } }, + { "𝔗", { 120087, 0 } }, + { "∴", { 8756, 0 } }, + { "Θ", { 920, 0 } }, + { "  ", { 8287, 8202 } }, + { " ", { 8201, 0 } }, + { "∼", { 8764, 0 } }, + { "≃", { 8771, 0 } }, + { "≅", { 8773, 0 } }, + { "≈", { 8776, 0 } }, + { "𝕋", { 120139, 0 } }, + { "⃛", { 8411, 0 } }, + { "𝒯", { 119983, 0 } }, + { "Ŧ", { 358, 0 } }, + { "Ú", { 218, 0 } }, + { "↟", { 8607, 0 } }, + { "⥉", { 10569, 0 } }, + { "Ў", { 1038, 0 } }, + { "Ŭ", { 364, 0 } }, + { "Û", { 219, 0 } }, + { "У", { 1059, 0 } }, + { "Ű", { 368, 0 } }, + { "𝔘", { 120088, 0 } }, + { "Ù", { 217, 0 } }, + { "Ū", { 362, 0 } }, + { "_", { 95, 0 } }, + { "⏟", { 9183, 0 } }, + { "⎵", { 9141, 0 } }, + { "⏝", { 9181, 0 } }, + { "⋃", { 8899, 0 } }, + { "⊎", { 8846, 0 } }, + { "Ų", { 370, 0 } }, + { "𝕌", { 120140, 0 } }, + { "↑", { 8593, 0 } }, + { "⤒", { 10514, 0 } }, + { "⇅", { 8645, 0 } }, + { "↕", { 8597, 0 } }, + { "⥮", { 10606, 0 } }, + { "⊥", { 8869, 0 } }, + { "↥", { 8613, 0 } }, + { "⇑", { 8657, 0 } }, + { "⇕", { 8661, 0 } }, + { "↖", { 8598, 0 } }, + { "↗", { 8599, 0 } }, + { "ϒ", { 978, 0 } }, + { "Υ", { 933, 0 } }, + { "Ů", { 366, 0 } }, + { "𝒰", { 119984, 0 } }, + { "Ũ", { 360, 0 } }, + { "Ü", { 220, 0 } }, + { "⊫", { 8875, 0 } }, + { "⫫", { 10987, 0 } }, + { "В", { 1042, 0 } }, + { "⊩", { 8873, 0 } }, + { "⫦", { 10982, 0 } }, + { "⋁", { 8897, 0 } }, + { "‖", { 8214, 0 } }, + { "‖", { 8214, 0 } }, + { "∣", { 8739, 0 } }, + { "|", { 124, 0 } }, + { "❘", { 10072, 0 } }, + { "≀", { 8768, 0 } }, + { " ", { 8202, 0 } }, + { "𝔙", { 120089, 0 } }, + { "𝕍", { 120141, 0 } }, + { "𝒱", { 119985, 0 } }, + { "⊪", { 8874, 0 } }, + { "Ŵ", { 372, 0 } }, + { "⋀", { 8896, 0 } }, + { "𝔚", { 120090, 0 } }, + { "𝕎", { 120142, 0 } }, + { "𝒲", { 119986, 0 } }, + { "𝔛", { 120091, 0 } }, + { "Ξ", { 926, 0 } }, + { "𝕏", { 120143, 0 } }, + { "𝒳", { 119987, 0 } }, + { "Я", { 1071, 0 } }, + { "Ї", { 1031, 0 } }, + { "Ю", { 1070, 0 } }, + { "Ý", { 221, 0 } }, + { "Ŷ", { 374, 0 } }, + { "Ы", { 1067, 0 } }, + { "𝔜", { 120092, 0 } }, + { "𝕐", { 120144, 0 } }, + { "𝒴", { 119988, 0 } }, + { "Ÿ", { 376, 0 } }, + { "Ж", { 1046, 0 } }, + { "Ź", { 377, 0 } }, + { "Ž", { 381, 0 } }, + { "З", { 1047, 0 } }, + { "Ż", { 379, 0 } }, + { "​", { 8203, 0 } }, + { "Ζ", { 918, 0 } }, + { "ℨ", { 8488, 0 } }, + { "ℤ", { 8484, 0 } }, + { "𝒵", { 119989, 0 } }, + { "á", { 225, 0 } }, + { "ă", { 259, 0 } }, + { "∾", { 8766, 0 } }, + { "∾̳", { 8766, 819 } }, + { "∿", { 8767, 0 } }, + { "â", { 226, 0 } }, + { "´", { 180, 0 } }, + { "а", { 1072, 0 } }, + { "æ", { 230, 0 } }, + { "⁡", { 8289, 0 } }, + { "𝔞", { 120094, 0 } }, + { "à", { 224, 0 } }, + { "ℵ", { 8501, 0 } }, + { "ℵ", { 8501, 0 } }, + { "α", { 945, 0 } }, + { "ā", { 257, 0 } }, + { "⨿", { 10815, 0 } }, + { "&", { 38, 0 } }, + { "∧", { 8743, 0 } }, + { "⩕", { 10837, 0 } }, + { "⩜", { 10844, 0 } }, + { "⩘", { 10840, 0 } }, + { "⩚", { 10842, 0 } }, + { "∠", { 8736, 0 } }, + { "⦤", { 10660, 0 } }, + { "∠", { 8736, 0 } }, + { "∡", { 8737, 0 } }, + { "⦨", { 10664, 0 } }, + { "⦩", { 10665, 0 } }, + { "⦪", { 10666, 0 } }, + { "⦫", { 10667, 0 } }, + { "⦬", { 10668, 0 } }, + { "⦭", { 10669, 0 } }, + { "⦮", { 10670, 0 } }, + { "⦯", { 10671, 0 } }, + { "∟", { 8735, 0 } }, + { "⊾", { 8894, 0 } }, + { "⦝", { 10653, 0 } }, + { "∢", { 8738, 0 } }, + { "Å", { 197, 0 } }, + { "⍼", { 9084, 0 } }, + { "ą", { 261, 0 } }, + { "𝕒", { 120146, 0 } }, + { "≈", { 8776, 0 } }, + { "⩰", { 10864, 0 } }, + { "⩯", { 10863, 0 } }, + { "≊", { 8778, 0 } }, + { "≋", { 8779, 0 } }, + { "'", { 39, 0 } }, + { "≈", { 8776, 0 } }, + { "≊", { 8778, 0 } }, + { "å", { 229, 0 } }, + { "𝒶", { 119990, 0 } }, + { "*", { 42, 0 } }, + { "≈", { 8776, 0 } }, + { "≍", { 8781, 0 } }, + { "ã", { 227, 0 } }, + { "ä", { 228, 0 } }, + { "∳", { 8755, 0 } }, + { "⨑", { 10769, 0 } }, + { "⫭", { 10989, 0 } }, + { "≌", { 8780, 0 } }, + { "϶", { 1014, 0 } }, + { "‵", { 8245, 0 } }, + { "∽", { 8765, 0 } }, + { "⋍", { 8909, 0 } }, + { "⊽", { 8893, 0 } }, + { "⌅", { 8965, 0 } }, + { "⌅", { 8965, 0 } }, + { "⎵", { 9141, 0 } }, + { "⎶", { 9142, 0 } }, + { "≌", { 8780, 0 } }, + { "б", { 1073, 0 } }, + { "„", { 8222, 0 } }, + { "∵", { 8757, 0 } }, + { "∵", { 8757, 0 } }, + { "⦰", { 10672, 0 } }, + { "϶", { 1014, 0 } }, + { "ℬ", { 8492, 0 } }, + { "β", { 946, 0 } }, + { "ℶ", { 8502, 0 } }, + { "≬", { 8812, 0 } }, + { "𝔟", { 120095, 0 } }, + { "⋂", { 8898, 0 } }, + { "◯", { 9711, 0 } }, + { "⋃", { 8899, 0 } }, + { "⨀", { 10752, 0 } }, + { "⨁", { 10753, 0 } }, + { "⨂", { 10754, 0 } }, + { "⨆", { 10758, 0 } }, + { "★", { 9733, 0 } }, + { "▽", { 9661, 0 } }, + { "△", { 9651, 0 } }, + { "⨄", { 10756, 0 } }, + { "⋁", { 8897, 0 } }, + { "⋀", { 8896, 0 } }, + { "⤍", { 10509, 0 } }, + { "⧫", { 10731, 0 } }, + { "▪", { 9642, 0 } }, + { "▴", { 9652, 0 } }, + { "▾", { 9662, 0 } }, + { "◂", { 9666, 0 } }, + { "▸", { 9656, 0 } }, + { "␣", { 9251, 0 } }, + { "▒", { 9618, 0 } }, + { "░", { 9617, 0 } }, + { "▓", { 9619, 0 } }, + { "█", { 9608, 0 } }, + { "=⃥", { 61, 8421 } }, + { "≡⃥", { 8801, 8421 } }, + { "⌐", { 8976, 0 } }, + { "𝕓", { 120147, 0 } }, + { "⊥", { 8869, 0 } }, + { "⊥", { 8869, 0 } }, + { "⋈", { 8904, 0 } }, + { "╗", { 9559, 0 } }, + { "╔", { 9556, 0 } }, + { "╖", { 9558, 0 } }, + { "╓", { 9555, 0 } }, + { "═", { 9552, 0 } }, + { "╦", { 9574, 0 } }, + { "╩", { 9577, 0 } }, + { "╤", { 9572, 0 } }, + { "╧", { 9575, 0 } }, + { "╝", { 9565, 0 } }, + { "╚", { 9562, 0 } }, + { "╜", { 9564, 0 } }, + { "╙", { 9561, 0 } }, + { "║", { 9553, 0 } }, + { "╬", { 9580, 0 } }, + { "╣", { 9571, 0 } }, + { "╠", { 9568, 0 } }, + { "╫", { 9579, 0 } }, + { "╢", { 9570, 0 } }, + { "╟", { 9567, 0 } }, + { "⧉", { 10697, 0 } }, + { "╕", { 9557, 0 } }, + { "╒", { 9554, 0 } }, + { "┐", { 9488, 0 } }, + { "┌", { 9484, 0 } }, + { "─", { 9472, 0 } }, + { "╥", { 9573, 0 } }, + { "╨", { 9576, 0 } }, + { "┬", { 9516, 0 } }, + { "┴", { 9524, 0 } }, + { "⊟", { 8863, 0 } }, + { "⊞", { 8862, 0 } }, + { "⊠", { 8864, 0 } }, + { "╛", { 9563, 0 } }, + { "╘", { 9560, 0 } }, + { "┘", { 9496, 0 } }, + { "└", { 9492, 0 } }, + { "│", { 9474, 0 } }, + { "╪", { 9578, 0 } }, + { "╡", { 9569, 0 } }, + { "╞", { 9566, 0 } }, + { "┼", { 9532, 0 } }, + { "┤", { 9508, 0 } }, + { "├", { 9500, 0 } }, + { "‵", { 8245, 0 } }, + { "˘", { 728, 0 } }, + { "¦", { 166, 0 } }, + { "𝒷", { 119991, 0 } }, + { "⁏", { 8271, 0 } }, + { "∽", { 8765, 0 } }, + { "⋍", { 8909, 0 } }, + { "\", { 92, 0 } }, + { "⧅", { 10693, 0 } }, + { "⟈", { 10184, 0 } }, + { "•", { 8226, 0 } }, + { "•", { 8226, 0 } }, + { "≎", { 8782, 0 } }, + { "⪮", { 10926, 0 } }, + { "≏", { 8783, 0 } }, + { "≏", { 8783, 0 } }, + { "ć", { 263, 0 } }, + { "∩", { 8745, 0 } }, + { "⩄", { 10820, 0 } }, + { "⩉", { 10825, 0 } }, + { "⩋", { 10827, 0 } }, + { "⩇", { 10823, 0 } }, + { "⩀", { 10816, 0 } }, + { "∩︀", { 8745, 65024 } }, + { "⁁", { 8257, 0 } }, + { "ˇ", { 711, 0 } }, + { "⩍", { 10829, 0 } }, + { "č", { 269, 0 } }, + { "ç", { 231, 0 } }, + { "ĉ", { 265, 0 } }, + { "⩌", { 10828, 0 } }, + { "⩐", { 10832, 0 } }, + { "ċ", { 267, 0 } }, + { "¸", { 184, 0 } }, + { "⦲", { 10674, 0 } }, + { "¢", { 162, 0 } }, + { "·", { 183, 0 } }, + { "𝔠", { 120096, 0 } }, + { "ч", { 1095, 0 } }, + { "✓", { 10003, 0 } }, + { "✓", { 10003, 0 } }, + { "χ", { 967, 0 } }, + { "○", { 9675, 0 } }, + { "⧃", { 10691, 0 } }, + { "ˆ", { 710, 0 } }, + { "≗", { 8791, 0 } }, + { "↺", { 8634, 0 } }, + { "↻", { 8635, 0 } }, + { "®", { 174, 0 } }, + { "Ⓢ", { 9416, 0 } }, + { "⊛", { 8859, 0 } }, + { "⊚", { 8858, 0 } }, + { "⊝", { 8861, 0 } }, + { "≗", { 8791, 0 } }, + { "⨐", { 10768, 0 } }, + { "⫯", { 10991, 0 } }, + { "⧂", { 10690, 0 } }, + { "♣", { 9827, 0 } }, + { "♣", { 9827, 0 } }, + { ":", { 58, 0 } }, + { "≔", { 8788, 0 } }, + { "≔", { 8788, 0 } }, + { ",", { 44, 0 } }, + { "@", { 64, 0 } }, + { "∁", { 8705, 0 } }, + { "∘", { 8728, 0 } }, + { "∁", { 8705, 0 } }, + { "ℂ", { 8450, 0 } }, + { "≅", { 8773, 0 } }, + { "⩭", { 10861, 0 } }, + { "∮", { 8750, 0 } }, + { "𝕔", { 120148, 0 } }, + { "∐", { 8720, 0 } }, + { "©", { 169, 0 } }, + { "℗", { 8471, 0 } }, + { "↵", { 8629, 0 } }, + { "✗", { 10007, 0 } }, + { "𝒸", { 119992, 0 } }, + { "⫏", { 10959, 0 } }, + { "⫑", { 10961, 0 } }, + { "⫐", { 10960, 0 } }, + { "⫒", { 10962, 0 } }, + { "⋯", { 8943, 0 } }, + { "⤸", { 10552, 0 } }, + { "⤵", { 10549, 0 } }, + { "⋞", { 8926, 0 } }, + { "⋟", { 8927, 0 } }, + { "↶", { 8630, 0 } }, + { "⤽", { 10557, 0 } }, + { "∪", { 8746, 0 } }, + { "⩈", { 10824, 0 } }, + { "⩆", { 10822, 0 } }, + { "⩊", { 10826, 0 } }, + { "⊍", { 8845, 0 } }, + { "⩅", { 10821, 0 } }, + { "∪︀", { 8746, 65024 } }, + { "↷", { 8631, 0 } }, + { "⤼", { 10556, 0 } }, + { "⋞", { 8926, 0 } }, + { "⋟", { 8927, 0 } }, + { "⋎", { 8910, 0 } }, + { "⋏", { 8911, 0 } }, + { "¤", { 164, 0 } }, + { "↶", { 8630, 0 } }, + { "↷", { 8631, 0 } }, + { "⋎", { 8910, 0 } }, + { "⋏", { 8911, 0 } }, + { "∲", { 8754, 0 } }, + { "∱", { 8753, 0 } }, + { "⌭", { 9005, 0 } }, + { "⇓", { 8659, 0 } }, + { "⥥", { 10597, 0 } }, + { "†", { 8224, 0 } }, + { "ℸ", { 8504, 0 } }, + { "↓", { 8595, 0 } }, + { "‐", { 8208, 0 } }, + { "⊣", { 8867, 0 } }, + { "⤏", { 10511, 0 } }, + { "˝", { 733, 0 } }, + { "ď", { 271, 0 } }, + { "д", { 1076, 0 } }, + { "ⅆ", { 8518, 0 } }, + { "‡", { 8225, 0 } }, + { "⇊", { 8650, 0 } }, + { "⩷", { 10871, 0 } }, + { "°", { 176, 0 } }, + { "δ", { 948, 0 } }, + { "⦱", { 10673, 0 } }, + { "⥿", { 10623, 0 } }, + { "𝔡", { 120097, 0 } }, + { "⇃", { 8643, 0 } }, + { "⇂", { 8642, 0 } }, + { "⋄", { 8900, 0 } }, + { "⋄", { 8900, 0 } }, + { "♦", { 9830, 0 } }, + { "♦", { 9830, 0 } }, + { "¨", { 168, 0 } }, + { "ϝ", { 989, 0 } }, + { "⋲", { 8946, 0 } }, + { "÷", { 247, 0 } }, + { "÷", { 247, 0 } }, + { "⋇", { 8903, 0 } }, + { "⋇", { 8903, 0 } }, + { "ђ", { 1106, 0 } }, + { "⌞", { 8990, 0 } }, + { "⌍", { 8973, 0 } }, + { "$", { 36, 0 } }, + { "𝕕", { 120149, 0 } }, + { "˙", { 729, 0 } }, + { "≐", { 8784, 0 } }, + { "≑", { 8785, 0 } }, + { "∸", { 8760, 0 } }, + { "∔", { 8724, 0 } }, + { "⊡", { 8865, 0 } }, + { "⌆", { 8966, 0 } }, + { "↓", { 8595, 0 } }, + { "⇊", { 8650, 0 } }, + { "⇃", { 8643, 0 } }, + { "⇂", { 8642, 0 } }, + { "⤐", { 10512, 0 } }, + { "⌟", { 8991, 0 } }, + { "⌌", { 8972, 0 } }, + { "𝒹", { 119993, 0 } }, + { "ѕ", { 1109, 0 } }, + { "⧶", { 10742, 0 } }, + { "đ", { 273, 0 } }, + { "⋱", { 8945, 0 } }, + { "▿", { 9663, 0 } }, + { "▾", { 9662, 0 } }, + { "⇵", { 8693, 0 } }, + { "⥯", { 10607, 0 } }, + { "⦦", { 10662, 0 } }, + { "џ", { 1119, 0 } }, + { "⟿", { 10239, 0 } }, + { "⩷", { 10871, 0 } }, + { "≑", { 8785, 0 } }, + { "é", { 233, 0 } }, + { "⩮", { 10862, 0 } }, + { "ě", { 283, 0 } }, + { "≖", { 8790, 0 } }, + { "ê", { 234, 0 } }, + { "≕", { 8789, 0 } }, + { "э", { 1101, 0 } }, + { "ė", { 279, 0 } }, + { "ⅇ", { 8519, 0 } }, + { "≒", { 8786, 0 } }, + { "𝔢", { 120098, 0 } }, + { "⪚", { 10906, 0 } }, + { "è", { 232, 0 } }, + { "⪖", { 10902, 0 } }, + { "⪘", { 10904, 0 } }, + { "⪙", { 10905, 0 } }, + { "⏧", { 9191, 0 } }, + { "ℓ", { 8467, 0 } }, + { "⪕", { 10901, 0 } }, + { "⪗", { 10903, 0 } }, + { "ē", { 275, 0 } }, + { "∅", { 8709, 0 } }, + { "∅", { 8709, 0 } }, + { "∅", { 8709, 0 } }, + { " ", { 8196, 0 } }, + { " ", { 8197, 0 } }, + { " ", { 8195, 0 } }, + { "ŋ", { 331, 0 } }, + { " ", { 8194, 0 } }, + { "ę", { 281, 0 } }, + { "𝕖", { 120150, 0 } }, + { "⋕", { 8917, 0 } }, + { "⧣", { 10723, 0 } }, + { "⩱", { 10865, 0 } }, + { "ε", { 949, 0 } }, + { "ε", { 949, 0 } }, + { "ϵ", { 1013, 0 } }, + { "≖", { 8790, 0 } }, + { "≕", { 8789, 0 } }, + { "≂", { 8770, 0 } }, + { "⪖", { 10902, 0 } }, + { "⪕", { 10901, 0 } }, + { "=", { 61, 0 } }, + { "≟", { 8799, 0 } }, + { "≡", { 8801, 0 } }, + { "⩸", { 10872, 0 } }, + { "⧥", { 10725, 0 } }, + { "≓", { 8787, 0 } }, + { "⥱", { 10609, 0 } }, + { "ℯ", { 8495, 0 } }, + { "≐", { 8784, 0 } }, + { "≂", { 8770, 0 } }, + { "η", { 951, 0 } }, + { "ð", { 240, 0 } }, + { "ë", { 235, 0 } }, + { "€", { 8364, 0 } }, + { "!", { 33, 0 } }, + { "∃", { 8707, 0 } }, + { "ℰ", { 8496, 0 } }, + { "ⅇ", { 8519, 0 } }, + { "≒", { 8786, 0 } }, + { "ф", { 1092, 0 } }, + { "♀", { 9792, 0 } }, + { "ffi", { 64259, 0 } }, + { "ff", { 64256, 0 } }, + { "ffl", { 64260, 0 } }, + { "𝔣", { 120099, 0 } }, + { "fi", { 64257, 0 } }, + { "fj", { 102, 106 } }, + { "♭", { 9837, 0 } }, + { "fl", { 64258, 0 } }, + { "▱", { 9649, 0 } }, + { "ƒ", { 402, 0 } }, + { "𝕗", { 120151, 0 } }, + { "∀", { 8704, 0 } }, + { "⋔", { 8916, 0 } }, + { "⫙", { 10969, 0 } }, + { "⨍", { 10765, 0 } }, + { "½", { 189, 0 } }, + { "½", { 189, 0 } }, + { "⅓", { 8531, 0 } }, + { "¼", { 188, 0 } }, + { "¼", { 188, 0 } }, + { "⅕", { 8533, 0 } }, + { "⅙", { 8537, 0 } }, + { "⅛", { 8539, 0 } }, + { "⅔", { 8532, 0 } }, + { "⅖", { 8534, 0 } }, + { "¾", { 190, 0 } }, + { "¾", { 190, 0 } }, + { "⅗", { 8535, 0 } }, + { "⅜", { 8540, 0 } }, + { "⅘", { 8536, 0 } }, + { "⅚", { 8538, 0 } }, + { "⅝", { 8541, 0 } }, + { "⅞", { 8542, 0 } }, + { "⁄", { 8260, 0 } }, + { "⌢", { 8994, 0 } }, + { "𝒻", { 119995, 0 } }, + { "≧", { 8807, 0 } }, + { "⪌", { 10892, 0 } }, + { "ǵ", { 501, 0 } }, + { "γ", { 947, 0 } }, + { "ϝ", { 989, 0 } }, + { "⪆", { 10886, 0 } }, + { "ğ", { 287, 0 } }, + { "ĝ", { 285, 0 } }, + { "г", { 1075, 0 } }, + { "ġ", { 289, 0 } }, + { "≥", { 8805, 0 } }, + { "⋛", { 8923, 0 } }, + { "≥", { 8805, 0 } }, + { "≧", { 8807, 0 } }, + { "⩾", { 10878, 0 } }, + { "⩾", { 10878, 0 } }, + { "⪩", { 10921, 0 } }, + { "⪀", { 10880, 0 } }, + { "⪂", { 10882, 0 } }, + { "⪄", { 10884, 0 } }, + { "⋛︀", { 8923, 65024 } }, + { "⪔", { 10900, 0 } }, + { "𝔤", { 120100, 0 } }, + { "≫", { 8811, 0 } }, + { "⋙", { 8921, 0 } }, + { "ℷ", { 8503, 0 } }, + { "ѓ", { 1107, 0 } }, + { "≷", { 8823, 0 } }, + { "⪒", { 10898, 0 } }, + { "⪥", { 10917, 0 } }, + { "⪤", { 10916, 0 } }, + { "≩", { 8809, 0 } }, + { "⪊", { 10890, 0 } }, + { "⪊", { 10890, 0 } }, + { "⪈", { 10888, 0 } }, + { "⪈", { 10888, 0 } }, + { "≩", { 8809, 0 } }, + { "⋧", { 8935, 0 } }, + { "𝕘", { 120152, 0 } }, + { "`", { 96, 0 } }, + { "ℊ", { 8458, 0 } }, + { "≳", { 8819, 0 } }, + { "⪎", { 10894, 0 } }, + { "⪐", { 10896, 0 } }, + { ">", { 62, 0 } }, + { "⪧", { 10919, 0 } }, + { "⩺", { 10874, 0 } }, + { "⋗", { 8919, 0 } }, + { "⦕", { 10645, 0 } }, + { "⩼", { 10876, 0 } }, + { "⪆", { 10886, 0 } }, + { "⥸", { 10616, 0 } }, + { "⋗", { 8919, 0 } }, + { "⋛", { 8923, 0 } }, + { "⪌", { 10892, 0 } }, + { "≷", { 8823, 0 } }, + { "≳", { 8819, 0 } }, + { "≩︀", { 8809, 65024 } }, + { "≩︀", { 8809, 65024 } }, + { "⇔", { 8660, 0 } }, + { " ", { 8202, 0 } }, + { "½", { 189, 0 } }, + { "ℋ", { 8459, 0 } }, + { "ъ", { 1098, 0 } }, + { "↔", { 8596, 0 } }, + { "⥈", { 10568, 0 } }, + { "↭", { 8621, 0 } }, + { "ℏ", { 8463, 0 } }, + { "ĥ", { 293, 0 } }, + { "♥", { 9829, 0 } }, + { "♥", { 9829, 0 } }, + { "…", { 8230, 0 } }, + { "⊹", { 8889, 0 } }, + { "𝔥", { 120101, 0 } }, + { "⤥", { 10533, 0 } }, + { "⤦", { 10534, 0 } }, + { "⇿", { 8703, 0 } }, + { "∻", { 8763, 0 } }, + { "↩", { 8617, 0 } }, + { "↪", { 8618, 0 } }, + { "𝕙", { 120153, 0 } }, + { "―", { 8213, 0 } }, + { "𝒽", { 119997, 0 } }, + { "ℏ", { 8463, 0 } }, + { "ħ", { 295, 0 } }, + { "⁃", { 8259, 0 } }, + { "‐", { 8208, 0 } }, + { "í", { 237, 0 } }, + { "⁣", { 8291, 0 } }, + { "î", { 238, 0 } }, + { "и", { 1080, 0 } }, + { "е", { 1077, 0 } }, + { "¡", { 161, 0 } }, + { "⇔", { 8660, 0 } }, + { "𝔦", { 120102, 0 } }, + { "ì", { 236, 0 } }, + { "ⅈ", { 8520, 0 } }, + { "⨌", { 10764, 0 } }, + { "∭", { 8749, 0 } }, + { "⧜", { 10716, 0 } }, + { "℩", { 8489, 0 } }, + { "ij", { 307, 0 } }, + { "ī", { 299, 0 } }, + { "ℑ", { 8465, 0 } }, + { "ℐ", { 8464, 0 } }, + { "ℑ", { 8465, 0 } }, + { "ı", { 305, 0 } }, + { "⊷", { 8887, 0 } }, + { "Ƶ", { 437, 0 } }, + { "∈", { 8712, 0 } }, + { "℅", { 8453, 0 } }, + { "∞", { 8734, 0 } }, + { "⧝", { 10717, 0 } }, + { "ı", { 305, 0 } }, + { "∫", { 8747, 0 } }, + { "⊺", { 8890, 0 } }, + { "ℤ", { 8484, 0 } }, + { "⊺", { 8890, 0 } }, + { "⨗", { 10775, 0 } }, + { "⨼", { 10812, 0 } }, + { "ё", { 1105, 0 } }, + { "į", { 303, 0 } }, + { "𝕚", { 120154, 0 } }, + { "ι", { 953, 0 } }, + { "⨼", { 10812, 0 } }, + { "¿", { 191, 0 } }, + { "𝒾", { 119998, 0 } }, + { "∈", { 8712, 0 } }, + { "⋹", { 8953, 0 } }, + { "⋵", { 8949, 0 } }, + { "⋴", { 8948, 0 } }, + { "⋳", { 8947, 0 } }, + { "∈", { 8712, 0 } }, + { "⁢", { 8290, 0 } }, + { "ĩ", { 297, 0 } }, + { "і", { 1110, 0 } }, + { "ï", { 239, 0 } }, + { "ĵ", { 309, 0 } }, + { "й", { 1081, 0 } }, + { "𝔧", { 120103, 0 } }, + { "ȷ", { 567, 0 } }, + { "𝕛", { 120155, 0 } }, + { "𝒿", { 119999, 0 } }, + { "ј", { 1112, 0 } }, + { "є", { 1108, 0 } }, + { "κ", { 954, 0 } }, + { "ϰ", { 1008, 0 } }, + { "ķ", { 311, 0 } }, + { "к", { 1082, 0 } }, + { "𝔨", { 120104, 0 } }, + { "ĸ", { 312, 0 } }, + { "х", { 1093, 0 } }, + { "ќ", { 1116, 0 } }, + { "𝕜", { 120156, 0 } }, + { "𝓀", { 120000, 0 } }, + { "⇚", { 8666, 0 } }, + { "⇐", { 8656, 0 } }, + { "⤛", { 10523, 0 } }, + { "⤎", { 10510, 0 } }, + { "≦", { 8806, 0 } }, + { "⪋", { 10891, 0 } }, + { "⥢", { 10594, 0 } }, + { "ĺ", { 314, 0 } }, + { "⦴", { 10676, 0 } }, + { "ℒ", { 8466, 0 } }, + { "λ", { 955, 0 } }, + { "⟨", { 10216, 0 } }, + { "⦑", { 10641, 0 } }, + { "⟨", { 10216, 0 } }, + { "⪅", { 10885, 0 } }, + { "«", { 171, 0 } }, + { "←", { 8592, 0 } }, + { "⇤", { 8676, 0 } }, + { "⤟", { 10527, 0 } }, + { "⤝", { 10525, 0 } }, + { "↩", { 8617, 0 } }, + { "↫", { 8619, 0 } }, + { "⤹", { 10553, 0 } }, + { "⥳", { 10611, 0 } }, + { "↢", { 8610, 0 } }, + { "⪫", { 10923, 0 } }, + { "⤙", { 10521, 0 } }, + { "⪭", { 10925, 0 } }, + { "⪭︀", { 10925, 65024 } }, + { "⤌", { 10508, 0 } }, + { "❲", { 10098, 0 } }, + { "{", { 123, 0 } }, + { "[", { 91, 0 } }, + { "⦋", { 10635, 0 } }, + { "⦏", { 10639, 0 } }, + { "⦍", { 10637, 0 } }, + { "ľ", { 318, 0 } }, + { "ļ", { 316, 0 } }, + { "⌈", { 8968, 0 } }, + { "{", { 123, 0 } }, + { "л", { 1083, 0 } }, + { "⤶", { 10550, 0 } }, + { "“", { 8220, 0 } }, + { "„", { 8222, 0 } }, + { "⥧", { 10599, 0 } }, + { "⥋", { 10571, 0 } }, + { "↲", { 8626, 0 } }, + { "≤", { 8804, 0 } }, + { "←", { 8592, 0 } }, + { "↢", { 8610, 0 } }, + { "↽", { 8637, 0 } }, + { "↼", { 8636, 0 } }, + { "⇇", { 8647, 0 } }, + { "↔", { 8596, 0 } }, + { "⇆", { 8646, 0 } }, + { "⇋", { 8651, 0 } }, + { "↭", { 8621, 0 } }, + { "⋋", { 8907, 0 } }, + { "⋚", { 8922, 0 } }, + { "≤", { 8804, 0 } }, + { "≦", { 8806, 0 } }, + { "⩽", { 10877, 0 } }, + { "⩽", { 10877, 0 } }, + { "⪨", { 10920, 0 } }, + { "⩿", { 10879, 0 } }, + { "⪁", { 10881, 0 } }, + { "⪃", { 10883, 0 } }, + { "⋚︀", { 8922, 65024 } }, + { "⪓", { 10899, 0 } }, + { "⪅", { 10885, 0 } }, + { "⋖", { 8918, 0 } }, + { "⋚", { 8922, 0 } }, + { "⪋", { 10891, 0 } }, + { "≶", { 8822, 0 } }, + { "≲", { 8818, 0 } }, + { "⥼", { 10620, 0 } }, + { "⌊", { 8970, 0 } }, + { "𝔩", { 120105, 0 } }, + { "≶", { 8822, 0 } }, + { "⪑", { 10897, 0 } }, + { "↽", { 8637, 0 } }, + { "↼", { 8636, 0 } }, + { "⥪", { 10602, 0 } }, + { "▄", { 9604, 0 } }, + { "љ", { 1113, 0 } }, + { "≪", { 8810, 0 } }, + { "⇇", { 8647, 0 } }, + { "⌞", { 8990, 0 } }, + { "⥫", { 10603, 0 } }, + { "◺", { 9722, 0 } }, + { "ŀ", { 320, 0 } }, + { "⎰", { 9136, 0 } }, + { "⎰", { 9136, 0 } }, + { "≨", { 8808, 0 } }, + { "⪉", { 10889, 0 } }, + { "⪉", { 10889, 0 } }, + { "⪇", { 10887, 0 } }, + { "⪇", { 10887, 0 } }, + { "≨", { 8808, 0 } }, + { "⋦", { 8934, 0 } }, + { "⟬", { 10220, 0 } }, + { "⇽", { 8701, 0 } }, + { "⟦", { 10214, 0 } }, + { "⟵", { 10229, 0 } }, + { "⟷", { 10231, 0 } }, + { "⟼", { 10236, 0 } }, + { "⟶", { 10230, 0 } }, + { "↫", { 8619, 0 } }, + { "↬", { 8620, 0 } }, + { "⦅", { 10629, 0 } }, + { "𝕝", { 120157, 0 } }, + { "⨭", { 10797, 0 } }, + { "⨴", { 10804, 0 } }, + { "∗", { 8727, 0 } }, + { "_", { 95, 0 } }, + { "◊", { 9674, 0 } }, + { "◊", { 9674, 0 } }, + { "⧫", { 10731, 0 } }, + { "(", { 40, 0 } }, + { "⦓", { 10643, 0 } }, + { "⇆", { 8646, 0 } }, + { "⌟", { 8991, 0 } }, + { "⇋", { 8651, 0 } }, + { "⥭", { 10605, 0 } }, + { "‎", { 8206, 0 } }, + { "⊿", { 8895, 0 } }, + { "‹", { 8249, 0 } }, + { "𝓁", { 120001, 0 } }, + { "↰", { 8624, 0 } }, + { "≲", { 8818, 0 } }, + { "⪍", { 10893, 0 } }, + { "⪏", { 10895, 0 } }, + { "[", { 91, 0 } }, + { "‘", { 8216, 0 } }, + { "‚", { 8218, 0 } }, + { "ł", { 322, 0 } }, + { "<", { 60, 0 } }, + { "⪦", { 10918, 0 } }, + { "⩹", { 10873, 0 } }, + { "⋖", { 8918, 0 } }, + { "⋋", { 8907, 0 } }, + { "⋉", { 8905, 0 } }, + { "⥶", { 10614, 0 } }, + { "⩻", { 10875, 0 } }, + { "⦖", { 10646, 0 } }, + { "◃", { 9667, 0 } }, + { "⊴", { 8884, 0 } }, + { "◂", { 9666, 0 } }, + { "⥊", { 10570, 0 } }, + { "⥦", { 10598, 0 } }, + { "≨︀", { 8808, 65024 } }, + { "≨︀", { 8808, 65024 } }, + { "∺", { 8762, 0 } }, + { "¯", { 175, 0 } }, + { "♂", { 9794, 0 } }, + { "✠", { 10016, 0 } }, + { "✠", { 10016, 0 } }, + { "↦", { 8614, 0 } }, + { "↦", { 8614, 0 } }, + { "↧", { 8615, 0 } }, + { "↤", { 8612, 0 } }, + { "↥", { 8613, 0 } }, + { "▮", { 9646, 0 } }, + { "⨩", { 10793, 0 } }, + { "м", { 1084, 0 } }, + { "—", { 8212, 0 } }, + { "∡", { 8737, 0 } }, + { "𝔪", { 120106, 0 } }, + { "℧", { 8487, 0 } }, + { "µ", { 181, 0 } }, + { "∣", { 8739, 0 } }, + { "*", { 42, 0 } }, + { "⫰", { 10992, 0 } }, + { "·", { 183, 0 } }, + { "−", { 8722, 0 } }, + { "⊟", { 8863, 0 } }, + { "∸", { 8760, 0 } }, + { "⨪", { 10794, 0 } }, + { "⫛", { 10971, 0 } }, + { "…", { 8230, 0 } }, + { "∓", { 8723, 0 } }, + { "⊧", { 8871, 0 } }, + { "𝕞", { 120158, 0 } }, + { "∓", { 8723, 0 } }, + { "𝓂", { 120002, 0 } }, + { "∾", { 8766, 0 } }, + { "μ", { 956, 0 } }, + { "⊸", { 8888, 0 } }, + { "⊸", { 8888, 0 } }, + { "⋙̸", { 8921, 824 } }, + { "≫⃒", { 8811, 8402 } }, + { "≫̸", { 8811, 824 } }, + { "⇍", { 8653, 0 } }, + { "⇎", { 8654, 0 } }, + { "⋘̸", { 8920, 824 } }, + { "≪⃒", { 8810, 8402 } }, + { "≪̸", { 8810, 824 } }, + { "⇏", { 8655, 0 } }, + { "⊯", { 8879, 0 } }, + { "⊮", { 8878, 0 } }, + { "∇", { 8711, 0 } }, + { "ń", { 324, 0 } }, + { "∠⃒", { 8736, 8402 } }, + { "≉", { 8777, 0 } }, + { "⩰̸", { 10864, 824 } }, + { "≋̸", { 8779, 824 } }, + { "ʼn", { 329, 0 } }, + { "≉", { 8777, 0 } }, + { "♮", { 9838, 0 } }, + { "♮", { 9838, 0 } }, + { "ℕ", { 8469, 0 } }, + { " ", { 160, 0 } }, + { "≎̸", { 8782, 824 } }, + { "≏̸", { 8783, 824 } }, + { "⩃", { 10819, 0 } }, + { "ň", { 328, 0 } }, + { "ņ", { 326, 0 } }, + { "≇", { 8775, 0 } }, + { "⩭̸", { 10861, 824 } }, + { "⩂", { 10818, 0 } }, + { "н", { 1085, 0 } }, + { "–", { 8211, 0 } }, + { "≠", { 8800, 0 } }, + { "⇗", { 8663, 0 } }, + { "⤤", { 10532, 0 } }, + { "↗", { 8599, 0 } }, + { "↗", { 8599, 0 } }, + { "≐̸", { 8784, 824 } }, + { "≢", { 8802, 0 } }, + { "⤨", { 10536, 0 } }, + { "≂̸", { 8770, 824 } }, + { "∄", { 8708, 0 } }, + { "∄", { 8708, 0 } }, + { "𝔫", { 120107, 0 } }, + { "≧̸", { 8807, 824 } }, + { "≱", { 8817, 0 } }, + { "≱", { 8817, 0 } }, + { "≧̸", { 8807, 824 } }, + { "⩾̸", { 10878, 824 } }, + { "⩾̸", { 10878, 824 } }, + { "≵", { 8821, 0 } }, + { "≯", { 8815, 0 } }, + { "≯", { 8815, 0 } }, + { "⇎", { 8654, 0 } }, + { "↮", { 8622, 0 } }, + { "⫲", { 10994, 0 } }, + { "∋", { 8715, 0 } }, + { "⋼", { 8956, 0 } }, + { "⋺", { 8954, 0 } }, + { "∋", { 8715, 0 } }, + { "њ", { 1114, 0 } }, + { "⇍", { 8653, 0 } }, + { "≦̸", { 8806, 824 } }, + { "↚", { 8602, 0 } }, + { "‥", { 8229, 0 } }, + { "≰", { 8816, 0 } }, + { "↚", { 8602, 0 } }, + { "↮", { 8622, 0 } }, + { "≰", { 8816, 0 } }, + { "≦̸", { 8806, 824 } }, + { "⩽̸", { 10877, 824 } }, + { "⩽̸", { 10877, 824 } }, + { "≮", { 8814, 0 } }, + { "≴", { 8820, 0 } }, + { "≮", { 8814, 0 } }, + { "⋪", { 8938, 0 } }, + { "⋬", { 8940, 0 } }, + { "∤", { 8740, 0 } }, + { "𝕟", { 120159, 0 } }, + { "¬", { 172, 0 } }, + { "∉", { 8713, 0 } }, + { "⋹̸", { 8953, 824 } }, + { "⋵̸", { 8949, 824 } }, + { "∉", { 8713, 0 } }, + { "⋷", { 8951, 0 } }, + { "⋶", { 8950, 0 } }, + { "∌", { 8716, 0 } }, + { "∌", { 8716, 0 } }, + { "⋾", { 8958, 0 } }, + { "⋽", { 8957, 0 } }, + { "∦", { 8742, 0 } }, + { "∦", { 8742, 0 } }, + { "⫽⃥", { 11005, 8421 } }, + { "∂̸", { 8706, 824 } }, + { "⨔", { 10772, 0 } }, + { "⊀", { 8832, 0 } }, + { "⋠", { 8928, 0 } }, + { "⪯̸", { 10927, 824 } }, + { "⊀", { 8832, 0 } }, + { "⪯̸", { 10927, 824 } }, + { "⇏", { 8655, 0 } }, + { "↛", { 8603, 0 } }, + { "⤳̸", { 10547, 824 } }, + { "↝̸", { 8605, 824 } }, + { "↛", { 8603, 0 } }, + { "⋫", { 8939, 0 } }, + { "⋭", { 8941, 0 } }, + { "⊁", { 8833, 0 } }, + { "⋡", { 8929, 0 } }, + { "⪰̸", { 10928, 824 } }, + { "𝓃", { 120003, 0 } }, + { "∤", { 8740, 0 } }, + { "∦", { 8742, 0 } }, + { "≁", { 8769, 0 } }, + { "≄", { 8772, 0 } }, + { "≄", { 8772, 0 } }, + { "∤", { 8740, 0 } }, + { "∦", { 8742, 0 } }, + { "⋢", { 8930, 0 } }, + { "⋣", { 8931, 0 } }, + { "⊄", { 8836, 0 } }, + { "⫅̸", { 10949, 824 } }, + { "⊈", { 8840, 0 } }, + { "⊂⃒", { 8834, 8402 } }, + { "⊈", { 8840, 0 } }, + { "⫅̸", { 10949, 824 } }, + { "⊁", { 8833, 0 } }, + { "⪰̸", { 10928, 824 } }, + { "⊅", { 8837, 0 } }, + { "⫆̸", { 10950, 824 } }, + { "⊉", { 8841, 0 } }, + { "⊃⃒", { 8835, 8402 } }, + { "⊉", { 8841, 0 } }, + { "⫆̸", { 10950, 824 } }, + { "≹", { 8825, 0 } }, + { "ñ", { 241, 0 } }, + { "≸", { 8824, 0 } }, + { "⋪", { 8938, 0 } }, + { "⋬", { 8940, 0 } }, + { "⋫", { 8939, 0 } }, + { "⋭", { 8941, 0 } }, + { "ν", { 957, 0 } }, + { "#", { 35, 0 } }, + { "№", { 8470, 0 } }, + { " ", { 8199, 0 } }, + { "⊭", { 8877, 0 } }, + { "⤄", { 10500, 0 } }, + { "≍⃒", { 8781, 8402 } }, + { "⊬", { 8876, 0 } }, + { "≥⃒", { 8805, 8402 } }, + { ">⃒", { 62, 8402 } }, + { "⧞", { 10718, 0 } }, + { "⤂", { 10498, 0 } }, + { "≤⃒", { 8804, 8402 } }, + { "<⃒", { 60, 8402 } }, + { "⊴⃒", { 8884, 8402 } }, + { "⤃", { 10499, 0 } }, + { "⊵⃒", { 8885, 8402 } }, + { "∼⃒", { 8764, 8402 } }, + { "⇖", { 8662, 0 } }, + { "⤣", { 10531, 0 } }, + { "↖", { 8598, 0 } }, + { "↖", { 8598, 0 } }, + { "⤧", { 10535, 0 } }, + { "Ⓢ", { 9416, 0 } }, + { "ó", { 243, 0 } }, + { "⊛", { 8859, 0 } }, + { "⊚", { 8858, 0 } }, + { "ô", { 244, 0 } }, + { "о", { 1086, 0 } }, + { "⊝", { 8861, 0 } }, + { "ő", { 337, 0 } }, + { "⨸", { 10808, 0 } }, + { "⊙", { 8857, 0 } }, + { "⦼", { 10684, 0 } }, + { "œ", { 339, 0 } }, + { "⦿", { 10687, 0 } }, + { "𝔬", { 120108, 0 } }, + { "˛", { 731, 0 } }, + { "ò", { 242, 0 } }, + { "⧁", { 10689, 0 } }, + { "⦵", { 10677, 0 } }, + { "Ω", { 937, 0 } }, + { "∮", { 8750, 0 } }, + { "↺", { 8634, 0 } }, + { "⦾", { 10686, 0 } }, + { "⦻", { 10683, 0 } }, + { "‾", { 8254, 0 } }, + { "⧀", { 10688, 0 } }, + { "ō", { 333, 0 } }, + { "ω", { 969, 0 } }, + { "ο", { 959, 0 } }, + { "⦶", { 10678, 0 } }, + { "⊖", { 8854, 0 } }, + { "𝕠", { 120160, 0 } }, + { "⦷", { 10679, 0 } }, + { "⦹", { 10681, 0 } }, + { "⊕", { 8853, 0 } }, + { "∨", { 8744, 0 } }, + { "↻", { 8635, 0 } }, + { "⩝", { 10845, 0 } }, + { "ℴ", { 8500, 0 } }, + { "ℴ", { 8500, 0 } }, + { "ª", { 170, 0 } }, + { "º", { 186, 0 } }, + { "⊶", { 8886, 0 } }, + { "⩖", { 10838, 0 } }, + { "⩗", { 10839, 0 } }, + { "⩛", { 10843, 0 } }, + { "ℴ", { 8500, 0 } }, + { "ø", { 248, 0 } }, + { "⊘", { 8856, 0 } }, + { "õ", { 245, 0 } }, + { "⊗", { 8855, 0 } }, + { "⨶", { 10806, 0 } }, + { "ö", { 246, 0 } }, + { "⌽", { 9021, 0 } }, + { "∥", { 8741, 0 } }, + { "¶", { 182, 0 } }, + { "∥", { 8741, 0 } }, + { "⫳", { 10995, 0 } }, + { "⫽", { 11005, 0 } }, + { "∂", { 8706, 0 } }, + { "п", { 1087, 0 } }, + { "%", { 37, 0 } }, + { ".", { 46, 0 } }, + { "‰", { 8240, 0 } }, + { "⊥", { 8869, 0 } }, + { "‱", { 8241, 0 } }, + { "𝔭", { 120109, 0 } }, + { "φ", { 966, 0 } }, + { "ϕ", { 981, 0 } }, + { "ℳ", { 8499, 0 } }, + { "☎", { 9742, 0 } }, + { "π", { 960, 0 } }, + { "⋔", { 8916, 0 } }, + { "ϖ", { 982, 0 } }, + { "ℏ", { 8463, 0 } }, + { "ℎ", { 8462, 0 } }, + { "ℏ", { 8463, 0 } }, + { "+", { 43, 0 } }, + { "⨣", { 10787, 0 } }, + { "⊞", { 8862, 0 } }, + { "⨢", { 10786, 0 } }, + { "∔", { 8724, 0 } }, + { "⨥", { 10789, 0 } }, + { "⩲", { 10866, 0 } }, + { "±", { 177, 0 } }, + { "⨦", { 10790, 0 } }, + { "⨧", { 10791, 0 } }, + { "±", { 177, 0 } }, + { "⨕", { 10773, 0 } }, + { "𝕡", { 120161, 0 } }, + { "£", { 163, 0 } }, + { "≺", { 8826, 0 } }, + { "⪳", { 10931, 0 } }, + { "⪷", { 10935, 0 } }, + { "≼", { 8828, 0 } }, + { "⪯", { 10927, 0 } }, + { "≺", { 8826, 0 } }, + { "⪷", { 10935, 0 } }, + { "≼", { 8828, 0 } }, + { "⪯", { 10927, 0 } }, + { "⪹", { 10937, 0 } }, + { "⪵", { 10933, 0 } }, + { "⋨", { 8936, 0 } }, + { "≾", { 8830, 0 } }, + { "′", { 8242, 0 } }, + { "ℙ", { 8473, 0 } }, + { "⪵", { 10933, 0 } }, + { "⪹", { 10937, 0 } }, + { "⋨", { 8936, 0 } }, + { "∏", { 8719, 0 } }, + { "⌮", { 9006, 0 } }, + { "⌒", { 8978, 0 } }, + { "⌓", { 8979, 0 } }, + { "∝", { 8733, 0 } }, + { "∝", { 8733, 0 } }, + { "≾", { 8830, 0 } }, + { "⊰", { 8880, 0 } }, + { "𝓅", { 120005, 0 } }, + { "ψ", { 968, 0 } }, + { " ", { 8200, 0 } }, + { "𝔮", { 120110, 0 } }, + { "⨌", { 10764, 0 } }, + { "𝕢", { 120162, 0 } }, + { "⁗", { 8279, 0 } }, + { "𝓆", { 120006, 0 } }, + { "ℍ", { 8461, 0 } }, + { "⨖", { 10774, 0 } }, + { "?", { 63, 0 } }, + { "≟", { 8799, 0 } }, + { """, { 34, 0 } }, + { "⇛", { 8667, 0 } }, + { "⇒", { 8658, 0 } }, + { "⤜", { 10524, 0 } }, + { "⤏", { 10511, 0 } }, + { "⥤", { 10596, 0 } }, + { "∽̱", { 8765, 817 } }, + { "ŕ", { 341, 0 } }, + { "√", { 8730, 0 } }, + { "⦳", { 10675, 0 } }, + { "⟩", { 10217, 0 } }, + { "⦒", { 10642, 0 } }, + { "⦥", { 10661, 0 } }, + { "⟩", { 10217, 0 } }, + { "»", { 187, 0 } }, + { "→", { 8594, 0 } }, + { "⥵", { 10613, 0 } }, + { "⇥", { 8677, 0 } }, + { "⤠", { 10528, 0 } }, + { "⤳", { 10547, 0 } }, + { "⤞", { 10526, 0 } }, + { "↪", { 8618, 0 } }, + { "↬", { 8620, 0 } }, + { "⥅", { 10565, 0 } }, + { "⥴", { 10612, 0 } }, + { "↣", { 8611, 0 } }, + { "↝", { 8605, 0 } }, + { "⤚", { 10522, 0 } }, + { "∶", { 8758, 0 } }, + { "ℚ", { 8474, 0 } }, + { "⤍", { 10509, 0 } }, + { "❳", { 10099, 0 } }, + { "}", { 125, 0 } }, + { "]", { 93, 0 } }, + { "⦌", { 10636, 0 } }, + { "⦎", { 10638, 0 } }, + { "⦐", { 10640, 0 } }, + { "ř", { 345, 0 } }, + { "ŗ", { 343, 0 } }, + { "⌉", { 8969, 0 } }, + { "}", { 125, 0 } }, + { "р", { 1088, 0 } }, + { "⤷", { 10551, 0 } }, + { "⥩", { 10601, 0 } }, + { "”", { 8221, 0 } }, + { "”", { 8221, 0 } }, + { "↳", { 8627, 0 } }, + { "ℜ", { 8476, 0 } }, + { "ℛ", { 8475, 0 } }, + { "ℜ", { 8476, 0 } }, + { "ℝ", { 8477, 0 } }, + { "▭", { 9645, 0 } }, + { "®", { 174, 0 } }, + { "⥽", { 10621, 0 } }, + { "⌋", { 8971, 0 } }, + { "𝔯", { 120111, 0 } }, + { "⇁", { 8641, 0 } }, + { "⇀", { 8640, 0 } }, + { "⥬", { 10604, 0 } }, + { "ρ", { 961, 0 } }, + { "ϱ", { 1009, 0 } }, + { "→", { 8594, 0 } }, + { "↣", { 8611, 0 } }, + { "⇁", { 8641, 0 } }, + { "⇀", { 8640, 0 } }, + { "⇄", { 8644, 0 } }, + { "⇌", { 8652, 0 } }, + { "⇉", { 8649, 0 } }, + { "↝", { 8605, 0 } }, + { "⋌", { 8908, 0 } }, + { "˚", { 730, 0 } }, + { "≓", { 8787, 0 } }, + { "⇄", { 8644, 0 } }, + { "⇌", { 8652, 0 } }, + { "‏", { 8207, 0 } }, + { "⎱", { 9137, 0 } }, + { "⎱", { 9137, 0 } }, + { "⫮", { 10990, 0 } }, + { "⟭", { 10221, 0 } }, + { "⇾", { 8702, 0 } }, + { "⟧", { 10215, 0 } }, + { "⦆", { 10630, 0 } }, + { "𝕣", { 120163, 0 } }, + { "⨮", { 10798, 0 } }, + { "⨵", { 10805, 0 } }, + { ")", { 41, 0 } }, + { "⦔", { 10644, 0 } }, + { "⨒", { 10770, 0 } }, + { "⇉", { 8649, 0 } }, + { "›", { 8250, 0 } }, + { "𝓇", { 120007, 0 } }, + { "↱", { 8625, 0 } }, + { "]", { 93, 0 } }, + { "’", { 8217, 0 } }, + { "’", { 8217, 0 } }, + { "⋌", { 8908, 0 } }, + { "⋊", { 8906, 0 } }, + { "▹", { 9657, 0 } }, + { "⊵", { 8885, 0 } }, + { "▸", { 9656, 0 } }, + { "⧎", { 10702, 0 } }, + { "⥨", { 10600, 0 } }, + { "℞", { 8478, 0 } }, + { "ś", { 347, 0 } }, + { "‚", { 8218, 0 } }, + { "≻", { 8827, 0 } }, + { "⪴", { 10932, 0 } }, + { "⪸", { 10936, 0 } }, + { "š", { 353, 0 } }, + { "≽", { 8829, 0 } }, + { "⪰", { 10928, 0 } }, + { "ş", { 351, 0 } }, + { "ŝ", { 349, 0 } }, + { "⪶", { 10934, 0 } }, + { "⪺", { 10938, 0 } }, + { "⋩", { 8937, 0 } }, + { "⨓", { 10771, 0 } }, + { "≿", { 8831, 0 } }, + { "с", { 1089, 0 } }, + { "⋅", { 8901, 0 } }, + { "⊡", { 8865, 0 } }, + { "⩦", { 10854, 0 } }, + { "⇘", { 8664, 0 } }, + { "⤥", { 10533, 0 } }, + { "↘", { 8600, 0 } }, + { "↘", { 8600, 0 } }, + { "§", { 167, 0 } }, + { ";", { 59, 0 } }, + { "⤩", { 10537, 0 } }, + { "∖", { 8726, 0 } }, + { "∖", { 8726, 0 } }, + { "✶", { 10038, 0 } }, + { "𝔰", { 120112, 0 } }, + { "⌢", { 8994, 0 } }, + { "♯", { 9839, 0 } }, + { "щ", { 1097, 0 } }, + { "ш", { 1096, 0 } }, + { "∣", { 8739, 0 } }, + { "∥", { 8741, 0 } }, + { "­", { 173, 0 } }, + { "σ", { 963, 0 } }, + { "ς", { 962, 0 } }, + { "ς", { 962, 0 } }, + { "∼", { 8764, 0 } }, + { "⩪", { 10858, 0 } }, + { "≃", { 8771, 0 } }, + { "≃", { 8771, 0 } }, + { "⪞", { 10910, 0 } }, + { "⪠", { 10912, 0 } }, + { "⪝", { 10909, 0 } }, + { "⪟", { 10911, 0 } }, + { "≆", { 8774, 0 } }, + { "⨤", { 10788, 0 } }, + { "⥲", { 10610, 0 } }, + { "←", { 8592, 0 } }, + { "∖", { 8726, 0 } }, + { "⨳", { 10803, 0 } }, + { "⧤", { 10724, 0 } }, + { "∣", { 8739, 0 } }, + { "⌣", { 8995, 0 } }, + { "⪪", { 10922, 0 } }, + { "⪬", { 10924, 0 } }, + { "⪬︀", { 10924, 65024 } }, + { "ь", { 1100, 0 } }, + { "/", { 47, 0 } }, + { "⧄", { 10692, 0 } }, + { "⌿", { 9023, 0 } }, + { "𝕤", { 120164, 0 } }, + { "♠", { 9824, 0 } }, + { "♠", { 9824, 0 } }, + { "∥", { 8741, 0 } }, + { "⊓", { 8851, 0 } }, + { "⊓︀", { 8851, 65024 } }, + { "⊔", { 8852, 0 } }, + { "⊔︀", { 8852, 65024 } }, + { "⊏", { 8847, 0 } }, + { "⊑", { 8849, 0 } }, + { "⊏", { 8847, 0 } }, + { "⊑", { 8849, 0 } }, + { "⊐", { 8848, 0 } }, + { "⊒", { 8850, 0 } }, + { "⊐", { 8848, 0 } }, + { "⊒", { 8850, 0 } }, + { "□", { 9633, 0 } }, + { "□", { 9633, 0 } }, + { "▪", { 9642, 0 } }, + { "▪", { 9642, 0 } }, + { "→", { 8594, 0 } }, + { "𝓈", { 120008, 0 } }, + { "∖", { 8726, 0 } }, + { "⌣", { 8995, 0 } }, + { "⋆", { 8902, 0 } }, + { "☆", { 9734, 0 } }, + { "★", { 9733, 0 } }, + { "ϵ", { 1013, 0 } }, + { "ϕ", { 981, 0 } }, + { "¯", { 175, 0 } }, + { "⊂", { 8834, 0 } }, + { "⫅", { 10949, 0 } }, + { "⪽", { 10941, 0 } }, + { "⊆", { 8838, 0 } }, + { "⫃", { 10947, 0 } }, + { "⫁", { 10945, 0 } }, + { "⫋", { 10955, 0 } }, + { "⊊", { 8842, 0 } }, + { "⪿", { 10943, 0 } }, + { "⥹", { 10617, 0 } }, + { "⊂", { 8834, 0 } }, + { "⊆", { 8838, 0 } }, + { "⫅", { 10949, 0 } }, + { "⊊", { 8842, 0 } }, + { "⫋", { 10955, 0 } }, + { "⫇", { 10951, 0 } }, + { "⫕", { 10965, 0 } }, + { "⫓", { 10963, 0 } }, + { "≻", { 8827, 0 } }, + { "⪸", { 10936, 0 } }, + { "≽", { 8829, 0 } }, + { "⪰", { 10928, 0 } }, + { "⪺", { 10938, 0 } }, + { "⪶", { 10934, 0 } }, + { "⋩", { 8937, 0 } }, + { "≿", { 8831, 0 } }, + { "∑", { 8721, 0 } }, + { "♪", { 9834, 0 } }, + { "¹", { 185, 0 } }, + { "¹", { 185, 0 } }, + { "²", { 178, 0 } }, + { "²", { 178, 0 } }, + { "³", { 179, 0 } }, + { "³", { 179, 0 } }, + { "⊃", { 8835, 0 } }, + { "⫆", { 10950, 0 } }, + { "⪾", { 10942, 0 } }, + { "⫘", { 10968, 0 } }, + { "⊇", { 8839, 0 } }, + { "⫄", { 10948, 0 } }, + { "⟉", { 10185, 0 } }, + { "⫗", { 10967, 0 } }, + { "⥻", { 10619, 0 } }, + { "⫂", { 10946, 0 } }, + { "⫌", { 10956, 0 } }, + { "⊋", { 8843, 0 } }, + { "⫀", { 10944, 0 } }, + { "⊃", { 8835, 0 } }, + { "⊇", { 8839, 0 } }, + { "⫆", { 10950, 0 } }, + { "⊋", { 8843, 0 } }, + { "⫌", { 10956, 0 } }, + { "⫈", { 10952, 0 } }, + { "⫔", { 10964, 0 } }, + { "⫖", { 10966, 0 } }, + { "⇙", { 8665, 0 } }, + { "⤦", { 10534, 0 } }, + { "↙", { 8601, 0 } }, + { "↙", { 8601, 0 } }, + { "⤪", { 10538, 0 } }, + { "ß", { 223, 0 } }, + { "⌖", { 8982, 0 } }, + { "τ", { 964, 0 } }, + { "⎴", { 9140, 0 } }, + { "ť", { 357, 0 } }, + { "ţ", { 355, 0 } }, + { "т", { 1090, 0 } }, + { "⃛", { 8411, 0 } }, + { "⌕", { 8981, 0 } }, + { "𝔱", { 120113, 0 } }, + { "∴", { 8756, 0 } }, + { "∴", { 8756, 0 } }, + { "θ", { 952, 0 } }, + { "ϑ", { 977, 0 } }, + { "ϑ", { 977, 0 } }, + { "≈", { 8776, 0 } }, + { "∼", { 8764, 0 } }, + { " ", { 8201, 0 } }, + { "≈", { 8776, 0 } }, + { "∼", { 8764, 0 } }, + { "þ", { 254, 0 } }, + { "˜", { 732, 0 } }, + { "×", { 215, 0 } }, + { "⊠", { 8864, 0 } }, + { "⨱", { 10801, 0 } }, + { "⨰", { 10800, 0 } }, + { "∭", { 8749, 0 } }, + { "⤨", { 10536, 0 } }, + { "⊤", { 8868, 0 } }, + { "⌶", { 9014, 0 } }, + { "⫱", { 10993, 0 } }, + { "𝕥", { 120165, 0 } }, + { "⫚", { 10970, 0 } }, + { "⤩", { 10537, 0 } }, + { "‴", { 8244, 0 } }, + { "™", { 8482, 0 } }, + { "▵", { 9653, 0 } }, + { "▿", { 9663, 0 } }, + { "◃", { 9667, 0 } }, + { "⊴", { 8884, 0 } }, + { "≜", { 8796, 0 } }, + { "▹", { 9657, 0 } }, + { "⊵", { 8885, 0 } }, + { "◬", { 9708, 0 } }, + { "≜", { 8796, 0 } }, + { "⨺", { 10810, 0 } }, + { "⨹", { 10809, 0 } }, + { "⧍", { 10701, 0 } }, + { "⨻", { 10811, 0 } }, + { "⏢", { 9186, 0 } }, + { "𝓉", { 120009, 0 } }, + { "ц", { 1094, 0 } }, + { "ћ", { 1115, 0 } }, + { "ŧ", { 359, 0 } }, + { "≬", { 8812, 0 } }, + { "↞", { 8606, 0 } }, + { "↠", { 8608, 0 } }, + { "⇑", { 8657, 0 } }, + { "⥣", { 10595, 0 } }, + { "ú", { 250, 0 } }, + { "↑", { 8593, 0 } }, + { "ў", { 1118, 0 } }, + { "ŭ", { 365, 0 } }, + { "û", { 251, 0 } }, + { "у", { 1091, 0 } }, + { "⇅", { 8645, 0 } }, + { "ű", { 369, 0 } }, + { "⥮", { 10606, 0 } }, + { "⥾", { 10622, 0 } }, + { "𝔲", { 120114, 0 } }, + { "ù", { 249, 0 } }, + { "↿", { 8639, 0 } }, + { "↾", { 8638, 0 } }, + { "▀", { 9600, 0 } }, + { "⌜", { 8988, 0 } }, + { "⌜", { 8988, 0 } }, + { "⌏", { 8975, 0 } }, + { "◸", { 9720, 0 } }, + { "ū", { 363, 0 } }, + { "¨", { 168, 0 } }, + { "ų", { 371, 0 } }, + { "𝕦", { 120166, 0 } }, + { "↑", { 8593, 0 } }, + { "↕", { 8597, 0 } }, + { "↿", { 8639, 0 } }, + { "↾", { 8638, 0 } }, + { "⊎", { 8846, 0 } }, + { "υ", { 965, 0 } }, + { "ϒ", { 978, 0 } }, + { "υ", { 965, 0 } }, + { "⇈", { 8648, 0 } }, + { "⌝", { 8989, 0 } }, + { "⌝", { 8989, 0 } }, + { "⌎", { 8974, 0 } }, + { "ů", { 367, 0 } }, + { "◹", { 9721, 0 } }, + { "𝓊", { 120010, 0 } }, + { "⋰", { 8944, 0 } }, + { "ũ", { 361, 0 } }, + { "▵", { 9653, 0 } }, + { "▴", { 9652, 0 } }, + { "⇈", { 8648, 0 } }, + { "ü", { 252, 0 } }, + { "⦧", { 10663, 0 } }, + { "⇕", { 8661, 0 } }, + { "⫨", { 10984, 0 } }, + { "⫩", { 10985, 0 } }, + { "⊨", { 8872, 0 } }, + { "⦜", { 10652, 0 } }, + { "ϵ", { 1013, 0 } }, + { "ϰ", { 1008, 0 } }, + { "∅", { 8709, 0 } }, + { "ϕ", { 981, 0 } }, + { "ϖ", { 982, 0 } }, + { "∝", { 8733, 0 } }, + { "↕", { 8597, 0 } }, + { "ϱ", { 1009, 0 } }, + { "ς", { 962, 0 } }, + { "⊊︀", { 8842, 65024 } }, + { "⫋︀", { 10955, 65024 } }, + { "⊋︀", { 8843, 65024 } }, + { "⫌︀", { 10956, 65024 } }, + { "ϑ", { 977, 0 } }, + { "⊲", { 8882, 0 } }, + { "⊳", { 8883, 0 } }, + { "в", { 1074, 0 } }, + { "⊢", { 8866, 0 } }, + { "∨", { 8744, 0 } }, + { "⊻", { 8891, 0 } }, + { "≚", { 8794, 0 } }, + { "⋮", { 8942, 0 } }, + { "|", { 124, 0 } }, + { "|", { 124, 0 } }, + { "𝔳", { 120115, 0 } }, + { "⊲", { 8882, 0 } }, + { "⊂⃒", { 8834, 8402 } }, + { "⊃⃒", { 8835, 8402 } }, + { "𝕧", { 120167, 0 } }, + { "∝", { 8733, 0 } }, + { "⊳", { 8883, 0 } }, + { "𝓋", { 120011, 0 } }, + { "⫋︀", { 10955, 65024 } }, + { "⊊︀", { 8842, 65024 } }, + { "⫌︀", { 10956, 65024 } }, + { "⊋︀", { 8843, 65024 } }, + { "⦚", { 10650, 0 } }, + { "ŵ", { 373, 0 } }, + { "⩟", { 10847, 0 } }, + { "∧", { 8743, 0 } }, + { "≙", { 8793, 0 } }, + { "℘", { 8472, 0 } }, + { "𝔴", { 120116, 0 } }, + { "𝕨", { 120168, 0 } }, + { "℘", { 8472, 0 } }, + { "≀", { 8768, 0 } }, + { "≀", { 8768, 0 } }, + { "𝓌", { 120012, 0 } }, + { "⋂", { 8898, 0 } }, + { "◯", { 9711, 0 } }, + { "⋃", { 8899, 0 } }, + { "▽", { 9661, 0 } }, + { "𝔵", { 120117, 0 } }, + { "⟺", { 10234, 0 } }, + { "⟷", { 10231, 0 } }, + { "ξ", { 958, 0 } }, + { "⟸", { 10232, 0 } }, + { "⟵", { 10229, 0 } }, + { "⟼", { 10236, 0 } }, + { "⋻", { 8955, 0 } }, + { "⨀", { 10752, 0 } }, + { "𝕩", { 120169, 0 } }, + { "⨁", { 10753, 0 } }, + { "⨂", { 10754, 0 } }, + { "⟹", { 10233, 0 } }, + { "⟶", { 10230, 0 } }, + { "𝓍", { 120013, 0 } }, + { "⨆", { 10758, 0 } }, + { "⨄", { 10756, 0 } }, + { "△", { 9651, 0 } }, + { "⋁", { 8897, 0 } }, + { "⋀", { 8896, 0 } }, + { "ý", { 253, 0 } }, + { "я", { 1103, 0 } }, + { "ŷ", { 375, 0 } }, + { "ы", { 1099, 0 } }, + { "¥", { 165, 0 } }, + { "𝔶", { 120118, 0 } }, + { "ї", { 1111, 0 } }, + { "𝕪", { 120170, 0 } }, + { "𝓎", { 120014, 0 } }, + { "ю", { 1102, 0 } }, + { "ÿ", { 255, 0 } }, + { "ź", { 378, 0 } }, + { "ž", { 382, 0 } }, + { "з", { 1079, 0 } }, + { "ż", { 380, 0 } }, + { "ℨ", { 8488, 0 } }, + { "ζ", { 950, 0 } }, + { "𝔷", { 120119, 0 } }, + { "ж", { 1078, 0 } }, + { "⇝", { 8669, 0 } }, + { "𝕫", { 120171, 0 } }, + { "𝓏", { 120015, 0 } }, + { "‍", { 8205, 0 } }, + { "‌", { 8204, 0 } } +}; + + +struct entity_key { + const char* name; + size_t name_size; +}; + +static int +entity_cmp(const void* p_key, const void* p_entity) +{ + struct entity_key* key = (struct entity_key*) p_key; + struct entity* ent = (struct entity*) p_entity; + + return strncmp(key->name, ent->name, key->name_size); +} + +const struct entity* +entity_lookup(const char* name, size_t name_size) +{ + struct entity_key key = { name, name_size }; + + return bsearch(&key, + entity_table, + sizeof(entity_table) / sizeof(entity_table[0]), + sizeof(struct entity), + entity_cmp); +} diff --git a/tdemarkdown/md4c/src/entity.h b/tdemarkdown/md4c/src/entity.h new file mode 100644 index 000000000..36395fe51 --- /dev/null +++ b/tdemarkdown/md4c/src/entity.h @@ -0,0 +1,42 @@ +/* + * MD4C: Markdown parser for C + * (http://github.com/mity/md4c) + * + * Copyright (c) 2016-2019 Martin Mitas + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#ifndef MD4C_ENTITY_H +#define MD4C_ENTITY_H + +#include <stdlib.h> + + +/* Most entities are formed by single Unicode codepoint, few by two codepoints. + * Single-codepoint entities have codepoints[1] set to zero. */ +struct entity { + const char* name; + unsigned codepoints[2]; +}; + +const struct entity* entity_lookup(const char* name, size_t name_size); + + +#endif /* MD4C_ENTITY_H */ diff --git a/tdemarkdown/md4c/src/md4c-html.c b/tdemarkdown/md4c/src/md4c-html.c new file mode 100644 index 000000000..d604aecb0 --- /dev/null +++ b/tdemarkdown/md4c/src/md4c-html.c @@ -0,0 +1,573 @@ +/* + * MD4C: Markdown parser for C + * (http://github.com/mity/md4c) + * + * Copyright (c) 2016-2019 Martin Mitas + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#include <stdio.h> +#include <string.h> + +#include "md4c-html.h" +#include "entity.h" + + +#if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 199409L + /* C89/90 or old compilers in general may not understand "inline". */ + #if defined __GNUC__ + #define inline __inline__ + #elif defined _MSC_VER + #define inline __inline + #else + #define inline + #endif +#endif + +#ifdef _WIN32 + #define snprintf _snprintf +#endif + + + +typedef struct MD_HTML_tag MD_HTML; +struct MD_HTML_tag { + void (*process_output)(const MD_CHAR*, MD_SIZE, void*); + void* userdata; + unsigned flags; + int image_nesting_level; + char escape_map[256]; +}; + +#define NEED_HTML_ESC_FLAG 0x1 +#define NEED_URL_ESC_FLAG 0x2 + + +/***************************************** + *** HTML rendering helper functions *** + *****************************************/ + +#define ISDIGIT(ch) ('0' <= (ch) && (ch) <= '9') +#define ISLOWER(ch) ('a' <= (ch) && (ch) <= 'z') +#define ISUPPER(ch) ('A' <= (ch) && (ch) <= 'Z') +#define ISALNUM(ch) (ISLOWER(ch) || ISUPPER(ch) || ISDIGIT(ch)) + + +static inline void +render_verbatim(MD_HTML* r, const MD_CHAR* text, MD_SIZE size) +{ + r->process_output(text, size, r->userdata); +} + +/* Keep this as a macro. Most compiler should then be smart enough to replace + * the strlen() call with a compile-time constant if the string is a C literal. */ +#define RENDER_VERBATIM(r, verbatim) \ + render_verbatim((r), (verbatim), (MD_SIZE) (strlen(verbatim))) + + +static void +render_html_escaped(MD_HTML* r, const MD_CHAR* data, MD_SIZE size) +{ + MD_OFFSET beg = 0; + MD_OFFSET off = 0; + + /* Some characters need to be escaped in normal HTML text. */ + #define NEED_HTML_ESC(ch) (r->escape_map[(unsigned char)(ch)] & NEED_HTML_ESC_FLAG) + + while(1) { + /* Optimization: Use some loop unrolling. */ + while(off + 3 < size && !NEED_HTML_ESC(data[off+0]) && !NEED_HTML_ESC(data[off+1]) + && !NEED_HTML_ESC(data[off+2]) && !NEED_HTML_ESC(data[off+3])) + off += 4; + while(off < size && !NEED_HTML_ESC(data[off])) + off++; + + if(off > beg) + render_verbatim(r, data + beg, off - beg); + + if(off < size) { + switch(data[off]) { + case '&': RENDER_VERBATIM(r, "&"); break; + case '<': RENDER_VERBATIM(r, "<"); break; + case '>': RENDER_VERBATIM(r, ">"); break; + case '"': RENDER_VERBATIM(r, """); break; + } + off++; + } else { + break; + } + beg = off; + } +} + +static void +render_url_escaped(MD_HTML* r, const MD_CHAR* data, MD_SIZE size) +{ + static const MD_CHAR hex_chars[] = "0123456789ABCDEF"; + MD_OFFSET beg = 0; + MD_OFFSET off = 0; + + /* Some characters need to be escaped in URL attributes. */ + #define NEED_URL_ESC(ch) (r->escape_map[(unsigned char)(ch)] & NEED_URL_ESC_FLAG) + + while(1) { + while(off < size && !NEED_URL_ESC(data[off])) + off++; + if(off > beg) + render_verbatim(r, data + beg, off - beg); + + if(off < size) { + char hex[3]; + + switch(data[off]) { + case '&': RENDER_VERBATIM(r, "&"); break; + default: + hex[0] = '%'; + hex[1] = hex_chars[((unsigned)data[off] >> 4) & 0xf]; + hex[2] = hex_chars[((unsigned)data[off] >> 0) & 0xf]; + render_verbatim(r, hex, 3); + break; + } + off++; + } else { + break; + } + + beg = off; + } +} + +static unsigned +hex_val(char ch) +{ + if('0' <= ch && ch <= '9') + return ch - '0'; + if('A' <= ch && ch <= 'Z') + return ch - 'A' + 10; + else + return ch - 'a' + 10; +} + +static void +render_utf8_codepoint(MD_HTML* r, unsigned codepoint, + void (*fn_append)(MD_HTML*, const MD_CHAR*, MD_SIZE)) +{ + static const MD_CHAR utf8_replacement_char[] = { 0xef, 0xbf, 0xbd }; + + unsigned char utf8[4]; + size_t n; + + if(codepoint <= 0x7f) { + n = 1; + utf8[0] = codepoint; + } else if(codepoint <= 0x7ff) { + n = 2; + utf8[0] = 0xc0 | ((codepoint >> 6) & 0x1f); + utf8[1] = 0x80 + ((codepoint >> 0) & 0x3f); + } else if(codepoint <= 0xffff) { + n = 3; + utf8[0] = 0xe0 | ((codepoint >> 12) & 0xf); + utf8[1] = 0x80 + ((codepoint >> 6) & 0x3f); + utf8[2] = 0x80 + ((codepoint >> 0) & 0x3f); + } else { + n = 4; + utf8[0] = 0xf0 | ((codepoint >> 18) & 0x7); + utf8[1] = 0x80 + ((codepoint >> 12) & 0x3f); + utf8[2] = 0x80 + ((codepoint >> 6) & 0x3f); + utf8[3] = 0x80 + ((codepoint >> 0) & 0x3f); + } + + if(0 < codepoint && codepoint <= 0x10ffff) + fn_append(r, (char*)utf8, (MD_SIZE)n); + else + fn_append(r, utf8_replacement_char, 3); +} + +/* Translate entity to its UTF-8 equivalent, or output the verbatim one + * if such entity is unknown (or if the translation is disabled). */ +static void +render_entity(MD_HTML* r, const MD_CHAR* text, MD_SIZE size, + void (*fn_append)(MD_HTML*, const MD_CHAR*, MD_SIZE)) +{ + if(r->flags & MD_HTML_FLAG_VERBATIM_ENTITIES) { + render_verbatim(r, text, size); + return; + } + + /* We assume UTF-8 output is what is desired. */ + if(size > 3 && text[1] == '#') { + unsigned codepoint = 0; + + if(text[2] == 'x' || text[2] == 'X') { + /* Hexadecimal entity (e.g. "�")). */ + MD_SIZE i; + for(i = 3; i < size-1; i++) + codepoint = 16 * codepoint + hex_val(text[i]); + } else { + /* Decimal entity (e.g. "&1234;") */ + MD_SIZE i; + for(i = 2; i < size-1; i++) + codepoint = 10 * codepoint + (text[i] - '0'); + } + + render_utf8_codepoint(r, codepoint, fn_append); + return; + } else { + /* Named entity (e.g. " "). */ + const struct entity* ent; + + ent = entity_lookup(text, size); + if(ent != NULL) { + render_utf8_codepoint(r, ent->codepoints[0], fn_append); + if(ent->codepoints[1]) + render_utf8_codepoint(r, ent->codepoints[1], fn_append); + return; + } + } + + fn_append(r, text, size); +} + +static void +render_attribute(MD_HTML* r, const MD_ATTRIBUTE* attr, + void (*fn_append)(MD_HTML*, const MD_CHAR*, MD_SIZE)) +{ + int i; + + for(i = 0; attr->substr_offsets[i] < attr->size; i++) { + MD_TEXTTYPE type = attr->substr_types[i]; + MD_OFFSET off = attr->substr_offsets[i]; + MD_SIZE size = attr->substr_offsets[i+1] - off; + const MD_CHAR* text = attr->text + off; + + switch(type) { + case MD_TEXT_NULLCHAR: render_utf8_codepoint(r, 0x0000, render_verbatim); break; + case MD_TEXT_ENTITY: render_entity(r, text, size, fn_append); break; + default: fn_append(r, text, size); break; + } + } +} + + +static void +render_open_ol_block(MD_HTML* r, const MD_BLOCK_OL_DETAIL* det) +{ + char buf[64]; + + if(det->start == 1) { + RENDER_VERBATIM(r, "<ol>\n"); + return; + } + + snprintf(buf, sizeof(buf), "<ol start=\"%u\">\n", det->start); + RENDER_VERBATIM(r, buf); +} + +static void +render_open_li_block(MD_HTML* r, const MD_BLOCK_LI_DETAIL* det) +{ + if(det->is_task) { + RENDER_VERBATIM(r, "<li class=\"task-list-item\">" + "<input type=\"checkbox\" class=\"task-list-item-checkbox\" disabled"); + if(det->task_mark == 'x' || det->task_mark == 'X') + RENDER_VERBATIM(r, " checked"); + RENDER_VERBATIM(r, ">"); + } else { + RENDER_VERBATIM(r, "<li>"); + } +} + +static void +render_open_code_block(MD_HTML* r, const MD_BLOCK_CODE_DETAIL* det) +{ + RENDER_VERBATIM(r, "<pre><code"); + + /* If known, output the HTML 5 attribute class="language-LANGNAME". */ + if(det->lang.text != NULL) { + RENDER_VERBATIM(r, " class=\"language-"); + render_attribute(r, &det->lang, render_html_escaped); + RENDER_VERBATIM(r, "\""); + } + + RENDER_VERBATIM(r, ">"); +} + +static void +render_open_td_block(MD_HTML* r, const MD_CHAR* cell_type, const MD_BLOCK_TD_DETAIL* det) +{ + RENDER_VERBATIM(r, "<"); + RENDER_VERBATIM(r, cell_type); + + switch(det->align) { + case MD_ALIGN_LEFT: RENDER_VERBATIM(r, " align=\"left\">"); break; + case MD_ALIGN_CENTER: RENDER_VERBATIM(r, " align=\"center\">"); break; + case MD_ALIGN_RIGHT: RENDER_VERBATIM(r, " align=\"right\">"); break; + default: RENDER_VERBATIM(r, ">"); break; + } +} + +static void +render_open_a_span(MD_HTML* r, const MD_SPAN_A_DETAIL* det) +{ + RENDER_VERBATIM(r, "<a href=\""); + render_attribute(r, &det->href, render_url_escaped); + + if(det->title.text != NULL) { + RENDER_VERBATIM(r, "\" title=\""); + render_attribute(r, &det->title, render_html_escaped); + } + + RENDER_VERBATIM(r, "\">"); +} + +static void +render_open_img_span(MD_HTML* r, const MD_SPAN_IMG_DETAIL* det) +{ + RENDER_VERBATIM(r, "<img src=\""); + render_attribute(r, &det->src, render_url_escaped); + + RENDER_VERBATIM(r, "\" alt=\""); + + r->image_nesting_level++; +} + +static void +render_close_img_span(MD_HTML* r, const MD_SPAN_IMG_DETAIL* det) +{ + if(det->title.text != NULL) { + RENDER_VERBATIM(r, "\" title=\""); + render_attribute(r, &det->title, render_html_escaped); + } + + RENDER_VERBATIM(r, (r->flags & MD_HTML_FLAG_XHTML) ? "\" />" : "\">"); + + r->image_nesting_level--; +} + +static void +render_open_wikilink_span(MD_HTML* r, const MD_SPAN_WIKILINK_DETAIL* det) +{ + RENDER_VERBATIM(r, "<x-wikilink data-target=\""); + render_attribute(r, &det->target, render_html_escaped); + + RENDER_VERBATIM(r, "\">"); +} + + +/************************************** + *** HTML renderer implementation *** + **************************************/ + +static int +enter_block_callback(MD_BLOCKTYPE type, void* detail, void* userdata) +{ + static const MD_CHAR* head[6] = { "<h1>", "<h2>", "<h3>", "<h4>", "<h5>", "<h6>" }; + MD_HTML* r = (MD_HTML*) userdata; + + switch(type) { + case MD_BLOCK_DOC: /* noop */ break; + case MD_BLOCK_QUOTE: RENDER_VERBATIM(r, "<blockquote>\n"); break; + case MD_BLOCK_UL: RENDER_VERBATIM(r, "<ul>\n"); break; + case MD_BLOCK_OL: render_open_ol_block(r, (const MD_BLOCK_OL_DETAIL*)detail); break; + case MD_BLOCK_LI: render_open_li_block(r, (const MD_BLOCK_LI_DETAIL*)detail); break; + case MD_BLOCK_HR: RENDER_VERBATIM(r, (r->flags & MD_HTML_FLAG_XHTML) ? "<hr />\n" : "<hr>\n"); break; + case MD_BLOCK_H: RENDER_VERBATIM(r, head[((MD_BLOCK_H_DETAIL*)detail)->level - 1]); break; + case MD_BLOCK_CODE: render_open_code_block(r, (const MD_BLOCK_CODE_DETAIL*) detail); break; + case MD_BLOCK_HTML: /* noop */ break; + case MD_BLOCK_P: RENDER_VERBATIM(r, "<p>"); break; + case MD_BLOCK_TABLE: RENDER_VERBATIM(r, "<table>\n"); break; + case MD_BLOCK_THEAD: RENDER_VERBATIM(r, "<thead>\n"); break; + case MD_BLOCK_TBODY: RENDER_VERBATIM(r, "<tbody>\n"); break; + case MD_BLOCK_TR: RENDER_VERBATIM(r, "<tr>\n"); break; + case MD_BLOCK_TH: render_open_td_block(r, "th", (MD_BLOCK_TD_DETAIL*)detail); break; + case MD_BLOCK_TD: render_open_td_block(r, "td", (MD_BLOCK_TD_DETAIL*)detail); break; + } + + return 0; +} + +static int +leave_block_callback(MD_BLOCKTYPE type, void* detail, void* userdata) +{ + static const MD_CHAR* head[6] = { "</h1>\n", "</h2>\n", "</h3>\n", "</h4>\n", "</h5>\n", "</h6>\n" }; + MD_HTML* r = (MD_HTML*) userdata; + + switch(type) { + case MD_BLOCK_DOC: /*noop*/ break; + case MD_BLOCK_QUOTE: RENDER_VERBATIM(r, "</blockquote>\n"); break; + case MD_BLOCK_UL: RENDER_VERBATIM(r, "</ul>\n"); break; + case MD_BLOCK_OL: RENDER_VERBATIM(r, "</ol>\n"); break; + case MD_BLOCK_LI: RENDER_VERBATIM(r, "</li>\n"); break; + case MD_BLOCK_HR: /*noop*/ break; + case MD_BLOCK_H: RENDER_VERBATIM(r, head[((MD_BLOCK_H_DETAIL*)detail)->level - 1]); break; + case MD_BLOCK_CODE: RENDER_VERBATIM(r, "</code></pre>\n"); break; + case MD_BLOCK_HTML: /* noop */ break; + case MD_BLOCK_P: RENDER_VERBATIM(r, "</p>\n"); break; + case MD_BLOCK_TABLE: RENDER_VERBATIM(r, "</table>\n"); break; + case MD_BLOCK_THEAD: RENDER_VERBATIM(r, "</thead>\n"); break; + case MD_BLOCK_TBODY: RENDER_VERBATIM(r, "</tbody>\n"); break; + case MD_BLOCK_TR: RENDER_VERBATIM(r, "</tr>\n"); break; + case MD_BLOCK_TH: RENDER_VERBATIM(r, "</th>\n"); break; + case MD_BLOCK_TD: RENDER_VERBATIM(r, "</td>\n"); break; + } + + return 0; +} + +static int +enter_span_callback(MD_SPANTYPE type, void* detail, void* userdata) +{ + MD_HTML* r = (MD_HTML*) userdata; + + if(r->image_nesting_level > 0) { + /* We are inside a Markdown image label. Markdown allows to use any + * emphasis and other rich contents in that context similarly as in + * any link label. + * + * However, unlike in the case of links (where that contents becomes + * contents of the <a>...</a> tag), in the case of images the contents + * is supposed to fall into the attribute alt: <img alt="...">. + * + * In that context we naturally cannot output nested HTML tags. So lets + * suppress them and only output the plain text (i.e. what falls into + * text() callback). + * + * This make-it-a-plain-text approach is the recommended practice by + * CommonMark specification (for HTML output). + */ + return 0; + } + + switch(type) { + case MD_SPAN_EM: RENDER_VERBATIM(r, "<em>"); break; + case MD_SPAN_STRONG: RENDER_VERBATIM(r, "<strong>"); break; + case MD_SPAN_U: RENDER_VERBATIM(r, "<u>"); break; + case MD_SPAN_A: render_open_a_span(r, (MD_SPAN_A_DETAIL*) detail); break; + case MD_SPAN_IMG: render_open_img_span(r, (MD_SPAN_IMG_DETAIL*) detail); break; + case MD_SPAN_CODE: RENDER_VERBATIM(r, "<code>"); break; + case MD_SPAN_DEL: RENDER_VERBATIM(r, "<del>"); break; + case MD_SPAN_LATEXMATH: RENDER_VERBATIM(r, "<x-equation>"); break; + case MD_SPAN_LATEXMATH_DISPLAY: RENDER_VERBATIM(r, "<x-equation type=\"display\">"); break; + case MD_SPAN_WIKILINK: render_open_wikilink_span(r, (MD_SPAN_WIKILINK_DETAIL*) detail); break; + } + + return 0; +} + +static int +leave_span_callback(MD_SPANTYPE type, void* detail, void* userdata) +{ + MD_HTML* r = (MD_HTML*) userdata; + + if(r->image_nesting_level > 0) { + /* Ditto as in enter_span_callback(), except we have to allow the + * end of the <img> tag. */ + if(r->image_nesting_level == 1 && type == MD_SPAN_IMG) + render_close_img_span(r, (MD_SPAN_IMG_DETAIL*) detail); + return 0; + } + + switch(type) { + case MD_SPAN_EM: RENDER_VERBATIM(r, "</em>"); break; + case MD_SPAN_STRONG: RENDER_VERBATIM(r, "</strong>"); break; + case MD_SPAN_U: RENDER_VERBATIM(r, "</u>"); break; + case MD_SPAN_A: RENDER_VERBATIM(r, "</a>"); break; + case MD_SPAN_IMG: /*noop, handled above*/ break; + case MD_SPAN_CODE: RENDER_VERBATIM(r, "</code>"); break; + case MD_SPAN_DEL: RENDER_VERBATIM(r, "</del>"); break; + case MD_SPAN_LATEXMATH: /*fall through*/ + case MD_SPAN_LATEXMATH_DISPLAY: RENDER_VERBATIM(r, "</x-equation>"); break; + case MD_SPAN_WIKILINK: RENDER_VERBATIM(r, "</x-wikilink>"); break; + } + + return 0; +} + +static int +text_callback(MD_TEXTTYPE type, const MD_CHAR* text, MD_SIZE size, void* userdata) +{ + MD_HTML* r = (MD_HTML*) userdata; + + switch(type) { + case MD_TEXT_NULLCHAR: render_utf8_codepoint(r, 0x0000, render_verbatim); break; + case MD_TEXT_BR: RENDER_VERBATIM(r, (r->image_nesting_level == 0 + ? ((r->flags & MD_HTML_FLAG_XHTML) ? "<br />\n" : "<br>\n") + : " ")); + break; + case MD_TEXT_SOFTBR: RENDER_VERBATIM(r, (r->image_nesting_level == 0 ? "\n" : " ")); break; + case MD_TEXT_HTML: render_verbatim(r, text, size); break; + case MD_TEXT_ENTITY: render_entity(r, text, size, render_html_escaped); break; + default: render_html_escaped(r, text, size); break; + } + + return 0; +} + +static void +debug_log_callback(const char* msg, void* userdata) +{ + MD_HTML* r = (MD_HTML*) userdata; + if(r->flags & MD_HTML_FLAG_DEBUG) + fprintf(stderr, "MD4C: %s\n", msg); +} + +int +md_html(const MD_CHAR* input, MD_SIZE input_size, + void (*process_output)(const MD_CHAR*, MD_SIZE, void*), + void* userdata, unsigned parser_flags, unsigned renderer_flags) +{ + MD_HTML render = { process_output, userdata, renderer_flags, 0, { 0 } }; + int i; + + MD_PARSER parser = { + 0, + parser_flags, + enter_block_callback, + leave_block_callback, + enter_span_callback, + leave_span_callback, + text_callback, + debug_log_callback, + NULL + }; + + /* Build map of characters which need escaping. */ + for(i = 0; i < 256; i++) { + unsigned char ch = (unsigned char) i; + + if(strchr("\"&<>", ch) != NULL) + render.escape_map[i] |= NEED_HTML_ESC_FLAG; + + if(!ISALNUM(ch) && strchr("~-_.+!*(),%#@?=;:/,+$", ch) == NULL) + render.escape_map[i] |= NEED_URL_ESC_FLAG; + } + + /* Consider skipping UTF-8 byte order mark (BOM). */ + if(renderer_flags & MD_HTML_FLAG_SKIP_UTF8_BOM && sizeof(MD_CHAR) == 1) { + static const MD_CHAR bom[3] = { 0xef, 0xbb, 0xbf }; + if(input_size >= sizeof(bom) && memcmp(input, bom, sizeof(bom)) == 0) { + input += sizeof(bom); + input_size -= sizeof(bom); + } + } + + return md_parse(input, input_size, &parser, (void*) &render); +} + diff --git a/tdemarkdown/md4c/src/md4c-html.h b/tdemarkdown/md4c/src/md4c-html.h new file mode 100644 index 000000000..23d3f7396 --- /dev/null +++ b/tdemarkdown/md4c/src/md4c-html.h @@ -0,0 +1,68 @@ +/* + * MD4C: Markdown parser for C + * (http://github.com/mity/md4c) + * + * Copyright (c) 2016-2017 Martin Mitas + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#ifndef MD4C_HTML_H +#define MD4C_HTML_H + +#include "md4c.h" + +#ifdef __cplusplus + extern "C" { +#endif + + +/* If set, debug output from md_parse() is sent to stderr. */ +#define MD_HTML_FLAG_DEBUG 0x0001 +#define MD_HTML_FLAG_VERBATIM_ENTITIES 0x0002 +#define MD_HTML_FLAG_SKIP_UTF8_BOM 0x0004 +#define MD_HTML_FLAG_XHTML 0x0008 + + +/* Render Markdown into HTML. + * + * Note only contents of <body> tag is generated. Caller must generate + * HTML header/footer manually before/after calling md_html(). + * + * Params input and input_size specify the Markdown input. + * Callback process_output() gets called with chunks of HTML output. + * (Typical implementation may just output the bytes to a file or append to + * some buffer). + * Param userdata is just propagated back to process_output() callback. + * Param parser_flags are flags from md4c.h propagated to md_parse(). + * Param render_flags is bitmask of MD_HTML_FLAG_xxxx. + * + * Returns -1 on error (if md_parse() fails.) + * Returns 0 on success. + */ +int md_html(const MD_CHAR* input, MD_SIZE input_size, + void (*process_output)(const MD_CHAR*, MD_SIZE, void*), + void* userdata, unsigned parser_flags, unsigned renderer_flags); + + +#ifdef __cplusplus + } /* extern "C" { */ +#endif + +#endif /* MD4C_HTML_H */ diff --git a/tdemarkdown/md4c/src/md4c-html.pc.in b/tdemarkdown/md4c/src/md4c-html.pc.in new file mode 100644 index 000000000..504bb52ea --- /dev/null +++ b/tdemarkdown/md4c/src/md4c-html.pc.in @@ -0,0 +1,13 @@ +prefix=@CMAKE_INSTALL_PREFIX@ +exec_prefix=@CMAKE_INSTALL_PREFIX@ +libdir=${exec_prefix}/@CMAKE_INSTALL_LIBDIR@ +includedir=${prefix}/@CMAKE_INSTALL_INCLUDEDIR@ + +Name: @PROJECT_NAME@ HTML renderer +Description: Markdown to HTML converter library. +Version: @PROJECT_VERSION@ +URL: @PROJECT_URL@ + +Requires: md4c = @PROJECT_VERSION@ +Libs: -L${libdir} -lmd4c-html +Cflags: -I${includedir} diff --git a/tdemarkdown/md4c/src/md4c.c b/tdemarkdown/md4c/src/md4c.c new file mode 100644 index 000000000..3677c0e06 --- /dev/null +++ b/tdemarkdown/md4c/src/md4c.c @@ -0,0 +1,6410 @@ +/* + * MD4C: Markdown parser for C + * (http://github.com/mity/md4c) + * + * Copyright (c) 2016-2020 Martin Mitas + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#include "md4c.h" + +#include <limits.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> + + +/***************************** + *** Miscellaneous Stuff *** + *****************************/ + +#if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 199409L + /* C89/90 or old compilers in general may not understand "inline". */ + #if defined __GNUC__ + #define inline __inline__ + #elif defined _MSC_VER + #define inline __inline + #else + #define inline + #endif +#endif + +/* Make the UTF-8 support the default. */ +#if !defined MD4C_USE_ASCII && !defined MD4C_USE_UTF8 && !defined MD4C_USE_UTF16 + #define MD4C_USE_UTF8 +#endif + +/* Magic for making wide literals with MD4C_USE_UTF16. */ +#ifdef _T + #undef _T +#endif +#if defined MD4C_USE_UTF16 + #define _T(x) L##x +#else + #define _T(x) x +#endif + +/* Misc. macros. */ +#define SIZEOF_ARRAY(a) (sizeof(a) / sizeof(a[0])) + +#define STRINGIZE_(x) #x +#define STRINGIZE(x) STRINGIZE_(x) + +#ifndef TRUE + #define TRUE 1 + #define FALSE 0 +#endif + +#define MD_LOG(msg) \ + do { \ + if(ctx->parser.debug_log != NULL) \ + ctx->parser.debug_log((msg), ctx->userdata); \ + } while(0) + +#ifdef DEBUG + #define MD_ASSERT(cond) \ + do { \ + if(!(cond)) { \ + MD_LOG(__FILE__ ":" STRINGIZE(__LINE__) ": " \ + "Assertion '" STRINGIZE(cond) "' failed."); \ + exit(1); \ + } \ + } while(0) + + #define MD_UNREACHABLE() MD_ASSERT(1 == 0) +#else + #ifdef __GNUC__ + #define MD_ASSERT(cond) do { if(!(cond)) __builtin_unreachable(); } while(0) + #define MD_UNREACHABLE() do { __builtin_unreachable(); } while(0) + #elif defined _MSC_VER && _MSC_VER > 120 + #define MD_ASSERT(cond) do { __assume(cond); } while(0) + #define MD_UNREACHABLE() do { __assume(0); } while(0) + #else + #define MD_ASSERT(cond) do {} while(0) + #define MD_UNREACHABLE() do {} while(0) + #endif +#endif + +/* For falling through case labels in switch statements. */ +#if defined __clang__ && __clang_major__ >= 12 + #define MD_FALLTHROUGH() __attribute__((fallthrough)) +#elif defined __GNUC__ && __GNUC__ >= 7 + #define MD_FALLTHROUGH() __attribute__((fallthrough)) +#else + #define MD_FALLTHROUGH() ((void)0) +#endif + +/* Suppress "unused parameter" warnings. */ +#define MD_UNUSED(x) ((void)x) + + +/************************ + *** Internal Types *** + ************************/ + +/* These are omnipresent so lets save some typing. */ +#define CHAR MD_CHAR +#define SZ MD_SIZE +#define OFF MD_OFFSET + +typedef struct MD_MARK_tag MD_MARK; +typedef struct MD_BLOCK_tag MD_BLOCK; +typedef struct MD_CONTAINER_tag MD_CONTAINER; +typedef struct MD_REF_DEF_tag MD_REF_DEF; + + +/* During analyzes of inline marks, we need to manage some "mark chains", + * of (yet unresolved) openers. This structure holds start/end of the chain. + * The chain internals are then realized through MD_MARK::prev and ::next. + */ +typedef struct MD_MARKCHAIN_tag MD_MARKCHAIN; +struct MD_MARKCHAIN_tag { + int head; /* Index of first mark in the chain, or -1 if empty. */ + int tail; /* Index of last mark in the chain, or -1 if empty. */ +}; + +/* Context propagated through all the parsing. */ +typedef struct MD_CTX_tag MD_CTX; +struct MD_CTX_tag { + /* Immutable stuff (parameters of md_parse()). */ + const CHAR* text; + SZ size; + MD_PARSER parser; + void* userdata; + + /* When this is true, it allows some optimizations. */ + int doc_ends_with_newline; + + /* Helper temporary growing buffer. */ + CHAR* buffer; + unsigned alloc_buffer; + + /* Reference definitions. */ + MD_REF_DEF* ref_defs; + int n_ref_defs; + int alloc_ref_defs; + void** ref_def_hashtable; + int ref_def_hashtable_size; + + /* Stack of inline/span markers. + * This is only used for parsing a single block contents but by storing it + * here we may reuse the stack for subsequent blocks; i.e. we have fewer + * (re)allocations. */ + MD_MARK* marks; + int n_marks; + int alloc_marks; + +#if defined MD4C_USE_UTF16 + char mark_char_map[128]; +#else + char mark_char_map[256]; +#endif + + /* For resolving of inline spans. */ + MD_MARKCHAIN mark_chains[13]; +#define PTR_CHAIN (ctx->mark_chains[0]) +#define TABLECELLBOUNDARIES (ctx->mark_chains[1]) +#define ASTERISK_OPENERS_extraword_mod3_0 (ctx->mark_chains[2]) +#define ASTERISK_OPENERS_extraword_mod3_1 (ctx->mark_chains[3]) +#define ASTERISK_OPENERS_extraword_mod3_2 (ctx->mark_chains[4]) +#define ASTERISK_OPENERS_intraword_mod3_0 (ctx->mark_chains[5]) +#define ASTERISK_OPENERS_intraword_mod3_1 (ctx->mark_chains[6]) +#define ASTERISK_OPENERS_intraword_mod3_2 (ctx->mark_chains[7]) +#define UNDERSCORE_OPENERS (ctx->mark_chains[8]) +#define TILDE_OPENERS_1 (ctx->mark_chains[9]) +#define TILDE_OPENERS_2 (ctx->mark_chains[10]) +#define BRACKET_OPENERS (ctx->mark_chains[11]) +#define DOLLAR_OPENERS (ctx->mark_chains[12]) +#define OPENERS_CHAIN_FIRST 1 +#define OPENERS_CHAIN_LAST 12 + + int n_table_cell_boundaries; + + /* For resolving links. */ + int unresolved_link_head; + int unresolved_link_tail; + + /* For resolving raw HTML. */ + OFF html_comment_horizon; + OFF html_proc_instr_horizon; + OFF html_decl_horizon; + OFF html_cdata_horizon; + + /* For block analysis. + * Notes: + * -- It holds MD_BLOCK as well as MD_LINE structures. After each + * MD_BLOCK, its (multiple) MD_LINE(s) follow. + * -- For MD_BLOCK_HTML and MD_BLOCK_CODE, MD_VERBATIMLINE(s) are used + * instead of MD_LINE(s). + */ + void* block_bytes; + MD_BLOCK* current_block; + int n_block_bytes; + int alloc_block_bytes; + + /* For container block analysis. */ + MD_CONTAINER* containers; + int n_containers; + int alloc_containers; + + /* Minimal indentation to call the block "indented code block". */ + unsigned code_indent_offset; + + /* Contextual info for line analysis. */ + SZ code_fence_length; /* For checking closing fence length. */ + int html_block_type; /* For checking closing raw HTML condition. */ + int last_line_has_list_loosening_effect; + int last_list_item_starts_with_two_blank_lines; +}; + +enum MD_LINETYPE_tag { + MD_LINE_BLANK, + MD_LINE_HR, + MD_LINE_ATXHEADER, + MD_LINE_SETEXTHEADER, + MD_LINE_SETEXTUNDERLINE, + MD_LINE_INDENTEDCODE, + MD_LINE_FENCEDCODE, + MD_LINE_HTML, + MD_LINE_TEXT, + MD_LINE_TABLE, + MD_LINE_TABLEUNDERLINE +}; +typedef enum MD_LINETYPE_tag MD_LINETYPE; + +typedef struct MD_LINE_ANALYSIS_tag MD_LINE_ANALYSIS; +struct MD_LINE_ANALYSIS_tag { + MD_LINETYPE type : 16; + unsigned data : 16; + OFF beg; + OFF end; + unsigned indent; /* Indentation level. */ +}; + +typedef struct MD_LINE_tag MD_LINE; +struct MD_LINE_tag { + OFF beg; + OFF end; +}; + +typedef struct MD_VERBATIMLINE_tag MD_VERBATIMLINE; +struct MD_VERBATIMLINE_tag { + OFF beg; + OFF end; + OFF indent; +}; + + +/***************** + *** Helpers *** + *****************/ + +/* Character accessors. */ +#define CH(off) (ctx->text[(off)]) +#define STR(off) (ctx->text + (off)) + +/* Character classification. + * Note we assume ASCII compatibility of code points < 128 here. */ +#define ISIN_(ch, ch_min, ch_max) ((ch_min) <= (unsigned)(ch) && (unsigned)(ch) <= (ch_max)) +#define ISANYOF_(ch, palette) ((ch) != _T('\0') && md_strchr((palette), (ch)) != NULL) +#define ISANYOF2_(ch, ch1, ch2) ((ch) == (ch1) || (ch) == (ch2)) +#define ISANYOF3_(ch, ch1, ch2, ch3) ((ch) == (ch1) || (ch) == (ch2) || (ch) == (ch3)) +#define ISASCII_(ch) ((unsigned)(ch) <= 127) +#define ISBLANK_(ch) (ISANYOF2_((ch), _T(' '), _T('\t'))) +#define ISNEWLINE_(ch) (ISANYOF2_((ch), _T('\r'), _T('\n'))) +#define ISWHITESPACE_(ch) (ISBLANK_(ch) || ISANYOF2_((ch), _T('\v'), _T('\f'))) +#define ISCNTRL_(ch) ((unsigned)(ch) <= 31 || (unsigned)(ch) == 127) +#define ISPUNCT_(ch) (ISIN_(ch, 33, 47) || ISIN_(ch, 58, 64) || ISIN_(ch, 91, 96) || ISIN_(ch, 123, 126)) +#define ISUPPER_(ch) (ISIN_(ch, _T('A'), _T('Z'))) +#define ISLOWER_(ch) (ISIN_(ch, _T('a'), _T('z'))) +#define ISALPHA_(ch) (ISUPPER_(ch) || ISLOWER_(ch)) +#define ISDIGIT_(ch) (ISIN_(ch, _T('0'), _T('9'))) +#define ISXDIGIT_(ch) (ISDIGIT_(ch) || ISIN_(ch, _T('A'), _T('F')) || ISIN_(ch, _T('a'), _T('f'))) +#define ISALNUM_(ch) (ISALPHA_(ch) || ISDIGIT_(ch)) + +#define ISANYOF(off, palette) ISANYOF_(CH(off), (palette)) +#define ISANYOF2(off, ch1, ch2) ISANYOF2_(CH(off), (ch1), (ch2)) +#define ISANYOF3(off, ch1, ch2, ch3) ISANYOF3_(CH(off), (ch1), (ch2), (ch3)) +#define ISASCII(off) ISASCII_(CH(off)) +#define ISBLANK(off) ISBLANK_(CH(off)) +#define ISNEWLINE(off) ISNEWLINE_(CH(off)) +#define ISWHITESPACE(off) ISWHITESPACE_(CH(off)) +#define ISCNTRL(off) ISCNTRL_(CH(off)) +#define ISPUNCT(off) ISPUNCT_(CH(off)) +#define ISUPPER(off) ISUPPER_(CH(off)) +#define ISLOWER(off) ISLOWER_(CH(off)) +#define ISALPHA(off) ISALPHA_(CH(off)) +#define ISDIGIT(off) ISDIGIT_(CH(off)) +#define ISXDIGIT(off) ISXDIGIT_(CH(off)) +#define ISALNUM(off) ISALNUM_(CH(off)) + + +#if defined MD4C_USE_UTF16 + #define md_strchr wcschr +#else + #define md_strchr strchr +#endif + + +/* Case insensitive check of string equality. */ +static inline int +md_ascii_case_eq(const CHAR* s1, const CHAR* s2, SZ n) +{ + OFF i; + for(i = 0; i < n; i++) { + CHAR ch1 = s1[i]; + CHAR ch2 = s2[i]; + + if(ISLOWER_(ch1)) + ch1 += ('A'-'a'); + if(ISLOWER_(ch2)) + ch2 += ('A'-'a'); + if(ch1 != ch2) + return FALSE; + } + return TRUE; +} + +static inline int +md_ascii_eq(const CHAR* s1, const CHAR* s2, SZ n) +{ + return memcmp(s1, s2, n * sizeof(CHAR)) == 0; +} + +static int +md_text_with_null_replacement(MD_CTX* ctx, MD_TEXTTYPE type, const CHAR* str, SZ size) +{ + OFF off = 0; + int ret = 0; + + while(1) { + while(off < size && str[off] != _T('\0')) + off++; + + if(off > 0) { + ret = ctx->parser.text(type, str, off, ctx->userdata); + if(ret != 0) + return ret; + + str += off; + size -= off; + off = 0; + } + + if(off >= size) + return 0; + + ret = ctx->parser.text(MD_TEXT_NULLCHAR, _T(""), 1, ctx->userdata); + if(ret != 0) + return ret; + off++; + } +} + + +#define MD_CHECK(func) \ + do { \ + ret = (func); \ + if(ret < 0) \ + goto abort; \ + } while(0) + + +#define MD_TEMP_BUFFER(sz) \ + do { \ + if(sz > ctx->alloc_buffer) { \ + CHAR* new_buffer; \ + SZ new_size = ((sz) + (sz) / 2 + 128) & ~127; \ + \ + new_buffer = realloc(ctx->buffer, new_size); \ + if(new_buffer == NULL) { \ + MD_LOG("realloc() failed."); \ + ret = -1; \ + goto abort; \ + } \ + \ + ctx->buffer = new_buffer; \ + ctx->alloc_buffer = new_size; \ + } \ + } while(0) + + +#define MD_ENTER_BLOCK(type, arg) \ + do { \ + ret = ctx->parser.enter_block((type), (arg), ctx->userdata); \ + if(ret != 0) { \ + MD_LOG("Aborted from enter_block() callback."); \ + goto abort; \ + } \ + } while(0) + +#define MD_LEAVE_BLOCK(type, arg) \ + do { \ + ret = ctx->parser.leave_block((type), (arg), ctx->userdata); \ + if(ret != 0) { \ + MD_LOG("Aborted from leave_block() callback."); \ + goto abort; \ + } \ + } while(0) + +#define MD_ENTER_SPAN(type, arg) \ + do { \ + ret = ctx->parser.enter_span((type), (arg), ctx->userdata); \ + if(ret != 0) { \ + MD_LOG("Aborted from enter_span() callback."); \ + goto abort; \ + } \ + } while(0) + +#define MD_LEAVE_SPAN(type, arg) \ + do { \ + ret = ctx->parser.leave_span((type), (arg), ctx->userdata); \ + if(ret != 0) { \ + MD_LOG("Aborted from leave_span() callback."); \ + goto abort; \ + } \ + } while(0) + +#define MD_TEXT(type, str, size) \ + do { \ + if(size > 0) { \ + ret = ctx->parser.text((type), (str), (size), ctx->userdata); \ + if(ret != 0) { \ + MD_LOG("Aborted from text() callback."); \ + goto abort; \ + } \ + } \ + } while(0) + +#define MD_TEXT_INSECURE(type, str, size) \ + do { \ + if(size > 0) { \ + ret = md_text_with_null_replacement(ctx, type, str, size); \ + if(ret != 0) { \ + MD_LOG("Aborted from text() callback."); \ + goto abort; \ + } \ + } \ + } while(0) + + +/* If the offset falls into a gap between line, we return the following + * line. */ +static const MD_LINE* +md_lookup_line(OFF off, const MD_LINE* lines, int n_lines) +{ + int lo, hi; + int pivot; + const MD_LINE* line; + + lo = 0; + hi = n_lines - 1; + while(lo <= hi) { + pivot = (lo + hi) / 2; + line = &lines[pivot]; + + if(off < line->beg) { + hi = pivot - 1; + if(hi < 0 || lines[hi].end <= off) + return line; + } else if(off > line->end) { + lo = pivot + 1; + } else { + return line; + } + } + + return NULL; +} + + +/************************* + *** Unicode Support *** + *************************/ + +typedef struct MD_UNICODE_FOLD_INFO_tag MD_UNICODE_FOLD_INFO; +struct MD_UNICODE_FOLD_INFO_tag { + unsigned codepoints[3]; + unsigned n_codepoints; +}; + + +#if defined MD4C_USE_UTF16 || defined MD4C_USE_UTF8 + /* Binary search over sorted "map" of codepoints. Consecutive sequences + * of codepoints may be encoded in the map by just using the + * (MIN_CODEPOINT | 0x40000000) and (MAX_CODEPOINT | 0x80000000). + * + * Returns index of the found record in the map (in the case of ranges, + * the minimal value is used); or -1 on failure. */ + static int + md_unicode_bsearch__(unsigned codepoint, const unsigned* map, size_t map_size) + { + int beg, end; + int pivot_beg, pivot_end; + + beg = 0; + end = (int) map_size-1; + while(beg <= end) { + /* Pivot may be a range, not just a single value. */ + pivot_beg = pivot_end = (beg + end) / 2; + if(map[pivot_end] & 0x40000000) + pivot_end++; + if(map[pivot_beg] & 0x80000000) + pivot_beg--; + + if(codepoint < (map[pivot_beg] & 0x00ffffff)) + end = pivot_beg - 1; + else if(codepoint > (map[pivot_end] & 0x00ffffff)) + beg = pivot_end + 1; + else + return pivot_beg; + } + + return -1; + } + + static int + md_is_unicode_whitespace__(unsigned codepoint) + { +#define R(cp_min, cp_max) ((cp_min) | 0x40000000), ((cp_max) | 0x80000000) +#define S(cp) (cp) + /* Unicode "Zs" category. + * (generated by scripts/build_whitespace_map.py) */ + static const unsigned WHITESPACE_MAP[] = { + S(0x0020), S(0x00a0), S(0x1680), R(0x2000,0x200a), S(0x202f), S(0x205f), S(0x3000) + }; +#undef R +#undef S + + /* The ASCII ones are the most frequently used ones, also CommonMark + * specification requests few more in this range. */ + if(codepoint <= 0x7f) + return ISWHITESPACE_(codepoint); + + return (md_unicode_bsearch__(codepoint, WHITESPACE_MAP, SIZEOF_ARRAY(WHITESPACE_MAP)) >= 0); + } + + static int + md_is_unicode_punct__(unsigned codepoint) + { +#define R(cp_min, cp_max) ((cp_min) | 0x40000000), ((cp_max) | 0x80000000) +#define S(cp) (cp) + /* Unicode "Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps" categories. + * (generated by scripts/build_punct_map.py) */ + static const unsigned PUNCT_MAP[] = { + R(0x0021,0x0023), R(0x0025,0x002a), R(0x002c,0x002f), R(0x003a,0x003b), R(0x003f,0x0040), + R(0x005b,0x005d), S(0x005f), S(0x007b), S(0x007d), S(0x00a1), S(0x00a7), S(0x00ab), R(0x00b6,0x00b7), + S(0x00bb), S(0x00bf), S(0x037e), S(0x0387), R(0x055a,0x055f), R(0x0589,0x058a), S(0x05be), S(0x05c0), + S(0x05c3), S(0x05c6), R(0x05f3,0x05f4), R(0x0609,0x060a), R(0x060c,0x060d), S(0x061b), R(0x061e,0x061f), + R(0x066a,0x066d), S(0x06d4), R(0x0700,0x070d), R(0x07f7,0x07f9), R(0x0830,0x083e), S(0x085e), + R(0x0964,0x0965), S(0x0970), S(0x09fd), S(0x0a76), S(0x0af0), S(0x0c77), S(0x0c84), S(0x0df4), S(0x0e4f), + R(0x0e5a,0x0e5b), R(0x0f04,0x0f12), S(0x0f14), R(0x0f3a,0x0f3d), S(0x0f85), R(0x0fd0,0x0fd4), + R(0x0fd9,0x0fda), R(0x104a,0x104f), S(0x10fb), R(0x1360,0x1368), S(0x1400), S(0x166e), R(0x169b,0x169c), + R(0x16eb,0x16ed), R(0x1735,0x1736), R(0x17d4,0x17d6), R(0x17d8,0x17da), R(0x1800,0x180a), + R(0x1944,0x1945), R(0x1a1e,0x1a1f), R(0x1aa0,0x1aa6), R(0x1aa8,0x1aad), R(0x1b5a,0x1b60), + R(0x1bfc,0x1bff), R(0x1c3b,0x1c3f), R(0x1c7e,0x1c7f), R(0x1cc0,0x1cc7), S(0x1cd3), R(0x2010,0x2027), + R(0x2030,0x2043), R(0x2045,0x2051), R(0x2053,0x205e), R(0x207d,0x207e), R(0x208d,0x208e), + R(0x2308,0x230b), R(0x2329,0x232a), R(0x2768,0x2775), R(0x27c5,0x27c6), R(0x27e6,0x27ef), + R(0x2983,0x2998), R(0x29d8,0x29db), R(0x29fc,0x29fd), R(0x2cf9,0x2cfc), R(0x2cfe,0x2cff), S(0x2d70), + R(0x2e00,0x2e2e), R(0x2e30,0x2e4f), S(0x2e52), R(0x3001,0x3003), R(0x3008,0x3011), R(0x3014,0x301f), + S(0x3030), S(0x303d), S(0x30a0), S(0x30fb), R(0xa4fe,0xa4ff), R(0xa60d,0xa60f), S(0xa673), S(0xa67e), + R(0xa6f2,0xa6f7), R(0xa874,0xa877), R(0xa8ce,0xa8cf), R(0xa8f8,0xa8fa), S(0xa8fc), R(0xa92e,0xa92f), + S(0xa95f), R(0xa9c1,0xa9cd), R(0xa9de,0xa9df), R(0xaa5c,0xaa5f), R(0xaade,0xaadf), R(0xaaf0,0xaaf1), + S(0xabeb), R(0xfd3e,0xfd3f), R(0xfe10,0xfe19), R(0xfe30,0xfe52), R(0xfe54,0xfe61), S(0xfe63), S(0xfe68), + R(0xfe6a,0xfe6b), R(0xff01,0xff03), R(0xff05,0xff0a), R(0xff0c,0xff0f), R(0xff1a,0xff1b), + R(0xff1f,0xff20), R(0xff3b,0xff3d), S(0xff3f), S(0xff5b), S(0xff5d), R(0xff5f,0xff65), R(0x10100,0x10102), + S(0x1039f), S(0x103d0), S(0x1056f), S(0x10857), S(0x1091f), S(0x1093f), R(0x10a50,0x10a58), S(0x10a7f), + R(0x10af0,0x10af6), R(0x10b39,0x10b3f), R(0x10b99,0x10b9c), S(0x10ead), R(0x10f55,0x10f59), + R(0x11047,0x1104d), R(0x110bb,0x110bc), R(0x110be,0x110c1), R(0x11140,0x11143), R(0x11174,0x11175), + R(0x111c5,0x111c8), S(0x111cd), S(0x111db), R(0x111dd,0x111df), R(0x11238,0x1123d), S(0x112a9), + R(0x1144b,0x1144f), R(0x1145a,0x1145b), S(0x1145d), S(0x114c6), R(0x115c1,0x115d7), R(0x11641,0x11643), + R(0x11660,0x1166c), R(0x1173c,0x1173e), S(0x1183b), R(0x11944,0x11946), S(0x119e2), R(0x11a3f,0x11a46), + R(0x11a9a,0x11a9c), R(0x11a9e,0x11aa2), R(0x11c41,0x11c45), R(0x11c70,0x11c71), R(0x11ef7,0x11ef8), + S(0x11fff), R(0x12470,0x12474), R(0x16a6e,0x16a6f), S(0x16af5), R(0x16b37,0x16b3b), S(0x16b44), + R(0x16e97,0x16e9a), S(0x16fe2), S(0x1bc9f), R(0x1da87,0x1da8b), R(0x1e95e,0x1e95f) + }; +#undef R +#undef S + + /* The ASCII ones are the most frequently used ones, also CommonMark + * specification requests few more in this range. */ + if(codepoint <= 0x7f) + return ISPUNCT_(codepoint); + + return (md_unicode_bsearch__(codepoint, PUNCT_MAP, SIZEOF_ARRAY(PUNCT_MAP)) >= 0); + } + + static void + md_get_unicode_fold_info(unsigned codepoint, MD_UNICODE_FOLD_INFO* info) + { +#define R(cp_min, cp_max) ((cp_min) | 0x40000000), ((cp_max) | 0x80000000) +#define S(cp) (cp) + /* Unicode "Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps" categories. + * (generated by scripts/build_folding_map.py) */ + static const unsigned FOLD_MAP_1[] = { + R(0x0041,0x005a), S(0x00b5), R(0x00c0,0x00d6), R(0x00d8,0x00de), R(0x0100,0x012e), R(0x0132,0x0136), + R(0x0139,0x0147), R(0x014a,0x0176), S(0x0178), R(0x0179,0x017d), S(0x017f), S(0x0181), S(0x0182), + S(0x0184), S(0x0186), S(0x0187), S(0x0189), S(0x018a), S(0x018b), S(0x018e), S(0x018f), S(0x0190), + S(0x0191), S(0x0193), S(0x0194), S(0x0196), S(0x0197), S(0x0198), S(0x019c), S(0x019d), S(0x019f), + R(0x01a0,0x01a4), S(0x01a6), S(0x01a7), S(0x01a9), S(0x01ac), S(0x01ae), S(0x01af), S(0x01b1), S(0x01b2), + S(0x01b3), S(0x01b5), S(0x01b7), S(0x01b8), S(0x01bc), S(0x01c4), S(0x01c5), S(0x01c7), S(0x01c8), + S(0x01ca), R(0x01cb,0x01db), R(0x01de,0x01ee), S(0x01f1), S(0x01f2), S(0x01f4), S(0x01f6), S(0x01f7), + R(0x01f8,0x021e), S(0x0220), R(0x0222,0x0232), S(0x023a), S(0x023b), S(0x023d), S(0x023e), S(0x0241), + S(0x0243), S(0x0244), S(0x0245), R(0x0246,0x024e), S(0x0345), S(0x0370), S(0x0372), S(0x0376), S(0x037f), + S(0x0386), R(0x0388,0x038a), S(0x038c), S(0x038e), S(0x038f), R(0x0391,0x03a1), R(0x03a3,0x03ab), + S(0x03c2), S(0x03cf), S(0x03d0), S(0x03d1), S(0x03d5), S(0x03d6), R(0x03d8,0x03ee), S(0x03f0), S(0x03f1), + S(0x03f4), S(0x03f5), S(0x03f7), S(0x03f9), S(0x03fa), R(0x03fd,0x03ff), R(0x0400,0x040f), + R(0x0410,0x042f), R(0x0460,0x0480), R(0x048a,0x04be), S(0x04c0), R(0x04c1,0x04cd), R(0x04d0,0x052e), + R(0x0531,0x0556), R(0x10a0,0x10c5), S(0x10c7), S(0x10cd), R(0x13f8,0x13fd), S(0x1c80), S(0x1c81), + S(0x1c82), S(0x1c83), S(0x1c84), S(0x1c85), S(0x1c86), S(0x1c87), S(0x1c88), R(0x1c90,0x1cba), + R(0x1cbd,0x1cbf), R(0x1e00,0x1e94), S(0x1e9b), R(0x1ea0,0x1efe), R(0x1f08,0x1f0f), R(0x1f18,0x1f1d), + R(0x1f28,0x1f2f), R(0x1f38,0x1f3f), R(0x1f48,0x1f4d), S(0x1f59), S(0x1f5b), S(0x1f5d), S(0x1f5f), + R(0x1f68,0x1f6f), S(0x1fb8), S(0x1fb9), S(0x1fba), S(0x1fbb), S(0x1fbe), R(0x1fc8,0x1fcb), S(0x1fd8), + S(0x1fd9), S(0x1fda), S(0x1fdb), S(0x1fe8), S(0x1fe9), S(0x1fea), S(0x1feb), S(0x1fec), S(0x1ff8), + S(0x1ff9), S(0x1ffa), S(0x1ffb), S(0x2126), S(0x212a), S(0x212b), S(0x2132), R(0x2160,0x216f), S(0x2183), + R(0x24b6,0x24cf), R(0x2c00,0x2c2e), S(0x2c60), S(0x2c62), S(0x2c63), S(0x2c64), R(0x2c67,0x2c6b), + S(0x2c6d), S(0x2c6e), S(0x2c6f), S(0x2c70), S(0x2c72), S(0x2c75), S(0x2c7e), S(0x2c7f), R(0x2c80,0x2ce2), + S(0x2ceb), S(0x2ced), S(0x2cf2), R(0xa640,0xa66c), R(0xa680,0xa69a), R(0xa722,0xa72e), R(0xa732,0xa76e), + S(0xa779), S(0xa77b), S(0xa77d), R(0xa77e,0xa786), S(0xa78b), S(0xa78d), S(0xa790), S(0xa792), + R(0xa796,0xa7a8), S(0xa7aa), S(0xa7ab), S(0xa7ac), S(0xa7ad), S(0xa7ae), S(0xa7b0), S(0xa7b1), S(0xa7b2), + S(0xa7b3), R(0xa7b4,0xa7be), S(0xa7c2), S(0xa7c4), S(0xa7c5), S(0xa7c6), S(0xa7c7), S(0xa7c9), S(0xa7f5), + R(0xab70,0xabbf), R(0xff21,0xff3a), R(0x10400,0x10427), R(0x104b0,0x104d3), R(0x10c80,0x10cb2), + R(0x118a0,0x118bf), R(0x16e40,0x16e5f), R(0x1e900,0x1e921) + }; + static const unsigned FOLD_MAP_1_DATA[] = { + 0x0061, 0x007a, 0x03bc, 0x00e0, 0x00f6, 0x00f8, 0x00fe, 0x0101, 0x012f, 0x0133, 0x0137, 0x013a, 0x0148, + 0x014b, 0x0177, 0x00ff, 0x017a, 0x017e, 0x0073, 0x0253, 0x0183, 0x0185, 0x0254, 0x0188, 0x0256, 0x0257, + 0x018c, 0x01dd, 0x0259, 0x025b, 0x0192, 0x0260, 0x0263, 0x0269, 0x0268, 0x0199, 0x026f, 0x0272, 0x0275, + 0x01a1, 0x01a5, 0x0280, 0x01a8, 0x0283, 0x01ad, 0x0288, 0x01b0, 0x028a, 0x028b, 0x01b4, 0x01b6, 0x0292, + 0x01b9, 0x01bd, 0x01c6, 0x01c6, 0x01c9, 0x01c9, 0x01cc, 0x01cc, 0x01dc, 0x01df, 0x01ef, 0x01f3, 0x01f3, + 0x01f5, 0x0195, 0x01bf, 0x01f9, 0x021f, 0x019e, 0x0223, 0x0233, 0x2c65, 0x023c, 0x019a, 0x2c66, 0x0242, + 0x0180, 0x0289, 0x028c, 0x0247, 0x024f, 0x03b9, 0x0371, 0x0373, 0x0377, 0x03f3, 0x03ac, 0x03ad, 0x03af, + 0x03cc, 0x03cd, 0x03ce, 0x03b1, 0x03c1, 0x03c3, 0x03cb, 0x03c3, 0x03d7, 0x03b2, 0x03b8, 0x03c6, 0x03c0, + 0x03d9, 0x03ef, 0x03ba, 0x03c1, 0x03b8, 0x03b5, 0x03f8, 0x03f2, 0x03fb, 0x037b, 0x037d, 0x0450, 0x045f, + 0x0430, 0x044f, 0x0461, 0x0481, 0x048b, 0x04bf, 0x04cf, 0x04c2, 0x04ce, 0x04d1, 0x052f, 0x0561, 0x0586, + 0x2d00, 0x2d25, 0x2d27, 0x2d2d, 0x13f0, 0x13f5, 0x0432, 0x0434, 0x043e, 0x0441, 0x0442, 0x0442, 0x044a, + 0x0463, 0xa64b, 0x10d0, 0x10fa, 0x10fd, 0x10ff, 0x1e01, 0x1e95, 0x1e61, 0x1ea1, 0x1eff, 0x1f00, 0x1f07, + 0x1f10, 0x1f15, 0x1f20, 0x1f27, 0x1f30, 0x1f37, 0x1f40, 0x1f45, 0x1f51, 0x1f53, 0x1f55, 0x1f57, 0x1f60, + 0x1f67, 0x1fb0, 0x1fb1, 0x1f70, 0x1f71, 0x03b9, 0x1f72, 0x1f75, 0x1fd0, 0x1fd1, 0x1f76, 0x1f77, 0x1fe0, + 0x1fe1, 0x1f7a, 0x1f7b, 0x1fe5, 0x1f78, 0x1f79, 0x1f7c, 0x1f7d, 0x03c9, 0x006b, 0x00e5, 0x214e, 0x2170, + 0x217f, 0x2184, 0x24d0, 0x24e9, 0x2c30, 0x2c5e, 0x2c61, 0x026b, 0x1d7d, 0x027d, 0x2c68, 0x2c6c, 0x0251, + 0x0271, 0x0250, 0x0252, 0x2c73, 0x2c76, 0x023f, 0x0240, 0x2c81, 0x2ce3, 0x2cec, 0x2cee, 0x2cf3, 0xa641, + 0xa66d, 0xa681, 0xa69b, 0xa723, 0xa72f, 0xa733, 0xa76f, 0xa77a, 0xa77c, 0x1d79, 0xa77f, 0xa787, 0xa78c, + 0x0265, 0xa791, 0xa793, 0xa797, 0xa7a9, 0x0266, 0x025c, 0x0261, 0x026c, 0x026a, 0x029e, 0x0287, 0x029d, + 0xab53, 0xa7b5, 0xa7bf, 0xa7c3, 0xa794, 0x0282, 0x1d8e, 0xa7c8, 0xa7ca, 0xa7f6, 0x13a0, 0x13ef, 0xff41, + 0xff5a, 0x10428, 0x1044f, 0x104d8, 0x104fb, 0x10cc0, 0x10cf2, 0x118c0, 0x118df, 0x16e60, 0x16e7f, 0x1e922, + 0x1e943 + }; + static const unsigned FOLD_MAP_2[] = { + S(0x00df), S(0x0130), S(0x0149), S(0x01f0), S(0x0587), S(0x1e96), S(0x1e97), S(0x1e98), S(0x1e99), + S(0x1e9a), S(0x1e9e), S(0x1f50), R(0x1f80,0x1f87), R(0x1f88,0x1f8f), R(0x1f90,0x1f97), R(0x1f98,0x1f9f), + R(0x1fa0,0x1fa7), R(0x1fa8,0x1faf), S(0x1fb2), S(0x1fb3), S(0x1fb4), S(0x1fb6), S(0x1fbc), S(0x1fc2), + S(0x1fc3), S(0x1fc4), S(0x1fc6), S(0x1fcc), S(0x1fd6), S(0x1fe4), S(0x1fe6), S(0x1ff2), S(0x1ff3), + S(0x1ff4), S(0x1ff6), S(0x1ffc), S(0xfb00), S(0xfb01), S(0xfb02), S(0xfb05), S(0xfb06), S(0xfb13), + S(0xfb14), S(0xfb15), S(0xfb16), S(0xfb17) + }; + static const unsigned FOLD_MAP_2_DATA[] = { + 0x0073,0x0073, 0x0069,0x0307, 0x02bc,0x006e, 0x006a,0x030c, 0x0565,0x0582, 0x0068,0x0331, 0x0074,0x0308, + 0x0077,0x030a, 0x0079,0x030a, 0x0061,0x02be, 0x0073,0x0073, 0x03c5,0x0313, 0x1f00,0x03b9, 0x1f07,0x03b9, + 0x1f00,0x03b9, 0x1f07,0x03b9, 0x1f20,0x03b9, 0x1f27,0x03b9, 0x1f20,0x03b9, 0x1f27,0x03b9, 0x1f60,0x03b9, + 0x1f67,0x03b9, 0x1f60,0x03b9, 0x1f67,0x03b9, 0x1f70,0x03b9, 0x03b1,0x03b9, 0x03ac,0x03b9, 0x03b1,0x0342, + 0x03b1,0x03b9, 0x1f74,0x03b9, 0x03b7,0x03b9, 0x03ae,0x03b9, 0x03b7,0x0342, 0x03b7,0x03b9, 0x03b9,0x0342, + 0x03c1,0x0313, 0x03c5,0x0342, 0x1f7c,0x03b9, 0x03c9,0x03b9, 0x03ce,0x03b9, 0x03c9,0x0342, 0x03c9,0x03b9, + 0x0066,0x0066, 0x0066,0x0069, 0x0066,0x006c, 0x0073,0x0074, 0x0073,0x0074, 0x0574,0x0576, 0x0574,0x0565, + 0x0574,0x056b, 0x057e,0x0576, 0x0574,0x056d + }; + static const unsigned FOLD_MAP_3[] = { + S(0x0390), S(0x03b0), S(0x1f52), S(0x1f54), S(0x1f56), S(0x1fb7), S(0x1fc7), S(0x1fd2), S(0x1fd3), + S(0x1fd7), S(0x1fe2), S(0x1fe3), S(0x1fe7), S(0x1ff7), S(0xfb03), S(0xfb04) + }; + static const unsigned FOLD_MAP_3_DATA[] = { + 0x03b9,0x0308,0x0301, 0x03c5,0x0308,0x0301, 0x03c5,0x0313,0x0300, 0x03c5,0x0313,0x0301, + 0x03c5,0x0313,0x0342, 0x03b1,0x0342,0x03b9, 0x03b7,0x0342,0x03b9, 0x03b9,0x0308,0x0300, + 0x03b9,0x0308,0x0301, 0x03b9,0x0308,0x0342, 0x03c5,0x0308,0x0300, 0x03c5,0x0308,0x0301, + 0x03c5,0x0308,0x0342, 0x03c9,0x0342,0x03b9, 0x0066,0x0066,0x0069, 0x0066,0x0066,0x006c + }; +#undef R +#undef S + static const struct { + const unsigned* map; + const unsigned* data; + size_t map_size; + unsigned n_codepoints; + } FOLD_MAP_LIST[] = { + { FOLD_MAP_1, FOLD_MAP_1_DATA, SIZEOF_ARRAY(FOLD_MAP_1), 1 }, + { FOLD_MAP_2, FOLD_MAP_2_DATA, SIZEOF_ARRAY(FOLD_MAP_2), 2 }, + { FOLD_MAP_3, FOLD_MAP_3_DATA, SIZEOF_ARRAY(FOLD_MAP_3), 3 } + }; + + int i; + + /* Fast path for ASCII characters. */ + if(codepoint <= 0x7f) { + info->codepoints[0] = codepoint; + if(ISUPPER_(codepoint)) + info->codepoints[0] += 'a' - 'A'; + info->n_codepoints = 1; + return; + } + + /* Try to locate the codepoint in any of the maps. */ + for(i = 0; i < (int) SIZEOF_ARRAY(FOLD_MAP_LIST); i++) { + int index; + + index = md_unicode_bsearch__(codepoint, FOLD_MAP_LIST[i].map, FOLD_MAP_LIST[i].map_size); + if(index >= 0) { + /* Found the mapping. */ + unsigned n_codepoints = FOLD_MAP_LIST[i].n_codepoints; + const unsigned* map = FOLD_MAP_LIST[i].map; + const unsigned* codepoints = FOLD_MAP_LIST[i].data + (index * n_codepoints); + + memcpy(info->codepoints, codepoints, sizeof(unsigned) * n_codepoints); + info->n_codepoints = n_codepoints; + + if(FOLD_MAP_LIST[i].map[index] != codepoint) { + /* The found mapping maps whole range of codepoints, + * i.e. we have to offset info->codepoints[0] accordingly. */ + if((map[index] & 0x00ffffff)+1 == codepoints[0]) { + /* Alternating type of the range. */ + info->codepoints[0] = codepoint + ((codepoint & 0x1) == (map[index] & 0x1) ? 1 : 0); + } else { + /* Range to range kind of mapping. */ + info->codepoints[0] += (codepoint - (map[index] & 0x00ffffff)); + } + } + + return; + } + } + + /* No mapping found. Map the codepoint to itself. */ + info->codepoints[0] = codepoint; + info->n_codepoints = 1; + } +#endif + + +#if defined MD4C_USE_UTF16 + #define IS_UTF16_SURROGATE_HI(word) (((WORD)(word) & 0xfc00) == 0xd800) + #define IS_UTF16_SURROGATE_LO(word) (((WORD)(word) & 0xfc00) == 0xdc00) + #define UTF16_DECODE_SURROGATE(hi, lo) (0x10000 + ((((unsigned)(hi) & 0x3ff) << 10) | (((unsigned)(lo) & 0x3ff) << 0))) + + static unsigned + md_decode_utf16le__(const CHAR* str, SZ str_size, SZ* p_size) + { + if(IS_UTF16_SURROGATE_HI(str[0])) { + if(1 < str_size && IS_UTF16_SURROGATE_LO(str[1])) { + if(p_size != NULL) + *p_size = 2; + return UTF16_DECODE_SURROGATE(str[0], str[1]); + } + } + + if(p_size != NULL) + *p_size = 1; + return str[0]; + } + + static unsigned + md_decode_utf16le_before__(MD_CTX* ctx, OFF off) + { + if(off > 2 && IS_UTF16_SURROGATE_HI(CH(off-2)) && IS_UTF16_SURROGATE_LO(CH(off-1))) + return UTF16_DECODE_SURROGATE(CH(off-2), CH(off-1)); + + return CH(off); + } + + /* No whitespace uses surrogates, so no decoding needed here. */ + #define ISUNICODEWHITESPACE_(codepoint) md_is_unicode_whitespace__(codepoint) + #define ISUNICODEWHITESPACE(off) md_is_unicode_whitespace__(CH(off)) + #define ISUNICODEWHITESPACEBEFORE(off) md_is_unicode_whitespace__(CH((off)-1)) + + #define ISUNICODEPUNCT(off) md_is_unicode_punct__(md_decode_utf16le__(STR(off), ctx->size - (off), NULL)) + #define ISUNICODEPUNCTBEFORE(off) md_is_unicode_punct__(md_decode_utf16le_before__(ctx, off)) + + static inline int + md_decode_unicode(const CHAR* str, OFF off, SZ str_size, SZ* p_char_size) + { + return md_decode_utf16le__(str+off, str_size-off, p_char_size); + } +#elif defined MD4C_USE_UTF8 + #define IS_UTF8_LEAD1(byte) ((unsigned char)(byte) <= 0x7f) + #define IS_UTF8_LEAD2(byte) (((unsigned char)(byte) & 0xe0) == 0xc0) + #define IS_UTF8_LEAD3(byte) (((unsigned char)(byte) & 0xf0) == 0xe0) + #define IS_UTF8_LEAD4(byte) (((unsigned char)(byte) & 0xf8) == 0xf0) + #define IS_UTF8_TAIL(byte) (((unsigned char)(byte) & 0xc0) == 0x80) + + static unsigned + md_decode_utf8__(const CHAR* str, SZ str_size, SZ* p_size) + { + if(!IS_UTF8_LEAD1(str[0])) { + if(IS_UTF8_LEAD2(str[0])) { + if(1 < str_size && IS_UTF8_TAIL(str[1])) { + if(p_size != NULL) + *p_size = 2; + + return (((unsigned int)str[0] & 0x1f) << 6) | + (((unsigned int)str[1] & 0x3f) << 0); + } + } else if(IS_UTF8_LEAD3(str[0])) { + if(2 < str_size && IS_UTF8_TAIL(str[1]) && IS_UTF8_TAIL(str[2])) { + if(p_size != NULL) + *p_size = 3; + + return (((unsigned int)str[0] & 0x0f) << 12) | + (((unsigned int)str[1] & 0x3f) << 6) | + (((unsigned int)str[2] & 0x3f) << 0); + } + } else if(IS_UTF8_LEAD4(str[0])) { + if(3 < str_size && IS_UTF8_TAIL(str[1]) && IS_UTF8_TAIL(str[2]) && IS_UTF8_TAIL(str[3])) { + if(p_size != NULL) + *p_size = 4; + + return (((unsigned int)str[0] & 0x07) << 18) | + (((unsigned int)str[1] & 0x3f) << 12) | + (((unsigned int)str[2] & 0x3f) << 6) | + (((unsigned int)str[3] & 0x3f) << 0); + } + } + } + + if(p_size != NULL) + *p_size = 1; + return (unsigned) str[0]; + } + + static unsigned + md_decode_utf8_before__(MD_CTX* ctx, OFF off) + { + if(!IS_UTF8_LEAD1(CH(off-1))) { + if(off > 1 && IS_UTF8_LEAD2(CH(off-2)) && IS_UTF8_TAIL(CH(off-1))) + return (((unsigned int)CH(off-2) & 0x1f) << 6) | + (((unsigned int)CH(off-1) & 0x3f) << 0); + + if(off > 2 && IS_UTF8_LEAD3(CH(off-3)) && IS_UTF8_TAIL(CH(off-2)) && IS_UTF8_TAIL(CH(off-1))) + return (((unsigned int)CH(off-3) & 0x0f) << 12) | + (((unsigned int)CH(off-2) & 0x3f) << 6) | + (((unsigned int)CH(off-1) & 0x3f) << 0); + + if(off > 3 && IS_UTF8_LEAD4(CH(off-4)) && IS_UTF8_TAIL(CH(off-3)) && IS_UTF8_TAIL(CH(off-2)) && IS_UTF8_TAIL(CH(off-1))) + return (((unsigned int)CH(off-4) & 0x07) << 18) | + (((unsigned int)CH(off-3) & 0x3f) << 12) | + (((unsigned int)CH(off-2) & 0x3f) << 6) | + (((unsigned int)CH(off-1) & 0x3f) << 0); + } + + return (unsigned) CH(off-1); + } + + #define ISUNICODEWHITESPACE_(codepoint) md_is_unicode_whitespace__(codepoint) + #define ISUNICODEWHITESPACE(off) md_is_unicode_whitespace__(md_decode_utf8__(STR(off), ctx->size - (off), NULL)) + #define ISUNICODEWHITESPACEBEFORE(off) md_is_unicode_whitespace__(md_decode_utf8_before__(ctx, off)) + + #define ISUNICODEPUNCT(off) md_is_unicode_punct__(md_decode_utf8__(STR(off), ctx->size - (off), NULL)) + #define ISUNICODEPUNCTBEFORE(off) md_is_unicode_punct__(md_decode_utf8_before__(ctx, off)) + + static inline unsigned + md_decode_unicode(const CHAR* str, OFF off, SZ str_size, SZ* p_char_size) + { + return md_decode_utf8__(str+off, str_size-off, p_char_size); + } +#else + #define ISUNICODEWHITESPACE_(codepoint) ISWHITESPACE_(codepoint) + #define ISUNICODEWHITESPACE(off) ISWHITESPACE(off) + #define ISUNICODEWHITESPACEBEFORE(off) ISWHITESPACE((off)-1) + + #define ISUNICODEPUNCT(off) ISPUNCT(off) + #define ISUNICODEPUNCTBEFORE(off) ISPUNCT((off)-1) + + static inline void + md_get_unicode_fold_info(unsigned codepoint, MD_UNICODE_FOLD_INFO* info) + { + info->codepoints[0] = codepoint; + if(ISUPPER_(codepoint)) + info->codepoints[0] += 'a' - 'A'; + info->n_codepoints = 1; + } + + static inline unsigned + md_decode_unicode(const CHAR* str, OFF off, SZ str_size, SZ* p_size) + { + *p_size = 1; + return (unsigned) str[off]; + } +#endif + + +/************************************* + *** Helper string manipulations *** + *************************************/ + +/* Fill buffer with copy of the string between 'beg' and 'end' but replace any + * line breaks with given replacement character. + * + * NOTE: Caller is responsible to make sure the buffer is large enough. + * (Given the output is always shorter then input, (end - beg) is good idea + * what the caller should allocate.) + */ +static void +md_merge_lines(MD_CTX* ctx, OFF beg, OFF end, const MD_LINE* lines, int n_lines, + CHAR line_break_replacement_char, CHAR* buffer, SZ* p_size) +{ + CHAR* ptr = buffer; + int line_index = 0; + OFF off = beg; + + MD_UNUSED(n_lines); + + while(1) { + const MD_LINE* line = &lines[line_index]; + OFF line_end = line->end; + if(end < line_end) + line_end = end; + + while(off < line_end) { + *ptr = CH(off); + ptr++; + off++; + } + + if(off >= end) { + *p_size = (MD_SIZE)(ptr - buffer); + return; + } + + *ptr = line_break_replacement_char; + ptr++; + + line_index++; + off = lines[line_index].beg; + } +} + +/* Wrapper of md_merge_lines() which allocates new buffer for the output string. + */ +static int +md_merge_lines_alloc(MD_CTX* ctx, OFF beg, OFF end, const MD_LINE* lines, int n_lines, + CHAR line_break_replacement_char, CHAR** p_str, SZ* p_size) +{ + CHAR* buffer; + + buffer = (CHAR*) malloc(sizeof(CHAR) * (end - beg)); + if(buffer == NULL) { + MD_LOG("malloc() failed."); + return -1; + } + + md_merge_lines(ctx, beg, end, lines, n_lines, + line_break_replacement_char, buffer, p_size); + + *p_str = buffer; + return 0; +} + +static OFF +md_skip_unicode_whitespace(const CHAR* label, OFF off, SZ size) +{ + SZ char_size; + unsigned codepoint; + + while(off < size) { + codepoint = md_decode_unicode(label, off, size, &char_size); + if(!ISUNICODEWHITESPACE_(codepoint) && !ISNEWLINE_(label[off])) + break; + off += char_size; + } + + return off; +} + + +/****************************** + *** Recognizing raw HTML *** + ******************************/ + +/* md_is_html_tag() may be called when processing inlines (inline raw HTML) + * or when breaking document to blocks (checking for start of HTML block type 7). + * + * When breaking document to blocks, we do not yet know line boundaries, but + * in that case the whole tag has to live on a single line. We distinguish this + * by n_lines == 0. + */ +static int +md_is_html_tag(MD_CTX* ctx, const MD_LINE* lines, int n_lines, OFF beg, OFF max_end, OFF* p_end) +{ + int attr_state; + OFF off = beg; + OFF line_end = (n_lines > 0) ? lines[0].end : ctx->size; + int i = 0; + + MD_ASSERT(CH(beg) == _T('<')); + + if(off + 1 >= line_end) + return FALSE; + off++; + + /* For parsing attributes, we need a little state automaton below. + * State -1: no attributes are allowed. + * State 0: attribute could follow after some whitespace. + * State 1: after a whitespace (attribute name may follow). + * State 2: after attribute name ('=' MAY follow). + * State 3: after '=' (value specification MUST follow). + * State 41: in middle of unquoted attribute value. + * State 42: in middle of single-quoted attribute value. + * State 43: in middle of double-quoted attribute value. + */ + attr_state = 0; + + if(CH(off) == _T('/')) { + /* Closer tag "</ ... >". No attributes may be present. */ + attr_state = -1; + off++; + } + + /* Tag name */ + if(off >= line_end || !ISALPHA(off)) + return FALSE; + off++; + while(off < line_end && (ISALNUM(off) || CH(off) == _T('-'))) + off++; + + /* (Optional) attributes (if not closer), (optional) '/' (if not closer) + * and final '>'. */ + while(1) { + while(off < line_end && !ISNEWLINE(off)) { + if(attr_state > 40) { + if(attr_state == 41 && (ISBLANK(off) || ISANYOF(off, _T("\"'=<>`")))) { + attr_state = 0; + off--; /* Put the char back for re-inspection in the new state. */ + } else if(attr_state == 42 && CH(off) == _T('\'')) { + attr_state = 0; + } else if(attr_state == 43 && CH(off) == _T('"')) { + attr_state = 0; + } + off++; + } else if(ISWHITESPACE(off)) { + if(attr_state == 0) + attr_state = 1; + off++; + } else if(attr_state <= 2 && CH(off) == _T('>')) { + /* End. */ + goto done; + } else if(attr_state <= 2 && CH(off) == _T('/') && off+1 < line_end && CH(off+1) == _T('>')) { + /* End with digraph '/>' */ + off++; + goto done; + } else if((attr_state == 1 || attr_state == 2) && (ISALPHA(off) || CH(off) == _T('_') || CH(off) == _T(':'))) { + off++; + /* Attribute name */ + while(off < line_end && (ISALNUM(off) || ISANYOF(off, _T("_.:-")))) + off++; + attr_state = 2; + } else if(attr_state == 2 && CH(off) == _T('=')) { + /* Attribute assignment sign */ + off++; + attr_state = 3; + } else if(attr_state == 3) { + /* Expecting start of attribute value. */ + if(CH(off) == _T('"')) + attr_state = 43; + else if(CH(off) == _T('\'')) + attr_state = 42; + else if(!ISANYOF(off, _T("\"'=<>`")) && !ISNEWLINE(off)) + attr_state = 41; + else + return FALSE; + off++; + } else { + /* Anything unexpected. */ + return FALSE; + } + } + + /* We have to be on a single line. See definition of start condition + * of HTML block, type 7. */ + if(n_lines == 0) + return FALSE; + + i++; + if(i >= n_lines) + return FALSE; + + off = lines[i].beg; + line_end = lines[i].end; + + if(attr_state == 0 || attr_state == 41) + attr_state = 1; + + if(off >= max_end) + return FALSE; + } + +done: + if(off >= max_end) + return FALSE; + + *p_end = off+1; + return TRUE; +} + +static int +md_scan_for_html_closer(MD_CTX* ctx, const MD_CHAR* str, MD_SIZE len, + const MD_LINE* lines, int n_lines, + OFF beg, OFF max_end, OFF* p_end, + OFF* p_scan_horizon) +{ + OFF off = beg; + int i = 0; + + if(off < *p_scan_horizon && *p_scan_horizon >= max_end - len) { + /* We have already scanned the range up to the max_end so we know + * there is nothing to see. */ + return FALSE; + } + + while(TRUE) { + while(off + len <= lines[i].end && off + len <= max_end) { + if(md_ascii_eq(STR(off), str, len)) { + /* Success. */ + *p_end = off + len; + return TRUE; + } + off++; + } + + i++; + if(off >= max_end || i >= n_lines) { + /* Failure. */ + *p_scan_horizon = off; + return FALSE; + } + + off = lines[i].beg; + } +} + +static int +md_is_html_comment(MD_CTX* ctx, const MD_LINE* lines, int n_lines, OFF beg, OFF max_end, OFF* p_end) +{ + OFF off = beg; + + MD_ASSERT(CH(beg) == _T('<')); + + if(off + 4 >= lines[0].end) + return FALSE; + if(CH(off+1) != _T('!') || CH(off+2) != _T('-') || CH(off+3) != _T('-')) + return FALSE; + off += 4; + + /* ">" and "->" must not follow the opening. */ + if(off < lines[0].end && CH(off) == _T('>')) + return FALSE; + if(off+1 < lines[0].end && CH(off) == _T('-') && CH(off+1) == _T('>')) + return FALSE; + + /* HTML comment must not contain "--", so we scan just for "--" instead + * of "-->" and verify manually that '>' follows. */ + if(md_scan_for_html_closer(ctx, _T("--"), 2, + lines, n_lines, off, max_end, p_end, &ctx->html_comment_horizon)) + { + if(*p_end < max_end && CH(*p_end) == _T('>')) { + *p_end = *p_end + 1; + return TRUE; + } + } + + return FALSE; +} + +static int +md_is_html_processing_instruction(MD_CTX* ctx, const MD_LINE* lines, int n_lines, OFF beg, OFF max_end, OFF* p_end) +{ + OFF off = beg; + + if(off + 2 >= lines[0].end) + return FALSE; + if(CH(off+1) != _T('?')) + return FALSE; + off += 2; + + return md_scan_for_html_closer(ctx, _T("?>"), 2, + lines, n_lines, off, max_end, p_end, &ctx->html_proc_instr_horizon); +} + +static int +md_is_html_declaration(MD_CTX* ctx, const MD_LINE* lines, int n_lines, OFF beg, OFF max_end, OFF* p_end) +{ + OFF off = beg; + + if(off + 2 >= lines[0].end) + return FALSE; + if(CH(off+1) != _T('!')) + return FALSE; + off += 2; + + /* Declaration name. */ + if(off >= lines[0].end || !ISALPHA(off)) + return FALSE; + off++; + while(off < lines[0].end && ISALPHA(off)) + off++; + if(off < lines[0].end && !ISWHITESPACE(off)) + return FALSE; + + return md_scan_for_html_closer(ctx, _T(">"), 1, + lines, n_lines, off, max_end, p_end, &ctx->html_decl_horizon); +} + +static int +md_is_html_cdata(MD_CTX* ctx, const MD_LINE* lines, int n_lines, OFF beg, OFF max_end, OFF* p_end) +{ + static const CHAR open_str[] = _T("<![CDATA["); + static const SZ open_size = SIZEOF_ARRAY(open_str) - 1; + + OFF off = beg; + + if(off + open_size >= lines[0].end) + return FALSE; + if(memcmp(STR(off), open_str, open_size) != 0) + return FALSE; + off += open_size; + + if(lines[n_lines-1].end < max_end) + max_end = lines[n_lines-1].end - 2; + + return md_scan_for_html_closer(ctx, _T("]]>"), 3, + lines, n_lines, off, max_end, p_end, &ctx->html_cdata_horizon); +} + +static int +md_is_html_any(MD_CTX* ctx, const MD_LINE* lines, int n_lines, OFF beg, OFF max_end, OFF* p_end) +{ + MD_ASSERT(CH(beg) == _T('<')); + return (md_is_html_tag(ctx, lines, n_lines, beg, max_end, p_end) || + md_is_html_comment(ctx, lines, n_lines, beg, max_end, p_end) || + md_is_html_processing_instruction(ctx, lines, n_lines, beg, max_end, p_end) || + md_is_html_declaration(ctx, lines, n_lines, beg, max_end, p_end) || + md_is_html_cdata(ctx, lines, n_lines, beg, max_end, p_end)); +} + + +/**************************** + *** Recognizing Entity *** + ****************************/ + +static int +md_is_hex_entity_contents(MD_CTX* ctx, const CHAR* text, OFF beg, OFF max_end, OFF* p_end) +{ + OFF off = beg; + MD_UNUSED(ctx); + + while(off < max_end && ISXDIGIT_(text[off]) && off - beg <= 8) + off++; + + if(1 <= off - beg && off - beg <= 6) { + *p_end = off; + return TRUE; + } else { + return FALSE; + } +} + +static int +md_is_dec_entity_contents(MD_CTX* ctx, const CHAR* text, OFF beg, OFF max_end, OFF* p_end) +{ + OFF off = beg; + MD_UNUSED(ctx); + + while(off < max_end && ISDIGIT_(text[off]) && off - beg <= 8) + off++; + + if(1 <= off - beg && off - beg <= 7) { + *p_end = off; + return TRUE; + } else { + return FALSE; + } +} + +static int +md_is_named_entity_contents(MD_CTX* ctx, const CHAR* text, OFF beg, OFF max_end, OFF* p_end) +{ + OFF off = beg; + MD_UNUSED(ctx); + + if(off < max_end && ISALPHA_(text[off])) + off++; + else + return FALSE; + + while(off < max_end && ISALNUM_(text[off]) && off - beg <= 48) + off++; + + if(2 <= off - beg && off - beg <= 48) { + *p_end = off; + return TRUE; + } else { + return FALSE; + } +} + +static int +md_is_entity_str(MD_CTX* ctx, const CHAR* text, OFF beg, OFF max_end, OFF* p_end) +{ + int is_contents; + OFF off = beg; + + MD_ASSERT(text[off] == _T('&')); + off++; + + if(off+2 < max_end && text[off] == _T('#') && (text[off+1] == _T('x') || text[off+1] == _T('X'))) + is_contents = md_is_hex_entity_contents(ctx, text, off+2, max_end, &off); + else if(off+1 < max_end && text[off] == _T('#')) + is_contents = md_is_dec_entity_contents(ctx, text, off+1, max_end, &off); + else + is_contents = md_is_named_entity_contents(ctx, text, off, max_end, &off); + + if(is_contents && off < max_end && text[off] == _T(';')) { + *p_end = off+1; + return TRUE; + } else { + return FALSE; + } +} + +static inline int +md_is_entity(MD_CTX* ctx, OFF beg, OFF max_end, OFF* p_end) +{ + return md_is_entity_str(ctx, ctx->text, beg, max_end, p_end); +} + + +/****************************** + *** Attribute Management *** + ******************************/ + +typedef struct MD_ATTRIBUTE_BUILD_tag MD_ATTRIBUTE_BUILD; +struct MD_ATTRIBUTE_BUILD_tag { + CHAR* text; + MD_TEXTTYPE* substr_types; + OFF* substr_offsets; + int substr_count; + int substr_alloc; + MD_TEXTTYPE trivial_types[1]; + OFF trivial_offsets[2]; +}; + + +#define MD_BUILD_ATTR_NO_ESCAPES 0x0001 + +static int +md_build_attr_append_substr(MD_CTX* ctx, MD_ATTRIBUTE_BUILD* build, + MD_TEXTTYPE type, OFF off) +{ + if(build->substr_count >= build->substr_alloc) { + MD_TEXTTYPE* new_substr_types; + OFF* new_substr_offsets; + + build->substr_alloc = (build->substr_alloc > 0 + ? build->substr_alloc + build->substr_alloc / 2 + : 8); + new_substr_types = (MD_TEXTTYPE*) realloc(build->substr_types, + build->substr_alloc * sizeof(MD_TEXTTYPE)); + if(new_substr_types == NULL) { + MD_LOG("realloc() failed."); + return -1; + } + /* Note +1 to reserve space for final offset (== raw_size). */ + new_substr_offsets = (OFF*) realloc(build->substr_offsets, + (build->substr_alloc+1) * sizeof(OFF)); + if(new_substr_offsets == NULL) { + MD_LOG("realloc() failed."); + free(new_substr_types); + return -1; + } + + build->substr_types = new_substr_types; + build->substr_offsets = new_substr_offsets; + } + + build->substr_types[build->substr_count] = type; + build->substr_offsets[build->substr_count] = off; + build->substr_count++; + return 0; +} + +static void +md_free_attribute(MD_CTX* ctx, MD_ATTRIBUTE_BUILD* build) +{ + MD_UNUSED(ctx); + + if(build->substr_alloc > 0) { + free(build->text); + free(build->substr_types); + free(build->substr_offsets); + } +} + +static int +md_build_attribute(MD_CTX* ctx, const CHAR* raw_text, SZ raw_size, + unsigned flags, MD_ATTRIBUTE* attr, MD_ATTRIBUTE_BUILD* build) +{ + OFF raw_off, off; + int is_trivial; + int ret = 0; + + memset(build, 0, sizeof(MD_ATTRIBUTE_BUILD)); + + /* If there is no backslash and no ampersand, build trivial attribute + * without any malloc(). */ + is_trivial = TRUE; + for(raw_off = 0; raw_off < raw_size; raw_off++) { + if(ISANYOF3_(raw_text[raw_off], _T('\\'), _T('&'), _T('\0'))) { + is_trivial = FALSE; + break; + } + } + + if(is_trivial) { + build->text = (CHAR*) (raw_size ? raw_text : NULL); + build->substr_types = build->trivial_types; + build->substr_offsets = build->trivial_offsets; + build->substr_count = 1; + build->substr_alloc = 0; + build->trivial_types[0] = MD_TEXT_NORMAL; + build->trivial_offsets[0] = 0; + build->trivial_offsets[1] = raw_size; + off = raw_size; + } else { + build->text = (CHAR*) malloc(raw_size * sizeof(CHAR)); + if(build->text == NULL) { + MD_LOG("malloc() failed."); + goto abort; + } + + raw_off = 0; + off = 0; + + while(raw_off < raw_size) { + if(raw_text[raw_off] == _T('\0')) { + MD_CHECK(md_build_attr_append_substr(ctx, build, MD_TEXT_NULLCHAR, off)); + memcpy(build->text + off, raw_text + raw_off, 1); + off++; + raw_off++; + continue; + } + + if(raw_text[raw_off] == _T('&')) { + OFF ent_end; + + if(md_is_entity_str(ctx, raw_text, raw_off, raw_size, &ent_end)) { + MD_CHECK(md_build_attr_append_substr(ctx, build, MD_TEXT_ENTITY, off)); + memcpy(build->text + off, raw_text + raw_off, ent_end - raw_off); + off += ent_end - raw_off; + raw_off = ent_end; + continue; + } + } + + if(build->substr_count == 0 || build->substr_types[build->substr_count-1] != MD_TEXT_NORMAL) + MD_CHECK(md_build_attr_append_substr(ctx, build, MD_TEXT_NORMAL, off)); + + if(!(flags & MD_BUILD_ATTR_NO_ESCAPES) && + raw_text[raw_off] == _T('\\') && raw_off+1 < raw_size && + (ISPUNCT_(raw_text[raw_off+1]) || ISNEWLINE_(raw_text[raw_off+1]))) + raw_off++; + + build->text[off++] = raw_text[raw_off++]; + } + build->substr_offsets[build->substr_count] = off; + } + + attr->text = build->text; + attr->size = off; + attr->substr_offsets = build->substr_offsets; + attr->substr_types = build->substr_types; + return 0; + +abort: + md_free_attribute(ctx, build); + return -1; +} + + +/********************************************* + *** Dictionary of Reference Definitions *** + *********************************************/ + +#define MD_FNV1A_BASE 2166136261U +#define MD_FNV1A_PRIME 16777619U + +static inline unsigned +md_fnv1a(unsigned base, const void* data, size_t n) +{ + const unsigned char* buf = (const unsigned char*) data; + unsigned hash = base; + size_t i; + + for(i = 0; i < n; i++) { + hash ^= buf[i]; + hash *= MD_FNV1A_PRIME; + } + + return hash; +} + + +struct MD_REF_DEF_tag { + CHAR* label; + CHAR* title; + unsigned hash; + SZ label_size; + SZ title_size; + OFF dest_beg; + OFF dest_end; + unsigned char label_needs_free : 1; + unsigned char title_needs_free : 1; +}; + +/* Label equivalence is quite complicated with regards to whitespace and case + * folding. This complicates computing a hash of it as well as direct comparison + * of two labels. */ + +static unsigned +md_link_label_hash(const CHAR* label, SZ size) +{ + unsigned hash = MD_FNV1A_BASE; + OFF off; + unsigned codepoint; + int is_whitespace = FALSE; + + off = md_skip_unicode_whitespace(label, 0, size); + while(off < size) { + SZ char_size; + + codepoint = md_decode_unicode(label, off, size, &char_size); + is_whitespace = ISUNICODEWHITESPACE_(codepoint) || ISNEWLINE_(label[off]); + + if(is_whitespace) { + codepoint = ' '; + hash = md_fnv1a(hash, &codepoint, sizeof(unsigned)); + off = md_skip_unicode_whitespace(label, off, size); + } else { + MD_UNICODE_FOLD_INFO fold_info; + + md_get_unicode_fold_info(codepoint, &fold_info); + hash = md_fnv1a(hash, fold_info.codepoints, fold_info.n_codepoints * sizeof(unsigned)); + off += char_size; + } + } + + return hash; +} + +static OFF +md_link_label_cmp_load_fold_info(const CHAR* label, OFF off, SZ size, + MD_UNICODE_FOLD_INFO* fold_info) +{ + unsigned codepoint; + SZ char_size; + + if(off >= size) { + /* Treat end of a link label as a whitespace. */ + goto whitespace; + } + + codepoint = md_decode_unicode(label, off, size, &char_size); + off += char_size; + if(ISUNICODEWHITESPACE_(codepoint)) { + /* Treat all whitespace as equivalent */ + goto whitespace; + } + + /* Get real folding info. */ + md_get_unicode_fold_info(codepoint, fold_info); + return off; + +whitespace: + fold_info->codepoints[0] = _T(' '); + fold_info->n_codepoints = 1; + return md_skip_unicode_whitespace(label, off, size); +} + +static int +md_link_label_cmp(const CHAR* a_label, SZ a_size, const CHAR* b_label, SZ b_size) +{ + OFF a_off; + OFF b_off; + MD_UNICODE_FOLD_INFO a_fi = { { 0 }, 0 }; + MD_UNICODE_FOLD_INFO b_fi = { { 0 }, 0 }; + OFF a_fi_off = 0; + OFF b_fi_off = 0; + int cmp; + + a_off = md_skip_unicode_whitespace(a_label, 0, a_size); + b_off = md_skip_unicode_whitespace(b_label, 0, b_size); + while(a_off < a_size || a_fi_off < a_fi.n_codepoints || + b_off < b_size || b_fi_off < b_fi.n_codepoints) + { + /* If needed, load fold info for next char. */ + if(a_fi_off >= a_fi.n_codepoints) { + a_fi_off = 0; + a_off = md_link_label_cmp_load_fold_info(a_label, a_off, a_size, &a_fi); + } + if(b_fi_off >= b_fi.n_codepoints) { + b_fi_off = 0; + b_off = md_link_label_cmp_load_fold_info(b_label, b_off, b_size, &b_fi); + } + + cmp = b_fi.codepoints[b_fi_off] - a_fi.codepoints[a_fi_off]; + if(cmp != 0) + return cmp; + + a_fi_off++; + b_fi_off++; + } + + return 0; +} + +typedef struct MD_REF_DEF_LIST_tag MD_REF_DEF_LIST; +struct MD_REF_DEF_LIST_tag { + int n_ref_defs; + int alloc_ref_defs; + MD_REF_DEF* ref_defs[]; /* Valid items always point into ctx->ref_defs[] */ +}; + +static int +md_ref_def_cmp(const void* a, const void* b) +{ + const MD_REF_DEF* a_ref = *(const MD_REF_DEF**)a; + const MD_REF_DEF* b_ref = *(const MD_REF_DEF**)b; + + if(a_ref->hash < b_ref->hash) + return -1; + else if(a_ref->hash > b_ref->hash) + return +1; + else + return md_link_label_cmp(a_ref->label, a_ref->label_size, b_ref->label, b_ref->label_size); +} + +static int +md_ref_def_cmp_for_sort(const void* a, const void* b) +{ + int cmp; + + cmp = md_ref_def_cmp(a, b); + + /* Ensure stability of the sorting. */ + if(cmp == 0) { + const MD_REF_DEF* a_ref = *(const MD_REF_DEF**)a; + const MD_REF_DEF* b_ref = *(const MD_REF_DEF**)b; + + if(a_ref < b_ref) + cmp = -1; + else if(a_ref > b_ref) + cmp = +1; + else + cmp = 0; + } + + return cmp; +} + +static int +md_build_ref_def_hashtable(MD_CTX* ctx) +{ + int i, j; + + if(ctx->n_ref_defs == 0) + return 0; + + ctx->ref_def_hashtable_size = (ctx->n_ref_defs * 5) / 4; + ctx->ref_def_hashtable = malloc(ctx->ref_def_hashtable_size * sizeof(void*)); + if(ctx->ref_def_hashtable == NULL) { + MD_LOG("malloc() failed."); + goto abort; + } + memset(ctx->ref_def_hashtable, 0, ctx->ref_def_hashtable_size * sizeof(void*)); + + /* Each member of ctx->ref_def_hashtable[] can be: + * -- NULL, + * -- pointer to the MD_REF_DEF in ctx->ref_defs[], or + * -- pointer to a MD_REF_DEF_LIST, which holds multiple pointers to + * such MD_REF_DEFs. + */ + for(i = 0; i < ctx->n_ref_defs; i++) { + MD_REF_DEF* def = &ctx->ref_defs[i]; + void* bucket; + MD_REF_DEF_LIST* list; + + def->hash = md_link_label_hash(def->label, def->label_size); + bucket = ctx->ref_def_hashtable[def->hash % ctx->ref_def_hashtable_size]; + + if(bucket == NULL) { + /* The bucket is empty. Make it just point to the def. */ + ctx->ref_def_hashtable[def->hash % ctx->ref_def_hashtable_size] = def; + continue; + } + + if(ctx->ref_defs <= (MD_REF_DEF*) bucket && (MD_REF_DEF*) bucket < ctx->ref_defs + ctx->n_ref_defs) { + /* The bucket already contains one ref. def. Lets see whether it + * is the same label (ref. def. duplicate) or different one + * (hash conflict). */ + MD_REF_DEF* old_def = (MD_REF_DEF*) bucket; + + if(md_link_label_cmp(def->label, def->label_size, old_def->label, old_def->label_size) == 0) { + /* Duplicate label: Ignore this ref. def. */ + continue; + } + + /* Make the bucket complex, i.e. able to hold more ref. defs. */ + list = (MD_REF_DEF_LIST*) malloc(sizeof(MD_REF_DEF_LIST) + 2 * sizeof(MD_REF_DEF*)); + if(list == NULL) { + MD_LOG("malloc() failed."); + goto abort; + } + list->ref_defs[0] = old_def; + list->ref_defs[1] = def; + list->n_ref_defs = 2; + list->alloc_ref_defs = 2; + ctx->ref_def_hashtable[def->hash % ctx->ref_def_hashtable_size] = list; + continue; + } + + /* Append the def to the complex bucket list. + * + * Note in this case we ignore potential duplicates to avoid expensive + * iterating over the complex bucket. Below, we revisit all the complex + * buckets and handle it more cheaply after the complex bucket contents + * is sorted. */ + list = (MD_REF_DEF_LIST*) bucket; + if(list->n_ref_defs >= list->alloc_ref_defs) { + int alloc_ref_defs = list->alloc_ref_defs + list->alloc_ref_defs / 2; + MD_REF_DEF_LIST* list_tmp = (MD_REF_DEF_LIST*) realloc(list, + sizeof(MD_REF_DEF_LIST) + alloc_ref_defs * sizeof(MD_REF_DEF*)); + if(list_tmp == NULL) { + MD_LOG("realloc() failed."); + goto abort; + } + list = list_tmp; + list->alloc_ref_defs = alloc_ref_defs; + ctx->ref_def_hashtable[def->hash % ctx->ref_def_hashtable_size] = list; + } + + list->ref_defs[list->n_ref_defs] = def; + list->n_ref_defs++; + } + + /* Sort the complex buckets so we can use bsearch() with them. */ + for(i = 0; i < ctx->ref_def_hashtable_size; i++) { + void* bucket = ctx->ref_def_hashtable[i]; + MD_REF_DEF_LIST* list; + + if(bucket == NULL) + continue; + if(ctx->ref_defs <= (MD_REF_DEF*) bucket && (MD_REF_DEF*) bucket < ctx->ref_defs + ctx->n_ref_defs) + continue; + + list = (MD_REF_DEF_LIST*) bucket; + qsort(list->ref_defs, list->n_ref_defs, sizeof(MD_REF_DEF*), md_ref_def_cmp_for_sort); + + /* Disable all duplicates in the complex bucket by forcing all such + * records to point to the 1st such ref. def. I.e. no matter which + * record is found during the lookup, it will always point to the right + * ref. def. in ctx->ref_defs[]. */ + for(j = 1; j < list->n_ref_defs; j++) { + if(md_ref_def_cmp(&list->ref_defs[j-1], &list->ref_defs[j]) == 0) + list->ref_defs[j] = list->ref_defs[j-1]; + } + } + + return 0; + +abort: + return -1; +} + +static void +md_free_ref_def_hashtable(MD_CTX* ctx) +{ + if(ctx->ref_def_hashtable != NULL) { + int i; + + for(i = 0; i < ctx->ref_def_hashtable_size; i++) { + void* bucket = ctx->ref_def_hashtable[i]; + if(bucket == NULL) + continue; + if(ctx->ref_defs <= (MD_REF_DEF*) bucket && (MD_REF_DEF*) bucket < ctx->ref_defs + ctx->n_ref_defs) + continue; + free(bucket); + } + + free(ctx->ref_def_hashtable); + } +} + +static const MD_REF_DEF* +md_lookup_ref_def(MD_CTX* ctx, const CHAR* label, SZ label_size) +{ + unsigned hash; + void* bucket; + + if(ctx->ref_def_hashtable_size == 0) + return NULL; + + hash = md_link_label_hash(label, label_size); + bucket = ctx->ref_def_hashtable[hash % ctx->ref_def_hashtable_size]; + + if(bucket == NULL) { + return NULL; + } else if(ctx->ref_defs <= (MD_REF_DEF*) bucket && (MD_REF_DEF*) bucket < ctx->ref_defs + ctx->n_ref_defs) { + const MD_REF_DEF* def = (MD_REF_DEF*) bucket; + + if(md_link_label_cmp(def->label, def->label_size, label, label_size) == 0) + return def; + else + return NULL; + } else { + MD_REF_DEF_LIST* list = (MD_REF_DEF_LIST*) bucket; + MD_REF_DEF key_buf; + const MD_REF_DEF* key = &key_buf; + const MD_REF_DEF** ret; + + key_buf.label = (CHAR*) label; + key_buf.label_size = label_size; + key_buf.hash = md_link_label_hash(key_buf.label, key_buf.label_size); + + ret = (const MD_REF_DEF**) bsearch(&key, list->ref_defs, + list->n_ref_defs, sizeof(MD_REF_DEF*), md_ref_def_cmp); + if(ret != NULL) + return *ret; + else + return NULL; + } +} + + +/*************************** + *** Recognizing Links *** + ***************************/ + +/* Note this code is partially shared between processing inlines and blocks + * as reference definitions and links share some helper parser functions. + */ + +typedef struct MD_LINK_ATTR_tag MD_LINK_ATTR; +struct MD_LINK_ATTR_tag { + OFF dest_beg; + OFF dest_end; + + CHAR* title; + SZ title_size; + int title_needs_free; +}; + + +static int +md_is_link_label(MD_CTX* ctx, const MD_LINE* lines, int n_lines, OFF beg, + OFF* p_end, int* p_beg_line_index, int* p_end_line_index, + OFF* p_contents_beg, OFF* p_contents_end) +{ + OFF off = beg; + OFF contents_beg = 0; + OFF contents_end = 0; + int line_index = 0; + int len = 0; + + if(CH(off) != _T('[')) + return FALSE; + off++; + + while(1) { + OFF line_end = lines[line_index].end; + + while(off < line_end) { + if(CH(off) == _T('\\') && off+1 < ctx->size && (ISPUNCT(off+1) || ISNEWLINE(off+1))) { + if(contents_end == 0) { + contents_beg = off; + *p_beg_line_index = line_index; + } + contents_end = off + 2; + off += 2; + } else if(CH(off) == _T('[')) { + return FALSE; + } else if(CH(off) == _T(']')) { + if(contents_beg < contents_end) { + /* Success. */ + *p_contents_beg = contents_beg; + *p_contents_end = contents_end; + *p_end = off+1; + *p_end_line_index = line_index; + return TRUE; + } else { + /* Link label must have some non-whitespace contents. */ + return FALSE; + } + } else { + unsigned codepoint; + SZ char_size; + + codepoint = md_decode_unicode(ctx->text, off, ctx->size, &char_size); + if(!ISUNICODEWHITESPACE_(codepoint)) { + if(contents_end == 0) { + contents_beg = off; + *p_beg_line_index = line_index; + } + contents_end = off + char_size; + } + + off += char_size; + } + + len++; + if(len > 999) + return FALSE; + } + + line_index++; + len++; + if(line_index < n_lines) + off = lines[line_index].beg; + else + break; + } + + return FALSE; +} + +static int +md_is_link_destination_A(MD_CTX* ctx, OFF beg, OFF max_end, OFF* p_end, + OFF* p_contents_beg, OFF* p_contents_end) +{ + OFF off = beg; + + if(off >= max_end || CH(off) != _T('<')) + return FALSE; + off++; + + while(off < max_end) { + if(CH(off) == _T('\\') && off+1 < max_end && ISPUNCT(off+1)) { + off += 2; + continue; + } + + if(ISNEWLINE(off) || CH(off) == _T('<')) + return FALSE; + + if(CH(off) == _T('>')) { + /* Success. */ + *p_contents_beg = beg+1; + *p_contents_end = off; + *p_end = off+1; + return TRUE; + } + + off++; + } + + return FALSE; +} + +static int +md_is_link_destination_B(MD_CTX* ctx, OFF beg, OFF max_end, OFF* p_end, + OFF* p_contents_beg, OFF* p_contents_end) +{ + OFF off = beg; + int parenthesis_level = 0; + + while(off < max_end) { + if(CH(off) == _T('\\') && off+1 < max_end && ISPUNCT(off+1)) { + off += 2; + continue; + } + + if(ISWHITESPACE(off) || ISCNTRL(off)) + break; + + /* Link destination may include balanced pairs of unescaped '(' ')'. + * Note we limit the maximal nesting level by 32 to protect us from + * https://github.com/jgm/cmark/issues/214 */ + if(CH(off) == _T('(')) { + parenthesis_level++; + if(parenthesis_level > 32) + return FALSE; + } else if(CH(off) == _T(')')) { + if(parenthesis_level == 0) + break; + parenthesis_level--; + } + + off++; + } + + if(parenthesis_level != 0 || off == beg) + return FALSE; + + /* Success. */ + *p_contents_beg = beg; + *p_contents_end = off; + *p_end = off; + return TRUE; +} + +static inline int +md_is_link_destination(MD_CTX* ctx, OFF beg, OFF max_end, OFF* p_end, + OFF* p_contents_beg, OFF* p_contents_end) +{ + if(CH(beg) == _T('<')) + return md_is_link_destination_A(ctx, beg, max_end, p_end, p_contents_beg, p_contents_end); + else + return md_is_link_destination_B(ctx, beg, max_end, p_end, p_contents_beg, p_contents_end); +} + +static int +md_is_link_title(MD_CTX* ctx, const MD_LINE* lines, int n_lines, OFF beg, + OFF* p_end, int* p_beg_line_index, int* p_end_line_index, + OFF* p_contents_beg, OFF* p_contents_end) +{ + OFF off = beg; + CHAR closer_char; + int line_index = 0; + + /* White space with up to one line break. */ + while(off < lines[line_index].end && ISWHITESPACE(off)) + off++; + if(off >= lines[line_index].end) { + line_index++; + if(line_index >= n_lines) + return FALSE; + off = lines[line_index].beg; + } + if(off == beg) + return FALSE; + + *p_beg_line_index = line_index; + + /* First char determines how to detect end of it. */ + switch(CH(off)) { + case _T('"'): closer_char = _T('"'); break; + case _T('\''): closer_char = _T('\''); break; + case _T('('): closer_char = _T(')'); break; + default: return FALSE; + } + off++; + + *p_contents_beg = off; + + while(line_index < n_lines) { + OFF line_end = lines[line_index].end; + + while(off < line_end) { + if(CH(off) == _T('\\') && off+1 < ctx->size && (ISPUNCT(off+1) || ISNEWLINE(off+1))) { + off++; + } else if(CH(off) == closer_char) { + /* Success. */ + *p_contents_end = off; + *p_end = off+1; + *p_end_line_index = line_index; + return TRUE; + } else if(closer_char == _T(')') && CH(off) == _T('(')) { + /* ()-style title cannot contain (unescaped '(')) */ + return FALSE; + } + + off++; + } + + line_index++; + } + + return FALSE; +} + +/* Returns 0 if it is not a reference definition. + * + * Returns N > 0 if it is a reference definition. N then corresponds to the + * number of lines forming it). In this case the definition is stored for + * resolving any links referring to it. + * + * Returns -1 in case of an error (out of memory). + */ +static int +md_is_link_reference_definition(MD_CTX* ctx, const MD_LINE* lines, int n_lines) +{ + OFF label_contents_beg; + OFF label_contents_end; + int label_contents_line_index = -1; + int label_is_multiline = FALSE; + OFF dest_contents_beg; + OFF dest_contents_end; + OFF title_contents_beg; + OFF title_contents_end; + int title_contents_line_index; + int title_is_multiline = FALSE; + OFF off; + int line_index = 0; + int tmp_line_index; + MD_REF_DEF* def = NULL; + int ret = 0; + + /* Link label. */ + if(!md_is_link_label(ctx, lines, n_lines, lines[0].beg, + &off, &label_contents_line_index, &line_index, + &label_contents_beg, &label_contents_end)) + return FALSE; + label_is_multiline = (label_contents_line_index != line_index); + + /* Colon. */ + if(off >= lines[line_index].end || CH(off) != _T(':')) + return FALSE; + off++; + + /* Optional white space with up to one line break. */ + while(off < lines[line_index].end && ISWHITESPACE(off)) + off++; + if(off >= lines[line_index].end) { + line_index++; + if(line_index >= n_lines) + return FALSE; + off = lines[line_index].beg; + } + + /* Link destination. */ + if(!md_is_link_destination(ctx, off, lines[line_index].end, + &off, &dest_contents_beg, &dest_contents_end)) + return FALSE; + + /* (Optional) title. Note we interpret it as an title only if nothing + * more follows on its last line. */ + if(md_is_link_title(ctx, lines + line_index, n_lines - line_index, off, + &off, &title_contents_line_index, &tmp_line_index, + &title_contents_beg, &title_contents_end) + && off >= lines[line_index + tmp_line_index].end) + { + title_is_multiline = (tmp_line_index != title_contents_line_index); + title_contents_line_index += line_index; + line_index += tmp_line_index; + } else { + /* Not a title. */ + title_is_multiline = FALSE; + title_contents_beg = off; + title_contents_end = off; + title_contents_line_index = 0; + } + + /* Nothing more can follow on the last line. */ + if(off < lines[line_index].end) + return FALSE; + + /* So, it _is_ a reference definition. Remember it. */ + if(ctx->n_ref_defs >= ctx->alloc_ref_defs) { + MD_REF_DEF* new_defs; + + ctx->alloc_ref_defs = (ctx->alloc_ref_defs > 0 + ? ctx->alloc_ref_defs + ctx->alloc_ref_defs / 2 + : 16); + new_defs = (MD_REF_DEF*) realloc(ctx->ref_defs, ctx->alloc_ref_defs * sizeof(MD_REF_DEF)); + if(new_defs == NULL) { + MD_LOG("realloc() failed."); + goto abort; + } + + ctx->ref_defs = new_defs; + } + def = &ctx->ref_defs[ctx->n_ref_defs]; + memset(def, 0, sizeof(MD_REF_DEF)); + + if(label_is_multiline) { + MD_CHECK(md_merge_lines_alloc(ctx, label_contents_beg, label_contents_end, + lines + label_contents_line_index, n_lines - label_contents_line_index, + _T(' '), &def->label, &def->label_size)); + def->label_needs_free = TRUE; + } else { + def->label = (CHAR*) STR(label_contents_beg); + def->label_size = label_contents_end - label_contents_beg; + } + + if(title_is_multiline) { + MD_CHECK(md_merge_lines_alloc(ctx, title_contents_beg, title_contents_end, + lines + title_contents_line_index, n_lines - title_contents_line_index, + _T('\n'), &def->title, &def->title_size)); + def->title_needs_free = TRUE; + } else { + def->title = (CHAR*) STR(title_contents_beg); + def->title_size = title_contents_end - title_contents_beg; + } + + def->dest_beg = dest_contents_beg; + def->dest_end = dest_contents_end; + + /* Success. */ + ctx->n_ref_defs++; + return line_index + 1; + +abort: + /* Failure. */ + if(def != NULL && def->label_needs_free) + free(def->label); + if(def != NULL && def->title_needs_free) + free(def->title); + return ret; +} + +static int +md_is_link_reference(MD_CTX* ctx, const MD_LINE* lines, int n_lines, + OFF beg, OFF end, MD_LINK_ATTR* attr) +{ + const MD_REF_DEF* def; + const MD_LINE* beg_line; + int is_multiline; + CHAR* label; + SZ label_size; + int ret; + + MD_ASSERT(CH(beg) == _T('[') || CH(beg) == _T('!')); + MD_ASSERT(CH(end-1) == _T(']')); + + beg += (CH(beg) == _T('!') ? 2 : 1); + end--; + + /* Find lines corresponding to the beg and end positions. */ + beg_line = md_lookup_line(beg, lines, n_lines); + is_multiline = (end > beg_line->end); + + if(is_multiline) { + MD_CHECK(md_merge_lines_alloc(ctx, beg, end, beg_line, + (int)(n_lines - (beg_line - lines)), _T(' '), &label, &label_size)); + } else { + label = (CHAR*) STR(beg); + label_size = end - beg; + } + + def = md_lookup_ref_def(ctx, label, label_size); + if(def != NULL) { + attr->dest_beg = def->dest_beg; + attr->dest_end = def->dest_end; + attr->title = def->title; + attr->title_size = def->title_size; + attr->title_needs_free = FALSE; + } + + if(is_multiline) + free(label); + + ret = (def != NULL); + +abort: + return ret; +} + +static int +md_is_inline_link_spec(MD_CTX* ctx, const MD_LINE* lines, int n_lines, + OFF beg, OFF* p_end, MD_LINK_ATTR* attr) +{ + int line_index = 0; + int tmp_line_index; + OFF title_contents_beg; + OFF title_contents_end; + int title_contents_line_index; + int title_is_multiline; + OFF off = beg; + int ret = FALSE; + + while(off >= lines[line_index].end) + line_index++; + + MD_ASSERT(CH(off) == _T('(')); + off++; + + /* Optional white space with up to one line break. */ + while(off < lines[line_index].end && ISWHITESPACE(off)) + off++; + if(off >= lines[line_index].end && (off >= ctx->size || ISNEWLINE(off))) { + line_index++; + if(line_index >= n_lines) + return FALSE; + off = lines[line_index].beg; + } + + /* Link destination may be omitted, but only when not also having a title. */ + if(off < ctx->size && CH(off) == _T(')')) { + attr->dest_beg = off; + attr->dest_end = off; + attr->title = NULL; + attr->title_size = 0; + attr->title_needs_free = FALSE; + off++; + *p_end = off; + return TRUE; + } + + /* Link destination. */ + if(!md_is_link_destination(ctx, off, lines[line_index].end, + &off, &attr->dest_beg, &attr->dest_end)) + return FALSE; + + /* (Optional) title. */ + if(md_is_link_title(ctx, lines + line_index, n_lines - line_index, off, + &off, &title_contents_line_index, &tmp_line_index, + &title_contents_beg, &title_contents_end)) + { + title_is_multiline = (tmp_line_index != title_contents_line_index); + title_contents_line_index += line_index; + line_index += tmp_line_index; + } else { + /* Not a title. */ + title_is_multiline = FALSE; + title_contents_beg = off; + title_contents_end = off; + title_contents_line_index = 0; + } + + /* Optional whitespace followed with final ')'. */ + while(off < lines[line_index].end && ISWHITESPACE(off)) + off++; + if (off >= lines[line_index].end && (off >= ctx->size || ISNEWLINE(off))) { + line_index++; + if(line_index >= n_lines) + return FALSE; + off = lines[line_index].beg; + } + if(CH(off) != _T(')')) + goto abort; + off++; + + if(title_contents_beg >= title_contents_end) { + attr->title = NULL; + attr->title_size = 0; + attr->title_needs_free = FALSE; + } else if(!title_is_multiline) { + attr->title = (CHAR*) STR(title_contents_beg); + attr->title_size = title_contents_end - title_contents_beg; + attr->title_needs_free = FALSE; + } else { + MD_CHECK(md_merge_lines_alloc(ctx, title_contents_beg, title_contents_end, + lines + title_contents_line_index, n_lines - title_contents_line_index, + _T('\n'), &attr->title, &attr->title_size)); + attr->title_needs_free = TRUE; + } + + *p_end = off; + ret = TRUE; + +abort: + return ret; +} + +static void +md_free_ref_defs(MD_CTX* ctx) +{ + int i; + + for(i = 0; i < ctx->n_ref_defs; i++) { + MD_REF_DEF* def = &ctx->ref_defs[i]; + + if(def->label_needs_free) + free(def->label); + if(def->title_needs_free) + free(def->title); + } + + free(ctx->ref_defs); +} + + +/****************************************** + *** Processing Inlines (a.k.a Spans) *** + ******************************************/ + +/* We process inlines in few phases: + * + * (1) We go through the block text and collect all significant characters + * which may start/end a span or some other significant position into + * ctx->marks[]. Core of this is what md_collect_marks() does. + * + * We also do some very brief preliminary context-less analysis, whether + * it might be opener or closer (e.g. of an emphasis span). + * + * This speeds the other steps as we do not need to re-iterate over all + * characters anymore. + * + * (2) We analyze each potential mark types, in order by their precedence. + * + * In each md_analyze_XXX() function, we re-iterate list of the marks, + * skipping already resolved regions (in preceding precedences) and try to + * resolve them. + * + * (2.1) For trivial marks, which are single (e.g. HTML entity), we just mark + * them as resolved. + * + * (2.2) For range-type marks, we analyze whether the mark could be closer + * and, if yes, whether there is some preceding opener it could satisfy. + * + * If not we check whether it could be really an opener and if yes, we + * remember it so subsequent closers may resolve it. + * + * (3) Finally, when all marks were analyzed, we render the block contents + * by calling MD_RENDERER::text() callback, interrupting by ::enter_span() + * or ::close_span() whenever we reach a resolved mark. + */ + + +/* The mark structure. + * + * '\\': Maybe escape sequence. + * '\0': NULL char. + * '*': Maybe (strong) emphasis start/end. + * '_': Maybe (strong) emphasis start/end. + * '~': Maybe strikethrough start/end (needs MD_FLAG_STRIKETHROUGH). + * '`': Maybe code span start/end. + * '&': Maybe start of entity. + * ';': Maybe end of entity. + * '<': Maybe start of raw HTML or autolink. + * '>': Maybe end of raw HTML or autolink. + * '[': Maybe start of link label or link text. + * '!': Equivalent of '[' for image. + * ']': Maybe end of link label or link text. + * '@': Maybe permissive e-mail auto-link (needs MD_FLAG_PERMISSIVEEMAILAUTOLINKS). + * ':': Maybe permissive URL auto-link (needs MD_FLAG_PERMISSIVEURLAUTOLINKS). + * '.': Maybe permissive WWW auto-link (needs MD_FLAG_PERMISSIVEWWWAUTOLINKS). + * 'D': Dummy mark, it reserves a space for splitting a previous mark + * (e.g. emphasis) or to make more space for storing some special data + * related to the preceding mark (e.g. link). + * + * Note that not all instances of these chars in the text imply creation of the + * structure. Only those which have (or may have, after we see more context) + * the special meaning. + * + * (Keep this struct as small as possible to fit as much of them into CPU + * cache line.) + */ +struct MD_MARK_tag { + OFF beg; + OFF end; + + /* For unresolved openers, 'prev' and 'next' form the chain of open openers + * of given type 'ch'. + * + * During resolving, we disconnect from the chain and point to the + * corresponding counterpart so opener points to its closer and vice versa. + */ + int prev; + int next; + CHAR ch; + unsigned char flags; +}; + +/* Mark flags (these apply to ALL mark types). */ +#define MD_MARK_POTENTIAL_OPENER 0x01 /* Maybe opener. */ +#define MD_MARK_POTENTIAL_CLOSER 0x02 /* Maybe closer. */ +#define MD_MARK_OPENER 0x04 /* Definitely opener. */ +#define MD_MARK_CLOSER 0x08 /* Definitely closer. */ +#define MD_MARK_RESOLVED 0x10 /* Resolved in any definite way. */ + +/* Mark flags specific for various mark types (so they can share bits). */ +#define MD_MARK_EMPH_INTRAWORD 0x20 /* Helper for the "rule of 3". */ +#define MD_MARK_EMPH_MOD3_0 0x40 +#define MD_MARK_EMPH_MOD3_1 0x80 +#define MD_MARK_EMPH_MOD3_2 (0x40 | 0x80) +#define MD_MARK_EMPH_MOD3_MASK (0x40 | 0x80) +#define MD_MARK_AUTOLINK 0x20 /* Distinguisher for '<', '>'. */ +#define MD_MARK_VALIDPERMISSIVEAUTOLINK 0x20 /* For permissive autolinks. */ +#define MD_MARK_HASNESTEDBRACKETS 0x20 /* For '[' to rule out invalid link labels early */ + +static MD_MARKCHAIN* +md_asterisk_chain(MD_CTX* ctx, unsigned flags) +{ + switch(flags & (MD_MARK_EMPH_INTRAWORD | MD_MARK_EMPH_MOD3_MASK)) { + case MD_MARK_EMPH_INTRAWORD | MD_MARK_EMPH_MOD3_0: return &ASTERISK_OPENERS_intraword_mod3_0; + case MD_MARK_EMPH_INTRAWORD | MD_MARK_EMPH_MOD3_1: return &ASTERISK_OPENERS_intraword_mod3_1; + case MD_MARK_EMPH_INTRAWORD | MD_MARK_EMPH_MOD3_2: return &ASTERISK_OPENERS_intraword_mod3_2; + case MD_MARK_EMPH_MOD3_0: return &ASTERISK_OPENERS_extraword_mod3_0; + case MD_MARK_EMPH_MOD3_1: return &ASTERISK_OPENERS_extraword_mod3_1; + case MD_MARK_EMPH_MOD3_2: return &ASTERISK_OPENERS_extraword_mod3_2; + default: MD_UNREACHABLE(); + } + return NULL; +} + +static MD_MARKCHAIN* +md_mark_chain(MD_CTX* ctx, int mark_index) +{ + MD_MARK* mark = &ctx->marks[mark_index]; + + switch(mark->ch) { + case _T('*'): return md_asterisk_chain(ctx, mark->flags); + case _T('_'): return &UNDERSCORE_OPENERS; + case _T('~'): return (mark->end - mark->beg == 1) ? &TILDE_OPENERS_1 : &TILDE_OPENERS_2; + case _T('!'): MD_FALLTHROUGH(); + case _T('['): return &BRACKET_OPENERS; + case _T('|'): return &TABLECELLBOUNDARIES; + default: return NULL; + } +} + +static MD_MARK* +md_push_mark(MD_CTX* ctx) +{ + if(ctx->n_marks >= ctx->alloc_marks) { + MD_MARK* new_marks; + + ctx->alloc_marks = (ctx->alloc_marks > 0 + ? ctx->alloc_marks + ctx->alloc_marks / 2 + : 64); + new_marks = realloc(ctx->marks, ctx->alloc_marks * sizeof(MD_MARK)); + if(new_marks == NULL) { + MD_LOG("realloc() failed."); + return NULL; + } + + ctx->marks = new_marks; + } + + return &ctx->marks[ctx->n_marks++]; +} + +#define PUSH_MARK_() \ + do { \ + mark = md_push_mark(ctx); \ + if(mark == NULL) { \ + ret = -1; \ + goto abort; \ + } \ + } while(0) + +#define PUSH_MARK(ch_, beg_, end_, flags_) \ + do { \ + PUSH_MARK_(); \ + mark->beg = (beg_); \ + mark->end = (end_); \ + mark->prev = -1; \ + mark->next = -1; \ + mark->ch = (char)(ch_); \ + mark->flags = (flags_); \ + } while(0) + + +static void +md_mark_chain_append(MD_CTX* ctx, MD_MARKCHAIN* chain, int mark_index) +{ + if(chain->tail >= 0) + ctx->marks[chain->tail].next = mark_index; + else + chain->head = mark_index; + + ctx->marks[mark_index].prev = chain->tail; + ctx->marks[mark_index].next = -1; + chain->tail = mark_index; +} + +/* Sometimes, we need to store a pointer into the mark. It is quite rare + * so we do not bother to make MD_MARK use union, and it can only happen + * for dummy marks. */ +static inline void +md_mark_store_ptr(MD_CTX* ctx, int mark_index, void* ptr) +{ + MD_MARK* mark = &ctx->marks[mark_index]; + MD_ASSERT(mark->ch == 'D'); + + /* Check only members beg and end are misused for this. */ + MD_ASSERT(sizeof(void*) <= 2 * sizeof(OFF)); + memcpy(mark, &ptr, sizeof(void*)); +} + +static inline void* +md_mark_get_ptr(MD_CTX* ctx, int mark_index) +{ + void* ptr; + MD_MARK* mark = &ctx->marks[mark_index]; + MD_ASSERT(mark->ch == 'D'); + memcpy(&ptr, mark, sizeof(void*)); + return ptr; +} + +static void +md_resolve_range(MD_CTX* ctx, MD_MARKCHAIN* chain, int opener_index, int closer_index) +{ + MD_MARK* opener = &ctx->marks[opener_index]; + MD_MARK* closer = &ctx->marks[closer_index]; + + /* Remove opener from the list of openers. */ + if(chain != NULL) { + if(opener->prev >= 0) + ctx->marks[opener->prev].next = opener->next; + else + chain->head = opener->next; + + if(opener->next >= 0) + ctx->marks[opener->next].prev = opener->prev; + else + chain->tail = opener->prev; + } + + /* Interconnect opener and closer and mark both as resolved. */ + opener->next = closer_index; + opener->flags |= MD_MARK_OPENER | MD_MARK_RESOLVED; + closer->prev = opener_index; + closer->flags |= MD_MARK_CLOSER | MD_MARK_RESOLVED; +} + + +#define MD_ROLLBACK_ALL 0 +#define MD_ROLLBACK_CROSSING 1 + +/* In the range ctx->marks[opener_index] ... [closer_index], undo some or all + * resolvings accordingly to these rules: + * + * (1) All openers BEFORE the range corresponding to any closer inside the + * range are un-resolved and they are re-added to their respective chains + * of unresolved openers. This ensures we can reuse the opener for closers + * AFTER the range. + * + * (2) If 'how' is MD_ROLLBACK_ALL, then ALL resolved marks inside the range + * are discarded. + * + * (3) If 'how' is MD_ROLLBACK_CROSSING, only closers with openers handled + * in (1) are discarded. I.e. pairs of openers and closers which are both + * inside the range are retained as well as any unpaired marks. + */ +static void +md_rollback(MD_CTX* ctx, int opener_index, int closer_index, int how) +{ + int i; + int mark_index; + + /* Cut all unresolved openers at the mark index. */ + for(i = OPENERS_CHAIN_FIRST; i < OPENERS_CHAIN_LAST+1; i++) { + MD_MARKCHAIN* chain = &ctx->mark_chains[i]; + + while(chain->tail >= opener_index) { + int same = chain->tail == opener_index; + chain->tail = ctx->marks[chain->tail].prev; + if (same) break; + } + + if(chain->tail >= 0) + ctx->marks[chain->tail].next = -1; + else + chain->head = -1; + } + + /* Go backwards so that unresolved openers are re-added into their + * respective chains, in the right order. */ + mark_index = closer_index - 1; + while(mark_index > opener_index) { + MD_MARK* mark = &ctx->marks[mark_index]; + int mark_flags = mark->flags; + int discard_flag = (how == MD_ROLLBACK_ALL); + + if(mark->flags & MD_MARK_CLOSER) { + int mark_opener_index = mark->prev; + + /* Undo opener BEFORE the range. */ + if(mark_opener_index < opener_index) { + MD_MARK* mark_opener = &ctx->marks[mark_opener_index]; + MD_MARKCHAIN* chain; + + mark_opener->flags &= ~(MD_MARK_OPENER | MD_MARK_CLOSER | MD_MARK_RESOLVED); + chain = md_mark_chain(ctx, opener_index); + if(chain != NULL) { + md_mark_chain_append(ctx, chain, mark_opener_index); + discard_flag = 1; + } + } + } + + /* And reset our flags. */ + if(discard_flag) { + /* Make zero-length closer a dummy mark as that's how it was born */ + if((mark->flags & MD_MARK_CLOSER) && mark->beg == mark->end) + mark->ch = 'D'; + + mark->flags &= ~(MD_MARK_OPENER | MD_MARK_CLOSER | MD_MARK_RESOLVED); + } + + /* Jump as far as we can over unresolved or non-interesting marks. */ + switch(how) { + case MD_ROLLBACK_CROSSING: + if((mark_flags & MD_MARK_CLOSER) && mark->prev > opener_index) { + /* If we are closer with opener INSIDE the range, there may + * not be any other crosser inside the subrange. */ + mark_index = mark->prev; + break; + } + MD_FALLTHROUGH(); + default: + mark_index--; + break; + } + } +} + +static void +md_build_mark_char_map(MD_CTX* ctx) +{ + memset(ctx->mark_char_map, 0, sizeof(ctx->mark_char_map)); + + ctx->mark_char_map['\\'] = 1; + ctx->mark_char_map['*'] = 1; + ctx->mark_char_map['_'] = 1; + ctx->mark_char_map['`'] = 1; + ctx->mark_char_map['&'] = 1; + ctx->mark_char_map[';'] = 1; + ctx->mark_char_map['<'] = 1; + ctx->mark_char_map['>'] = 1; + ctx->mark_char_map['['] = 1; + ctx->mark_char_map['!'] = 1; + ctx->mark_char_map[']'] = 1; + ctx->mark_char_map['\0'] = 1; + + if(ctx->parser.flags & MD_FLAG_STRIKETHROUGH) + ctx->mark_char_map['~'] = 1; + + if(ctx->parser.flags & MD_FLAG_LATEXMATHSPANS) + ctx->mark_char_map['$'] = 1; + + if(ctx->parser.flags & MD_FLAG_PERMISSIVEEMAILAUTOLINKS) + ctx->mark_char_map['@'] = 1; + + if(ctx->parser.flags & MD_FLAG_PERMISSIVEURLAUTOLINKS) + ctx->mark_char_map[':'] = 1; + + if(ctx->parser.flags & MD_FLAG_PERMISSIVEWWWAUTOLINKS) + ctx->mark_char_map['.'] = 1; + + if((ctx->parser.flags & MD_FLAG_TABLES) || (ctx->parser.flags & MD_FLAG_WIKILINKS)) + ctx->mark_char_map['|'] = 1; + + if(ctx->parser.flags & MD_FLAG_COLLAPSEWHITESPACE) { + int i; + + for(i = 0; i < (int) sizeof(ctx->mark_char_map); i++) { + if(ISWHITESPACE_(i)) + ctx->mark_char_map[i] = 1; + } + } +} + +/* We limit code span marks to lower than 32 backticks. This solves the + * pathologic case of too many openers, each of different length: Their + * resolving would be then O(n^2). */ +#define CODESPAN_MARK_MAXLEN 32 + +static int +md_is_code_span(MD_CTX* ctx, const MD_LINE* lines, int n_lines, OFF beg, + OFF* p_opener_beg, OFF* p_opener_end, + OFF* p_closer_beg, OFF* p_closer_end, + OFF last_potential_closers[CODESPAN_MARK_MAXLEN], + int* p_reached_paragraph_end) +{ + OFF opener_beg = beg; + OFF opener_end; + OFF closer_beg; + OFF closer_end; + SZ mark_len; + OFF line_end; + int has_space_after_opener = FALSE; + int has_eol_after_opener = FALSE; + int has_space_before_closer = FALSE; + int has_eol_before_closer = FALSE; + int has_only_space = TRUE; + int line_index = 0; + + line_end = lines[0].end; + opener_end = opener_beg; + while(opener_end < line_end && CH(opener_end) == _T('`')) + opener_end++; + has_space_after_opener = (opener_end < line_end && CH(opener_end) == _T(' ')); + has_eol_after_opener = (opener_end == line_end); + + /* The caller needs to know end of the opening mark even if we fail. */ + *p_opener_end = opener_end; + + mark_len = opener_end - opener_beg; + if(mark_len > CODESPAN_MARK_MAXLEN) + return FALSE; + + /* Check whether we already know there is no closer of this length. + * If so, re-scan does no sense. This fixes issue #59. */ + if(last_potential_closers[mark_len-1] >= lines[n_lines-1].end || + (*p_reached_paragraph_end && last_potential_closers[mark_len-1] < opener_end)) + return FALSE; + + closer_beg = opener_end; + closer_end = opener_end; + + /* Find closer mark. */ + while(TRUE) { + while(closer_beg < line_end && CH(closer_beg) != _T('`')) { + if(CH(closer_beg) != _T(' ')) + has_only_space = FALSE; + closer_beg++; + } + closer_end = closer_beg; + while(closer_end < line_end && CH(closer_end) == _T('`')) + closer_end++; + + if(closer_end - closer_beg == mark_len) { + /* Success. */ + has_space_before_closer = (closer_beg > lines[line_index].beg && CH(closer_beg-1) == _T(' ')); + has_eol_before_closer = (closer_beg == lines[line_index].beg); + break; + } + + if(closer_end - closer_beg > 0) { + /* We have found a back-tick which is not part of the closer. */ + has_only_space = FALSE; + + /* But if we eventually fail, remember it as a potential closer + * of its own length for future attempts. This mitigates needs for + * rescans. */ + if(closer_end - closer_beg < CODESPAN_MARK_MAXLEN) { + if(closer_beg > last_potential_closers[closer_end - closer_beg - 1]) + last_potential_closers[closer_end - closer_beg - 1] = closer_beg; + } + } + + if(closer_end >= line_end) { + line_index++; + if(line_index >= n_lines) { + /* Reached end of the paragraph and still nothing. */ + *p_reached_paragraph_end = TRUE; + return FALSE; + } + /* Try on the next line. */ + line_end = lines[line_index].end; + closer_beg = lines[line_index].beg; + } else { + closer_beg = closer_end; + } + } + + /* If there is a space or a new line both after and before the opener + * (and if the code span is not made of spaces only), consume one initial + * and one trailing space as part of the marks. */ + if(!has_only_space && + (has_space_after_opener || has_eol_after_opener) && + (has_space_before_closer || has_eol_before_closer)) + { + if(has_space_after_opener) + opener_end++; + else + opener_end = lines[1].beg; + + if(has_space_before_closer) + closer_beg--; + else { + closer_beg = lines[line_index-1].end; + /* We need to eat the preceding "\r\n" but not any line trailing + * spaces. */ + while(closer_beg < ctx->size && ISBLANK(closer_beg)) + closer_beg++; + } + } + + *p_opener_beg = opener_beg; + *p_opener_end = opener_end; + *p_closer_beg = closer_beg; + *p_closer_end = closer_end; + return TRUE; +} + +static int +md_is_autolink_uri(MD_CTX* ctx, OFF beg, OFF max_end, OFF* p_end) +{ + OFF off = beg+1; + + MD_ASSERT(CH(beg) == _T('<')); + + /* Check for scheme. */ + if(off >= max_end || !ISASCII(off)) + return FALSE; + off++; + while(1) { + if(off >= max_end) + return FALSE; + if(off - beg > 32) + return FALSE; + if(CH(off) == _T(':') && off - beg >= 3) + break; + if(!ISALNUM(off) && CH(off) != _T('+') && CH(off) != _T('-') && CH(off) != _T('.')) + return FALSE; + off++; + } + + /* Check the path after the scheme. */ + while(off < max_end && CH(off) != _T('>')) { + if(ISWHITESPACE(off) || ISCNTRL(off) || CH(off) == _T('<')) + return FALSE; + off++; + } + + if(off >= max_end) + return FALSE; + + MD_ASSERT(CH(off) == _T('>')); + *p_end = off+1; + return TRUE; +} + +static int +md_is_autolink_email(MD_CTX* ctx, OFF beg, OFF max_end, OFF* p_end) +{ + OFF off = beg + 1; + int label_len; + + MD_ASSERT(CH(beg) == _T('<')); + + /* The code should correspond to this regexp: + /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+ + @[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])? + (?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/ + */ + + /* Username (before '@'). */ + while(off < max_end && (ISALNUM(off) || ISANYOF(off, _T(".!#$%&'*+/=?^_`{|}~-")))) + off++; + if(off <= beg+1) + return FALSE; + + /* '@' */ + if(off >= max_end || CH(off) != _T('@')) + return FALSE; + off++; + + /* Labels delimited with '.'; each label is sequence of 1 - 63 alnum + * characters or '-', but '-' is not allowed as first or last char. */ + label_len = 0; + while(off < max_end) { + if(ISALNUM(off)) + label_len++; + else if(CH(off) == _T('-') && label_len > 0) + label_len++; + else if(CH(off) == _T('.') && label_len > 0 && CH(off-1) != _T('-')) + label_len = 0; + else + break; + + if(label_len > 63) + return FALSE; + + off++; + } + + if(label_len <= 0 || off >= max_end || CH(off) != _T('>') || CH(off-1) == _T('-')) + return FALSE; + + *p_end = off+1; + return TRUE; +} + +static int +md_is_autolink(MD_CTX* ctx, OFF beg, OFF max_end, OFF* p_end, int* p_missing_mailto) +{ + if(md_is_autolink_uri(ctx, beg, max_end, p_end)) { + *p_missing_mailto = FALSE; + return TRUE; + } + + if(md_is_autolink_email(ctx, beg, max_end, p_end)) { + *p_missing_mailto = TRUE; + return TRUE; + } + + return FALSE; +} + +static int +md_collect_marks(MD_CTX* ctx, const MD_LINE* lines, int n_lines, int table_mode) +{ + const MD_LINE* line_term = lines + n_lines; + const MD_LINE* line; + int ret = 0; + MD_MARK* mark; + OFF codespan_last_potential_closers[CODESPAN_MARK_MAXLEN] = { 0 }; + int codespan_scanned_till_paragraph_end = FALSE; + + for(line = lines; line < line_term; line++) { + OFF off = line->beg; + OFF line_end = line->end; + + while(TRUE) { + CHAR ch; + +#ifdef MD4C_USE_UTF16 + /* For UTF-16, mark_char_map[] covers only ASCII. */ + #define IS_MARK_CHAR(off) ((CH(off) < SIZEOF_ARRAY(ctx->mark_char_map)) && \ + (ctx->mark_char_map[(unsigned char) CH(off)])) +#else + /* For 8-bit encodings, mark_char_map[] covers all 256 elements. */ + #define IS_MARK_CHAR(off) (ctx->mark_char_map[(unsigned char) CH(off)]) +#endif + + /* Optimization: Use some loop unrolling. */ + while(off + 3 < line_end && !IS_MARK_CHAR(off+0) && !IS_MARK_CHAR(off+1) + && !IS_MARK_CHAR(off+2) && !IS_MARK_CHAR(off+3)) + off += 4; + while(off < line_end && !IS_MARK_CHAR(off+0)) + off++; + + if(off >= line_end) + break; + + ch = CH(off); + + /* A backslash escape. + * It can go beyond line->end as it may involve escaped new + * line to form a hard break. */ + if(ch == _T('\\') && off+1 < ctx->size && (ISPUNCT(off+1) || ISNEWLINE(off+1))) { + /* Hard-break cannot be on the last line of the block. */ + if(!ISNEWLINE(off+1) || line+1 < line_term) + PUSH_MARK(ch, off, off+2, MD_MARK_RESOLVED); + off += 2; + continue; + } + + /* A potential (string) emphasis start/end. */ + if(ch == _T('*') || ch == _T('_')) { + OFF tmp = off+1; + int left_level; /* What precedes: 0 = whitespace; 1 = punctuation; 2 = other char. */ + int right_level; /* What follows: 0 = whitespace; 1 = punctuation; 2 = other char. */ + + while(tmp < line_end && CH(tmp) == ch) + tmp++; + + if(off == line->beg || ISUNICODEWHITESPACEBEFORE(off)) + left_level = 0; + else if(ISUNICODEPUNCTBEFORE(off)) + left_level = 1; + else + left_level = 2; + + if(tmp == line_end || ISUNICODEWHITESPACE(tmp)) + right_level = 0; + else if(ISUNICODEPUNCT(tmp)) + right_level = 1; + else + right_level = 2; + + /* Intra-word underscore doesn't have special meaning. */ + if(ch == _T('_') && left_level == 2 && right_level == 2) { + left_level = 0; + right_level = 0; + } + + if(left_level != 0 || right_level != 0) { + unsigned flags = 0; + + if(left_level > 0 && left_level >= right_level) + flags |= MD_MARK_POTENTIAL_CLOSER; + if(right_level > 0 && right_level >= left_level) + flags |= MD_MARK_POTENTIAL_OPENER; + if(left_level == 2 && right_level == 2) + flags |= MD_MARK_EMPH_INTRAWORD; + + /* For "the rule of three" we need to remember the original + * size of the mark (modulo three), before we potentially + * split the mark when being later resolved partially by some + * shorter closer. */ + switch((tmp - off) % 3) { + case 0: flags |= MD_MARK_EMPH_MOD3_0; break; + case 1: flags |= MD_MARK_EMPH_MOD3_1; break; + case 2: flags |= MD_MARK_EMPH_MOD3_2; break; + } + + PUSH_MARK(ch, off, tmp, flags); + + /* During resolving, multiple asterisks may have to be + * split into independent span start/ends. Consider e.g. + * "**foo* bar*". Therefore we push also some empty dummy + * marks to have enough space for that. */ + off++; + while(off < tmp) { + PUSH_MARK('D', off, off, 0); + off++; + } + continue; + } + + off = tmp; + continue; + } + + /* A potential code span start/end. */ + if(ch == _T('`')) { + OFF opener_beg, opener_end; + OFF closer_beg, closer_end; + int is_code_span; + + is_code_span = md_is_code_span(ctx, line, line_term - line, off, + &opener_beg, &opener_end, &closer_beg, &closer_end, + codespan_last_potential_closers, + &codespan_scanned_till_paragraph_end); + if(is_code_span) { + PUSH_MARK(_T('`'), opener_beg, opener_end, MD_MARK_OPENER | MD_MARK_RESOLVED); + PUSH_MARK(_T('`'), closer_beg, closer_end, MD_MARK_CLOSER | MD_MARK_RESOLVED); + ctx->marks[ctx->n_marks-2].next = ctx->n_marks-1; + ctx->marks[ctx->n_marks-1].prev = ctx->n_marks-2; + + off = closer_end; + + /* Advance the current line accordingly. */ + if(off > line_end) { + line = md_lookup_line(off, line, line_term - line); + line_end = line->end; + } + continue; + } + + off = opener_end; + continue; + } + + /* A potential entity start. */ + if(ch == _T('&')) { + PUSH_MARK(ch, off, off+1, MD_MARK_POTENTIAL_OPENER); + off++; + continue; + } + + /* A potential entity end. */ + if(ch == _T(';')) { + /* We surely cannot be entity unless the previous mark is '&'. */ + if(ctx->n_marks > 0 && ctx->marks[ctx->n_marks-1].ch == _T('&')) + PUSH_MARK(ch, off, off+1, MD_MARK_POTENTIAL_CLOSER); + + off++; + continue; + } + + /* A potential autolink or raw HTML start/end. */ + if(ch == _T('<')) { + int is_autolink; + OFF autolink_end; + int missing_mailto; + + if(!(ctx->parser.flags & MD_FLAG_NOHTMLSPANS)) { + int is_html; + OFF html_end; + + /* Given the nature of the raw HTML, we have to recognize + * it here. Doing so later in md_analyze_lt_gt() could + * open can of worms of quadratic complexity. */ + is_html = md_is_html_any(ctx, line, line_term - line, off, + lines[n_lines-1].end, &html_end); + if(is_html) { + PUSH_MARK(_T('<'), off, off, MD_MARK_OPENER | MD_MARK_RESOLVED); + PUSH_MARK(_T('>'), html_end, html_end, MD_MARK_CLOSER | MD_MARK_RESOLVED); + ctx->marks[ctx->n_marks-2].next = ctx->n_marks-1; + ctx->marks[ctx->n_marks-1].prev = ctx->n_marks-2; + off = html_end; + + /* Advance the current line accordingly. */ + if(off > line_end) { + line = md_lookup_line(off, line, line_term - line); + line_end = line->end; + } + continue; + } + } + + is_autolink = md_is_autolink(ctx, off, lines[n_lines-1].end, + &autolink_end, &missing_mailto); + if(is_autolink) { + PUSH_MARK((missing_mailto ? _T('@') : _T('<')), off, off+1, + MD_MARK_OPENER | MD_MARK_RESOLVED | MD_MARK_AUTOLINK); + PUSH_MARK(_T('>'), autolink_end-1, autolink_end, + MD_MARK_CLOSER | MD_MARK_RESOLVED | MD_MARK_AUTOLINK); + ctx->marks[ctx->n_marks-2].next = ctx->n_marks-1; + ctx->marks[ctx->n_marks-1].prev = ctx->n_marks-2; + off = autolink_end; + continue; + } + + off++; + continue; + } + + /* A potential link or its part. */ + if(ch == _T('[') || (ch == _T('!') && off+1 < line_end && CH(off+1) == _T('['))) { + OFF tmp = (ch == _T('[') ? off+1 : off+2); + PUSH_MARK(ch, off, tmp, MD_MARK_POTENTIAL_OPENER); + off = tmp; + /* Two dummies to make enough place for data we need if it is + * a link. */ + PUSH_MARK('D', off, off, 0); + PUSH_MARK('D', off, off, 0); + continue; + } + if(ch == _T(']')) { + PUSH_MARK(ch, off, off+1, MD_MARK_POTENTIAL_CLOSER); + off++; + continue; + } + + /* A potential permissive e-mail autolink. */ + if(ch == _T('@')) { + if(line->beg + 1 <= off && ISALNUM(off-1) && + off + 3 < line->end && ISALNUM(off+1)) + { + PUSH_MARK(ch, off, off+1, MD_MARK_POTENTIAL_OPENER); + /* Push a dummy as a reserve for a closer. */ + PUSH_MARK('D', off, off, 0); + } + + off++; + continue; + } + + /* A potential permissive URL autolink. */ + if(ch == _T(':')) { + static struct { + const CHAR* scheme; + SZ scheme_size; + const CHAR* suffix; + SZ suffix_size; + } scheme_map[] = { + /* In the order from the most frequently used, arguably. */ + { _T("http"), 4, _T("//"), 2 }, + { _T("https"), 5, _T("//"), 2 }, + { _T("ftp"), 3, _T("//"), 2 } + }; + int scheme_index; + + for(scheme_index = 0; scheme_index < (int) SIZEOF_ARRAY(scheme_map); scheme_index++) { + const CHAR* scheme = scheme_map[scheme_index].scheme; + const SZ scheme_size = scheme_map[scheme_index].scheme_size; + const CHAR* suffix = scheme_map[scheme_index].suffix; + const SZ suffix_size = scheme_map[scheme_index].suffix_size; + + if(line->beg + scheme_size <= off && md_ascii_eq(STR(off-scheme_size), scheme, scheme_size) && + (line->beg + scheme_size == off || ISWHITESPACE(off-scheme_size-1) || ISANYOF(off-scheme_size-1, _T("*_~(["))) && + off + 1 + suffix_size < line->end && md_ascii_eq(STR(off+1), suffix, suffix_size)) + { + PUSH_MARK(ch, off-scheme_size, off+1+suffix_size, MD_MARK_POTENTIAL_OPENER); + /* Push a dummy as a reserve for a closer. */ + PUSH_MARK('D', off, off, 0); + off += 1 + suffix_size; + break; + } + } + + off++; + continue; + } + + /* A potential permissive WWW autolink. */ + if(ch == _T('.')) { + if(line->beg + 3 <= off && md_ascii_eq(STR(off-3), _T("www"), 3) && + (line->beg + 3 == off || ISWHITESPACE(off-4) || ISANYOF(off-4, _T("*_~(["))) && + off + 1 < line_end) + { + PUSH_MARK(ch, off-3, off+1, MD_MARK_POTENTIAL_OPENER); + /* Push a dummy as a reserve for a closer. */ + PUSH_MARK('D', off, off, 0); + off++; + continue; + } + + off++; + continue; + } + + /* A potential table cell boundary or wiki link label delimiter. */ + if((table_mode || ctx->parser.flags & MD_FLAG_WIKILINKS) && ch == _T('|')) { + PUSH_MARK(ch, off, off+1, 0); + off++; + continue; + } + + /* A potential strikethrough start/end. */ + if(ch == _T('~')) { + OFF tmp = off+1; + + while(tmp < line_end && CH(tmp) == _T('~')) + tmp++; + + if(tmp - off < 3) { + unsigned flags = 0; + + if(tmp < line_end && !ISUNICODEWHITESPACE(tmp)) + flags |= MD_MARK_POTENTIAL_OPENER; + if(off > line->beg && !ISUNICODEWHITESPACEBEFORE(off)) + flags |= MD_MARK_POTENTIAL_CLOSER; + if(flags != 0) + PUSH_MARK(ch, off, tmp, flags); + } + + off = tmp; + continue; + } + + /* A potential equation start/end */ + if(ch == _T('$')) { + /* We can have at most two consecutive $ signs, + * where two dollar signs signify a display equation. */ + OFF tmp = off+1; + + while(tmp < line_end && CH(tmp) == _T('$')) + tmp++; + + if (tmp - off <= 2) + PUSH_MARK(ch, off, tmp, MD_MARK_POTENTIAL_OPENER | MD_MARK_POTENTIAL_CLOSER); + off = tmp; + continue; + } + + /* Turn non-trivial whitespace into single space. */ + if(ISWHITESPACE_(ch)) { + OFF tmp = off+1; + + while(tmp < line_end && ISWHITESPACE(tmp)) + tmp++; + + if(tmp - off > 1 || ch != _T(' ')) + PUSH_MARK(ch, off, tmp, MD_MARK_RESOLVED); + + off = tmp; + continue; + } + + /* NULL character. */ + if(ch == _T('\0')) { + PUSH_MARK(ch, off, off+1, MD_MARK_RESOLVED); + off++; + continue; + } + + off++; + } + } + + /* Add a dummy mark at the end of the mark vector to simplify + * process_inlines(). */ + PUSH_MARK(127, ctx->size, ctx->size, MD_MARK_RESOLVED); + +abort: + return ret; +} + +static void +md_analyze_bracket(MD_CTX* ctx, int mark_index) +{ + /* We cannot really resolve links here as for that we would need + * more context. E.g. a following pair of brackets (reference link), + * or enclosing pair of brackets (if the inner is the link, the outer + * one cannot be.) + * + * Therefore we here only construct a list of '[' ']' pairs ordered by + * position of the closer. This allows us to analyze what is or is not + * link in the right order, from inside to outside in case of nested + * brackets. + * + * The resolving itself is deferred to md_resolve_links(). + */ + + MD_MARK* mark = &ctx->marks[mark_index]; + + if(mark->flags & MD_MARK_POTENTIAL_OPENER) { + if(BRACKET_OPENERS.head != -1) + ctx->marks[BRACKET_OPENERS.tail].flags |= MD_MARK_HASNESTEDBRACKETS; + + md_mark_chain_append(ctx, &BRACKET_OPENERS, mark_index); + return; + } + + if(BRACKET_OPENERS.tail >= 0) { + /* Pop the opener from the chain. */ + int opener_index = BRACKET_OPENERS.tail; + MD_MARK* opener = &ctx->marks[opener_index]; + if(opener->prev >= 0) + ctx->marks[opener->prev].next = -1; + else + BRACKET_OPENERS.head = -1; + BRACKET_OPENERS.tail = opener->prev; + + /* Interconnect the opener and closer. */ + opener->next = mark_index; + mark->prev = opener_index; + + /* Add the pair into chain of potential links for md_resolve_links(). + * Note we misuse opener->prev for this as opener->next points to its + * closer. */ + if(ctx->unresolved_link_tail >= 0) + ctx->marks[ctx->unresolved_link_tail].prev = opener_index; + else + ctx->unresolved_link_head = opener_index; + ctx->unresolved_link_tail = opener_index; + opener->prev = -1; + } +} + +/* Forward declaration. */ +static void md_analyze_link_contents(MD_CTX* ctx, const MD_LINE* lines, int n_lines, + int mark_beg, int mark_end); + +static int +md_resolve_links(MD_CTX* ctx, const MD_LINE* lines, int n_lines) +{ + int opener_index = ctx->unresolved_link_head; + OFF last_link_beg = 0; + OFF last_link_end = 0; + OFF last_img_beg = 0; + OFF last_img_end = 0; + + while(opener_index >= 0) { + MD_MARK* opener = &ctx->marks[opener_index]; + int closer_index = opener->next; + MD_MARK* closer = &ctx->marks[closer_index]; + int next_index = opener->prev; + MD_MARK* next_opener; + MD_MARK* next_closer; + MD_LINK_ATTR attr; + int is_link = FALSE; + + if(next_index >= 0) { + next_opener = &ctx->marks[next_index]; + next_closer = &ctx->marks[next_opener->next]; + } else { + next_opener = NULL; + next_closer = NULL; + } + + /* If nested ("[ [ ] ]"), we need to make sure that: + * - The outer does not end inside of (...) belonging to the inner. + * - The outer cannot be link if the inner is link (i.e. not image). + * + * (Note we here analyze from inner to outer as the marks are ordered + * by closer->beg.) + */ + if((opener->beg < last_link_beg && closer->end < last_link_end) || + (opener->beg < last_img_beg && closer->end < last_img_end) || + (opener->beg < last_link_end && opener->ch == '[')) + { + opener_index = next_index; + continue; + } + + /* Recognize and resolve wiki links. + * Wiki-links maybe '[[destination]]' or '[[destination|label]]'. + */ + if ((ctx->parser.flags & MD_FLAG_WIKILINKS) && + (opener->end - opener->beg == 1) && /* not image */ + next_opener != NULL && /* double '[' opener */ + next_opener->ch == '[' && + (next_opener->beg == opener->beg - 1) && + (next_opener->end - next_opener->beg == 1) && + next_closer != NULL && /* double ']' closer */ + next_closer->ch == ']' && + (next_closer->beg == closer->beg + 1) && + (next_closer->end - next_closer->beg == 1)) + { + MD_MARK* delim = NULL; + int delim_index; + OFF dest_beg, dest_end; + + is_link = TRUE; + + /* We don't allow destination to be longer than 100 characters. + * Lets scan to see whether there is '|'. (If not then the whole + * wiki-link has to be below the 100 characters.) */ + delim_index = opener_index + 1; + while(delim_index < closer_index) { + MD_MARK* m = &ctx->marks[delim_index]; + if(m->ch == '|') { + delim = m; + break; + } + if(m->ch != 'D' && m->beg - opener->end > 100) + break; + delim_index++; + } + dest_beg = opener->end; + dest_end = (delim != NULL) ? delim->beg : closer->beg; + if(dest_end - dest_beg == 0 || dest_end - dest_beg > 100) + is_link = FALSE; + + /* There may not be any new line in the destination. */ + if(is_link) { + OFF off; + for(off = dest_beg; off < dest_end; off++) { + if(ISNEWLINE(off)) { + is_link = FALSE; + break; + } + } + } + + if(is_link) { + if(delim != NULL) { + if(delim->end < closer->beg) { + md_rollback(ctx, opener_index, delim_index, MD_ROLLBACK_ALL); + md_rollback(ctx, delim_index, closer_index, MD_ROLLBACK_CROSSING); + delim->flags |= MD_MARK_RESOLVED; + opener->end = delim->beg; + } else { + /* The pipe is just before the closer: [[foo|]] */ + md_rollback(ctx, opener_index, closer_index, MD_ROLLBACK_ALL); + closer->beg = delim->beg; + delim = NULL; + } + } + + opener->beg = next_opener->beg; + opener->next = closer_index; + opener->flags |= MD_MARK_OPENER | MD_MARK_RESOLVED; + + closer->end = next_closer->end; + closer->prev = opener_index; + closer->flags |= MD_MARK_CLOSER | MD_MARK_RESOLVED; + + last_link_beg = opener->beg; + last_link_end = closer->end; + + if(delim != NULL) + md_analyze_link_contents(ctx, lines, n_lines, delim_index+1, closer_index); + + opener_index = next_opener->prev; + continue; + } + } + + if(next_opener != NULL && next_opener->beg == closer->end) { + if(next_closer->beg > closer->end + 1) { + /* Might be full reference link. */ + if(!(next_opener->flags & MD_MARK_HASNESTEDBRACKETS)) + is_link = md_is_link_reference(ctx, lines, n_lines, next_opener->beg, next_closer->end, &attr); + } else { + /* Might be shortcut reference link. */ + if(!(opener->flags & MD_MARK_HASNESTEDBRACKETS)) + is_link = md_is_link_reference(ctx, lines, n_lines, opener->beg, closer->end, &attr); + } + + if(is_link < 0) + return -1; + + if(is_link) { + /* Eat the 2nd "[...]". */ + closer->end = next_closer->end; + + /* Do not analyze the label as a standalone link in the next + * iteration. */ + next_index = ctx->marks[next_index].prev; + } + } else { + if(closer->end < ctx->size && CH(closer->end) == _T('(')) { + /* Might be inline link. */ + OFF inline_link_end = UINT_MAX; + + is_link = md_is_inline_link_spec(ctx, lines, n_lines, closer->end, &inline_link_end, &attr); + if(is_link < 0) + return -1; + + /* Check the closing ')' is not inside an already resolved range + * (i.e. a range with a higher priority), e.g. a code span. */ + if(is_link) { + int i = closer_index + 1; + + while(i < ctx->n_marks) { + MD_MARK* mark = &ctx->marks[i]; + + if(mark->beg >= inline_link_end) + break; + if((mark->flags & (MD_MARK_OPENER | MD_MARK_RESOLVED)) == (MD_MARK_OPENER | MD_MARK_RESOLVED)) { + if(ctx->marks[mark->next].beg >= inline_link_end) { + /* Cancel the link status. */ + if(attr.title_needs_free) + free(attr.title); + is_link = FALSE; + break; + } + + i = mark->next + 1; + } else { + i++; + } + } + } + + if(is_link) { + /* Eat the "(...)" */ + closer->end = inline_link_end; + } + } + + if(!is_link) { + /* Might be collapsed reference link. */ + if(!(opener->flags & MD_MARK_HASNESTEDBRACKETS)) + is_link = md_is_link_reference(ctx, lines, n_lines, opener->beg, closer->end, &attr); + if(is_link < 0) + return -1; + } + } + + if(is_link) { + /* Resolve the brackets as a link. */ + opener->flags |= MD_MARK_OPENER | MD_MARK_RESOLVED; + closer->flags |= MD_MARK_CLOSER | MD_MARK_RESOLVED; + + /* If it is a link, we store the destination and title in the two + * dummy marks after the opener. */ + MD_ASSERT(ctx->marks[opener_index+1].ch == 'D'); + ctx->marks[opener_index+1].beg = attr.dest_beg; + ctx->marks[opener_index+1].end = attr.dest_end; + + MD_ASSERT(ctx->marks[opener_index+2].ch == 'D'); + md_mark_store_ptr(ctx, opener_index+2, attr.title); + /* The title might or might not have been allocated for us. */ + if(attr.title_needs_free) + md_mark_chain_append(ctx, &PTR_CHAIN, opener_index+2); + ctx->marks[opener_index+2].prev = attr.title_size; + + if(opener->ch == '[') { + last_link_beg = opener->beg; + last_link_end = closer->end; + } else { + last_img_beg = opener->beg; + last_img_end = closer->end; + } + + md_analyze_link_contents(ctx, lines, n_lines, opener_index+1, closer_index); + + /* If the link text is formed by nothing but permissive autolink, + * suppress the autolink. + * See https://github.com/mity/md4c/issues/152 for more info. */ + if(ctx->parser.flags & MD_FLAG_PERMISSIVEAUTOLINKS) { + MD_MARK* first_nested; + MD_MARK* last_nested; + + first_nested = opener + 1; + while(first_nested->ch == _T('D') && first_nested < closer) + first_nested++; + + last_nested = closer - 1; + while(first_nested->ch == _T('D') && last_nested > opener) + last_nested--; + + if((first_nested->flags & MD_MARK_RESOLVED) && + first_nested->beg == opener->end && + ISANYOF_(first_nested->ch, _T("@:.")) && + first_nested->next == (last_nested - ctx->marks) && + last_nested->end == closer->beg) + { + first_nested->ch = _T('D'); + first_nested->flags &= ~MD_MARK_RESOLVED; + last_nested->ch = _T('D'); + last_nested->flags &= ~MD_MARK_RESOLVED; + } + } + } + + opener_index = next_index; + } + + return 0; +} + +/* Analyze whether the mark '&' starts a HTML entity. + * If so, update its flags as well as flags of corresponding closer ';'. */ +static void +md_analyze_entity(MD_CTX* ctx, int mark_index) +{ + MD_MARK* opener = &ctx->marks[mark_index]; + MD_MARK* closer; + OFF off; + + /* Cannot be entity if there is no closer as the next mark. + * (Any other mark between would mean strange character which cannot be + * part of the entity. + * + * So we can do all the work on '&' and do not call this later for the + * closing mark ';'. + */ + if(mark_index + 1 >= ctx->n_marks) + return; + closer = &ctx->marks[mark_index+1]; + if(closer->ch != ';') + return; + + if(md_is_entity(ctx, opener->beg, closer->end, &off)) { + MD_ASSERT(off == closer->end); + + md_resolve_range(ctx, NULL, mark_index, mark_index+1); + opener->end = closer->end; + } +} + +static void +md_analyze_table_cell_boundary(MD_CTX* ctx, int mark_index) +{ + MD_MARK* mark = &ctx->marks[mark_index]; + mark->flags |= MD_MARK_RESOLVED; + + md_mark_chain_append(ctx, &TABLECELLBOUNDARIES, mark_index); + ctx->n_table_cell_boundaries++; +} + +/* Split a longer mark into two. The new mark takes the given count of + * characters. May only be called if an adequate number of dummy 'D' marks + * follows. + */ +static int +md_split_emph_mark(MD_CTX* ctx, int mark_index, SZ n) +{ + MD_MARK* mark = &ctx->marks[mark_index]; + int new_mark_index = mark_index + (mark->end - mark->beg - n); + MD_MARK* dummy = &ctx->marks[new_mark_index]; + + MD_ASSERT(mark->end - mark->beg > n); + MD_ASSERT(dummy->ch == 'D'); + + memcpy(dummy, mark, sizeof(MD_MARK)); + mark->end -= n; + dummy->beg = mark->end; + + return new_mark_index; +} + +static void +md_analyze_emph(MD_CTX* ctx, int mark_index) +{ + MD_MARK* mark = &ctx->marks[mark_index]; + MD_MARKCHAIN* chain = md_mark_chain(ctx, mark_index); + + /* If we can be a closer, try to resolve with the preceding opener. */ + if(mark->flags & MD_MARK_POTENTIAL_CLOSER) { + MD_MARK* opener = NULL; + int opener_index = 0; + + if(mark->ch == _T('*')) { + MD_MARKCHAIN* opener_chains[6]; + int i, n_opener_chains; + unsigned flags = mark->flags; + + /* Apply the "rule of three". */ + n_opener_chains = 0; + opener_chains[n_opener_chains++] = &ASTERISK_OPENERS_intraword_mod3_0; + if((flags & MD_MARK_EMPH_MOD3_MASK) != MD_MARK_EMPH_MOD3_2) + opener_chains[n_opener_chains++] = &ASTERISK_OPENERS_intraword_mod3_1; + if((flags & MD_MARK_EMPH_MOD3_MASK) != MD_MARK_EMPH_MOD3_1) + opener_chains[n_opener_chains++] = &ASTERISK_OPENERS_intraword_mod3_2; + opener_chains[n_opener_chains++] = &ASTERISK_OPENERS_extraword_mod3_0; + if(!(flags & MD_MARK_EMPH_INTRAWORD) || (flags & MD_MARK_EMPH_MOD3_MASK) != MD_MARK_EMPH_MOD3_2) + opener_chains[n_opener_chains++] = &ASTERISK_OPENERS_extraword_mod3_1; + if(!(flags & MD_MARK_EMPH_INTRAWORD) || (flags & MD_MARK_EMPH_MOD3_MASK) != MD_MARK_EMPH_MOD3_1) + opener_chains[n_opener_chains++] = &ASTERISK_OPENERS_extraword_mod3_2; + + /* Opener is the most recent mark from the allowed chains. */ + for(i = 0; i < n_opener_chains; i++) { + if(opener_chains[i]->tail >= 0) { + int tmp_index = opener_chains[i]->tail; + MD_MARK* tmp_mark = &ctx->marks[tmp_index]; + if(opener == NULL || tmp_mark->end > opener->end) { + opener_index = tmp_index; + opener = tmp_mark; + } + } + } + } else { + /* Simple emph. mark */ + if(chain->tail >= 0) { + opener_index = chain->tail; + opener = &ctx->marks[opener_index]; + } + } + + /* Resolve, if we have found matching opener. */ + if(opener != NULL) { + SZ opener_size = opener->end - opener->beg; + SZ closer_size = mark->end - mark->beg; + MD_MARKCHAIN* opener_chain = md_mark_chain(ctx, opener_index); + + if(opener_size > closer_size) { + opener_index = md_split_emph_mark(ctx, opener_index, closer_size); + md_mark_chain_append(ctx, opener_chain, opener_index); + } else if(opener_size < closer_size) { + md_split_emph_mark(ctx, mark_index, closer_size - opener_size); + } + + md_rollback(ctx, opener_index, mark_index, MD_ROLLBACK_CROSSING); + md_resolve_range(ctx, opener_chain, opener_index, mark_index); + return; + } + } + + /* If we could not resolve as closer, we may be yet be an opener. */ + if(mark->flags & MD_MARK_POTENTIAL_OPENER) + md_mark_chain_append(ctx, chain, mark_index); +} + +static void +md_analyze_tilde(MD_CTX* ctx, int mark_index) +{ + MD_MARK* mark = &ctx->marks[mark_index]; + MD_MARKCHAIN* chain = md_mark_chain(ctx, mark_index); + + /* We attempt to be Github Flavored Markdown compatible here. GFM accepts + * only tildes sequences of length 1 and 2, and the length of the opener + * and closer has to match. */ + + if((mark->flags & MD_MARK_POTENTIAL_CLOSER) && chain->head >= 0) { + int opener_index = chain->head; + + md_rollback(ctx, opener_index, mark_index, MD_ROLLBACK_CROSSING); + md_resolve_range(ctx, chain, opener_index, mark_index); + return; + } + + if(mark->flags & MD_MARK_POTENTIAL_OPENER) + md_mark_chain_append(ctx, chain, mark_index); +} + +static void +md_analyze_dollar(MD_CTX* ctx, int mark_index) +{ + /* This should mimic the way inline equations work in LaTeX, so there + * can only ever be one item in the chain (i.e. the dollars can't be + * nested). This is basically the same as the md_analyze_tilde function, + * except that we require matching openers and closers to be of the same + * length. + * + * E.g.: $abc$$def$$ => abc (display equation) def (end equation) */ + if(DOLLAR_OPENERS.head >= 0) { + /* If the potential closer has a non-matching number of $, discard */ + MD_MARK* open = &ctx->marks[DOLLAR_OPENERS.head]; + MD_MARK* close = &ctx->marks[mark_index]; + + int opener_index = DOLLAR_OPENERS.head; + md_rollback(ctx, opener_index, mark_index, MD_ROLLBACK_ALL); + if (open->end - open->beg == close->end - close->beg) { + /* We are the matching closer */ + md_resolve_range(ctx, &DOLLAR_OPENERS, opener_index, mark_index); + return; + } + } + + md_mark_chain_append(ctx, &DOLLAR_OPENERS, mark_index); +} + +static void +md_analyze_permissive_url_autolink(MD_CTX* ctx, int mark_index) +{ + MD_MARK* opener = &ctx->marks[mark_index]; + int closer_index = mark_index + 1; + MD_MARK* closer = &ctx->marks[closer_index]; + MD_MARK* next_resolved_mark; + OFF off = opener->end; + int n_dots = FALSE; + int has_underscore_in_last_seg = FALSE; + int has_underscore_in_next_to_last_seg = FALSE; + int n_opened_parenthesis = 0; + int n_excess_parenthesis = 0; + + /* Check for domain. */ + while(off < ctx->size) { + if(ISALNUM(off) || CH(off) == _T('-')) { + off++; + } else if(CH(off) == _T('.')) { + /* We must see at least one period. */ + n_dots++; + has_underscore_in_next_to_last_seg = has_underscore_in_last_seg; + has_underscore_in_last_seg = FALSE; + off++; + } else if(CH(off) == _T('_')) { + /* No underscore may be present in the last two domain segments. */ + has_underscore_in_last_seg = TRUE; + off++; + } else { + break; + } + } + if(off > opener->end && CH(off-1) == _T('.')) { + off--; + n_dots--; + } + if(off <= opener->end || n_dots == 0 || has_underscore_in_next_to_last_seg || has_underscore_in_last_seg) + return; + + /* Check for path. */ + next_resolved_mark = closer + 1; + while(next_resolved_mark->ch == 'D' || !(next_resolved_mark->flags & MD_MARK_RESOLVED)) + next_resolved_mark++; + while(off < next_resolved_mark->beg && CH(off) != _T('<') && !ISWHITESPACE(off) && !ISNEWLINE(off)) { + /* Parenthesis must be balanced. */ + if(CH(off) == _T('(')) { + n_opened_parenthesis++; + } else if(CH(off) == _T(')')) { + if(n_opened_parenthesis > 0) + n_opened_parenthesis--; + else + n_excess_parenthesis++; + } + + off++; + } + + /* Trim a trailing punctuation from the end. */ + while(TRUE) { + if(ISANYOF(off-1, _T("?!.,:*_~"))) { + off--; + } else if(CH(off-1) == ')' && n_excess_parenthesis > 0) { + /* Unmatched ')' can be in an interior of the path but not at the + * of it, so the auto-link may be safely nested in a parenthesis + * pair. */ + off--; + n_excess_parenthesis--; + } else { + break; + } + } + + /* Ok. Lets call it an auto-link. Adapt opener and create closer to zero + * length so all the contents becomes the link text. */ + MD_ASSERT(closer->ch == 'D' || + ((ctx->parser.flags & MD_FLAG_PERMISSIVEWWWAUTOLINKS) && + (closer->ch == '.' || closer->ch == ':' || closer->ch == '@'))); + opener->end = opener->beg; + closer->ch = opener->ch; + closer->beg = off; + closer->end = off; + md_resolve_range(ctx, NULL, mark_index, closer_index); +} + +/* The permissive autolinks do not have to be enclosed in '<' '>' but we + * instead impose stricter rules what is understood as an e-mail address + * here. Actually any non-alphanumeric characters with exception of '.' + * are prohibited both in username and after '@'. */ +static void +md_analyze_permissive_email_autolink(MD_CTX* ctx, int mark_index) +{ + MD_MARK* opener = &ctx->marks[mark_index]; + int closer_index; + MD_MARK* closer; + OFF beg = opener->beg; + OFF end = opener->end; + int dot_count = 0; + + MD_ASSERT(opener->ch == _T('@')); + + /* Scan for name before '@'. */ + while(beg > 0 && (ISALNUM(beg-1) || ISANYOF(beg-1, _T(".-_+")))) + beg--; + + /* Scan for domain after '@'. */ + while(end < ctx->size && (ISALNUM(end) || ISANYOF(end, _T(".-_")))) { + if(CH(end) == _T('.')) + dot_count++; + end++; + } + if(CH(end-1) == _T('.')) { /* Final '.' not part of it. */ + dot_count--; + end--; + } + else if(ISANYOF2(end-1, _T('-'), _T('_'))) /* These are forbidden at the end. */ + return; + if(CH(end-1) == _T('@') || dot_count == 0) + return; + + /* Ok. Lets call it auto-link. Adapt opener and create closer to zero + * length so all the contents becomes the link text. */ + closer_index = mark_index + 1; + closer = &ctx->marks[closer_index]; + if (closer->ch != 'D') return; + + opener->beg = beg; + opener->end = beg; + closer->ch = opener->ch; + closer->beg = end; + closer->end = end; + md_resolve_range(ctx, NULL, mark_index, closer_index); +} + +static inline void +md_analyze_marks(MD_CTX* ctx, const MD_LINE* lines, int n_lines, + int mark_beg, int mark_end, const CHAR* mark_chars) +{ + int i = mark_beg; + MD_UNUSED(lines); + MD_UNUSED(n_lines); + + while(i < mark_end) { + MD_MARK* mark = &ctx->marks[i]; + + /* Skip resolved spans. */ + if(mark->flags & MD_MARK_RESOLVED) { + if(mark->flags & MD_MARK_OPENER) { + MD_ASSERT(i < mark->next); + i = mark->next + 1; + } else { + i++; + } + continue; + } + + /* Skip marks we do not want to deal with. */ + if(!ISANYOF_(mark->ch, mark_chars)) { + i++; + continue; + } + + /* Analyze the mark. */ + switch(mark->ch) { + case '[': /* Pass through. */ + case '!': /* Pass through. */ + case ']': md_analyze_bracket(ctx, i); break; + case '&': md_analyze_entity(ctx, i); break; + case '|': md_analyze_table_cell_boundary(ctx, i); break; + case '_': /* Pass through. */ + case '*': md_analyze_emph(ctx, i); break; + case '~': md_analyze_tilde(ctx, i); break; + case '$': md_analyze_dollar(ctx, i); break; + case '.': /* Pass through. */ + case ':': md_analyze_permissive_url_autolink(ctx, i); break; + case '@': md_analyze_permissive_email_autolink(ctx, i); break; + } + + i++; + } +} + +/* Analyze marks (build ctx->marks). */ +static int +md_analyze_inlines(MD_CTX* ctx, const MD_LINE* lines, int n_lines, int table_mode) +{ + int ret; + + /* Reset the previously collected stack of marks. */ + ctx->n_marks = 0; + + /* Collect all marks. */ + MD_CHECK(md_collect_marks(ctx, lines, n_lines, table_mode)); + + /* (1) Links. */ + md_analyze_marks(ctx, lines, n_lines, 0, ctx->n_marks, _T("[]!")); + MD_CHECK(md_resolve_links(ctx, lines, n_lines)); + BRACKET_OPENERS.head = -1; + BRACKET_OPENERS.tail = -1; + ctx->unresolved_link_head = -1; + ctx->unresolved_link_tail = -1; + + if(table_mode) { + /* (2) Analyze table cell boundaries. + * Note we reset TABLECELLBOUNDARIES chain prior to the call md_analyze_marks(), + * not after, because caller may need it. */ + MD_ASSERT(n_lines == 1); + TABLECELLBOUNDARIES.head = -1; + TABLECELLBOUNDARIES.tail = -1; + ctx->n_table_cell_boundaries = 0; + md_analyze_marks(ctx, lines, n_lines, 0, ctx->n_marks, _T("|")); + return ret; + } + + /* (3) Emphasis and strong emphasis; permissive autolinks. */ + md_analyze_link_contents(ctx, lines, n_lines, 0, ctx->n_marks); + +abort: + return ret; +} + +static void +md_analyze_link_contents(MD_CTX* ctx, const MD_LINE* lines, int n_lines, + int mark_beg, int mark_end) +{ + int i; + + md_analyze_marks(ctx, lines, n_lines, mark_beg, mark_end, _T("&")); + md_analyze_marks(ctx, lines, n_lines, mark_beg, mark_end, _T("*_~$@:.")); + + for(i = OPENERS_CHAIN_FIRST; i <= OPENERS_CHAIN_LAST; i++) { + ctx->mark_chains[i].head = -1; + ctx->mark_chains[i].tail = -1; + } +} + +static int +md_enter_leave_span_a(MD_CTX* ctx, int enter, MD_SPANTYPE type, + const CHAR* dest, SZ dest_size, int prohibit_escapes_in_dest, + const CHAR* title, SZ title_size) +{ + MD_ATTRIBUTE_BUILD href_build = { 0 }; + MD_ATTRIBUTE_BUILD title_build = { 0 }; + MD_SPAN_A_DETAIL det; + int ret = 0; + + /* Note we here rely on fact that MD_SPAN_A_DETAIL and + * MD_SPAN_IMG_DETAIL are binary-compatible. */ + memset(&det, 0, sizeof(MD_SPAN_A_DETAIL)); + MD_CHECK(md_build_attribute(ctx, dest, dest_size, + (prohibit_escapes_in_dest ? MD_BUILD_ATTR_NO_ESCAPES : 0), + &det.href, &href_build)); + MD_CHECK(md_build_attribute(ctx, title, title_size, 0, &det.title, &title_build)); + + if(enter) + MD_ENTER_SPAN(type, &det); + else + MD_LEAVE_SPAN(type, &det); + +abort: + md_free_attribute(ctx, &href_build); + md_free_attribute(ctx, &title_build); + return ret; +} + +static int +md_enter_leave_span_wikilink(MD_CTX* ctx, int enter, const CHAR* target, SZ target_size) +{ + MD_ATTRIBUTE_BUILD target_build = { 0 }; + MD_SPAN_WIKILINK_DETAIL det; + int ret = 0; + + memset(&det, 0, sizeof(MD_SPAN_WIKILINK_DETAIL)); + MD_CHECK(md_build_attribute(ctx, target, target_size, 0, &det.target, &target_build)); + + if (enter) + MD_ENTER_SPAN(MD_SPAN_WIKILINK, &det); + else + MD_LEAVE_SPAN(MD_SPAN_WIKILINK, &det); + +abort: + md_free_attribute(ctx, &target_build); + return ret; +} + + +/* Render the output, accordingly to the analyzed ctx->marks. */ +static int +md_process_inlines(MD_CTX* ctx, const MD_LINE* lines, int n_lines) +{ + MD_TEXTTYPE text_type; + const MD_LINE* line = lines; + MD_MARK* prev_mark = NULL; + MD_MARK* mark; + OFF off = lines[0].beg; + OFF end = lines[n_lines-1].end; + int enforce_hardbreak = 0; + int ret = 0; + + /* Find first resolved mark. Note there is always at least one resolved + * mark, the dummy last one after the end of the latest line we actually + * never really reach. This saves us of a lot of special checks and cases + * in this function. */ + mark = ctx->marks; + while(!(mark->flags & MD_MARK_RESOLVED)) + mark++; + + text_type = MD_TEXT_NORMAL; + + while(1) { + /* Process the text up to the next mark or end-of-line. */ + OFF tmp = (line->end < mark->beg ? line->end : mark->beg); + if(tmp > off) { + MD_TEXT(text_type, STR(off), tmp - off); + off = tmp; + } + + /* If reached the mark, process it and move to next one. */ + if(off >= mark->beg) { + switch(mark->ch) { + case '\\': /* Backslash escape. */ + if(ISNEWLINE(mark->beg+1)) + enforce_hardbreak = 1; + else + MD_TEXT(text_type, STR(mark->beg+1), 1); + break; + + case ' ': /* Non-trivial space. */ + MD_TEXT(text_type, _T(" "), 1); + break; + + case '`': /* Code span. */ + if(mark->flags & MD_MARK_OPENER) { + MD_ENTER_SPAN(MD_SPAN_CODE, NULL); + text_type = MD_TEXT_CODE; + } else { + MD_LEAVE_SPAN(MD_SPAN_CODE, NULL); + text_type = MD_TEXT_NORMAL; + } + break; + + case '_': /* Underline (or emphasis if we fall through). */ + if(ctx->parser.flags & MD_FLAG_UNDERLINE) { + if(mark->flags & MD_MARK_OPENER) { + while(off < mark->end) { + MD_ENTER_SPAN(MD_SPAN_U, NULL); + off++; + } + } else { + while(off < mark->end) { + MD_LEAVE_SPAN(MD_SPAN_U, NULL); + off++; + } + } + break; + } + MD_FALLTHROUGH(); + + case '*': /* Emphasis, strong emphasis. */ + if(mark->flags & MD_MARK_OPENER) { + if((mark->end - off) % 2) { + MD_ENTER_SPAN(MD_SPAN_EM, NULL); + off++; + } + while(off + 1 < mark->end) { + MD_ENTER_SPAN(MD_SPAN_STRONG, NULL); + off += 2; + } + } else { + while(off + 1 < mark->end) { + MD_LEAVE_SPAN(MD_SPAN_STRONG, NULL); + off += 2; + } + if((mark->end - off) % 2) { + MD_LEAVE_SPAN(MD_SPAN_EM, NULL); + off++; + } + } + break; + + case '~': + if(mark->flags & MD_MARK_OPENER) + MD_ENTER_SPAN(MD_SPAN_DEL, NULL); + else + MD_LEAVE_SPAN(MD_SPAN_DEL, NULL); + break; + + case '$': + if(mark->flags & MD_MARK_OPENER) { + MD_ENTER_SPAN((mark->end - off) % 2 ? MD_SPAN_LATEXMATH : MD_SPAN_LATEXMATH_DISPLAY, NULL); + text_type = MD_TEXT_LATEXMATH; + } else { + MD_LEAVE_SPAN((mark->end - off) % 2 ? MD_SPAN_LATEXMATH : MD_SPAN_LATEXMATH_DISPLAY, NULL); + text_type = MD_TEXT_NORMAL; + } + break; + + case '[': /* Link, wiki link, image. */ + case '!': + case ']': + { + const MD_MARK* opener = (mark->ch != ']' ? mark : &ctx->marks[mark->prev]); + const MD_MARK* closer = &ctx->marks[opener->next]; + const MD_MARK* dest_mark; + const MD_MARK* title_mark; + + if ((opener->ch == '[' && closer->ch == ']') && + opener->end - opener->beg >= 2 && + closer->end - closer->beg >= 2) + { + int has_label = (opener->end - opener->beg > 2); + SZ target_sz; + + if(has_label) + target_sz = opener->end - (opener->beg+2); + else + target_sz = closer->beg - opener->end; + + MD_CHECK(md_enter_leave_span_wikilink(ctx, (mark->ch != ']'), + has_label ? STR(opener->beg+2) : STR(opener->end), + target_sz)); + + break; + } + + dest_mark = opener+1; + MD_ASSERT(dest_mark->ch == 'D'); + title_mark = opener+2; + if (title_mark->ch != 'D') break; + + MD_CHECK(md_enter_leave_span_a(ctx, (mark->ch != ']'), + (opener->ch == '!' ? MD_SPAN_IMG : MD_SPAN_A), + STR(dest_mark->beg), dest_mark->end - dest_mark->beg, FALSE, + md_mark_get_ptr(ctx, (int)(title_mark - ctx->marks)), + title_mark->prev)); + + /* link/image closer may span multiple lines. */ + if(mark->ch == ']') { + while(mark->end > line->end) + line++; + } + + break; + } + + case '<': + case '>': /* Autolink or raw HTML. */ + if(!(mark->flags & MD_MARK_AUTOLINK)) { + /* Raw HTML. */ + if(mark->flags & MD_MARK_OPENER) + text_type = MD_TEXT_HTML; + else + text_type = MD_TEXT_NORMAL; + break; + } + /* Pass through, if auto-link. */ + MD_FALLTHROUGH(); + + case '@': /* Permissive e-mail autolink. */ + case ':': /* Permissive URL autolink. */ + case '.': /* Permissive WWW autolink. */ + { + MD_MARK* opener = ((mark->flags & MD_MARK_OPENER) ? mark : &ctx->marks[mark->prev]); + MD_MARK* closer = &ctx->marks[opener->next]; + const CHAR* dest = STR(opener->end); + SZ dest_size = closer->beg - opener->end; + + /* For permissive auto-links we do not know closer mark + * position at the time of md_collect_marks(), therefore + * it can be out-of-order in ctx->marks[]. + * + * With this flag, we make sure that we output the closer + * only if we processed the opener. */ + if(mark->flags & MD_MARK_OPENER) + closer->flags |= MD_MARK_VALIDPERMISSIVEAUTOLINK; + + if(opener->ch == '@' || opener->ch == '.') { + dest_size += 7; + MD_TEMP_BUFFER(dest_size * sizeof(CHAR)); + memcpy(ctx->buffer, + (opener->ch == '@' ? _T("mailto:") : _T("http://")), + 7 * sizeof(CHAR)); + memcpy(ctx->buffer + 7, dest, (dest_size-7) * sizeof(CHAR)); + dest = ctx->buffer; + } + + if(closer->flags & MD_MARK_VALIDPERMISSIVEAUTOLINK) + MD_CHECK(md_enter_leave_span_a(ctx, (mark->flags & MD_MARK_OPENER), + MD_SPAN_A, dest, dest_size, TRUE, NULL, 0)); + break; + } + + case '&': /* Entity. */ + MD_TEXT(MD_TEXT_ENTITY, STR(mark->beg), mark->end - mark->beg); + break; + + case '\0': + MD_TEXT(MD_TEXT_NULLCHAR, _T(""), 1); + break; + + case 127: + goto abort; + } + + off = mark->end; + + /* Move to next resolved mark. */ + prev_mark = mark; + mark++; + while(!(mark->flags & MD_MARK_RESOLVED) || mark->beg < off) + mark++; + } + + /* If reached end of line, move to next one. */ + if(off >= line->end) { + /* If it is the last line, we are done. */ + if(off >= end) + break; + + if(text_type == MD_TEXT_CODE || text_type == MD_TEXT_LATEXMATH) { + OFF tmp; + + MD_ASSERT(prev_mark != NULL); + MD_ASSERT(ISANYOF2_(prev_mark->ch, '`', '$') && (prev_mark->flags & MD_MARK_OPENER)); + MD_ASSERT(ISANYOF2_(mark->ch, '`', '$') && (mark->flags & MD_MARK_CLOSER)); + + /* Inside a code span, trailing line whitespace has to be + * outputted. */ + tmp = off; + while(off < ctx->size && ISBLANK(off)) + off++; + if(off > tmp) + MD_TEXT(text_type, STR(tmp), off-tmp); + + /* and new lines are transformed into single spaces. */ + if(prev_mark->end < off && off < mark->beg) + MD_TEXT(text_type, _T(" "), 1); + } else if(text_type == MD_TEXT_HTML) { + /* Inside raw HTML, we output the new line verbatim, including + * any trailing spaces. */ + OFF tmp = off; + + while(tmp < end && ISBLANK(tmp)) + tmp++; + if(tmp > off) + MD_TEXT(MD_TEXT_HTML, STR(off), tmp - off); + MD_TEXT(MD_TEXT_HTML, _T("\n"), 1); + } else { + /* Output soft or hard line break. */ + MD_TEXTTYPE break_type = MD_TEXT_SOFTBR; + + if(text_type == MD_TEXT_NORMAL) { + if(enforce_hardbreak) + break_type = MD_TEXT_BR; + else if((CH(line->end) == _T(' ') && CH(line->end+1) == _T(' '))) + break_type = MD_TEXT_BR; + } + + MD_TEXT(break_type, _T("\n"), 1); + } + + /* Move to the next line. */ + line++; + off = line->beg; + + enforce_hardbreak = 0; + } + } + +abort: + return ret; +} + + +/*************************** + *** Processing Tables *** + ***************************/ + +static void +md_analyze_table_alignment(MD_CTX* ctx, OFF beg, OFF end, MD_ALIGN* align, int n_align) +{ + static const MD_ALIGN align_map[] = { MD_ALIGN_DEFAULT, MD_ALIGN_LEFT, MD_ALIGN_RIGHT, MD_ALIGN_CENTER }; + OFF off = beg; + + while(n_align > 0) { + int index = 0; /* index into align_map[] */ + + while(CH(off) != _T('-')) + off++; + if(off > beg && CH(off-1) == _T(':')) + index |= 1; + while(off < end && CH(off) == _T('-')) + off++; + if(off < end && CH(off) == _T(':')) + index |= 2; + + *align = align_map[index]; + align++; + n_align--; + } + +} + +/* Forward declaration. */ +static int md_process_normal_block_contents(MD_CTX* ctx, const MD_LINE* lines, int n_lines); + +static int +md_process_table_cell(MD_CTX* ctx, MD_BLOCKTYPE cell_type, MD_ALIGN align, OFF beg, OFF end) +{ + MD_LINE line; + MD_BLOCK_TD_DETAIL det; + int ret = 0; + + while(beg < end && ISWHITESPACE(beg)) + beg++; + while(end > beg && ISWHITESPACE(end-1)) + end--; + + det.align = align; + line.beg = beg; + line.end = end; + + MD_ENTER_BLOCK(cell_type, &det); + MD_CHECK(md_process_normal_block_contents(ctx, &line, 1)); + MD_LEAVE_BLOCK(cell_type, &det); + +abort: + return ret; +} + +static int +md_process_table_row(MD_CTX* ctx, MD_BLOCKTYPE cell_type, OFF beg, OFF end, + const MD_ALIGN* align, int col_count) +{ + MD_LINE line; + OFF* pipe_offs = NULL; + int i, j, k, n; + int ret = 0; + + line.beg = beg; + line.end = end; + + /* Break the line into table cells by identifying pipe characters who + * form the cell boundary. */ + MD_CHECK(md_analyze_inlines(ctx, &line, 1, TRUE)); + + /* We have to remember the cell boundaries in local buffer because + * ctx->marks[] shall be reused during cell contents processing. */ + n = ctx->n_table_cell_boundaries + 2; + pipe_offs = (OFF*) malloc(n * sizeof(OFF)); + if(pipe_offs == NULL) { + MD_LOG("malloc() failed."); + ret = -1; + goto abort; + } + j = 0; + pipe_offs[j++] = beg; + for(i = TABLECELLBOUNDARIES.head; i >= 0; i = ctx->marks[i].next) { + MD_MARK* mark = &ctx->marks[i]; + pipe_offs[j++] = mark->end; + } + pipe_offs[j++] = end+1; + + /* Process cells. */ + MD_ENTER_BLOCK(MD_BLOCK_TR, NULL); + k = 0; + for(i = 0; i < j-1 && k < col_count; i++) { + if(pipe_offs[i] < pipe_offs[i+1]-1) + MD_CHECK(md_process_table_cell(ctx, cell_type, align[k++], pipe_offs[i], pipe_offs[i+1]-1)); + } + /* Make sure we call enough table cells even if the current table contains + * too few of them. */ + while(k < col_count) + MD_CHECK(md_process_table_cell(ctx, cell_type, align[k++], 0, 0)); + MD_LEAVE_BLOCK(MD_BLOCK_TR, NULL); + +abort: + free(pipe_offs); + + /* Free any temporary memory blocks stored within some dummy marks. */ + for(i = PTR_CHAIN.head; i >= 0; i = ctx->marks[i].next) + free(md_mark_get_ptr(ctx, i)); + PTR_CHAIN.head = -1; + PTR_CHAIN.tail = -1; + + return ret; +} + +static int +md_process_table_block_contents(MD_CTX* ctx, int col_count, const MD_LINE* lines, int n_lines) +{ + MD_ALIGN* align; + int i; + int ret = 0; + + /* At least two lines have to be present: The column headers and the line + * with the underlines. */ + MD_ASSERT(n_lines >= 2); + + align = malloc(col_count * sizeof(MD_ALIGN)); + if(align == NULL) { + MD_LOG("malloc() failed."); + ret = -1; + goto abort; + } + + md_analyze_table_alignment(ctx, lines[1].beg, lines[1].end, align, col_count); + + MD_ENTER_BLOCK(MD_BLOCK_THEAD, NULL); + MD_CHECK(md_process_table_row(ctx, MD_BLOCK_TH, + lines[0].beg, lines[0].end, align, col_count)); + MD_LEAVE_BLOCK(MD_BLOCK_THEAD, NULL); + + if(n_lines > 2) { + MD_ENTER_BLOCK(MD_BLOCK_TBODY, NULL); + for(i = 2; i < n_lines; i++) { + MD_CHECK(md_process_table_row(ctx, MD_BLOCK_TD, + lines[i].beg, lines[i].end, align, col_count)); + } + MD_LEAVE_BLOCK(MD_BLOCK_TBODY, NULL); + } + +abort: + free(align); + return ret; +} + + +/************************** + *** Processing Block *** + **************************/ + +#define MD_BLOCK_CONTAINER_OPENER 0x01 +#define MD_BLOCK_CONTAINER_CLOSER 0x02 +#define MD_BLOCK_CONTAINER (MD_BLOCK_CONTAINER_OPENER | MD_BLOCK_CONTAINER_CLOSER) +#define MD_BLOCK_LOOSE_LIST 0x04 +#define MD_BLOCK_SETEXT_HEADER 0x08 + +struct MD_BLOCK_tag { + MD_BLOCKTYPE type : 8; + unsigned flags : 8; + + /* MD_BLOCK_H: Header level (1 - 6) + * MD_BLOCK_CODE: Non-zero if fenced, zero if indented. + * MD_BLOCK_LI: Task mark character (0 if not task list item, 'x', 'X' or ' '). + * MD_BLOCK_TABLE: Column count (as determined by the table underline). + */ + unsigned data : 16; + + /* Leaf blocks: Count of lines (MD_LINE or MD_VERBATIMLINE) on the block. + * MD_BLOCK_LI: Task mark offset in the input doc. + * MD_BLOCK_OL: Start item number. + */ + unsigned n_lines; +}; + +struct MD_CONTAINER_tag { + CHAR ch; + unsigned is_loose : 8; + unsigned is_task : 8; + unsigned start; + unsigned mark_indent; + unsigned contents_indent; + OFF block_byte_off; + OFF task_mark_off; +}; + + +static int +md_process_normal_block_contents(MD_CTX* ctx, const MD_LINE* lines, int n_lines) +{ + int i; + int ret; + + MD_CHECK(md_analyze_inlines(ctx, lines, n_lines, FALSE)); + MD_CHECK(md_process_inlines(ctx, lines, n_lines)); + +abort: + /* Free any temporary memory blocks stored within some dummy marks. */ + for(i = PTR_CHAIN.head; i >= 0; i = ctx->marks[i].next) + free(md_mark_get_ptr(ctx, i)); + PTR_CHAIN.head = -1; + PTR_CHAIN.tail = -1; + + return ret; +} + +static int +md_process_verbatim_block_contents(MD_CTX* ctx, MD_TEXTTYPE text_type, const MD_VERBATIMLINE* lines, int n_lines) +{ + static const CHAR indent_chunk_str[] = _T(" "); + static const SZ indent_chunk_size = SIZEOF_ARRAY(indent_chunk_str) - 1; + + int i; + int ret = 0; + + for(i = 0; i < n_lines; i++) { + const MD_VERBATIMLINE* line = &lines[i]; + int indent = line->indent; + + MD_ASSERT(indent >= 0); + + /* Output code indentation. */ + while(indent > (int) indent_chunk_size) { + MD_TEXT(text_type, indent_chunk_str, indent_chunk_size); + indent -= indent_chunk_size; + } + if(indent > 0) + MD_TEXT(text_type, indent_chunk_str, indent); + + /* Output the code line itself. */ + MD_TEXT_INSECURE(text_type, STR(line->beg), line->end - line->beg); + + /* Enforce end-of-line. */ + MD_TEXT(text_type, _T("\n"), 1); + } + +abort: + return ret; +} + +static int +md_process_code_block_contents(MD_CTX* ctx, int is_fenced, const MD_VERBATIMLINE* lines, int n_lines) +{ + if(is_fenced) { + /* Skip the first line in case of fenced code: It is the fence. + * (Only the starting fence is present due to logic in md_analyze_line().) */ + lines++; + n_lines--; + } else { + /* Ignore blank lines at start/end of indented code block. */ + while(n_lines > 0 && lines[0].beg == lines[0].end) { + lines++; + n_lines--; + } + while(n_lines > 0 && lines[n_lines-1].beg == lines[n_lines-1].end) { + n_lines--; + } + } + + if(n_lines == 0) + return 0; + + return md_process_verbatim_block_contents(ctx, MD_TEXT_CODE, lines, n_lines); +} + +static int +md_setup_fenced_code_detail(MD_CTX* ctx, const MD_BLOCK* block, MD_BLOCK_CODE_DETAIL* det, + MD_ATTRIBUTE_BUILD* info_build, MD_ATTRIBUTE_BUILD* lang_build) +{ + const MD_VERBATIMLINE* fence_line = (const MD_VERBATIMLINE*)(block + 1); + OFF beg = fence_line->beg; + OFF end = fence_line->end; + OFF lang_end; + CHAR fence_ch = CH(fence_line->beg); + int ret = 0; + + /* Skip the fence itself. */ + while(beg < ctx->size && CH(beg) == fence_ch) + beg++; + /* Trim initial spaces. */ + while(beg < ctx->size && CH(beg) == _T(' ')) + beg++; + + /* Trim trailing spaces. */ + while(end > beg && CH(end-1) == _T(' ')) + end--; + + /* Build info string attribute. */ + MD_CHECK(md_build_attribute(ctx, STR(beg), end - beg, 0, &det->info, info_build)); + + /* Build info string attribute. */ + lang_end = beg; + while(lang_end < end && !ISWHITESPACE(lang_end)) + lang_end++; + MD_CHECK(md_build_attribute(ctx, STR(beg), lang_end - beg, 0, &det->lang, lang_build)); + + det->fence_char = fence_ch; + +abort: + return ret; +} + +static int +md_process_leaf_block(MD_CTX* ctx, const MD_BLOCK* block) +{ + union { + MD_BLOCK_H_DETAIL header; + MD_BLOCK_CODE_DETAIL code; + MD_BLOCK_TABLE_DETAIL table; + } det; + MD_ATTRIBUTE_BUILD info_build; + MD_ATTRIBUTE_BUILD lang_build; + int is_in_tight_list; + int clean_fence_code_detail = FALSE; + int ret = 0; + + memset(&det, 0, sizeof(det)); + + if(ctx->n_containers == 0) + is_in_tight_list = FALSE; + else + is_in_tight_list = !ctx->containers[ctx->n_containers-1].is_loose; + + switch(block->type) { + case MD_BLOCK_H: + det.header.level = block->data; + break; + + case MD_BLOCK_CODE: + /* For fenced code block, we may need to set the info string. */ + if(block->data != 0) { + memset(&det.code, 0, sizeof(MD_BLOCK_CODE_DETAIL)); + clean_fence_code_detail = TRUE; + MD_CHECK(md_setup_fenced_code_detail(ctx, block, &det.code, &info_build, &lang_build)); + } + break; + + case MD_BLOCK_TABLE: + det.table.col_count = block->data; + det.table.head_row_count = 1; + det.table.body_row_count = block->n_lines - 2; + break; + + default: + /* Noop. */ + break; + } + + if(!is_in_tight_list || block->type != MD_BLOCK_P) + MD_ENTER_BLOCK(block->type, (void*) &det); + + /* Process the block contents accordingly to is type. */ + switch(block->type) { + case MD_BLOCK_HR: + /* noop */ + break; + + case MD_BLOCK_CODE: + MD_CHECK(md_process_code_block_contents(ctx, (block->data != 0), + (const MD_VERBATIMLINE*)(block + 1), block->n_lines)); + break; + + case MD_BLOCK_HTML: + MD_CHECK(md_process_verbatim_block_contents(ctx, MD_TEXT_HTML, + (const MD_VERBATIMLINE*)(block + 1), block->n_lines)); + break; + + case MD_BLOCK_TABLE: + MD_CHECK(md_process_table_block_contents(ctx, block->data, + (const MD_LINE*)(block + 1), block->n_lines)); + break; + + default: + MD_CHECK(md_process_normal_block_contents(ctx, + (const MD_LINE*)(block + 1), block->n_lines)); + break; + } + + if(!is_in_tight_list || block->type != MD_BLOCK_P) + MD_LEAVE_BLOCK(block->type, (void*) &det); + +abort: + if(clean_fence_code_detail) { + md_free_attribute(ctx, &info_build); + md_free_attribute(ctx, &lang_build); + } + return ret; +} + +static int +md_process_all_blocks(MD_CTX* ctx) +{ + int byte_off = 0; + int ret = 0; + + /* ctx->containers now is not needed for detection of lists and list items + * so we reuse it for tracking what lists are loose or tight. We rely + * on the fact the vector is large enough to hold the deepest nesting + * level of lists. */ + ctx->n_containers = 0; + + while(byte_off < ctx->n_block_bytes) { + MD_BLOCK* block = (MD_BLOCK*)((char*)ctx->block_bytes + byte_off); + union { + MD_BLOCK_UL_DETAIL ul; + MD_BLOCK_OL_DETAIL ol; + MD_BLOCK_LI_DETAIL li; + } det; + + switch(block->type) { + case MD_BLOCK_UL: + det.ul.is_tight = (block->flags & MD_BLOCK_LOOSE_LIST) ? FALSE : TRUE; + det.ul.mark = (CHAR) block->data; + break; + + case MD_BLOCK_OL: + det.ol.start = block->n_lines; + det.ol.is_tight = (block->flags & MD_BLOCK_LOOSE_LIST) ? FALSE : TRUE; + det.ol.mark_delimiter = (CHAR) block->data; + break; + + case MD_BLOCK_LI: + det.li.is_task = (block->data != 0); + det.li.task_mark = (CHAR) block->data; + det.li.task_mark_offset = (OFF) block->n_lines; + break; + + default: + /* noop */ + break; + } + + if(block->flags & MD_BLOCK_CONTAINER) { + if(block->flags & MD_BLOCK_CONTAINER_CLOSER) { + MD_LEAVE_BLOCK(block->type, &det); + + if(block->type == MD_BLOCK_UL || block->type == MD_BLOCK_OL || block->type == MD_BLOCK_QUOTE) + ctx->n_containers--; + } + + if(block->flags & MD_BLOCK_CONTAINER_OPENER) { + MD_ENTER_BLOCK(block->type, &det); + + if(block->type == MD_BLOCK_UL || block->type == MD_BLOCK_OL) { + ctx->containers[ctx->n_containers].is_loose = (block->flags & MD_BLOCK_LOOSE_LIST); + ctx->n_containers++; + } else if(block->type == MD_BLOCK_QUOTE) { + /* This causes that any text in a block quote, even if + * nested inside a tight list item, is wrapped with + * <p>...</p>. */ + ctx->containers[ctx->n_containers].is_loose = TRUE; + ctx->n_containers++; + } + } + } else { + MD_CHECK(md_process_leaf_block(ctx, block)); + + if(block->type == MD_BLOCK_CODE || block->type == MD_BLOCK_HTML) + byte_off += block->n_lines * sizeof(MD_VERBATIMLINE); + else + byte_off += block->n_lines * sizeof(MD_LINE); + } + + byte_off += sizeof(MD_BLOCK); + } + + ctx->n_block_bytes = 0; + +abort: + return ret; +} + + +/************************************ + *** Grouping Lines into Blocks *** + ************************************/ + +static void* +md_push_block_bytes(MD_CTX* ctx, int n_bytes) +{ + void* ptr; + + if(ctx->n_block_bytes + n_bytes > ctx->alloc_block_bytes) { + void* new_block_bytes; + + ctx->alloc_block_bytes = (ctx->alloc_block_bytes > 0 + ? ctx->alloc_block_bytes + ctx->alloc_block_bytes / 2 + : 512); + new_block_bytes = realloc(ctx->block_bytes, ctx->alloc_block_bytes); + if(new_block_bytes == NULL) { + MD_LOG("realloc() failed."); + return NULL; + } + + /* Fix the ->current_block after the reallocation. */ + if(ctx->current_block != NULL) { + OFF off_current_block = (OFF) ((char*) ctx->current_block - (char*) ctx->block_bytes); + ctx->current_block = (MD_BLOCK*) ((char*) new_block_bytes + off_current_block); + } + + ctx->block_bytes = new_block_bytes; + } + + ptr = (char*)ctx->block_bytes + ctx->n_block_bytes; + ctx->n_block_bytes += n_bytes; + return ptr; +} + +static int +md_start_new_block(MD_CTX* ctx, const MD_LINE_ANALYSIS* line) +{ + MD_BLOCK* block; + + MD_ASSERT(ctx->current_block == NULL); + + block = (MD_BLOCK*) md_push_block_bytes(ctx, sizeof(MD_BLOCK)); + if(block == NULL) + return -1; + + switch(line->type) { + case MD_LINE_HR: + block->type = MD_BLOCK_HR; + break; + + case MD_LINE_ATXHEADER: + case MD_LINE_SETEXTHEADER: + block->type = MD_BLOCK_H; + break; + + case MD_LINE_FENCEDCODE: + case MD_LINE_INDENTEDCODE: + block->type = MD_BLOCK_CODE; + break; + + case MD_LINE_TEXT: + block->type = MD_BLOCK_P; + break; + + case MD_LINE_HTML: + block->type = MD_BLOCK_HTML; + break; + + case MD_LINE_BLANK: + case MD_LINE_SETEXTUNDERLINE: + case MD_LINE_TABLEUNDERLINE: + default: + MD_UNREACHABLE(); + break; + } + + block->flags = 0; + block->data = line->data; + block->n_lines = 0; + + ctx->current_block = block; + return 0; +} + +/* Eat from start of current (textual) block any reference definitions and + * remember them so we can resolve any links referring to them. + * + * (Reference definitions can only be at start of it as they cannot break + * a paragraph.) + */ +static int +md_consume_link_reference_definitions(MD_CTX* ctx) +{ + MD_LINE* lines = (MD_LINE*) (ctx->current_block + 1); + int n_lines = ctx->current_block->n_lines; + int n = 0; + + /* Compute how many lines at the start of the block form one or more + * reference definitions. */ + while(n < n_lines) { + int n_link_ref_lines; + + n_link_ref_lines = md_is_link_reference_definition(ctx, + lines + n, n_lines - n); + /* Not a reference definition? */ + if(n_link_ref_lines == 0) + break; + + /* We fail if it is the ref. def. but it could not be stored due + * a memory allocation error. */ + if(n_link_ref_lines < 0) + return -1; + + n += n_link_ref_lines; + } + + /* If there was at least one reference definition, we need to remove + * its lines from the block, or perhaps even the whole block. */ + if(n > 0) { + if(n == n_lines) { + /* Remove complete block. */ + ctx->n_block_bytes -= n * sizeof(MD_LINE); + ctx->n_block_bytes -= sizeof(MD_BLOCK); + ctx->current_block = NULL; + } else { + /* Remove just some initial lines from the block. */ + memmove(lines, lines + n, (n_lines - n) * sizeof(MD_LINE)); + ctx->current_block->n_lines -= n; + ctx->n_block_bytes -= n * sizeof(MD_LINE); + } + } + + return 0; +} + +static int +md_end_current_block(MD_CTX* ctx) +{ + int ret = 0; + + if(ctx->current_block == NULL) + return ret; + + /* Check whether there is a reference definition. (We do this here instead + * of in md_analyze_line() because reference definition can take multiple + * lines.) */ + if(ctx->current_block->type == MD_BLOCK_P || + (ctx->current_block->type == MD_BLOCK_H && (ctx->current_block->flags & MD_BLOCK_SETEXT_HEADER))) + { + MD_LINE* lines = (MD_LINE*) (ctx->current_block + 1); + if(CH(lines[0].beg) == _T('[')) { + MD_CHECK(md_consume_link_reference_definitions(ctx)); + if(ctx->current_block == NULL) + return ret; + } + } + + if(ctx->current_block->type == MD_BLOCK_H && (ctx->current_block->flags & MD_BLOCK_SETEXT_HEADER)) { + int n_lines = ctx->current_block->n_lines; + + if(n_lines > 1) { + /* Get rid of the underline. */ + ctx->current_block->n_lines--; + ctx->n_block_bytes -= sizeof(MD_LINE); + } else { + /* Only the underline has left after eating the ref. defs. + * Keep the line as beginning of a new ordinary paragraph. */ + ctx->current_block->type = MD_BLOCK_P; + return 0; + } + } + + /* Mark we are not building any block anymore. */ + ctx->current_block = NULL; + +abort: + return ret; +} + +static int +md_add_line_into_current_block(MD_CTX* ctx, const MD_LINE_ANALYSIS* analysis) +{ + MD_ASSERT(ctx->current_block != NULL); + + if(ctx->current_block->type == MD_BLOCK_CODE || ctx->current_block->type == MD_BLOCK_HTML) { + MD_VERBATIMLINE* line; + + line = (MD_VERBATIMLINE*) md_push_block_bytes(ctx, sizeof(MD_VERBATIMLINE)); + if(line == NULL) + return -1; + + line->indent = analysis->indent; + line->beg = analysis->beg; + line->end = analysis->end; + } else { + MD_LINE* line; + + line = (MD_LINE*) md_push_block_bytes(ctx, sizeof(MD_LINE)); + if(line == NULL) + return -1; + + line->beg = analysis->beg; + line->end = analysis->end; + } + ctx->current_block->n_lines++; + + return 0; +} + +static int +md_push_container_bytes(MD_CTX* ctx, MD_BLOCKTYPE type, unsigned start, + unsigned data, unsigned flags) +{ + MD_BLOCK* block; + int ret = 0; + + MD_CHECK(md_end_current_block(ctx)); + + block = (MD_BLOCK*) md_push_block_bytes(ctx, sizeof(MD_BLOCK)); + if(block == NULL) + return -1; + + block->type = type; + block->flags = flags; + block->data = data; + block->n_lines = start; + +abort: + return ret; +} + + + +/*********************** + *** Line Analysis *** + ***********************/ + +static int +md_is_hr_line(MD_CTX* ctx, OFF beg, OFF* p_end, OFF* p_killer) +{ + OFF off = beg + 1; + int n = 1; + + while(off < ctx->size && (CH(off) == CH(beg) || CH(off) == _T(' ') || CH(off) == _T('\t'))) { + if(CH(off) == CH(beg)) + n++; + off++; + } + + if(n < 3) { + *p_killer = off; + return FALSE; + } + + /* Nothing else can be present on the line. */ + if(off < ctx->size && !ISNEWLINE(off)) { + *p_killer = off; + return FALSE; + } + + *p_end = off; + return TRUE; +} + +static int +md_is_atxheader_line(MD_CTX* ctx, OFF beg, OFF* p_beg, OFF* p_end, unsigned* p_level) +{ + int n; + OFF off = beg + 1; + + while(off < ctx->size && CH(off) == _T('#') && off - beg < 7) + off++; + n = off - beg; + + if(n > 6) + return FALSE; + *p_level = n; + + if(!(ctx->parser.flags & MD_FLAG_PERMISSIVEATXHEADERS) && off < ctx->size && + CH(off) != _T(' ') && CH(off) != _T('\t') && !ISNEWLINE(off)) + return FALSE; + + while(off < ctx->size && CH(off) == _T(' ')) + off++; + *p_beg = off; + *p_end = off; + return TRUE; +} + +static int +md_is_setext_underline(MD_CTX* ctx, OFF beg, OFF* p_end, unsigned* p_level) +{ + OFF off = beg + 1; + + while(off < ctx->size && CH(off) == CH(beg)) + off++; + + /* Optionally, space(s) can follow. */ + while(off < ctx->size && CH(off) == _T(' ')) + off++; + + /* But nothing more is allowed on the line. */ + if(off < ctx->size && !ISNEWLINE(off)) + return FALSE; + + *p_level = (CH(beg) == _T('=') ? 1 : 2); + *p_end = off; + return TRUE; +} + +static int +md_is_table_underline(MD_CTX* ctx, OFF beg, OFF* p_end, unsigned* p_col_count) +{ + OFF off = beg; + int found_pipe = FALSE; + unsigned col_count = 0; + + if(off < ctx->size && CH(off) == _T('|')) { + found_pipe = TRUE; + off++; + while(off < ctx->size && ISWHITESPACE(off)) + off++; + } + + while(1) { + int delimited = FALSE; + + /* Cell underline ("-----", ":----", "----:" or ":----:") */ + if(off < ctx->size && CH(off) == _T(':')) + off++; + if(off >= ctx->size || CH(off) != _T('-')) + return FALSE; + while(off < ctx->size && CH(off) == _T('-')) + off++; + if(off < ctx->size && CH(off) == _T(':')) + off++; + + col_count++; + + /* Pipe delimiter (optional at the end of line). */ + while(off < ctx->size && ISWHITESPACE(off)) + off++; + if(off < ctx->size && CH(off) == _T('|')) { + delimited = TRUE; + found_pipe = TRUE; + off++; + while(off < ctx->size && ISWHITESPACE(off)) + off++; + } + + /* Success, if we reach end of line. */ + if(off >= ctx->size || ISNEWLINE(off)) + break; + + if(!delimited) + return FALSE; + } + + if(!found_pipe) + return FALSE; + + *p_end = off; + *p_col_count = col_count; + return TRUE; +} + +static int +md_is_opening_code_fence(MD_CTX* ctx, OFF beg, OFF* p_end) +{ + OFF off = beg; + + while(off < ctx->size && CH(off) == CH(beg)) + off++; + + /* Fence must have at least three characters. */ + if(off - beg < 3) + return FALSE; + + ctx->code_fence_length = off - beg; + + /* Optionally, space(s) can follow. */ + while(off < ctx->size && CH(off) == _T(' ')) + off++; + + /* Optionally, an info string can follow. */ + while(off < ctx->size && !ISNEWLINE(off)) { + /* Backtick-based fence must not contain '`' in the info string. */ + if(CH(beg) == _T('`') && CH(off) == _T('`')) + return FALSE; + off++; + } + + *p_end = off; + return TRUE; +} + +static int +md_is_closing_code_fence(MD_CTX* ctx, CHAR ch, OFF beg, OFF* p_end) +{ + OFF off = beg; + int ret = FALSE; + + /* Closing fence must have at least the same length and use same char as + * opening one. */ + while(off < ctx->size && CH(off) == ch) + off++; + if(off - beg < ctx->code_fence_length) + goto out; + + /* Optionally, space(s) can follow */ + while(off < ctx->size && CH(off) == _T(' ')) + off++; + + /* But nothing more is allowed on the line. */ + if(off < ctx->size && !ISNEWLINE(off)) + goto out; + + ret = TRUE; + +out: + /* Note we set *p_end even on failure: If we are not closing fence, caller + * would eat the line anyway without any parsing. */ + *p_end = off; + return ret; +} + +/* Returns type of the raw HTML block, or FALSE if it is not HTML block. + * (Refer to CommonMark specification for details about the types.) + */ +static int +md_is_html_block_start_condition(MD_CTX* ctx, OFF beg) +{ + typedef struct TAG_tag TAG; + struct TAG_tag { + const CHAR* name; + unsigned len : 8; + }; + + /* Type 6 is started by a long list of allowed tags. We use two-level + * tree to speed-up the search. */ +#ifdef X + #undef X +#endif +#define X(name) { _T(name), (sizeof(name)-1) / sizeof(CHAR) } +#define Xend { NULL, 0 } + static const TAG t1[] = { X("pre"), X("script"), X("style"), X("textarea"), Xend }; + + static const TAG a6[] = { X("address"), X("article"), X("aside"), Xend }; + static const TAG b6[] = { X("base"), X("basefont"), X("blockquote"), X("body"), Xend }; + static const TAG c6[] = { X("caption"), X("center"), X("col"), X("colgroup"), Xend }; + static const TAG d6[] = { X("dd"), X("details"), X("dialog"), X("dir"), + X("div"), X("dl"), X("dt"), Xend }; + static const TAG f6[] = { X("fieldset"), X("figcaption"), X("figure"), X("footer"), + X("form"), X("frame"), X("frameset"), Xend }; + static const TAG h6[] = { X("h1"), X("head"), X("header"), X("hr"), X("html"), Xend }; + static const TAG i6[] = { X("iframe"), Xend }; + static const TAG l6[] = { X("legend"), X("li"), X("link"), Xend }; + static const TAG m6[] = { X("main"), X("menu"), X("menuitem"), Xend }; + static const TAG n6[] = { X("nav"), X("noframes"), Xend }; + static const TAG o6[] = { X("ol"), X("optgroup"), X("option"), Xend }; + static const TAG p6[] = { X("p"), X("param"), Xend }; + static const TAG s6[] = { X("section"), X("source"), X("summary"), Xend }; + static const TAG t6[] = { X("table"), X("tbody"), X("td"), X("tfoot"), X("th"), + X("thead"), X("title"), X("tr"), X("track"), Xend }; + static const TAG u6[] = { X("ul"), Xend }; + static const TAG xx[] = { Xend }; +#undef X + + static const TAG* map6[26] = { + a6, b6, c6, d6, xx, f6, xx, h6, i6, xx, xx, l6, m6, + n6, o6, p6, xx, xx, s6, t6, u6, xx, xx, xx, xx, xx + }; + OFF off = beg + 1; + int i; + + /* Check for type 1: <script, <pre, or <style */ + for(i = 0; t1[i].name != NULL; i++) { + if(off + t1[i].len <= ctx->size) { + if(md_ascii_case_eq(STR(off), t1[i].name, t1[i].len)) + return 1; + } + } + + /* Check for type 2: <!-- */ + if(off + 3 < ctx->size && CH(off) == _T('!') && CH(off+1) == _T('-') && CH(off+2) == _T('-')) + return 2; + + /* Check for type 3: <? */ + if(off < ctx->size && CH(off) == _T('?')) + return 3; + + /* Check for type 4 or 5: <! */ + if(off < ctx->size && CH(off) == _T('!')) { + /* Check for type 4: <! followed by uppercase letter. */ + if(off + 1 < ctx->size && ISASCII(off+1)) + return 4; + + /* Check for type 5: <![CDATA[ */ + if(off + 8 < ctx->size) { + if(md_ascii_eq(STR(off), _T("![CDATA["), 8)) + return 5; + } + } + + /* Check for type 6: Many possible starting tags listed above. */ + if(off + 1 < ctx->size && (ISALPHA(off) || (CH(off) == _T('/') && ISALPHA(off+1)))) { + int slot; + const TAG* tags; + + if(CH(off) == _T('/')) + off++; + + slot = (ISUPPER(off) ? CH(off) - 'A' : CH(off) - 'a'); + tags = map6[slot]; + + for(i = 0; tags[i].name != NULL; i++) { + if(off + tags[i].len <= ctx->size) { + if(md_ascii_case_eq(STR(off), tags[i].name, tags[i].len)) { + OFF tmp = off + tags[i].len; + if(tmp >= ctx->size) + return 6; + if(ISBLANK(tmp) || ISNEWLINE(tmp) || CH(tmp) == _T('>')) + return 6; + if(tmp+1 < ctx->size && CH(tmp) == _T('/') && CH(tmp+1) == _T('>')) + return 6; + break; + } + } + } + } + + /* Check for type 7: any COMPLETE other opening or closing tag. */ + if(off + 1 < ctx->size) { + OFF end; + + if(md_is_html_tag(ctx, NULL, 0, beg, ctx->size, &end)) { + /* Only optional whitespace and new line may follow. */ + while(end < ctx->size && ISWHITESPACE(end)) + end++; + if(end >= ctx->size || ISNEWLINE(end)) + return 7; + } + } + + return FALSE; +} + +/* Case sensitive check whether there is a substring 'what' between 'beg' + * and end of line. */ +static int +md_line_contains(MD_CTX* ctx, OFF beg, const CHAR* what, SZ what_len, OFF* p_end) +{ + OFF i; + for(i = beg; i + what_len < ctx->size; i++) { + if(ISNEWLINE(i)) + break; + if(memcmp(STR(i), what, what_len * sizeof(CHAR)) == 0) { + *p_end = i + what_len; + return TRUE; + } + } + + *p_end = i; + return FALSE; +} + +/* Returns type of HTML block end condition or FALSE if not an end condition. + * + * Note it fills p_end even when it is not end condition as the caller + * does not need to analyze contents of a raw HTML block. + */ +static int +md_is_html_block_end_condition(MD_CTX* ctx, OFF beg, OFF* p_end) +{ + switch(ctx->html_block_type) { + case 1: + { + OFF off = beg; + + while(off < ctx->size && !ISNEWLINE(off)) { + if(CH(off) == _T('<')) { + #define FIND_TAG_END(string, length) \ + if(off + length <= ctx->size && \ + md_ascii_case_eq(STR(off), _T(string), length)) { \ + *p_end = off + length; \ + return TRUE; \ + } + FIND_TAG_END("</script>", 9) + FIND_TAG_END("</style>", 8) + FIND_TAG_END("</pre>", 6) + #undef FIND_TAG_END + } + + off++; + } + *p_end = off; + return FALSE; + } + + case 2: + return (md_line_contains(ctx, beg, _T("-->"), 3, p_end) ? 2 : FALSE); + + case 3: + return (md_line_contains(ctx, beg, _T("?>"), 2, p_end) ? 3 : FALSE); + + case 4: + return (md_line_contains(ctx, beg, _T(">"), 1, p_end) ? 4 : FALSE); + + case 5: + return (md_line_contains(ctx, beg, _T("]]>"), 3, p_end) ? 5 : FALSE); + + case 6: /* Pass through */ + case 7: + *p_end = beg; + return (beg >= ctx->size || ISNEWLINE(beg) ? ctx->html_block_type : FALSE); + + default: + MD_UNREACHABLE(); + } + return FALSE; +} + + +static int +md_is_container_compatible(const MD_CONTAINER* pivot, const MD_CONTAINER* container) +{ + /* Block quote has no "items" like lists. */ + if(container->ch == _T('>')) + return FALSE; + + if(container->ch != pivot->ch) + return FALSE; + if(container->mark_indent > pivot->contents_indent) + return FALSE; + + return TRUE; +} + +static int +md_push_container(MD_CTX* ctx, const MD_CONTAINER* container) +{ + if(ctx->n_containers >= ctx->alloc_containers) { + MD_CONTAINER* new_containers; + + ctx->alloc_containers = (ctx->alloc_containers > 0 + ? ctx->alloc_containers + ctx->alloc_containers / 2 + : 16); + new_containers = realloc(ctx->containers, ctx->alloc_containers * sizeof(MD_CONTAINER)); + if(new_containers == NULL) { + MD_LOG("realloc() failed."); + return -1; + } + + ctx->containers = new_containers; + } + + memcpy(&ctx->containers[ctx->n_containers++], container, sizeof(MD_CONTAINER)); + return 0; +} + +static int +md_enter_child_containers(MD_CTX* ctx, int n_children) +{ + int i; + int ret = 0; + + for(i = ctx->n_containers - n_children; i < ctx->n_containers; i++) { + MD_CONTAINER* c = &ctx->containers[i]; + int is_ordered_list = FALSE; + + switch(c->ch) { + case _T(')'): + case _T('.'): + is_ordered_list = TRUE; + MD_FALLTHROUGH(); + + case _T('-'): + case _T('+'): + case _T('*'): + /* Remember offset in ctx->block_bytes so we can revisit the + * block if we detect it is a loose list. */ + md_end_current_block(ctx); + c->block_byte_off = ctx->n_block_bytes; + + MD_CHECK(md_push_container_bytes(ctx, + (is_ordered_list ? MD_BLOCK_OL : MD_BLOCK_UL), + c->start, c->ch, MD_BLOCK_CONTAINER_OPENER)); + MD_CHECK(md_push_container_bytes(ctx, MD_BLOCK_LI, + c->task_mark_off, + (c->is_task ? CH(c->task_mark_off) : 0), + MD_BLOCK_CONTAINER_OPENER)); + break; + + case _T('>'): + MD_CHECK(md_push_container_bytes(ctx, MD_BLOCK_QUOTE, 0, 0, MD_BLOCK_CONTAINER_OPENER)); + break; + + default: + MD_UNREACHABLE(); + break; + } + } + +abort: + return ret; +} + +static int +md_leave_child_containers(MD_CTX* ctx, int n_keep) +{ + int ret = 0; + + while(ctx->n_containers > n_keep) { + MD_CONTAINER* c = &ctx->containers[ctx->n_containers-1]; + int is_ordered_list = FALSE; + + switch(c->ch) { + case _T(')'): + case _T('.'): + is_ordered_list = TRUE; + MD_FALLTHROUGH(); + + case _T('-'): + case _T('+'): + case _T('*'): + MD_CHECK(md_push_container_bytes(ctx, MD_BLOCK_LI, + c->task_mark_off, (c->is_task ? CH(c->task_mark_off) : 0), + MD_BLOCK_CONTAINER_CLOSER)); + MD_CHECK(md_push_container_bytes(ctx, + (is_ordered_list ? MD_BLOCK_OL : MD_BLOCK_UL), 0, + c->ch, MD_BLOCK_CONTAINER_CLOSER)); + break; + + case _T('>'): + MD_CHECK(md_push_container_bytes(ctx, MD_BLOCK_QUOTE, 0, + 0, MD_BLOCK_CONTAINER_CLOSER)); + break; + + default: + MD_UNREACHABLE(); + break; + } + + ctx->n_containers--; + } + +abort: + return ret; +} + +static int +md_is_container_mark(MD_CTX* ctx, unsigned indent, OFF beg, OFF* p_end, MD_CONTAINER* p_container) +{ + OFF off = beg; + OFF max_end; + + if(off >= ctx->size || indent >= ctx->code_indent_offset) + return FALSE; + + /* Check for block quote mark. */ + if(CH(off) == _T('>')) { + off++; + p_container->ch = _T('>'); + p_container->is_loose = FALSE; + p_container->is_task = FALSE; + p_container->mark_indent = indent; + p_container->contents_indent = indent + 1; + *p_end = off; + return TRUE; + } + + /* Check for list item bullet mark. */ + if(ISANYOF(off, _T("-+*")) && (off+1 >= ctx->size || ISBLANK(off+1) || ISNEWLINE(off+1))) { + p_container->ch = CH(off); + p_container->is_loose = FALSE; + p_container->is_task = FALSE; + p_container->mark_indent = indent; + p_container->contents_indent = indent + 1; + *p_end = off+1; + return TRUE; + } + + /* Check for ordered list item marks. */ + max_end = off + 9; + if(max_end > ctx->size) + max_end = ctx->size; + p_container->start = 0; + while(off < max_end && ISDIGIT(off)) { + p_container->start = p_container->start * 10 + CH(off) - _T('0'); + off++; + } + if(off > beg && + off < ctx->size && + (CH(off) == _T('.') || CH(off) == _T(')')) && + (off+1 >= ctx->size || ISBLANK(off+1) || ISNEWLINE(off+1))) + { + p_container->ch = CH(off); + p_container->is_loose = FALSE; + p_container->is_task = FALSE; + p_container->mark_indent = indent; + p_container->contents_indent = indent + off - beg + 1; + *p_end = off+1; + return TRUE; + } + + return FALSE; +} + +static unsigned +md_line_indentation(MD_CTX* ctx, unsigned total_indent, OFF beg, OFF* p_end) +{ + OFF off = beg; + unsigned indent = total_indent; + + while(off < ctx->size && ISBLANK(off)) { + if(CH(off) == _T('\t')) + indent = (indent + 4) & ~3; + else + indent++; + off++; + } + + *p_end = off; + return indent - total_indent; +} + +static const MD_LINE_ANALYSIS md_dummy_blank_line = { MD_LINE_BLANK, 0, 0, 0, 0 }; + +/* Analyze type of the line and find some its properties. This serves as a + * main input for determining type and boundaries of a block. */ +static int +md_analyze_line(MD_CTX* ctx, OFF beg, OFF* p_end, + const MD_LINE_ANALYSIS* pivot_line, MD_LINE_ANALYSIS* line) +{ + unsigned total_indent = 0; + int n_parents = 0; + int n_brothers = 0; + int n_children = 0; + MD_CONTAINER container = { 0 }; + int prev_line_has_list_loosening_effect = ctx->last_line_has_list_loosening_effect; + OFF off = beg; + OFF hr_killer = 0; + int ret = 0; + + line->indent = md_line_indentation(ctx, total_indent, off, &off); + total_indent += line->indent; + line->beg = off; + + /* Given the indentation and block quote marks '>', determine how many of + * the current containers are our parents. */ + while(n_parents < ctx->n_containers) { + MD_CONTAINER* c = &ctx->containers[n_parents]; + + if(c->ch == _T('>') && line->indent < ctx->code_indent_offset && + off < ctx->size && CH(off) == _T('>')) + { + /* Block quote mark. */ + off++; + total_indent++; + line->indent = md_line_indentation(ctx, total_indent, off, &off); + total_indent += line->indent; + + /* The optional 1st space after '>' is part of the block quote mark. */ + if(line->indent > 0) + line->indent--; + + line->beg = off; + + } else if(c->ch != _T('>') && line->indent >= c->contents_indent) { + /* List. */ + line->indent -= c->contents_indent; + } else { + break; + } + + n_parents++; + } + + if(off >= ctx->size || ISNEWLINE(off)) { + /* Blank line does not need any real indentation to be nested inside + * a list. */ + if(n_brothers + n_children == 0) { + while(n_parents < ctx->n_containers && ctx->containers[n_parents].ch != _T('>')) + n_parents++; + } + } + + while(TRUE) { + /* Check whether we are fenced code continuation. */ + if(pivot_line->type == MD_LINE_FENCEDCODE) { + line->beg = off; + + /* We are another MD_LINE_FENCEDCODE unless we are closing fence + * which we transform into MD_LINE_BLANK. */ + if(line->indent < ctx->code_indent_offset) { + if(md_is_closing_code_fence(ctx, CH(pivot_line->beg), off, &off)) { + line->type = MD_LINE_BLANK; + ctx->last_line_has_list_loosening_effect = FALSE; + break; + } + } + + /* Change indentation accordingly to the initial code fence. */ + if(n_parents == ctx->n_containers) { + if(line->indent > pivot_line->indent) + line->indent -= pivot_line->indent; + else + line->indent = 0; + + line->type = MD_LINE_FENCEDCODE; + break; + } + } + + /* Check whether we are HTML block continuation. */ + if(pivot_line->type == MD_LINE_HTML && ctx->html_block_type > 0) { + if(n_parents < ctx->n_containers) { + /* HTML block is implicitly ended if the enclosing container + * block ends. */ + ctx->html_block_type = 0; + } else { + int html_block_type; + + html_block_type = md_is_html_block_end_condition(ctx, off, &off); + if(html_block_type > 0) { + MD_ASSERT(html_block_type == ctx->html_block_type); + + /* Make sure this is the last line of the block. */ + ctx->html_block_type = 0; + + /* Some end conditions serve as blank lines at the same time. */ + if(html_block_type == 6 || html_block_type == 7) { + line->type = MD_LINE_BLANK; + line->indent = 0; + break; + } + } + + line->type = MD_LINE_HTML; + n_parents = ctx->n_containers; + break; + } + } + + /* Check for blank line. */ + if(off >= ctx->size || ISNEWLINE(off)) { + if(pivot_line->type == MD_LINE_INDENTEDCODE && n_parents == ctx->n_containers) { + line->type = MD_LINE_INDENTEDCODE; + if(line->indent > ctx->code_indent_offset) + line->indent -= ctx->code_indent_offset; + else + line->indent = 0; + ctx->last_line_has_list_loosening_effect = FALSE; + } else { + line->type = MD_LINE_BLANK; + ctx->last_line_has_list_loosening_effect = (n_parents > 0 && + n_brothers + n_children == 0 && + ctx->containers[n_parents-1].ch != _T('>')); + + #if 1 + /* See https://github.com/mity/md4c/issues/6 + * + * This ugly checking tests we are in (yet empty) list item but + * not its very first line (i.e. not the line with the list + * item mark). + * + * If we are such a blank line, then any following non-blank + * line which would be part of the list item actually has to + * end the list because according to the specification, "a list + * item can begin with at most one blank line." + */ + if(n_parents > 0 && ctx->containers[n_parents-1].ch != _T('>') && + n_brothers + n_children == 0 && ctx->current_block == NULL && + ctx->n_block_bytes > (int) sizeof(MD_BLOCK)) + { + MD_BLOCK* top_block = (MD_BLOCK*) ((char*)ctx->block_bytes + ctx->n_block_bytes - sizeof(MD_BLOCK)); + if(top_block->type == MD_BLOCK_LI) + ctx->last_list_item_starts_with_two_blank_lines = TRUE; + } + #endif + } + break; + } else { + #if 1 + /* This is the 2nd half of the hack. If the flag is set (i.e. there + * was a 2nd blank line at the beginning of the list item) and if + * we would otherwise still belong to the list item, we enforce + * the end of the list. */ + ctx->last_line_has_list_loosening_effect = FALSE; + if(ctx->last_list_item_starts_with_two_blank_lines) { + if(n_parents > 0 && ctx->containers[n_parents-1].ch != _T('>') && + n_brothers + n_children == 0 && ctx->current_block == NULL && + ctx->n_block_bytes > (int) sizeof(MD_BLOCK)) + { + MD_BLOCK* top_block = (MD_BLOCK*) ((char*)ctx->block_bytes + ctx->n_block_bytes - sizeof(MD_BLOCK)); + if(top_block->type == MD_BLOCK_LI) + n_parents--; + } + + ctx->last_list_item_starts_with_two_blank_lines = FALSE; + } + #endif + } + + /* Check whether we are Setext underline. */ + if(line->indent < ctx->code_indent_offset && pivot_line->type == MD_LINE_TEXT + && off < ctx->size && ISANYOF2(off, _T('='), _T('-')) + && (n_parents == ctx->n_containers)) + { + unsigned level; + + if(md_is_setext_underline(ctx, off, &off, &level)) { + line->type = MD_LINE_SETEXTUNDERLINE; + line->data = level; + break; + } + } + + /* Check for thematic break line. */ + if(line->indent < ctx->code_indent_offset + && off < ctx->size && off >= hr_killer + && ISANYOF(off, _T("-_*"))) + { + if(md_is_hr_line(ctx, off, &off, &hr_killer)) { + line->type = MD_LINE_HR; + break; + } + } + + /* Check for "brother" container. I.e. whether we are another list item + * in already started list. */ + if(n_parents < ctx->n_containers && n_brothers + n_children == 0) { + OFF tmp; + + if(md_is_container_mark(ctx, line->indent, off, &tmp, &container) && + md_is_container_compatible(&ctx->containers[n_parents], &container)) + { + pivot_line = &md_dummy_blank_line; + + off = tmp; + + total_indent += container.contents_indent - container.mark_indent; + line->indent = md_line_indentation(ctx, total_indent, off, &off); + total_indent += line->indent; + line->beg = off; + + /* Some of the following whitespace actually still belongs to the mark. */ + if(off >= ctx->size || ISNEWLINE(off)) { + container.contents_indent++; + } else if(line->indent <= ctx->code_indent_offset) { + container.contents_indent += line->indent; + line->indent = 0; + } else { + container.contents_indent += 1; + line->indent--; + } + + ctx->containers[n_parents].mark_indent = container.mark_indent; + ctx->containers[n_parents].contents_indent = container.contents_indent; + + n_brothers++; + continue; + } + } + + /* Check for indented code. + * Note indented code block cannot interrupt a paragraph. */ + if(line->indent >= ctx->code_indent_offset && + (pivot_line->type == MD_LINE_BLANK || pivot_line->type == MD_LINE_INDENTEDCODE)) + { + line->type = MD_LINE_INDENTEDCODE; + MD_ASSERT(line->indent >= ctx->code_indent_offset); + line->indent -= ctx->code_indent_offset; + line->data = 0; + break; + } + + /* Check for start of a new container block. */ + if(line->indent < ctx->code_indent_offset && + md_is_container_mark(ctx, line->indent, off, &off, &container)) + { + if(pivot_line->type == MD_LINE_TEXT && n_parents == ctx->n_containers && + (off >= ctx->size || ISNEWLINE(off)) && container.ch != _T('>')) + { + /* Noop. List mark followed by a blank line cannot interrupt a paragraph. */ + } else if(pivot_line->type == MD_LINE_TEXT && n_parents == ctx->n_containers && + ISANYOF2_(container.ch, _T('.'), _T(')')) && container.start != 1) + { + /* Noop. Ordered list cannot interrupt a paragraph unless the start index is 1. */ + } else { + total_indent += container.contents_indent - container.mark_indent; + line->indent = md_line_indentation(ctx, total_indent, off, &off); + total_indent += line->indent; + + line->beg = off; + line->data = container.ch; + + /* Some of the following whitespace actually still belongs to the mark. */ + if(off >= ctx->size || ISNEWLINE(off)) { + container.contents_indent++; + } else if(line->indent <= ctx->code_indent_offset) { + container.contents_indent += line->indent; + line->indent = 0; + } else { + container.contents_indent += 1; + line->indent--; + } + + if(n_brothers + n_children == 0) + pivot_line = &md_dummy_blank_line; + + if(n_children == 0) + MD_CHECK(md_leave_child_containers(ctx, n_parents + n_brothers)); + + n_children++; + MD_CHECK(md_push_container(ctx, &container)); + continue; + } + } + + /* Check whether we are table continuation. */ + if(pivot_line->type == MD_LINE_TABLE && n_parents == ctx->n_containers) { + line->type = MD_LINE_TABLE; + break; + } + + /* Check for ATX header. */ + if(line->indent < ctx->code_indent_offset && + off < ctx->size && CH(off) == _T('#')) + { + unsigned level; + + if(md_is_atxheader_line(ctx, off, &line->beg, &off, &level)) { + line->type = MD_LINE_ATXHEADER; + line->data = level; + break; + } + } + + /* Check whether we are starting code fence. */ + if(off < ctx->size && ISANYOF2(off, _T('`'), _T('~'))) { + if(md_is_opening_code_fence(ctx, off, &off)) { + line->type = MD_LINE_FENCEDCODE; + line->data = 1; + break; + } + } + + /* Check for start of raw HTML block. */ + if(off < ctx->size && CH(off) == _T('<') + && !(ctx->parser.flags & MD_FLAG_NOHTMLBLOCKS)) + { + ctx->html_block_type = md_is_html_block_start_condition(ctx, off); + + /* HTML block type 7 cannot interrupt paragraph. */ + if(ctx->html_block_type == 7 && pivot_line->type == MD_LINE_TEXT) + ctx->html_block_type = 0; + + if(ctx->html_block_type > 0) { + /* The line itself also may immediately close the block. */ + if(md_is_html_block_end_condition(ctx, off, &off) == ctx->html_block_type) { + /* Make sure this is the last line of the block. */ + ctx->html_block_type = 0; + } + + line->type = MD_LINE_HTML; + break; + } + } + + /* Check for table underline. */ + if((ctx->parser.flags & MD_FLAG_TABLES) && pivot_line->type == MD_LINE_TEXT + && off < ctx->size && ISANYOF3(off, _T('|'), _T('-'), _T(':')) + && n_parents == ctx->n_containers) + { + unsigned col_count; + + if(ctx->current_block != NULL && ctx->current_block->n_lines == 1 && + md_is_table_underline(ctx, off, &off, &col_count)) + { + line->data = col_count; + line->type = MD_LINE_TABLEUNDERLINE; + break; + } + } + + /* By default, we are normal text line. */ + line->type = MD_LINE_TEXT; + if(pivot_line->type == MD_LINE_TEXT && n_brothers + n_children == 0) { + /* Lazy continuation. */ + n_parents = ctx->n_containers; + } + + /* Check for task mark. */ + if((ctx->parser.flags & MD_FLAG_TASKLISTS) && n_brothers + n_children > 0 && + ISANYOF_(ctx->containers[ctx->n_containers-1].ch, _T("-+*.)"))) + { + OFF tmp = off; + + while(tmp < ctx->size && tmp < off + 3 && ISBLANK(tmp)) + tmp++; + if(tmp + 2 < ctx->size && CH(tmp) == _T('[') && + ISANYOF(tmp+1, _T("xX ")) && CH(tmp+2) == _T(']') && + (tmp + 3 == ctx->size || ISBLANK(tmp+3) || ISNEWLINE(tmp+3))) + { + MD_CONTAINER* task_container = (n_children > 0 ? &ctx->containers[ctx->n_containers-1] : &container); + task_container->is_task = TRUE; + task_container->task_mark_off = tmp + 1; + off = tmp + 3; + while(off < ctx->size && ISWHITESPACE(off)) + off++; + if (off == ctx->size) break; + line->beg = off; + } + } + + break; + } + + /* Scan for end of the line. + * + * Note this is quite a bottleneck of the parsing as we here iterate almost + * over compete document. + */ +#if defined __linux__ && !defined MD4C_USE_UTF16 + /* Recent glibc versions have superbly optimized strcspn(), even using + * vectorization if available. */ + if(ctx->doc_ends_with_newline && off < ctx->size) { + while(TRUE) { + off += (OFF) strcspn(STR(off), "\r\n"); + + /* strcspn() can stop on zero terminator; but that can appear + * anywhere in the Markfown input... */ + if(CH(off) == _T('\0')) + off++; + else + break; + } + } else +#endif + { + /* Optimization: Use some loop unrolling. */ + while(off + 3 < ctx->size && !ISNEWLINE(off+0) && !ISNEWLINE(off+1) + && !ISNEWLINE(off+2) && !ISNEWLINE(off+3)) + off += 4; + while(off < ctx->size && !ISNEWLINE(off)) + off++; + } + + /* Set end of the line. */ + line->end = off; + + /* But for ATX header, we should exclude the optional trailing mark. */ + if(line->type == MD_LINE_ATXHEADER) { + OFF tmp = line->end; + while(tmp > line->beg && CH(tmp-1) == _T(' ')) + tmp--; + while(tmp > line->beg && CH(tmp-1) == _T('#')) + tmp--; + if(tmp == line->beg || CH(tmp-1) == _T(' ') || (ctx->parser.flags & MD_FLAG_PERMISSIVEATXHEADERS)) + line->end = tmp; + } + + /* Trim trailing spaces. */ + if(line->type != MD_LINE_INDENTEDCODE && line->type != MD_LINE_FENCEDCODE) { + while(line->end > line->beg && CH(line->end-1) == _T(' ')) + line->end--; + } + + /* Eat also the new line. */ + if(off < ctx->size && CH(off) == _T('\r')) + off++; + if(off < ctx->size && CH(off) == _T('\n')) + off++; + + *p_end = off; + + /* If we belong to a list after seeing a blank line, the list is loose. */ + if(prev_line_has_list_loosening_effect && line->type != MD_LINE_BLANK && n_parents + n_brothers > 0) { + MD_CONTAINER* c = &ctx->containers[n_parents + n_brothers - 1]; + if(c->ch != _T('>')) { + MD_BLOCK* block = (MD_BLOCK*) (((char*)ctx->block_bytes) + c->block_byte_off); + block->flags |= MD_BLOCK_LOOSE_LIST; + } + } + + /* Leave any containers we are not part of anymore. */ + if(n_children == 0 && n_parents + n_brothers < ctx->n_containers) + MD_CHECK(md_leave_child_containers(ctx, n_parents + n_brothers)); + + /* Enter any container we found a mark for. */ + if(n_brothers > 0) { + MD_ASSERT(n_brothers == 1); + MD_CHECK(md_push_container_bytes(ctx, MD_BLOCK_LI, + ctx->containers[n_parents].task_mark_off, + (ctx->containers[n_parents].is_task ? CH(ctx->containers[n_parents].task_mark_off) : 0), + MD_BLOCK_CONTAINER_CLOSER)); + MD_CHECK(md_push_container_bytes(ctx, MD_BLOCK_LI, + container.task_mark_off, + (container.is_task ? CH(container.task_mark_off) : 0), + MD_BLOCK_CONTAINER_OPENER)); + ctx->containers[n_parents].is_task = container.is_task; + ctx->containers[n_parents].task_mark_off = container.task_mark_off; + } + + if(n_children > 0) + MD_CHECK(md_enter_child_containers(ctx, n_children)); + +abort: + return ret; +} + +static int +md_process_line(MD_CTX* ctx, const MD_LINE_ANALYSIS** p_pivot_line, MD_LINE_ANALYSIS* line) +{ + const MD_LINE_ANALYSIS* pivot_line = *p_pivot_line; + int ret = 0; + + /* Blank line ends current leaf block. */ + if(line->type == MD_LINE_BLANK) { + MD_CHECK(md_end_current_block(ctx)); + *p_pivot_line = &md_dummy_blank_line; + return 0; + } + + /* Some line types form block on their own. */ + if(line->type == MD_LINE_HR || line->type == MD_LINE_ATXHEADER) { + MD_CHECK(md_end_current_block(ctx)); + + /* Add our single-line block. */ + MD_CHECK(md_start_new_block(ctx, line)); + MD_CHECK(md_add_line_into_current_block(ctx, line)); + MD_CHECK(md_end_current_block(ctx)); + *p_pivot_line = &md_dummy_blank_line; + return 0; + } + + /* MD_LINE_SETEXTUNDERLINE changes meaning of the current block and ends it. */ + if(line->type == MD_LINE_SETEXTUNDERLINE) { + MD_ASSERT(ctx->current_block != NULL); + ctx->current_block->type = MD_BLOCK_H; + ctx->current_block->data = line->data; + ctx->current_block->flags |= MD_BLOCK_SETEXT_HEADER; + MD_CHECK(md_add_line_into_current_block(ctx, line)); + MD_CHECK(md_end_current_block(ctx)); + if(ctx->current_block == NULL) { + *p_pivot_line = &md_dummy_blank_line; + } else { + /* This happens if we have consumed all the body as link ref. defs. + * and downgraded the underline into start of a new paragraph block. */ + line->type = MD_LINE_TEXT; + *p_pivot_line = line; + } + return 0; + } + + /* MD_LINE_TABLEUNDERLINE changes meaning of the current block. */ + if(line->type == MD_LINE_TABLEUNDERLINE) { + MD_ASSERT(ctx->current_block != NULL); + MD_ASSERT(ctx->current_block->n_lines == 1); + ctx->current_block->type = MD_BLOCK_TABLE; + ctx->current_block->data = line->data; + MD_ASSERT(pivot_line != &md_dummy_blank_line); + ((MD_LINE_ANALYSIS*)pivot_line)->type = MD_LINE_TABLE; + MD_CHECK(md_add_line_into_current_block(ctx, line)); + return 0; + } + + /* The current block also ends if the line has different type. */ + if(line->type != pivot_line->type) + MD_CHECK(md_end_current_block(ctx)); + + /* The current line may start a new block. */ + if(ctx->current_block == NULL) { + MD_CHECK(md_start_new_block(ctx, line)); + *p_pivot_line = line; + } + + /* In all other cases the line is just a continuation of the current block. */ + MD_CHECK(md_add_line_into_current_block(ctx, line)); + +abort: + return ret; +} + +static int +md_process_doc(MD_CTX *ctx) +{ + const MD_LINE_ANALYSIS* pivot_line = &md_dummy_blank_line; + MD_LINE_ANALYSIS line_buf[2]; + MD_LINE_ANALYSIS* line = &line_buf[0]; + OFF off = 0; + int ret = 0; + + MD_ENTER_BLOCK(MD_BLOCK_DOC, NULL); + + while(off < ctx->size) { + if(line == pivot_line) + line = (line == &line_buf[0] ? &line_buf[1] : &line_buf[0]); + + MD_CHECK(md_analyze_line(ctx, off, &off, pivot_line, line)); + MD_CHECK(md_process_line(ctx, &pivot_line, line)); + } + + md_end_current_block(ctx); + + MD_CHECK(md_build_ref_def_hashtable(ctx)); + + /* Process all blocks. */ + MD_CHECK(md_leave_child_containers(ctx, 0)); + MD_CHECK(md_process_all_blocks(ctx)); + + MD_LEAVE_BLOCK(MD_BLOCK_DOC, NULL); + +abort: + +#if 0 + /* Output some memory consumption statistics. */ + { + char buffer[256]; + sprintf(buffer, "Alloced %u bytes for block buffer.", + (unsigned)(ctx->alloc_block_bytes)); + MD_LOG(buffer); + + sprintf(buffer, "Alloced %u bytes for containers buffer.", + (unsigned)(ctx->alloc_containers * sizeof(MD_CONTAINER))); + MD_LOG(buffer); + + sprintf(buffer, "Alloced %u bytes for marks buffer.", + (unsigned)(ctx->alloc_marks * sizeof(MD_MARK))); + MD_LOG(buffer); + + sprintf(buffer, "Alloced %u bytes for aux. buffer.", + (unsigned)(ctx->alloc_buffer * sizeof(MD_CHAR))); + MD_LOG(buffer); + } +#endif + + return ret; +} + + +/******************** + *** Public API *** + ********************/ + +int +md_parse(const MD_CHAR* text, MD_SIZE size, const MD_PARSER* parser, void* userdata) +{ + MD_CTX ctx; + int i; + int ret; + + if(parser->abi_version != 0) { + if(parser->debug_log != NULL) + parser->debug_log("Unsupported abi_version.", userdata); + return -1; + } + + /* Setup context structure. */ + memset(&ctx, 0, sizeof(MD_CTX)); + ctx.text = text; + ctx.size = size; + memcpy(&ctx.parser, parser, sizeof(MD_PARSER)); + ctx.userdata = userdata; + ctx.code_indent_offset = (ctx.parser.flags & MD_FLAG_NOINDENTEDCODEBLOCKS) ? (OFF)(-1) : 4; + md_build_mark_char_map(&ctx); + ctx.doc_ends_with_newline = (size > 0 && ISNEWLINE_(text[size-1])); + + /* Reset all unresolved opener mark chains. */ + for(i = 0; i < (int) SIZEOF_ARRAY(ctx.mark_chains); i++) { + ctx.mark_chains[i].head = -1; + ctx.mark_chains[i].tail = -1; + } + ctx.unresolved_link_head = -1; + ctx.unresolved_link_tail = -1; + + /* All the work. */ + ret = md_process_doc(&ctx); + + /* Clean-up. */ + md_free_ref_defs(&ctx); + md_free_ref_def_hashtable(&ctx); + free(ctx.buffer); + free(ctx.marks); + free(ctx.block_bytes); + free(ctx.containers); + + return ret; +} diff --git a/tdemarkdown/md4c/src/md4c.h b/tdemarkdown/md4c/src/md4c.h new file mode 100644 index 000000000..95f78f9b9 --- /dev/null +++ b/tdemarkdown/md4c/src/md4c.h @@ -0,0 +1,405 @@ +/* + * MD4C: Markdown parser for C + * (http://github.com/mity/md4c) + * + * Copyright (c) 2016-2020 Martin Mitas + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#ifndef MD4C_H +#define MD4C_H + +#ifdef __cplusplus + extern "C" { +#endif + +#if defined MD4C_USE_UTF16 + /* Magic to support UTF-16. Note that in order to use it, you have to define + * the macro MD4C_USE_UTF16 both when building MD4C as well as when + * including this header in your code. */ + #ifdef _WIN32 + #include <windows.h> + typedef WCHAR MD_CHAR; + #else + #error MD4C_USE_UTF16 is only supported on Windows. + #endif +#else + typedef char MD_CHAR; +#endif + +typedef unsigned MD_SIZE; +typedef unsigned MD_OFFSET; + + +/* Block represents a part of document hierarchy structure like a paragraph + * or list item. + */ +typedef enum MD_BLOCKTYPE { + /* <body>...</body> */ + MD_BLOCK_DOC = 0, + + /* <blockquote>...</blockquote> */ + MD_BLOCK_QUOTE, + + /* <ul>...</ul> + * Detail: Structure MD_BLOCK_UL_DETAIL. */ + MD_BLOCK_UL, + + /* <ol>...</ol> + * Detail: Structure MD_BLOCK_OL_DETAIL. */ + MD_BLOCK_OL, + + /* <li>...</li> + * Detail: Structure MD_BLOCK_LI_DETAIL. */ + MD_BLOCK_LI, + + /* <hr> */ + MD_BLOCK_HR, + + /* <h1>...</h1> (for levels up to 6) + * Detail: Structure MD_BLOCK_H_DETAIL. */ + MD_BLOCK_H, + + /* <pre><code>...</code></pre> + * Note the text lines within code blocks are terminated with '\n' + * instead of explicit MD_TEXT_BR. */ + MD_BLOCK_CODE, + + /* Raw HTML block. This itself does not correspond to any particular HTML + * tag. The contents of it _is_ raw HTML source intended to be put + * in verbatim form to the HTML output. */ + MD_BLOCK_HTML, + + /* <p>...</p> */ + MD_BLOCK_P, + + /* <table>...</table> and its contents. + * Detail: Structure MD_BLOCK_TABLE_DETAIL (for MD_BLOCK_TABLE), + * structure MD_BLOCK_TD_DETAIL (for MD_BLOCK_TH and MD_BLOCK_TD) + * Note all of these are used only if extension MD_FLAG_TABLES is enabled. */ + MD_BLOCK_TABLE, + MD_BLOCK_THEAD, + MD_BLOCK_TBODY, + MD_BLOCK_TR, + MD_BLOCK_TH, + MD_BLOCK_TD +} MD_BLOCKTYPE; + +/* Span represents an in-line piece of a document which should be rendered with + * the same font, color and other attributes. A sequence of spans forms a block + * like paragraph or list item. */ +typedef enum MD_SPANTYPE { + /* <em>...</em> */ + MD_SPAN_EM, + + /* <strong>...</strong> */ + MD_SPAN_STRONG, + + /* <a href="xxx">...</a> + * Detail: Structure MD_SPAN_A_DETAIL. */ + MD_SPAN_A, + + /* <img src="xxx">...</a> + * Detail: Structure MD_SPAN_IMG_DETAIL. + * Note: Image text can contain nested spans and even nested images. + * If rendered into ALT attribute of HTML <IMG> tag, it's responsibility + * of the parser to deal with it. + */ + MD_SPAN_IMG, + + /* <code>...</code> */ + MD_SPAN_CODE, + + /* <del>...</del> + * Note: Recognized only when MD_FLAG_STRIKETHROUGH is enabled. + */ + MD_SPAN_DEL, + + /* For recognizing inline ($) and display ($$) equations + * Note: Recognized only when MD_FLAG_LATEXMATHSPANS is enabled. + */ + MD_SPAN_LATEXMATH, + MD_SPAN_LATEXMATH_DISPLAY, + + /* Wiki links + * Note: Recognized only when MD_FLAG_WIKILINKS is enabled. + */ + MD_SPAN_WIKILINK, + + /* <u>...</u> + * Note: Recognized only when MD_FLAG_UNDERLINE is enabled. */ + MD_SPAN_U +} MD_SPANTYPE; + +/* Text is the actual textual contents of span. */ +typedef enum MD_TEXTTYPE { + /* Normal text. */ + MD_TEXT_NORMAL = 0, + + /* NULL character. CommonMark requires replacing NULL character with + * the replacement char U+FFFD, so this allows caller to do that easily. */ + MD_TEXT_NULLCHAR, + + /* Line breaks. + * Note these are not sent from blocks with verbatim output (MD_BLOCK_CODE + * or MD_BLOCK_HTML). In such cases, '\n' is part of the text itself. */ + MD_TEXT_BR, /* <br> (hard break) */ + MD_TEXT_SOFTBR, /* '\n' in source text where it is not semantically meaningful (soft break) */ + + /* Entity. + * (a) Named entity, e.g. + * (Note MD4C does not have a list of known entities. + * Anything matching the regexp /&[A-Za-z][A-Za-z0-9]{1,47};/ is + * treated as a named entity.) + * (b) Numerical entity, e.g. Ӓ + * (c) Hexadecimal entity, e.g. ካ + * + * As MD4C is mostly encoding agnostic, application gets the verbatim + * entity text into the MD_PARSER::text_callback(). */ + MD_TEXT_ENTITY, + + /* Text in a code block (inside MD_BLOCK_CODE) or inlined code (`code`). + * If it is inside MD_BLOCK_CODE, it includes spaces for indentation and + * '\n' for new lines. MD_TEXT_BR and MD_TEXT_SOFTBR are not sent for this + * kind of text. */ + MD_TEXT_CODE, + + /* Text is a raw HTML. If it is contents of a raw HTML block (i.e. not + * an inline raw HTML), then MD_TEXT_BR and MD_TEXT_SOFTBR are not used. + * The text contains verbatim '\n' for the new lines. */ + MD_TEXT_HTML, + + /* Text is inside an equation. This is processed the same way as inlined code + * spans (`code`). */ + MD_TEXT_LATEXMATH +} MD_TEXTTYPE; + + +/* Alignment enumeration. */ +typedef enum MD_ALIGN { + MD_ALIGN_DEFAULT = 0, /* When unspecified. */ + MD_ALIGN_LEFT, + MD_ALIGN_CENTER, + MD_ALIGN_RIGHT +} MD_ALIGN; + + +/* String attribute. + * + * This wraps strings which are outside of a normal text flow and which are + * propagated within various detailed structures, but which still may contain + * string portions of different types like e.g. entities. + * + * So, for example, lets consider this image: + * + * ![image alt text](http://example.org/image.png 'foo " bar') + * + * The image alt text is propagated as a normal text via the MD_PARSER::text() + * callback. However, the image title ('foo " bar') is propagated as + * MD_ATTRIBUTE in MD_SPAN_IMG_DETAIL::title. + * + * Then the attribute MD_SPAN_IMG_DETAIL::title shall provide the following: + * -- [0]: "foo " (substr_types[0] == MD_TEXT_NORMAL; substr_offsets[0] == 0) + * -- [1]: """ (substr_types[1] == MD_TEXT_ENTITY; substr_offsets[1] == 4) + * -- [2]: " bar" (substr_types[2] == MD_TEXT_NORMAL; substr_offsets[2] == 10) + * -- [3]: (n/a) (n/a ; substr_offsets[3] == 14) + * + * Note that these invariants are always guaranteed: + * -- substr_offsets[0] == 0 + * -- substr_offsets[LAST+1] == size + * -- Currently, only MD_TEXT_NORMAL, MD_TEXT_ENTITY, MD_TEXT_NULLCHAR + * substrings can appear. This could change only of the specification + * changes. + */ +typedef struct MD_ATTRIBUTE { + const MD_CHAR* text; + MD_SIZE size; + const MD_TEXTTYPE* substr_types; + const MD_OFFSET* substr_offsets; +} MD_ATTRIBUTE; + + +/* Detailed info for MD_BLOCK_UL. */ +typedef struct MD_BLOCK_UL_DETAIL { + int is_tight; /* Non-zero if tight list, zero if loose. */ + MD_CHAR mark; /* Item bullet character in MarkDown source of the list, e.g. '-', '+', '*'. */ +} MD_BLOCK_UL_DETAIL; + +/* Detailed info for MD_BLOCK_OL. */ +typedef struct MD_BLOCK_OL_DETAIL { + unsigned start; /* Start index of the ordered list. */ + int is_tight; /* Non-zero if tight list, zero if loose. */ + MD_CHAR mark_delimiter; /* Character delimiting the item marks in MarkDown source, e.g. '.' or ')' */ +} MD_BLOCK_OL_DETAIL; + +/* Detailed info for MD_BLOCK_LI. */ +typedef struct MD_BLOCK_LI_DETAIL { + int is_task; /* Can be non-zero only with MD_FLAG_TASKLISTS */ + MD_CHAR task_mark; /* If is_task, then one of 'x', 'X' or ' '. Undefined otherwise. */ + MD_OFFSET task_mark_offset; /* If is_task, then offset in the input of the char between '[' and ']'. */ +} MD_BLOCK_LI_DETAIL; + +/* Detailed info for MD_BLOCK_H. */ +typedef struct MD_BLOCK_H_DETAIL { + unsigned level; /* Header level (1 - 6) */ +} MD_BLOCK_H_DETAIL; + +/* Detailed info for MD_BLOCK_CODE. */ +typedef struct MD_BLOCK_CODE_DETAIL { + MD_ATTRIBUTE info; + MD_ATTRIBUTE lang; + MD_CHAR fence_char; /* The character used for fenced code block; or zero for indented code block. */ +} MD_BLOCK_CODE_DETAIL; + +/* Detailed info for MD_BLOCK_TABLE. */ +typedef struct MD_BLOCK_TABLE_DETAIL { + unsigned col_count; /* Count of columns in the table. */ + unsigned head_row_count; /* Count of rows in the table header (currently always 1) */ + unsigned body_row_count; /* Count of rows in the table body */ +} MD_BLOCK_TABLE_DETAIL; + +/* Detailed info for MD_BLOCK_TH and MD_BLOCK_TD. */ +typedef struct MD_BLOCK_TD_DETAIL { + MD_ALIGN align; +} MD_BLOCK_TD_DETAIL; + +/* Detailed info for MD_SPAN_A. */ +typedef struct MD_SPAN_A_DETAIL { + MD_ATTRIBUTE href; + MD_ATTRIBUTE title; +} MD_SPAN_A_DETAIL; + +/* Detailed info for MD_SPAN_IMG. */ +typedef struct MD_SPAN_IMG_DETAIL { + MD_ATTRIBUTE src; + MD_ATTRIBUTE title; +} MD_SPAN_IMG_DETAIL; + +/* Detailed info for MD_SPAN_WIKILINK. */ +typedef struct MD_SPAN_WIKILINK { + MD_ATTRIBUTE target; +} MD_SPAN_WIKILINK_DETAIL; + +/* Flags specifying extensions/deviations from CommonMark specification. + * + * By default (when MD_PARSER::flags == 0), we follow CommonMark specification. + * The following flags may allow some extensions or deviations from it. + */ +#define MD_FLAG_COLLAPSEWHITESPACE 0x0001 /* In MD_TEXT_NORMAL, collapse non-trivial whitespace into single ' ' */ +#define MD_FLAG_PERMISSIVEATXHEADERS 0x0002 /* Do not require space in ATX headers ( ###header ) */ +#define MD_FLAG_PERMISSIVEURLAUTOLINKS 0x0004 /* Recognize URLs as autolinks even without '<', '>' */ +#define MD_FLAG_PERMISSIVEEMAILAUTOLINKS 0x0008 /* Recognize e-mails as autolinks even without '<', '>' and 'mailto:' */ +#define MD_FLAG_NOINDENTEDCODEBLOCKS 0x0010 /* Disable indented code blocks. (Only fenced code works.) */ +#define MD_FLAG_NOHTMLBLOCKS 0x0020 /* Disable raw HTML blocks. */ +#define MD_FLAG_NOHTMLSPANS 0x0040 /* Disable raw HTML (inline). */ +#define MD_FLAG_TABLES 0x0100 /* Enable tables extension. */ +#define MD_FLAG_STRIKETHROUGH 0x0200 /* Enable strikethrough extension. */ +#define MD_FLAG_PERMISSIVEWWWAUTOLINKS 0x0400 /* Enable WWW autolinks (even without any scheme prefix, if they begin with 'www.') */ +#define MD_FLAG_TASKLISTS 0x0800 /* Enable task list extension. */ +#define MD_FLAG_LATEXMATHSPANS 0x1000 /* Enable $ and $$ containing LaTeX equations. */ +#define MD_FLAG_WIKILINKS 0x2000 /* Enable wiki links extension. */ +#define MD_FLAG_UNDERLINE 0x4000 /* Enable underline extension (and disables '_' for normal emphasis). */ + +#define MD_FLAG_PERMISSIVEAUTOLINKS (MD_FLAG_PERMISSIVEEMAILAUTOLINKS | MD_FLAG_PERMISSIVEURLAUTOLINKS | MD_FLAG_PERMISSIVEWWWAUTOLINKS) +#define MD_FLAG_NOHTML (MD_FLAG_NOHTMLBLOCKS | MD_FLAG_NOHTMLSPANS) + +/* Convenient sets of flags corresponding to well-known Markdown dialects. + * + * Note we may only support subset of features of the referred dialect. + * The constant just enables those extensions which bring us as close as + * possible given what features we implement. + * + * ABI compatibility note: Meaning of these can change in time as new + * extensions, bringing the dialect closer to the original, are implemented. + */ +#define MD_DIALECT_COMMONMARK 0 +#define MD_DIALECT_GITHUB (MD_FLAG_PERMISSIVEAUTOLINKS | MD_FLAG_TABLES | MD_FLAG_STRIKETHROUGH | MD_FLAG_TASKLISTS) + +/* Parser structure. + */ +typedef struct MD_PARSER { + /* Reserved. Set to zero. + */ + unsigned abi_version; + + /* Dialect options. Bitmask of MD_FLAG_xxxx values. + */ + unsigned flags; + + /* Caller-provided rendering callbacks. + * + * For some block/span types, more detailed information is provided in a + * type-specific structure pointed by the argument 'detail'. + * + * The last argument of all callbacks, 'userdata', is just propagated from + * md_parse() and is available for any use by the application. + * + * Note any strings provided to the callbacks as their arguments or as + * members of any detail structure are generally not zero-terminated. + * Application has to take the respective size information into account. + * + * Any rendering callback may abort further parsing of the document by + * returning non-zero. + */ + int (*enter_block)(MD_BLOCKTYPE /*type*/, void* /*detail*/, void* /*userdata*/); + int (*leave_block)(MD_BLOCKTYPE /*type*/, void* /*detail*/, void* /*userdata*/); + + int (*enter_span)(MD_SPANTYPE /*type*/, void* /*detail*/, void* /*userdata*/); + int (*leave_span)(MD_SPANTYPE /*type*/, void* /*detail*/, void* /*userdata*/); + + int (*text)(MD_TEXTTYPE /*type*/, const MD_CHAR* /*text*/, MD_SIZE /*size*/, void* /*userdata*/); + + /* Debug callback. Optional (may be NULL). + * + * If provided and something goes wrong, this function gets called. + * This is intended for debugging and problem diagnosis for developers; + * it is not intended to provide any errors suitable for displaying to an + * end user. + */ + void (*debug_log)(const char* /*msg*/, void* /*userdata*/); + + /* Reserved. Set to NULL. + */ + void (*syntax)(void); +} MD_PARSER; + + +/* For backward compatibility. Do not use in new code. + */ +typedef MD_PARSER MD_RENDERER; + + +/* Parse the Markdown document stored in the string 'text' of size 'size'. + * The parser provides callbacks to be called during the parsing so the + * caller can render the document on the screen or convert the Markdown + * to another format. + * + * Zero is returned on success. If a runtime error occurs (e.g. a memory + * fails), -1 is returned. If the processing is aborted due any callback + * returning non-zero, the return value of the callback is returned. + */ +int md_parse(const MD_CHAR* text, MD_SIZE size, const MD_PARSER* parser, void* userdata); + + +#ifdef __cplusplus + } /* extern "C" { */ +#endif + +#endif /* MD4C_H */ diff --git a/tdemarkdown/md4c/src/md4c.pc.in b/tdemarkdown/md4c/src/md4c.pc.in new file mode 100644 index 000000000..cd8842dd5 --- /dev/null +++ b/tdemarkdown/md4c/src/md4c.pc.in @@ -0,0 +1,13 @@ +prefix=@CMAKE_INSTALL_PREFIX@ +exec_prefix=@CMAKE_INSTALL_PREFIX@ +libdir=${exec_prefix}/@CMAKE_INSTALL_LIBDIR@ +includedir=${prefix}/@CMAKE_INSTALL_INCLUDEDIR@ + +Name: @PROJECT_NAME@ +Description: Markdown parser library with a SAX-like callback-based interface. +Version: @PROJECT_VERSION@ +URL: @PROJECT_URL@ + +Requires: +Libs: -L${libdir} -lmd4c +Cflags: -I${includedir} diff --git a/tdemarkdown/md4c/test/LICENSE b/tdemarkdown/md4c/test/LICENSE new file mode 100644 index 000000000..69da849a0 --- /dev/null +++ b/tdemarkdown/md4c/test/LICENSE @@ -0,0 +1,64 @@ +The CommonMark spec (spec.txt) and DTD (CommonMark.dtd) are + +Copyright (C) 2014-16 John MacFarlane + +Released under the Creative Commons CC-BY-SA 4.0 license: +<http://creativecommons.org/licenses/by-sa/4.0/>. + +--- + +The test software in test/ and the programs in tools/ are + +Copyright (c) 2014, John MacFarlane + +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above + copyright notice, this list of conditions and the following + disclaimer in the documentation and/or other materials provided + with the distribution. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +--- + +The normalization code in runtests.py was derived from the +markdowntest project, Copyright 2013 Karl Dubost: + +The MIT License (MIT) + +Copyright (c) 2013 Karl Dubost + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, +EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF +MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND +NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE +LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION +OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION +WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. diff --git a/tdemarkdown/md4c/test/cmark.py b/tdemarkdown/md4c/test/cmark.py new file mode 100755 index 000000000..111086030 --- /dev/null +++ b/tdemarkdown/md4c/test/cmark.py @@ -0,0 +1,40 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- + +from ctypes import CDLL, c_char_p, c_long +from subprocess import * +import platform +import os + +def pipe_through_prog(prog, text): + p1 = Popen(prog.split(), stdout=PIPE, stdin=PIPE, stderr=PIPE) + [result, err] = p1.communicate(input=text.encode('utf-8')) + return [p1.returncode, result.decode('utf-8'), err] + +def use_library(lib, text): + textbytes = text.encode('utf-8') + textlen = len(textbytes) + return [0, lib(textbytes, textlen, 0).decode('utf-8'), ''] + +class CMark: + def __init__(self, prog=None, library_dir=None): + self.prog = prog + if prog: + self.to_html = lambda x: pipe_through_prog(prog, x) + else: + sysname = platform.system() + if sysname == 'Darwin': + libname = "libcmark.dylib" + elif sysname == 'Windows': + libname = "cmark.dll" + else: + libname = "libcmark.so" + if library_dir: + libpath = os.path.join(library_dir, libname) + else: + libpath = os.path.join("build", "src", libname) + cmark = CDLL(libpath) + markdown = cmark.cmark_markdown_to_html + markdown.restype = c_char_p + markdown.argtypes = [c_char_p, c_long] + self.to_html = lambda x: use_library(markdown, x) diff --git a/tdemarkdown/md4c/test/coverage.txt b/tdemarkdown/md4c/test/coverage.txt new file mode 100644 index 000000000..66d5cc8dc --- /dev/null +++ b/tdemarkdown/md4c/test/coverage.txt @@ -0,0 +1,522 @@ + +# Coverage + +This file is just a collection of unit tests not covered elsewhere. + +Most notably regression tests, tests improving code coverage and other useful +things may drop here. + +(However any tests requiring any additional command line option, like enabling +an extension, must be included in their respective files.) + + +## GitHub Issues + +### [Issue 2](https://github.com/mity/md4c/issues/2) + +Raw HTML block: + +```````````````````````````````` example +<gi att1=tok1 att2=tok2> +. +<gi att1=tok1 att2=tok2> +```````````````````````````````` + +Inline: + +```````````````````````````````` example +foo <gi att1=tok1 att2=tok2> bar +. +<p>foo <gi att1=tok1 att2=tok2> bar</p> +```````````````````````````````` + +Inline with a line break: + +```````````````````````````````` example +foo <gi att1=tok1 +att2=tok2> bar +. +<p>foo <gi att1=tok1 +att2=tok2> bar</p> +```````````````````````````````` + + +### [Issue 4](https://github.com/mity/md4c/issues/4) + +```````````````````````````````` example +![alt text with *entity* ©](img.png 'title') +. +<p><img src="img.png" alt="alt text with entity ©" title="title"></p> +```````````````````````````````` + + +### [Issue 9](https://github.com/mity/md4c/issues/9) + +```````````````````````````````` example +> [foo +> bar]: /url +> +> [foo bar] +. +<blockquote> +<p><a href="/url">foo +bar</a></p> +</blockquote> +```````````````````````````````` + + +### [Issue 10](https://github.com/mity/md4c/issues/10) + +```````````````````````````````` example +[x]: +x +- <? + + x +. +<ul> +<li><? + +x +</li> +</ul> +```````````````````````````````` + + +### [Issue 11](https://github.com/mity/md4c/issues/11) + +```````````````````````````````` example +x [link](/url "foo – bar") x +. +<p>x <a href="/url" title="foo – bar">link</a> x</p> +```````````````````````````````` + + +### [Issue 14](https://github.com/mity/md4c/issues/14) + +```````````````````````````````` example +a***b* c* +. +<p>a*<em><em>b</em> c</em></p> +```````````````````````````````` + + +### [Issue 15](https://github.com/mity/md4c/issues/15) + +```````````````````````````````` example +***b* c* +. +<p>*<em><em>b</em> c</em></p> +```````````````````````````````` + + +### [Issue 21](https://github.com/mity/md4c/issues/21) + +```````````````````````````````` example +a*b**c* +. +<p>a<em>b**c</em></p> +```````````````````````````````` + + +### [Issue 33](https://github.com/mity/md4c/issues/33) + +```````````````````````````````` example +```&&&&&&&& +. +<pre><code class="language-&&&&&&&&"></code></pre> +```````````````````````````````` + + +### [Issue 36](https://github.com/mity/md4c/issues/36) + +```````````````````````````````` example +__x_ _x___ +. +<p><em><em>x</em> <em>x</em></em>_</p> +```````````````````````````````` + + +### [Issue 39](https://github.com/mity/md4c/issues/39) + +```````````````````````````````` example +[\\]: x +. +```````````````````````````````` + + +### [Issue 40](https://github.com/mity/md4c/issues/40) + +```````````````````````````````` example +[x](url +'title' +)x +. +<p><a href="url" title="title">x</a>x</p> +```````````````````````````````` + + +### [Issue 65](https://github.com/mity/md4c/issues/65) + +```````````````````````````````` example +` +. +<p>`</p> +```````````````````````````````` + + +### [Issue 74](https://github.com/mity/md4c/issues/74) + +```````````````````````````````` example +[f]: +- + xx +- +. +<pre><code>xx +</code></pre> +<ul> +<li></li> +</ul> +```````````````````````````````` + + +### [Issue 78](https://github.com/mity/md4c/issues/78) + +```````````````````````````````` example +[SS ẞ]: /url +[ẞ SS] +. +<p><a href="/url">ẞ SS</a></p> +```````````````````````````````` + + +### [Issue 83](https://github.com/mity/md4c/issues/83) + +```````````````````````````````` example +foo +> +. +<p>foo</p> +<blockquote> +</blockquote> + +```````````````````````````````` + + +### [Issue 95](https://github.com/mity/md4c/issues/95) + +```````````````````````````````` example +. foo +. +<p>. foo</p> +```````````````````````````````` + + +### [Issue 96](https://github.com/mity/md4c/issues/96) + +```````````````````````````````` example +[ab]: /foo +[a] [ab] [abc] +. +<p>[a] <a href="/foo">ab</a> [abc]</p> +```````````````````````````````` + +```````````````````````````````` example +[a b]: /foo +[a b] +. +<p><a href="/foo">a b</a></p> +```````````````````````````````` + + +### [Issue 97](https://github.com/mity/md4c/issues/97) + +```````````````````````````````` example +*a **b c* d** +. +<p><em>a <em><em>b c</em> d</em></em></p> + +```````````````````````````````` + + +### [Issue 100](https://github.com/mity/md4c/issues/100) + +```````````````````````````````` example +<foo@123456789012345678901234567890123456789012345678901234567890123.123456789012345678901234567890123456789012345678901234567890123> +. +<p><a href="mailto:foo@123456789012345678901234567890123456789012345678901234567890123.123456789012345678901234567890123456789012345678901234567890123">foo@123456789012345678901234567890123456789012345678901234567890123.123456789012345678901234567890123456789012345678901234567890123</a></p> +```````````````````````````````` + +```````````````````````````````` example +<foo@123456789012345678901234567890123456789012345678901234567890123x.123456789012345678901234567890123456789012345678901234567890123> +. +<p><foo@123456789012345678901234567890123456789012345678901234567890123x.123456789012345678901234567890123456789012345678901234567890123></p> +```````````````````````````````` +(Note the `x` here which turns it over the max. allowed length limit.) + + +### [Issue 107](https://github.com/mity/md4c/issues/107) + +```````````````````````````````` example +***foo *bar baz*** +. +<p>*<strong>foo <em>bar baz</em></strong></p> + +```````````````````````````````` + + +### [Issue 124](https://github.com/mity/md4c/issues/124) + +```````````````````````````````` example +~~~ + x +~~~ + +~~~ + x +~~~ +. +<pre><code> x +</code></pre> +<pre><code> x +</code></pre> +```````````````````````````````` + + +### [Issue 131](https://github.com/mity/md4c/issues/131) + +```````````````````````````````` example +[![alt][img]][link] + +[img]: img_url +[link]: link_url +. +<p><a href="link_url"><img src="img_url" alt="alt"></a></p> +```````````````````````````````` + + +### [Issue 142](https://github.com/mity/md4c/issues/142) + +```````````````````````````````` example +[fooﬗ]: /url +[fooﬕ] +. +<p>[fooﬕ]</p> +```````````````````````````````` + + +### [Issue 149](https://github.com/mity/md4c/issues/149) + +```````````````````````````````` example +- <script> +- foo +bar +</script> +. +<ul> +<li><script> +</li> +<li>foo +bar +</script></li> +</ul> +```````````````````````````````` + + +## Code coverage + +### `md_is_unicode_whitespace__()` + +Unicode whitespace (here U+2000) forms a word boundary so these cannot be +resolved as emphasis span because there is no closer mark. + +```````````````````````````````` example +*foo *bar +. +<p>*foo *bar</p> +```````````````````````````````` + + +### `md_is_unicode_punct__()` + +Ditto for Unicode punctuation (here U+00A1). + +```````````````````````````````` example +*foo¡*bar +. +<p>*foo¡*bar</p> +```````````````````````````````` + + +### `md_get_unicode_fold_info()` + +```````````````````````````````` example +[Příliš žluťoučký kůň úpěl ďábelské ódy.] + +[PŘÍLIŠ ŽLUŤOUČKÝ KŮŇ ÚPĚL ĎÁBELSKÉ ÓDY.]: /url +. +<p><a href="/url">Příliš žluťoučký kůň úpěl ďábelské ódy.</a></p> +```````````````````````````````` + + +### `md_decode_utf8__()` and `md_decode_utf8_before__()` + +```````````````````````````````` example +á*Á (U+00E1, i.e. two byte UTF-8 sequence) + * (U+2000, i.e. three byte UTF-8 sequence) +. +<p>á*Á (U+00E1, i.e. two byte UTF-8 sequence) + * (U+2000, i.e. three byte UTF-8 sequence)</p> +```````````````````````````````` + + +### `md_is_link_destination_A()` + +```````````````````````````````` example +[link](</url\.with\.escape>) +. +<p><a href="/url.with.escape">link</a></p> +```````````````````````````````` + + +### `md_link_label_eq()` + +```````````````````````````````` example +[foo bar] + +[foo bar]: /url +. +<p><a href="/url">foo bar</a></p> +```````````````````````````````` + + +### `md_is_inline_link_spec()` + +```````````````````````````````` example +> [link](/url 'foo +> bar') +. +<blockquote> +<p><a href="/url" title="foo +bar">link</a></p> +</blockquote> +```````````````````````````````` + + +### `md_build_ref_def_hashtable()` + +All link labels in the following example all have the same FNV1a hash (after +normalization of the label, which means after converting to a vector of Unicode +codepoints and lowercase folding). + +So the example triggers quite complex code paths which are not otherwise easily +tested. + +```````````````````````````````` example +[foo]: /foo +[qnptgbh]: /qnptgbh +[abgbrwcv]: /abgbrwcv +[abgbrwcv]: /abgbrwcv2 +[abgbrwcv]: /abgbrwcv3 +[abgbrwcv]: /abgbrwcv4 +[alqadfgn]: /alqadfgn + +[foo] +[qnptgbh] +[abgbrwcv] +[alqadfgn] +[axgydtdu] +. +<p><a href="/foo">foo</a> +<a href="/qnptgbh">qnptgbh</a> +<a href="/abgbrwcv">abgbrwcv</a> +<a href="/alqadfgn">alqadfgn</a> +[axgydtdu]</p> +```````````````````````````````` + +For the sake of completeness, the following C program was used to find the hash +collisions by brute force: + +~~~ + +#include <stdio.h> +#include <string.h> + + +static unsigned etalon; + + + +#define MD_FNV1A_BASE 2166136261 +#define MD_FNV1A_PRIME 16777619 + +static inline unsigned +fnv1a(unsigned base, const void* data, size_t n) +{ + const unsigned char* buf = (const unsigned char*) data; + unsigned hash = base; + size_t i; + + for(i = 0; i < n; i++) { + hash ^= buf[i]; + hash *= MD_FNV1A_PRIME; + } + + return hash; +} + + +static unsigned +unicode_hash(const char* data, size_t n) +{ + unsigned value; + unsigned hash = MD_FNV1A_BASE; + int i; + + for(i = 0; i < n; i++) { + value = data[i]; + hash = fnv1a(hash, &value, sizeof(unsigned)); + } + + return hash; +} + + +static void +recurse(char* buffer, size_t off, size_t len) +{ + int ch; + + if(off < len - 1) { + for(ch = 'a'; ch <= 'z'; ch++) { + buffer[off] = ch; + recurse(buffer, off+1, len); + } + } else { + for(ch = 'a'; ch <= 'z'; ch++) { + buffer[off] = ch; + if(unicode_hash(buffer, len) == etalon) { + printf("Dup: %.*s\n", (int)len, buffer); + } + } + } +} + +int +main(int argc, char** argv) +{ + char buffer[32]; + int len; + + if(argc < 2) + etalon = unicode_hash("foo", 3); + else + etalon = unicode_hash(argv[1], strlen(argv[1])); + + for(len = 1; len <= sizeof(buffer); len++) + recurse(buffer, 0, len); + + return 0; +} +~~~ diff --git a/tdemarkdown/md4c/test/fuzz-input/commonmark.md b/tdemarkdown/md4c/test/fuzz-input/commonmark.md new file mode 100644 index 000000000..974d817ba --- /dev/null +++ b/tdemarkdown/md4c/test/fuzz-input/commonmark.md @@ -0,0 +1,40 @@ + +# h1 +## h2 +### h3 +#### h4 +##### h5 +###### h6 + +h1 +== + +h2 +-- + +-------------------- + + indented code + +``` +fenced code +``` + +<tag attr='val' attr2="val2"> + +> quote + +* list item +1. list item + +[ref]: /url + +paragraph +© Ӓ ꯍ +`code` +*emph* **strong** ***strong emph*** +_emph_ __strong__ ___strong emph___ +[ref] [ref][] [link](/url) +![ref] ![ref][] ![img](/url) +<http://example.com> <doe@example.com> +\\ \* \. \` \ diff --git a/tdemarkdown/md4c/test/fuzz-input/gfm.md b/tdemarkdown/md4c/test/fuzz-input/gfm.md new file mode 100644 index 000000000..dfdbc7290 --- /dev/null +++ b/tdemarkdown/md4c/test/fuzz-input/gfm.md @@ -0,0 +1,10 @@ +* [ ] unchecked +* [x] checked + + A | B | C +---|--:|:-: +aaa|bbb|ccc + +~del~ ~~del~~ + +http://example.com www.example.com doe@example.com diff --git a/tdemarkdown/md4c/test/fuzz-input/latex-math.md b/tdemarkdown/md4c/test/fuzz-input/latex-math.md new file mode 100644 index 000000000..d17af345d --- /dev/null +++ b/tdemarkdown/md4c/test/fuzz-input/latex-math.md @@ -0,0 +1 @@ +$a^2+b^2=c^2$ $$a^2+b^2=c^2$$ diff --git a/tdemarkdown/md4c/test/fuzz-input/wiki.md b/tdemarkdown/md4c/test/fuzz-input/wiki.md new file mode 100644 index 000000000..a4239745c --- /dev/null +++ b/tdemarkdown/md4c/test/fuzz-input/wiki.md @@ -0,0 +1 @@ +[[wiki]] [[wiki|label]] diff --git a/tdemarkdown/md4c/test/fuzzers/fuzz-mdhtml.c b/tdemarkdown/md4c/test/fuzzers/fuzz-mdhtml.c new file mode 100644 index 000000000..2d645d237 --- /dev/null +++ b/tdemarkdown/md4c/test/fuzzers/fuzz-mdhtml.c @@ -0,0 +1,35 @@ + +#include <stdint.h> +#include <stdlib.h> +#include "md4c-html.h" + + +static void +process_output(const MD_CHAR* text, MD_SIZE size, void* userdata) +{ + /* This is a dummy function because we don't need to generate any output + * actually. */ + return; +} + +int +LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) +{ + unsigned parser_flags, renderer_flags; + + if(size < 2 * sizeof(unsigned)) { + /* We interpret the 1st 8 bytes as parser flags and renderer flags. */ + return 0; + } + + parser_flags = *(unsigned*)data; + data += sizeof(unsigned); size -= sizeof(unsigned); + + renderer_flags = *(unsigned*)data; + data += sizeof(unsigned); size -= sizeof(unsigned); + + /* Allocate enough space */ + md_html(data, size, process_output, NULL, parser_flags, renderer_flags); + + return 0; +} diff --git a/tdemarkdown/md4c/test/latex-math.txt b/tdemarkdown/md4c/test/latex-math.txt new file mode 100644 index 000000000..2a5774ce0 --- /dev/null +++ b/tdemarkdown/md4c/test/latex-math.txt @@ -0,0 +1,39 @@ + +# LaTeX Math + +With the flag `MD_FLAG_LATEXMATHSPANS`, MD4C enables extension for recognition +of LaTeX style math spans. + +A math span is is any text wrapped in dollars or double dollars (`$...$` or +`$$...$$`). + +```````````````````````````````` example +$a+b=c$ Hello, world! +. +<p><x-equation>a+b=c</x-equation> Hello, world!</p> +```````````````````````````````` + +If the double dollar sign is used, the math span is a display math span. + +```````````````````````````````` example +This is a display equation: $$\int_a^b x dx$$. +. +<p>This is a display equation: <x-equation type="display">\int_a^b x dx</x-equation>.</p> +```````````````````````````````` + +Math spans may span multiple lines as they are normal spans: + +```````````````````````````````` example +$$ +\int_a^b +f(x) dx +$$ +. +<p><x-equation type="display">\int_a^b f(x) dx </x-equation></p> +```````````````````````````````` + +Note though that many (simple) renderers may output the math spans just as a +verbatim text. (This includes the HTML renderer used by the `md2html` utility.) + +Only advanced renderers which implement LaTeX math syntax can be expected to +provide better results. diff --git a/tdemarkdown/md4c/test/normalize.py b/tdemarkdown/md4c/test/normalize.py new file mode 100755 index 000000000..f8ece18d5 --- /dev/null +++ b/tdemarkdown/md4c/test/normalize.py @@ -0,0 +1,194 @@ +# -*- coding: utf-8 -*- +from html.parser import HTMLParser +import urllib + +try: + from html.parser import HTMLParseError +except ImportError: + # HTMLParseError was removed in Python 3.5. It could never be + # thrown, so we define a placeholder instead. + class HTMLParseError(Exception): + pass + +from html.entities import name2codepoint +import sys +import re +import html + +# Normalization code, adapted from +# https://github.com/karlcow/markdown-testsuite/ +significant_attrs = ["alt", "href", "src", "title"] +whitespace_re = re.compile('\s+') +class MyHTMLParser(HTMLParser): + def __init__(self): + HTMLParser.__init__(self) + self.convert_charrefs = False + self.last = "starttag" + self.in_pre = False + self.output = "" + self.last_tag = "" + def handle_data(self, data): + after_tag = self.last == "endtag" or self.last == "starttag" + after_block_tag = after_tag and self.is_block_tag(self.last_tag) + if after_tag and self.last_tag == "br": + data = data.lstrip('\n') + if not self.in_pre: + data = whitespace_re.sub(' ', data) + if after_block_tag and not self.in_pre: + if self.last == "starttag": + data = data.lstrip() + elif self.last == "endtag": + data = data.strip() + self.output += data + self.last = "data" + def handle_endtag(self, tag): + if tag == "pre": + self.in_pre = False + elif self.is_block_tag(tag): + self.output = self.output.rstrip() + self.output += "</" + tag + ">" + self.last_tag = tag + self.last = "endtag" + def handle_starttag(self, tag, attrs): + if tag == "pre": + self.in_pre = True + if self.is_block_tag(tag): + self.output = self.output.rstrip() + self.output += "<" + tag + # For now we don't strip out 'extra' attributes, because of + # raw HTML test cases. + # attrs = filter(lambda attr: attr[0] in significant_attrs, attrs) + if attrs: + attrs.sort() + for (k,v) in attrs: + self.output += " " + k + if v in ['href','src']: + self.output += ("=" + '"' + + urllib.quote(urllib.unquote(v), safe='/') + '"') + elif v != None: + self.output += ("=" + '"' + html.escape(v,quote=True) + '"') + self.output += ">" + self.last_tag = tag + self.last = "starttag" + def handle_startendtag(self, tag, attrs): + """Ignore closing tag for self-closing """ + self.handle_starttag(tag, attrs) + self.last_tag = tag + self.last = "endtag" + def handle_comment(self, data): + self.output += '<!--' + data + '-->' + self.last = "comment" + def handle_decl(self, data): + self.output += '<!' + data + '>' + self.last = "decl" + def unknown_decl(self, data): + self.output += '<!' + data + '>' + self.last = "decl" + def handle_pi(self,data): + self.output += '<?' + data + '>' + self.last = "pi" + def handle_entityref(self, name): + try: + c = chr(name2codepoint[name]) + except KeyError: + c = None + self.output_char(c, '&' + name + ';') + self.last = "ref" + def handle_charref(self, name): + try: + if name.startswith("x"): + c = chr(int(name[1:], 16)) + else: + c = chr(int(name)) + except ValueError: + c = None + self.output_char(c, '&' + name + ';') + self.last = "ref" + # Helpers. + def output_char(self, c, fallback): + if c == '<': + self.output += "<" + elif c == '>': + self.output += ">" + elif c == '&': + self.output += "&" + elif c == '"': + self.output += """ + elif c == None: + self.output += fallback + else: + self.output += c + + def is_block_tag(self,tag): + return (tag in ['article', 'header', 'aside', 'hgroup', 'blockquote', + 'hr', 'iframe', 'body', 'li', 'map', 'button', 'object', 'canvas', + 'ol', 'caption', 'output', 'col', 'p', 'colgroup', 'pre', 'dd', + 'progress', 'div', 'section', 'dl', 'table', 'td', 'dt', + 'tbody', 'embed', 'textarea', 'fieldset', 'tfoot', 'figcaption', + 'th', 'figure', 'thead', 'footer', 'tr', 'form', 'ul', + 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'video', 'script', 'style']) + +def normalize_html(html): + r""" + Return normalized form of HTML which ignores insignificant output + differences: + + Multiple inner whitespaces are collapsed to a single space (except + in pre tags): + + >>> normalize_html("<p>a \t b</p>") + '<p>a b</p>' + + >>> normalize_html("<p>a \t\nb</p>") + '<p>a b</p>' + + * Whitespace surrounding block-level tags is removed. + + >>> normalize_html("<p>a b</p>") + '<p>a b</p>' + + >>> normalize_html(" <p>a b</p>") + '<p>a b</p>' + + >>> normalize_html("<p>a b</p> ") + '<p>a b</p>' + + >>> normalize_html("\n\t<p>\n\t\ta b\t\t</p>\n\t") + '<p>a b</p>' + + >>> normalize_html("<i>a b</i> ") + '<i>a b</i> ' + + * Self-closing tags are converted to open tags. + + >>> normalize_html("<br />") + '<br>' + + * Attributes are sorted and lowercased. + + >>> normalize_html('<a title="bar" HREF="foo">x</a>') + '<a href="foo" title="bar">x</a>' + + * References are converted to unicode, except that '<', '>', '&', and + '"' are rendered using entities. + + >>> normalize_html("∀&><"") + '\u2200&><"' + + """ + html_chunk_re = re.compile("(\<!\[CDATA\[.*?\]\]\>|\<[^>]*\>|[^<]+)") + try: + parser = MyHTMLParser() + # We work around HTMLParser's limitations parsing CDATA + # by breaking the input into chunks and passing CDATA chunks + # through verbatim. + for chunk in re.finditer(html_chunk_re, html): + if chunk.group(0)[:8] == "<![CDATA": + parser.output += chunk.group(0) + else: + parser.feed(chunk.group(0)) + parser.close() + return parser.output + except HTMLParseError as e: + sys.stderr.write("Normalization error: " + e.msg + "\n") + return html # on error, return unnormalized HTML diff --git a/tdemarkdown/md4c/test/pathological_tests.py b/tdemarkdown/md4c/test/pathological_tests.py new file mode 100755 index 000000000..76cb9dfc0 --- /dev/null +++ b/tdemarkdown/md4c/test/pathological_tests.py @@ -0,0 +1,128 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- + +import re +import argparse +import sys +import platform +from cmark import CMark +from timeit import default_timer as timer + +if __name__ == "__main__": + parser = argparse.ArgumentParser(description='Run cmark tests.') + parser.add_argument('-p', '--program', dest='program', nargs='?', default=None, + help='program to test') + parser.add_argument('--library-dir', dest='library_dir', nargs='?', + default=None, help='directory containing dynamic library') + args = parser.parse_args(sys.argv[1:]) + +cmark = CMark(prog=args.program, library_dir=args.library_dir) + +# list of pairs consisting of input and a regex that must match the output. +pathological = { + # note - some pythons have limit of 65535 for {num-matches} in re. + "U+0000": + ("abc\u0000de\u0000", + re.compile("abc\ufffd?de\ufffd?")), + "U+FEFF (Unicode BOM)": + ("\ufefffoo", + re.compile("<p>foo</p>")), + "nested strong emph": + (("*a **a " * 65000) + "b" + (" a** a*" * 65000), + re.compile("(<em>a <strong>a ){65000}b( a</strong> a</em>){65000}")), + "many emph closers with no openers": + (("a_ " * 65000), + re.compile("(a[_] ){64999}a_")), + "many emph openers with no closers": + (("_a " * 65000), + re.compile("(_a ){64999}_a")), + "many 3-emph openers with no closers": + (("a***" * 65000), + re.compile("(a<em><strong>a</strong></em>){32500}")), + "many link closers with no openers": + (("a]" * 65000), + re.compile("(a\]){65000}")), + "many link openers with no closers": + (("[a" * 65000), + re.compile("(\[a){65000}")), + "mismatched openers and closers": + (("*a_ " * 50000), + re.compile("([*]a[_] ){49999}[*]a_")), + "openers and closers multiple of 3": + (("a**b" + ("c* " * 50000)), + re.compile("a[*][*]b(c[*] ){49999}c[*]")), + "link openers and emph closers": + (("[ a_" * 50000), + re.compile("(\[ a_){50000}")), + "hard link/emph case": + ("**x [a*b**c*](d)", + re.compile("\\*\\*x <a href=\"d\">a<em>b\\*\\*c</em></a>")), + "nested brackets": + (("[" * 50000) + "a" + ("]" * 50000), + re.compile("\[{50000}a\]{50000}")), + "nested block quotes": + ((("> " * 50000) + "a"), + re.compile("(<blockquote>\r?\n){50000}")), + "backticks": + ("".join(map(lambda x: ("e" + "`" * x), range(1,1000))), + re.compile("^<p>[e`]*</p>\r?\n$")), + "many links": + ("[t](/u) " * 50000, + re.compile("(<a href=\"/u\">t</a> ?){50000}")), + "many references": + ("".join(map(lambda x: ("[" + str(x) + "]: u\n"), range(1,20000 * 16))) + "[0] " * 20000, + re.compile("(\[0\] ){19999}")), + "deeply nested lists": + ("".join(map(lambda x: (" " * x + "* a\n"), range(0,1000))), + re.compile("<ul>\r?\n(<li>a<ul>\r?\n){999}<li>a</li>\r?\n</ul>\r?\n(</li>\r?\n</ul>\r?\n){999}")), + "many html openers and closers": + (("<>" * 50000), + re.compile("(<>){50000}")), + "many html proc. inst. openers": + (("x" + "<?" * 50000), + re.compile("x(<\\?){50000}")), + "many html CDATA openers": + (("x" + "<![CDATA[" * 50000), + re.compile("x(<!\\[CDATA\\[){50000}")), + "many backticks and escapes": + (("\\``" * 50000), + re.compile("(``){50000}")), + "many broken link titles": + (("[ (](" * 50000), + re.compile("(\[ \(\]\(){50000}")), + "broken thematic break": + (("* " * 50000 + "a"), + re.compile("<ul>\r?\n(<li><ul>\r?\n){49999}<li>a</li>\r?\n</ul>\r?\n(</li>\r?\n</ul>\r?\n){49999}")), + "nested invalid link references": + (("[" * 50000 + "]" * 50000 + "\n\n[a]: /b"), + re.compile("\[{50000}\]{50000}")) +} + +whitespace_re = re.compile('/s+/') +passed = 0 +errored = 0 +failed = 0 + +#print("Testing pathological cases:") +for description in pathological: + (inp, regex) = pathological[description] + start = timer() + [rc, actual, err] = cmark.to_html(inp) + end = timer() + if rc != 0: + errored += 1 + print('{:35} [ERRORED (return code %d)]'.format(description, rc)) + print(err) + elif regex.search(actual): + print('{:35} [PASSED] {:.3f} secs'.format(description, end-start)) + passed += 1 + else: + print('{:35} [FAILED]'.format(description)) + print(repr(actual)) + failed += 1 + +print("%d passed, %d failed, %d errored" % (passed, failed, errored)) +if (failed == 0 and errored == 0): + exit(0) +else: + exit(1) diff --git a/tdemarkdown/md4c/test/permissive-email-autolinks.txt b/tdemarkdown/md4c/test/permissive-email-autolinks.txt new file mode 100644 index 000000000..12e8786c9 --- /dev/null +++ b/tdemarkdown/md4c/test/permissive-email-autolinks.txt @@ -0,0 +1,50 @@ + +# Permissive E-mail Autolinks + +With the flag `MD_FLAG_PERMISSIVEEMAILAUTOLINKS`, MD4C enables more permissive +recognition of e-mail addresses and transforms them to autolinks, even if they +do not exactly follow the syntax of autolink as specified in CommonMark +specification. + +This is standard CommonMark e-mail autolink: + +```````````````````````````````` example +E-mail: <mailto:john.doe@gmail.com> +. +<p>E-mail: <a href="mailto:john.doe@gmail.com">mailto:john.doe@gmail.com</a></p> +```````````````````````````````` + +With the permissive autolinks enabled, this is sufficient: + +```````````````````````````````` example +E-mail: john.doe@gmail.com +. +<p>E-mail: <a href="mailto:john.doe@gmail.com">john.doe@gmail.com</a></p> +```````````````````````````````` + +`+` can occur before the `@`, but not after. + +```````````````````````````````` example +hello@mail+xyz.example isn't valid, but hello+xyz@mail.example is. +. +<p>hello@mail+xyz.example isn't valid, but <a href="mailto:hello+xyz@mail.example">hello+xyz@mail.example</a> is.</p> +```````````````````````````````` + +`.`, `-`, and `_` can occur on both sides of the `@`, but only `.` may occur at +the end of the email address, in which case it will not be considered part of +the address: + +```````````````````````````````` example +a.b-c_d@a.b + +a.b-c_d@a.b. + +a.b-c_d@a.b- + +a.b-c_d@a.b_ +. +<p><a href="mailto:a.b-c_d@a.b">a.b-c_d@a.b</a></p> +<p><a href="mailto:a.b-c_d@a.b">a.b-c_d@a.b</a>.</p> +<p>a.b-c_d@a.b-</p> +<p>a.b-c_d@a.b_</p> +```````````````````````````````` diff --git a/tdemarkdown/md4c/test/permissive-url-autolinks.txt b/tdemarkdown/md4c/test/permissive-url-autolinks.txt new file mode 100644 index 000000000..dfd6b5d4d --- /dev/null +++ b/tdemarkdown/md4c/test/permissive-url-autolinks.txt @@ -0,0 +1,99 @@ + +# Permissive URL Autolinks + +With the flag `MD_FLAG_PERMISSIVEURLAUTOLINKS`, MD4C enables more permissive recognition +of URLs and transform them to autolinks, even if they do not exactly follow the syntax +of autolink as specified in CommonMark specification. + +This is a standard CommonMark autolink: + +```````````````````````````````` example +Homepage: <https://github.com/mity/md4c> +. +<p>Homepage: <a href="https://github.com/mity/md4c">https://github.com/mity/md4c</a></p> +```````````````````````````````` + +With the permissive autolinks enabled, this is sufficient: + +```````````````````````````````` example +Homepage: https://github.com/mity/md4c +. +<p>Homepage: <a href="https://github.com/mity/md4c">https://github.com/mity/md4c</a></p> +```````````````````````````````` + +But this permissive autolink feature can work only for very widely used URL +schemes, in alphabetical order `ftp:`, `http:`, `https:`. + +That's why this is not a permissive autolink: + +```````````````````````````````` example +ssh://root@example.com +. +<p>ssh://root@example.com</p> +```````````````````````````````` + +The same rules for path validation as for permissivve WWW autolinks apply. +Therefore the final question mark here is not part of the autolink: + +```````````````````````````````` example +Have you ever visited http://www.zombo.com? +. +<p>Have you ever visited <a href="http://www.zombo.com">http://www.zombo.com</a>?</p> +```````````````````````````````` + +But in contrast, in this example it is: + +```````````````````````````````` example +http://www.bing.com/search?q=md4c +. +<p><a href="http://www.bing.com/search?q=md4c">http://www.bing.com/search?q=md4c</a></p> +```````````````````````````````` + +And finally one complex example: + +```````````````````````````````` example +http://commonmark.org + +(Visit https://encrypted.google.com/search?q=Markup+(business)) + +Anonymous FTP is available at ftp://foo.bar.baz. +. +<p><a href="http://commonmark.org">http://commonmark.org</a></p> +<p>(Visit <a href="https://encrypted.google.com/search?q=Markup+(business)">https://encrypted.google.com/search?q=Markup+(business)</a>)</p> +<p>Anonymous FTP is available at <a href="ftp://foo.bar.baz">ftp://foo.bar.baz</a>.</p> +```````````````````````````````` + + +## GitHub Issues + +### [Issue 53](https://github.com/mity/md4c/issues/53) + +```````````````````````````````` example +This is [link](http://github.com/). +. +<p>This is <a href="http://github.com/">link</a>.</p> +```````````````````````````````` + +```````````````````````````````` example +This is [link](http://github.com/)X +. +<p>This is <a href="http://github.com/">link</a>X</p> +```````````````````````````````` + + +## [Issue 76](https://github.com/mity/md4c/issues/76) + +```````````````````````````````` example +*(http://example.com)* +. +<p><em>(<a href="http://example.com">http://example.com</a>)</em></p> +```````````````````````````````` + + +## [Issue 152](https://github.com/mity/md4c/issues/152) + +```````````````````````````````` example +[http://example.com](http://example.com) +. +<p><a href="http://example.com">http://example.com</a></p> +```````````````````````````````` diff --git a/tdemarkdown/md4c/test/permissive-www-autolinks.txt b/tdemarkdown/md4c/test/permissive-www-autolinks.txt new file mode 100644 index 000000000..046de9d7a --- /dev/null +++ b/tdemarkdown/md4c/test/permissive-www-autolinks.txt @@ -0,0 +1,107 @@ + +# Permissive WWW Autolinks + +With the flag `MD_FLAG_PERMISSIVEWWWAUTOLINKS`, MD4C enables recognition of +autolinks starting with `www.`, even if they do not exactly follow the syntax +of autolink as specified in CommonMark specification. + +These do not have to be enclosed in `<` and `>`, and they even do not need +any preceding scheme specification. + +The WWW autolink will be recognized when the text `www.` is found followed by a +valid domain. A valid domain consists of segments of alphanumeric characters, +underscores (`_`) and hyphens (`-`) separated by periods (`.`). There must be +at least one period, and no underscores may be present in the last two segments +of the domain. + +The scheme `http` will be inserted automatically: + +```````````````````````````````` example +www.commonmark.org +. +<p><a href="http://www.commonmark.org">www.commonmark.org</a></p> +```````````````````````````````` + +After a valid domain, zero or more non-space non-`<` characters may follow: + +```````````````````````````````` example +Visit www.commonmark.org/help for more information. +. +<p>Visit <a href="http://www.commonmark.org/help">www.commonmark.org/help</a> for more information.</p> +```````````````````````````````` + +We then apply extended autolink path validation as follows: + +Trailing punctuation (specifically, `?`, `!`, `.`, `,`, `:`, `*`, `_`, and `~`) +will not be considered part of the autolink, though they may be included in the +interior of the link: + +```````````````````````````````` example +Visit www.commonmark.org. + +Visit www.commonmark.org/a.b. +. +<p>Visit <a href="http://www.commonmark.org">www.commonmark.org</a>.</p> +<p>Visit <a href="http://www.commonmark.org/a.b">www.commonmark.org/a.b</a>.</p> +```````````````````````````````` + +When an autolink ends in `)`, we scan the entire autolink for the total number +of parentheses. If there is a greater number of closing parentheses than +opening ones, we don't consider the last character part of the autolink, in +order to facilitate including an autolink inside a parenthesis: + +```````````````````````````````` example +www.google.com/search?q=Markup+(business) + +(www.google.com/search?q=Markup+(business)) +. +<p><a href="http://www.google.com/search?q=Markup+(business)">www.google.com/search?q=Markup+(business)</a></p> +<p>(<a href="http://www.google.com/search?q=Markup+(business)">www.google.com/search?q=Markup+(business)</a>)</p> +```````````````````````````````` + +This check is only done when the link ends in a closing parentheses `)`, so if +the only parentheses are in the interior of the autolink, no special rules are +applied: + +```````````````````````````````` example +www.google.com/search?q=(business))+ok +. +<p><a href="http://www.google.com/search?q=(business))+ok">www.google.com/search?q=(business))+ok</a></p> +```````````````````````````````` + +If an autolink ends in a semicolon (`;`), we check to see if it appears to +resemble an [entity reference][entity references]; if the preceding text is `&` +followed by one or more alphanumeric characters. If so, it is excluded from +the autolink: + +```````````````````````````````` example +www.google.com/search?q=commonmark&hl=en + +www.google.com/search?q=commonmark&hl; +. +<p><a href="http://www.google.com/search?q=commonmark&hl=en">www.google.com/search?q=commonmark&hl=en</a></p> +<p><a href="http://www.google.com/search?q=commonmark">www.google.com/search?q=commonmark</a>&hl;</p> +```````````````````````````````` + +`<` immediately ends an autolink. + +```````````````````````````````` example +www.commonmark.org/he<lp +. +<p><a href="http://www.commonmark.org/he">www.commonmark.org/he</a><lp</p> +```````````````````````````````` + + +## GitHub Issues + +### [Issue 53](https://github.com/mity/md4c/issues/53) +```````````````````````````````` example +This is [link](www.github.com/). +. +<p>This is <a href="www.github.com/">link</a>.</p> +```````````````````````````````` +```````````````````````````````` example +This is [link](www.github.com/)X +. +<p>This is <a href="www.github.com/">link</a>X</p> +```````````````````````````````` diff --git a/tdemarkdown/md4c/test/spec.txt b/tdemarkdown/md4c/test/spec.txt new file mode 100644 index 000000000..fefb308bb --- /dev/null +++ b/tdemarkdown/md4c/test/spec.txt @@ -0,0 +1,9756 @@ +--- +title: CommonMark Spec +author: John MacFarlane +version: '0.30' +date: '2021-06-19' +license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)' +... + +# Introduction + +## What is Markdown? + +Markdown is a plain text format for writing structured documents, +based on conventions for indicating formatting in email +and usenet posts. It was developed by John Gruber (with +help from Aaron Swartz) and released in 2004 in the form of a +[syntax description](http://daringfireball.net/projects/markdown/syntax) +and a Perl script (`Markdown.pl`) for converting Markdown to +HTML. In the next decade, dozens of implementations were +developed in many languages. Some extended the original +Markdown syntax with conventions for footnotes, tables, and +other document elements. Some allowed Markdown documents to be +rendered in formats other than HTML. Websites like Reddit, +StackOverflow, and GitHub had millions of people using Markdown. +And Markdown started to be used beyond the web, to author books, +articles, slide shows, letters, and lecture notes. + +What distinguishes Markdown from many other lightweight markup +syntaxes, which are often easier to write, is its readability. +As Gruber writes: + +> The overriding design goal for Markdown's formatting syntax is +> to make it as readable as possible. The idea is that a +> Markdown-formatted document should be publishable as-is, as +> plain text, without looking like it's been marked up with tags +> or formatting instructions. +> (<http://daringfireball.net/projects/markdown/>) + +The point can be illustrated by comparing a sample of +[AsciiDoc](http://www.methods.co.nz/asciidoc/) with +an equivalent sample of Markdown. Here is a sample of +AsciiDoc from the AsciiDoc manual: + +``` +1. List item one. ++ +List item one continued with a second paragraph followed by an +Indented block. ++ +................. +$ ls *.sh +$ mv *.sh ~/tmp +................. ++ +List item continued with a third paragraph. + +2. List item two continued with an open block. ++ +-- +This paragraph is part of the preceding list item. + +a. This list is nested and does not require explicit item +continuation. ++ +This paragraph is part of the preceding list item. + +b. List item b. + +This paragraph belongs to item two of the outer list. +-- +``` + +And here is the equivalent in Markdown: +``` +1. List item one. + + List item one continued with a second paragraph followed by an + Indented block. + + $ ls *.sh + $ mv *.sh ~/tmp + + List item continued with a third paragraph. + +2. List item two continued with an open block. + + This paragraph is part of the preceding list item. + + 1. This list is nested and does not require explicit item continuation. + + This paragraph is part of the preceding list item. + + 2. List item b. + + This paragraph belongs to item two of the outer list. +``` + +The AsciiDoc version is, arguably, easier to write. You don't need +to worry about indentation. But the Markdown version is much easier +to read. The nesting of list items is apparent to the eye in the +source, not just in the processed document. + +## Why is a spec needed? + +John Gruber's [canonical description of Markdown's +syntax](http://daringfireball.net/projects/markdown/syntax) +does not specify the syntax unambiguously. Here are some examples of +questions it does not answer: + +1. How much indentation is needed for a sublist? The spec says that + continuation paragraphs need to be indented four spaces, but is + not fully explicit about sublists. It is natural to think that + they, too, must be indented four spaces, but `Markdown.pl` does + not require that. This is hardly a "corner case," and divergences + between implementations on this issue often lead to surprises for + users in real documents. (See [this comment by John + Gruber](http://article.gmane.org/gmane.text.markdown.general/1997).) + +2. Is a blank line needed before a block quote or heading? + Most implementations do not require the blank line. However, + this can lead to unexpected results in hard-wrapped text, and + also to ambiguities in parsing (note that some implementations + put the heading inside the blockquote, while others do not). + (John Gruber has also spoken [in favor of requiring the blank + lines](http://article.gmane.org/gmane.text.markdown.general/2146).) + +3. Is a blank line needed before an indented code block? + (`Markdown.pl` requires it, but this is not mentioned in the + documentation, and some implementations do not require it.) + + ``` markdown + paragraph + code? + ``` + +4. What is the exact rule for determining when list items get + wrapped in `<p>` tags? Can a list be partially "loose" and partially + "tight"? What should we do with a list like this? + + ``` markdown + 1. one + + 2. two + 3. three + ``` + + Or this? + + ``` markdown + 1. one + - a + + - b + 2. two + ``` + + (There are some relevant comments by John Gruber + [here](http://article.gmane.org/gmane.text.markdown.general/2554).) + +5. Can list markers be indented? Can ordered list markers be right-aligned? + + ``` markdown + 8. item 1 + 9. item 2 + 10. item 2a + ``` + +6. Is this one list with a thematic break in its second item, + or two lists separated by a thematic break? + + ``` markdown + * a + * * * * * + * b + ``` + +7. When list markers change from numbers to bullets, do we have + two lists or one? (The Markdown syntax description suggests two, + but the perl scripts and many other implementations produce one.) + + ``` markdown + 1. fee + 2. fie + - foe + - fum + ``` + +8. What are the precedence rules for the markers of inline structure? + For example, is the following a valid link, or does the code span + take precedence ? + + ``` markdown + [a backtick (`)](/url) and [another backtick (`)](/url). + ``` + +9. What are the precedence rules for markers of emphasis and strong + emphasis? For example, how should the following be parsed? + + ``` markdown + *foo *bar* baz* + ``` + +10. What are the precedence rules between block-level and inline-level + structure? For example, how should the following be parsed? + + ``` markdown + - `a long code span can contain a hyphen like this + - and it can screw things up` + ``` + +11. Can list items include section headings? (`Markdown.pl` does not + allow this, but does allow blockquotes to include headings.) + + ``` markdown + - # Heading + ``` + +12. Can list items be empty? + + ``` markdown + * a + * + * b + ``` + +13. Can link references be defined inside block quotes or list items? + + ``` markdown + > Blockquote [foo]. + > + > [foo]: /url + ``` + +14. If there are multiple definitions for the same reference, which takes + precedence? + + ``` markdown + [foo]: /url1 + [foo]: /url2 + + [foo][] + ``` + +In the absence of a spec, early implementers consulted `Markdown.pl` +to resolve these ambiguities. But `Markdown.pl` was quite buggy, and +gave manifestly bad results in many cases, so it was not a +satisfactory replacement for a spec. + +Because there is no unambiguous spec, implementations have diverged +considerably. As a result, users are often surprised to find that +a document that renders one way on one system (say, a GitHub wiki) +renders differently on another (say, converting to docbook using +pandoc). To make matters worse, because nothing in Markdown counts +as a "syntax error," the divergence often isn't discovered right away. + +## About this document + +This document attempts to specify Markdown syntax unambiguously. +It contains many examples with side-by-side Markdown and +HTML. These are intended to double as conformance tests. An +accompanying script `spec_tests.py` can be used to run the tests +against any Markdown program: + + python test/spec_tests.py --spec spec.txt --program PROGRAM + +Since this document describes how Markdown is to be parsed into +an abstract syntax tree, it would have made sense to use an abstract +representation of the syntax tree instead of HTML. But HTML is capable +of representing the structural distinctions we need to make, and the +choice of HTML for the tests makes it possible to run the tests against +an implementation without writing an abstract syntax tree renderer. + +Note that not every feature of the HTML samples is mandated by +the spec. For example, the spec says what counts as a link +destination, but it doesn't mandate that non-ASCII characters in +the URL be percent-encoded. To use the automatic tests, +implementers will need to provide a renderer that conforms to +the expectations of the spec examples (percent-encoding +non-ASCII characters in URLs). But a conforming implementation +can use a different renderer and may choose not to +percent-encode non-ASCII characters in URLs. + +This document is generated from a text file, `spec.txt`, written +in Markdown with a small extension for the side-by-side tests. +The script `tools/makespec.py` can be used to convert `spec.txt` into +HTML or CommonMark (which can then be converted into other formats). + +In the examples, the `→` character is used to represent tabs. + +# Preliminaries + +## Characters and lines + +Any sequence of [characters] is a valid CommonMark +document. + +A [character](@) is a Unicode code point. Although some +code points (for example, combining accents) do not correspond to +characters in an intuitive sense, all code points count as characters +for purposes of this spec. + +This spec does not specify an encoding; it thinks of lines as composed +of [characters] rather than bytes. A conforming parser may be limited +to a certain encoding. + +A [line](@) is a sequence of zero or more [characters] +other than line feed (`U+000A`) or carriage return (`U+000D`), +followed by a [line ending] or by the end of file. + +A [line ending](@) is a line feed (`U+000A`), a carriage return +(`U+000D`) not followed by a line feed, or a carriage return and a +following line feed. + +A line containing no characters, or a line containing only spaces +(`U+0020`) or tabs (`U+0009`), is called a [blank line](@). + +The following definitions of character classes will be used in this spec: + +A [Unicode whitespace character](@) is +any code point in the Unicode `Zs` general category, or a tab (`U+0009`), +line feed (`U+000A`), form feed (`U+000C`), or carriage return (`U+000D`). + +[Unicode whitespace](@) is a sequence of one or more +[Unicode whitespace characters]. + +A [tab](@) is `U+0009`. + +A [space](@) is `U+0020`. + +An [ASCII control character](@) is a character between `U+0000–1F` (both +including) or `U+007F`. + +An [ASCII punctuation character](@) +is `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`, +`*`, `+`, `,`, `-`, `.`, `/` (U+0021–2F), +`:`, `;`, `<`, `=`, `>`, `?`, `@` (U+003A–0040), +`[`, `\`, `]`, `^`, `_`, `` ` `` (U+005B–0060), +`{`, `|`, `}`, or `~` (U+007B–007E). + +A [Unicode punctuation character](@) is an [ASCII +punctuation character] or anything in +the general Unicode categories `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po`, or `Ps`. + +## Tabs + +Tabs in lines are not expanded to [spaces]. However, +in contexts where spaces help to define block structure, +tabs behave as if they were replaced by spaces with a tab stop +of 4 characters. + +Thus, for example, a tab can be used instead of four spaces +in an indented code block. (Note, however, that internal +tabs are passed through as literal tabs, not expanded to +spaces.) + +```````````````````````````````` example +→foo→baz→→bim +. +<pre><code>foo→baz→→bim +</code></pre> +```````````````````````````````` + +```````````````````````````````` example + →foo→baz→→bim +. +<pre><code>foo→baz→→bim +</code></pre> +```````````````````````````````` + +```````````````````````````````` example + a→a + ὐ→a +. +<pre><code>a→a +ὐ→a +</code></pre> +```````````````````````````````` + +In the following example, a continuation paragraph of a list +item is indented with a tab; this has exactly the same effect +as indentation with four spaces would: + +```````````````````````````````` example + - foo + +→bar +. +<ul> +<li> +<p>foo</p> +<p>bar</p> +</li> +</ul> +```````````````````````````````` + +```````````````````````````````` example +- foo + +→→bar +. +<ul> +<li> +<p>foo</p> +<pre><code> bar +</code></pre> +</li> +</ul> +```````````````````````````````` + +Normally the `>` that begins a block quote may be followed +optionally by a space, which is not considered part of the +content. In the following case `>` is followed by a tab, +which is treated as if it were expanded into three spaces. +Since one of these spaces is considered part of the +delimiter, `foo` is considered to be indented six spaces +inside the block quote context, so we get an indented +code block starting with two spaces. + +```````````````````````````````` example +>→→foo +. +<blockquote> +<pre><code> foo +</code></pre> +</blockquote> +```````````````````````````````` + +```````````````````````````````` example +-→→foo +. +<ul> +<li> +<pre><code> foo +</code></pre> +</li> +</ul> +```````````````````````````````` + + +```````````````````````````````` example + foo +→bar +. +<pre><code>foo +bar +</code></pre> +```````````````````````````````` + +```````````````````````````````` example + - foo + - bar +→ - baz +. +<ul> +<li>foo +<ul> +<li>bar +<ul> +<li>baz</li> +</ul> +</li> +</ul> +</li> +</ul> +```````````````````````````````` + +```````````````````````````````` example +#→Foo +. +<h1>Foo</h1> +```````````````````````````````` + +```````````````````````````````` example +*→*→*→ +. +<hr /> +```````````````````````````````` + + +## Insecure characters + +For security reasons, the Unicode character `U+0000` must be replaced +with the REPLACEMENT CHARACTER (`U+FFFD`). + + +## Backslash escapes + +Any ASCII punctuation character may be backslash-escaped: + +```````````````````````````````` example +\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~ +. +<p>!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~</p> +```````````````````````````````` + + +Backslashes before other characters are treated as literal +backslashes: + +```````````````````````````````` example +\→\A\a\ \3\φ\« +. +<p>\→\A\a\ \3\φ\«</p> +```````````````````````````````` + + +Escaped characters are treated as regular characters and do +not have their usual Markdown meanings: + +```````````````````````````````` example +\*not emphasized* +\<br/> not a tag +\[not a link](/foo) +\`not code` +1\. not a list +\* not a list +\# not a heading +\[foo]: /url "not a reference" +\ö not a character entity +. +<p>*not emphasized* +<br/> not a tag +[not a link](/foo) +`not code` +1. not a list +* not a list +# not a heading +[foo]: /url "not a reference" +&ouml; not a character entity</p> +```````````````````````````````` + + +If a backslash is itself escaped, the following character is not: + +```````````````````````````````` example +\\*emphasis* +. +<p>\<em>emphasis</em></p> +```````````````````````````````` + + +A backslash at the end of the line is a [hard line break]: + +```````````````````````````````` example +foo\ +bar +. +<p>foo<br /> +bar</p> +```````````````````````````````` + + +Backslash escapes do not work in code blocks, code spans, autolinks, or +raw HTML: + +```````````````````````````````` example +`` \[\` `` +. +<p><code>\[\`</code></p> +```````````````````````````````` + + +```````````````````````````````` example + \[\] +. +<pre><code>\[\] +</code></pre> +```````````````````````````````` + + +```````````````````````````````` example +~~~ +\[\] +~~~ +. +<pre><code>\[\] +</code></pre> +```````````````````````````````` + + +```````````````````````````````` example +<http://example.com?find=\*> +. +<p><a href="http://example.com?find=%5C*">http://example.com?find=\*</a></p> +```````````````````````````````` + + +```````````````````````````````` example +<a href="/bar\/)"> +. +<a href="/bar\/)"> +```````````````````````````````` + + +But they work in all other contexts, including URLs and link titles, +link references, and [info strings] in [fenced code blocks]: + +```````````````````````````````` example +[foo](/bar\* "ti\*tle") +. +<p><a href="/bar*" title="ti*tle">foo</a></p> +```````````````````````````````` + + +```````````````````````````````` example +[foo] + +[foo]: /bar\* "ti\*tle" +. +<p><a href="/bar*" title="ti*tle">foo</a></p> +```````````````````````````````` + + +```````````````````````````````` example +``` foo\+bar +foo +``` +. +<pre><code class="language-foo+bar">foo +</code></pre> +```````````````````````````````` + + +## Entity and numeric character references + +Valid HTML entity references and numeric character references +can be used in place of the corresponding Unicode character, +with the following exceptions: + +- Entity and character references are not recognized in code + blocks and code spans. + +- Entity and character references cannot stand in place of + special characters that define structural elements in + CommonMark. For example, although `*` can be used + in place of a literal `*` character, `*` cannot replace + `*` in emphasis delimiters, bullet list markers, or thematic + breaks. + +Conforming CommonMark parsers need not store information about +whether a particular character was represented in the source +using a Unicode character or an entity reference. + +[Entity references](@) consist of `&` + any of the valid +HTML5 entity names + `;`. The +document <https://html.spec.whatwg.org/entities.json> +is used as an authoritative source for the valid entity +references and their corresponding code points. + +```````````````````````````````` example + & © Æ Ď +¾ ℋ ⅆ +∲ ≧̸ +. +<p> & © Æ Ď +¾ ℋ ⅆ +∲ ≧̸</p> +```````````````````````````````` + + +[Decimal numeric character +references](@) +consist of `&#` + a string of 1--7 arabic digits + `;`. A +numeric character reference is parsed as the corresponding +Unicode character. Invalid Unicode code points will be replaced by +the REPLACEMENT CHARACTER (`U+FFFD`). For security reasons, +the code point `U+0000` will also be replaced by `U+FFFD`. + +```````````````````````````````` example +# Ӓ Ϡ � +. +<p># Ӓ Ϡ �</p> +```````````````````````````````` + + +[Hexadecimal numeric character +references](@) consist of `&#` + +either `X` or `x` + a string of 1-6 hexadecimal digits + `;`. +They too are parsed as the corresponding Unicode character (this +time specified with a hexadecimal numeral instead of decimal). + +```````````````````````````````` example +" ആ ಫ +. +<p>" ആ ಫ</p> +```````````````````````````````` + + +Here are some nonentities: + +```````````````````````````````` example +  &x; &#; &#x; +� +&#abcdef0; +&ThisIsNotDefined; &hi?; +. +<p>&nbsp &x; &#; &#x; +&#87654321; +&#abcdef0; +&ThisIsNotDefined; &hi?;</p> +```````````````````````````````` + + +Although HTML5 does accept some entity references +without a trailing semicolon (such as `©`), these are not +recognized here, because it makes the grammar too ambiguous: + +```````````````````````````````` example +© +. +<p>&copy</p> +```````````````````````````````` + + +Strings that are not on the list of HTML5 named entities are not +recognized as entity references either: + +```````````````````````````````` example +&MadeUpEntity; +. +<p>&MadeUpEntity;</p> +```````````````````````````````` + + +Entity and numeric character references are recognized in any +context besides code spans or code blocks, including +URLs, [link titles], and [fenced code block][] [info strings]: + +```````````````````````````````` example +<a href="öö.html"> +. +<a href="öö.html"> +```````````````````````````````` + + +```````````````````````````````` example +[foo](/föö "föö") +. +<p><a href="/f%C3%B6%C3%B6" title="föö">foo</a></p> +```````````````````````````````` + + +```````````````````````````````` example +[foo] + +[foo]: /föö "föö" +. +<p><a href="/f%C3%B6%C3%B6" title="föö">foo</a></p> +```````````````````````````````` + + +```````````````````````````````` example +``` föö +foo +``` +. +<pre><code class="language-föö">foo +</code></pre> +```````````````````````````````` + + +Entity and numeric character references are treated as literal +text in code spans and code blocks: + +```````````````````````````````` example +`föö` +. +<p><code>f&ouml;&ouml;</code></p> +```````````````````````````````` + + +```````````````````````````````` example + föfö +. +<pre><code>f&ouml;f&ouml; +</code></pre> +```````````````````````````````` + + +Entity and numeric character references cannot be used +in place of symbols indicating structure in CommonMark +documents. + +```````````````````````````````` example +*foo* +*foo* +. +<p>*foo* +<em>foo</em></p> +```````````````````````````````` + +```````````````````````````````` example +* foo + +* foo +. +<p>* foo</p> +<ul> +<li>foo</li> +</ul> +```````````````````````````````` + +```````````````````````````````` example +foo bar +. +<p>foo + +bar</p> +```````````````````````````````` + +```````````````````````````````` example +	foo +. +<p>→foo</p> +```````````````````````````````` + + +```````````````````````````````` example +[a](url "tit") +. +<p>[a](url "tit")</p> +```````````````````````````````` + + + +# Blocks and inlines + +We can think of a document as a sequence of +[blocks](@)---structural elements like paragraphs, block +quotations, lists, headings, rules, and code blocks. Some blocks (like +block quotes and list items) contain other blocks; others (like +headings and paragraphs) contain [inline](@) content---text, +links, emphasized text, images, code spans, and so on. + +## Precedence + +Indicators of block structure always take precedence over indicators +of inline structure. So, for example, the following is a list with +two items, not a list with one item containing a code span: + +```````````````````````````````` example +- `one +- two` +. +<ul> +<li>`one</li> +<li>two`</li> +</ul> +```````````````````````````````` + + +This means that parsing can proceed in two steps: first, the block +structure of the document can be discerned; second, text lines inside +paragraphs, headings, and other block constructs can be parsed for inline +structure. The second step requires information about link reference +definitions that will be available only at the end of the first +step. Note that the first step requires processing lines in sequence, +but the second can be parallelized, since the inline parsing of +one block element does not affect the inline parsing of any other. + +## Container blocks and leaf blocks + +We can divide blocks into two types: +[container blocks](#container-blocks), +which can contain other blocks, and [leaf blocks](#leaf-blocks), +which cannot. + +# Leaf blocks + +This section describes the different kinds of leaf block that make up a +Markdown document. + +## Thematic breaks + +A line consisting of optionally up to three spaces of indentation, followed by a +sequence of three or more matching `-`, `_`, or `*` characters, each followed +optionally by any number of spaces or tabs, forms a +[thematic break](@). + +```````````````````````````````` example +*** +--- +___ +. +<hr /> +<hr /> +<hr /> +```````````````````````````````` + + +Wrong characters: + +```````````````````````````````` example ++++ +. +<p>+++</p> +```````````````````````````````` + + +```````````````````````````````` example +=== +. +<p>===</p> +```````````````````````````````` + + +Not enough characters: + +```````````````````````````````` example +-- +** +__ +. +<p>-- +** +__</p> +```````````````````````````````` + + +Up to three spaces of indentation are allowed: + +```````````````````````````````` example + *** + *** + *** +. +<hr /> +<hr /> +<hr /> +```````````````````````````````` + + +Four spaces of indentation is too many: + +```````````````````````````````` example + *** +. +<pre><code>*** +</code></pre> +```````````````````````````````` + + +```````````````````````````````` example +Foo + *** +. +<p>Foo +***</p> +```````````````````````````````` + + +More than three characters may be used: + +```````````````````````````````` example +_____________________________________ +. +<hr /> +```````````````````````````````` + + +Spaces and tabs are allowed between the characters: + +```````````````````````````````` example + - - - +. +<hr /> +```````````````````````````````` + + +```````````````````````````````` example + ** * ** * ** * ** +. +<hr /> +```````````````````````````````` + + +```````````````````````````````` example +- - - - +. +<hr /> +```````````````````````````````` + + +Spaces and tabs are allowed at the end: + +```````````````````````````````` example +- - - - +. +<hr /> +```````````````````````````````` + + +However, no other characters may occur in the line: + +```````````````````````````````` example +_ _ _ _ a + +a------ + +---a--- +. +<p>_ _ _ _ a</p> +<p>a------</p> +<p>---a---</p> +```````````````````````````````` + + +It is required that all of the characters other than spaces or tabs be the same. +So, this is not a thematic break: + +```````````````````````````````` example + *-* +. +<p><em>-</em></p> +```````````````````````````````` + + +Thematic breaks do not need blank lines before or after: + +```````````````````````````````` example +- foo +*** +- bar +. +<ul> +<li>foo</li> +</ul> +<hr /> +<ul> +<li>bar</li> +</ul> +```````````````````````````````` + + +Thematic breaks can interrupt a paragraph: + +```````````````````````````````` example +Foo +*** +bar +. +<p>Foo</p> +<hr /> +<p>bar</p> +```````````````````````````````` + + +If a line of dashes that meets the above conditions for being a +thematic break could also be interpreted as the underline of a [setext +heading], the interpretation as a +[setext heading] takes precedence. Thus, for example, +this is a setext heading, not a paragraph followed by a thematic break: + +```````````````````````````````` example +Foo +--- +bar +. +<h2>Foo</h2> +<p>bar</p> +```````````````````````````````` + + +When both a thematic break and a list item are possible +interpretations of a line, the thematic break takes precedence: + +```````````````````````````````` example +* Foo +* * * +* Bar +. +<ul> +<li>Foo</li> +</ul> +<hr /> +<ul> +<li>Bar</li> +</ul> +```````````````````````````````` + + +If you want a thematic break in a list item, use a different bullet: + +```````````````````````````````` example +- Foo +- * * * +. +<ul> +<li>Foo</li> +<li> +<hr /> +</li> +</ul> +```````````````````````````````` + + +## ATX headings + +An [ATX heading](@) +consists of a string of characters, parsed as inline content, between an +opening sequence of 1--6 unescaped `#` characters and an optional +closing sequence of any number of unescaped `#` characters. +The opening sequence of `#` characters must be followed by spaces or tabs, or +by the end of line. The optional closing sequence of `#`s must be preceded by +spaces or tabs and may be followed by spaces or tabs only. The opening +`#` character may be preceded by up to three spaces of indentation. The raw +contents of the heading are stripped of leading and trailing space or tabs +before being parsed as inline content. The heading level is equal to the number +of `#` characters in the opening sequence. + +Simple headings: + +```````````````````````````````` example +# foo +## foo +### foo +#### foo +##### foo +###### foo +. +<h1>foo</h1> +<h2>foo</h2> +<h3>foo</h3> +<h4>foo</h4> +<h5>foo</h5> +<h6>foo</h6> +```````````````````````````````` + + +More than six `#` characters is not a heading: + +```````````````````````````````` example +####### foo +. +<p>####### foo</p> +```````````````````````````````` + + +At least one space or tab is required between the `#` characters and the +heading's contents, unless the heading is empty. Note that many +implementations currently do not require the space. However, the +space was required by the +[original ATX implementation](http://www.aaronsw.com/2002/atx/atx.py), +and it helps prevent things like the following from being parsed as +headings: + +```````````````````````````````` example +#5 bolt + +#hashtag +. +<p>#5 bolt</p> +<p>#hashtag</p> +```````````````````````````````` + + +This is not a heading, because the first `#` is escaped: + +```````````````````````````````` example +\## foo +. +<p>## foo</p> +```````````````````````````````` + + +Contents are parsed as inlines: + +```````````````````````````````` example +# foo *bar* \*baz\* +. +<h1>foo <em>bar</em> *baz*</h1> +```````````````````````````````` + + +Leading and trailing spaces or tabs are ignored in parsing inline content: + +```````````````````````````````` example +# foo +. +<h1>foo</h1> +```````````````````````````````` + + +Up to three spaces of indentation are allowed: + +```````````````````````````````` example + ### foo + ## foo + # foo +. +<h3>foo</h3> +<h2>foo</h2> +<h1>foo</h1> +```````````````````````````````` + + +Four spaces of indentation is too many: + +```````````````````````````````` example + # foo +. +<pre><code># foo +</code></pre> +```````````````````````````````` + + +```````````````````````````````` example +foo + # bar +. +<p>foo +# bar</p> +```````````````````````````````` + + +A closing sequence of `#` characters is optional: + +```````````````````````````````` example +## foo ## + ### bar ### +. +<h2>foo</h2> +<h3>bar</h3> +```````````````````````````````` + + +It need not be the same length as the opening sequence: + +```````````````````````````````` example +# foo ################################## +##### foo ## +. +<h1>foo</h1> +<h5>foo</h5> +```````````````````````````````` + + +Spaces or tabs are allowed after the closing sequence: + +```````````````````````````````` example +### foo ### +. +<h3>foo</h3> +```````````````````````````````` + + +A sequence of `#` characters with anything but spaces or tabs following it +is not a closing sequence, but counts as part of the contents of the +heading: + +```````````````````````````````` example +### foo ### b +. +<h3>foo ### b</h3> +```````````````````````````````` + + +The closing sequence must be preceded by a space or tab: + +```````````````````````````````` example +# foo# +. +<h1>foo#</h1> +```````````````````````````````` + + +Backslash-escaped `#` characters do not count as part +of the closing sequence: + +```````````````````````````````` example +### foo \### +## foo #\## +# foo \# +. +<h3>foo ###</h3> +<h2>foo ###</h2> +<h1>foo #</h1> +```````````````````````````````` + + +ATX headings need not be separated from surrounding content by blank +lines, and they can interrupt paragraphs: + +```````````````````````````````` example +**** +## foo +**** +. +<hr /> +<h2>foo</h2> +<hr /> +```````````````````````````````` + + +```````````````````````````````` example +Foo bar +# baz +Bar foo +. +<p>Foo bar</p> +<h1>baz</h1> +<p>Bar foo</p> +```````````````````````````````` + + +ATX headings can be empty: + +```````````````````````````````` example +## +# +### ### +. +<h2></h2> +<h1></h1> +<h3></h3> +```````````````````````````````` + + +## Setext headings + +A [setext heading](@) consists of one or more +lines of text, not interrupted by a blank line, of which the first line does not +have more than 3 spaces of indentation, followed by +a [setext heading underline]. The lines of text must be such +that, were they not followed by the setext heading underline, +they would be interpreted as a paragraph: they cannot be +interpretable as a [code fence], [ATX heading][ATX headings], +[block quote][block quotes], [thematic break][thematic breaks], +[list item][list items], or [HTML block][HTML blocks]. + +A [setext heading underline](@) is a sequence of +`=` characters or a sequence of `-` characters, with no more than 3 +spaces of indentation and any number of trailing spaces or tabs. If a line +containing a single `-` can be interpreted as an +empty [list items], it should be interpreted this way +and not as a [setext heading underline]. + +The heading is a level 1 heading if `=` characters are used in +the [setext heading underline], and a level 2 heading if `-` +characters are used. The contents of the heading are the result +of parsing the preceding lines of text as CommonMark inline +content. + +In general, a setext heading need not be preceded or followed by a +blank line. However, it cannot interrupt a paragraph, so when a +setext heading comes after a paragraph, a blank line is needed between +them. + +Simple examples: + +```````````````````````````````` example +Foo *bar* +========= + +Foo *bar* +--------- +. +<h1>Foo <em>bar</em></h1> +<h2>Foo <em>bar</em></h2> +```````````````````````````````` + + +The content of the header may span more than one line: + +```````````````````````````````` example +Foo *bar +baz* +==== +. +<h1>Foo <em>bar +baz</em></h1> +```````````````````````````````` + +The contents are the result of parsing the headings's raw +content as inlines. The heading's raw content is formed by +concatenating the lines and removing initial and final +spaces or tabs. + +```````````````````````````````` example + Foo *bar +baz*→ +==== +. +<h1>Foo <em>bar +baz</em></h1> +```````````````````````````````` + + +The underlining can be any length: + +```````````````````````````````` example +Foo +------------------------- + +Foo += +. +<h2>Foo</h2> +<h1>Foo</h1> +```````````````````````````````` + + +The heading content can be preceded by up to three spaces of indentation, and +need not line up with the underlining: + +```````````````````````````````` example + Foo +--- + + Foo +----- + + Foo + === +. +<h2>Foo</h2> +<h2>Foo</h2> +<h1>Foo</h1> +```````````````````````````````` + + +Four spaces of indentation is too many: + +```````````````````````````````` example + Foo + --- + + Foo +--- +. +<pre><code>Foo +--- + +Foo +</code></pre> +<hr /> +```````````````````````````````` + + +The setext heading underline can be preceded by up to three spaces of +indentation, and may have trailing spaces or tabs: + +```````````````````````````````` example +Foo + ---- +. +<h2>Foo</h2> +```````````````````````````````` + + +Four spaces of indentation is too many: + +```````````````````````````````` example +Foo + --- +. +<p>Foo +---</p> +```````````````````````````````` + + +The setext heading underline cannot contain internal spaces or tabs: + +```````````````````````````````` example +Foo += = + +Foo +--- - +. +<p>Foo += =</p> +<p>Foo</p> +<hr /> +```````````````````````````````` + + +Trailing spaces or tabs in the content line do not cause a hard line break: + +```````````````````````````````` example +Foo +----- +. +<h2>Foo</h2> +```````````````````````````````` + + +Nor does a backslash at the end: + +```````````````````````````````` example +Foo\ +---- +. +<h2>Foo\</h2> +```````````````````````````````` + + +Since indicators of block structure take precedence over +indicators of inline structure, the following are setext headings: + +```````````````````````````````` example +`Foo +---- +` + +<a title="a lot +--- +of dashes"/> +. +<h2>`Foo</h2> +<p>`</p> +<h2><a title="a lot</h2> +<p>of dashes"/></p> +```````````````````````````````` + + +The setext heading underline cannot be a [lazy continuation +line] in a list item or block quote: + +```````````````````````````````` example +> Foo +--- +. +<blockquote> +<p>Foo</p> +</blockquote> +<hr /> +```````````````````````````````` + + +```````````````````````````````` example +> foo +bar +=== +. +<blockquote> +<p>foo +bar +===</p> +</blockquote> +```````````````````````````````` + + +```````````````````````````````` example +- Foo +--- +. +<ul> +<li>Foo</li> +</ul> +<hr /> +```````````````````````````````` + + +A blank line is needed between a paragraph and a following +setext heading, since otherwise the paragraph becomes part +of the heading's content: + +```````````````````````````````` example +Foo +Bar +--- +. +<h2>Foo +Bar</h2> +```````````````````````````````` + + +But in general a blank line is not required before or after +setext headings: + +```````````````````````````````` example +--- +Foo +--- +Bar +--- +Baz +. +<hr /> +<h2>Foo</h2> +<h2>Bar</h2> +<p>Baz</p> +```````````````````````````````` + + +Setext headings cannot be empty: + +```````````````````````````````` example + +==== +. +<p>====</p> +```````````````````````````````` + + +Setext heading text lines must not be interpretable as block +constructs other than paragraphs. So, the line of dashes +in these examples gets interpreted as a thematic break: + +```````````````````````````````` example +--- +--- +. +<hr /> +<hr /> +```````````````````````````````` + + +```````````````````````````````` example +- foo +----- +. +<ul> +<li>foo</li> +</ul> +<hr /> +```````````````````````````````` + + +```````````````````````````````` example + foo +--- +. +<pre><code>foo +</code></pre> +<hr /> +```````````````````````````````` + + +```````````````````````````````` example +> foo +----- +. +<blockquote> +<p>foo</p> +</blockquote> +<hr /> +```````````````````````````````` + + +If you want a heading with `> foo` as its literal text, you can +use backslash escapes: + +```````````````````````````````` example +\> foo +------ +. +<h2>> foo</h2> +```````````````````````````````` + + +**Compatibility note:** Most existing Markdown implementations +do not allow the text of setext headings to span multiple lines. +But there is no consensus about how to interpret + +``` markdown +Foo +bar +--- +baz +``` + +One can find four different interpretations: + +1. paragraph "Foo", heading "bar", paragraph "baz" +2. paragraph "Foo bar", thematic break, paragraph "baz" +3. paragraph "Foo bar --- baz" +4. heading "Foo bar", paragraph "baz" + +We find interpretation 4 most natural, and interpretation 4 +increases the expressive power of CommonMark, by allowing +multiline headings. Authors who want interpretation 1 can +put a blank line after the first paragraph: + +```````````````````````````````` example +Foo + +bar +--- +baz +. +<p>Foo</p> +<h2>bar</h2> +<p>baz</p> +```````````````````````````````` + + +Authors who want interpretation 2 can put blank lines around +the thematic break, + +```````````````````````````````` example +Foo +bar + +--- + +baz +. +<p>Foo +bar</p> +<hr /> +<p>baz</p> +```````````````````````````````` + + +or use a thematic break that cannot count as a [setext heading +underline], such as + +```````````````````````````````` example +Foo +bar +* * * +baz +. +<p>Foo +bar</p> +<hr /> +<p>baz</p> +```````````````````````````````` + + +Authors who want interpretation 3 can use backslash escapes: + +```````````````````````````````` example +Foo +bar +\--- +baz +. +<p>Foo +bar +--- +baz</p> +```````````````````````````````` + + +## Indented code blocks + +An [indented code block](@) is composed of one or more +[indented chunks] separated by blank lines. +An [indented chunk](@) is a sequence of non-blank lines, +each preceded by four or more spaces of indentation. The contents of the code +block are the literal contents of the lines, including trailing +[line endings], minus four spaces of indentation. +An indented code block has no [info string]. + +An indented code block cannot interrupt a paragraph, so there must be +a blank line between a paragraph and a following indented code block. +(A blank line is not needed, however, between a code block and a following +paragraph.) + +```````````````````````````````` example + a simple + indented code block +. +<pre><code>a simple + indented code block +</code></pre> +```````````````````````````````` + + +If there is any ambiguity between an interpretation of indentation +as a code block and as indicating that material belongs to a [list +item][list items], the list item interpretation takes precedence: + +```````````````````````````````` example + - foo + + bar +. +<ul> +<li> +<p>foo</p> +<p>bar</p> +</li> +</ul> +```````````````````````````````` + + +```````````````````````````````` example +1. foo + + - bar +. +<ol> +<li> +<p>foo</p> +<ul> +<li>bar</li> +</ul> +</li> +</ol> +```````````````````````````````` + + + +The contents of a code block are literal text, and do not get parsed +as Markdown: + +```````````````````````````````` example + <a/> + *hi* + + - one +. +<pre><code><a/> +*hi* + +- one +</code></pre> +```````````````````````````````` + + +Here we have three chunks separated by blank lines: + +```````````````````````````````` example + chunk1 + + chunk2 + + + + chunk3 +. +<pre><code>chunk1 + +chunk2 + + + +chunk3 +</code></pre> +```````````````````````````````` + + +Any initial spaces or tabs beyond four spaces of indentation will be included in +the content, even in interior blank lines: + +```````````````````````````````` example + chunk1 + + chunk2 +. +<pre><code>chunk1 + + chunk2 +</code></pre> +```````````````````````````````` + + +An indented code block cannot interrupt a paragraph. (This +allows hanging indents and the like.) + +```````````````````````````````` example +Foo + bar + +. +<p>Foo +bar</p> +```````````````````````````````` + + +However, any non-blank line with fewer than four spaces of indentation ends +the code block immediately. So a paragraph may occur immediately +after indented code: + +```````````````````````````````` example + foo +bar +. +<pre><code>foo +</code></pre> +<p>bar</p> +```````````````````````````````` + + +And indented code can occur immediately before and after other kinds of +blocks: + +```````````````````````````````` example +# Heading + foo +Heading +------ + foo +---- +. +<h1>Heading</h1> +<pre><code>foo +</code></pre> +<h2>Heading</h2> +<pre><code>foo +</code></pre> +<hr /> +```````````````````````````````` + + +The first line can be preceded by more than four spaces of indentation: + +```````````````````````````````` example + foo + bar +. +<pre><code> foo +bar +</code></pre> +```````````````````````````````` + + +Blank lines preceding or following an indented code block +are not included in it: + +```````````````````````````````` example + + + foo + + +. +<pre><code>foo +</code></pre> +```````````````````````````````` + + +Trailing spaces or tabs are included in the code block's content: + +```````````````````````````````` example + foo +. +<pre><code>foo +</code></pre> +```````````````````````````````` + + + +## Fenced code blocks + +A [code fence](@) is a sequence +of at least three consecutive backtick characters (`` ` ``) or +tildes (`~`). (Tildes and backticks cannot be mixed.) +A [fenced code block](@) +begins with a code fence, preceded by up to three spaces of indentation. + +The line with the opening code fence may optionally contain some text +following the code fence; this is trimmed of leading and trailing +spaces or tabs and called the [info string](@). If the [info string] comes +after a backtick fence, it may not contain any backtick +characters. (The reason for this restriction is that otherwise +some inline code would be incorrectly interpreted as the +beginning of a fenced code block.) + +The content of the code block consists of all subsequent lines, until +a closing [code fence] of the same type as the code block +began with (backticks or tildes), and with at least as many backticks +or tildes as the opening code fence. If the leading code fence is +preceded by N spaces of indentation, then up to N spaces of indentation are +removed from each line of the content (if present). (If a content line is not +indented, it is preserved unchanged. If it is indented N spaces or less, all +of the indentation is removed.) + +The closing code fence may be preceded by up to three spaces of indentation, and +may be followed only by spaces or tabs, which are ignored. If the end of the +containing block (or document) is reached and no closing code fence +has been found, the code block contains all of the lines after the +opening code fence until the end of the containing block (or +document). (An alternative spec would require backtracking in the +event that a closing code fence is not found. But this makes parsing +much less efficient, and there seems to be no real down side to the +behavior described here.) + +A fenced code block may interrupt a paragraph, and does not require +a blank line either before or after. + +The content of a code fence is treated as literal text, not parsed +as inlines. The first word of the [info string] is typically used to +specify the language of the code sample, and rendered in the `class` +attribute of the `code` tag. However, this spec does not mandate any +particular treatment of the [info string]. + +Here is a simple example with backticks: + +```````````````````````````````` example +``` +< + > +``` +. +<pre><code>< + > +</code></pre> +```````````````````````````````` + + +With tildes: + +```````````````````````````````` example +~~~ +< + > +~~~ +. +<pre><code>< + > +</code></pre> +```````````````````````````````` + +Fewer than three backticks is not enough: + +```````````````````````````````` example +`` +foo +`` +. +<p><code>foo</code></p> +```````````````````````````````` + +The closing code fence must use the same character as the opening +fence: + +```````````````````````````````` example +``` +aaa +~~~ +``` +. +<pre><code>aaa +~~~ +</code></pre> +```````````````````````````````` + + +```````````````````````````````` example +~~~ +aaa +``` +~~~ +. +<pre><code>aaa +``` +</code></pre> +```````````````````````````````` + + +The closing code fence must be at least as long as the opening fence: + +```````````````````````````````` example +```` +aaa +``` +`````` +. +<pre><code>aaa +``` +</code></pre> +```````````````````````````````` + + +```````````````````````````````` example +~~~~ +aaa +~~~ +~~~~ +. +<pre><code>aaa +~~~ +</code></pre> +```````````````````````````````` + + +Unclosed code blocks are closed by the end of the document +(or the enclosing [block quote][block quotes] or [list item][list items]): + +```````````````````````````````` example +``` +. +<pre><code></code></pre> +```````````````````````````````` + + +```````````````````````````````` example +````` + +``` +aaa +. +<pre><code> +``` +aaa +</code></pre> +```````````````````````````````` + + +```````````````````````````````` example +> ``` +> aaa + +bbb +. +<blockquote> +<pre><code>aaa +</code></pre> +</blockquote> +<p>bbb</p> +```````````````````````````````` + + +A code block can have all empty lines as its content: + +```````````````````````````````` example +``` + + +``` +. +<pre><code> + +</code></pre> +```````````````````````````````` + + +A code block can be empty: + +```````````````````````````````` example +``` +``` +. +<pre><code></code></pre> +```````````````````````````````` + + +Fences can be indented. If the opening fence is indented, +content lines will have equivalent opening indentation removed, +if present: + +```````````````````````````````` example + ``` + aaa +aaa +``` +. +<pre><code>aaa +aaa +</code></pre> +```````````````````````````````` + + +```````````````````````````````` example + ``` +aaa + aaa +aaa + ``` +. +<pre><code>aaa +aaa +aaa +</code></pre> +```````````````````````````````` + + +```````````````````````````````` example + ``` + aaa + aaa + aaa + ``` +. +<pre><code>aaa + aaa +aaa +</code></pre> +```````````````````````````````` + + +Four spaces of indentation is too many: + +```````````````````````````````` example + ``` + aaa + ``` +. +<pre><code>``` +aaa +``` +</code></pre> +```````````````````````````````` + + +Closing fences may be preceded by up to three spaces of indentation, and their +indentation need not match that of the opening fence: + +```````````````````````````````` example +``` +aaa + ``` +. +<pre><code>aaa +</code></pre> +```````````````````````````````` + + +```````````````````````````````` example + ``` +aaa + ``` +. +<pre><code>aaa +</code></pre> +```````````````````````````````` + + +This is not a closing fence, because it is indented 4 spaces: + +```````````````````````````````` example +``` +aaa + ``` +. +<pre><code>aaa + ``` +</code></pre> +```````````````````````````````` + + + +Code fences (opening and closing) cannot contain internal spaces or tabs: + +```````````````````````````````` example +``` ``` +aaa +. +<p><code> </code> +aaa</p> +```````````````````````````````` + + +```````````````````````````````` example +~~~~~~ +aaa +~~~ ~~ +. +<pre><code>aaa +~~~ ~~ +</code></pre> +```````````````````````````````` + + +Fenced code blocks can interrupt paragraphs, and can be followed +directly by paragraphs, without a blank line between: + +```````````````````````````````` example +foo +``` +bar +``` +baz +. +<p>foo</p> +<pre><code>bar +</code></pre> +<p>baz</p> +```````````````````````````````` + + +Other blocks can also occur before and after fenced code blocks +without an intervening blank line: + +```````````````````````````````` example +foo +--- +~~~ +bar +~~~ +# baz +. +<h2>foo</h2> +<pre><code>bar +</code></pre> +<h1>baz</h1> +```````````````````````````````` + + +An [info string] can be provided after the opening code fence. +Although this spec doesn't mandate any particular treatment of +the info string, the first word is typically used to specify +the language of the code block. In HTML output, the language is +normally indicated by adding a class to the `code` element consisting +of `language-` followed by the language name. + +```````````````````````````````` example +```ruby +def foo(x) + return 3 +end +``` +. +<pre><code class="language-ruby">def foo(x) + return 3 +end +</code></pre> +```````````````````````````````` + + +```````````````````````````````` example +~~~~ ruby startline=3 $%@#$ +def foo(x) + return 3 +end +~~~~~~~ +. +<pre><code class="language-ruby">def foo(x) + return 3 +end +</code></pre> +```````````````````````````````` + + +```````````````````````````````` example +````; +```` +. +<pre><code class="language-;"></code></pre> +```````````````````````````````` + + +[Info strings] for backtick code blocks cannot contain backticks: + +```````````````````````````````` example +``` aa ``` +foo +. +<p><code>aa</code> +foo</p> +```````````````````````````````` + + +[Info strings] for tilde code blocks can contain backticks and tildes: + +```````````````````````````````` example +~~~ aa ``` ~~~ +foo +~~~ +. +<pre><code class="language-aa">foo +</code></pre> +```````````````````````````````` + + +Closing code fences cannot have [info strings]: + +```````````````````````````````` example +``` +``` aaa +``` +. +<pre><code>``` aaa +</code></pre> +```````````````````````````````` + + + +## HTML blocks + +An [HTML block](@) is a group of lines that is treated +as raw HTML (and will not be escaped in HTML output). + +There are seven kinds of [HTML block], which can be defined by their +start and end conditions. The block begins with a line that meets a +[start condition](@) (after up to three optional spaces of indentation). +It ends with the first subsequent line that meets a matching +[end condition](@), or the last line of the document, or the last line of +the [container block](#container-blocks) containing the current HTML +block, if no line is encountered that meets the [end condition]. If +the first line meets both the [start condition] and the [end +condition], the block will contain just that line. + +1. **Start condition:** line begins with the string `<pre`, +`<script`, `<style`, or `<textarea` (case-insensitive), followed by a space, +a tab, the string `>`, or the end of the line.\ +**End condition:** line contains an end tag +`</pre>`, `</script>`, `</style>`, or `</textarea>` (case-insensitive; it +need not match the start tag). + +2. **Start condition:** line begins with the string `<!--`.\ +**End condition:** line contains the string `-->`. + +3. **Start condition:** line begins with the string `<?`.\ +**End condition:** line contains the string `?>`. + +4. **Start condition:** line begins with the string `<!` +followed by an ASCII letter.\ +**End condition:** line contains the character `>`. + +5. **Start condition:** line begins with the string +`<![CDATA[`.\ +**End condition:** line contains the string `]]>`. + +6. **Start condition:** line begins the string `<` or `</` +followed by one of the strings (case-insensitive) `address`, +`article`, `aside`, `base`, `basefont`, `blockquote`, `body`, +`caption`, `center`, `col`, `colgroup`, `dd`, `details`, `dialog`, +`dir`, `div`, `dl`, `dt`, `fieldset`, `figcaption`, `figure`, +`footer`, `form`, `frame`, `frameset`, +`h1`, `h2`, `h3`, `h4`, `h5`, `h6`, `head`, `header`, `hr`, +`html`, `iframe`, `legend`, `li`, `link`, `main`, `menu`, `menuitem`, +`nav`, `noframes`, `ol`, `optgroup`, `option`, `p`, `param`, +`section`, `source`, `summary`, `table`, `tbody`, `td`, +`tfoot`, `th`, `thead`, `title`, `tr`, `track`, `ul`, followed +by a space, a tab, the end of the line, the string `>`, or +the string `/>`.\ +**End condition:** line is followed by a [blank line]. + +7. **Start condition:** line begins with a complete [open tag] +(with any [tag name] other than `pre`, `script`, +`style`, or `textarea`) or a complete [closing tag], +followed by zero or more spaces and tabs, followed by the end of the line.\ +**End condition:** line is followed by a [blank line]. + +HTML blocks continue until they are closed by their appropriate +[end condition], or the last line of the document or other [container +block](#container-blocks). This means any HTML **within an HTML +block** that might otherwise be recognised as a start condition will +be ignored by the parser and passed through as-is, without changing +the parser's state. + +For instance, `<pre>` within an HTML block started by `<table>` will not affect +the parser state; as the HTML block was started in by start condition 6, it +will end at any blank line. This can be surprising: + +```````````````````````````````` example +<table><tr><td> +<pre> +**Hello**, + +_world_. +</pre> +</td></tr></table> +. +<table><tr><td> +<pre> +**Hello**, +<p><em>world</em>. +</pre></p> +</td></tr></table> +```````````````````````````````` + +In this case, the HTML block is terminated by the blank line — the `**Hello**` +text remains verbatim — and regular parsing resumes, with a paragraph, +emphasised `world` and inline and block HTML following. + +All types of [HTML blocks] except type 7 may interrupt +a paragraph. Blocks of type 7 may not interrupt a paragraph. +(This restriction is intended to prevent unwanted interpretation +of long tags inside a wrapped paragraph as starting HTML blocks.) + +Some simple examples follow. Here are some basic HTML blocks +of type 6: + +```````````````````````````````` example +<table> + <tr> + <td> + hi + </td> + </tr> +</table> + +okay. +. +<table> + <tr> + <td> + hi + </td> + </tr> +</table> +<p>okay.</p> +```````````````````````````````` + + +```````````````````````````````` example + <div> + *hello* + <foo><a> +. + <div> + *hello* + <foo><a> +```````````````````````````````` + + +A block can also start with a closing tag: + +```````````````````````````````` example +</div> +*foo* +. +</div> +*foo* +```````````````````````````````` + + +Here we have two HTML blocks with a Markdown paragraph between them: + +```````````````````````````````` example +<DIV CLASS="foo"> + +*Markdown* + +</DIV> +. +<DIV CLASS="foo"> +<p><em>Markdown</em></p> +</DIV> +```````````````````````````````` + + +The tag on the first line can be partial, as long +as it is split where there would be whitespace: + +```````````````````````````````` example +<div id="foo" + class="bar"> +</div> +. +<div id="foo" + class="bar"> +</div> +```````````````````````````````` + + +```````````````````````````````` example +<div id="foo" class="bar + baz"> +</div> +. +<div id="foo" class="bar + baz"> +</div> +```````````````````````````````` + + +An open tag need not be closed: +```````````````````````````````` example +<div> +*foo* + +*bar* +. +<div> +*foo* +<p><em>bar</em></p> +```````````````````````````````` + + + +A partial tag need not even be completed (garbage +in, garbage out): + +```````````````````````````````` example +<div id="foo" +*hi* +. +<div id="foo" +*hi* +```````````````````````````````` + + +```````````````````````````````` example +<div class +foo +. +<div class +foo +```````````````````````````````` + + +The initial tag doesn't even need to be a valid +tag, as long as it starts like one: + +```````````````````````````````` example +<div *???-&&&-<--- +*foo* +. +<div *???-&&&-<--- +*foo* +```````````````````````````````` + + +In type 6 blocks, the initial tag need not be on a line by +itself: + +```````````````````````````````` example +<div><a href="bar">*foo*</a></div> +. +<div><a href="bar">*foo*</a></div> +```````````````````````````````` + + +```````````````````````````````` example +<table><tr><td> +foo +</td></tr></table> +. +<table><tr><td> +foo +</td></tr></table> +```````````````````````````````` + + +Everything until the next blank line or end of document +gets included in the HTML block. So, in the following +example, what looks like a Markdown code block +is actually part of the HTML block, which continues until a blank +line or the end of the document is reached: + +```````````````````````````````` example +<div></div> +``` c +int x = 33; +``` +. +<div></div> +``` c +int x = 33; +``` +```````````````````````````````` + + +To start an [HTML block] with a tag that is *not* in the +list of block-level tags in (6), you must put the tag by +itself on the first line (and it must be complete): + +```````````````````````````````` example +<a href="foo"> +*bar* +</a> +. +<a href="foo"> +*bar* +</a> +```````````````````````````````` + + +In type 7 blocks, the [tag name] can be anything: + +```````````````````````````````` example +<Warning> +*bar* +</Warning> +. +<Warning> +*bar* +</Warning> +```````````````````````````````` + + +```````````````````````````````` example +<i class="foo"> +*bar* +</i> +. +<i class="foo"> +*bar* +</i> +```````````````````````````````` + + +```````````````````````````````` example +</ins> +*bar* +. +</ins> +*bar* +```````````````````````````````` + + +These rules are designed to allow us to work with tags that +can function as either block-level or inline-level tags. +The `<del>` tag is a nice example. We can surround content with +`<del>` tags in three different ways. In this case, we get a raw +HTML block, because the `<del>` tag is on a line by itself: + +```````````````````````````````` example +<del> +*foo* +</del> +. +<del> +*foo* +</del> +```````````````````````````````` + + +In this case, we get a raw HTML block that just includes +the `<del>` tag (because it ends with the following blank +line). So the contents get interpreted as CommonMark: + +```````````````````````````````` example +<del> + +*foo* + +</del> +. +<del> +<p><em>foo</em></p> +</del> +```````````````````````````````` + + +Finally, in this case, the `<del>` tags are interpreted +as [raw HTML] *inside* the CommonMark paragraph. (Because +the tag is not on a line by itself, we get inline HTML +rather than an [HTML block].) + +```````````````````````````````` example +<del>*foo*</del> +. +<p><del><em>foo</em></del></p> +```````````````````````````````` + + +HTML tags designed to contain literal content +(`pre`, `script`, `style`, `textarea`), comments, processing instructions, +and declarations are treated somewhat differently. +Instead of ending at the first blank line, these blocks +end at the first line containing a corresponding end tag. +As a result, these blocks can contain blank lines: + +A pre tag (type 1): + +```````````````````````````````` example +<pre language="haskell"><code> +import Text.HTML.TagSoup + +main :: IO () +main = print $ parseTags tags +</code></pre> +okay +. +<pre language="haskell"><code> +import Text.HTML.TagSoup + +main :: IO () +main = print $ parseTags tags +</code></pre> +<p>okay</p> +```````````````````````````````` + + +A script tag (type 1): + +```````````````````````````````` example +<script type="text/javascript"> +// JavaScript example + +document.getElementById("demo").innerHTML = "Hello JavaScript!"; +</script> +okay +. +<script type="text/javascript"> +// JavaScript example + +document.getElementById("demo").innerHTML = "Hello JavaScript!"; +</script> +<p>okay</p> +```````````````````````````````` + + +A textarea tag (type 1): + +```````````````````````````````` example +<textarea> + +*foo* + +_bar_ + +</textarea> +. +<textarea> + +*foo* + +_bar_ + +</textarea> +```````````````````````````````` + +A style tag (type 1): + +```````````````````````````````` example +<style + type="text/css"> +h1 {color:red;} + +p {color:blue;} +</style> +okay +. +<style + type="text/css"> +h1 {color:red;} + +p {color:blue;} +</style> +<p>okay</p> +```````````````````````````````` + + +If there is no matching end tag, the block will end at the +end of the document (or the enclosing [block quote][block quotes] +or [list item][list items]): + +```````````````````````````````` example +<style + type="text/css"> + +foo +. +<style + type="text/css"> + +foo +```````````````````````````````` + + +```````````````````````````````` example +> <div> +> foo + +bar +. +<blockquote> +<div> +foo +</blockquote> +<p>bar</p> +```````````````````````````````` + + +```````````````````````````````` example +- <div> +- foo +. +<ul> +<li> +<div> +</li> +<li>foo</li> +</ul> +```````````````````````````````` + + +The end tag can occur on the same line as the start tag: + +```````````````````````````````` example +<style>p{color:red;}</style> +*foo* +. +<style>p{color:red;}</style> +<p><em>foo</em></p> +```````````````````````````````` + + +```````````````````````````````` example +<!-- foo -->*bar* +*baz* +. +<!-- foo -->*bar* +<p><em>baz</em></p> +```````````````````````````````` + + +Note that anything on the last line after the +end tag will be included in the [HTML block]: + +```````````````````````````````` example +<script> +foo +</script>1. *bar* +. +<script> +foo +</script>1. *bar* +```````````````````````````````` + + +A comment (type 2): + +```````````````````````````````` example +<!-- Foo + +bar + baz --> +okay +. +<!-- Foo + +bar + baz --> +<p>okay</p> +```````````````````````````````` + + + +A processing instruction (type 3): + +```````````````````````````````` example +<?php + + echo '>'; + +?> +okay +. +<?php + + echo '>'; + +?> +<p>okay</p> +```````````````````````````````` + + +A declaration (type 4): + +```````````````````````````````` example +<!DOCTYPE html> +. +<!DOCTYPE html> +```````````````````````````````` + + +CDATA (type 5): + +```````````````````````````````` example +<![CDATA[ +function matchwo(a,b) +{ + if (a < b && a < 0) then { + return 1; + + } else { + + return 0; + } +} +]]> +okay +. +<![CDATA[ +function matchwo(a,b) +{ + if (a < b && a < 0) then { + return 1; + + } else { + + return 0; + } +} +]]> +<p>okay</p> +```````````````````````````````` + + +The opening tag can be preceded by up to three spaces of indentation, but not +four: + +```````````````````````````````` example + <!-- foo --> + + <!-- foo --> +. + <!-- foo --> +<pre><code><!-- foo --> +</code></pre> +```````````````````````````````` + + +```````````````````````````````` example + <div> + + <div> +. + <div> +<pre><code><div> +</code></pre> +```````````````````````````````` + + +An HTML block of types 1--6 can interrupt a paragraph, and need not be +preceded by a blank line. + +```````````````````````````````` example +Foo +<div> +bar +</div> +. +<p>Foo</p> +<div> +bar +</div> +```````````````````````````````` + + +However, a following blank line is needed, except at the end of +a document, and except for blocks of types 1--5, [above][HTML +block]: + +```````````````````````````````` example +<div> +bar +</div> +*foo* +. +<div> +bar +</div> +*foo* +```````````````````````````````` + + +HTML blocks of type 7 cannot interrupt a paragraph: + +```````````````````````````````` example +Foo +<a href="bar"> +baz +. +<p>Foo +<a href="bar"> +baz</p> +```````````````````````````````` + + +This rule differs from John Gruber's original Markdown syntax +specification, which says: + +> The only restrictions are that block-level HTML elements — +> e.g. `<div>`, `<table>`, `<pre>`, `<p>`, etc. — must be separated from +> surrounding content by blank lines, and the start and end tags of the +> block should not be indented with spaces or tabs. + +In some ways Gruber's rule is more restrictive than the one given +here: + +- It requires that an HTML block be preceded by a blank line. +- It does not allow the start tag to be indented. +- It requires a matching end tag, which it also does not allow to + be indented. + +Most Markdown implementations (including some of Gruber's own) do not +respect all of these restrictions. + +There is one respect, however, in which Gruber's rule is more liberal +than the one given here, since it allows blank lines to occur inside +an HTML block. There are two reasons for disallowing them here. +First, it removes the need to parse balanced tags, which is +expensive and can require backtracking from the end of the document +if no matching end tag is found. Second, it provides a very simple +and flexible way of including Markdown content inside HTML tags: +simply separate the Markdown from the HTML using blank lines: + +Compare: + +```````````````````````````````` example +<div> + +*Emphasized* text. + +</div> +. +<div> +<p><em>Emphasized</em> text.</p> +</div> +```````````````````````````````` + + +```````````````````````````````` example +<div> +*Emphasized* text. +</div> +. +<div> +*Emphasized* text. +</div> +```````````````````````````````` + + +Some Markdown implementations have adopted a convention of +interpreting content inside tags as text if the open tag has +the attribute `markdown=1`. The rule given above seems a simpler and +more elegant way of achieving the same expressive power, which is also +much simpler to parse. + +The main potential drawback is that one can no longer paste HTML +blocks into Markdown documents with 100% reliability. However, +*in most cases* this will work fine, because the blank lines in +HTML are usually followed by HTML block tags. For example: + +```````````````````````````````` example +<table> + +<tr> + +<td> +Hi +</td> + +</tr> + +</table> +. +<table> +<tr> +<td> +Hi +</td> +</tr> +</table> +```````````````````````````````` + + +There are problems, however, if the inner tags are indented +*and* separated by spaces, as then they will be interpreted as +an indented code block: + +```````````````````````````````` example +<table> + + <tr> + + <td> + Hi + </td> + + </tr> + +</table> +. +<table> + <tr> +<pre><code><td> + Hi +</td> +</code></pre> + </tr> +</table> +```````````````````````````````` + + +Fortunately, blank lines are usually not necessary and can be +deleted. The exception is inside `<pre>` tags, but as described +[above][HTML blocks], raw HTML blocks starting with `<pre>` +*can* contain blank lines. + +## Link reference definitions + +A [link reference definition](@) +consists of a [link label], optionally preceded by up to three spaces of +indentation, followed +by a colon (`:`), optional spaces or tabs (including up to one +[line ending]), a [link destination], +optional spaces or tabs (including up to one +[line ending]), and an optional [link +title], which if it is present must be separated +from the [link destination] by spaces or tabs. +No further character may occur. + +A [link reference definition] +does not correspond to a structural element of a document. Instead, it +defines a label which can be used in [reference links] +and reference-style [images] elsewhere in the document. [Link +reference definitions] can come either before or after the links that use +them. + +```````````````````````````````` example +[foo]: /url "title" + +[foo] +. +<p><a href="/url" title="title">foo</a></p> +```````````````````````````````` + + +```````````````````````````````` example + [foo]: + /url + 'the title' + +[foo] +. +<p><a href="/url" title="the title">foo</a></p> +```````````````````````````````` + + +```````````````````````````````` example +[Foo*bar\]]:my_(url) 'title (with parens)' + +[Foo*bar\]] +. +<p><a href="my_(url)" title="title (with parens)">Foo*bar]</a></p> +```````````````````````````````` + + +```````````````````````````````` example +[Foo bar]: +<my url> +'title' + +[Foo bar] +. +<p><a href="my%20url" title="title">Foo bar</a></p> +```````````````````````````````` + + +The title may extend over multiple lines: + +```````````````````````````````` example +[foo]: /url ' +title +line1 +line2 +' + +[foo] +. +<p><a href="/url" title=" +title +line1 +line2 +">foo</a></p> +```````````````````````````````` + + +However, it may not contain a [blank line]: + +```````````````````````````````` example +[foo]: /url 'title + +with blank line' + +[foo] +. +<p>[foo]: /url 'title</p> +<p>with blank line'</p> +<p>[foo]</p> +```````````````````````````````` + + +The title may be omitted: + +```````````````````````````````` example +[foo]: +/url + +[foo] +. +<p><a href="/url">foo</a></p> +```````````````````````````````` + + +The link destination may not be omitted: + +```````````````````````````````` example +[foo]: + +[foo] +. +<p>[foo]:</p> +<p>[foo]</p> +```````````````````````````````` + + However, an empty link destination may be specified using + angle brackets: + +```````````````````````````````` example +[foo]: <> + +[foo] +. +<p><a href="">foo</a></p> +```````````````````````````````` + +The title must be separated from the link destination by +spaces or tabs: + +```````````````````````````````` example +[foo]: <bar>(baz) + +[foo] +. +<p>[foo]: <bar>(baz)</p> +<p>[foo]</p> +```````````````````````````````` + + +Both title and destination can contain backslash escapes +and literal backslashes: + +```````````````````````````````` example +[foo]: /url\bar\*baz "foo\"bar\baz" + +[foo] +. +<p><a href="/url%5Cbar*baz" title="foo"bar\baz">foo</a></p> +```````````````````````````````` + + +A link can come before its corresponding definition: + +```````````````````````````````` example +[foo] + +[foo]: url +. +<p><a href="url">foo</a></p> +```````````````````````````````` + + +If there are several matching definitions, the first one takes +precedence: + +```````````````````````````````` example +[foo] + +[foo]: first +[foo]: second +. +<p><a href="first">foo</a></p> +```````````````````````````````` + + +As noted in the section on [Links], matching of labels is +case-insensitive (see [matches]). + +```````````````````````````````` example +[FOO]: /url + +[Foo] +. +<p><a href="/url">Foo</a></p> +```````````````````````````````` + + +```````````````````````````````` example +[ΑΓΩ]: /φου + +[αγω] +. +<p><a href="/%CF%86%CE%BF%CF%85">αγω</a></p> +```````````````````````````````` + + +Whether something is a [link reference definition] is +independent of whether the link reference it defines is +used in the document. Thus, for example, the following +document contains just a link reference definition, and +no visible content: + +```````````````````````````````` example +[foo]: /url +. +```````````````````````````````` + + +Here is another one: + +```````````````````````````````` example +[ +foo +]: /url +bar +. +<p>bar</p> +```````````````````````````````` + + +This is not a link reference definition, because there are +characters other than spaces or tabs after the title: + +```````````````````````````````` example +[foo]: /url "title" ok +. +<p>[foo]: /url "title" ok</p> +```````````````````````````````` + + +This is a link reference definition, but it has no title: + +```````````````````````````````` example +[foo]: /url +"title" ok +. +<p>"title" ok</p> +```````````````````````````````` + + +This is not a link reference definition, because it is indented +four spaces: + +```````````````````````````````` example + [foo]: /url "title" + +[foo] +. +<pre><code>[foo]: /url "title" +</code></pre> +<p>[foo]</p> +```````````````````````````````` + + +This is not a link reference definition, because it occurs inside +a code block: + +```````````````````````````````` example +``` +[foo]: /url +``` + +[foo] +. +<pre><code>[foo]: /url +</code></pre> +<p>[foo]</p> +```````````````````````````````` + + +A [link reference definition] cannot interrupt a paragraph. + +```````````````````````````````` example +Foo +[bar]: /baz + +[bar] +. +<p>Foo +[bar]: /baz</p> +<p>[bar]</p> +```````````````````````````````` + + +However, it can directly follow other block elements, such as headings +and thematic breaks, and it need not be followed by a blank line. + +```````````````````````````````` example +# [Foo] +[foo]: /url +> bar +. +<h1><a href="/url">Foo</a></h1> +<blockquote> +<p>bar</p> +</blockquote> +```````````````````````````````` + +```````````````````````````````` example +[foo]: /url +bar +=== +[foo] +. +<h1>bar</h1> +<p><a href="/url">foo</a></p> +```````````````````````````````` + +```````````````````````````````` example +[foo]: /url +=== +[foo] +. +<p>=== +<a href="/url">foo</a></p> +```````````````````````````````` + + +Several [link reference definitions] +can occur one after another, without intervening blank lines. + +```````````````````````````````` example +[foo]: /foo-url "foo" +[bar]: /bar-url + "bar" +[baz]: /baz-url + +[foo], +[bar], +[baz] +. +<p><a href="/foo-url" title="foo">foo</a>, +<a href="/bar-url" title="bar">bar</a>, +<a href="/baz-url">baz</a></p> +```````````````````````````````` + + +[Link reference definitions] can occur +inside block containers, like lists and block quotations. They +affect the entire document, not just the container in which they +are defined: + +```````````````````````````````` example +[foo] + +> [foo]: /url +. +<p><a href="/url">foo</a></p> +<blockquote> +</blockquote> +```````````````````````````````` + + +## Paragraphs + +A sequence of non-blank lines that cannot be interpreted as other +kinds of blocks forms a [paragraph](@). +The contents of the paragraph are the result of parsing the +paragraph's raw content as inlines. The paragraph's raw content +is formed by concatenating the lines and removing initial and final +spaces or tabs. + +A simple example with two paragraphs: + +```````````````````````````````` example +aaa + +bbb +. +<p>aaa</p> +<p>bbb</p> +```````````````````````````````` + + +Paragraphs can contain multiple lines, but no blank lines: + +```````````````````````````````` example +aaa +bbb + +ccc +ddd +. +<p>aaa +bbb</p> +<p>ccc +ddd</p> +```````````````````````````````` + + +Multiple blank lines between paragraphs have no effect: + +```````````````````````````````` example +aaa + + +bbb +. +<p>aaa</p> +<p>bbb</p> +```````````````````````````````` + + +Leading spaces or tabs are skipped: + +```````````````````````````````` example + aaa + bbb +. +<p>aaa +bbb</p> +```````````````````````````````` + + +Lines after the first may be indented any amount, since indented +code blocks cannot interrupt paragraphs. + +```````````````````````````````` example +aaa + bbb + ccc +. +<p>aaa +bbb +ccc</p> +```````````````````````````````` + + +However, the first line may be preceded by up to three spaces of indentation. +Four spaces of indentation is too many: + +```````````````````````````````` example + aaa +bbb +. +<p>aaa +bbb</p> +```````````````````````````````` + + +```````````````````````````````` example + aaa +bbb +. +<pre><code>aaa +</code></pre> +<p>bbb</p> +```````````````````````````````` + + +Final spaces or tabs are stripped before inline parsing, so a paragraph +that ends with two or more spaces will not end with a [hard line +break]: + +```````````````````````````````` example +aaa +bbb +. +<p>aaa<br /> +bbb</p> +```````````````````````````````` + + +## Blank lines + +[Blank lines] between block-level elements are ignored, +except for the role they play in determining whether a [list] +is [tight] or [loose]. + +Blank lines at the beginning and end of the document are also ignored. + +```````````````````````````````` example + + +aaa + + +# aaa + + +. +<p>aaa</p> +<h1>aaa</h1> +```````````````````````````````` + + + +# Container blocks + +A [container block](#container-blocks) is a block that has other +blocks as its contents. There are two basic kinds of container blocks: +[block quotes] and [list items]. +[Lists] are meta-containers for [list items]. + +We define the syntax for container blocks recursively. The general +form of the definition is: + +> If X is a sequence of blocks, then the result of +> transforming X in such-and-such a way is a container of type Y +> with these blocks as its content. + +So, we explain what counts as a block quote or list item by explaining +how these can be *generated* from their contents. This should suffice +to define the syntax, although it does not give a recipe for *parsing* +these constructions. (A recipe is provided below in the section entitled +[A parsing strategy](#appendix-a-parsing-strategy).) + +## Block quotes + +A [block quote marker](@), +optionally preceded by up to three spaces of indentation, +consists of (a) the character `>` together with a following space of +indentation, or (b) a single character `>` not followed by a space of +indentation. + +The following rules define [block quotes]: + +1. **Basic case.** If a string of lines *Ls* constitute a sequence + of blocks *Bs*, then the result of prepending a [block quote + marker] to the beginning of each line in *Ls* + is a [block quote](#block-quotes) containing *Bs*. + +2. **Laziness.** If a string of lines *Ls* constitute a [block + quote](#block-quotes) with contents *Bs*, then the result of deleting + the initial [block quote marker] from one or + more lines in which the next character other than a space or tab after the + [block quote marker] is [paragraph continuation + text] is a block quote with *Bs* as its content. + [Paragraph continuation text](@) is text + that will be parsed as part of the content of a paragraph, but does + not occur at the beginning of the paragraph. + +3. **Consecutiveness.** A document cannot contain two [block + quotes] in a row unless there is a [blank line] between them. + +Nothing else counts as a [block quote](#block-quotes). + +Here is a simple example: + +```````````````````````````````` example +> # Foo +> bar +> baz +. +<blockquote> +<h1>Foo</h1> +<p>bar +baz</p> +</blockquote> +```````````````````````````````` + + +The space or tab after the `>` characters can be omitted: + +```````````````````````````````` example +># Foo +>bar +> baz +. +<blockquote> +<h1>Foo</h1> +<p>bar +baz</p> +</blockquote> +```````````````````````````````` + + +The `>` characters can be preceded by up to three spaces of indentation: + +```````````````````````````````` example + > # Foo + > bar + > baz +. +<blockquote> +<h1>Foo</h1> +<p>bar +baz</p> +</blockquote> +```````````````````````````````` + + +Four spaces of indentation is too many: + +```````````````````````````````` example + > # Foo + > bar + > baz +. +<pre><code>> # Foo +> bar +> baz +</code></pre> +```````````````````````````````` + + +The Laziness clause allows us to omit the `>` before +[paragraph continuation text]: + +```````````````````````````````` example +> # Foo +> bar +baz +. +<blockquote> +<h1>Foo</h1> +<p>bar +baz</p> +</blockquote> +```````````````````````````````` + + +A block quote can contain some lazy and some non-lazy +continuation lines: + +```````````````````````````````` example +> bar +baz +> foo +. +<blockquote> +<p>bar +baz +foo</p> +</blockquote> +```````````````````````````````` + + +Laziness only applies to lines that would have been continuations of +paragraphs had they been prepended with [block quote markers]. +For example, the `> ` cannot be omitted in the second line of + +``` markdown +> foo +> --- +``` + +without changing the meaning: + +```````````````````````````````` example +> foo +--- +. +<blockquote> +<p>foo</p> +</blockquote> +<hr /> +```````````````````````````````` + + +Similarly, if we omit the `> ` in the second line of + +``` markdown +> - foo +> - bar +``` + +then the block quote ends after the first line: + +```````````````````````````````` example +> - foo +- bar +. +<blockquote> +<ul> +<li>foo</li> +</ul> +</blockquote> +<ul> +<li>bar</li> +</ul> +```````````````````````````````` + + +For the same reason, we can't omit the `> ` in front of +subsequent lines of an indented or fenced code block: + +```````````````````````````````` example +> foo + bar +. +<blockquote> +<pre><code>foo +</code></pre> +</blockquote> +<pre><code>bar +</code></pre> +```````````````````````````````` + + +```````````````````````````````` example +> ``` +foo +``` +. +<blockquote> +<pre><code></code></pre> +</blockquote> +<p>foo</p> +<pre><code></code></pre> +```````````````````````````````` + + +Note that in the following case, we have a [lazy +continuation line]: + +```````````````````````````````` example +> foo + - bar +. +<blockquote> +<p>foo +- bar</p> +</blockquote> +```````````````````````````````` + + +To see why, note that in + +```markdown +> foo +> - bar +``` + +the `- bar` is indented too far to start a list, and can't +be an indented code block because indented code blocks cannot +interrupt paragraphs, so it is [paragraph continuation text]. + +A block quote can be empty: + +```````````````````````````````` example +> +. +<blockquote> +</blockquote> +```````````````````````````````` + + +```````````````````````````````` example +> +> +> +. +<blockquote> +</blockquote> +```````````````````````````````` + + +A block quote can have initial or final blank lines: + +```````````````````````````````` example +> +> foo +> +. +<blockquote> +<p>foo</p> +</blockquote> +```````````````````````````````` + + +A blank line always separates block quotes: + +```````````````````````````````` example +> foo + +> bar +. +<blockquote> +<p>foo</p> +</blockquote> +<blockquote> +<p>bar</p> +</blockquote> +```````````````````````````````` + + +(Most current Markdown implementations, including John Gruber's +original `Markdown.pl`, will parse this example as a single block quote +with two paragraphs. But it seems better to allow the author to decide +whether two block quotes or one are wanted.) + +Consecutiveness means that if we put these block quotes together, +we get a single block quote: + +```````````````````````````````` example +> foo +> bar +. +<blockquote> +<p>foo +bar</p> +</blockquote> +```````````````````````````````` + + +To get a block quote with two paragraphs, use: + +```````````````````````````````` example +> foo +> +> bar +. +<blockquote> +<p>foo</p> +<p>bar</p> +</blockquote> +```````````````````````````````` + + +Block quotes can interrupt paragraphs: + +```````````````````````````````` example +foo +> bar +. +<p>foo</p> +<blockquote> +<p>bar</p> +</blockquote> +```````````````````````````````` + + +In general, blank lines are not needed before or after block +quotes: + +```````````````````````````````` example +> aaa +*** +> bbb +. +<blockquote> +<p>aaa</p> +</blockquote> +<hr /> +<blockquote> +<p>bbb</p> +</blockquote> +```````````````````````````````` + + +However, because of laziness, a blank line is needed between +a block quote and a following paragraph: + +```````````````````````````````` example +> bar +baz +. +<blockquote> +<p>bar +baz</p> +</blockquote> +```````````````````````````````` + + +```````````````````````````````` example +> bar + +baz +. +<blockquote> +<p>bar</p> +</blockquote> +<p>baz</p> +```````````````````````````````` + + +```````````````````````````````` example +> bar +> +baz +. +<blockquote> +<p>bar</p> +</blockquote> +<p>baz</p> +```````````````````````````````` + + +It is a consequence of the Laziness rule that any number +of initial `>`s may be omitted on a continuation line of a +nested block quote: + +```````````````````````````````` example +> > > foo +bar +. +<blockquote> +<blockquote> +<blockquote> +<p>foo +bar</p> +</blockquote> +</blockquote> +</blockquote> +```````````````````````````````` + + +```````````````````````````````` example +>>> foo +> bar +>>baz +. +<blockquote> +<blockquote> +<blockquote> +<p>foo +bar +baz</p> +</blockquote> +</blockquote> +</blockquote> +```````````````````````````````` + + +When including an indented code block in a block quote, +remember that the [block quote marker] includes +both the `>` and a following space of indentation. So *five spaces* are needed +after the `>`: + +```````````````````````````````` example +> code + +> not code +. +<blockquote> +<pre><code>code +</code></pre> +</blockquote> +<blockquote> +<p>not code</p> +</blockquote> +```````````````````````````````` + + + +## List items + +A [list marker](@) is a +[bullet list marker] or an [ordered list marker]. + +A [bullet list marker](@) +is a `-`, `+`, or `*` character. + +An [ordered list marker](@) +is a sequence of 1--9 arabic digits (`0-9`), followed by either a +`.` character or a `)` character. (The reason for the length +limit is that with 10 digits we start seeing integer overflows +in some browsers.) + +The following rules define [list items]: + +1. **Basic case.** If a sequence of lines *Ls* constitute a sequence of + blocks *Bs* starting with a character other than a space or tab, and *M* is + a list marker of width *W* followed by 1 ≤ *N* ≤ 4 spaces of indentation, + then the result of prepending *M* and the following spaces to the first line + of Ls*, and indenting subsequent lines of *Ls* by *W + N* spaces, is a + list item with *Bs* as its contents. The type of the list item + (bullet or ordered) is determined by the type of its list marker. + If the list item is ordered, then it is also assigned a start + number, based on the ordered list marker. + + Exceptions: + + 1. When the first list item in a [list] interrupts + a paragraph---that is, when it starts on a line that would + otherwise count as [paragraph continuation text]---then (a) + the lines *Ls* must not begin with a blank line, and (b) if + the list item is ordered, the start number must be 1. + 2. If any line is a [thematic break][thematic breaks] then + that line is not a list item. + +For example, let *Ls* be the lines + +```````````````````````````````` example +A paragraph +with two lines. + + indented code + +> A block quote. +. +<p>A paragraph +with two lines.</p> +<pre><code>indented code +</code></pre> +<blockquote> +<p>A block quote.</p> +</blockquote> +```````````````````````````````` + + +And let *M* be the marker `1.`, and *N* = 2. Then rule #1 says +that the following is an ordered list item with start number 1, +and the same contents as *Ls*: + +```````````````````````````````` example +1. A paragraph + with two lines. + + indented code + + > A block quote. +. +<ol> +<li> +<p>A paragraph +with two lines.</p> +<pre><code>indented code +</code></pre> +<blockquote> +<p>A block quote.</p> +</blockquote> +</li> +</ol> +```````````````````````````````` + + +The most important thing to notice is that the position of +the text after the list marker determines how much indentation +is needed in subsequent blocks in the list item. If the list +marker takes up two spaces of indentation, and there are three spaces between +the list marker and the next character other than a space or tab, then blocks +must be indented five spaces in order to fall under the list +item. + +Here are some examples showing how far content must be indented to be +put under the list item: + +```````````````````````````````` example +- one + + two +. +<ul> +<li>one</li> +</ul> +<p>two</p> +```````````````````````````````` + + +```````````````````````````````` example +- one + + two +. +<ul> +<li> +<p>one</p> +<p>two</p> +</li> +</ul> +```````````````````````````````` + + +```````````````````````````````` example + - one + + two +. +<ul> +<li>one</li> +</ul> +<pre><code> two +</code></pre> +```````````````````````````````` + + +```````````````````````````````` example + - one + + two +. +<ul> +<li> +<p>one</p> +<p>two</p> +</li> +</ul> +```````````````````````````````` + + +It is tempting to think of this in terms of columns: the continuation +blocks must be indented at least to the column of the first character other than +a space or tab after the list marker. However, that is not quite right. +The spaces of indentation after the list marker determine how much relative +indentation is needed. Which column this indentation reaches will depend on +how the list item is embedded in other constructions, as shown by +this example: + +```````````````````````````````` example + > > 1. one +>> +>> two +. +<blockquote> +<blockquote> +<ol> +<li> +<p>one</p> +<p>two</p> +</li> +</ol> +</blockquote> +</blockquote> +```````````````````````````````` + + +Here `two` occurs in the same column as the list marker `1.`, +but is actually contained in the list item, because there is +sufficient indentation after the last containing blockquote marker. + +The converse is also possible. In the following example, the word `two` +occurs far to the right of the initial text of the list item, `one`, but +it is not considered part of the list item, because it is not indented +far enough past the blockquote marker: + +```````````````````````````````` example +>>- one +>> + > > two +. +<blockquote> +<blockquote> +<ul> +<li>one</li> +</ul> +<p>two</p> +</blockquote> +</blockquote> +```````````````````````````````` + + +Note that at least one space or tab is needed between the list marker and +any following content, so these are not list items: + +```````````````````````````````` example +-one + +2.two +. +<p>-one</p> +<p>2.two</p> +```````````````````````````````` + + +A list item may contain blocks that are separated by more than +one blank line. + +```````````````````````````````` example +- foo + + + bar +. +<ul> +<li> +<p>foo</p> +<p>bar</p> +</li> +</ul> +```````````````````````````````` + + +A list item may contain any kind of block: + +```````````````````````````````` example +1. foo + + ``` + bar + ``` + + baz + + > bam +. +<ol> +<li> +<p>foo</p> +<pre><code>bar +</code></pre> +<p>baz</p> +<blockquote> +<p>bam</p> +</blockquote> +</li> +</ol> +```````````````````````````````` + + +A list item that contains an indented code block will preserve +empty lines within the code block verbatim. + +```````````````````````````````` example +- Foo + + bar + + + baz +. +<ul> +<li> +<p>Foo</p> +<pre><code>bar + + +baz +</code></pre> +</li> +</ul> +```````````````````````````````` + +Note that ordered list start numbers must be nine digits or less: + +```````````````````````````````` example +123456789. ok +. +<ol start="123456789"> +<li>ok</li> +</ol> +```````````````````````````````` + + +```````````````````````````````` example +1234567890. not ok +. +<p>1234567890. not ok</p> +```````````````````````````````` + + +A start number may begin with 0s: + +```````````````````````````````` example +0. ok +. +<ol start="0"> +<li>ok</li> +</ol> +```````````````````````````````` + + +```````````````````````````````` example +003. ok +. +<ol start="3"> +<li>ok</li> +</ol> +```````````````````````````````` + + +A start number may not be negative: + +```````````````````````````````` example +-1. not ok +. +<p>-1. not ok</p> +```````````````````````````````` + + + +2. **Item starting with indented code.** If a sequence of lines *Ls* + constitute a sequence of blocks *Bs* starting with an indented code + block, and *M* is a list marker of width *W* followed by + one space of indentation, then the result of prepending *M* and the + following space to the first line of *Ls*, and indenting subsequent lines + of *Ls* by *W + 1* spaces, is a list item with *Bs* as its contents. + If a line is empty, then it need not be indented. The type of the + list item (bullet or ordered) is determined by the type of its list + marker. If the list item is ordered, then it is also assigned a + start number, based on the ordered list marker. + +An indented code block will have to be preceded by four spaces of indentation +beyond the edge of the region where text will be included in the list item. +In the following case that is 6 spaces: + +```````````````````````````````` example +- foo + + bar +. +<ul> +<li> +<p>foo</p> +<pre><code>bar +</code></pre> +</li> +</ul> +```````````````````````````````` + + +And in this case it is 11 spaces: + +```````````````````````````````` example + 10. foo + + bar +. +<ol start="10"> +<li> +<p>foo</p> +<pre><code>bar +</code></pre> +</li> +</ol> +```````````````````````````````` + + +If the *first* block in the list item is an indented code block, +then by rule #2, the contents must be preceded by *one* space of indentation +after the list marker: + +```````````````````````````````` example + indented code + +paragraph + + more code +. +<pre><code>indented code +</code></pre> +<p>paragraph</p> +<pre><code>more code +</code></pre> +```````````````````````````````` + + +```````````````````````````````` example +1. indented code + + paragraph + + more code +. +<ol> +<li> +<pre><code>indented code +</code></pre> +<p>paragraph</p> +<pre><code>more code +</code></pre> +</li> +</ol> +```````````````````````````````` + + +Note that an additional space of indentation is interpreted as space +inside the code block: + +```````````````````````````````` example +1. indented code + + paragraph + + more code +. +<ol> +<li> +<pre><code> indented code +</code></pre> +<p>paragraph</p> +<pre><code>more code +</code></pre> +</li> +</ol> +```````````````````````````````` + + +Note that rules #1 and #2 only apply to two cases: (a) cases +in which the lines to be included in a list item begin with a +characer other than a space or tab, and (b) cases in which +they begin with an indented code +block. In a case like the following, where the first block begins with +three spaces of indentation, the rules do not allow us to form a list item by +indenting the whole thing and prepending a list marker: + +```````````````````````````````` example + foo + +bar +. +<p>foo</p> +<p>bar</p> +```````````````````````````````` + + +```````````````````````````````` example +- foo + + bar +. +<ul> +<li>foo</li> +</ul> +<p>bar</p> +```````````````````````````````` + + +This is not a significant restriction, because when a block is preceded by up to +three spaces of indentation, the indentation can always be removed without +a change in interpretation, allowing rule #1 to be applied. So, in +the above case: + +```````````````````````````````` example +- foo + + bar +. +<ul> +<li> +<p>foo</p> +<p>bar</p> +</li> +</ul> +```````````````````````````````` + + +3. **Item starting with a blank line.** If a sequence of lines *Ls* + starting with a single [blank line] constitute a (possibly empty) + sequence of blocks *Bs*, and *M* is a list marker of width *W*, + then the result of prepending *M* to the first line of *Ls*, and + preceding subsequent lines of *Ls* by *W + 1* spaces of indentation, is a + list item with *Bs* as its contents. + If a line is empty, then it need not be indented. The type of the + list item (bullet or ordered) is determined by the type of its list + marker. If the list item is ordered, then it is also assigned a + start number, based on the ordered list marker. + +Here are some list items that start with a blank line but are not empty: + +```````````````````````````````` example +- + foo +- + ``` + bar + ``` +- + baz +. +<ul> +<li>foo</li> +<li> +<pre><code>bar +</code></pre> +</li> +<li> +<pre><code>baz +</code></pre> +</li> +</ul> +```````````````````````````````` + +When the list item starts with a blank line, the number of spaces +following the list marker doesn't change the required indentation: + +```````````````````````````````` example +- + foo +. +<ul> +<li>foo</li> +</ul> +```````````````````````````````` + + +A list item can begin with at most one blank line. +In the following example, `foo` is not part of the list +item: + +```````````````````````````````` example +- + + foo +. +<ul> +<li></li> +</ul> +<p>foo</p> +```````````````````````````````` + + +Here is an empty bullet list item: + +```````````````````````````````` example +- foo +- +- bar +. +<ul> +<li>foo</li> +<li></li> +<li>bar</li> +</ul> +```````````````````````````````` + + +It does not matter whether there are spaces or tabs following the [list marker]: + +```````````````````````````````` example +- foo +- +- bar +. +<ul> +<li>foo</li> +<li></li> +<li>bar</li> +</ul> +```````````````````````````````` + + +Here is an empty ordered list item: + +```````````````````````````````` example +1. foo +2. +3. bar +. +<ol> +<li>foo</li> +<li></li> +<li>bar</li> +</ol> +```````````````````````````````` + + +A list may start or end with an empty list item: + +```````````````````````````````` example +* +. +<ul> +<li></li> +</ul> +```````````````````````````````` + +However, an empty list item cannot interrupt a paragraph: + +```````````````````````````````` example +foo +* + +foo +1. +. +<p>foo +*</p> +<p>foo +1.</p> +```````````````````````````````` + + +4. **Indentation.** If a sequence of lines *Ls* constitutes a list item + according to rule #1, #2, or #3, then the result of preceding each line + of *Ls* by up to three spaces of indentation (the same for each line) also + constitutes a list item with the same contents and attributes. If a line is + empty, then it need not be indented. + +Indented one space: + +```````````````````````````````` example + 1. A paragraph + with two lines. + + indented code + + > A block quote. +. +<ol> +<li> +<p>A paragraph +with two lines.</p> +<pre><code>indented code +</code></pre> +<blockquote> +<p>A block quote.</p> +</blockquote> +</li> +</ol> +```````````````````````````````` + + +Indented two spaces: + +```````````````````````````````` example + 1. A paragraph + with two lines. + + indented code + + > A block quote. +. +<ol> +<li> +<p>A paragraph +with two lines.</p> +<pre><code>indented code +</code></pre> +<blockquote> +<p>A block quote.</p> +</blockquote> +</li> +</ol> +```````````````````````````````` + + +Indented three spaces: + +```````````````````````````````` example + 1. A paragraph + with two lines. + + indented code + + > A block quote. +. +<ol> +<li> +<p>A paragraph +with two lines.</p> +<pre><code>indented code +</code></pre> +<blockquote> +<p>A block quote.</p> +</blockquote> +</li> +</ol> +```````````````````````````````` + + +Four spaces indent gives a code block: + +```````````````````````````````` example + 1. A paragraph + with two lines. + + indented code + + > A block quote. +. +<pre><code>1. A paragraph + with two lines. + + indented code + + > A block quote. +</code></pre> +```````````````````````````````` + + + +5. **Laziness.** If a string of lines *Ls* constitute a [list + item](#list-items) with contents *Bs*, then the result of deleting + some or all of the indentation from one or more lines in which the + next character other than a space or tab after the indentation is + [paragraph continuation text] is a + list item with the same contents and attributes. The unindented + lines are called + [lazy continuation line](@)s. + +Here is an example with [lazy continuation lines]: + +```````````````````````````````` example + 1. A paragraph +with two lines. + + indented code + + > A block quote. +. +<ol> +<li> +<p>A paragraph +with two lines.</p> +<pre><code>indented code +</code></pre> +<blockquote> +<p>A block quote.</p> +</blockquote> +</li> +</ol> +```````````````````````````````` + + +Indentation can be partially deleted: + +```````````````````````````````` example + 1. A paragraph + with two lines. +. +<ol> +<li>A paragraph +with two lines.</li> +</ol> +```````````````````````````````` + + +These examples show how laziness can work in nested structures: + +```````````````````````````````` example +> 1. > Blockquote +continued here. +. +<blockquote> +<ol> +<li> +<blockquote> +<p>Blockquote +continued here.</p> +</blockquote> +</li> +</ol> +</blockquote> +```````````````````````````````` + + +```````````````````````````````` example +> 1. > Blockquote +> continued here. +. +<blockquote> +<ol> +<li> +<blockquote> +<p>Blockquote +continued here.</p> +</blockquote> +</li> +</ol> +</blockquote> +```````````````````````````````` + + + +6. **That's all.** Nothing that is not counted as a list item by rules + #1--5 counts as a [list item](#list-items). + +The rules for sublists follow from the general rules +[above][List items]. A sublist must be indented the same number +of spaces of indentation a paragraph would need to be in order to be included +in the list item. + +So, in this case we need two spaces indent: + +```````````````````````````````` example +- foo + - bar + - baz + - boo +. +<ul> +<li>foo +<ul> +<li>bar +<ul> +<li>baz +<ul> +<li>boo</li> +</ul> +</li> +</ul> +</li> +</ul> +</li> +</ul> +```````````````````````````````` + + +One is not enough: + +```````````````````````````````` example +- foo + - bar + - baz + - boo +. +<ul> +<li>foo</li> +<li>bar</li> +<li>baz</li> +<li>boo</li> +</ul> +```````````````````````````````` + + +Here we need four, because the list marker is wider: + +```````````````````````````````` example +10) foo + - bar +. +<ol start="10"> +<li>foo +<ul> +<li>bar</li> +</ul> +</li> +</ol> +```````````````````````````````` + + +Three is not enough: + +```````````````````````````````` example +10) foo + - bar +. +<ol start="10"> +<li>foo</li> +</ol> +<ul> +<li>bar</li> +</ul> +```````````````````````````````` + + +A list may be the first block in a list item: + +```````````````````````````````` example +- - foo +. +<ul> +<li> +<ul> +<li>foo</li> +</ul> +</li> +</ul> +```````````````````````````````` + + +```````````````````````````````` example +1. - 2. foo +. +<ol> +<li> +<ul> +<li> +<ol start="2"> +<li>foo</li> +</ol> +</li> +</ul> +</li> +</ol> +```````````````````````````````` + + +A list item can contain a heading: + +```````````````````````````````` example +- # Foo +- Bar + --- + baz +. +<ul> +<li> +<h1>Foo</h1> +</li> +<li> +<h2>Bar</h2> +baz</li> +</ul> +```````````````````````````````` + + +### Motivation + +John Gruber's Markdown spec says the following about list items: + +1. "List markers typically start at the left margin, but may be indented + by up to three spaces. List markers must be followed by one or more + spaces or a tab." + +2. "To make lists look nice, you can wrap items with hanging indents.... + But if you don't want to, you don't have to." + +3. "List items may consist of multiple paragraphs. Each subsequent + paragraph in a list item must be indented by either 4 spaces or one + tab." + +4. "It looks nice if you indent every line of the subsequent paragraphs, + but here again, Markdown will allow you to be lazy." + +5. "To put a blockquote within a list item, the blockquote's `>` + delimiters need to be indented." + +6. "To put a code block within a list item, the code block needs to be + indented twice — 8 spaces or two tabs." + +These rules specify that a paragraph under a list item must be indented +four spaces (presumably, from the left margin, rather than the start of +the list marker, but this is not said), and that code under a list item +must be indented eight spaces instead of the usual four. They also say +that a block quote must be indented, but not by how much; however, the +example given has four spaces indentation. Although nothing is said +about other kinds of block-level content, it is certainly reasonable to +infer that *all* block elements under a list item, including other +lists, must be indented four spaces. This principle has been called the +*four-space rule*. + +The four-space rule is clear and principled, and if the reference +implementation `Markdown.pl` had followed it, it probably would have +become the standard. However, `Markdown.pl` allowed paragraphs and +sublists to start with only two spaces indentation, at least on the +outer level. Worse, its behavior was inconsistent: a sublist of an +outer-level list needed two spaces indentation, but a sublist of this +sublist needed three spaces. It is not surprising, then, that different +implementations of Markdown have developed very different rules for +determining what comes under a list item. (Pandoc and python-Markdown, +for example, stuck with Gruber's syntax description and the four-space +rule, while discount, redcarpet, marked, PHP Markdown, and others +followed `Markdown.pl`'s behavior more closely.) + +Unfortunately, given the divergences between implementations, there +is no way to give a spec for list items that will be guaranteed not +to break any existing documents. However, the spec given here should +correctly handle lists formatted with either the four-space rule or +the more forgiving `Markdown.pl` behavior, provided they are laid out +in a way that is natural for a human to read. + +The strategy here is to let the width and indentation of the list marker +determine the indentation necessary for blocks to fall under the list +item, rather than having a fixed and arbitrary number. The writer can +think of the body of the list item as a unit which gets indented to the +right enough to fit the list marker (and any indentation on the list +marker). (The laziness rule, #5, then allows continuation lines to be +unindented if needed.) + +This rule is superior, we claim, to any rule requiring a fixed level of +indentation from the margin. The four-space rule is clear but +unnatural. It is quite unintuitive that + +``` markdown +- foo + + bar + + - baz +``` + +should be parsed as two lists with an intervening paragraph, + +``` html +<ul> +<li>foo</li> +</ul> +<p>bar</p> +<ul> +<li>baz</li> +</ul> +``` + +as the four-space rule demands, rather than a single list, + +``` html +<ul> +<li> +<p>foo</p> +<p>bar</p> +<ul> +<li>baz</li> +</ul> +</li> +</ul> +``` + +The choice of four spaces is arbitrary. It can be learned, but it is +not likely to be guessed, and it trips up beginners regularly. + +Would it help to adopt a two-space rule? The problem is that such +a rule, together with the rule allowing up to three spaces of indentation for +the initial list marker, allows text that is indented *less than* the +original list marker to be included in the list item. For example, +`Markdown.pl` parses + +``` markdown + - one + + two +``` + +as a single list item, with `two` a continuation paragraph: + +``` html +<ul> +<li> +<p>one</p> +<p>two</p> +</li> +</ul> +``` + +and similarly + +``` markdown +> - one +> +> two +``` + +as + +``` html +<blockquote> +<ul> +<li> +<p>one</p> +<p>two</p> +</li> +</ul> +</blockquote> +``` + +This is extremely unintuitive. + +Rather than requiring a fixed indent from the margin, we could require +a fixed indent (say, two spaces, or even one space) from the list marker (which +may itself be indented). This proposal would remove the last anomaly +discussed. Unlike the spec presented above, it would count the following +as a list item with a subparagraph, even though the paragraph `bar` +is not indented as far as the first paragraph `foo`: + +``` markdown + 10. foo + + bar +``` + +Arguably this text does read like a list item with `bar` as a subparagraph, +which may count in favor of the proposal. However, on this proposal indented +code would have to be indented six spaces after the list marker. And this +would break a lot of existing Markdown, which has the pattern: + +``` markdown +1. foo + + indented code +``` + +where the code is indented eight spaces. The spec above, by contrast, will +parse this text as expected, since the code block's indentation is measured +from the beginning of `foo`. + +The one case that needs special treatment is a list item that *starts* +with indented code. How much indentation is required in that case, since +we don't have a "first paragraph" to measure from? Rule #2 simply stipulates +that in such cases, we require one space indentation from the list marker +(and then the normal four spaces for the indented code). This will match the +four-space rule in cases where the list marker plus its initial indentation +takes four spaces (a common case), but diverge in other cases. + +## Lists + +A [list](@) is a sequence of one or more +list items [of the same type]. The list items +may be separated by any number of blank lines. + +Two list items are [of the same type](@) +if they begin with a [list marker] of the same type. +Two list markers are of the +same type if (a) they are bullet list markers using the same character +(`-`, `+`, or `*`) or (b) they are ordered list numbers with the same +delimiter (either `.` or `)`). + +A list is an [ordered list](@) +if its constituent list items begin with +[ordered list markers], and a +[bullet list](@) if its constituent list +items begin with [bullet list markers]. + +The [start number](@) +of an [ordered list] is determined by the list number of +its initial list item. The numbers of subsequent list items are +disregarded. + +A list is [loose](@) if any of its constituent +list items are separated by blank lines, or if any of its constituent +list items directly contain two block-level elements with a blank line +between them. Otherwise a list is [tight](@). +(The difference in HTML output is that paragraphs in a loose list are +wrapped in `<p>` tags, while paragraphs in a tight list are not.) + +Changing the bullet or ordered list delimiter starts a new list: + +```````````````````````````````` example +- foo +- bar ++ baz +. +<ul> +<li>foo</li> +<li>bar</li> +</ul> +<ul> +<li>baz</li> +</ul> +```````````````````````````````` + + +```````````````````````````````` example +1. foo +2. bar +3) baz +. +<ol> +<li>foo</li> +<li>bar</li> +</ol> +<ol start="3"> +<li>baz</li> +</ol> +```````````````````````````````` + + +In CommonMark, a list can interrupt a paragraph. That is, +no blank line is needed to separate a paragraph from a following +list: + +```````````````````````````````` example +Foo +- bar +- baz +. +<p>Foo</p> +<ul> +<li>bar</li> +<li>baz</li> +</ul> +```````````````````````````````` + +`Markdown.pl` does not allow this, through fear of triggering a list +via a numeral in a hard-wrapped line: + +``` markdown +The number of windows in my house is +14. The number of doors is 6. +``` + +Oddly, though, `Markdown.pl` *does* allow a blockquote to +interrupt a paragraph, even though the same considerations might +apply. + +In CommonMark, we do allow lists to interrupt paragraphs, for +two reasons. First, it is natural and not uncommon for people +to start lists without blank lines: + +``` markdown +I need to buy +- new shoes +- a coat +- a plane ticket +``` + +Second, we are attracted to a + +> [principle of uniformity](@): +> if a chunk of text has a certain +> meaning, it will continue to have the same meaning when put into a +> container block (such as a list item or blockquote). + +(Indeed, the spec for [list items] and [block quotes] presupposes +this principle.) This principle implies that if + +``` markdown + * I need to buy + - new shoes + - a coat + - a plane ticket +``` + +is a list item containing a paragraph followed by a nested sublist, +as all Markdown implementations agree it is (though the paragraph +may be rendered without `<p>` tags, since the list is "tight"), +then + +``` markdown +I need to buy +- new shoes +- a coat +- a plane ticket +``` + +by itself should be a paragraph followed by a nested sublist. + +Since it is well established Markdown practice to allow lists to +interrupt paragraphs inside list items, the [principle of +uniformity] requires us to allow this outside list items as +well. ([reStructuredText](http://docutils.sourceforge.net/rst.html) +takes a different approach, requiring blank lines before lists +even inside other list items.) + +In order to solve of unwanted lists in paragraphs with +hard-wrapped numerals, we allow only lists starting with `1` to +interrupt paragraphs. Thus, + +```````````````````````````````` example +The number of windows in my house is +14. The number of doors is 6. +. +<p>The number of windows in my house is +14. The number of doors is 6.</p> +```````````````````````````````` + +We may still get an unintended result in cases like + +```````````````````````````````` example +The number of windows in my house is +1. The number of doors is 6. +. +<p>The number of windows in my house is</p> +<ol> +<li>The number of doors is 6.</li> +</ol> +```````````````````````````````` + +but this rule should prevent most spurious list captures. + +There can be any number of blank lines between items: + +```````````````````````````````` example +- foo + +- bar + + +- baz +. +<ul> +<li> +<p>foo</p> +</li> +<li> +<p>bar</p> +</li> +<li> +<p>baz</p> +</li> +</ul> +```````````````````````````````` + +```````````````````````````````` example +- foo + - bar + - baz + + + bim +. +<ul> +<li>foo +<ul> +<li>bar +<ul> +<li> +<p>baz</p> +<p>bim</p> +</li> +</ul> +</li> +</ul> +</li> +</ul> +```````````````````````````````` + + +To separate consecutive lists of the same type, or to separate a +list from an indented code block that would otherwise be parsed +as a subparagraph of the final list item, you can insert a blank HTML +comment: + +```````````````````````````````` example +- foo +- bar + +<!-- --> + +- baz +- bim +. +<ul> +<li>foo</li> +<li>bar</li> +</ul> +<!-- --> +<ul> +<li>baz</li> +<li>bim</li> +</ul> +```````````````````````````````` + + +```````````````````````````````` example +- foo + + notcode + +- foo + +<!-- --> + + code +. +<ul> +<li> +<p>foo</p> +<p>notcode</p> +</li> +<li> +<p>foo</p> +</li> +</ul> +<!-- --> +<pre><code>code +</code></pre> +```````````````````````````````` + + +List items need not be indented to the same level. The following +list items will be treated as items at the same list level, +since none is indented enough to belong to the previous list +item: + +```````````````````````````````` example +- a + - b + - c + - d + - e + - f +- g +. +<ul> +<li>a</li> +<li>b</li> +<li>c</li> +<li>d</li> +<li>e</li> +<li>f</li> +<li>g</li> +</ul> +```````````````````````````````` + + +```````````````````````````````` example +1. a + + 2. b + + 3. c +. +<ol> +<li> +<p>a</p> +</li> +<li> +<p>b</p> +</li> +<li> +<p>c</p> +</li> +</ol> +```````````````````````````````` + +Note, however, that list items may not be preceded by more than +three spaces of indentation. Here `- e` is treated as a paragraph continuation +line, because it is indented more than three spaces: + +```````````````````````````````` example +- a + - b + - c + - d + - e +. +<ul> +<li>a</li> +<li>b</li> +<li>c</li> +<li>d +- e</li> +</ul> +```````````````````````````````` + +And here, `3. c` is treated as in indented code block, +because it is indented four spaces and preceded by a +blank line. + +```````````````````````````````` example +1. a + + 2. b + + 3. c +. +<ol> +<li> +<p>a</p> +</li> +<li> +<p>b</p> +</li> +</ol> +<pre><code>3. c +</code></pre> +```````````````````````````````` + + +This is a loose list, because there is a blank line between +two of the list items: + +```````````````````````````````` example +- a +- b + +- c +. +<ul> +<li> +<p>a</p> +</li> +<li> +<p>b</p> +</li> +<li> +<p>c</p> +</li> +</ul> +```````````````````````````````` + + +So is this, with a empty second item: + +```````````````````````````````` example +* a +* + +* c +. +<ul> +<li> +<p>a</p> +</li> +<li></li> +<li> +<p>c</p> +</li> +</ul> +```````````````````````````````` + + +These are loose lists, even though there are no blank lines between the items, +because one of the items directly contains two block-level elements +with a blank line between them: + +```````````````````````````````` example +- a +- b + + c +- d +. +<ul> +<li> +<p>a</p> +</li> +<li> +<p>b</p> +<p>c</p> +</li> +<li> +<p>d</p> +</li> +</ul> +```````````````````````````````` + + +```````````````````````````````` example +- a +- b + + [ref]: /url +- d +. +<ul> +<li> +<p>a</p> +</li> +<li> +<p>b</p> +</li> +<li> +<p>d</p> +</li> +</ul> +```````````````````````````````` + + +This is a tight list, because the blank lines are in a code block: + +```````````````````````````````` example +- a +- ``` + b + + + ``` +- c +. +<ul> +<li>a</li> +<li> +<pre><code>b + + +</code></pre> +</li> +<li>c</li> +</ul> +```````````````````````````````` + + +This is a tight list, because the blank line is between two +paragraphs of a sublist. So the sublist is loose while +the outer list is tight: + +```````````````````````````````` example +- a + - b + + c +- d +. +<ul> +<li>a +<ul> +<li> +<p>b</p> +<p>c</p> +</li> +</ul> +</li> +<li>d</li> +</ul> +```````````````````````````````` + + +This is a tight list, because the blank line is inside the +block quote: + +```````````````````````````````` example +* a + > b + > +* c +. +<ul> +<li>a +<blockquote> +<p>b</p> +</blockquote> +</li> +<li>c</li> +</ul> +```````````````````````````````` + + +This list is tight, because the consecutive block elements +are not separated by blank lines: + +```````````````````````````````` example +- a + > b + ``` + c + ``` +- d +. +<ul> +<li>a +<blockquote> +<p>b</p> +</blockquote> +<pre><code>c +</code></pre> +</li> +<li>d</li> +</ul> +```````````````````````````````` + + +A single-paragraph list is tight: + +```````````````````````````````` example +- a +. +<ul> +<li>a</li> +</ul> +```````````````````````````````` + + +```````````````````````````````` example +- a + - b +. +<ul> +<li>a +<ul> +<li>b</li> +</ul> +</li> +</ul> +```````````````````````````````` + + +This list is loose, because of the blank line between the +two block elements in the list item: + +```````````````````````````````` example +1. ``` + foo + ``` + + bar +. +<ol> +<li> +<pre><code>foo +</code></pre> +<p>bar</p> +</li> +</ol> +```````````````````````````````` + + +Here the outer list is loose, the inner list tight: + +```````````````````````````````` example +* foo + * bar + + baz +. +<ul> +<li> +<p>foo</p> +<ul> +<li>bar</li> +</ul> +<p>baz</p> +</li> +</ul> +```````````````````````````````` + + +```````````````````````````````` example +- a + - b + - c + +- d + - e + - f +. +<ul> +<li> +<p>a</p> +<ul> +<li>b</li> +<li>c</li> +</ul> +</li> +<li> +<p>d</p> +<ul> +<li>e</li> +<li>f</li> +</ul> +</li> +</ul> +```````````````````````````````` + + +# Inlines + +Inlines are parsed sequentially from the beginning of the character +stream to the end (left to right, in left-to-right languages). +Thus, for example, in + +```````````````````````````````` example +`hi`lo` +. +<p><code>hi</code>lo`</p> +```````````````````````````````` + +`hi` is parsed as code, leaving the backtick at the end as a literal +backtick. + + + +## Code spans + +A [backtick string](@) +is a string of one or more backtick characters (`` ` ``) that is neither +preceded nor followed by a backtick. + +A [code span](@) begins with a backtick string and ends with +a backtick string of equal length. The contents of the code span are +the characters between these two backtick strings, normalized in the +following ways: + +- First, [line endings] are converted to [spaces]. +- If the resulting string both begins *and* ends with a [space] + character, but does not consist entirely of [space] + characters, a single [space] character is removed from the + front and back. This allows you to include code that begins + or ends with backtick characters, which must be separated by + whitespace from the opening or closing backtick strings. + +This is a simple code span: + +```````````````````````````````` example +`foo` +. +<p><code>foo</code></p> +```````````````````````````````` + + +Here two backticks are used, because the code contains a backtick. +This example also illustrates stripping of a single leading and +trailing space: + +```````````````````````````````` example +`` foo ` bar `` +. +<p><code>foo ` bar</code></p> +```````````````````````````````` + + +This example shows the motivation for stripping leading and trailing +spaces: + +```````````````````````````````` example +` `` ` +. +<p><code>``</code></p> +```````````````````````````````` + +Note that only *one* space is stripped: + +```````````````````````````````` example +` `` ` +. +<p><code> `` </code></p> +```````````````````````````````` + +The stripping only happens if the space is on both +sides of the string: + +```````````````````````````````` example +` a` +. +<p><code> a</code></p> +```````````````````````````````` + +Only [spaces], and not [unicode whitespace] in general, are +stripped in this way: + +```````````````````````````````` example +` b ` +. +<p><code> b </code></p> +```````````````````````````````` + +No stripping occurs if the code span contains only spaces: + +```````````````````````````````` example +` ` +` ` +. +<p><code> </code> +<code> </code></p> +```````````````````````````````` + + +[Line endings] are treated like spaces: + +```````````````````````````````` example +`` +foo +bar +baz +`` +. +<p><code>foo bar baz</code></p> +```````````````````````````````` + +```````````````````````````````` example +`` +foo +`` +. +<p><code>foo </code></p> +```````````````````````````````` + + +Interior spaces are not collapsed: + +```````````````````````````````` example +`foo bar +baz` +. +<p><code>foo bar baz</code></p> +```````````````````````````````` + +Note that browsers will typically collapse consecutive spaces +when rendering `<code>` elements, so it is recommended that +the following CSS be used: + + code{white-space: pre-wrap;} + + +Note that backslash escapes do not work in code spans. All backslashes +are treated literally: + +```````````````````````````````` example +`foo\`bar` +. +<p><code>foo\</code>bar`</p> +```````````````````````````````` + + +Backslash escapes are never needed, because one can always choose a +string of *n* backtick characters as delimiters, where the code does +not contain any strings of exactly *n* backtick characters. + +```````````````````````````````` example +``foo`bar`` +. +<p><code>foo`bar</code></p> +```````````````````````````````` + +```````````````````````````````` example +` foo `` bar ` +. +<p><code>foo `` bar</code></p> +```````````````````````````````` + + +Code span backticks have higher precedence than any other inline +constructs except HTML tags and autolinks. Thus, for example, this is +not parsed as emphasized text, since the second `*` is part of a code +span: + +```````````````````````````````` example +*foo`*` +. +<p>*foo<code>*</code></p> +```````````````````````````````` + + +And this is not parsed as a link: + +```````````````````````````````` example +[not a `link](/foo`) +. +<p>[not a <code>link](/foo</code>)</p> +```````````````````````````````` + + +Code spans, HTML tags, and autolinks have the same precedence. +Thus, this is code: + +```````````````````````````````` example +`<a href="`">` +. +<p><code><a href="</code>">`</p> +```````````````````````````````` + + +But this is an HTML tag: + +```````````````````````````````` example +<a href="`">` +. +<p><a href="`">`</p> +```````````````````````````````` + + +And this is code: + +```````````````````````````````` example +`<http://foo.bar.`baz>` +. +<p><code><http://foo.bar.</code>baz>`</p> +```````````````````````````````` + + +But this is an autolink: + +```````````````````````````````` example +<http://foo.bar.`baz>` +. +<p><a href="http://foo.bar.%60baz">http://foo.bar.`baz</a>`</p> +```````````````````````````````` + + +When a backtick string is not closed by a matching backtick string, +we just have literal backticks: + +```````````````````````````````` example +```foo`` +. +<p>```foo``</p> +```````````````````````````````` + + +```````````````````````````````` example +`foo +. +<p>`foo</p> +```````````````````````````````` + +The following case also illustrates the need for opening and +closing backtick strings to be equal in length: + +```````````````````````````````` example +`foo``bar`` +. +<p>`foo<code>bar</code></p> +```````````````````````````````` + + +## Emphasis and strong emphasis + +John Gruber's original [Markdown syntax +description](http://daringfireball.net/projects/markdown/syntax#em) says: + +> Markdown treats asterisks (`*`) and underscores (`_`) as indicators of +> emphasis. Text wrapped with one `*` or `_` will be wrapped with an HTML +> `<em>` tag; double `*`'s or `_`'s will be wrapped with an HTML `<strong>` +> tag. + +This is enough for most users, but these rules leave much undecided, +especially when it comes to nested emphasis. The original +`Markdown.pl` test suite makes it clear that triple `***` and +`___` delimiters can be used for strong emphasis, and most +implementations have also allowed the following patterns: + +``` markdown +***strong emph*** +***strong** in emph* +***emph* in strong** +**in strong *emph*** +*in emph **strong*** +``` + +The following patterns are less widely supported, but the intent +is clear and they are useful (especially in contexts like bibliography +entries): + +``` markdown +*emph *with emph* in it* +**strong **with strong** in it** +``` + +Many implementations have also restricted intraword emphasis to +the `*` forms, to avoid unwanted emphasis in words containing +internal underscores. (It is best practice to put these in code +spans, but users often do not.) + +``` markdown +internal emphasis: foo*bar*baz +no emphasis: foo_bar_baz +``` + +The rules given below capture all of these patterns, while allowing +for efficient parsing strategies that do not backtrack. + +First, some definitions. A [delimiter run](@) is either +a sequence of one or more `*` characters that is not preceded or +followed by a non-backslash-escaped `*` character, or a sequence +of one or more `_` characters that is not preceded or followed by +a non-backslash-escaped `_` character. + +A [left-flanking delimiter run](@) is +a [delimiter run] that is (1) not followed by [Unicode whitespace], +and either (2a) not followed by a [Unicode punctuation character], or +(2b) followed by a [Unicode punctuation character] and +preceded by [Unicode whitespace] or a [Unicode punctuation character]. +For purposes of this definition, the beginning and the end of +the line count as Unicode whitespace. + +A [right-flanking delimiter run](@) is +a [delimiter run] that is (1) not preceded by [Unicode whitespace], +and either (2a) not preceded by a [Unicode punctuation character], or +(2b) preceded by a [Unicode punctuation character] and +followed by [Unicode whitespace] or a [Unicode punctuation character]. +For purposes of this definition, the beginning and the end of +the line count as Unicode whitespace. + +Here are some examples of delimiter runs. + + - left-flanking but not right-flanking: + + ``` + ***abc + _abc + **"abc" + _"abc" + ``` + + - right-flanking but not left-flanking: + + ``` + abc*** + abc_ + "abc"** + "abc"_ + ``` + + - Both left and right-flanking: + + ``` + abc***def + "abc"_"def" + ``` + + - Neither left nor right-flanking: + + ``` + abc *** def + a _ b + ``` + +(The idea of distinguishing left-flanking and right-flanking +delimiter runs based on the character before and the character +after comes from Roopesh Chander's +[vfmd](http://www.vfmd.org/vfmd-spec/specification/#procedure-for-identifying-emphasis-tags). +vfmd uses the terminology "emphasis indicator string" instead of "delimiter +run," and its rules for distinguishing left- and right-flanking runs +are a bit more complex than the ones given here.) + +The following rules define emphasis and strong emphasis: + +1. A single `*` character [can open emphasis](@) + iff (if and only if) it is part of a [left-flanking delimiter run]. + +2. A single `_` character [can open emphasis] iff + it is part of a [left-flanking delimiter run] + and either (a) not part of a [right-flanking delimiter run] + or (b) part of a [right-flanking delimiter run] + preceded by a [Unicode punctuation character]. + +3. A single `*` character [can close emphasis](@) + iff it is part of a [right-flanking delimiter run]. + +4. A single `_` character [can close emphasis] iff + it is part of a [right-flanking delimiter run] + and either (a) not part of a [left-flanking delimiter run] + or (b) part of a [left-flanking delimiter run] + followed by a [Unicode punctuation character]. + +5. A double `**` [can open strong emphasis](@) + iff it is part of a [left-flanking delimiter run]. + +6. A double `__` [can open strong emphasis] iff + it is part of a [left-flanking delimiter run] + and either (a) not part of a [right-flanking delimiter run] + or (b) part of a [right-flanking delimiter run] + preceded by a [Unicode punctuation character]. + +7. A double `**` [can close strong emphasis](@) + iff it is part of a [right-flanking delimiter run]. + +8. A double `__` [can close strong emphasis] iff + it is part of a [right-flanking delimiter run] + and either (a) not part of a [left-flanking delimiter run] + or (b) part of a [left-flanking delimiter run] + followed by a [Unicode punctuation character]. + +9. Emphasis begins with a delimiter that [can open emphasis] and ends + with a delimiter that [can close emphasis], and that uses the same + character (`_` or `*`) as the opening delimiter. The + opening and closing delimiters must belong to separate + [delimiter runs]. If one of the delimiters can both + open and close emphasis, then the sum of the lengths of the + delimiter runs containing the opening and closing delimiters + must not be a multiple of 3 unless both lengths are + multiples of 3. + +10. Strong emphasis begins with a delimiter that + [can open strong emphasis] and ends with a delimiter that + [can close strong emphasis], and that uses the same character + (`_` or `*`) as the opening delimiter. The + opening and closing delimiters must belong to separate + [delimiter runs]. If one of the delimiters can both open + and close strong emphasis, then the sum of the lengths of + the delimiter runs containing the opening and closing + delimiters must not be a multiple of 3 unless both lengths + are multiples of 3. + +11. A literal `*` character cannot occur at the beginning or end of + `*`-delimited emphasis or `**`-delimited strong emphasis, unless it + is backslash-escaped. + +12. A literal `_` character cannot occur at the beginning or end of + `_`-delimited emphasis or `__`-delimited strong emphasis, unless it + is backslash-escaped. + +Where rules 1--12 above are compatible with multiple parsings, +the following principles resolve ambiguity: + +13. The number of nestings should be minimized. Thus, for example, + an interpretation `<strong>...</strong>` is always preferred to + `<em><em>...</em></em>`. + +14. An interpretation `<em><strong>...</strong></em>` is always + preferred to `<strong><em>...</em></strong>`. + +15. When two potential emphasis or strong emphasis spans overlap, + so that the second begins before the first ends and ends after + the first ends, the first takes precedence. Thus, for example, + `*foo _bar* baz_` is parsed as `<em>foo _bar</em> baz_` rather + than `*foo <em>bar* baz</em>`. + +16. When there are two potential emphasis or strong emphasis spans + with the same closing delimiter, the shorter one (the one that + opens later) takes precedence. Thus, for example, + `**foo **bar baz**` is parsed as `**foo <strong>bar baz</strong>` + rather than `<strong>foo **bar baz</strong>`. + +17. Inline code spans, links, images, and HTML tags group more tightly + than emphasis. So, when there is a choice between an interpretation + that contains one of these elements and one that does not, the + former always wins. Thus, for example, `*[foo*](bar)` is + parsed as `*<a href="bar">foo*</a>` rather than as + `<em>[foo</em>](bar)`. + +These rules can be illustrated through a series of examples. + +Rule 1: + +```````````````````````````````` example +*foo bar* +. +<p><em>foo bar</em></p> +```````````````````````````````` + + +This is not emphasis, because the opening `*` is followed by +whitespace, and hence not part of a [left-flanking delimiter run]: + +```````````````````````````````` example +a * foo bar* +. +<p>a * foo bar*</p> +```````````````````````````````` + + +This is not emphasis, because the opening `*` is preceded +by an alphanumeric and followed by punctuation, and hence +not part of a [left-flanking delimiter run]: + +```````````````````````````````` example +a*"foo"* +. +<p>a*"foo"*</p> +```````````````````````````````` + + +Unicode nonbreaking spaces count as whitespace, too: + +```````````````````````````````` example +* a * +. +<p>* a *</p> +```````````````````````````````` + + +Intraword emphasis with `*` is permitted: + +```````````````````````````````` example +foo*bar* +. +<p>foo<em>bar</em></p> +```````````````````````````````` + + +```````````````````````````````` example +5*6*78 +. +<p>5<em>6</em>78</p> +```````````````````````````````` + + +Rule 2: + +```````````````````````````````` example +_foo bar_ +. +<p><em>foo bar</em></p> +```````````````````````````````` + + +This is not emphasis, because the opening `_` is followed by +whitespace: + +```````````````````````````````` example +_ foo bar_ +. +<p>_ foo bar_</p> +```````````````````````````````` + + +This is not emphasis, because the opening `_` is preceded +by an alphanumeric and followed by punctuation: + +```````````````````````````````` example +a_"foo"_ +. +<p>a_"foo"_</p> +```````````````````````````````` + + +Emphasis with `_` is not allowed inside words: + +```````````````````````````````` example +foo_bar_ +. +<p>foo_bar_</p> +```````````````````````````````` + + +```````````````````````````````` example +5_6_78 +. +<p>5_6_78</p> +```````````````````````````````` + + +```````````````````````````````` example +пристаням_стремятся_ +. +<p>пристаням_стремятся_</p> +```````````````````````````````` + + +Here `_` does not generate emphasis, because the first delimiter run +is right-flanking and the second left-flanking: + +```````````````````````````````` example +aa_"bb"_cc +. +<p>aa_"bb"_cc</p> +```````````````````````````````` + + +This is emphasis, even though the opening delimiter is +both left- and right-flanking, because it is preceded by +punctuation: + +```````````````````````````````` example +foo-_(bar)_ +. +<p>foo-<em>(bar)</em></p> +```````````````````````````````` + + +Rule 3: + +This is not emphasis, because the closing delimiter does +not match the opening delimiter: + +```````````````````````````````` example +_foo* +. +<p>_foo*</p> +```````````````````````````````` + + +This is not emphasis, because the closing `*` is preceded by +whitespace: + +```````````````````````````````` example +*foo bar * +. +<p>*foo bar *</p> +```````````````````````````````` + + +A line ending also counts as whitespace: + +```````````````````````````````` example +*foo bar +* +. +<p>*foo bar +*</p> +```````````````````````````````` + + +This is not emphasis, because the second `*` is +preceded by punctuation and followed by an alphanumeric +(hence it is not part of a [right-flanking delimiter run]: + +```````````````````````````````` example +*(*foo) +. +<p>*(*foo)</p> +```````````````````````````````` + + +The point of this restriction is more easily appreciated +with this example: + +```````````````````````````````` example +*(*foo*)* +. +<p><em>(<em>foo</em>)</em></p> +```````````````````````````````` + + +Intraword emphasis with `*` is allowed: + +```````````````````````````````` example +*foo*bar +. +<p><em>foo</em>bar</p> +```````````````````````````````` + + + +Rule 4: + +This is not emphasis, because the closing `_` is preceded by +whitespace: + +```````````````````````````````` example +_foo bar _ +. +<p>_foo bar _</p> +```````````````````````````````` + + +This is not emphasis, because the second `_` is +preceded by punctuation and followed by an alphanumeric: + +```````````````````````````````` example +_(_foo) +. +<p>_(_foo)</p> +```````````````````````````````` + + +This is emphasis within emphasis: + +```````````````````````````````` example +_(_foo_)_ +. +<p><em>(<em>foo</em>)</em></p> +```````````````````````````````` + + +Intraword emphasis is disallowed for `_`: + +```````````````````````````````` example +_foo_bar +. +<p>_foo_bar</p> +```````````````````````````````` + + +```````````````````````````````` example +_пристаням_стремятся +. +<p>_пристаням_стремятся</p> +```````````````````````````````` + + +```````````````````````````````` example +_foo_bar_baz_ +. +<p><em>foo_bar_baz</em></p> +```````````````````````````````` + + +This is emphasis, even though the closing delimiter is +both left- and right-flanking, because it is followed by +punctuation: + +```````````````````````````````` example +_(bar)_. +. +<p><em>(bar)</em>.</p> +```````````````````````````````` + + +Rule 5: + +```````````````````````````````` example +**foo bar** +. +<p><strong>foo bar</strong></p> +```````````````````````````````` + + +This is not strong emphasis, because the opening delimiter is +followed by whitespace: + +```````````````````````````````` example +** foo bar** +. +<p>** foo bar**</p> +```````````````````````````````` + + +This is not strong emphasis, because the opening `**` is preceded +by an alphanumeric and followed by punctuation, and hence +not part of a [left-flanking delimiter run]: + +```````````````````````````````` example +a**"foo"** +. +<p>a**"foo"**</p> +```````````````````````````````` + + +Intraword strong emphasis with `**` is permitted: + +```````````````````````````````` example +foo**bar** +. +<p>foo<strong>bar</strong></p> +```````````````````````````````` + + +Rule 6: + +```````````````````````````````` example +__foo bar__ +. +<p><strong>foo bar</strong></p> +```````````````````````````````` + + +This is not strong emphasis, because the opening delimiter is +followed by whitespace: + +```````````````````````````````` example +__ foo bar__ +. +<p>__ foo bar__</p> +```````````````````````````````` + + +A line ending counts as whitespace: +```````````````````````````````` example +__ +foo bar__ +. +<p>__ +foo bar__</p> +```````````````````````````````` + + +This is not strong emphasis, because the opening `__` is preceded +by an alphanumeric and followed by punctuation: + +```````````````````````````````` example +a__"foo"__ +. +<p>a__"foo"__</p> +```````````````````````````````` + + +Intraword strong emphasis is forbidden with `__`: + +```````````````````````````````` example +foo__bar__ +. +<p>foo__bar__</p> +```````````````````````````````` + + +```````````````````````````````` example +5__6__78 +. +<p>5__6__78</p> +```````````````````````````````` + + +```````````````````````````````` example +пристаням__стремятся__ +. +<p>пристаням__стремятся__</p> +```````````````````````````````` + + +```````````````````````````````` example +__foo, __bar__, baz__ +. +<p><strong>foo, <strong>bar</strong>, baz</strong></p> +```````````````````````````````` + + +This is strong emphasis, even though the opening delimiter is +both left- and right-flanking, because it is preceded by +punctuation: + +```````````````````````````````` example +foo-__(bar)__ +. +<p>foo-<strong>(bar)</strong></p> +```````````````````````````````` + + + +Rule 7: + +This is not strong emphasis, because the closing delimiter is preceded +by whitespace: + +```````````````````````````````` example +**foo bar ** +. +<p>**foo bar **</p> +```````````````````````````````` + + +(Nor can it be interpreted as an emphasized `*foo bar *`, because of +Rule 11.) + +This is not strong emphasis, because the second `**` is +preceded by punctuation and followed by an alphanumeric: + +```````````````````````````````` example +**(**foo) +. +<p>**(**foo)</p> +```````````````````````````````` + + +The point of this restriction is more easily appreciated +with these examples: + +```````````````````````````````` example +*(**foo**)* +. +<p><em>(<strong>foo</strong>)</em></p> +```````````````````````````````` + + +```````````````````````````````` example +**Gomphocarpus (*Gomphocarpus physocarpus*, syn. +*Asclepias physocarpa*)** +. +<p><strong>Gomphocarpus (<em>Gomphocarpus physocarpus</em>, syn. +<em>Asclepias physocarpa</em>)</strong></p> +```````````````````````````````` + + +```````````````````````````````` example +**foo "*bar*" foo** +. +<p><strong>foo "<em>bar</em>" foo</strong></p> +```````````````````````````````` + + +Intraword emphasis: + +```````````````````````````````` example +**foo**bar +. +<p><strong>foo</strong>bar</p> +```````````````````````````````` + + +Rule 8: + +This is not strong emphasis, because the closing delimiter is +preceded by whitespace: + +```````````````````````````````` example +__foo bar __ +. +<p>__foo bar __</p> +```````````````````````````````` + + +This is not strong emphasis, because the second `__` is +preceded by punctuation and followed by an alphanumeric: + +```````````````````````````````` example +__(__foo) +. +<p>__(__foo)</p> +```````````````````````````````` + + +The point of this restriction is more easily appreciated +with this example: + +```````````````````````````````` example +_(__foo__)_ +. +<p><em>(<strong>foo</strong>)</em></p> +```````````````````````````````` + + +Intraword strong emphasis is forbidden with `__`: + +```````````````````````````````` example +__foo__bar +. +<p>__foo__bar</p> +```````````````````````````````` + + +```````````````````````````````` example +__пристаням__стремятся +. +<p>__пристаням__стремятся</p> +```````````````````````````````` + + +```````````````````````````````` example +__foo__bar__baz__ +. +<p><strong>foo__bar__baz</strong></p> +```````````````````````````````` + + +This is strong emphasis, even though the closing delimiter is +both left- and right-flanking, because it is followed by +punctuation: + +```````````````````````````````` example +__(bar)__. +. +<p><strong>(bar)</strong>.</p> +```````````````````````````````` + + +Rule 9: + +Any nonempty sequence of inline elements can be the contents of an +emphasized span. + +```````````````````````````````` example +*foo [bar](/url)* +. +<p><em>foo <a href="/url">bar</a></em></p> +```````````````````````````````` + + +```````````````````````````````` example +*foo +bar* +. +<p><em>foo +bar</em></p> +```````````````````````````````` + + +In particular, emphasis and strong emphasis can be nested +inside emphasis: + +```````````````````````````````` example +_foo __bar__ baz_ +. +<p><em>foo <strong>bar</strong> baz</em></p> +```````````````````````````````` + + +```````````````````````````````` example +_foo _bar_ baz_ +. +<p><em>foo <em>bar</em> baz</em></p> +```````````````````````````````` + + +```````````````````````````````` example +__foo_ bar_ +. +<p><em><em>foo</em> bar</em></p> +```````````````````````````````` + + +```````````````````````````````` example +*foo *bar** +. +<p><em>foo <em>bar</em></em></p> +```````````````````````````````` + + +```````````````````````````````` example +*foo **bar** baz* +. +<p><em>foo <strong>bar</strong> baz</em></p> +```````````````````````````````` + +```````````````````````````````` example +*foo**bar**baz* +. +<p><em>foo<strong>bar</strong>baz</em></p> +```````````````````````````````` + +Note that in the preceding case, the interpretation + +``` markdown +<p><em>foo</em><em>bar<em></em>baz</em></p> +``` + + +is precluded by the condition that a delimiter that +can both open and close (like the `*` after `foo`) +cannot form emphasis if the sum of the lengths of +the delimiter runs containing the opening and +closing delimiters is a multiple of 3 unless +both lengths are multiples of 3. + + +For the same reason, we don't get two consecutive +emphasis sections in this example: + +```````````````````````````````` example +*foo**bar* +. +<p><em>foo**bar</em></p> +```````````````````````````````` + + +The same condition ensures that the following +cases are all strong emphasis nested inside +emphasis, even when the interior whitespace is +omitted: + + +```````````````````````````````` example +***foo** bar* +. +<p><em><strong>foo</strong> bar</em></p> +```````````````````````````````` + + +```````````````````````````````` example +*foo **bar*** +. +<p><em>foo <strong>bar</strong></em></p> +```````````````````````````````` + + +```````````````````````````````` example +*foo**bar*** +. +<p><em>foo<strong>bar</strong></em></p> +```````````````````````````````` + + +When the lengths of the interior closing and opening +delimiter runs are *both* multiples of 3, though, +they can match to create emphasis: + +```````````````````````````````` example +foo***bar***baz +. +<p>foo<em><strong>bar</strong></em>baz</p> +```````````````````````````````` + +```````````````````````````````` example +foo******bar*********baz +. +<p>foo<strong><strong><strong>bar</strong></strong></strong>***baz</p> +```````````````````````````````` + + +Indefinite levels of nesting are possible: + +```````````````````````````````` example +*foo **bar *baz* bim** bop* +. +<p><em>foo <strong>bar <em>baz</em> bim</strong> bop</em></p> +```````````````````````````````` + + +```````````````````````````````` example +*foo [*bar*](/url)* +. +<p><em>foo <a href="/url"><em>bar</em></a></em></p> +```````````````````````````````` + + +There can be no empty emphasis or strong emphasis: + +```````````````````````````````` example +** is not an empty emphasis +. +<p>** is not an empty emphasis</p> +```````````````````````````````` + + +```````````````````````````````` example +**** is not an empty strong emphasis +. +<p>**** is not an empty strong emphasis</p> +```````````````````````````````` + + + +Rule 10: + +Any nonempty sequence of inline elements can be the contents of an +strongly emphasized span. + +```````````````````````````````` example +**foo [bar](/url)** +. +<p><strong>foo <a href="/url">bar</a></strong></p> +```````````````````````````````` + + +```````````````````````````````` example +**foo +bar** +. +<p><strong>foo +bar</strong></p> +```````````````````````````````` + + +In particular, emphasis and strong emphasis can be nested +inside strong emphasis: + +```````````````````````````````` example +__foo _bar_ baz__ +. +<p><strong>foo <em>bar</em> baz</strong></p> +```````````````````````````````` + + +```````````````````````````````` example +__foo __bar__ baz__ +. +<p><strong>foo <strong>bar</strong> baz</strong></p> +```````````````````````````````` + + +```````````````````````````````` example +____foo__ bar__ +. +<p><strong><strong>foo</strong> bar</strong></p> +```````````````````````````````` + + +```````````````````````````````` example +**foo **bar**** +. +<p><strong>foo <strong>bar</strong></strong></p> +```````````````````````````````` + + +```````````````````````````````` example +**foo *bar* baz** +. +<p><strong>foo <em>bar</em> baz</strong></p> +```````````````````````````````` + + +```````````````````````````````` example +**foo*bar*baz** +. +<p><strong>foo<em>bar</em>baz</strong></p> +```````````````````````````````` + + +```````````````````````````````` example +***foo* bar** +. +<p><strong><em>foo</em> bar</strong></p> +```````````````````````````````` + + +```````````````````````````````` example +**foo *bar*** +. +<p><strong>foo <em>bar</em></strong></p> +```````````````````````````````` + + +Indefinite levels of nesting are possible: + +```````````````````````````````` example +**foo *bar **baz** +bim* bop** +. +<p><strong>foo <em>bar <strong>baz</strong> +bim</em> bop</strong></p> +```````````````````````````````` + + +```````````````````````````````` example +**foo [*bar*](/url)** +. +<p><strong>foo <a href="/url"><em>bar</em></a></strong></p> +```````````````````````````````` + + +There can be no empty emphasis or strong emphasis: + +```````````````````````````````` example +__ is not an empty emphasis +. +<p>__ is not an empty emphasis</p> +```````````````````````````````` + + +```````````````````````````````` example +____ is not an empty strong emphasis +. +<p>____ is not an empty strong emphasis</p> +```````````````````````````````` + + + +Rule 11: + +```````````````````````````````` example +foo *** +. +<p>foo ***</p> +```````````````````````````````` + + +```````````````````````````````` example +foo *\** +. +<p>foo <em>*</em></p> +```````````````````````````````` + + +```````````````````````````````` example +foo *_* +. +<p>foo <em>_</em></p> +```````````````````````````````` + + +```````````````````````````````` example +foo ***** +. +<p>foo *****</p> +```````````````````````````````` + + +```````````````````````````````` example +foo **\*** +. +<p>foo <strong>*</strong></p> +```````````````````````````````` + + +```````````````````````````````` example +foo **_** +. +<p>foo <strong>_</strong></p> +```````````````````````````````` + + +Note that when delimiters do not match evenly, Rule 11 determines +that the excess literal `*` characters will appear outside of the +emphasis, rather than inside it: + +```````````````````````````````` example +**foo* +. +<p>*<em>foo</em></p> +```````````````````````````````` + + +```````````````````````````````` example +*foo** +. +<p><em>foo</em>*</p> +```````````````````````````````` + + +```````````````````````````````` example +***foo** +. +<p>*<strong>foo</strong></p> +```````````````````````````````` + + +```````````````````````````````` example +****foo* +. +<p>***<em>foo</em></p> +```````````````````````````````` + + +```````````````````````````````` example +**foo*** +. +<p><strong>foo</strong>*</p> +```````````````````````````````` + + +```````````````````````````````` example +*foo**** +. +<p><em>foo</em>***</p> +```````````````````````````````` + + + +Rule 12: + +```````````````````````````````` example +foo ___ +. +<p>foo ___</p> +```````````````````````````````` + + +```````````````````````````````` example +foo _\__ +. +<p>foo <em>_</em></p> +```````````````````````````````` + + +```````````````````````````````` example +foo _*_ +. +<p>foo <em>*</em></p> +```````````````````````````````` + + +```````````````````````````````` example +foo _____ +. +<p>foo _____</p> +```````````````````````````````` + + +```````````````````````````````` example +foo __\___ +. +<p>foo <strong>_</strong></p> +```````````````````````````````` + + +```````````````````````````````` example +foo __*__ +. +<p>foo <strong>*</strong></p> +```````````````````````````````` + + +```````````````````````````````` example +__foo_ +. +<p>_<em>foo</em></p> +```````````````````````````````` + + +Note that when delimiters do not match evenly, Rule 12 determines +that the excess literal `_` characters will appear outside of the +emphasis, rather than inside it: + +```````````````````````````````` example +_foo__ +. +<p><em>foo</em>_</p> +```````````````````````````````` + + +```````````````````````````````` example +___foo__ +. +<p>_<strong>foo</strong></p> +```````````````````````````````` + + +```````````````````````````````` example +____foo_ +. +<p>___<em>foo</em></p> +```````````````````````````````` + + +```````````````````````````````` example +__foo___ +. +<p><strong>foo</strong>_</p> +```````````````````````````````` + + +```````````````````````````````` example +_foo____ +. +<p><em>foo</em>___</p> +```````````````````````````````` + + +Rule 13 implies that if you want emphasis nested directly inside +emphasis, you must use different delimiters: + +```````````````````````````````` example +**foo** +. +<p><strong>foo</strong></p> +```````````````````````````````` + + +```````````````````````````````` example +*_foo_* +. +<p><em><em>foo</em></em></p> +```````````````````````````````` + + +```````````````````````````````` example +__foo__ +. +<p><strong>foo</strong></p> +```````````````````````````````` + + +```````````````````````````````` example +_*foo*_ +. +<p><em><em>foo</em></em></p> +```````````````````````````````` + + +However, strong emphasis within strong emphasis is possible without +switching delimiters: + +```````````````````````````````` example +****foo**** +. +<p><strong><strong>foo</strong></strong></p> +```````````````````````````````` + + +```````````````````````````````` example +____foo____ +. +<p><strong><strong>foo</strong></strong></p> +```````````````````````````````` + + + +Rule 13 can be applied to arbitrarily long sequences of +delimiters: + +```````````````````````````````` example +******foo****** +. +<p><strong><strong><strong>foo</strong></strong></strong></p> +```````````````````````````````` + + +Rule 14: + +```````````````````````````````` example +***foo*** +. +<p><em><strong>foo</strong></em></p> +```````````````````````````````` + + +```````````````````````````````` example +_____foo_____ +. +<p><em><strong><strong>foo</strong></strong></em></p> +```````````````````````````````` + + +Rule 15: + +```````````````````````````````` example +*foo _bar* baz_ +. +<p><em>foo _bar</em> baz_</p> +```````````````````````````````` + + +```````````````````````````````` example +*foo __bar *baz bim__ bam* +. +<p><em>foo <strong>bar *baz bim</strong> bam</em></p> +```````````````````````````````` + + +Rule 16: + +```````````````````````````````` example +**foo **bar baz** +. +<p>**foo <strong>bar baz</strong></p> +```````````````````````````````` + + +```````````````````````````````` example +*foo *bar baz* +. +<p>*foo <em>bar baz</em></p> +```````````````````````````````` + + +Rule 17: + +```````````````````````````````` example +*[bar*](/url) +. +<p>*<a href="/url">bar*</a></p> +```````````````````````````````` + + +```````````````````````````````` example +_foo [bar_](/url) +. +<p>_foo <a href="/url">bar_</a></p> +```````````````````````````````` + + +```````````````````````````````` example +*<img src="foo" title="*"/> +. +<p>*<img src="foo" title="*"/></p> +```````````````````````````````` + + +```````````````````````````````` example +**<a href="**"> +. +<p>**<a href="**"></p> +```````````````````````````````` + + +```````````````````````````````` example +__<a href="__"> +. +<p>__<a href="__"></p> +```````````````````````````````` + + +```````````````````````````````` example +*a `*`* +. +<p><em>a <code>*</code></em></p> +```````````````````````````````` + + +```````````````````````````````` example +_a `_`_ +. +<p><em>a <code>_</code></em></p> +```````````````````````````````` + + +```````````````````````````````` example +**a<http://foo.bar/?q=**> +. +<p>**a<a href="http://foo.bar/?q=**">http://foo.bar/?q=**</a></p> +```````````````````````````````` + + +```````````````````````````````` example +__a<http://foo.bar/?q=__> +. +<p>__a<a href="http://foo.bar/?q=__">http://foo.bar/?q=__</a></p> +```````````````````````````````` + + + +## Links + +A link contains [link text] (the visible text), a [link destination] +(the URI that is the link destination), and optionally a [link title]. +There are two basic kinds of links in Markdown. In [inline links] the +destination and title are given immediately after the link text. In +[reference links] the destination and title are defined elsewhere in +the document. + +A [link text](@) consists of a sequence of zero or more +inline elements enclosed by square brackets (`[` and `]`). The +following rules apply: + +- Links may not contain other links, at any level of nesting. If + multiple otherwise valid link definitions appear nested inside each + other, the inner-most definition is used. + +- Brackets are allowed in the [link text] only if (a) they + are backslash-escaped or (b) they appear as a matched pair of brackets, + with an open bracket `[`, a sequence of zero or more inlines, and + a close bracket `]`. + +- Backtick [code spans], [autolinks], and raw [HTML tags] bind more tightly + than the brackets in link text. Thus, for example, + `` [foo`]` `` could not be a link text, since the second `]` + is part of a code span. + +- The brackets in link text bind more tightly than markers for + [emphasis and strong emphasis]. Thus, for example, `*[foo*](url)` is a link. + +A [link destination](@) consists of either + +- a sequence of zero or more characters between an opening `<` and a + closing `>` that contains no line endings or unescaped + `<` or `>` characters, or + +- a nonempty sequence of characters that does not start with `<`, + does not include [ASCII control characters][ASCII control character] + or [space] character, and includes parentheses only if (a) they are + backslash-escaped or (b) they are part of a balanced pair of + unescaped parentheses. + (Implementations may impose limits on parentheses nesting to + avoid performance issues, but at least three levels of nesting + should be supported.) + +A [link title](@) consists of either + +- a sequence of zero or more characters between straight double-quote + characters (`"`), including a `"` character only if it is + backslash-escaped, or + +- a sequence of zero or more characters between straight single-quote + characters (`'`), including a `'` character only if it is + backslash-escaped, or + +- a sequence of zero or more characters between matching parentheses + (`(...)`), including a `(` or `)` character only if it is + backslash-escaped. + +Although [link titles] may span multiple lines, they may not contain +a [blank line]. + +An [inline link](@) consists of a [link text] followed immediately +by a left parenthesis `(`, an optional [link destination], an optional +[link title], and a right parenthesis `)`. +These four components may be separated by spaces, tabs, and up to one line +ending. +If both [link destination] and [link title] are present, they *must* be +separated by spaces, tabs, and up to one line ending. + +The link's text consists of the inlines contained +in the [link text] (excluding the enclosing square brackets). +The link's URI consists of the link destination, excluding enclosing +`<...>` if present, with backslash-escapes in effect as described +above. The link's title consists of the link title, excluding its +enclosing delimiters, with backslash-escapes in effect as described +above. + +Here is a simple inline link: + +```````````````````````````````` example +[link](/uri "title") +. +<p><a href="/uri" title="title">link</a></p> +```````````````````````````````` + + +The title, the link text and even +the destination may be omitted: + +```````````````````````````````` example +[link](/uri) +. +<p><a href="/uri">link</a></p> +```````````````````````````````` + +```````````````````````````````` example +[](./target.md) +. +<p><a href="./target.md"></a></p> +```````````````````````````````` + + +```````````````````````````````` example +[link]() +. +<p><a href="">link</a></p> +```````````````````````````````` + + +```````````````````````````````` example +[link](<>) +. +<p><a href="">link</a></p> +```````````````````````````````` + + +```````````````````````````````` example +[]() +. +<p><a href=""></a></p> +```````````````````````````````` + +The destination can only contain spaces if it is +enclosed in pointy brackets: + +```````````````````````````````` example +[link](/my uri) +. +<p>[link](/my uri)</p> +```````````````````````````````` + +```````````````````````````````` example +[link](</my uri>) +. +<p><a href="/my%20uri">link</a></p> +```````````````````````````````` + +The destination cannot contain line endings, +even if enclosed in pointy brackets: + +```````````````````````````````` example +[link](foo +bar) +. +<p>[link](foo +bar)</p> +```````````````````````````````` + +```````````````````````````````` example +[link](<foo +bar>) +. +<p>[link](<foo +bar>)</p> +```````````````````````````````` + +The destination can contain `)` if it is enclosed +in pointy brackets: + +```````````````````````````````` example +[a](<b)c>) +. +<p><a href="b)c">a</a></p> +```````````````````````````````` + +Pointy brackets that enclose links must be unescaped: + +```````````````````````````````` example +[link](<foo\>) +. +<p>[link](<foo>)</p> +```````````````````````````````` + +These are not links, because the opening pointy bracket +is not matched properly: + +```````````````````````````````` example +[a](<b)c +[a](<b)c> +[a](<b>c) +. +<p>[a](<b)c +[a](<b)c> +[a](<b>c)</p> +```````````````````````````````` + +Parentheses inside the link destination may be escaped: + +```````````````````````````````` example +[link](\(foo\)) +. +<p><a href="(foo)">link</a></p> +```````````````````````````````` + +Any number of parentheses are allowed without escaping, as long as they are +balanced: + +```````````````````````````````` example +[link](foo(and(bar))) +. +<p><a href="foo(and(bar))">link</a></p> +```````````````````````````````` + +However, if you have unbalanced parentheses, you need to escape or use the +`<...>` form: + +```````````````````````````````` example +[link](foo(and(bar)) +. +<p>[link](foo(and(bar))</p> +```````````````````````````````` + + +```````````````````````````````` example +[link](foo\(and\(bar\)) +. +<p><a href="foo(and(bar)">link</a></p> +```````````````````````````````` + + +```````````````````````````````` example +[link](<foo(and(bar)>) +. +<p><a href="foo(and(bar)">link</a></p> +```````````````````````````````` + + +Parentheses and other symbols can also be escaped, as usual +in Markdown: + +```````````````````````````````` example +[link](foo\)\:) +. +<p><a href="foo):">link</a></p> +```````````````````````````````` + + +A link can contain fragment identifiers and queries: + +```````````````````````````````` example +[link](#fragment) + +[link](http://example.com#fragment) + +[link](http://example.com?foo=3#frag) +. +<p><a href="#fragment">link</a></p> +<p><a href="http://example.com#fragment">link</a></p> +<p><a href="http://example.com?foo=3#frag">link</a></p> +```````````````````````````````` + + +Note that a backslash before a non-escapable character is +just a backslash: + +```````````````````````````````` example +[link](foo\bar) +. +<p><a href="foo%5Cbar">link</a></p> +```````````````````````````````` + + +URL-escaping should be left alone inside the destination, as all +URL-escaped characters are also valid URL characters. Entity and +numerical character references in the destination will be parsed +into the corresponding Unicode code points, as usual. These may +be optionally URL-escaped when written as HTML, but this spec +does not enforce any particular policy for rendering URLs in +HTML or other formats. Renderers may make different decisions +about how to escape or normalize URLs in the output. + +```````````````````````````````` example +[link](foo%20bä) +. +<p><a href="foo%20b%C3%A4">link</a></p> +```````````````````````````````` + + +Note that, because titles can often be parsed as destinations, +if you try to omit the destination and keep the title, you'll +get unexpected results: + +```````````````````````````````` example +[link]("title") +. +<p><a href="%22title%22">link</a></p> +```````````````````````````````` + + +Titles may be in single quotes, double quotes, or parentheses: + +```````````````````````````````` example +[link](/url "title") +[link](/url 'title') +[link](/url (title)) +. +<p><a href="/url" title="title">link</a> +<a href="/url" title="title">link</a> +<a href="/url" title="title">link</a></p> +```````````````````````````````` + + +Backslash escapes and entity and numeric character references +may be used in titles: + +```````````````````````````````` example +[link](/url "title \""") +. +<p><a href="/url" title="title """>link</a></p> +```````````````````````````````` + + +Titles must be separated from the link using spaces, tabs, and up to one line +ending. +Other [Unicode whitespace] like non-breaking space doesn't work. + +```````````````````````````````` example +[link](/url "title") +. +<p><a href="/url%C2%A0%22title%22">link</a></p> +```````````````````````````````` + + +Nested balanced quotes are not allowed without escaping: + +```````````````````````````````` example +[link](/url "title "and" title") +. +<p>[link](/url "title "and" title")</p> +```````````````````````````````` + + +But it is easy to work around this by using a different quote type: + +```````````````````````````````` example +[link](/url 'title "and" title') +. +<p><a href="/url" title="title "and" title">link</a></p> +```````````````````````````````` + + +(Note: `Markdown.pl` did allow double quotes inside a double-quoted +title, and its test suite included a test demonstrating this. +But it is hard to see a good rationale for the extra complexity this +brings, since there are already many ways---backslash escaping, +entity and numeric character references, or using a different +quote type for the enclosing title---to write titles containing +double quotes. `Markdown.pl`'s handling of titles has a number +of other strange features. For example, it allows single-quoted +titles in inline links, but not reference links. And, in +reference links but not inline links, it allows a title to begin +with `"` and end with `)`. `Markdown.pl` 1.0.1 even allows +titles with no closing quotation mark, though 1.0.2b8 does not. +It seems preferable to adopt a simple, rational rule that works +the same way in inline links and link reference definitions.) + +Spaces, tabs, and up to one line ending is allowed around the destination and +title: + +```````````````````````````````` example +[link]( /uri + "title" ) +. +<p><a href="/uri" title="title">link</a></p> +```````````````````````````````` + + +But it is not allowed between the link text and the +following parenthesis: + +```````````````````````````````` example +[link] (/uri) +. +<p>[link] (/uri)</p> +```````````````````````````````` + + +The link text may contain balanced brackets, but not unbalanced ones, +unless they are escaped: + +```````````````````````````````` example +[link [foo [bar]]](/uri) +. +<p><a href="/uri">link [foo [bar]]</a></p> +```````````````````````````````` + + +```````````````````````````````` example +[link] bar](/uri) +. +<p>[link] bar](/uri)</p> +```````````````````````````````` + + +```````````````````````````````` example +[link [bar](/uri) +. +<p>[link <a href="/uri">bar</a></p> +```````````````````````````````` + + +```````````````````````````````` example +[link \[bar](/uri) +. +<p><a href="/uri">link [bar</a></p> +```````````````````````````````` + + +The link text may contain inline content: + +```````````````````````````````` example +[link *foo **bar** `#`*](/uri) +. +<p><a href="/uri">link <em>foo <strong>bar</strong> <code>#</code></em></a></p> +```````````````````````````````` + + +```````````````````````````````` example +[![moon](moon.jpg)](/uri) +. +<p><a href="/uri"><img src="moon.jpg" alt="moon" /></a></p> +```````````````````````````````` + + +However, links may not contain other links, at any level of nesting. + +```````````````````````````````` example +[foo [bar](/uri)](/uri) +. +<p>[foo <a href="/uri">bar</a>](/uri)</p> +```````````````````````````````` + + +```````````````````````````````` example +[foo *[bar [baz](/uri)](/uri)*](/uri) +. +<p>[foo <em>[bar <a href="/uri">baz</a>](/uri)</em>](/uri)</p> +```````````````````````````````` + + +```````````````````````````````` example +![[[foo](uri1)](uri2)](uri3) +. +<p><img src="uri3" alt="[foo](uri2)" /></p> +```````````````````````````````` + + +These cases illustrate the precedence of link text grouping over +emphasis grouping: + +```````````````````````````````` example +*[foo*](/uri) +. +<p>*<a href="/uri">foo*</a></p> +```````````````````````````````` + + +```````````````````````````````` example +[foo *bar](baz*) +. +<p><a href="baz*">foo *bar</a></p> +```````````````````````````````` + + +Note that brackets that *aren't* part of links do not take +precedence: + +```````````````````````````````` example +*foo [bar* baz] +. +<p><em>foo [bar</em> baz]</p> +```````````````````````````````` + + +These cases illustrate the precedence of HTML tags, code spans, +and autolinks over link grouping: + +```````````````````````````````` example +[foo <bar attr="](baz)"> +. +<p>[foo <bar attr="](baz)"></p> +```````````````````````````````` + + +```````````````````````````````` example +[foo`](/uri)` +. +<p>[foo<code>](/uri)</code></p> +```````````````````````````````` + + +```````````````````````````````` example +[foo<http://example.com/?search=](uri)> +. +<p>[foo<a href="http://example.com/?search=%5D(uri)">http://example.com/?search=](uri)</a></p> +```````````````````````````````` + + +There are three kinds of [reference link](@)s: +[full](#full-reference-link), [collapsed](#collapsed-reference-link), +and [shortcut](#shortcut-reference-link). + +A [full reference link](@) +consists of a [link text] immediately followed by a [link label] +that [matches] a [link reference definition] elsewhere in the document. + +A [link label](@) begins with a left bracket (`[`) and ends +with the first right bracket (`]`) that is not backslash-escaped. +Between these brackets there must be at least one character that is not a space, +tab, or line ending. +Unescaped square bracket characters are not allowed inside the +opening and closing square brackets of [link labels]. A link +label can have at most 999 characters inside the square +brackets. + +One label [matches](@) +another just in case their normalized forms are equal. To normalize a +label, strip off the opening and closing brackets, +perform the *Unicode case fold*, strip leading and trailing +spaces, tabs, and line endings, and collapse consecutive internal +spaces, tabs, and line endings to a single space. If there are multiple +matching reference link definitions, the one that comes first in the +document is used. (It is desirable in such cases to emit a warning.) + +The link's URI and title are provided by the matching [link +reference definition]. + +Here is a simple example: + +```````````````````````````````` example +[foo][bar] + +[bar]: /url "title" +. +<p><a href="/url" title="title">foo</a></p> +```````````````````````````````` + + +The rules for the [link text] are the same as with +[inline links]. Thus: + +The link text may contain balanced brackets, but not unbalanced ones, +unless they are escaped: + +```````````````````````````````` example +[link [foo [bar]]][ref] + +[ref]: /uri +. +<p><a href="/uri">link [foo [bar]]</a></p> +```````````````````````````````` + + +```````````````````````````````` example +[link \[bar][ref] + +[ref]: /uri +. +<p><a href="/uri">link [bar</a></p> +```````````````````````````````` + + +The link text may contain inline content: + +```````````````````````````````` example +[link *foo **bar** `#`*][ref] + +[ref]: /uri +. +<p><a href="/uri">link <em>foo <strong>bar</strong> <code>#</code></em></a></p> +```````````````````````````````` + + +```````````````````````````````` example +[![moon](moon.jpg)][ref] + +[ref]: /uri +. +<p><a href="/uri"><img src="moon.jpg" alt="moon" /></a></p> +```````````````````````````````` + + +However, links may not contain other links, at any level of nesting. + +```````````````````````````````` example +[foo [bar](/uri)][ref] + +[ref]: /uri +. +<p>[foo <a href="/uri">bar</a>]<a href="/uri">ref</a></p> +```````````````````````````````` + + +```````````````````````````````` example +[foo *bar [baz][ref]*][ref] + +[ref]: /uri +. +<p>[foo <em>bar <a href="/uri">baz</a></em>]<a href="/uri">ref</a></p> +```````````````````````````````` + + +(In the examples above, we have two [shortcut reference links] +instead of one [full reference link].) + +The following cases illustrate the precedence of link text grouping over +emphasis grouping: + +```````````````````````````````` example +*[foo*][ref] + +[ref]: /uri +. +<p>*<a href="/uri">foo*</a></p> +```````````````````````````````` + + +```````````````````````````````` example +[foo *bar][ref]* + +[ref]: /uri +. +<p><a href="/uri">foo *bar</a>*</p> +```````````````````````````````` + + +These cases illustrate the precedence of HTML tags, code spans, +and autolinks over link grouping: + +```````````````````````````````` example +[foo <bar attr="][ref]"> + +[ref]: /uri +. +<p>[foo <bar attr="][ref]"></p> +```````````````````````````````` + + +```````````````````````````````` example +[foo`][ref]` + +[ref]: /uri +. +<p>[foo<code>][ref]</code></p> +```````````````````````````````` + + +```````````````````````````````` example +[foo<http://example.com/?search=][ref]> + +[ref]: /uri +. +<p>[foo<a href="http://example.com/?search=%5D%5Bref%5D">http://example.com/?search=][ref]</a></p> +```````````````````````````````` + + +Matching is case-insensitive: + +```````````````````````````````` example +[foo][BaR] + +[bar]: /url "title" +. +<p><a href="/url" title="title">foo</a></p> +```````````````````````````````` + + +Unicode case fold is used: + +```````````````````````````````` example +[ẞ] + +[SS]: /url +. +<p><a href="/url">ẞ</a></p> +```````````````````````````````` + + +Consecutive internal spaces, tabs, and line endings are treated as one space for +purposes of determining matching: + +```````````````````````````````` example +[Foo + bar]: /url + +[Baz][Foo bar] +. +<p><a href="/url">Baz</a></p> +```````````````````````````````` + + +No spaces, tabs, or line endings are allowed between the [link text] and the +[link label]: + +```````````````````````````````` example +[foo] [bar] + +[bar]: /url "title" +. +<p>[foo] <a href="/url" title="title">bar</a></p> +```````````````````````````````` + + +```````````````````````````````` example +[foo] +[bar] + +[bar]: /url "title" +. +<p>[foo] +<a href="/url" title="title">bar</a></p> +```````````````````````````````` + + +This is a departure from John Gruber's original Markdown syntax +description, which explicitly allows whitespace between the link +text and the link label. It brings reference links in line with +[inline links], which (according to both original Markdown and +this spec) cannot have whitespace after the link text. More +importantly, it prevents inadvertent capture of consecutive +[shortcut reference links]. If whitespace is allowed between the +link text and the link label, then in the following we will have +a single reference link, not two shortcut reference links, as +intended: + +``` markdown +[foo] +[bar] + +[foo]: /url1 +[bar]: /url2 +``` + +(Note that [shortcut reference links] were introduced by Gruber +himself in a beta version of `Markdown.pl`, but never included +in the official syntax description. Without shortcut reference +links, it is harmless to allow space between the link text and +link label; but once shortcut references are introduced, it is +too dangerous to allow this, as it frequently leads to +unintended results.) + +When there are multiple matching [link reference definitions], +the first is used: + +```````````````````````````````` example +[foo]: /url1 + +[foo]: /url2 + +[bar][foo] +. +<p><a href="/url1">bar</a></p> +```````````````````````````````` + + +Note that matching is performed on normalized strings, not parsed +inline content. So the following does not match, even though the +labels define equivalent inline content: + +```````````````````````````````` example +[bar][foo\!] + +[foo!]: /url +. +<p>[bar][foo!]</p> +```````````````````````````````` + + +[Link labels] cannot contain brackets, unless they are +backslash-escaped: + +```````````````````````````````` example +[foo][ref[] + +[ref[]: /uri +. +<p>[foo][ref[]</p> +<p>[ref[]: /uri</p> +```````````````````````````````` + + +```````````````````````````````` example +[foo][ref[bar]] + +[ref[bar]]: /uri +. +<p>[foo][ref[bar]]</p> +<p>[ref[bar]]: /uri</p> +```````````````````````````````` + + +```````````````````````````````` example +[[[foo]]] + +[[[foo]]]: /url +. +<p>[[[foo]]]</p> +<p>[[[foo]]]: /url</p> +```````````````````````````````` + + +```````````````````````````````` example +[foo][ref\[] + +[ref\[]: /uri +. +<p><a href="/uri">foo</a></p> +```````````````````````````````` + + +Note that in this example `]` is not backslash-escaped: + +```````````````````````````````` example +[bar\\]: /uri + +[bar\\] +. +<p><a href="/uri">bar\</a></p> +```````````````````````````````` + + +A [link label] must contain at least one character that is not a space, tab, or +line ending: + +```````````````````````````````` example +[] + +[]: /uri +. +<p>[]</p> +<p>[]: /uri</p> +```````````````````````````````` + + +```````````````````````````````` example +[ + ] + +[ + ]: /uri +. +<p>[ +]</p> +<p>[ +]: /uri</p> +```````````````````````````````` + + +A [collapsed reference link](@) +consists of a [link label] that [matches] a +[link reference definition] elsewhere in the +document, followed by the string `[]`. +The contents of the first link label are parsed as inlines, +which are used as the link's text. The link's URI and title are +provided by the matching reference link definition. Thus, +`[foo][]` is equivalent to `[foo][foo]`. + +```````````````````````````````` example +[foo][] + +[foo]: /url "title" +. +<p><a href="/url" title="title">foo</a></p> +```````````````````````````````` + + +```````````````````````````````` example +[*foo* bar][] + +[*foo* bar]: /url "title" +. +<p><a href="/url" title="title"><em>foo</em> bar</a></p> +```````````````````````````````` + + +The link labels are case-insensitive: + +```````````````````````````````` example +[Foo][] + +[foo]: /url "title" +. +<p><a href="/url" title="title">Foo</a></p> +```````````````````````````````` + + + +As with full reference links, spaces, tabs, or line endings are not +allowed between the two sets of brackets: + +```````````````````````````````` example +[foo] +[] + +[foo]: /url "title" +. +<p><a href="/url" title="title">foo</a> +[]</p> +```````````````````````````````` + + +A [shortcut reference link](@) +consists of a [link label] that [matches] a +[link reference definition] elsewhere in the +document and is not followed by `[]` or a link label. +The contents of the first link label are parsed as inlines, +which are used as the link's text. The link's URI and title +are provided by the matching link reference definition. +Thus, `[foo]` is equivalent to `[foo][]`. + +```````````````````````````````` example +[foo] + +[foo]: /url "title" +. +<p><a href="/url" title="title">foo</a></p> +```````````````````````````````` + + +```````````````````````````````` example +[*foo* bar] + +[*foo* bar]: /url "title" +. +<p><a href="/url" title="title"><em>foo</em> bar</a></p> +```````````````````````````````` + + +```````````````````````````````` example +[[*foo* bar]] + +[*foo* bar]: /url "title" +. +<p>[<a href="/url" title="title"><em>foo</em> bar</a>]</p> +```````````````````````````````` + + +```````````````````````````````` example +[[bar [foo] + +[foo]: /url +. +<p>[[bar <a href="/url">foo</a></p> +```````````````````````````````` + + +The link labels are case-insensitive: + +```````````````````````````````` example +[Foo] + +[foo]: /url "title" +. +<p><a href="/url" title="title">Foo</a></p> +```````````````````````````````` + + +A space after the link text should be preserved: + +```````````````````````````````` example +[foo] bar + +[foo]: /url +. +<p><a href="/url">foo</a> bar</p> +```````````````````````````````` + + +If you just want bracketed text, you can backslash-escape the +opening bracket to avoid links: + +```````````````````````````````` example +\[foo] + +[foo]: /url "title" +. +<p>[foo]</p> +```````````````````````````````` + + +Note that this is a link, because a link label ends with the first +following closing bracket: + +```````````````````````````````` example +[foo*]: /url + +*[foo*] +. +<p>*<a href="/url">foo*</a></p> +```````````````````````````````` + + +Full and compact references take precedence over shortcut +references: + +```````````````````````````````` example +[foo][bar] + +[foo]: /url1 +[bar]: /url2 +. +<p><a href="/url2">foo</a></p> +```````````````````````````````` + +```````````````````````````````` example +[foo][] + +[foo]: /url1 +. +<p><a href="/url1">foo</a></p> +```````````````````````````````` + +Inline links also take precedence: + +```````````````````````````````` example +[foo]() + +[foo]: /url1 +. +<p><a href="">foo</a></p> +```````````````````````````````` + +```````````````````````````````` example +[foo](not a link) + +[foo]: /url1 +. +<p><a href="/url1">foo</a>(not a link)</p> +```````````````````````````````` + +In the following case `[bar][baz]` is parsed as a reference, +`[foo]` as normal text: + +```````````````````````````````` example +[foo][bar][baz] + +[baz]: /url +. +<p>[foo]<a href="/url">bar</a></p> +```````````````````````````````` + + +Here, though, `[foo][bar]` is parsed as a reference, since +`[bar]` is defined: + +```````````````````````````````` example +[foo][bar][baz] + +[baz]: /url1 +[bar]: /url2 +. +<p><a href="/url2">foo</a><a href="/url1">baz</a></p> +```````````````````````````````` + + +Here `[foo]` is not parsed as a shortcut reference, because it +is followed by a link label (even though `[bar]` is not defined): + +```````````````````````````````` example +[foo][bar][baz] + +[baz]: /url1 +[foo]: /url2 +. +<p>[foo]<a href="/url1">bar</a></p> +```````````````````````````````` + + + +## Images + +Syntax for images is like the syntax for links, with one +difference. Instead of [link text], we have an +[image description](@). The rules for this are the +same as for [link text], except that (a) an +image description starts with `![` rather than `[`, and +(b) an image description may contain links. +An image description has inline elements +as its contents. When an image is rendered to HTML, +this is standardly used as the image's `alt` attribute. + +```````````````````````````````` example +![foo](/url "title") +. +<p><img src="/url" alt="foo" title="title" /></p> +```````````````````````````````` + + +```````````````````````````````` example +![foo *bar*] + +[foo *bar*]: train.jpg "train & tracks" +. +<p><img src="train.jpg" alt="foo bar" title="train & tracks" /></p> +```````````````````````````````` + + +```````````````````````````````` example +![foo ![bar](/url)](/url2) +. +<p><img src="/url2" alt="foo bar" /></p> +```````````````````````````````` + + +```````````````````````````````` example +![foo [bar](/url)](/url2) +. +<p><img src="/url2" alt="foo bar" /></p> +```````````````````````````````` + + +Though this spec is concerned with parsing, not rendering, it is +recommended that in rendering to HTML, only the plain string content +of the [image description] be used. Note that in +the above example, the alt attribute's value is `foo bar`, not `foo +[bar](/url)` or `foo <a href="/url">bar</a>`. Only the plain string +content is rendered, without formatting. + +```````````````````````````````` example +![foo *bar*][] + +[foo *bar*]: train.jpg "train & tracks" +. +<p><img src="train.jpg" alt="foo bar" title="train & tracks" /></p> +```````````````````````````````` + + +```````````````````````````````` example +![foo *bar*][foobar] + +[FOOBAR]: train.jpg "train & tracks" +. +<p><img src="train.jpg" alt="foo bar" title="train & tracks" /></p> +```````````````````````````````` + + +```````````````````````````````` example +![foo](train.jpg) +. +<p><img src="train.jpg" alt="foo" /></p> +```````````````````````````````` + + +```````````````````````````````` example +My ![foo bar](/path/to/train.jpg "title" ) +. +<p>My <img src="/path/to/train.jpg" alt="foo bar" title="title" /></p> +```````````````````````````````` + + +```````````````````````````````` example +![foo](<url>) +. +<p><img src="url" alt="foo" /></p> +```````````````````````````````` + + +```````````````````````````````` example +![](/url) +. +<p><img src="/url" alt="" /></p> +```````````````````````````````` + + +Reference-style: + +```````````````````````````````` example +![foo][bar] + +[bar]: /url +. +<p><img src="/url" alt="foo" /></p> +```````````````````````````````` + + +```````````````````````````````` example +![foo][bar] + +[BAR]: /url +. +<p><img src="/url" alt="foo" /></p> +```````````````````````````````` + + +Collapsed: + +```````````````````````````````` example +![foo][] + +[foo]: /url "title" +. +<p><img src="/url" alt="foo" title="title" /></p> +```````````````````````````````` + + +```````````````````````````````` example +![*foo* bar][] + +[*foo* bar]: /url "title" +. +<p><img src="/url" alt="foo bar" title="title" /></p> +```````````````````````````````` + + +The labels are case-insensitive: + +```````````````````````````````` example +![Foo][] + +[foo]: /url "title" +. +<p><img src="/url" alt="Foo" title="title" /></p> +```````````````````````````````` + + +As with reference links, spaces, tabs, and line endings, are not allowed +between the two sets of brackets: + +```````````````````````````````` example +![foo] +[] + +[foo]: /url "title" +. +<p><img src="/url" alt="foo" title="title" /> +[]</p> +```````````````````````````````` + + +Shortcut: + +```````````````````````````````` example +![foo] + +[foo]: /url "title" +. +<p><img src="/url" alt="foo" title="title" /></p> +```````````````````````````````` + + +```````````````````````````````` example +![*foo* bar] + +[*foo* bar]: /url "title" +. +<p><img src="/url" alt="foo bar" title="title" /></p> +```````````````````````````````` + + +Note that link labels cannot contain unescaped brackets: + +```````````````````````````````` example +![[foo]] + +[[foo]]: /url "title" +. +<p>![[foo]]</p> +<p>[[foo]]: /url "title"</p> +```````````````````````````````` + + +The link labels are case-insensitive: + +```````````````````````````````` example +![Foo] + +[foo]: /url "title" +. +<p><img src="/url" alt="Foo" title="title" /></p> +```````````````````````````````` + + +If you just want a literal `!` followed by bracketed text, you can +backslash-escape the opening `[`: + +```````````````````````````````` example +!\[foo] + +[foo]: /url "title" +. +<p>![foo]</p> +```````````````````````````````` + + +If you want a link after a literal `!`, backslash-escape the +`!`: + +```````````````````````````````` example +\![foo] + +[foo]: /url "title" +. +<p>!<a href="/url" title="title">foo</a></p> +```````````````````````````````` + + +## Autolinks + +[Autolink](@)s are absolute URIs and email addresses inside +`<` and `>`. They are parsed as links, with the URL or email address +as the link label. + +A [URI autolink](@) consists of `<`, followed by an +[absolute URI] followed by `>`. It is parsed as +a link to the URI, with the URI as the link's label. + +An [absolute URI](@), +for these purposes, consists of a [scheme] followed by a colon (`:`) +followed by zero or more characters other [ASCII control +characters][ASCII control character], [space], `<`, and `>`. +If the URI includes these characters, they must be percent-encoded +(e.g. `%20` for a space). + +For purposes of this spec, a [scheme](@) is any sequence +of 2--32 characters beginning with an ASCII letter and followed +by any combination of ASCII letters, digits, or the symbols plus +("+"), period ("."), or hyphen ("-"). + +Here are some valid autolinks: + +```````````````````````````````` example +<http://foo.bar.baz> +. +<p><a href="http://foo.bar.baz">http://foo.bar.baz</a></p> +```````````````````````````````` + + +```````````````````````````````` example +<http://foo.bar.baz/test?q=hello&id=22&boolean> +. +<p><a href="http://foo.bar.baz/test?q=hello&id=22&boolean">http://foo.bar.baz/test?q=hello&id=22&boolean</a></p> +```````````````````````````````` + + +```````````````````````````````` example +<irc://foo.bar:2233/baz> +. +<p><a href="irc://foo.bar:2233/baz">irc://foo.bar:2233/baz</a></p> +```````````````````````````````` + + +Uppercase is also fine: + +```````````````````````````````` example +<MAILTO:FOO@BAR.BAZ> +. +<p><a href="MAILTO:FOO@BAR.BAZ">MAILTO:FOO@BAR.BAZ</a></p> +```````````````````````````````` + + +Note that many strings that count as [absolute URIs] for +purposes of this spec are not valid URIs, because their +schemes are not registered or because of other problems +with their syntax: + +```````````````````````````````` example +<a+b+c:d> +. +<p><a href="a+b+c:d">a+b+c:d</a></p> +```````````````````````````````` + + +```````````````````````````````` example +<made-up-scheme://foo,bar> +. +<p><a href="made-up-scheme://foo,bar">made-up-scheme://foo,bar</a></p> +```````````````````````````````` + + +```````````````````````````````` example +<http://../> +. +<p><a href="http://../">http://../</a></p> +```````````````````````````````` + + +```````````````````````````````` example +<localhost:5001/foo> +. +<p><a href="localhost:5001/foo">localhost:5001/foo</a></p> +```````````````````````````````` + + +Spaces are not allowed in autolinks: + +```````````````````````````````` example +<http://foo.bar/baz bim> +. +<p><http://foo.bar/baz bim></p> +```````````````````````````````` + + +Backslash-escapes do not work inside autolinks: + +```````````````````````````````` example +<http://example.com/\[\> +. +<p><a href="http://example.com/%5C%5B%5C">http://example.com/\[\</a></p> +```````````````````````````````` + + +An [email autolink](@) +consists of `<`, followed by an [email address], +followed by `>`. The link's label is the email address, +and the URL is `mailto:` followed by the email address. + +An [email address](@), +for these purposes, is anything that matches +the [non-normative regex from the HTML5 +spec](https://html.spec.whatwg.org/multipage/forms.html#e-mail-state-(type=email)): + + /^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])? + (?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/ + +Examples of email autolinks: + +```````````````````````````````` example +<foo@bar.example.com> +. +<p><a href="mailto:foo@bar.example.com">foo@bar.example.com</a></p> +```````````````````````````````` + + +```````````````````````````````` example +<foo+special@Bar.baz-bar0.com> +. +<p><a href="mailto:foo+special@Bar.baz-bar0.com">foo+special@Bar.baz-bar0.com</a></p> +```````````````````````````````` + + +Backslash-escapes do not work inside email autolinks: + +```````````````````````````````` example +<foo\+@bar.example.com> +. +<p><foo+@bar.example.com></p> +```````````````````````````````` + + +These are not autolinks: + +```````````````````````````````` example +<> +. +<p><></p> +```````````````````````````````` + + +```````````````````````````````` example +< http://foo.bar > +. +<p>< http://foo.bar ></p> +```````````````````````````````` + + +```````````````````````````````` example +<m:abc> +. +<p><m:abc></p> +```````````````````````````````` + + +```````````````````````````````` example +<foo.bar.baz> +. +<p><foo.bar.baz></p> +```````````````````````````````` + + +```````````````````````````````` example +http://example.com +. +<p>http://example.com</p> +```````````````````````````````` + + +```````````````````````````````` example +foo@bar.example.com +. +<p>foo@bar.example.com</p> +```````````````````````````````` + + +## Raw HTML + +Text between `<` and `>` that looks like an HTML tag is parsed as a +raw HTML tag and will be rendered in HTML without escaping. +Tag and attribute names are not limited to current HTML tags, +so custom tags (and even, say, DocBook tags) may be used. + +Here is the grammar for tags: + +A [tag name](@) consists of an ASCII letter +followed by zero or more ASCII letters, digits, or +hyphens (`-`). + +An [attribute](@) consists of spaces, tabs, and up to one line ending, +an [attribute name], and an optional +[attribute value specification]. + +An [attribute name](@) +consists of an ASCII letter, `_`, or `:`, followed by zero or more ASCII +letters, digits, `_`, `.`, `:`, or `-`. (Note: This is the XML +specification restricted to ASCII. HTML5 is laxer.) + +An [attribute value specification](@) +consists of optional spaces, tabs, and up to one line ending, +a `=` character, optional spaces, tabs, and up to one line ending, +and an [attribute value]. + +An [attribute value](@) +consists of an [unquoted attribute value], +a [single-quoted attribute value], or a [double-quoted attribute value]. + +An [unquoted attribute value](@) +is a nonempty string of characters not +including spaces, tabs, line endings, `"`, `'`, `=`, `<`, `>`, or `` ` ``. + +A [single-quoted attribute value](@) +consists of `'`, zero or more +characters not including `'`, and a final `'`. + +A [double-quoted attribute value](@) +consists of `"`, zero or more +characters not including `"`, and a final `"`. + +An [open tag](@) consists of a `<` character, a [tag name], +zero or more [attributes], optional spaces, tabs, and up to one line ending, +an optional `/` character, and a `>` character. + +A [closing tag](@) consists of the string `</`, a +[tag name], optional spaces, tabs, and up to one line ending, and the character +`>`. + +An [HTML comment](@) consists of `<!--` + *text* + `-->`, +where *text* does not start with `>` or `->`, does not end with `-`, +and does not contain `--`. (See the +[HTML5 spec](http://www.w3.org/TR/html5/syntax.html#comments).) + +A [processing instruction](@) +consists of the string `<?`, a string +of characters not including the string `?>`, and the string +`?>`. + +A [declaration](@) consists of the string `<!`, an ASCII letter, zero or more +characters not including the character `>`, and the character `>`. + +A [CDATA section](@) consists of +the string `<![CDATA[`, a string of characters not including the string +`]]>`, and the string `]]>`. + +An [HTML tag](@) consists of an [open tag], a [closing tag], +an [HTML comment], a [processing instruction], a [declaration], +or a [CDATA section]. + +Here are some simple open tags: + +```````````````````````````````` example +<a><bab><c2c> +. +<p><a><bab><c2c></p> +```````````````````````````````` + + +Empty elements: + +```````````````````````````````` example +<a/><b2/> +. +<p><a/><b2/></p> +```````````````````````````````` + + +Whitespace is allowed: + +```````````````````````````````` example +<a /><b2 +data="foo" > +. +<p><a /><b2 +data="foo" ></p> +```````````````````````````````` + + +With attributes: + +```````````````````````````````` example +<a foo="bar" bam = 'baz <em>"</em>' +_boolean zoop:33=zoop:33 /> +. +<p><a foo="bar" bam = 'baz <em>"</em>' +_boolean zoop:33=zoop:33 /></p> +```````````````````````````````` + + +Custom tag names can be used: + +```````````````````````````````` example +Foo <responsive-image src="foo.jpg" /> +. +<p>Foo <responsive-image src="foo.jpg" /></p> +```````````````````````````````` + + +Illegal tag names, not parsed as HTML: + +```````````````````````````````` example +<33> <__> +. +<p><33> <__></p> +```````````````````````````````` + + +Illegal attribute names: + +```````````````````````````````` example +<a h*#ref="hi"> +. +<p><a h*#ref="hi"></p> +```````````````````````````````` + + +Illegal attribute values: + +```````````````````````````````` example +<a href="hi'> <a href=hi'> +. +<p><a href="hi'> <a href=hi'></p> +```````````````````````````````` + + +Illegal whitespace: + +```````````````````````````````` example +< a>< +foo><bar/ > +<foo bar=baz +bim!bop /> +. +<p>< a>< +foo><bar/ > +<foo bar=baz +bim!bop /></p> +```````````````````````````````` + + +Missing whitespace: + +```````````````````````````````` example +<a href='bar'title=title> +. +<p><a href='bar'title=title></p> +```````````````````````````````` + + +Closing tags: + +```````````````````````````````` example +</a></foo > +. +<p></a></foo ></p> +```````````````````````````````` + + +Illegal attributes in closing tag: + +```````````````````````````````` example +</a href="foo"> +. +<p></a href="foo"></p> +```````````````````````````````` + + +Comments: + +```````````````````````````````` example +foo <!-- this is a +comment - with hyphen --> +. +<p>foo <!-- this is a +comment - with hyphen --></p> +```````````````````````````````` + + +```````````````````````````````` example +foo <!-- not a comment -- two hyphens --> +. +<p>foo <!-- not a comment -- two hyphens --></p> +```````````````````````````````` + + +Not comments: + +```````````````````````````````` example +foo <!--> foo --> + +foo <!-- foo---> +. +<p>foo <!--> foo --></p> +<p>foo <!-- foo---></p> +```````````````````````````````` + + +Processing instructions: + +```````````````````````````````` example +foo <?php echo $a; ?> +. +<p>foo <?php echo $a; ?></p> +```````````````````````````````` + + +Declarations: + +```````````````````````````````` example +foo <!ELEMENT br EMPTY> +. +<p>foo <!ELEMENT br EMPTY></p> +```````````````````````````````` + + +CDATA sections: + +```````````````````````````````` example +foo <![CDATA[>&<]]> +. +<p>foo <![CDATA[>&<]]></p> +```````````````````````````````` + + +Entity and numeric character references are preserved in HTML +attributes: + +```````````````````````````````` example +foo <a href="ö"> +. +<p>foo <a href="ö"></p> +```````````````````````````````` + + +Backslash escapes do not work in HTML attributes: + +```````````````````````````````` example +foo <a href="\*"> +. +<p>foo <a href="\*"></p> +```````````````````````````````` + + +```````````````````````````````` example +<a href="\""> +. +<p><a href="""></p> +```````````````````````````````` + + +## Hard line breaks + +A line ending (not in a code span or HTML tag) that is preceded +by two or more spaces and does not occur at the end of a block +is parsed as a [hard line break](@) (rendered +in HTML as a `<br />` tag): + +```````````````````````````````` example +foo +baz +. +<p>foo<br /> +baz</p> +```````````````````````````````` + + +For a more visible alternative, a backslash before the +[line ending] may be used instead of two or more spaces: + +```````````````````````````````` example +foo\ +baz +. +<p>foo<br /> +baz</p> +```````````````````````````````` + + +More than two spaces can be used: + +```````````````````````````````` example +foo +baz +. +<p>foo<br /> +baz</p> +```````````````````````````````` + + +Leading spaces at the beginning of the next line are ignored: + +```````````````````````````````` example +foo + bar +. +<p>foo<br /> +bar</p> +```````````````````````````````` + + +```````````````````````````````` example +foo\ + bar +. +<p>foo<br /> +bar</p> +```````````````````````````````` + + +Hard line breaks can occur inside emphasis, links, and other constructs +that allow inline content: + +```````````````````````````````` example +*foo +bar* +. +<p><em>foo<br /> +bar</em></p> +```````````````````````````````` + + +```````````````````````````````` example +*foo\ +bar* +. +<p><em>foo<br /> +bar</em></p> +```````````````````````````````` + + +Hard line breaks do not occur inside code spans + +```````````````````````````````` example +`code +span` +. +<p><code>code span</code></p> +```````````````````````````````` + + +```````````````````````````````` example +`code\ +span` +. +<p><code>code\ span</code></p> +```````````````````````````````` + + +or HTML tags: + +```````````````````````````````` example +<a href="foo +bar"> +. +<p><a href="foo +bar"></p> +```````````````````````````````` + + +```````````````````````````````` example +<a href="foo\ +bar"> +. +<p><a href="foo\ +bar"></p> +```````````````````````````````` + + +Hard line breaks are for separating inline content within a block. +Neither syntax for hard line breaks works at the end of a paragraph or +other block element: + +```````````````````````````````` example +foo\ +. +<p>foo\</p> +```````````````````````````````` + + +```````````````````````````````` example +foo +. +<p>foo</p> +```````````````````````````````` + + +```````````````````````````````` example +### foo\ +. +<h3>foo\</h3> +```````````````````````````````` + + +```````````````````````````````` example +### foo +. +<h3>foo</h3> +```````````````````````````````` + + +## Soft line breaks + +A regular line ending (not in a code span or HTML tag) that is not +preceded by two or more spaces or a backslash is parsed as a +[softbreak](@). (A soft line break may be rendered in HTML either as a +[line ending] or as a space. The result will be the same in +browsers. In the examples here, a [line ending] will be used.) + +```````````````````````````````` example +foo +baz +. +<p>foo +baz</p> +```````````````````````````````` + + +Spaces at the end of the line and beginning of the next line are +removed: + +```````````````````````````````` example +foo + baz +. +<p>foo +baz</p> +```````````````````````````````` + + +A conforming parser may render a soft line break in HTML either as a +line ending or as a space. + +A renderer may also provide an option to render soft line breaks +as hard line breaks. + +## Textual content + +Any characters not given an interpretation by the above rules will +be parsed as plain textual content. + +```````````````````````````````` example +hello $.;'there +. +<p>hello $.;'there</p> +```````````````````````````````` + + +```````````````````````````````` example +Foo χρῆν +. +<p>Foo χρῆν</p> +```````````````````````````````` + + +Internal spaces are preserved verbatim: + +```````````````````````````````` example +Multiple spaces +. +<p>Multiple spaces</p> +```````````````````````````````` + + +<!-- END TESTS --> + +# Appendix: A parsing strategy + +In this appendix we describe some features of the parsing strategy +used in the CommonMark reference implementations. + +## Overview + +Parsing has two phases: + +1. In the first phase, lines of input are consumed and the block +structure of the document---its division into paragraphs, block quotes, +list items, and so on---is constructed. Text is assigned to these +blocks but not parsed. Link reference definitions are parsed and a +map of links is constructed. + +2. In the second phase, the raw text contents of paragraphs and headings +are parsed into sequences of Markdown inline elements (strings, +code spans, links, emphasis, and so on), using the map of link +references constructed in phase 1. + +At each point in processing, the document is represented as a tree of +**blocks**. The root of the tree is a `document` block. The `document` +may have any number of other blocks as **children**. These children +may, in turn, have other blocks as children. The last child of a block +is normally considered **open**, meaning that subsequent lines of input +can alter its contents. (Blocks that are not open are **closed**.) +Here, for example, is a possible document tree, with the open blocks +marked by arrows: + +``` tree +-> document + -> block_quote + paragraph + "Lorem ipsum dolor\nsit amet." + -> list (type=bullet tight=true bullet_char=-) + list_item + paragraph + "Qui *quodsi iracundia*" + -> list_item + -> paragraph + "aliquando id" +``` + +## Phase 1: block structure + +Each line that is processed has an effect on this tree. The line is +analyzed and, depending on its contents, the document may be altered +in one or more of the following ways: + +1. One or more open blocks may be closed. +2. One or more new blocks may be created as children of the + last open block. +3. Text may be added to the last (deepest) open block remaining + on the tree. + +Once a line has been incorporated into the tree in this way, +it can be discarded, so input can be read in a stream. + +For each line, we follow this procedure: + +1. First we iterate through the open blocks, starting with the +root document, and descending through last children down to the last +open block. Each block imposes a condition that the line must satisfy +if the block is to remain open. For example, a block quote requires a +`>` character. A paragraph requires a non-blank line. +In this phase we may match all or just some of the open +blocks. But we cannot close unmatched blocks yet, because we may have a +[lazy continuation line]. + +2. Next, after consuming the continuation markers for existing +blocks, we look for new block starts (e.g. `>` for a block quote). +If we encounter a new block start, we close any blocks unmatched +in step 1 before creating the new block as a child of the last +matched container block. + +3. Finally, we look at the remainder of the line (after block +markers like `>`, list markers, and indentation have been consumed). +This is text that can be incorporated into the last open +block (a paragraph, code block, heading, or raw HTML). + +Setext headings are formed when we see a line of a paragraph +that is a [setext heading underline]. + +Reference link definitions are detected when a paragraph is closed; +the accumulated text lines are parsed to see if they begin with +one or more reference link definitions. Any remainder becomes a +normal paragraph. + +We can see how this works by considering how the tree above is +generated by four lines of Markdown: + +``` markdown +> Lorem ipsum dolor +sit amet. +> - Qui *quodsi iracundia* +> - aliquando id +``` + +At the outset, our document model is just + +``` tree +-> document +``` + +The first line of our text, + +``` markdown +> Lorem ipsum dolor +``` + +causes a `block_quote` block to be created as a child of our +open `document` block, and a `paragraph` block as a child of +the `block_quote`. Then the text is added to the last open +block, the `paragraph`: + +``` tree +-> document + -> block_quote + -> paragraph + "Lorem ipsum dolor" +``` + +The next line, + +``` markdown +sit amet. +``` + +is a "lazy continuation" of the open `paragraph`, so it gets added +to the paragraph's text: + +``` tree +-> document + -> block_quote + -> paragraph + "Lorem ipsum dolor\nsit amet." +``` + +The third line, + +``` markdown +> - Qui *quodsi iracundia* +``` + +causes the `paragraph` block to be closed, and a new `list` block +opened as a child of the `block_quote`. A `list_item` is also +added as a child of the `list`, and a `paragraph` as a child of +the `list_item`. The text is then added to the new `paragraph`: + +``` tree +-> document + -> block_quote + paragraph + "Lorem ipsum dolor\nsit amet." + -> list (type=bullet tight=true bullet_char=-) + -> list_item + -> paragraph + "Qui *quodsi iracundia*" +``` + +The fourth line, + +``` markdown +> - aliquando id +``` + +causes the `list_item` (and its child the `paragraph`) to be closed, +and a new `list_item` opened up as child of the `list`. A `paragraph` +is added as a child of the new `list_item`, to contain the text. +We thus obtain the final tree: + +``` tree +-> document + -> block_quote + paragraph + "Lorem ipsum dolor\nsit amet." + -> list (type=bullet tight=true bullet_char=-) + list_item + paragraph + "Qui *quodsi iracundia*" + -> list_item + -> paragraph + "aliquando id" +``` + +## Phase 2: inline structure + +Once all of the input has been parsed, all open blocks are closed. + +We then "walk the tree," visiting every node, and parse raw +string contents of paragraphs and headings as inlines. At this +point we have seen all the link reference definitions, so we can +resolve reference links as we go. + +``` tree +document + block_quote + paragraph + str "Lorem ipsum dolor" + softbreak + str "sit amet." + list (type=bullet tight=true bullet_char=-) + list_item + paragraph + str "Qui " + emph + str "quodsi iracundia" + list_item + paragraph + str "aliquando id" +``` + +Notice how the [line ending] in the first paragraph has +been parsed as a `softbreak`, and the asterisks in the first list item +have become an `emph`. + +### An algorithm for parsing nested emphasis and links + +By far the trickiest part of inline parsing is handling emphasis, +strong emphasis, links, and images. This is done using the following +algorithm. + +When we're parsing inlines and we hit either + +- a run of `*` or `_` characters, or +- a `[` or `![` + +we insert a text node with these symbols as its literal content, and we +add a pointer to this text node to the [delimiter stack](@). + +The [delimiter stack] is a doubly linked list. Each +element contains a pointer to a text node, plus information about + +- the type of delimiter (`[`, `![`, `*`, `_`) +- the number of delimiters, +- whether the delimiter is "active" (all are active to start), and +- whether the delimiter is a potential opener, a potential closer, + or both (which depends on what sort of characters precede + and follow the delimiters). + +When we hit a `]` character, we call the *look for link or image* +procedure (see below). + +When we hit the end of the input, we call the *process emphasis* +procedure (see below), with `stack_bottom` = NULL. + +#### *look for link or image* + +Starting at the top of the delimiter stack, we look backwards +through the stack for an opening `[` or `![` delimiter. + +- If we don't find one, we return a literal text node `]`. + +- If we do find one, but it's not *active*, we remove the inactive + delimiter from the stack, and return a literal text node `]`. + +- If we find one and it's active, then we parse ahead to see if + we have an inline link/image, reference link/image, compact reference + link/image, or shortcut reference link/image. + + + If we don't, then we remove the opening delimiter from the + delimiter stack and return a literal text node `]`. + + + If we do, then + + * We return a link or image node whose children are the inlines + after the text node pointed to by the opening delimiter. + + * We run *process emphasis* on these inlines, with the `[` opener + as `stack_bottom`. + + * We remove the opening delimiter. + + * If we have a link (and not an image), we also set all + `[` delimiters before the opening delimiter to *inactive*. (This + will prevent us from getting links within links.) + +#### *process emphasis* + +Parameter `stack_bottom` sets a lower bound to how far we +descend in the [delimiter stack]. If it is NULL, we can +go all the way to the bottom. Otherwise, we stop before +visiting `stack_bottom`. + +Let `current_position` point to the element on the [delimiter stack] +just above `stack_bottom` (or the first element if `stack_bottom` +is NULL). + +We keep track of the `openers_bottom` for each delimiter +type (`*`, `_`), indexed to the length of the closing delimiter run +(modulo 3) and to whether the closing delimiter can also be an +opener. Initialize this to `stack_bottom`. + +Then we repeat the following until we run out of potential +closers: + +- Move `current_position` forward in the delimiter stack (if needed) + until we find the first potential closer with delimiter `*` or `_`. + (This will be the potential closer closest + to the beginning of the input -- the first one in parse order.) + +- Now, look back in the stack (staying above `stack_bottom` and + the `openers_bottom` for this delimiter type) for the + first matching potential opener ("matching" means same delimiter). + +- If one is found: + + + Figure out whether we have emphasis or strong emphasis: + if both closer and opener spans have length >= 2, we have + strong, otherwise regular. + + + Insert an emph or strong emph node accordingly, after + the text node corresponding to the opener. + + + Remove any delimiters between the opener and closer from + the delimiter stack. + + + Remove 1 (for regular emph) or 2 (for strong emph) delimiters + from the opening and closing text nodes. If they become empty + as a result, remove them and remove the corresponding element + of the delimiter stack. If the closing node is removed, reset + `current_position` to the next element in the stack. + +- If none is found: + + + Set `openers_bottom` to the element before `current_position`. + (We know that there are no openers for this kind of closer up to and + including this point, so this puts a lower bound on future searches.) + + + If the closer at `current_position` is not a potential opener, + remove it from the delimiter stack (since we know it can't + be a closer either). + + + Advance `current_position` to the next element in the stack. + +After we're done, we remove all delimiters above `stack_bottom` from the +delimiter stack. diff --git a/tdemarkdown/md4c/test/spec_tests.py b/tdemarkdown/md4c/test/spec_tests.py new file mode 100755 index 000000000..c739e5f9a --- /dev/null +++ b/tdemarkdown/md4c/test/spec_tests.py @@ -0,0 +1,144 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- + +import sys +from difflib import unified_diff +import argparse +import re +import json +from cmark import CMark +from normalize import normalize_html + +if __name__ == "__main__": + parser = argparse.ArgumentParser(description='Run cmark tests.') + parser.add_argument('-p', '--program', dest='program', nargs='?', default=None, + help='program to test') + parser.add_argument('-s', '--spec', dest='spec', nargs='?', default='spec.txt', + help='path to spec') + parser.add_argument('-P', '--pattern', dest='pattern', nargs='?', + default=None, help='limit to sections matching regex pattern') + parser.add_argument('--library-dir', dest='library_dir', nargs='?', + default=None, help='directory containing dynamic library') + parser.add_argument('--no-normalize', dest='normalize', + action='store_const', const=False, default=True, + help='do not normalize HTML') + parser.add_argument('-d', '--dump-tests', dest='dump_tests', + action='store_const', const=True, default=False, + help='dump tests in JSON format') + parser.add_argument('--debug-normalization', dest='debug_normalization', + action='store_const', const=True, + default=False, help='filter stdin through normalizer for testing') + parser.add_argument('-n', '--number', type=int, default=None, + help='only consider the test with the given number') + args = parser.parse_args(sys.argv[1:]) + +def out(str): + sys.stdout.buffer.write(str.encode('utf-8')) + +def print_test_header(headertext, example_number, start_line, end_line): + out("Example %d (lines %d-%d) %s\n" % (example_number,start_line,end_line,headertext)) + +def do_test(test, normalize, result_counts): + [retcode, actual_html, err] = cmark.to_html(test['markdown']) + if retcode == 0: + expected_html = test['html'] + unicode_error = None + if normalize: + try: + passed = normalize_html(actual_html) == normalize_html(expected_html) + except UnicodeDecodeError as e: + unicode_error = e + passed = False + else: + passed = actual_html == expected_html + if passed: + result_counts['pass'] += 1 + else: + print_test_header(test['section'], test['example'], test['start_line'], test['end_line']) + out(test['markdown'] + '\n') + if unicode_error: + out("Unicode error: " + str(unicode_error) + '\n') + out("Expected: " + repr(expected_html) + '\n') + out("Got: " + repr(actual_html) + '\n') + else: + expected_html_lines = expected_html.splitlines(True) + actual_html_lines = actual_html.splitlines(True) + for diffline in unified_diff(expected_html_lines, actual_html_lines, + "expected HTML", "actual HTML"): + out(diffline) + out('\n') + result_counts['fail'] += 1 + else: + print_test_header(test['section'], test['example'], test['start_line'], test['end_line']) + out("program returned error code %d\n" % retcode) + sys.stdout.buffer.write(err) + result_counts['error'] += 1 + +def get_tests(specfile): + line_number = 0 + start_line = 0 + end_line = 0 + example_number = 0 + markdown_lines = [] + html_lines = [] + state = 0 # 0 regular text, 1 markdown example, 2 html output + headertext = '' + tests = [] + + header_re = re.compile('#+ ') + + with open(specfile, 'r', encoding='utf-8', newline='\n') as specf: + for line in specf: + line_number = line_number + 1 + l = line.strip() + #if l == "`" * 32 + " example": + if re.match("`{32} example( [a-z]{1,})?", l): + state = 1 + elif state == 2 and l == "`" * 32: + state = 0 + example_number = example_number + 1 + end_line = line_number + tests.append({ + "markdown":''.join(markdown_lines).replace('→',"\t"), + "html":''.join(html_lines).replace('→',"\t"), + "example": example_number, + "start_line": start_line, + "end_line": end_line, + "section": headertext}) + start_line = 0 + markdown_lines = [] + html_lines = [] + elif l == ".": + state = 2 + elif state == 1: + if start_line == 0: + start_line = line_number - 1 + markdown_lines.append(line) + elif state == 2: + html_lines.append(line) + elif state == 0 and re.match(header_re, line): + headertext = header_re.sub('', line).strip() + return tests + +if __name__ == "__main__": + if args.debug_normalization: + out(normalize_html(sys.stdin.read())) + exit(0) + + all_tests = get_tests(args.spec) + if args.pattern: + pattern_re = re.compile(args.pattern, re.IGNORECASE) + else: + pattern_re = re.compile('.') + tests = [ test for test in all_tests if re.search(pattern_re, test['section']) and (not args.number or test['example'] == args.number) ] + if args.dump_tests: + out(json.dumps(tests, ensure_ascii=False, indent=2)) + exit(0) + else: + skipped = len(all_tests) - len(tests) + cmark = CMark(prog=args.program, library_dir=args.library_dir) + result_counts = {'pass': 0, 'fail': 0, 'error': 0, 'skip': skipped} + for test in tests: + do_test(test, args.normalize, result_counts) + out("{pass} passed, {fail} failed, {error} errored, {skip} skipped\n".format(**result_counts)) + exit(result_counts['fail'] + result_counts['error']) diff --git a/tdemarkdown/md4c/test/strikethrough.txt b/tdemarkdown/md4c/test/strikethrough.txt new file mode 100644 index 000000000..884ce5983 --- /dev/null +++ b/tdemarkdown/md4c/test/strikethrough.txt @@ -0,0 +1,75 @@ + +# Strike-Through + +With the flag `MD_FLAG_STRIKETHROUGH`, MD4C enables extension for recognition +of strike-through spans. + +Strike-through text is any text wrapped in one or two tildes (`~`). + +```````````````````````````````` example +~Hi~ Hello, world! +. +<p><del>Hi</del> Hello, world!</p> +```````````````````````````````` + +If the length of the opener and closer doesn't match, the strike-through is +not recognized. + +```````````````````````````````` example +This ~text~~ is curious. +. +<p>This ~text~~ is curious.</p> +```````````````````````````````` + +Too long tilde sequence won't be recognized: + +```````````````````````````````` example +foo ~~~bar~~~ +. +<p>foo ~~~bar~~~</p> +```````````````````````````````` + +Also note the markers cannot open a strike-through span if they are followed +with a whitespace; and similarly, then cannot close the span if they are +preceded with a whitespace: + +```````````````````````````````` example +~foo ~bar +. +<p>~foo ~bar</p> +```````````````````````````````` + + +As with regular emphasis delimiters, a new paragraph will cause the cessation +of parsing a strike-through: + +```````````````````````````````` example +This ~~has a + +new paragraph~~. +. +<p>This ~~has a</p> +<p>new paragraph~~.</p> +```````````````````````````````` + + +## GitHub Issues + +### [Issue 69](https://github.com/mity/md4c/issues/69) +```````````````````````````````` example +~`foo`~ +. +<p><del><code>foo</code></del></p> +```````````````````````````````` + +```````````````````````````````` example +~*foo*~ +. +<p><del><em>foo</em></del></p> +```````````````````````````````` + +```````````````````````````````` example +*~foo~* +. +<p><em><del>foo</del></em></p> +```````````````````````````````` diff --git a/tdemarkdown/md4c/test/tables.txt b/tdemarkdown/md4c/test/tables.txt new file mode 100644 index 000000000..b220f6685 --- /dev/null +++ b/tdemarkdown/md4c/test/tables.txt @@ -0,0 +1,357 @@ + +# Tables + +With the flag `MD_FLAG_TABLES`, MD4C enables extension for recognition of +tables. + +Basic table example of a table with two columns and three lines (when not +counting the header) is as follows: + +```````````````````````````````` example +| Column 1 | Column 2 | +|----------|----------| +| foo | bar | +| baz | qux | +| quux | quuz | +. +<table> +<thead> +<tr><th>Column 1</th><th>Column 2</th></tr> +</thead> +<tbody> +<tr><td>foo</td><td>bar</td></tr> +<tr><td>baz</td><td>qux</td></tr> +<tr><td>quux</td><td>quuz</td></tr> +</tbody> +</table> +```````````````````````````````` + +The leading and succeeding pipe characters (`|`) on each line are optional: + +```````````````````````````````` example +Column 1 | Column 2 | +---------|--------- | +foo | bar | +baz | qux | +quux | quuz | +. +<table> +<thead> +<tr><th>Column 1</th><th>Column 2</th></tr> +</thead> +<tbody> +<tr><td>foo</td><td>bar</td></tr> +<tr><td>baz</td><td>qux</td></tr> +<tr><td>quux</td><td>quuz</td></tr> +</tbody> +</table> +```````````````````````````````` + +```````````````````````````````` example +| Column 1 | Column 2 +|----------|--------- +| foo | bar +| baz | qux +| quux | quuz +. +<table> +<thead> +<tr><th>Column 1</th><th>Column 2</th></tr> +</thead> +<tbody> +<tr><td>foo</td><td>bar</td></tr> +<tr><td>baz</td><td>qux</td></tr> +<tr><td>quux</td><td>quuz</td></tr> +</tbody> +</table> +```````````````````````````````` + +```````````````````````````````` example +Column 1 | Column 2 +---------|--------- +foo | bar +baz | qux +quux | quuz +. +<table> +<thead> +<tr><th>Column 1</th><th>Column 2</th></tr> +</thead> +<tbody> +<tr><td>foo</td><td>bar</td></tr> +<tr><td>baz</td><td>qux</td></tr> +<tr><td>quux</td><td>quuz</td></tr> +</tbody> +</table> +```````````````````````````````` + +However for one-column table, at least one pipe has to be used in the table +header underline, otherwise it would be parsed as a Setext title followed by +a paragraph. + +```````````````````````````````` example +Column 1 +-------- +foo +baz +quux +. +<h2>Column 1</h2> +<p>foo +baz +quux</p> +```````````````````````````````` + +Leading and trailing whitespace in a table cell is ignored and the columns do +not need to be aligned. + +```````````````````````````````` example +Column 1 |Column 2 +---|--- +foo | bar +baz| qux +quux|quuz +. +<table> +<thead> +<tr><th>Column 1</th><th>Column 2</th></tr> +</thead> +<tbody> +<tr><td>foo</td><td>bar</td></tr> +<tr><td>baz</td><td>qux</td></tr> +<tr><td>quux</td><td>quuz</td></tr> +</tbody> +</table> +```````````````````````````````` + +The table cannot interrupt a paragraph. + +```````````````````````````````` example +Lorem ipsum dolor sit amet. +| Column 1 | Column 2 +| ---------|--------- +| foo | bar +| baz | qux +| quux | quuz +. +<p>Lorem ipsum dolor sit amet. +| Column 1 | Column 2 +| ---------|--------- +| foo | bar +| baz | qux +| quux | quuz</p> +```````````````````````````````` + +Similarly, paragraph cannot interrupt a table: + +```````````````````````````````` example +Column 1 | Column 2 +---------|--------- +foo | bar +baz | qux +quux | quuz +Lorem ipsum dolor sit amet. +. +<table> +<thead> +<tr><th>Column 1</th><th>Column 2</th></tr> +</thead> +<tbody> +<tr><td>foo</td><td>bar</td></tr> +<tr><td>baz</td><td>qux</td></tr> +<tr><td>quux</td><td>quuz</td></tr> +<tr><td>Lorem ipsum dolor sit amet.</td><td></td></tr> +</tbody> +</table> +```````````````````````````````` + +The first, the last or both the first and the last dash in each column +underline can be replaced with a colon (`:`) to request left, right or middle +alignment of the respective column: + +```````````````````````````````` example +| Column 1 | Column 2 | Column 3 | Column 4 | +|----------|:---------|:--------:|---------:| +| default | left | center | right | +. +<table> +<thead> +<tr><th>Column 1</th><th align="left">Column 2</th><th align="center">Column 3</th><th align="right">Column 4</th></tr> +</thead> +<tbody> +<tr><td>default</td><td align="left">left</td><td align="center">center</td><td align="right">right</td></tr> +</tbody> +</table> +```````````````````````````````` + +To include a literal pipe character in any cell, it has to be escaped. + +```````````````````````````````` example +Column 1 | Column 2 +---------|--------- +foo | bar +baz | qux \| xyzzy +quux | quuz +. +<table> +<thead> +<tr><th>Column 1</th><th>Column 2</th></tr> +</thead> +<tbody> +<tr><td>foo</td><td>bar</td></tr> +<tr><td>baz</td><td>qux | xyzzy</td></tr> +<tr><td>quux</td><td>quuz</td></tr> +</tbody> +</table> +```````````````````````````````` + +Contents of each cell is parsed as an inline text which may contents any +inline Markdown spans like emphasis, strong emphasis, links etc. + +```````````````````````````````` example +Column 1 | Column 2 +---------|--------- +*foo* | bar +**baz** | [qux] +quux | [quuz](/url2) + +[qux]: /url +. +<table> +<thead> +<tr><th>Column 1</th><th>Column 2</th></tr> +</thead> +<tbody> +<tr><td><em>foo</em></td><td>bar</td></tr> +<tr><td><strong>baz</strong></td><td><a href="/url">qux</a></td></tr> +<tr><td>quux</td><td><a href="/url2">quuz</a></td></tr> +</tbody> +</table> +```````````````````````````````` + +However pipes which are inside a code span are not recognized as cell +boundaries. + +```````````````````````````````` example +Column 1 | Column 2 +---------|--------- +`foo | bar` +baz | qux +quux | quuz +. +<table> +<thead> +<tr><th>Column 1</th><th>Column 2</th></tr> +</thead> +<tbody> +<tr><td><code>foo | bar</code></td><td></td></tr> +<tr><td>baz</td><td>qux</td></tr> +<tr><td>quux</td><td>quuz</td></tr> +</tbody> +</table> +```````````````````````````````` + + +## GitHub Issues + +### [Issue 41](https://github.com/mity/md4c/issues/41) +```````````````````````````````` example +* x|x +---|--- +. +<ul> +<li>x|x +---|---</li> +</ul> +```````````````````````````````` +(Not a table, because the underline has wrong indentation and is not part of the +list item.) + +```````````````````````````````` example +* x|x + ---|--- +x|x +. +<ul> +<li><table> +<thead> +<tr> +<th>x</th> +<th>x</th> +</tr> +</thead> +</table> +</li> +</ul> +<p>x|x</p> +```````````````````````````````` +(Here the underline has the right indentation so the table is detected. +But the last line is not part of it due its indentation.) + + +### [Issue 42](https://github.com/mity/md4c/issues/42) + +```````````````````````````````` example +] http://x.x *x* + +|x|x| +|---|---| +|x| +. +<p>] http://x.x <em>x</em></p> +<table> +<thead> +<tr> +<th>x</th> +<th>x</th> +</tr> +</thead> +<tbody> +<tr> +<td>x</td> +<td></td> +</tr> +</tbody> +</table> +```````````````````````````````` + + +### [Issue 104](https://github.com/mity/md4c/issues/104) + +```````````````````````````````` example +A | B +--- | --- +[x](url) +. +<table> +<thead> +<tr> +<th>A</th> +<th>B</th> +</tr> +</thead> +<tbody> +<tr> +<td><a href="url">x</a></td> +<td></td> +</tr> +</tbody> +</table> +```````````````````````````````` + + +### [Issue 138](https://github.com/mity/md4c/issues/138) + +```````````````````````````````` example +| abc | def | +| --- | --- | +. +<table> +<thead> +<tr> +<th>abc</th> +<th>def</th> +</tr> +</thead> +</table> +```````````````````````````````` diff --git a/tdemarkdown/md4c/test/tasklists.txt b/tdemarkdown/md4c/test/tasklists.txt new file mode 100644 index 000000000..aae1bf8eb --- /dev/null +++ b/tdemarkdown/md4c/test/tasklists.txt @@ -0,0 +1,117 @@ + +# Tasklists + +With the flag `MD_FLAG_TASKLISTS`, MD4C enables extension for recognition of +task lists. + +Basic task list may look as follows: + +```````````````````````````````` example + * [x] foo + * [X] bar + * [ ] baz +. +<ul> +<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>foo</li> +<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>bar</li> +<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>baz</li> +</ul> +```````````````````````````````` + +Task lists can also be in ordered lists: + +```````````````````````````````` example + 1. [x] foo + 2. [X] bar + 3. [ ] baz +. +<ol> +<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>foo</li> +<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>bar</li> +<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>baz</li> +</ol> +```````````````````````````````` + +Task lists can also be nested in ordinary lists: + +```````````````````````````````` example + * xxx: + * [x] foo + * [x] bar + * [ ] baz + * yyy: + * [ ] qux + * [x] quux + * [ ] quuz +. +<ul> +<li>xxx: +<ul> +<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>foo</li> +<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>bar</li> +<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>baz</li> +</ul></li> +<li>yyy: +<ul> +<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>qux</li> +<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>quux</li> +<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>quuz</li> +</ul></li> +</ul> +```````````````````````````````` + +Or in a parent task list: + +```````````````````````````````` example + 1. [x] xxx: + * [x] foo + * [x] bar + * [ ] baz + 2. [ ] yyy: + * [ ] qux + * [x] quux + * [ ] quuz +. +<ol> +<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>xxx: +<ul> +<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>foo</li> +<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>bar</li> +<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>baz</li> +</ul></li> +<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>yyy: +<ul> +<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>qux</li> +<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>quux</li> +<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>quuz</li> +</ul></li> +</ol> +```````````````````````````````` + +Also, ordinary lists can be nested in the task lists. + +```````````````````````````````` example + * [x] xxx: + * foo + * bar + * baz + * [ ] yyy: + * qux + * quux + * quuz +. +<ul> +<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled checked>xxx: +<ul> +<li>foo</li> +<li>bar</li> +<li>baz</li> +</ul></li> +<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled>yyy: +<ul> +<li>qux</li> +<li>quux</li> +<li>quuz</li> +</ul></li> +</ul> +```````````````````````````````` diff --git a/tdemarkdown/md4c/test/underline.txt b/tdemarkdown/md4c/test/underline.txt new file mode 100644 index 000000000..289e97fa1 --- /dev/null +++ b/tdemarkdown/md4c/test/underline.txt @@ -0,0 +1,39 @@ + +# Underline + +With the flag `MD_FLAG_UNDERLINE`, MD4C sees underscore `_` rather as a mark +denoting an underlined span rather than an ordinary emphasis (or a strong +emphasis). + +```````````````````````````````` example +_foo_ +. +<p><u>foo</u></p> +```````````````````````````````` + +In sequences of multiple underscores, each single one translates into an +underline span mark. + +```````````````````````````````` example +___foo___ +. +<p><u><u><u>foo</u></u></u></p> +```````````````````````````````` + +Intra-word underscores are not recognized as underline marks: + +```````````````````````````````` example +foo_bar_baz +. +<p>foo_bar_baz</p> +```````````````````````````````` + +Also the parser follows the standard understanding when the underscore can +or cannot open or close a span. Therefore there is no underline in the following +example because no underline can be seen as a closing mark. + +```````````````````````````````` example +_foo _bar +. +<p>_foo _bar</p> +```````````````````````````````` diff --git a/tdemarkdown/md4c/test/wiki-links.txt b/tdemarkdown/md4c/test/wiki-links.txt new file mode 100644 index 000000000..00d394e7f --- /dev/null +++ b/tdemarkdown/md4c/test/wiki-links.txt @@ -0,0 +1,232 @@ + +# Wiki Links + +With the flag `MD_FLAG_WIKILINKS`, MD4C recognizes wiki links. + +The simple wiki-link is a wiki-link destination enclosed in `[[` followed with +`]]`. + +```````````````````````````````` example +[[foo]] +. +<p><x-wikilink data-target="foo">foo</x-wikilink></p> +```````````````````````````````` + +However wiki-link may contain an explicit label, delimited from the destination +with `|`. + +```````````````````````````````` example +[[foo|bar]] +. +<p><x-wikilink data-target="foo">bar</x-wikilink></p> +```````````````````````````````` + +A wiki-link destination cannot be empty. + +```````````````````````````````` example +[[]] +. +<p>[[]]</p> +```````````````````````````````` + +```````````````````````````````` example +[[|foo]] +. +<p>[[|foo]]</p> +```````````````````````````````` + + +The wiki-link destination cannot contain a new line. + +```````````````````````````````` example +[[foo +bar]] +. +<p>[[foo +bar]]</p> +```````````````````````````````` + +```````````````````````````````` example +[[foo +bar|baz]] +. +<p>[[foo +bar|baz]]</p> +```````````````````````````````` + +The wiki-link destination is rendered verbatim; inline markup in it is not +recognized. + +```````````````````````````````` example +[[*foo*]] +. +<p><x-wikilink data-target="*foo*">*foo*</x-wikilink></p> +```````````````````````````````` + +```````````````````````````````` example +[[foo|![bar](bar.jpg)]] +. +<p><x-wikilink data-target="foo"><img src="bar.jpg" alt="bar"></x-wikilink></p> +```````````````````````````````` + +With multiple `|` delimiters, only the first one is recognized and the other +ones are part of the label. + +```````````````````````````````` example +[[foo|bar|baz]] +. +<p><x-wikilink data-target="foo">bar|baz</x-wikilink></p> +```````````````````````````````` + +However the delimiter `|` can be escaped with `/`. + +```````````````````````````````` example +[[foo\|bar|baz]] +. +<p><x-wikilink data-target="foo|bar">baz</x-wikilink></p> +```````````````````````````````` + +The label can contain inline elements. + +```````````````````````````````` example +[[foo|*bar*]] +. +<p><x-wikilink data-target="foo"><em>bar</em></x-wikilink></p> +```````````````````````````````` + +Empty explicit label is the same as using the implicit label; i.e. the verbatim +destination string is used as the label. + +```````````````````````````````` example +[[foo|]] +. +<p><x-wikilink data-target="foo">foo</x-wikilink></p> +```````````````````````````````` + +The label can span multiple lines. + +```````````````````````````````` example +[[foo|foo +bar +baz]] +. +<p><x-wikilink data-target="foo">foo +bar +baz</x-wikilink></p> +```````````````````````````````` + +Wiki-links have higher priority than links. + +```````````````````````````````` example +[[foo]](foo.jpg) +. +<p><x-wikilink data-target="foo">foo</x-wikilink>(foo.jpg)</p> +```````````````````````````````` + +```````````````````````````````` example +[foo]: /url + +[[foo]] +. +<p><x-wikilink data-target="foo">foo</x-wikilink></p> +```````````````````````````````` + +Wiki links can be inlined in tables. + +```````````````````````````````` example +| A | B | +|------------------|-----| +| [[foo|*bar*]] | baz | +. +<table> +<thead> +<tr> +<th>A</th> +<th>B</th> +</tr> +</thead> +<tbody> +<tr> +<td><x-wikilink data-target="foo"><em>bar</em></x-wikilink></td> +<td>baz</td> +</tr> +</tbody> +</table> +```````````````````````````````` + +Wiki-links are not prioritized over images. + +```````````````````````````````` example +![[foo]](foo.jpg) +. +<p><img src="foo.jpg" alt="[foo]"></p> +```````````````````````````````` + +Something that may look like a wiki-link at first, but turns out not to be, +is recognized as a normal link. + +```````````````````````````````` example +[[foo] + +[foo]: /url +. +<p>[<a href="/url">foo</a></p> +```````````````````````````````` + +Escaping the opening `[` escapes only that one character, not the whole `[[` +opener: + +```````````````````````````````` example +\[[foo]] + +[foo]: /url +. +<p>[<a href="/url">foo</a>]</p> +```````````````````````````````` + +Like with other inline links, the innermost wiki-link is preferred. + +```````````````````````````````` example +[[foo[[bar]]]] +. +<p>[[foo<x-wikilink data-target="bar">bar</x-wikilink>]]</p> +```````````````````````````````` + +There is limit of 100 characters for the wiki-link destination. + +```````````````````````````````` example +[[12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901]] +[[12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901|foo]] +. +<p>[[12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901]] +[[12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901|foo]]</p> +```````````````````````````````` + +100 characters inside a wiki link target works. + +```````````````````````````````` example +[[1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890]] +[[1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890|foo]] +. +<p><x-wikilink data-target="1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890">1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890</x-wikilink> +<x-wikilink data-target="1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890">foo</x-wikilink></p> +```````````````````````````````` + +The limit on link content does not include any characters belonging to a block +quote, if the label spans multiple lines contained in a block quote. + +```````````````````````````````` example +> [[12345678901234567890123456789012345678901234567890|1234567890 +> 1234567890 +> 1234567890 +> 1234567890 +> 123456789]] +. +<blockquote> +<p><x-wikilink data-target="12345678901234567890123456789012345678901234567890">1234567890 +1234567890 +1234567890 +1234567890 +123456789</x-wikilink></p> +</blockquote> +```````````````````````````````` |