Opened 2 years ago
Closed 22 months ago
#28490 closed enhancement (fixed)
Let Sage honour PEP 515 (py3)
Reported by:  tmonteil  Owned by:  

Priority:  major  Milestone:  sage9.1 
Component:  python3  Keywords:  preparser, py3 
Cc:  slelievre  Merged in:  
Authors:  Vincent Delecroix, John Palmieri  Reviewers:  Thierry Monteil, Vincent Delecroix, John Palmieri 
Report Upstream:  N/A  Work issues:  
Branch:  4c0787d (Commits, GitHub, GitLab)  Commit:  4c0787ddabd93c2b79d652131503685c98d12888 
Dependencies:  Stopgaps: 
Description (last modified by )
In Python 3, underscored numbers like 10_000_000.0
are allowed, but the preparser, as well as integer and real number constructors, do not handle them yet, see this Ask Sage question and PEP 515.
Change History (45)
comment:1 Changed 2 years ago by
 Cc slelievre added
 Description modified (diff)
 Keywords preparser py3 added
comment:2 Changed 2 years ago by
 Branch set to u/jhpalmieri/preparse_underscore
comment:3 Changed 2 years ago by
 Commit set to 3601e61b65ce7c67e272e4d16be6a555318a6351
 Status changed from new to needs_review
comment:4 Changed 2 years ago by
 Commit changed from 3601e61b65ce7c67e272e4d16be6a555318a6351 to cfcdc57ecce11491d8428c50a5ecfc4beea5663a
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
cfcdc57  trac 28490: preparse underscores in numeric literals.

comment:5 Changed 2 years ago by
 Reviewers set to Thierry Monteil
 Status changed from needs_review to needs_work
In my understanding of the PEP (confirmed by testing on a ipython3 console), trailing underscores are not allowed, however we currently have:
sage: 100_000_ 100000
Also, perhaps could you show the feature in action with a concrete example, something like:
The preparser can handle PEP 515 (see :trac:`28490`):: sage: 1_000_000 + 3_000 1003000
comment:6 Changed 2 years ago by
 Commit changed from cfcdc57ecce11491d8428c50a5ecfc4beea5663a to 61f06181e04f3c1266b22cd502bf223850df0ac0
Branch pushed to git repo; I updated commit sha1. New commits:
61f0618  trac 28490: do not allow trailing underscores

comment:7 Changed 2 years ago by
 Commit changed from 61f06181e04f3c1266b22cd502bf223850df0ac0 to a26c28960fd2c67b2415ed3bd75f9acac42d5a76
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
a26c289  trac 28490: do not allow trailing underscores

comment:8 Changed 2 years ago by
 Description modified (diff)
 Status changed from needs_work to needs_review
Okay, trailing underscores now should fail, and I added the doctest.
comment:9 Changed 2 years ago by
The patchbots seem unhappy, apparently the doctests do not behave the same there.
comment:10 Changed 2 years ago by
The doctests pass with Python 3, not with Python 2. I guess there is a question: using underscores is only valid starting in Python. Do we want to allow this in Sage built with Python 2?
Also, a leading zero is valid in Python 2 but not in Python 3. What should we do about that?
comment:11 Changed 2 years ago by
 Commit changed from a26c28960fd2c67b2415ed3bd75f9acac42d5a76 to 2fe807bad14c00b3c27421c8765893f0de56e1f2
Branch pushed to git repo; I updated commit sha1. New commits:
2fe807b  trac 28490: fix py2 behavior and doctests

comment:12 Changed 2 years ago by
I propose to allow underscores with Python 2.
comment:13 Changed 2 years ago by
My personal opinion:
 write a simple code with Python 3 in mind only, to avoid cumbersome code, since Python2 will be dropped soon anyway (an i guess nobody using Python 2 will complain that 1000_000 should not be accepted)
 use
# py2
and# py3
in doctests to make patchbots happy.
comment:14 Changed 2 years ago by
That's mostly what I did, except that with my code and Sage + Python 2, leading zeroes are still allowed. Aside from the # py2
and # py3
doctest tags, the only thing I've added to support Python 2 is to insert if (six.PY3 and ...)
instead of if (...)
. (This is to deal with leading zeroes.)
comment:15 Changed 2 years ago by
ping?
comment:16 Changed 2 years ago by
 Milestone changed from sage8.9 to sage9.1
Ticket retargeted after milestone closed
comment:17 followup: ↓ 19 Changed 22 months ago by
Could you factorize
# Multiple consecutive underscores is invalid Python # syntax. Do not preparse. if num.find('__') != 1: continue # Trailing underscore is invalid Python syntax. Do not # preparse. if num.endswith('_'): continue
into
# Multiple consecutive underscores or trailing underscore # is invalid Python syntax. Do not preparse. if num.find('__') != 1 or num.endswith('_'): continue
comment:18 Changed 22 months ago by
 Commit changed from 2fe807bad14c00b3c27421c8765893f0de56e1f2 to 8d22780cadd555599c2f50ec74c92274f668ccbc
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
9c3ba0c  trac 28490: preparse underscores in numeric literals.

ab0c26e  trac 28490: do not allow trailing underscores

172ea24  trac 28490: fix py2 behavior and doctests

8d22780  trac 28490: combine two "if" statements into one.

comment:19 in reply to: ↑ 17 Changed 22 months ago by
Replying to vdelecroix:
Could you factorize
# Multiple consecutive underscores is invalid Python # syntax. Do not preparse. if num.find('__') != 1: continue # Trailing underscore is invalid Python syntax. Do not # preparse. if num.endswith('_'): continueinto
# Multiple consecutive underscores or trailing underscore # is invalid Python syntax. Do not preparse. if num.find('__') != 1 or num.endswith('_'): continue
Done.
comment:20 followup: ↓ 25 Changed 22 months ago by
Why would you change 123_456
to 123456
? This works perfectly fine
sage: preparser(False) sage: Integer(123_456) 123456 sage: preparser(True)
The only problem seems to be that .0
is too often converted to .gen(0)
sage: preparse("100_000.0") '100_000.gen(0)'
comment:21 Changed 22 months ago by
Couldn't you simply change dec_num
to something like dec_num = r"\d+(_\d+)*"
?
comment:22 Changed 22 months ago by
 Status changed from needs_review to needs_work
And it is not the role of the Sage preparser to implement PEP requests.
comment:23 Changed 22 months ago by
In other words: it is good to make strings with underscored be recognized as numbers. But do not modify the input string beyond what is strictly necessary.
comment:24 Changed 22 months ago by
I put a branch at public/28490
which contains (I hope) the correct regexps. I also added more doctests for floating point numbers.
Of course Integer('1_23')
and RealNumber('1_1.')
are broken. But this is IMHO a distinct problem.
comment:25 in reply to: ↑ 20 ; followup: ↓ 27 Changed 22 months ago by
Replying to vdelecroix:
Why would you change
123_456
to123456
? This works perfectly fine
5 months ago, when the ticket was opened, Python 2 was the default for Sage, and all of these things were completely broken with Python 2. Should this ticket provide support for Python 2? I haven't looked at your changes, so I don't know if they do.
And it is not the role of the Sage preparser to implement PEP requests.
It's not a request anymore, it's been merged into Python. Whose role is it to make sure that valid Python 3 syntax, like 100_000.0
, is interpreted properly in Sage?
comment:26 Changed 22 months ago by
By the way, I removed dec_num
because it was redundant: covered (I think) by the other regexps.
comment:27 in reply to: ↑ 25 Changed 22 months ago by
Replying to jhpalmieri:
Replying to vdelecroix:
Why would you change
123_456
to123456
? This works perfectly fine5 months ago, when the ticket was opened, Python 2 was the default for Sage, and all of these things were completely broken with Python 2. Should this ticket provide support for Python 2? I haven't looked at your changes, so I don't know if they do.
Python 2 is supposed to be dropped in sage9.2. It should be supported for the next 9.1 release.
And it is not the role of the Sage preparser to implement PEP requests.
It's not a request anymore, it's been merged into Python. Whose role is it to make sure that valid Python 3 syntax, like
100_000.0
, is interpreted properly in Sage?
Agreed. Did you have a look at public/28490
? It does not touch to the interpreter beyond the update of regexp.
comment:28 followup: ↓ 30 Changed 22 months ago by
I looked at your branch. It doesn't work with Python 2, for what that's worth:
sage: 123_456 File "<ipythoninput1d4d70e3ee054>", line 1 Integer(123_456) ^ SyntaxError: invalid syntax
I also get a few doctest failures with Python 3, but they should be easy to fix.
You can delete dec_num
since it is redundant.

src/sage/repl/preparse.py
diff git a/src/sage/repl/preparse.py b/src/sage/repl/preparse.py index a699394d1b..9b877285e9 100644
a b def preparse_numeric_literals(code, extract=False): 830 830 831 831 global all_num_regex 832 832 if all_num_regex is None: 833 dec_num = r"\b\d+(_\d+)*"834 833 hex_num = r"\b0x[09af]+(_[09af]+)*" 835 834 oct_num = r"\b0o[07]+(_[07]+)*" 836 835 bin_num = r"\b0b[01]+(_[01]+)*" 837 836 # This is slightly annoying as floating point numbers may start 838 837 # with a decimal point, but if they do the \b will not match. 839 838 float_num = r"((\b\d+(_\d+)*([.](\d+(_\d+)*)?)?)([.]\d+(_\d+)*))(e[+]?\d+(_\d+)*)?" 840 all_num = r"((%s)(%s)(%s)(%s) (%s))(rjrLjrLrjLr)\b" % (hex_num, oct_num, bin_num, float_num, dec_num)839 all_num = r"((%s)(%s)(%s)(%s))(rjrLjrLrjLr)\b" % (hex_num, oct_num, bin_num, float_num) 841 840 all_num_regex = re.compile(all_num, re.I) 842 841 843 842 for m in all_num_regex.finditer(code):
The preparser is still broken with your branch:
sage: preparse('100_00.0') "RealNumber('100_00.gen(0)')"
We need to fix this and add a doctest.
comment:29 followup: ↓ 31 Changed 22 months ago by
First, with this change:

src/sage/repl/preparse.py
diff git a/src/sage/repl/preparse.py b/src/sage/repl/preparse.py index a699394d1b..dd798f06b7 100644
a b def preparse(line, reset=True, do_time=False, ignore_prompts=False, 1300 1300 1301 1301 # Generators 1302 1302 # R.0 > R.gen(0) 1303 L = re.sub(r'([ _azAZ]\w*[)\]])\.(\d+)', r'\1.gen(\2)', L)1303 L = re.sub(r'([azAZ]\w*[)\]])\.(\d+)', r'\1.gen(\2)', L) 1304 1304 1305 1305 # Use ^ for exponentiation and ^^ for xor 1306 1306 # (A side effect is that **** becomes xor as well.)
preparse(100_000.0)
works, but 100_000.0
does not:
sage: preparse('100_000.0') "RealNumber('100_000.0')" sage: 100_000.0  TypeError Traceback (most recent call last) ... TypeError: unable to convert '100_000.0' to a real number
So that's some progress. Then this change fixes the conversion to RealNumber
:

src/sage/rings/real_mpfr.pyx
diff git a/src/sage/rings/real_mpfr.pyx b/src/sage/rings/real_mpfr.pyx index 26cfddc0a2..8ec11ae38f 100644
a b cdef class RealNumber(sage.structure.element.RingElement): 1482 1482 mpfr_set_z(self.value, (<gmpy2.mpz>x).z, parent.rnd) 1483 1483 else: 1484 1484 s = str(x).replace(' ','') 1485 s = str(x).replace('_','') 1485 1486 s_lower = s.lower() 1486 1487 if s_lower == 'infinity': 1487 1488 raise ValueError('can only convert signed infinity to RR')
comment:30 in reply to: ↑ 28 Changed 22 months ago by
Replying to jhpalmieri:
I looked at your branch. It doesn't work with Python 2, for what that's worth:
sage: 123_456 File "<ipythoninput1d4d70e3ee054>", line 1 Integer(123_456) ^ SyntaxError: invalid syntax
It does work with Python2 as the above SyntaxError
is the expected behavior
$ python2 c '123_456' File "<string>", line 1 123_456 ^ SyntaxError: invalid syntax
I don't see why we would change that in Sage.
comment:31 in reply to: ↑ 29 ; followup: ↓ 32 Changed 22 months ago by
Replying to jhpalmieri:
First, with this change:
<SNIP>
preparse(100_000.0)
works, but100_000.0
does not:<SNIP>So that's some progress.
Nice! To my mind, this is the end. RealNumber
should accept this kind of strings as float
does.
sage: float('1_1.1_1e1_1') 1111000000000.0
It is not the job of the preparser to modify the input string.
Then this change fixes the conversion to
RealNumber
:<SNIP>
Which is precisely what I don't wnat to do here!
comment:32 in reply to: ↑ 31 ; followup: ↓ 33 Changed 22 months ago by
Replying to vdelecroix:
Then this change fixes the conversion to
RealNumber
:<SNIP>Which is precisely what I don't wnat to do here!
Why not? This is in the __init__
method for RealNumber
, where the string is already modified, as you can see from the surrounding lines. It's not in the preparser. Where else would you want to fix it?
comment:33 in reply to: ↑ 32 Changed 22 months ago by
Replying to jhpalmieri:
Replying to vdelecroix:
Then this change fixes the conversion to
RealNumber
:<SNIP>Which is precisely what I don't wnat to do here!
Why not? This is in the
__init__
method forRealNumber
, where the string is already modified, as you can see from the surrounding lines. It's not in the preparser. Where else would you want to fix it?
Oups. I misread. It is fine modifying RealNumber
. But the title and description has to be modified as it is not only about touching the preparser. Also Integer
and Rational
constructors have to be fixed.
comment:34 Changed 22 months ago by
Your 'fix' from the first part of comment:29 was not one... I will update the branch in a second.
comment:35 Changed 22 months ago by
 Branch changed from u/jhpalmieri/preparse_underscore to public/28490
 Commit changed from 8d22780cadd555599c2f50ec74c92274f668ccbc to 604d18debfdbb45e69258e61f847b15d01d0440e
New commits:
604d18d  28490: let preparser handle correctly _ in numbers

comment:36 Changed 22 months ago by
 Commit changed from 604d18debfdbb45e69258e61f847b15d01d0440e to d40ba4464cb4e8e2f202985f7cfe6623556fe99a
comment:37 Changed 22 months ago by
 Commit changed from d40ba4464cb4e8e2f202985f7cfe6623556fe99a to 70496a9218e87daf75e25f77375760023991456e
Branch pushed to git repo; I updated commit sha1. New commits:
70496a9  28490: integer and real number constructor

comment:38 Changed 22 months ago by
 Status changed from needs_work to needs_review
comment:39 followup: ↓ 42 Changed 22 months ago by
With your branch in Python 2, Integer('1_3')
returns 13 but plain 1_3
raises a SyntaxError
. Is that what you intended? It's not consistent behavior.
comment:40 Changed 22 months ago by
 Summary changed from Let Sage preparser honour PEP 515 (py3) to Let Sage honour PEP 515 (py3)
comment:41 Changed 22 months ago by
 Description modified (diff)
comment:42 in reply to: ↑ 39 Changed 22 months ago by
Replying to jhpalmieri:
With your branch in Python 2,
Integer('1_3')
returns 13 but plain1_3
raises aSyntaxError
. Is that what you intended? It's not consistent behavior.
It is not nice I agree. But I do not quite see inconsistency. Some constructors (like Integer.__init__
) can perfectly support additional syntax. The problem to me is that on Python2, 1_3.
does not raise an error. But I do not want to worry about Python2. If you find a non invasive solution, feel free to add to the branch.
Related comment: it would be interesting to benchmark the two options for the preparser 123 > Integer(123)
to 123 > Integer('123')
. See the noticeable slowdown mentioned in this sagedevel thread.
comment:43 Changed 22 months ago by
 Commit changed from 70496a9218e87daf75e25f77375760023991456e to 4c0787ddabd93c2b79d652131503685c98d12888
Branch pushed to git repo; I updated commit sha1. New commits:
4c0787d  trac 28490: tag one doctest "py3"

comment:44 Changed 22 months ago by
 Reviewers changed from Thierry Monteil to Thierry Monteil, Vincent Delecroix, John Palmieri
 Status changed from needs_review to positive_review
Okay, good enough, I think.
comment:45 Changed 22 months ago by
 Branch changed from public/28490 to 4c0787ddabd93c2b79d652131503685c98d12888
 Resolution set to fixed
 Status changed from positive_review to closed
Here is an attempt. This does at least one other thing: it no longer allows leading zeroes in integers. This is not valid Python syntax, so Sage should not allow it, either.